December 15, 2011

Social data is one of the many new frontiers within the spectrum of Business Intelligence. If you stop and think about the amount of data that Facebook and Twitter generate on a daily or even hourly basis, it is staggering! Facebook states it has over 30,000 servers and generates 25 terabytes of log files a day! Twitter does analytics on its own data, like this example from the last Super Bowl. Considering that to date Twitter has received 7 billion tweets consisting of over 104 billion words, you can only imagine the potential value of that data to a company that wants to measure, for example, social acceptance of a brand. You want “Big Data”? You got it!

The topic of social data fascinates me, partly because of the real-time nature of the data, but also because of the sheer size of the data sets available. I recently lectured at Kent State University, and in discussing the world of BI as it stands today, I also wanted to give the students a glimpse of the future and show them analytics on a “non-traditional” data set. Microsoft developed an application for Excel 2010 called PowerPivot, and the Microsoft BI Team has been able to do some really amazing things with it for analyzing data. They developed a project called “Analytics for Twitter”, which is basically an aggregator for Twitter data that will return any mention, hashtag or general search term based on the criteria provided. In searching around for a good example to show the students at Kent, I decided to look at tweets mentioning Walt Disney World, because of what I already knew about their BI practices. Considering they track many of their metrics in real time from the parks, I figured that it wouldn’t be much of a stretch to consider Twitter data from inside the park, resorts and surrounding area.

A note about Analytics for Twitter: it comes pre-configured to assess a “mood” score based on key words and a 10-point sliding scale. After returning the data set for all things Disney, I noticed that there was a strong slant toward “negative” tweets. Having been to Disney many times, I couldn’t for the life of me figure out what would be so bad that I would be compelled to tweet negatively about it. I started to parse through the data and I didn’t see anything that stood out as overly negative. Then I moved on to examine the mood scale and key words, and noticed that the emoticon ” : ( ” was set to a negative value and that there were many Disney tweets that stated something to the effect of ” Last day at Disney : ( “. A quick change to the mood values and everything fell into place and looked much better. Analytically, the change is justified by the context in which the emoticon was used: being sad to leave somewhere versus being upset that the line for Space Mountain is too long are vastly different.
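To make the mood adjustment described above concrete, here is a minimal Python sketch of a keyword-based mood score. It is not the scoring logic that ships with Analytics for Twitter; the key words, the weights, the neutral midpoint of 5, the averaging rule and the names `MOOD_WEIGHTS` and `mood_score` are all assumptions made up for illustration.

```python
# Hypothetical key-word weights on a 0-10 scale, where 5 is treated as neutral.
# These values are illustrative only, not the tool's real configuration.
NEUTRAL = 5
MOOD_WEIGHTS = {
    "love": 9,
    "great": 8,
    "too long": 2,
    ":(": 2,  # the sad emoticon starts out as a negative key word
}


def mood_score(tweet, weights):
    """Average the weights of every key word found in the tweet.

    Matching is a naive substring search, so a real scorer would need
    proper tokenization. Tweets with no key words score as neutral.
    """
    # Normalize the spaced-out form of the emoticon so ": (" and ":(" match.
    text = tweet.lower().replace(": (", ":(")
    hits = [weight for key, weight in weights.items() if key in text]
    return sum(hits) / len(hits) if hits else NEUTRAL


# Same key words, but the sad emoticon is moved to neutral, because in
# this data set it mostly signals sadness about leaving the park.
adjusted = {**MOOD_WEIGHTS, ":(": NEUTRAL}

leaving = "Last day at Disney : ("
complaint = "The line for Space Mountain is too long : ("

for tweet in (leaving, complaint):
    print(tweet, mood_score(tweet, MOOD_WEIGHTS), mood_score(tweet, adjusted))
```

With the adjusted weights, the “last day” tweet moves to neutral while the Space Mountain complaint still scores below neutral, because the phrase about the line keeps its negative weight. That is the distinction the change to the mood values was meant to capture.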

One of the biggest hurdles to analyzing social data is understanding the audience, the data source and the sentiment that is being expressed. This process will have to be somewhat manual until the “rules” for social data are defined and you can analyze the trends that you are seeing in your “data set”. Don’t forget that slang and general language rules change with the age of your audience, so those rules will have to be constantly updated; when was the last time you heard a kid use the words “Gnarly”, “Rad”, or “Bogus”?

Social data analysis is the wave of the future as more and more consumers move to technology and cyberspace to express their opinions about everything from snack food to cars. BI Professionals who fail to expand their view into social data run the risk of being left behind.