Making Sense of Social Data: Visualization of Twitter Interactions
Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and group behavior in online communities and social networks.
Michael was voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics and its application to Social CRM. He's a regular blogger on the Lithosphere's Building Community blog and previously wrote in the Analytic Science blog. You can follow him on Twitter or Google+.
Hello and welcome back. It’s been 3 weeks since I last blogged. I’ve been very busy with the other half of my life (i.e. the engineer/architect side)--busy developing a framework for social analytics and doing some architecture work for that project. And I’m still working on it, so this will be a short piece.
The Engagement Side of My Work
As the end of the year approaches, I’m reflecting on my blogging and the more external-facing side of my work (the side of my life that most of you know). Come to think of it, although I’d written a few blog posts earlier, I’ve only had my own blog for about two and a half years. My first official blog article (Introducing Analytic Science at Lithium) was published on May 4th, 2009. Although I don’t think of myself as very prolific, I have produced a modest 87 articles and 376 comments, and received 467 kudos (see the statistics in my profile) over the past 2.5 years.
Last time I wrapped up Chapter 1 on Gamification. So 3 chapters have now been compiled:
- My Chapter on Influencers
- My Chapter on Relationships: The R in Social CRM
- My Chapter on Gamification: From Behavior Model to Business Strategy
So, I was wondering what I should focus on for 2012. I decided that I want to put a little more emphasis on the lesser-known side of my work: specifically, data and social analytics.
The Technical Side of My Work
As a teaser, I’d like to share a quick analysis of the Twitter conversation that occurred around “The Science of Social,” the webcast that Paul and I gave about 3 weeks ago. Here are the details:
- The raw data: Tweets containing the #LithCast hashtag
- The goal: Get a sense of who’s talking about the webcast, who’s talking to whom, and how much
- The analysis: Threading the time-ordered tweets into conversations and connecting people who either retweeted or mentioned each other
- The derived data: A communication graph where the edges represent retweets and mentions
- The computation: Compute the edge weight based on the communication frequency between each pair of tweeters, and the eigenvector centrality for each tweeter based on the communication graph
- The result: see figure below
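For readers who want to see the computation in code, here is a minimal sketch in plain Python. It is not how the figure was produced (that was done with NodeXL); it only illustrates the same ideas. The handles and tweets are made up, the function names `build_edges` and `eigenvector_centrality` are hypothetical, and I assume that every @handle in a tweet (including the one after "RT") counts as one communication between the author and that handle. The conversation-threading step is skipped.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")  # matches @handles, including "RT @handle"

def build_edges(tweets):
    """Count communications (mentions or retweets) per unordered pair of tweeters."""
    weights = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            a, b = author.lower(), mentioned.lower()
            if a != b:  # ignore self-mentions
                weights[frozenset((a, b))] += 1
    return weights

def eigenvector_centrality(weights, iterations=200, tol=1e-9):
    """Weighted eigenvector centrality by power iteration, scaled so the top score is 1."""
    nodes = sorted({n for pair in weights for n in pair})
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Start from the previous scores (iterating on A + I rather than A);
        # this keeps the same eigenvectors but avoids oscillation on star-like graphs.
        new = dict(score)
        for pair, w in weights.items():
            a, b = tuple(pair)
            new[a] += w * score[b]
            new[b] += w * score[a]
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score

# Made-up sample data: (author, tweet text)
sample_tweets = [
    ("alice", "RT @bob: great point #LithCast"),
    ("carol", "@bob agreed #LithCast"),
    ("bob", "@alice thanks! #LithCast"),
    ("dave", "Listening to #LithCast with @bob"),
]

weights = build_edges(sample_tweets)
scores = eigenvector_centrality(weights)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

The edge weight is simply the number of communications between a pair, and a tweeter's centrality grows when they communicate often with other central tweeters. On a graph with several disconnected groups, this simple iteration concentrates the scores in the dominant group, so treat it as a sketch rather than a production implementation.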
This visualization is intuitive and beautiful (and yes, as a data scientist, I think patterns in data are very beautiful). I mapped the communication frequency to the color and width of the edges: Thicker, redder edges represent a higher frequency of mentions and retweets. I mapped the size of the avatar to the eigenvector centrality (which is a measure of authority in a network, much like Google’s PageRank): The bigger the avatar, the more authoritative the tweeter with respect to the webcast (#LithCast). Pretty intuitive, right?
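The mapping from numbers to visual properties can be sketched as a simple linear rescaling. This is only an illustration under assumptions of my choosing: the pixel ranges, the gray-to-red color ramp, and the function names `rescale`, `edge_style`, and `avatar_size` are all hypothetical, not the settings actually used in the figure.

```python
def rescale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the range [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:  # all values identical: use the middle of the output range
        return (out_lo + out_hi) / 2
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)

def edge_style(freq, min_freq, max_freq):
    """Higher communication frequency gives a thicker, redder edge (assumed ranges)."""
    width = rescale(freq, min_freq, max_freq, 1.0, 6.0)
    # Interpolate from light gray (200, 200, 200) to pure red (255, 0, 0).
    r = round(rescale(freq, min_freq, max_freq, 200, 255))
    g = round(rescale(freq, min_freq, max_freq, 200, 0))
    b = round(rescale(freq, min_freq, max_freq, 200, 0))
    return {"width": width, "color": f"#{r:02x}{g:02x}{b:02x}"}

def avatar_size(centrality, min_c, max_c):
    """Higher eigenvector centrality gives a bigger avatar (assumed 24 to 96 pixels)."""
    return rescale(centrality, min_c, max_c, 24.0, 96.0)

strongest = edge_style(5, 1, 5)   # the most frequent pair in a hypothetical graph
weakest = edge_style(1, 1, 5)     # the least frequent pair
biggest = avatar_size(1.0, 0.0, 1.0)
```

A linear scale is the simplest choice. If a few pairs dominate the conversation, a logarithmic scale may keep the weaker edges visible.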
Now, when interpreting this graph, it’s important to put the data in context. Does a bigger avatar mean more influential? Yes, but only with respect to this particular webcast. Can we say anything about the big avatar’s influence on other topics? For example, can we conclude that Paul Greenberg is an influencer in social CRM? The answer is no, not based on the available data. Although we know Paul is an influencer in social CRM, we cannot draw that conclusion from this data. If the raw data is all the information we have, then we simply don’t know if Paul is influential on social CRM, because we did not explicitly collect any Twitter data on social CRM. The only reason we know Paul is a social CRM influencer is that we have other information beyond the raw data I collected for this analysis.
One must be very careful when drawing conclusions from data. They’d better be based on just the data! Alright, I know someone will probably ask what tool(s) I used to produce this graph. It was produced with NodeXL, created by Marc Smith, the Chief Social Scientist of Connected Action Consulting Group and an old friend of mine.
Speaking of Lithium webcasts, there is another great one coming tomorrow: Variance in the Social Brand Experience: Timely Opportunities for Social Business Advantage. The CMO Council and our CMO (Katy Keim) will discuss what consumers want and expect from social media, and how marketers are and aren’t filling those appetites. Registration is free, so if you have the time, just tune in on Dec. 8th at 11am PST.
OK, I told you this would be a short piece. Stay tuned for more data analysis posts. And let me know if you like the idea of putting more emphasis on data and analysis for 2012. If there is a topic that you’d like me to cover, feel free to let me know as well. Discussions are always welcome here. See you next time.