Science of Social blog


Making Sense of Social Data: Visualization of Twitter Interactions

By MikeW


by Lithium Guru 12-07-2011 12:54 PM - edited 09-07-2012 04:08 PM

Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and group behavior in online communities and social networks.

 

Michael was voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics and its application to Social CRM. He's a regular blogger on the Lithosphere's Building Community blog and previously wrote in the Analytic Science blog. You can follow him on Twitter or Google+.

 


 

Hello and welcome back. It’s been 3 weeks since I last blogged. I’ve been very busy with the other half of my life (i.e., the engineer/architect side), developing a framework for social analytics and doing some architecture work for that project. I’m still working on it, so this will be a short piece.

 

The Engagement Side of My Work

As the end of the year approaches, I’m reflecting on my blogging and the more external-facing side of my work (the side most of you know). Come to think of it, I’ve only been blogging for about 2.5 years; although I’d written a few blog posts before that, I didn't have my own blog until then. My first official blog article (Introducing Analytic Science at Lithium) was published on May 4, 2009. Although I don’t think of myself as very prolific, I have produced a modest 87 articles and 376 comments, and received 467 kudos (see the statistics in my profile) over the past 2.5 years.

 

Last time, I wrapped up Chapter 1 on Gamification, so 3 chapters have now been compiled:

  1. My Chapter on Influencers
  2. My Chapter on Relationships: The R in Social CRM
  3. My Chapter on Gamification: From Behavior Model to Business Strategy

So I was wondering what I should focus on for 2012. I decided that I want to put a little more emphasis on the lesser-known side of my work: data and social analytics.

 

The Technical Side of My Work

As a teaser, I’d like to share a quick analysis of the Twitter conversation around “The Science of Social” webcast that Paul and I gave about 3 weeks ago. Here are the details:

  • The raw data: Tweets containing the #LithCast hashtag
  • The goal: Get a sense of who’s talking about the webcast, who’s talking to whom, and how much
  • The analysis: Threading the tweets ordered by time into conversations and connecting people who either retweeted or mentioned each other.
  • The derived data: A communication graph where the edges represent retweets and mentions
  • The computation: Compute the edge weight based on the communication frequency between each pair of tweeters, and the eigenvector centrality for each tweeter based on the communication graph
  • The result: see figure below
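The analysis and computation steps above can be sketched in Python. This is a minimal, hypothetical sketch, not the NodeXL workflow that produced the figure: it assumes tweets arrive as (timestamp, author, text) tuples, detects retweets and mentions by matching "@username" in the raw text, skips the conversation-threading step, and treats each pair of users as one undirected edge weighted by how often they interacted. Eigenvector centrality is computed by power iteration (the same quantity NetworkX exposes as `eigenvector_centrality`); the function names and sample tweets are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: retweets ("RT @user: ...") and @mentions are both detected
# as "@username" in the raw tweet text.
MENTION = re.compile(r"@(\w+)")


def build_communication_graph(tweets, hashtag="#lithcast"):
    """Count retweet/mention interactions per unordered pair of users.

    tweets: iterable of (timestamp, author, text) tuples (hypothetical format).
    Returns a Counter mapping frozenset({user_a, user_b}) -> interaction count.
    """
    weights = Counter()
    for _timestamp, author, text in tweets:
        if hashtag not in text.lower():
            continue  # keep only tweets carrying the webcast hashtag
        author = author.lower()
        # Each distinct user retweeted or mentioned in a tweet counts once.
        for target in {m.lower() for m in MENTION.findall(text)}:
            if target != author:
                weights[frozenset((author, target))] += 1
    return weights


def eigenvector_centrality(weights, max_iter=1000, tol=1e-9):
    """Weighted eigenvector centrality by power iteration, scaled so the max is 1."""
    neighbors = defaultdict(dict)
    for pair, w in weights.items():
        a, b = tuple(pair)
        neighbors[a][b] = w
        neighbors[b][a] = w
    nodes = list(neighbors)
    if not nodes:
        return {}
    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Iterate with (A + I) instead of A: same leading eigenvector, but it
        # avoids the oscillation plain power iteration shows on bipartite graphs.
        new = {n: x[n] + sum(w * x[m] for m, w in neighbors[n].items()) for n in nodes}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol * len(nodes):
            return new
        x = new
    return x


# Made-up sample data for illustration only.
tweets = [
    ("2011-11-16T10:00", "fan1", "RT @paulg: Great #LithCast today"),
    ("2011-11-16T10:01", "fan2", "Listening to @mikew and @paulg on #LithCast"),
    ("2011-11-16T10:02", "fan3", "RT @paulg: Great #LithCast today"),
    ("2011-11-16T10:03", "lurker", "Tuning in to #LithCast"),
    ("2011-11-16T10:04", "fan1", "@mikew thanks! #LithCast"),
    ("2011-11-16T10:05", "other", "@paulg unrelated"),
]
weights = build_communication_graph(tweets)
centrality = eigenvector_centrality(weights)
```

In this toy data, "lurker" tweets with the hashtag but never interacts with anyone, so it gets no edge and does not appear in the graph at all, while the most-retweeted and most-mentioned user ends up with the highest centrality.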

[Figure: Twitter communication graph of #LithCast participants]

 

This visualization is intuitive and beautiful (and yes, as a data scientist, I think patterns in data are very beautiful). I mapped the communication frequency to the color and width of the edges: thicker, redder edges represent a higher frequency of mentions and retweets. I mapped the size of the avatar to the eigenvector centrality (a measure of authority in a network, much like Google’s PageRank): the bigger the avatar, the more authoritative the tweeter with respect to the webcast (#LithCast). Pretty intuitive, right?
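The mapping from data to visual properties described above can be sketched as two small helper functions. This is a hypothetical illustration, not NodeXL's own settings: the linear scaling, the pixel ranges, and the pale-pink-to-red color ramp are all assumptions, and the centrality score is assumed to be normalized so that the most central tweeter scores 1.

```python
def edge_style(weight, max_weight, min_width=1.0, max_width=8.0):
    """Map an interaction count to (line width, hex color).

    Assumed linear scaling: heavier edges are thicker and redder.
    """
    frac = weight / max_weight if max_weight else 0.0
    width = min_width + frac * (max_width - min_width)
    fade = round(200 * (1 - frac))  # 200 -> pale pink, 0 -> pure red
    return width, f"#ff{fade:02x}{fade:02x}"


def avatar_size(centrality, min_px=16, max_px=96):
    """Map a max-normalized centrality score in [0, 1] to an avatar size in pixels."""
    return min_px + centrality * (max_px - min_px)


# Hypothetical interaction counts, with 4 as the busiest edge in the graph.
styles = [edge_style(count, max_weight=4) for count in (1, 2, 4)]
sizes = [avatar_size(score) for score in (0.0, 0.5, 1.0)]
```

A linear scale keeps the picture honest for small graphs like this one; with heavy-tailed interaction counts, a logarithmic scale is a common alternative so that a few very busy edges do not wash out the rest.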

 

Now, when interpreting this graph, it’s important to put the data in context. Does a bigger avatar mean the tweeter is more influential? Yes, but only with respect to this particular webcast. Can we say anything about a big avatar’s influence on other topics? For example, can we conclude that Paul Greenberg is an influencer in social CRM? The answer is no, not based on the available data. Although we know Paul is an influencer in social CRM, we cannot draw that conclusion from these data. If the raw data are all the information we have, then we simply don’t know whether Paul is influential in social CRM, because we did not collect any Twitter data on social CRM. The only reason we know Paul is a social CRM influencer is that we have information beyond the raw data I collected for this analysis.

 

Conclusion

One must be very careful when drawing conclusions from data. They had better be based on the data alone! Alright, I know someone will probably ask what tool(s) I used to produce this graph. It was produced with NodeXL, created by Marc Smith, the Chief Social Scientist of Connected Action Consulting Group and an old friend of mine.

 


 

Speaking of Lithium webcasts, there is another great one coming tomorrow: Variance in the Social Brand Experience: Timely Opportunities for Social Business Advantage. The CMO Council and our CMO (Katy Keim) will discuss what consumers want and expect from social media, and how marketers are and aren’t satisfying those appetites. Registration is free, so if you have the time, just tune in on Dec. 8th at 11 a.m. PST.

 

OK, I told you this would be a short piece. Stay tuned for more data analysis blogs. And let me know whether you like the idea of putting more emphasis on data and analysis in 2012. If there is a topic you would like me to cover, feel free to let me know as well. Discussions are always welcome here. See you next time.

 

 

Comments
by Frequent Commentator 12-07-2011 01:48 PM

Hi Mike!

Great to see you moving in the big data direction! I look forward to some interesting master lessons ;)

 

Why do you say the webcast tweet graph is an influence detection tool?

Doesn't it still just show communication intensity about a given topic? And what about stronger influence markers, like retweets (or, for Facebook, likes)?

 

by Lithium Guru 12-08-2011 04:04 AM

Hello Andrei,

 

Glad to see you back again.

I will still write on less technical topics that are accessible to a wider audience. But I also want to write more on the technical, analytical side. I've found that people don't really understand what I do at Lithium.

 

The point is that the webcast Twitter graph is NOT to be confused with a measure of influence. Just because eigenvector centrality is a measure of authority in one context doesn't mean it measures influence in the current context of our data. Precisely as you said, it also ignores other channels, such as Facebook likes.

 

A more complete model of influence can be found in the chapter I wrote on Influencers. The first 4 articles linked at the bottom describe the model.

 

Thanks for the comment and see you again next time.

 

by Drew Love 12-08-2011 05:33 AM
Thank you for writing this article, Mike. It's great to see some very deep thinking being done about social media. One of the questions I've been thinking about lately with regard to social media is the anatomy of an influential tweet. What is it about tweets that causes them to be retweeted? Is it timing, repeated resending of the original tweet, hashtag use? And how do other variables, such as follower counts and strong ties to other influencers, affect retweetability?
by Frequent Commentator 12-08-2011 06:29 AM

Mike, 

Glad that you're glad to see me back.

 

I just think it may be more interesting to separate the retweet graph from the tweet graph to understand influence in this case. A retweet, as an action, is still a better indicator of influence per se (I'm speaking only about Twitter-ecosystem rules here). Does it matter?

by Lithium Guru 12-08-2011 07:24 AM

Hello Drew,

 

Thank you for taking the time to comment on my blog.

 

I'm not sure whether you have any statistics background, but the problem you described can be analyzed empirically.

 

You can simply collect tweets from, say, 1,000 users and separate them into tweets that have been retweeted and tweets that have not. This retweeted/not-retweeted indicator is the outcome variable, so it goes on the left-hand side of the regression equation.

 

Then you can set up a covariate matrix of the variables you want to test. These may include the number of times the same tweet has been sent, the number of hashtags used in the tweet, the number of followers of the tweeter, and the number of strong ties (which can be inferred from the frequency of communication, e.g., @mentions), etc. Timing might be a bit more difficult, because you would need to divide the day into several time windows, segment the tweets into those windows, and use that as one of your covariates.

 

Then the rest is just turning the crank of the regression machinery. What you will get is a coefficient for each covariate you test. You can also compute the square of each covariate's correlation coefficient with the outcome to see how much of the variance that covariate accounts for on its own.
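The squared-correlation step can be sketched in Python. This minimal sketch computes, for each covariate, the Pearson correlation with the 0/1 retweeted outcome (the point-biserial correlation) and its square. The tweet field names, the sample data, and the split of the day into four equal time windows are hypothetical, and timing enters as one 0/1 indicator column per window. Because covariates are usually correlated with one another, these per-covariate r² values do not add up to a total; a multivariate model such as logistic regression (for example, the `Logit` model in statsmodels) would estimate their effects jointly.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 outcome this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def time_window(hour, n_windows=4):
    """Bucket an hour of day (0-23) into one of n_windows equal windows."""
    return hour * n_windows // 24


def variance_explained(tweets, n_windows=4):
    """Return {covariate: (r, r**2)} against the retweeted indicator.

    Each tweet is a dict with hypothetical keys: retweeted (bool), repeats,
    hashtags, followers, strong_ties (counts), and hour (0-23).
    """
    y = [1.0 if t["retweeted"] else 0.0 for t in tweets]
    columns = {
        name: [float(t[name]) for t in tweets]
        for name in ("repeats", "hashtags", "followers", "strong_ties")
    }
    # Timing enters as one 0/1 indicator column per time window.
    for w in range(n_windows):
        columns[f"window_{w}"] = [
            1.0 if time_window(t["hour"], n_windows) == w else 0.0 for t in tweets
        ]
    result = {}
    for name, xs in columns.items():
        r = pearson_r(xs, y)
        result[name] = (r, r * r)
    return result


# Made-up sample: retweets rise with hashtag count; followers never vary.
sample = [
    {"retweeted": False, "repeats": 1, "hashtags": 0, "followers": 100, "strong_ties": 0, "hour": 3},
    {"retweeted": False, "repeats": 1, "hashtags": 1, "followers": 100, "strong_ties": 1, "hour": 9},
    {"retweeted": True, "repeats": 1, "hashtags": 2, "followers": 100, "strong_ties": 2, "hour": 14},
    {"retweeted": True, "repeats": 1, "hashtags": 3, "followers": 100, "strong_ties": 3, "hour": 20},
]
scores = variance_explained(sample)
```

A real study would use far more than four tweets; the tiny sample only shows the shape of the computation.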

 

Hope this makes sense.

Thanks for the comment, and I hope to see you again next time.

 

 

by Lithium Guru 12-08-2011 07:30 AM

Hello Andrei,

 

The retweets you mentioned are accounted for by the analysis.

 

"The analysis: Threading the tweets ordered by time into conversations and connecting people who either retweeted or mentioned each other."

 

I collected all the tweets that mention #LithCast. But I only connect two people if one of them has been retweeted or mentioned by the other. So the graph you see is only the result of the retweeting and @mentioning actions performed by other users. You can send a million tweets containing #LithCast, but if nobody retweets you or mentions you in any of their tweets, you won't even show up on the graph. If you have been mentioned or RTed only once, you will probably be a very tiny avatar.

 

Does this make sense? The data consist of all the tweets, but the analysis accounts for whether people have been retweeted or mentioned.

 

OK, thanks for the question and see you next time.

 

by Frequent Commentator 12-08-2011 01:31 PM

Mike!

 

I understand what has been done in the graph!

There are retweet AND mention edges together.

 

I propose dividing it into 2 graphs, a retweet graph and a mention graph, and comparing them.

In your graph, it is possible for a Conversationalist to "win" a better "place" with 5 mentions than a Thought Leader with 4 retweets (of his clever tweet). From the post, I conclude that you have not weighted tweet and retweet edges differently (as a custom variable)...

 

Does it matter? 

by Lithium Guru 12-08-2011 01:43 PM - edited 12-10-2011 04:14 PM

Hello Andrei,

 

I see. In your original question, you only mentioned separating tweets from retweets.

 

OK, you can certainly do that. And you can also weight the RTs more (or less) depending on what you are trying to accomplish. If you value conversation more, then mentions should be weighted more, whereas if you value information spread, you should weight the RTs more.
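Andrei's proposal and the weighting described above can both be sketched in Python: keep retweet and mention counts as separate directed graphs, then merge them with adjustable weights. The (author, text) input format, the "RT @user" text convention for detecting retweets, the function names, and the weight parameters are assumptions for illustration. In this sketch, mentions embedded in a retweet belong to the original author and are therefore not counted as mentions by the retweeter.

```python
import re
from collections import Counter

RETWEET = re.compile(r"^RT @(\w+)")  # assumed classic "RT @user" convention
MENTION = re.compile(r"@(\w+)")


def split_interactions(tweets):
    """Separate (source, target) interaction counts into retweets and mentions.

    tweets: iterable of (author, text) pairs (hypothetical format).
    """
    retweets, mentions = Counter(), Counter()
    for author, text in tweets:
        author = author.lower()
        rt = RETWEET.match(text)
        if rt:
            # Mentions inside a retweet were written by the original author,
            # so this sketch counts only the retweet edge itself.
            target = rt.group(1).lower()
            if target != author:
                retweets[(author, target)] += 1
            continue
        for target in {m.lower() for m in MENTION.findall(text)}:
            if target != author:
                mentions[(author, target)] += 1
    return retweets, mentions


def combined_weights(retweets, mentions, rt_weight=1.0, mention_weight=1.0):
    """Merge both interaction types into one weighted edge list."""
    weights = Counter()
    for edge, n in retweets.items():
        weights[edge] += rt_weight * n
    for edge, n in mentions.items():
        weights[edge] += mention_weight * n
    return weights


def incoming_weight(weights):
    """Total edge weight received by each target user."""
    totals = Counter()
    for (_source, target), w in weights.items():
        totals[target] += w
    return totals


# Andrei's scenario: a conversationalist mentioned 5 times vs. a thought leader retweeted 4 times.
tweets = [(f"u{i}", "@talker what do you think?") for i in range(5)]
tweets += [(f"v{i}", "RT @leader: insightful point") for i in range(4)]
retweets, mentions = split_interactions(tweets)
equal = incoming_weight(combined_weights(retweets, mentions))
rt_heavy = incoming_weight(combined_weights(retweets, mentions, rt_weight=2.0))
```

With equal weights, the user mentioned five times outranks the user retweeted four times; doubling `rt_weight` reverses that order. Which setting is right depends on whether the goal is to value conversation or information spread.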

 

You can always do anything you want to the data. That is simply data manipulation. The more important question is whether it addresses your problem.

 

My goal is very simple:

"The goal: Get a sense of who’s talking about the webcast, who’s talking to whom, and how much"

 

With that goal, there is no need to separate the RT from the mention.

 

If you want to achieve something else, you can even separate the data by user, by time, by geographical location, or by the tweeter's follower count. Data are very versatile. If you have the skill, you can do pretty much anything with them, but that doesn't mean you should. That depends on your goal.

 

When I teach students, I always say, "You CAN always (I really mean ALWAYS) do more with the data, but you SHOULD stop at some point."

 

Alright. Hope this addresses your question.

Thanks for asking though. And see you next time.