Data Science

What Makes It Hot?

Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.


He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.


You can follow him on Twitter at mich8elwu.



The Lithium Network Conference (LiNC2010) starts this week (tomorrow, in fact), so I am taking a little detour from our journey through social network analysis (SNA). I will use this opportunity to address a question that I often receive about the current beta release of our Engagement Center (EC).


The Engagement Center (EC) provides a number of dashboards that surface important analytics about your community as well as the conversations that happen beyond the community. One of the new pieces of functionality in the EC is the hot topics (or hot threads) widget. This widget tells you which topics of conversation are hot in the community; moreover, it assigns a hotness score to each topic, allowing you to rank them.


The question that I often get is “What makes a topic hot and what is the hotness score?” Perhaps you would say a hot topic is something that is new, popular, and heavily in demand. These are all correct! In fact, the hotness score is based on two components, which I will refer to as popularity and recentness. Being recent alone or popular alone is not enough to make a topic hot. It must score high in both popularity and recentness.


A. Popularity

The popularity component of the hotness score consists of four factors.

  1. Rate of Participation: How fast are posts accumulating in the thread?
    Note: The absolute number of posts is not important; it is the speed (or velocity) at which the post count is increasing that matters.
  2. Rate of Consumption: How frequently is the thread being viewed?
    Note: Again, it is the speed (or velocity) at which page views increase, rather than the absolute number of page views, that matters.
  3. Total Number of Unique Participants
  4. Total Number of Kudos
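
As a rough illustration only (this is not the actual EC implementation), the four factors could be combined along these lines. The ThreadStats fields, the equal weights, and the use of log1p to compress large counts are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class ThreadStats:
    """Hypothetical per-thread counters for one measurement window."""
    posts_in_window: int      # posts added during the window
    views_in_window: int      # page views added during the window
    unique_participants: int  # total distinct posters in the thread
    kudos: int                # total kudos in the thread
    window_hours: float       # length of the measurement window


def popularity(stats, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four popularity factors (weights are assumed)."""
    post_rate = stats.posts_in_window / stats.window_hours  # factor 1
    view_rate = stats.views_in_window / stats.window_hours  # factor 2
    factors = (post_rate, view_rate, stats.unique_participants, stats.kudos)
    # log1p keeps one very large count from dominating the sum (an assumption).
    return sum(w * math.log1p(f) for w, f in zip(weights, factors))


# One user posting to himself versus a thread with many participants.
monologue = ThreadStats(50, 10, 1, 0, 24.0)
lively = ThreadStats(20, 400, 12, 8, 24.0)
print(popularity(monologue) < popularity(lively))
```

Because the rates are computed per hour, a thread with a large but old post count does not score well on the first two factors unless posts and views are still arriving.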


B. Recentness

The recentness component of the hotness score is a down-weighting factor that attenuates the popularity depending on how recent the thread is. An older thread will be attenuated more significantly than a younger thread. The recentness component is implemented as an exponential-decay function (don’t worry if you don’t know what this is). In essence, it depends non-linearly on four factors.

  1. Time of Last Participation: This is the post time of the last message in the thread. It gives us a rough estimate of when people stopped participating, which is when the attenuation begins. We don’t want to attenuate the popularity of a thread while people are still participating.
  2. Thread Lifespan: This is the length of time between the first post and the last post in the thread. A longer lifespan suggests that the thread is probably more useful, so its attenuation rate should be slower (i.e. its popularity is suppressed at a slower rate). This allows the thread to stay relatively hot for a longer period of time.
  3. Rate of Participation: This is the same factor we’ve seen in the popularity component. Here it is used to keep threads that have a long lifespan but a low participation rate in check (e.g. a thread that was posted a year ago and had no further activity until now). If we simply looked at this thread’s lifespan (1 year), we would wrongly conclude that it must be extremely useful. So we include the participation rate to keep the lifespan factor honest.
  4. Solved Thread: Finally, threads that are solved are generally more useful, so their popularity will be attenuated at a slower rate.
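
To make the exponential decay concrete, here is a minimal sketch under stated assumptions. The half-life form, the 72-hour base, the reference posting rate, and the solved bonus are invented for illustration and are not the values or the exact formula used in the EC.

```python
import math


def recentness(hours_since_last_post, lifespan_hours, posts_per_hour, solved,
               base_half_life=72.0, reference_rate=1 / 24, solved_bonus=1.5):
    """Exponential-decay weight in (0, 1]; every constant here is an assumption."""
    # Factor 3: discount the lifespan of threads that were mostly dormant.
    activity = min(1.0, posts_per_hour / reference_rate)
    effective_life = lifespan_hours * activity
    # Factor 2: a longer (effective) lifespan stretches the half-life.
    half_life = base_half_life * (1.0 + math.log1p(effective_life / base_half_life))
    # Factor 4: solved threads decay more slowly.
    if solved:
        half_life *= solved_bonus
    # Factor 1: decay is counted from the time of the last post.
    elapsed = max(0.0, hours_since_last_post)
    return math.exp(-math.log(2) * elapsed / half_life)


# A year-old thread with two posts decays faster than an active one of the same age.
print(recentness(48, 8760, 2 / 8760, False) < recentness(48, 8760, 0.5, False))
```

The resulting weight would then multiply the popularity component, so a thread that nobody has touched for a long time fades regardless of how popular it once was.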


There are two more important ingredients that go into computing the hotness score and determining which topics are hot.


C. Hotness Depends on the Intended Interaction

What’s hot may be different depending on what the intended interaction is. For example, forums are intended for discussion, so a hot thread in a forum should have a lot of messages. Although page views and kudos do help, posts are the most important measure of a hot thread. In contrast, blogs are meant to be read, so a hot blog article is one that has lots of page views, even if it has few comments or kudos. Similarly, an idea exchange is meant for voting, so the kudos count is the most important measure of a hot idea. A heavily voted (kudoed) idea should be considered hot even if there aren’t very many comments.


So how we compute the hotness score is different for forums, blogs, and idea exchanges. With this logic in place the algorithm is now ready to compute the hotness score for all topics within a community.
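
One simple way to picture this is a separate set of weights per interaction type. The sketch below is hypothetical: the weight values and the log1p scaling are assumptions chosen only to show posts dominating in forums, views in blogs, and kudos in idea exchanges.

```python
import math

# Hypothetical weights: the dominant signal differs by interaction type.
WEIGHTS = {
    "forum": {"posts": 3.0, "views": 1.0, "kudos": 1.0},
    "blog": {"posts": 1.0, "views": 3.0, "kudos": 1.0},
    "idea": {"posts": 1.0, "views": 1.0, "kudos": 3.0},
}


def weighted_popularity(kind, posts, views, kudos):
    """Popularity with interaction-specific weights (all values assumed)."""
    w = WEIGHTS[kind]
    return (w["posts"] * math.log1p(posts)
            + w["views"] * math.log1p(views)
            + w["kudos"] * math.log1p(kudos))


# The same heavily kudoed, lightly commented item scores higher as an idea
# than it would as a forum thread.
print(weighted_popularity("idea", 2, 50, 100) > weighted_popularity("forum", 2, 50, 100))
```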


D. What’s Hot Depends on Everyone Else’s Scores

After each topic is assigned a hotness score, we are still left with the problem of setting a proper threshold for selecting topics that are statistically hot. Intuitively, a hot topic should be one that stands out from the crowd. This is precisely what the algorithm does. It analyzes the hotness scores of all topics (based on the theory of the Lorenz curve), then selects those that are significantly hotter than the norm. Therefore, in a hypothetical community where every topic receives a very high hotness score, there is really no hot topic, because no one topic stands out.
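
The exact selection rule is not spelled out here, so the following is only a sketch of one Lorenz-curve-style criterion. It uses the fact that the slope of the Lorenz curve at a given topic equals that topic's score divided by the mean score. The min_slope cutoff of 3.0 is an assumption, not the threshold the EC uses.

```python
def lorenz_curve(scores):
    """Points (share of topics, share of total hotness), topics sorted ascending."""
    ordered = sorted(scores)
    total = sum(ordered)
    points, running = [(0.0, 0.0)], 0.0
    for i, s in enumerate(ordered, start=1):
        running += s
        points.append((i / len(ordered), running / total if total else 0.0))
    return points


def select_hot(scores, min_slope=3.0):
    """Names of topics whose Lorenz-curve slope (score / mean) is at least min_slope."""
    if not scores:
        return []
    mean = sum(scores.values()) / len(scores)
    if mean == 0:
        return []
    hot = [name for name, s in scores.items() if s / mean >= min_slope]
    return sorted(hot, key=lambda name: scores[name], reverse=True)


# Every topic scores high, so the curve is a straight line and nothing stands out.
print(select_hot({"a": 90, "b": 90, "c": 90, "d": 90}))
# One topic holds most of the total hotness.
print(select_hot({"a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 40}))
```

Because the cutoff is relative to the community's own mean, raising every score by the same factor changes nothing, which matches the point that hotness depends on everyone else's scores.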



This is a very stringent criterion for scoring and identifying hot topics. If this algorithm flags a topic as hot, then it really is statistically significantly hot, and you should pay attention to it. Don’t let a topic go viral beyond your community without noticing it.


Alright, I hope this blog entry gave you a better understanding of what makes a topic hot and how we compute the hotness score. I'm sure all of you must have heard of the hot topic today. If you haven't, check it out here. You wouldn't want to miss any hot topic, would you? Next week I will resume our SNA journey if there aren’t any more questions from LiNC2010. But if you do have any questions, please don’t be shy. Let me know at LiNC2010 or leave me a comment here.





Nice post, Michael. Love how you seem to have taken every conceivable element into account, and factored it into your hotness formula!

Data Science

Hello Mike,


Thanks for the comment.


The hotness score is designed to be accurate and predictive, so I need to take into account all variables that can potentially influence hotness. This also prevents people from gaming the system. For example, a user can certainly generate a lot of useless posts in a thread talking to himself, but if no other users are involved and there are not many kudos or views, the thread is probably not hot.


Few people seem to value accuracy in social media data, but if we want to use these numbers to analyze other phenomena for predictive analysis, they must be as accurate as possible. Otherwise, error propagation will amplify the error in successive stages of modeling, and eventually the model will lose all its predictive power.



Community Management

Dr. Wu,


Awesome write up of the considerations that go into this system.   From using this function over the last three weeks, I would say that 80% of the threads surfaced are those which I'm actively involved in and would have independently flagged as being the discussions of interest / concern in the community - so the results certainly pass my "gut check".  The remaining 20% of the threads weren't misses, they were just threads that might have been on private boards and not necessarily my top of mind focus.


Frankly, it is amazing how well this feature actually works, since the platform can presently only know about user behavior and not what users are talking about.



Data Science

Hello Mark,


Welcome back to Lithosphere after LiNC and it was very nice to meet you there.


Thanks for the comment and the heavyweight kudo :) I'm glad to hear that the algorithm is working well for you and passes your gut check. All the variables or factors in the hot topics algorithm are selected for their predictive power, and combined in such a way as to give optimal prediction on a test data set. The statistics work out very well, because all hot threads have some stereotypic characteristics.


To your comment about the remaining 20%... I actually have a private thread indicator stored. But I thought that a hot thread is a hot thread; it should be flagged whether it is in a private board or not. If you think otherwise, please do let us know, because we have the data to set the private threads aside. So let us know how important this is to you.


Community Management

Dr. Wu,


I'm not suggesting that we suppress consideration of hot threads on private boards, just noting that sometimes the system eliminates a personal bias which might discount some discussions - and that's a good thing.


I've personally always struggled with the degree of subjectivity involved in communicating which discussions were most important in the community at various points in time, and why they were. This system provides total objectivity. It doesn't care whether the discussion occurs on a public or private board, or whether or not it is a "known" subject - and that is one of the best things about it. No, I like it as it is!


I will also continue to be interested in what new dimensions might be possible in the future given the Scout Labs technology acquisition - could qualitative content be considered, drawn from a "watch list" of subjects entered on an admin panel, or perhaps even factoring in sentiment analysis?


You've taken some big steps here already and I'm intrigued by the possibilities of how this could evolve in the future.



Data Science

Hello Mark,


Alright, glad to know that you like the algorithm as it is.


There are definitely a lot of possibilities with the addition of Scout Labs' capability. Sentiment is definitely a feature that we would like to integrate, as MatthewT demonstrated at LiNC. Personally, I am interested in document clustering and automatic topic discovery and classification. But doing that accurately is a pretty difficult problem. Although I know how to do it using Latent Dirichlet Allocation, it might not be scalable to the whole social web. If I do tackle this problem, I know I'll have to make some approximations and tradeoffs.


Thanks Dr. Wu for this topic and the rest for your comments!


I have a burning question for you - while the Engagement Center measures "hotness" in this very apt way, does the Lithium Platform do the same? I only see an Admin Setting for "Hot Topics Threshold", which is why I would like verification.



Data Science

Hello Brian,


Glad to be able to help.


To answer your question, I must clarify a few things first. There are several items in our app that go by similar names, such as hot topics, hot ideas, hot threads, etc. Do NOT confuse them, because they are all different and almost completely unrelated, despite the similar names.


I believe the hot topic threshold in the admin in our app is simply a threshold beyond which the thread icon changes color (e.g. it changes to red on Lithosphere). It is purely based on replies. It has nothing to do with the hot threads algorithm in the Engagement Center, because it does not take into account the recentness of the topic.


We also have hot ideas in our ideas application. In that case, the ideas are scored and ranked based on kudos and the recentness of the idea. So, even though it is yet another animal, it can be seen as a simplified approximation of the full hot threads algorithm that I developed (described in this blog).


There are several reasons that we currently don’t apply my algorithm to all the different types of threads in our apps:


1. My algorithm is quite computationally intensive. So computing that in real time is likely to slow down our server substantially and result in poor user experience.

2. The hot topics algorithm in the app is a coarse approximation of my algorithm. It is less sophisticated, but allows for faster computation that won’t slow down our servers.

3. We are able to do it in the Engagement Center (EC) because hot threads are computed on a delay, once every week, not instantaneously in real time.

4. We have some plans to include my algorithm in the app for all threads. It won’t be real time though. But it will be computed daily (instead of weekly as in the EC). However, this does require some engineering to optimize the algorithm for computational load, and it is not a high priority item right now.


Alright, I hope this addresses at least some of your concerns.


Thanks for clarifying.  This is why we turned off the Hot Topic feature in the Admin Tool, because we felt that the "hotness" criteria for Boards was not relevant enough to actually be useful. 


I am heartened by point 4 above, that the future will see an algorithmic solution for Board Topics, although disappointed it is not currently on the roadmap for implementation. 


Might I also suggest, to lessen the computational load, the option of a checkbox to turn on/off per Board?  We might not want/need it for every one of them.

Data Science

Hello Brian,


Thanks for the response.


The Hot Topic feature may not be relevant enough to be useful for the business owner (our clients), but it may still be useful for the end user. It is a simple way for users to see which topics have been heavily discussed. Simply speaking, it's a way to entice the user to check out a topic. This is especially useful for new visitors or new members who don't know the community well. It does help a bit.


Actually, restricting it to a board or category is not going to reduce the computational load much. It has to do with the way we currently store the messages and thread information. Regardless, we will need to go through every message to determine whether we need to score it or not. This is actually the time-consuming part. Moreover, we have to do that for all the different tables that store the different data for the hotness computation. Simply speaking, getting all the data is the heavy part. Once we have all the data, the actual computation of the hotness score is not bad at all.


That is why this is a big re-architecting project that is not so simple. Believe me, I want it in the app too, but I constantly have to fight to get more analytics into the app. You are certainly welcome to voice your needs, and it will certainly help our product managers to prioritize what gets built. The trouble is that there are usually too many requests and too few resources for us to build them.



Mike, thanks for the reply and the insight into the system.  


Being in a software company, I can certainly understand there are always more items on the plate than resources can attend to, that it is a balancing act of fixing bugs and offering new features that will appeal to the widest audience.

Data Science

Hello Brian,


Thank you for your understanding.


I would still suggest that you try turning ON the hot topic feature in the admin. It tends to give end users a better experience and drive more traffic and activity to the more popular threads (threads that have lots of responses). This could indirectly help the hot topics algorithm in the Engagement Center pick up better signals for computing the hotness score of a thread.


Thank you for the comment and see you again. Are you coming to LiNC on May 18-20? Hope to see you there if you are going to attend.


Coming back to this interesting thread.  One issue of concern is that if you highlight Hot Topics, won't that by nature ensure their "hotness" because once someone sees and clicks on it, then it raises its rating even higher?


I guess the answer is "no" because it appears on the Hot List after it has been deemed "hot" already, right? So when do the "super hot" items move off the list to allow the "newly hot" ones to surface?





Data Science

Hello Brian,


Thanks for the question.


That is a well-known effect in economics called an externality. In general it is pretty hard to normalize for it. Some researchers even claim that you can't normalize for it, and you can't get rid of it. This kind of phenomenon happens all the time in life: the rich get richer. In network science and analysis of the WWW, preferential attachment works in exactly the same way. That is what gives rise to all the power-law distributions we see all over the place in nature.


Externality does not only happen when you show that some threads are hot. Merely showing that one thread is viewed more will cause more people to click on it and make it hotter. It is everywhere, and we cannot completely get rid of it unless we shut down all interaction and stop the flow of information.


We did several things to reduce the effect of externality. We often take the log transform of the raw metrics to linearize the power-law distribution.
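
To illustrate why the log transform helps (a toy example with made-up numbers, not our data): a power law y = c * x^(-a) becomes the straight line log y = log c - a * log x after taking logs, so an ordinary least-squares fit on the logged values recovers the exponent. The function name loglog_slope and the synthetic view counts are assumptions for this sketch.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


# Synthetic power law: views fall off with thread rank as rank ** -1.5.
ranks = list(range(1, 101))
views = [1000.0 * r ** -1.5 for r in ranks]
print(round(loglog_slope(ranks, views), 6))
```

On the raw scale the top few threads dwarf everything else; on the log scale the same data lie on a line, which is much easier to compare and model.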


The fact that we list hot threads as HOT only after they start showing signs of being hot is not enough to get rid of the externality effect. But it does mitigate its effect.


The thing is that externality is transient. After you have seen that a thread is hot 10 times, will you continue to click on it just because it's shown to be the hottest thread in the community 10 weeks in a row? Probably not. From all the data I've analyzed, I've not seen any thread that continues to be hot indefinitely. Every thread has a stereotypic rise and drop, much like a gamma distribution. In fact, I model hot threads with the gamma distribution.
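
For readers who have not seen it, here is what a gamma-shaped rise and drop looks like. This sketch assumes the curve is the gamma probability density evaluated over time in weeks; the shape and scale values are arbitrary and are not fitted to any community data.

```python
import math


def gamma_pdf(t, shape, scale):
    """Gamma probability density at time t (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


# Assumed parameters: for shape > 1 the curve peaks at (shape - 1) * scale.
shape, scale = 3.0, 1.5
weeks = range(1, 11)
curve = [gamma_pdf(week, shape, scale) for week in weeks]
peak_week = max(weeks, key=lambda week: gamma_pdf(week, shape, scale))
print(peak_week)  # week with the highest value
print([round(v, 3) for v in curve])
```

The values climb for the first few weeks, peak, and then tail off slowly, which is the qualitative pattern described above for a thread's hotness.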


A thread moves off the hot list because of the recentness factor, which decays the hotness over time. That is how we model the transient nature of these externality effects.


Hope this is not too technical. I've been so busy with conferences, teaching at universities, and webcasts that I didn't have much time to explain this in simpler terms. So I apologize in advance if this confuses more than it clarifies.


Thanks for the question and see you again next time.


No I get it, just not all the source references.  I appreciate your explanation and look forward to seeing you again soon!



