# Science of Social blog

By MikeW

## The 90-9-1 Rule in Reality

Posted 03-18-2010 03:28 PM, edited 09-14-2012 10:19 PM

Michael Wu, Ph.D., is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.

He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.

You can follow him on Twitter at mich8elwu.

If you've ever managed a community, you've probably heard of the "90-9-1 rule". If you have observed a community closely, you have probably seen it in action.

Soon after a community launches, users begin to participate, but each user participates at a different rate. The minute differences in participation levels are accentuated over time, leading to a small number of hyper-contributors who produce most of the community content.

The 90-9-1 rule simply states that:

• 90% of all users are lurkers. They read, search, navigate, and observe, but don't contribute
• 9% of all users contribute occasionally
• 1% of all users participate a lot and account for most of the content in the community

But how real is this rule? Do all communities follow this rule consistently? If not, how far off is the deviation? Is the proportion really 90:9:1, or is it more like 70:25:5, or 80:19.99:0.01? Let's find out...

Lithium has accumulated over 10 years of user participation data across 200+ communities, so we can address this question empirically with rigorous statistics. Rather than complicating the issue with the lurkers, I chose to analyze only the contributors (i.e. the 9% occasional-contributors and the 1% hyper-contributors). According to the 90-9-1 rule, the proportion between these two groups of participants should be 9:1, or equivalently 90:10.

### The 9-1 Part of the 90-9-1 Rule

So the 90-9-1 rule excluding the lurkers says that:

• 90% of the contributors (which is 9% of all users) are occasional-contributors.
• 10% of the contributors (which is 1% of all users) are hyper-contributors, who generate most of the community content.

What does the data tell us? On average, the top 10% of contributors (the hyper-contributors) generate 55.95% of the community content, and the remaining 90% (the occasional-contributors) produce the other 44.05%.
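As a rough illustration of how a per-community figure like this could be computed, here is a minimal Python sketch. It assumes per-user post counts are available as a plain list (a hypothetical input format) and treats posts as "content". Users with zero posts are dropped to match the contributors-only analysis, and the top 10% is rounded up so that a small community still has at least one hyper-contributor; both choices are assumptions, not a description of Lithium's actual pipeline.

```python
def top_share(post_counts, top_percent=10):
    """Fraction of all posts written by the top `top_percent`% of contributors.

    post_counts: posts per user (hypothetical input format). Users with zero
    posts are lurkers and are dropped, since only contributors are analyzed.
    """
    contributors = sorted((c for c in post_counts if c > 0), reverse=True)
    if not contributors:
        raise ValueError("community has no contributors")
    # Integer ceiling of n * top_percent / 100, but never fewer than one user.
    k = max(1, (len(contributors) * top_percent + 99) // 100)
    return sum(contributors[:k]) / sum(contributors)


# Hypothetical community: ten contributors plus two lurkers.
counts = [50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 0, 0]
print(f"top 10% share: {top_share(counts):.2%}")  # → top 10% share: 50.51%
```

The integer ceiling avoids a floating-point pitfall: `0.1 * 70` evaluates to slightly more than 7, which `math.ceil` would round up to 8.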

With my statistician hat on, you know I can't possibly be satisfied with just the average! So I plotted the distribution of content contributed by occasional-contributors versus hyper-contributors across all communities. The standard deviation of this split is 13.02%.

Please note: the reason you only see 143 communities here is that I've excluded communities less than 3 months old (these communities are too young for their participation dynamics to be stable enough for the analysis).
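A minimal sketch of the summary step, including the age cutoff from the note above. It assumes each community is given as a hypothetical `(launch_date, top10_share)` pair, approximates "3 months" as 90 days, and reports the sample standard deviation; the post does not say which standard deviation was used, so that is also an assumption.

```python
from datetime import date, timedelta
from statistics import mean, stdev

# Assumption: "3 months old" is approximated as 90 days.
MIN_AGE = timedelta(days=90)


def summarize_shares(communities, as_of):
    """Mean and sample standard deviation of per-community top-10% shares.

    communities: iterable of (launch_date, top10_share) pairs (hypothetical
    format). Communities younger than MIN_AGE on `as_of` are skipped because
    their participation dynamics have not stabilized yet.
    """
    shares = [share for launched, share in communities if as_of - launched >= MIN_AGE]
    if len(shares) < 2:
        raise ValueError("need at least two eligible communities")
    return len(shares), mean(shares), stdev(shares)


sample = [
    (date(2009, 1, 5), 0.40),
    (date(2009, 6, 1), 0.60),
    (date(2010, 3, 1), 0.90),  # only 17 days old on the as_of date: excluded
]
n, avg, sd = summarize_shares(sample, as_of=date(2010, 3, 18))
print(f"{n} communities, mean {avg:.2%}, standard deviation {sd:.2%}")
```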

As you can see from the data, the hyper-contributors can contribute anywhere from about 30% to nearly 90% of the community content, with an average of 55.95%. This is certainly a substantial percentage (considering that it is generated by only 10% of the contributors), so the 90-9-1 rule "sort of" holds. But, to be rigorous, it depends on what you mean by "most" of the community content.

If "most" meant at least 30% of the community content, then the 9-1 part of the 90-9-1 rule holds for 99.30% of our communities. If you meant at least 40% of the community content, then 89.51% of our communities satisfy this rule. But if "most" meant at least 50% of the community content, then only 65.73% of our communities are described by this rule.
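The percentages in the previous paragraph are the fraction of communities whose top-10% share clears each threshold. A minimal Python sketch of that count, using hypothetical per-community shares rather than Lithium's data:

```python
def fraction_meeting(shares, most):
    """Fraction of communities whose top-10% share is at least `most`."""
    shares = list(shares)
    if not shares:
        raise ValueError("no communities given")
    return sum(share >= most for share in shares) / len(shares)


# Hypothetical per-community top-10% shares (not the real 143 communities).
shares = [0.35, 0.45, 0.55, 0.62, 0.88]
for most in (0.30, 0.40, 0.50):
    print(f'"most" >= {most:.0%}: rule holds for {fraction_meeting(shares, most):.0%}')
```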

### Turning the Problem Around

This gives us a convenient spot to turn the problem around and look at the 90-9-1 rule from another perspective. We can rigorously define what "most" means (e.g. at least 30% of the community content), then calculate the fraction of contributors who generated this content and treat them as the hyper-contributors. We can then see how far off we are from the expected ratio of 9:1.
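Turning the problem around amounts to walking down the contributors from most to least prolific until their combined posts reach the chosen definition of "most". A minimal Python sketch, assuming per-user post counts as input (a hypothetical format) and expressing the threshold as an integer percentage:

```python
def hyper_fraction(post_counts, most_percent=30):
    """Smallest fraction of contributors whose posts add up to `most_percent`% of all posts.

    Contributors are taken from the most to the least prolific, so the result
    is the minimum fraction of people needed to produce "most" of the content.
    """
    if not 0 < most_percent <= 100:
        raise ValueError("most_percent must be in (0, 100]")
    contributors = sorted((c for c in post_counts if c > 0), reverse=True)
    if not contributors:
        raise ValueError("community has no contributors")
    total = sum(contributors)
    running = 0
    for i, c in enumerate(contributors, start=1):
        running += c
        # Integer comparison avoids floating-point trouble exactly at the threshold.
        if running * 100 >= most_percent * total:
            return i / len(contributors)


# Hypothetical community with ten contributors.
counts = [50, 20, 10, 5, 5, 3, 2, 2, 1, 1]
for most in (30, 60, 80):
    print(f'"most" = {most}%: hyper-contributors are {hyper_fraction(counts, most):.0%} of contributors')
```

Because the contributors are sorted in descending order, the first prefix that reaches the threshold is also the smallest possible group.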

Averaging across 143 communities, we see that if we define "most of the community content" to be "at least 30% of the total content," then the fraction of participants who contributed this amount ranges from 0.32% to 5.14%, with an average of 2.73%. That means, on average, hyper-contributors make up roughly 2.73% of the contributing population, and the remaining 97.27% of the participants are occasional-contributors. So the ratio of occasional- to hyper-contributors is about 97:3 (roughly 36:1), far from the expected value of 9:1.

If instead we define "most" to be "at least 40%" of the total content, then we get roughly 5.07% hyper-contributors on average across 143 communities. Now the ratio of occasional- to hyper-contributors is about 19:1, which is closer but still quite far from the expected ratio of 9:1.

If we define "most" to be "at least 50%" of the total content, then the group that contributed this amount (which qualifies them as hyper-contributors) is about 9.35% of the participants. This gives a ratio of roughly 10:1 on average, much closer to the expected value of 9:1. However, the variability is also very large. Even under this simple criterion of contributing at least 50%, the fraction of participants who contributed this amount varies from less than 1% to about 18% of the participants. That means the ratio of occasional- to hyper-contributors may be anywhere from 99:1 to about 5:1.
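The ratios in the last three paragraphs follow from simple arithmetic: if h% of contributors are hyper-contributors, the occasional-to-hyper ratio is (100 - h):h. A short Python check of the percentages quoted above (using 1% for the "less than 1%" end of the range):

```python
def occasional_per_hyper(hyper_percent):
    """Occasional-to-hyper ratio (x:1) when `hyper_percent`% of contributors are hyper-contributors."""
    return (100 - hyper_percent) / hyper_percent


# Hyper-contributor percentages quoted in the text above.
for label, pct in [
    ('"most" = at least 30%', 2.73),
    ('"most" = at least 40%', 5.07),
    ('"most" = at least 50%', 9.35),
    ("at least 50%, low end", 1.0),
    ("at least 50%, high end", 18.0),
]:
    print(f"{label}: {pct}% hyper -> about {occasional_per_hyper(pct):.1f}:1")
```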

So is 90-9-1 a hard and fast rule? Definitely not! Not even the 9-1 part of it. But it is certainly a great rule of thumb when looking at or explaining community data. It tells us that participation in communities is highly skewed and unequal, and that a small fraction of hyper-contributors produce a substantial amount of the community content.

Next time I am going to start to dive deeper into the contribution level of the hyper-contributors, your community's real superusers.

03-19-2010 05:14 AM (Community Management Expert)

Dr. Wu,

Thanks for this in-depth examination of user contributions and content across 143 communities. This reassures me a bit about our community, where I find that roughly 65 of the registered members have contributed at least one post, and far less than 1% have contributed a significant portion of the content. And if one delves a bit further and looks at solutions, or some other way to qualify the relative value of a contribution, I think the number gets smaller still.

Looking at the top 40 contributors in our community sorted by solutions contributed, I find some variation in the ratio of posts to solutions, suggesting that some may be more efficient in some regards. Perhaps the top 14 were responsible for something like 30% of the total solutions. Looking at the social graph, some of the top influencers may hold a different position in, say, the top 20 than those with the most solutions or kudos.

So is volume of contribution the most dominant factor in categorizing one a "super" contributor?

How would you factor in the qualitative aspects of solutions, kudos, and perhaps diversity of participation (engaging in discussions with the largest number of unique members)?

Are there other ways to be "super"?

Often, as I talk about super users in our community, I'm asked "how many?", and that leads to how we qualify, or set the bar above which one is deemed "super". Of course, this naturally leads to discussion of how we could cultivate more super users, whether by optimizing ranking formulas or by other means, such as a more formal recognition program in the community.

by Esteban Kolsky, 03-19-2010 07:33 AM

Michael,

Great analysis! It supports what I have seen in my practice: 1) the 90-9-1 rule is not always the norm, but it is always a good rule of thumb for planning purposes (as you aptly point out), and 2) more importantly, each community develops its own dynamics over time, and it is far more important to focus on how to support and incentivize those behaviors than to stick to a specific metric as a goal.

I have seen communities where 30-40% of the content is generated by just one individual, and others where it is generated jointly by the majority of the people. (By the way, that 30-40% range is what I use to define "most" of the content. It works well for me, but it is a little undefined; I know you cannot go by that number.)

This is great research, thanks a lot for doing it!

03-19-2010 09:20 AM

Good questions, Mark. I'd love to see Dr. Wu break that down in a future post. Maybe it's not 'how super you are', but 'how you are super'?

As always, enjoyed the post Dr. Wu!

03-19-2010 12:05 PM

This is exciting, guys. I am looking forward to learning more about this, because it's so true: no two communities are alike, though they share attributes.

Exciting times, with new evaluations and studies surfacing.

Thanks, Michael Wu, you are a genius.

by HeatherCaldwell, 03-19-2010 05:20 PM

Thanks for this great data. I had heard recently that the 1/9/90 rule is more applicable to B2C communities. Are there any variations in your data if we look at it from a B2B perspective?

03-19-2010 05:58 PM (Gamification Expert, Data Science Expert)

Hello Mark,

Thanks for the comment. I think you meant 65% and missed the percent sign, right? One question for you: did you look at the data cumulatively throughout the lifetime of the community? As a scientist, I feel responsible for pointing out two issues that you might want to watch out for, because I've been through these pitfalls myself and do not want others to repeat my mistakes.

1. Lurkers actually include people who passively consume community content without ever registering. So the fact that 65% of registered members contributed at least once (I will call them participants) does not mean that there are 35% lurkers. If you really count everyone who visited the community and normalize the participants by the total number of unique visitors, then I think the numbers could be much smaller. Maybe I will do an analysis on that.

2. This is a subject of debate, but I will point it out anyway. Some people believe that the 90-9-1 rule is NOT a cumulative measure. It is more of a point-in-time measure, so measurement of participants and lurkers should be restricted to a relatively small window of time. To some extent I agree, but the problem is that there is no objective way of choosing the proper window length. The reason I believe this view has some validity is that lurkers will eventually delurk (when they find a post that sparks their passion, or have a question they need answered). This depletes the 90% of lurkers and increases the fraction of participants. But at the same time, new visitors keep coming to the community (some of them will lurk, some will participate). For the 90-9-1 rule to remain true cumulatively, there needs to be a very precise balance between the member acquisition rate, the delurking rate, the probability that a new member lurks or participates, and the acceleration of participation among participants. Yet there is nothing to constrain these quantities to maintain that delicate balance. In fact, they are quite independent. So even if the 90-9-1 rule is true at a given point in time, the proportions may quickly fall out of that precise balance. That is why I believe that participation metrics should be windowed. I actually did both cumulative and windowed calculations, using 15-day, 30-day, and 60-day windows. And it does appear that the numbers come out closer to 90:9:1, even though the variability is still very large.
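The "delicate balance" argument can be illustrated with a deliberately simple toy model, sketched below in Python. It is not the analysis described in this thread: it tracks only three of the quantities listed above (a constant member acquisition rate, the probability that a new member participates, and the delurking rate), uses expected values instead of random draws, assumes nobody ever leaves, and all parameter values are hypothetical.

```python
def lurker_share_over_time(periods, arrivals=1000, p_participate=0.10, p_delurk=0.0):
    """Cumulative lurker share after each period in a toy community model.

    Each period, a fraction `p_delurk` of existing lurkers starts posting,
    then `arrivals` new members join and each participates with probability
    `p_participate` (expected values, no randomness). Nobody ever leaves.
    """
    lurkers = 0.0
    total = 0
    shares = []
    for _ in range(periods):
        lurkers *= 1 - p_delurk                   # some lurkers delurk
        lurkers += arrivals * (1 - p_participate)  # newcomers who lurk
        total += arrivals
        shares.append(lurkers / total)
    return shares


balanced = lurker_share_over_time(100)                 # no delurking at all
drifting = lurker_share_over_time(100, p_delurk=0.05)  # 5% of lurkers delurk per period
print(f"no delurking: {balanced[0]:.0%} -> {balanced[-1]:.0%}")
print(f"5% delurking: {drifting[0]:.0%} -> {drifting[-1]:.0%}")
```

With no delurking and exactly 10% of newcomers participating, the cumulative lurker share stays at 90%. Any delurking makes the cumulative share drift steadily downward, even though every newcomer still lurks with the same 90% probability.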

Although we like to think of superusers as more than just those who post a lot, I am speaking with respect to the 90-9-1 rule here, and that is purely based on posting activity. In fact, that is why I've used the term hyper-contributor instead of superuser in my post. But we can definitely incorporate kudos and accepted solutions when identifying superusers.
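One simple way to fold kudos and accepted solutions into a superuser ranking is a weighted sum of activity counts. The Python sketch below is only an illustration: the `Member` fields, the member names, and especially the weights are hypothetical and are not a Lithium ranking formula. It shows how the ranking can change once posting volume is no longer the only signal.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    posts: int
    accepted_solutions: int
    kudos_received: int


# Hypothetical weights chosen only for illustration; a real community would
# need to tune them against its own idea of who its superusers are.
WEIGHTS = {"posts": 1.0, "accepted_solutions": 5.0, "kudos_received": 0.5}


def superuser_score(m: Member) -> float:
    """Weighted sum of a member's activity counts."""
    return (
        WEIGHTS["posts"] * m.posts
        + WEIGHTS["accepted_solutions"] * m.accepted_solutions
        + WEIGHTS["kudos_received"] * m.kudos_received
    )


members = [
    Member("ana", posts=200, accepted_solutions=5, kudos_received=40),
    Member("ben", posts=80, accepted_solutions=50, kudos_received=100),
    Member("cy", posts=300, accepted_solutions=0, kudos_received=10),
]
by_posts = sorted(members, key=lambda m: m.posts, reverse=True)
by_score = sorted(members, key=superuser_score, reverse=True)
print([m.name for m in by_posts])  # posting volume only, as in the 90-9-1 analysis
print([m.name for m in by_score])  # also credits solutions and kudos
```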

However, the social graph is meant for identifying influencers, which are different from superusers. The two are quite correlated most of the time, but superusers are not always the same as influencers. Influencers are identified by social network analysis of the social graph, which is built from who talked to whom in the community. Because every communication carries a potential for influence, we identify influencers by finding the important nodes in the social graph, which is really a communication network within the community. And these influencers may or may not be superusers.
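To make "finding the important nodes" concrete, here is a minimal Python sketch that builds a reply graph and scores members with PageRank computed by power iteration. PageRank is a stand-in chosen for illustration; the thread does not say which social network analysis measure Lithium uses. The reply log and member names are hypothetical, and repeated replies between the same pair are counted once as a simplification.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration over a directed graph of (source, target) pairs.

    An edge (b, a) is read as "member b replied to member a", so rank flows
    toward members whose posts draw replies.
    """
    edges = list(edges)
    nodes = {member for edge in edges for member in edge}
    if not nodes:
        raise ValueError("no edges given")
    out = {v: set() for v in nodes}
    for src, dst in edges:
        if src != dst:  # ignore replies to oneself; repeated pairs collapse into the set
            out[src].add(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by members who never reply to anyone is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank


# Hypothetical reply log: (replier, member replied to).
replies = [("ben", "ana"), ("cy", "ana"), ("dee", "ana"), ("ana", "ben"), ("cy", "ben")]
ranks = pagerank(replies)
for member, r in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{member}: {r:.3f}")
```

A member who posts little but whose posts draw replies from many well-connected members can rank highly here, which is one way influencers and hyper-contributors can diverge.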

03-19-2010 06:17 PM (Gamification Expert, Data Science Expert)

Hello Esteban,

Thank you for dropping by and leaving me a reassuring comment.

It seems that you have analyzed participation data and defined 30%-40% as "most." So I have the same questions for you as I did for Mark (see my response to Mark):

1. Did you include all visitors to the community who never registered, or were you just using the registered users?
2. Did you analyze these metrics cumulatively, or did you use a running window and then average the metrics over all the windows across the community's lifetime? If you did use a window, what was your window length?

I'm curious to see if you also find the data to be closer to 90:9:1 when you use a window.

03-19-2010 06:19 PM, edited 03-19-2010 06:24 PM (Gamification Expert, Data Science Expert)

Hello MikeTD and JennyB,

Thank you for your interest. I'm glad that you enjoy the post.

From the discussion and kudos here, as well as the discussion in the LinkedIn group, I get a sense that people are interested in this topic. So I'd be happy to dive deeper in the next couple of blog posts, and maybe even share some of the behind-the-scenes analysis I've done.

And MikeTD, I like that: "Maybe it's not 'how super you are', but 'how you are super'." That is in fact what I try to do with user translucence in the social translucence research. I will find some opportunities to blog about that in the future.

03-19-2010 06:31 PM (Gamification Expert, Data Science Expert)

Hello Heather,

Thanks for the comment. This prompts another blog post on participation data under different segmentations of communities. I am already thinking of the following:

1. B2B vs. B2C communities.
2. Support vs. Marketing/Sales vs. Innovation communities.
3. Maybe even some broad, coarse segmentation by industry.
4. First-year vs. older communities.

Let me know what else you would like to see.

03-20-2010 05:34 AM (Community Management Expert)

Dr. Wu,

Yes, I meant 65% - thanks for catching that.

Your points are well made. When I considered the 65%, it was the percentage of registered users who had posted at least once. I agree that if one were to consider all the unregistered user sessions in the whole community (which would greatly increase the denominator), then the percentage of contributors and hyper-contributors would shrink significantly. I elected not to look at it that way for two reasons:

1) I wanted to restrict my consideration to those who had taken the time to register an ID and log in, taking the first step to joining the community and in some way formally demonstrating their intent to do so.

2) As registered users (myself included) may visit and browse the community at times without logging in, I considered that I might be double counting these visits and incorrectly attributing them to the pool of lurkers, since without the person logging in (or some complex IP address lookup) there is no way to know whether they were already a member of the community, or whether they had posted at least once.

I did my "back of the envelope" analysis based on a long-term cumulative view of the community: all users over time. I played a bit with the date function on the social graph tool, went back to the beginning of the community, and browsed through it at various points in time over several years. I could see some of the super contributors on the graph; a few have fallen off because they left the community without having contributed enough to make the all-time top 20, as seen in the cumulative view.

So, through the aperture of a point in time, the percentage contributions could change quite a bit depending on the overall size of the community and the number and characteristics of the hyper-contributors. Both views, cumulative and point-in-time, have their own merits. I'm going with the cumulative view when discussing the characteristics of our community. However, the point-in-time view can help spot emerging talent and rising stars, so keeping a 30-60-90 day view is handy as well.

Mark

03-20-2010 10:48 AM (Gamification Expert, Data Science Expert)

Hello Mark,

Thanks again for the discussion.

Yes, that is true: some registered users can still visit the community without logging in. Did you activate the auto-login option in your community? I have it activated, and every time I come back to the community it automatically logs me in, so the times when I have to log in manually (for example, when visiting from another computer) are a very small fraction of my visits. Maybe I should look at how many registered users actually leave this option on or turn it off.

I totally agree. Both the cumulative and the point-in-time summaries certainly have their own merits. But the subject of academic debate is whether the 90-9-1 rule pertains to the cumulative or the point-in-time view. And it seems that the point-in-time data fit the 90-9-1 rule better, at least in our data. The data I presented actually use a 60-day running window advanced every week, although I also analyzed the data cumulatively and with other window lengths. Even so, the precise 9:1 proportion is still quite far off, and if I had presented the cumulative data, the proportion would be even further off.
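A minimal Python sketch of a 60-day running window advanced weekly, computing the top-10% contributor share inside each window. The `(post_date, author)` input format, the half-open window boundaries, and the choice to skip windows with no posts are assumptions for illustration, not details of the analysis described above.

```python
from collections import Counter
from datetime import date, timedelta


def windowed_top_shares(posts, start, end, window_days=60, step_days=7, top_percent=10):
    """Top-`top_percent`% contributor share in each running window.

    posts: iterable of (post_date, author) pairs (hypothetical format).
    Windows are [w, w + window) for w = start, start + step, ..., and only
    windows that end on or before `end` are used.
    """
    posts = list(posts)
    window, step = timedelta(days=window_days), timedelta(days=step_days)
    shares = []
    w = start
    while w + window <= end:
        counts = Counter(author for day, author in posts if w <= day < w + window)
        if counts:  # skip windows with no posts at all
            ordered = sorted(counts.values(), reverse=True)
            k = max(1, (len(ordered) * top_percent + 99) // 100)
            shares.append(sum(ordered[:k]) / sum(ordered))
        w += step
    return shares


start = date(2009, 1, 1)
# Hypothetical log: one member posts daily, nine others take turns posting weekly.
posts = [(start + timedelta(days=d), "alice") for d in range(365)]
posts += [(start + timedelta(days=d), f"user{(d // 7) % 9}") for d in range(0, 365, 7)]
shares = windowed_top_shares(posts, start, start + timedelta(days=365))
print(f"{len(shares)} windows, average top-10% share {sum(shares) / len(shares):.1%}")
```

Averaging the per-window shares gives a point-in-time style summary; calling the function once with a single window spanning the whole history gives the cumulative view for comparison.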

So the debate is not about whether the cumulative view has more or less merit. It just means that if you use cumulative data, you shouldn't be surprised that the numbers are way off from the 90-9-1 rule. The reason is that the delicate balance required to maintain the 90:9:1 proportion is very difficult to achieve: even if the proportion were true at any point in time, it would not hold over a large window cumulatively. And since this calculation is about the 90-9-1 rule, I just wanted to point that out as a potential pitfall, as some may not be aware of it.

by Tyr, 06-30-2010 04:40 AM

I have to say I'm very unsure about this 90-9-1 rule. What about users who don't post much and hardly ever read anything?

It's fairly common in communities for people to sign up, ask one question, get their answer and never be seen again. Or even for them to just create an account but never use it; Facebook is full of such orphaned accounts.

This concentration solely on the very active users seems to me to be the only way to make sense of it: darn the 90, it's just 9-1.

06-30-2010 11:09 AM, edited 06-30-2010 11:19 AM (Gamification Expert, Data Science Expert)

Hello Tyr,

Thank you for commenting.

Let me clarify that the 90 usually does not include those orphaned accounts or those who ask one question and leave. Those people are really transient visitors who are not really part of the community (even if they have an account). The 90 is really the lurkers, who belong to the community by virtue of repeatedly consuming the community content but do not participate.

That being said, a lot of people who analyze this simply take the registered users with no posts as the 90 because it is simpler. But that is not correct! You must look at users who have logged in within a certain time period to assess whether they are still part of the community.
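A minimal Python sketch of that distinction: only registered members who have logged in recently count as lurkers or contributors, while stale and orphaned accounts are set aside. The 90-day activity window, the member records, and the `(last_login, post_count)` format are all hypothetical. The sketch also cannot see lurkers who never register at all, which is the harder part discussed in the next paragraph.

```python
from datetime import date, timedelta


def classify_members(members, as_of, active_days=90):
    """Split registered members into lurkers, contributors and inactive accounts.

    members: mapping name -> (last_login_date or None, post_count), a
    hypothetical format. Accounts with no login inside the window are treated
    as transient or orphaned and left out of both lurkers and contributors.
    """
    cutoff = as_of - timedelta(days=active_days)
    lurkers, contributors, inactive = [], [], []
    for name, (last_login, posts) in members.items():
        if last_login is None or last_login < cutoff:
            inactive.append(name)
        elif posts == 0:
            lurkers.append(name)
        else:
            contributors.append(name)
    return lurkers, contributors, inactive


members = {
    "reader": (date(2010, 6, 20), 0),        # logs in regularly, never posts
    "helper": (date(2010, 6, 28), 42),
    "one_question": (date(2009, 11, 2), 1),  # asked once, never came back
    "orphan": (None, 0),                     # account created but never used
}
print(classify_members(members, as_of=date(2010, 6, 30)))
```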

Moreover, lurkers include those who repeatedly visit the community for content but never even register. Getting an accurate estimate of the lurker population involves several data sources that need to be distilled. We have all that data; we just need to analyze it and make it presentable, so it is more difficult. When I get around to crunching that data, I will post the results.

This post, however, only focuses on the 9-1 part of the 90-9-1 rule as you've mentioned.

06-21-2014 02:40 AM
Dr Wu,

I really like the approach of testing different levels of what we understand as 'most of the content' to verify the 9-1 ratio of contributors. There seems to be one threat, though: isn't the expected output affecting the research?

From my point of view, the data indicate that if we focus on a single community, we can be almost 100% sure there will be a group of contributors and hyper-contributors; however, the 1-9-90 ratio seems questionable. Maybe it is just a concept, not meant to be very accurate, but I am extremely curious whether there is a better ratio.

I was wondering whether you tried to find a more accurate division. At the very least, one could plot % of posts versus % of users who created them (sorted by number of posts, descending) and find the two points with the smallest deviation. That would give real insight into tendencies and, even if it did not define a new ratio, it would help define stages of involvement.

Or maybe you did question those levels and it still confirmed the 1-9-90 ratio?
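This suggestion can be sketched in Python. The first function produces the points of the curve described (% of posts versus % of users, users sorted by posts in descending order). For the two breakpoints, the sketch swaps in a natural-breaks style search, which is an interpretation and not something stated in the thread: it splits the sorted per-user post counts into three consecutive tiers with the smallest total within-tier squared deviation. Since the slope of the curve at each user is proportional to that user's post count, tiers of similar counts correspond to roughly straight stretches of the curve. The input data and the three-tier framing are hypothetical, and the brute-force search is cubic in the number of contributors, so it only suits small examples.

```python
def concentration_curve(post_counts):
    """(% of contributors, cumulative % of posts), contributors sorted by posts descending."""
    counts = sorted((c for c in post_counts if c > 0), reverse=True)
    total, n = sum(counts), len(counts)
    curve, running = [], 0
    for i, c in enumerate(counts, start=1):
        running += c
        curve.append((100 * i / n, 100 * running / total))
    return curve


def _sse(values):
    """Squared deviation of a group's values from the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_breaks(post_counts):
    """Split contributors into three consecutive tiers; return each tier's % of contributors."""
    counts = sorted((c for c in post_counts if c > 0), reverse=True)
    n = len(counts)
    if n < 3:
        raise ValueError("need at least three contributors")
    best = None
    for i in range(1, n - 1):      # end of the top tier
        for j in range(i + 1, n):  # end of the middle tier
            err = _sse(counts[:i]) + _sse(counts[i:j]) + _sse(counts[j:])
            if best is None or err < best[0]:
                best = (err, i, j)
    _, i, j = best
    return 100 * i / n, 100 * (j - i) / n, 100 * (n - j) / n


# Hypothetical community with three clear levels of involvement.
counts = [100] * 2 + [10] * 6 + [1] * 12
print(concentration_curve(counts)[:3])
print(two_breaks(counts))  # tier sizes as % of contributors
```

For larger communities, the same split can be found with dynamic programming (as in Jenks natural breaks) instead of the brute-force double loop.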

Maria