Online merchants such as Amazon, iTunes and Netflix may stock more items than your local book, CD, or video store, but they are no friend to "niche culture". Internet sharing mechanisms such as YouTube and Google PageRank, which distil the clicks of millions of people into recommendations, may also be promoting an online monoculture. Even word-of-mouth recommendations such as blogging links may exert a homogenizing pressure and lead to an online culture that is less democratic and less equitable than offline culture.
Whenever I make these claims someone says "Well I use Netflix and it's shown me all kinds of films I didn't know about before. It's broadened my experience, so that's an increase in diversity." And someone else points to the latest viral home video on YouTube as evidence of niche success.
So this post explains why your gut feel is wrong.
The argument comes from a paper by Daniel M. Fleder and Kartik Hosanagar called Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. They simulate a number of different kinds of recommender system and look at how these systems affect the diversity of a set of choices. Towards the end of the paper they observe that some of their recommender systems increase the experience of diversity for every individual in the sample and yet decrease the overall diversity of the culture. So I wrote a program that does basically what they do in their paper and tweaked it to highlight this result.
The result is what's important here, rather than the particular algorithm used to generate this instance of it. But I know some people will want to know how the results are generated, so I'll give a short sketch. If you want more than this, Fleder and Hosanagar provide details, my tweaks to their model are available as Python source code, and if you post in the comments we could get into a discussion. But it's not important, trust me.
Each simulation starts with 48 customers and 48 products. Each product is described by two attributes, with values generated according to a normal distribution. So the products are distributed on a two-dimensional grid, with a value of about -3 to +3 along each axis. Each customer is assigned a taste for each attribute, so they also are scattered about in the same space. The idea is that a customer will prefer, other things being equal, a product that is close to it in these attributes. Here are two distributions of customers (blue) and products (red). You can see that most customers share a mainstream taste around the middle of the graph, but there are a few who have odd tastes off to the edges. Likewise, most products have attributes that are mainstream, but there are a few "niche" products closer to the edge.
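The setup above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the graphs: the function name `make_population`, the seed, and the choice of a standard normal (mean 0, standard deviation 1) for each attribute are assumptions, picked because they put nearly all values in the range of about -3 to +3 described above.

```python
import random

def make_population(n_customers=48, n_products=48, seed=0):
    """Scatter customers and products in a two-attribute space.

    Assumption: each attribute is drawn from a standard normal
    distribution, so most points sit near the mainstream middle
    and a few land out at the niche edges.
    """
    rng = random.Random(seed)

    def point():
        return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

    customers = [point() for _ in range(n_customers)]
    products = [point() for _ in range(n_products)]
    return customers, products

customers, products = make_population()
```

Each customer and each product is just an (x, y) pair, so "closeness in taste" becomes ordinary distance between two points.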

In this particular simulation, a customer can choose the same item over and over again, so it simulates something like streaming radio more than a bookstore. Each simulation starts off with a priming phase, in which each customer makes 75 choices according to a function which favours nearby products, but with some randomness so that they may on occasion choose one further away. After 75 choices we turn on a recommender function. Whenever a customer goes to make a choice, the recommender system identifies a product and recommends it to the customer. The recommendation increases the chance that the customer will choose the recommended product. Fleder and Hosanagar look at a few recommender functions. The one I use works like this:
- The set of 48 customers is divided into equal-sized communities, with members chosen at random so they may not be close in taste.
- The recommender function chooses an item by looking at what customers in the same community have chosen. It recommends the one most popular among others in the community.
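Here is a rough sketch of the choice and recommender steps described above. The details are assumptions made for illustration: the weighting `exp(-distance)`, the multiplicative `boost` applied to a recommended product, and all the function names are hypothetical, and Fleder and Hosanagar's paper specifies its own choice model.

```python
import math
import random
from collections import Counter

def make_communities(n_customers, community_size, rng):
    """Split customer ids into equal-sized communities chosen at random."""
    ids = list(range(n_customers))
    rng.shuffle(ids)
    return [ids[i:i + community_size]
            for i in range(0, n_customers, community_size)]

def recommend(customer, community, history):
    """Return the product most often chosen by the other members of the
    customer's community, or None if they have chosen nothing yet."""
    counts = Counter()
    for other in community:
        if other != customer:
            counts.update(history[other])
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def choose(taste, products, rng, recommended=None, boost=2.0):
    """Pick a product index at random, favouring products near the
    customer's taste. Assumption: weight = exp(-distance), multiplied
    by `boost` for the recommended product."""
    weights = []
    for j, attributes in enumerate(products):
        w = math.exp(-math.dist(taste, attributes))
        if j == recommended:
            w *= boost
        weights.append(w)
    return rng.choices(range(len(products)), weights=weights, k=1)[0]
```

Here `history` maps each customer id to the list of product indices that customer has chosen so far. During the priming phase `choose` would be called with `recommended=None`; after that, the result of `recommend` is passed in so that the recommended product gets a better chance without being guaranteed.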
I'm just going to show you two simulations. Run 1 above - which I will call Internet World - treats the entire set of 48 customers as a single community. The other (run 28 above), which I will call Offline World, breaks it into 24 communities of two people each. In Offline World I will get recommendations from the people around me and you will get recommendations from the people around you, but these recommendations are separate and isolated. In Internet World we each get recommendations from all 48 customers.
Here are the results for the two simulation runs I'm going to focus on. The results of these simulations are far from the only possible outcome, but they show why the gut feeling may fail, and I've chosen them for that purpose.
In Internet World each customer experiences an average of 3.5 products over the course of 75 choices with an active recommender system, while in Offline World each customer experiences only 2.4 different products. So the wider set of people providing recommendations in Internet World has led to an increase in individual diversity. This is like saying that "Netflix shows me pictures I would never have heard about from my friends alone", or "Amazon recommended a book I had never heard of, and I liked it".
On the other hand, the overall diversity of the culture can be measured by the Gini coefficient of the products. A Gini coefficient of zero is complete equality (each product is chosen an equal number of times) and a Gini coefficient of 1 is complete inequality (only one product is ever chosen by anyone). And Internet World has a Gini of 0.79 while Offline World has a Gini of only 0.52. Internet World is less diverse than Offline World.
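For anyone who wants to check a Gini value by hand, here is a small sketch using the standard formula for a list of sorted counts. The function name `gini` and the convention of returning 0 for an empty or all-zero list are my assumptions, not something taken from the paper.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= x_2 <= ... <= x_n are the sorted counts.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # assumption: treat "nothing chosen" as perfectly equal
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

One detail: with a finite number of products this formula tops out at (n - 1)/n rather than exactly 1, so with 48 products "complete inequality" comes out as 47/48, or about 0.98.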
How can these seemingly contradictory results happen? Let's take a look.
In the following graph, each dot is a customer, arranged in their two-attribute preference space (just like in the graphs above). But this time the area of each dot is proportional to the number of unique products they experience. So in Run 1 (Internet World) you can see that the dots are, on average, bigger than the dots in Run 28 (Offline World). This shows the greater individual experience of diversity in Internet World; for example, there is a customer with attributes of (1.1, -0.8) who samples no fewer than 38 different products, and only seven of the 48 customers stay with a single product throughout the whole simulation. Meanwhile in Offline World the most eclectic customer samples only nine and there are no fewer than 19 customers who sample just one product. The experience of individual customers in Internet World is of broader horizons and more selection, as recommendations pour in from far and wide, rather than from the limited experiences of their small community in Offline World. This picture has become the standard narrative of choice in the Internet World - our cultural experiences, liberated from the parochial tastes and limited awareness of those who happen to live close to us, are broadened by exposure to the wisdom of crowds, and the result is variety, diversity, and democratization. It is the age of the niche.
But wait!
Here is a graph of the products in each simulation. This time, the area of each dot shows its popularity: how often a customer chooses it.
You can see that on the left, in Internet World, a few products were chosen a lot, especially the one centred on about (-0.2, -0.2). In Offline World there are many more medium-sized dots, showing that the consumption of products is more equal. In Internet World one product has "gone viral" and gets chosen over 1500 times out of the total of 3600, while 26 products languish in the obscurity of being sampled fewer than ten times. In Offline World no single product is chosen more than 10% of the time, and only 14 products are sampled fewer than ten times. In short, niche products do better in Offline World than in Internet World.
While each customer on average experiences more unique products in Internet World, the recommender system generates a correlation among the customers. To use a geographical analogy, in Internet World the customers see further, but they are all looking out from the same tall hilltop. In Offline World individual customers are standing on different, lower, hilltops. They may not see as far individually, but more of the ground is visible to someone. In Internet World, a lot of the ground cannot be seen by anyone because they are all standing on the same big hilltop.
The end result is the Gini values mentioned before. Here are Lorenz curves for Internet World (blue) and Offline World (green), in which the products are lined up in order of increasing popularity along the x axis, and the cumulative choices for those products are plotted up the y axis.
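The points on such a curve are easy to compute from the per-product choice counts. This is a sketch only, with a hypothetical function name, and it normalises both axes to run from 0 to 1, which is my assumption about how the curves are scaled.

```python
def lorenz_points(counts):
    """Return (share of products, cumulative share of choices) pairs,
    with products ordered from least to most popular."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one product with at least one choice")
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points
```

If every product were chosen equally often the points would fall on the diagonal from (0, 0) to (1, 1). The more the curve sags below that diagonal, the more unequal the consumption, and the Gini coefficient is twice the area between the diagonal and the curve.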
So there it is. Individual diversity and cultural homogeneity coexisting in what we might call monopoly populism.
But don't think this is just about automated recommender systems, like the ones that Amazon and Netflix use. The recommender "system" could be anything that tends to build on its own popularity, including word of mouth. A couple of weeks ago someone pointed me to this video of Madin, a six-year-old soccer prodigy from Algeria, and the next day my son, who moves in very different online circles to me, was watching the same one. I know who Jim Cramer is even though we don't get CNBC in Canada because everyone is talking about him and helping his disembodied head to shoot down Jon Stewart. More people watched Tina Fey being Sarah Palin online than on Saturday Night Live, and Fey is now famous in countries where no one watches the TV show. Clay Shirky writes an essay and I get five different links to it in my Google Reader feed in one morning. Our online experiences are heavily correlated, and we end up with monopoly populism.
A "niche", remember, is a protected and hidden recess or cranny, not just another row in a big database. Ecological niches need protection from the surrounding harsh environment if they are to thrive. Simply putting lots of music into a single online iTunes store is no recipe for a broad, niche-friendly culture.

I've definitely discovered some great books because they were alphabetically next to others in the library. I'm all in favour of randomness too.
Posted by: tomslee | March 25, 2009 at 10:44 PM
I'll watch out for that. Cheers.
Posted by: tomslee | March 25, 2009 at 10:45 PM
One thing I've not sorted out is whether to trust the assumption of it being a matching problem at all. See http://www.nytimes.com/2007/04/15/magazine/15wwlnidealab.t.html for why it might not be.
Posted by: tomslee | March 25, 2009 at 10:48 PM
I'm not the only one who had never heard these worlds. Neither had Paul Kedrosky.
Posted by: tomslee | March 26, 2009 at 02:03 PM
Here's the thing: most recommendation systems rely heavily on product attribute similarity - not just popularity or community preference. It seems to me that both this model and the Fleder-Hosanagar models discount this fact - and it is a huge one at play in how recommendation systems work all over the web.
Pandora for example, makes recommendations based upon how well the "musical dna" (product attributes) of a song match the "dna" of any other. A song (product) recommendation, then, is *not* made based on the user's preference similarities to others on the web (the community), but on the similarity of the product itself to the amazingly diverse long tail of other products. So in this case, we're seeing pure diversity discovery unbiased by community viral popularity.
Is it an oversimplification to say, sure, the frictionless nature of information discovery on the web makes it possible that we all are "aware of the popular stuff" - but that certainly hasn't reduced our ability to simultaneously discover (and ultimately consume) more from the long tail, right?
A lot has been said in the past about 'cumulative disadvantage' in the context of web 2.0 and a more socially focused web. Here are some of my thoughts from a few years ago:
http://www.kurtvoelker.com/items/view/325/cumulative-dis-advantage
Posted by: Kurt Voelker | April 03, 2009 at 01:17 PM
The best thing I can imagine for increasing diversity is:
A) a "Show me a random thingy" button
along with
B) some way of rewarding the people who viewed and recommended an item before it became popular.
Posted by: Bryce | April 03, 2009 at 06:31 PM
Kurt - Interesting thoughts, but I disagree.
I think Pandora is unusual in its musical dna approach, and the idea of building attributes into a system is perhaps something limited to music. For example, the leaders in the Netflix Prize competition are using nothing about a movie/DVD except its title and release date - everything else comes from viewer assessments of movies. And Amazon doesn't build any attributes into its system either so far as I know.
As a result, in the movie and book space at least, products don't have attributes until people rate them. This is the "cold start" problem that some recommender system people are looking at.
The Watts study that you don't like highlights the uncertain nature of products having well-defined attributes as well. It shows that people's perceptions of one song or another are shaped by others' recommendations. In the last twelve months of the Netflix Prize one of the new factors the leading teams are building into their approaches is to take the date of ratings into account - the attributes of some movies apparently change over time.
Posted by: tomslee | April 03, 2009 at 09:21 PM
Well I haven't done much thinking about actual constructive ideas. I'm more interested in pouring cold water on others :). But I like (B) a lot.
Posted by: tomslee | April 03, 2009 at 09:22 PM
Interesting post.
Your idea about the internet also relates to island biogeography theory in ecology. The same amount of land in a bunch of small islands will have more species than one big island. This theoretical argument is one reason why ecologists are worried about the increased transport of organisms around the world coupling the world together (i.e. biotic homogenization/invasive spp), making all the islands into one big island which is able to support much less diversity.
Posted by: garrypeterson | April 06, 2009 at 11:37 AM