Whenever I make these claims someone says "Well I use Netflix and it's shown me all kinds of films I didn't know about before. It's broadened my experience, so that's an increase in diversity." And someone else points to the latest viral home video on YouTube as evidence of niche success.
So this post explains why your gut feeling is wrong.
The result is what's important here, rather than the particular algorithm used to generate this instance of it. But I know some people will want to know how the results are generated, so I'll give a short sketch. If you want more than this, Fleder and Hosanagar provide details, my tweaks to their model are available as Python source code, and if you post in the comments we can get into a discussion. But it's not important, trust me.
- The set of 48 customers is divided into equal-sized communities, with members chosen at random so they may not be close in taste.
- The recommender function chooses an item by looking at what customers in the same community have chosen. It recommends the one most popular among others in the community.
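For the curious, here is a minimal sketch of those two steps. It is not the source code mentioned above: the function names, the data layout (a dict from customer to a list of chosen products) and the tie-breaking rule are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_communities(customers, n_communities, rng):
    """Shuffle the customers and split them into equal-sized communities.

    Assumes len(customers) is divisible by n_communities (true for 48
    customers split into 1 or 24 groups); any leftovers would be dropped.
    """
    shuffled = list(customers)
    rng.shuffle(shuffled)  # random membership, so neighbours may differ in taste
    size = len(shuffled) // n_communities
    return [shuffled[i * size:(i + 1) * size] for i in range(n_communities)]


def recommend(customer, community, history):
    """Return the product chosen most often by the other community members.

    history maps customer -> list of products chosen so far (hypothetical
    layout). Returns None if nobody else has chosen anything yet. Ties are
    broken by whichever product Counter saw first, which is an arbitrary
    choice for this sketch.
    """
    counts = Counter()
    for other in community:
        if other != customer:
            counts.update(history.get(other, []))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


rng = random.Random(0)
internet_world = make_communities(range(48), 1, rng)   # one community of 48
offline_world = make_communities(range(48), 24, rng)   # 24 communities of 2
```

The only thing that differs between the two worlds in this sketch is the `n_communities` argument; the recommender itself is the same.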
I'm just going to show you two simulations. Run 1 above - which I will call Internet World - treats the entire set of 48 customers as a single community. The other (Run 28 above), which I will call Offline World, breaks it into 24 communities of two people each. In Offline World I will get recommendations from the people around me and you will get recommendations from the people around you, but these recommendations are separate and isolated. In Internet World we each get recommendations from all 48 customers.
Here are the results for the two simulation runs I'm going to focus on. The results of these simulations are far from the only possible outcome, but they show why the gut feeling may fail, and I've chosen them for that purpose.
In Internet World each customer experiences an average of 3.5 different products over the course of 75 choices with an active recommender system, while in Offline World each customer experiences only 2.4 different products. So the wider set of people providing recommendations in Internet World has led to an increase in individual diversity. This is like saying that "Netflix shows me pictures I would never have heard about from my friends alone", or "Amazon recommended a book I had never heard of, and I liked it".
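The individual-diversity number is just the count of distinct products per customer, averaged over customers. A small sketch, assuming the choices are stored as a dict from customer to a list of product ids (a hypothetical layout, not necessarily how the simulation stores them):

```python
def mean_unique_products(choices):
    """Average number of distinct products each customer has chosen.

    choices maps customer -> list of product ids, one entry per choice
    (assumed layout for this sketch).
    """
    return sum(len(set(items)) for items in choices.values()) / len(choices)


# Toy data: one customer samples two products, the other sticks to one.
toy = {"alice": [1, 1, 2], "bob": [3, 3, 3]}
average = mean_unique_products(toy)  # (2 + 1) / 2
```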
How can these seemingly contradictory results happen? Let's take a look.
In the following graph, each dot is a customer, arranged in their two-attribute preference space (just like in the graphs above). But this time the area of each dot is proportional to the number of unique products they experience. So in Run 1 (Internet World) you can see that the dots are, on average, bigger than the dots in Run 28 (Offline World). This shows the greater individual experience of diversity in Internet World; for example, there is a customer with attributes of (1.1, -0.8) who samples no fewer than 38 different products, and only seven of the 48 customers stay with a single product throughout the whole simulation. Meanwhile in Offline World the most eclectic customer samples only nine and there are no fewer than 19 customers who sample just one product.
The experience of individual customers in Internet World is of broader horizons and more selection, as recommendations pour in from far and wide, rather than from the limited experiences of their small community in Offline World. This picture has become the standard narrative of choice in the Internet World - our cultural experiences, liberated from the parochial tastes and limited awareness of those who happen to live close to us, are broadened by exposure to the wisdom of crowds, and the result is variety, diversity, and democratization. It is the age of the niche.
But wait!
Here is a graph of the products in each simulation. This time, the area of each dot shows its popularity: how often a customer chooses it.
You can see that on the left, in Internet World, a few products were chosen a lot, especially the one centred on about (-0.2, -0.2). In Offline World there are many more medium-sized dots, showing that the consumption of products is more equal. In Internet World one product has "gone viral" and gets chosen over 1500 times out of the total of 3600, while 26 products languish in the obscurity of being sampled fewer than ten times. In Offline World no single product is chosen more than 10% of the time, and only 14 products are sampled fewer than ten times. In short, niche products do better in Offline World than in Internet World.
While each customer on average experiences more unique products in Internet World, the recommender system generates a correlation among the customers. To use a geographical analogy, in Internet World the customers see further, but they are all looking out from the same tall hilltop. In Offline World individual customers are standing on different, lower, hilltops. They may not see as far individually, but more of the ground is visible to someone. In Internet World, a lot of the ground cannot be seen by anyone because they are all standing on the same big hilltop.
The end result is the Gini values mentioned before. Here are Lorenz curves for Internet World (blue) and Offline World (green), in which the products are lined up in order of increasing popularity along the x axis, and the cumulative choices for those products are plotted up the y axis.
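If you want to see how a Lorenz curve and a Gini value come out of a list of product popularities, here is a minimal sketch. It assumes you have one choice count per product and that at least one choice has been made; it normalises both axes to shares between 0 and 1, which may differ from the scaling used in the graph, and it uses the textbook definition of Gini as one minus twice the area under the Lorenz curve. The function names are made up for this example.

```python
def lorenz_curve(counts):
    """Return (xs, ys): cumulative share of products vs. cumulative share of choices.

    counts holds one choice count per product. Products are sorted from
    least to most popular, as in the description above.
    """
    ordered = sorted(counts)
    total = sum(ordered)  # assumed > 0
    n = len(ordered)
    xs, ys = [0.0], [0.0]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys


def gini(counts):
    """Gini coefficient: 1 - 2 * (area under the Lorenz curve), via trapezoids."""
    xs, ys = lorenz_curve(counts)
    area = sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
        for i in range(1, len(xs))
    )
    return 1 - 2 * area


equal_world = gini([5, 5, 5, 5])    # every product equally popular
viral_world = gini([0, 0, 0, 10])   # one product takes every choice
```

A value near 0 means choices are spread evenly across products; a value near 1 means a few products take almost everything, which is the "gone viral" pattern.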
So there it is. Individual diversity and cultural homogeneity coexisting in what we might call monopoly populism.
But don't think this is just about automated recommender systems, like the ones that Amazon and Netflix use. The recommender "system" could be anything that tends to build on its own popularity, including word of mouth. A couple of weeks ago someone pointed me to this video of Madin, a six-year-old soccer prodigy from Algeria, and the next day my son, who moves in very different online circles to me, was watching the same one. I know who Jim Cramer is even though we don't get CNBC in Canada because everyone is talking about him and helping his disembodied head to shoot down Jon Stewart. More people watched Tina Fey being Sarah Palin online than on Saturday Night Live, and Fey is now famous in countries where no one watches the TV show. Clay Shirky writes an essay and I get five different links to it in my Google Reader feed in one morning. Our online experiences are heavily correlated, and we end up with monopoly populism.

I've definitely discovered some great books because they were alphabetically next to others in the library. I'm all in favour of randomness too.
Posted by: tomslee | March 25, 2009 at 10:44 PM
I'll watch out for that. Cheers.
Posted by: tomslee | March 25, 2009 at 10:45 PM
One thing I've not sorted out is whether to trust the assumption of it being a matching problem at all. See http://www.nytimes.com/2007/04/15/magazine/15wwlnidealab.t.html for why it might not be.
Posted by: tomslee | March 25, 2009 at 10:48 PM
I'm not the only one who had never heard these worlds. Neither had Paul Kedrosky.
Posted by: tomslee | March 26, 2009 at 02:03 PM
Here's the thing: most recommendation systems rely heavily on product attribute similarity - not just popularity or community preference. It seems to me that both this model and the Fleder and Hosanagar models discount this fact - and it is a huge one at play in how recommendation systems work all over the web.
Pandora for example, makes recommendations based upon how well the "musical dna" (product attributes) of a song match the "dna" of any other. A song (product) recommendation, then, is *not* made based on the user's preference similarities to others on the web (the community), but on the similarity of the product itself to the amazingly diverse long tail of other products. So in this case, we're seeing pure diversity discovery unbiased by community viral popularity.
Is it an oversimplification to say, sure, the frictionless nature of information discovery on the web makes it possible that we all are "aware of the popular stuff" - but that certainly hasn't reduced our ability to simultaneously discover (and ultimately consume) more from the long tail, right?
A lot has been said in the past about 'cumulative disadvantage' in the context of web 2.0 and a more socially focused web. Here are some of my thoughts from a few years ago:
http://www.kurtvoelker.com/items/view/325/cumulative-dis-advantage
Posted by: Kurt Voelker | April 03, 2009 at 01:17 PM
The best thing I can imagine for increasing diversity is:
A) a "Show me a random thingy" button
along with
B) some way of rewarding the people who viewed and recommended an item before it became popular.
Posted by: Bryce | April 03, 2009 at 06:31 PM
Kurt - Interesting thoughts, but I disagree.
I think Pandora is unusual in its musical dna approach, and the idea of building attributes into a system is perhaps something limited to music. For example, the leaders in the Netflix Prize competition are using nothing about a movie/DVD except its title and release date - everything else comes from viewer assessments of movies. And Amazon doesn't build any attributes into its system either so far as I know.
As a result, in the movie and book space at least, products don't have attributes until people rate them. This is the "cold start" problem that some recommender system people are looking at.
The Watts study that you don't like highlights the uncertain nature of products having well-defined attributes as well. It shows that people's perceptions of one song or another are shaped by others' recommendations. In the last twelve months of the Netflix Prize one of the new factors the leading teams are building into their approaches is to take the date of ratings into account - the attributes of some movies apparently change over time.
Posted by: tomslee | April 03, 2009 at 09:21 PM
Well I haven't done much thinking about actual constructive ideas. I'm more interested in pouring cold water on others :). But I like (B) a lot.
Posted by: tomslee | April 03, 2009 at 09:22 PM
Interesting post.
Your idea about the internet also relates to island biogeography theory in ecology. The same amount of land in a bunch of small islands will have more species than one big island. This theoretical argument is one reason why ecologists are worried about the increased transport of organisms around the world: it couples the world together (i.e. biotic homogenization/invasive spp.), making all the islands into one big island that can support much less diversity.
Posted by: garrypeterson | April 06, 2009 at 11:37 AM