A very interesting experiment, on lots of levels:
Whimsley: The Netflix Prize: 300 Days Later: Online DVD rental outfit Netflix caused a real buzz last October when it announced the competition. If anyone can come up with a recommender system for predicting customer DVD preferences that beats its own algorithm (Cinematch) by a certain amount, Netflix will hand over $1million. The prize got a lot of attention because it exemplifies the idea of crowdsourcing. Not only does Netflix rely on crowdsourcing of DVD ratings (user ratings of DVD titles) but the competition itself is an attempt to use crowdsourcing to develop the algorithms to make the most of those ratings. Instead of doing the work itself, or hiring specialists, Netflix lets whoever anyone enter their competition and pays the winner...
As soon as you start looking at the data set it becomes obvious why it is so difficult to get good results. Databases don't have the linear algebra and other mathematical tools for taking a run at the prize but they are convenient for exploring data sets, so I loaded the data into a SQL Anywhere database (The developer edition is a free download, and I'll provide a perl script to load the data if you really want it) and started poking around. Here are a few of the more obvious oddities (all these observations have been posted elsewhere - see the Netflix prize forum for more):
- Customer 2170930 has rated 1963 titles and given each and every one a rating of one (very bad). You would think they would have cancelled their subscription by now.
- Five customers have rated over 10,000 of the 17,770 titles selected - and presumably they also have rated some of the others among the 60,000 or so titles Netflix had available when they released the ratings. Are these real people?
- Customer 305344 had rated 17654 titles. Even though Netflix make it easy to rate titles that you have not rented from them (so they can get a handle on your preferences) can this be real?
- Customer 1664010 rated 5446 titles in a single day (October 12, 2005).
- Customer 2270619 has rated 1975 titles. 1931 were given a 5, 31 were given a 4, 10 given a 3, 2 given a 2 (Grumpy Old Men and Sex In Chains) and a single title was given a 1. That title? Gandhi, which has an average rating of over 4 and which less than 2% of those who watch it give a 1.
- The most often rated movie? Miss Congeniality with ratings by over 232,000 of the 480,000 customers. And which title is most similar to it in terms of ratings (using a slightly weighted Pearson formula)? Bloodfist 5: Human Target.
- Most highly rated - Lord of the Rings: Return of the King (Extended Edition), with 4.7...
So what I get from the Netflix prize is that there are probably significant limits to recommender systems. Even the smartest don't do a whole lot better than the simple approaches, and a lot of work is required to eke out even a little more actual information from the morass of data. It seems surprisingly difficult to get reliable, factual information on this important question of how useful they can be. Part of the reason is that they are new - Amazon has only been in business for about ten years after all - and part of the reason is that the behaviour of these systems is often a closely guarded secret despite the aura of openness that web companies cultivate.
This matters because there is a surprising amount riding on the effectiveness of recommender systems. Silicon Valley's new-economy enthusiasts see them as the key to developing a new level of cultural democracy: they see recommender systems as a trebuchet hurling rocks at the castles of the old elite of mainstream media, big publishers with big marketing departments, big-chain book stores and Hollywood sequels. Recommender systems are claimed to embody the "wisdom of crowds". The idea is that everyone just publishes stuff (blogs, wikipedia entries and so on) and amateur readers or viewers decide what has merit by their actions (rating stories, buying and rating books and DVDs and so on). The work of critics is "crowdsourced" to customers, but it is the recommender system that distills these ratings to yield the aforementioned wisdom.
If faith in recommender systems is misplaced, then the new boss may look much like the old boss only with more computer hardware. There is a danger that recommender systems may simply magnify the popularity of whatever is currently hot - that they may just amplify the voice of marketing machines rather than reveal previously-hidden gems. Even worse, their presence may drive out other sources of cultural diversity (small bookstores, independent music labels, libraries) concentrating the rewards of cultural production in fewer hands than ever and leading us to a more homogeneous, winner-take-all culture.
I'm no futurist, but I see little evidence from the first 300 days of the Netflix Prize that recommender systems are the magic ingredient that will reveal the wisdom of crowds.