Data Strategy

March 14, 2008

Seeing Netflix data as more than just a bunch of numbers

It’s a truism among dataminers that analyzing certain data can help us understand people. However, dataminers rarely see that psychology, the discipline of understanding people, can help get more value out of such data. It was recently reported that one of the top ten contestants in the Netflix Prize approached the challenge from a psychologist’s point of view rather than from a computer scientist’s.

For example, people often don’t give their “true” rating to a movie. Instead, they can be biased by anchoring: their rating of a movie is influenced by the ratings they have just given to other movies. Adjusting for biases such as this is how Gavin Potter, aka “Just a guy in a garage,” got to number 9 on the Netflix Prize leaderboard.
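To make the anchoring idea concrete, here is a minimal sketch of one way such an adjustment could look. It is not Gavin Potter’s actual method, which the reports do not spell out. It assumes a single user’s ratings are available in the order they were entered, and that anchoring can be approximated as a linear pull toward the previous rating. The function names `anchoring_coefficient` and `remove_anchoring` are hypothetical.

```python
def anchoring_coefficient(ratings):
    """Estimate how strongly each rating follows the one entered just before it.

    Assumption: anchoring is modeled as a least-squares slope of each rating's
    deviation from the user's mean on the previous rating's deviation.
    `ratings` must be a non-empty list in the order the user entered them.
    """
    mean = sum(ratings) / len(ratings)
    dev = [r - mean for r in ratings]
    num = sum(prev * cur for prev, cur in zip(dev, dev[1:]))
    den = sum(prev * prev for prev in dev[:-1])
    # No variation in the earlier ratings means nothing to anchor on.
    return num / den if den else 0.0


def remove_anchoring(ratings):
    """Return ratings with the estimated pull of the previous rating subtracted."""
    mean = sum(ratings) / len(ratings)
    alpha = anchoring_coefficient(ratings)
    adjusted = [ratings[0]]  # the first rating has no predecessor
    for prev, cur in zip(ratings, ratings[1:]):
        adjusted.append(cur - alpha * (prev - mean))
    return adjusted


session = [5, 5, 4, 2, 1, 1]  # made-up ratings from one sitting
print(anchoring_coefficient(session))
print(remove_anchoring(session))
```

A positive coefficient here only says that consecutive ratings move together. In real data that could also reflect a user rating similar movies in a row, so a serious model would have to separate the two effects.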

August 1, 2007

E-commerce benefiting from new recommendation systems

Filed under: Collaborative filtering, People and Data, Personalization — chucklam @ 2:33 pm

The WSJ ran an article yesterday, “We Know What You Ought To Be Watching This Summer” (subscription required), on how e-commerce sites are benefiting from deploying a new generation of recommendation systems. These new systems try to recommend products based on a sense of your “taste” that may not be obvious from statistics alone. They are working and producing concrete business results.

Since adding the software, Blockbuster says it has lost fewer customers, in percentage terms, to rival services, and the number of movies in the average customer’s “to watch” list has grown by almost 50%.

Another e-commerce company is using these new systems in its email advertising to its customers.

The targeted emails have increased the rate at which email recipients go on to make a purchase between 25% and 50%, says [CEO] Mr. Byrne.

Sucharita Mulpuru, a senior analyst at Forrester Research, claims that companies that implement these recommendation systems usually see at least a 10% bump in sales.

June 27, 2007

Social tagging and voting were invented at Xerox PARC… 15 years ago

Filed under: Collaborative filtering, Personalization — chucklam @ 4:56 pm

It’s easy to believe that social tagging and social voting started with sites like digg. However, Xerox PARC had developed such functions in a system called Tapestry more than 15 years ago. The system was described in a 1992 Communications of the ACM article. From the article:

“The Tapestry system was designed and built to support collaborative filtering. Collaborative filtering simply means that people collaborate to help one another perform filtering by recording their reactions to documents they read. Such reactions may be that a document was particularly interesting (or particularly uninteresting). These reactions, more generally called annotations, can be accessed by others’ filters.” (Emphasis theirs.)

This paper, in fact, coined the term “collaborative filtering,” which over the years has evolved to mean the special case of automated recommendation using implicit feedback (i.e. Amazon style). Tapestry was architected for the general case. It assumed that “some annotations are themselves complex objects, and those annotations are more simply stored as separate records with pointers back to the document they annotate.” This design will sound familiar to anyone who has implemented a “modern” social tagging and voting system. See, for example, the design of Askeet.
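The “separate records with pointers back to the document” idea can be sketched in a few lines. This is only an illustration of the general design, not Tapestry’s actual schema or query language. The class names, the fields, and the `filter_documents` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    text: str


@dataclass(frozen=True)
class Annotation:
    doc_id: int   # pointer back to the document being annotated
    author: str   # who recorded the reaction
    kind: str     # hypothetical categories, e.g. "tag" or "vote"
    value: str    # e.g. a tag name, or "up" / "down"


def filter_documents(documents, annotations, trusted_authors, kind, value):
    """Return documents that a trusted author annotated with (kind, value)."""
    wanted = {
        a.doc_id
        for a in annotations
        if a.author in trusted_authors and a.kind == kind and a.value == value
    }
    return [d for d in documents if d.doc_id in wanted]


docs = [Document(1, "Memo on filtering"), Document(2, "Lunch menu")]
notes = [
    Annotation(1, "alice", "vote", "up"),
    Annotation(2, "bob", "vote", "up"),
    Annotation(1, "bob", "tag", "research"),
]
# One reader's filter: show whatever alice voted up.
print(filter_documents(docs, notes, {"alice"}, "vote", "up"))
```

Because annotations live in their own records, tags and votes are just different values of `kind`, and a filter can key on who made the annotation rather than on the document’s contents.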

It’s interesting to read a paper from 15 years ago and get some historical perspective. Xerox PARC had gotten the skeleton design of Web 2.0 functions before there was Web 1.0! It’s always amusing to read things like this too:

“Filtering on incoming documents is a very computationally intensive task. Imagine a Tapestry system with hundreds of users, each with dozens of filter queries, running on a document stream of tens of documents per minute.”

Yeah… We all need to thank the electrical engineers that make Moore’s law a reality…
