Data Strategy

July 13, 2007

Collecting data through games

Filed under: Data Collection, Game, Search — chucklam @ 5:09 pm

There’s a current Wired article on Luis von Ahn’s ESP game and his other projects. Luis von Ahn’s research focuses on using “human computation” to collect and organize data. Of course, grad students and other cheap labor have been assigned to do such “human computation” for a long time. Luis von Ahn’s twist is to turn such data collection into games and thus make “human computation” fun and cheap. When designed well, such data collection tasks become very scalable.

To summarize, the ESP Game (and its spin-off, the Google Image Labeler) is an online game where two players are randomly paired up and shown the same image. Game play consists of tagging the image with labels. The two players don’t know each other and have no way of communicating. Their objective is to arrive at a label that they have both given, and to do so in the shortest time possible. The game is structured so that players are motivated to give “common” labels that an unknown, “typical” person would give.
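The matching rule described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how the ESP Game or Google Image Labeler is actually implemented: the function name `first_agreement`, the `(seconds, label)` input format, and the case-insensitive comparison are all hypothetical.

```python
def first_agreement(player_a, player_b):
    """Return (label, time) for the earliest agreement, or None.

    Each argument is a list of (seconds, label) guesses from one player.
    A label counts as agreed once BOTH players have typed it, so the
    agreement time is the later of the two entry times.
    Assumption: labels are compared case-insensitively, ignoring
    surrounding whitespace.
    """
    def first_times(guesses):
        # Map each normalized label to the earliest time it was given.
        seen = {}
        for t, label in guesses:
            key = label.strip().lower()
            if key not in seen or t < seen[key]:
                seen[key] = t
        return seen

    a, b = first_times(player_a), first_times(player_b)
    common = a.keys() & b.keys()
    if not common:
        return None
    # Pick the label whose agreement happened first (ties broken by name).
    label = min(common, key=lambda k: (max(a[k], b[k]), k))
    return label, max(a[label], b[label])


# Hypothetical round: both players eventually type "dog" and "park".
alice = [(1.0, "dog"), (2.5, "park"), (4.0, "leash")]
bob = [(0.5, "puppy"), (3.0, "Dog"), (3.5, "park")]
result = first_agreement(alice, bob)
```

Here the round would end on “dog” at the 3-second mark, since that is the first moment both players have typed the same label. Because neither player can see the other’s guesses, the quickest way to score is to type what most people would type, which is exactly what makes the resulting labels useful as metadata.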

I remember a similar game that was developed at UC Berkeley around the same time as the ESP Game but which got almost no attention. In fact, I don’t remember the game’s name myself, and I can’t find references to it now. (Readers are welcome to help. I know about this game only because I saw a demo of it some years ago.) At any rate, in The Berkeley Game, a game host would come up with a description of something that can be photographed (e.g. “a man walking a dog”) and SMS that description to a team of players, all of whom have camera-enabled mobile phones. The first player to take a picture that fits the description and send it back to the game host wins. The game obviously calls for more involvement and physical activity from the players, and the technology needed to play it is only now becoming mainstream. In that sense, The Berkeley Game was ahead of its time.

One of the main structural differences between the ESP Game and The Berkeley Game is analogous to the difference between search engines and Naver-like question-and-answer portals. In one (search and the ESP Game), a lot of content is assumed to exist and the challenge is to associate structure/metadata with that content. In the other (The Berkeley Game and Naver), content is assumed to be scarce and people are given an incentive to generate content that conforms to some structure/metadata. Search and the ESP Game are much more widely known, and one is led to the conclusion that those strategies are better choices for general Web problems, where content is in fact usually abundant. However, Naver’s definitive win over Google in Korea points to the appropriateness of the “generate content” strategy in niche markets. That’s a strategy well worth considering if you’re working in vertical markets (real estate, travel, corporate intranet, etc.) or in the “long tail,” where much of the information may in fact be unpublished. Focusing on information gathering, rather than just search, will also give you a lot more leverage when that big, bad search engine tries to invade your space.

If you haven’t heard of Naver, I have written a couple of earlier posts on this Korean portal (1, 2).

