Data Strategy

August 29, 2008

OpenCalais: Semantic Processing as a Web Service

Filed under: Uncategorized — chucklam @ 2:57 am

I’ve recently discovered OpenCalais and find the concept really interesting. OpenCalais is a web service created by Thomson Reuters for extracting semantic entities from natural language text. The quickest way to understand it is to check out the demo app here. You can copy and paste some text into the entry box and see OpenCalais do its semantic processing on the text. For example, I pasted in this sentence I had just read in the New York Times: “Judy Estrin, who has built several Silicon Valley companies and was the chief technology officer of Cisco Systems, says Silicon Valley is in trouble.” The demo app picks out the references to a “Person” named “Judy Estrin” as well as a “Company” named “Cisco Systems.” In addition, it picks out a “Quotation” by person “Judy Estrin” with quote “Silicon Valley is in trouble.” It also picks out a “Person Professional Past” relationship between a person of “Judy Estrin”, a position of “chief technology officer”, and a company of “Cisco Systems.”

Now, just imagine that kind of natural language processing capability available as a Web service API, and that is OpenCalais.
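To make the shape of that output concrete, here is a small Python sketch of the entities and relationships the demo picked out of the Judy Estrin sentence. The `Entity` and `Relation` classes and the `relations_about` helper are my own hypothetical names for illustration; this is not the actual OpenCalais response format or client API.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    kind: str  # entity type, e.g. "Person" or "Company"
    name: str


@dataclass
class Relation:
    kind: str    # relation type, e.g. "Quotation"
    roles: dict  # role name -> Entity or plain text


# The results described above, written out by hand (hypothetical structure).
judy = Entity("Person", "Judy Estrin")
cisco = Entity("Company", "Cisco Systems")
entities = [judy, cisco]

relations = [
    Relation("Quotation", {
        "person": judy,
        "quote": "Silicon Valley is in trouble.",
    }),
    Relation("Person Professional Past", {
        "person": judy,
        "position": "chief technology officer",
        "company": cisco,
    }),
]


def relations_about(entity, relations):
    """Return the kinds of all relations in which the entity plays a role."""
    return [r.kind for r in relations if entity in r.roles.values()]
```

Once text is reduced to records like these, a question such as “which relationships mention Cisco Systems?” becomes a simple lookup, for example `relations_about(cisco, relations)`.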

The OpenCalais team will be presenting at various events in September, with stops in Palo Alto on the 3rd and San Francisco on the 4th.


August 1, 2008

“Collaborative filtering” helps drive Digg usage

Filed under: Datamining, Information Retrieval, People and Data, Personalization — chucklam @ 3:25 am

Digg released their “collaborative filtering” system a month ago. Now they’ve blogged about some of the initial results. While it’s an obviously biased point of view, things look really good in general.

  • “Digging activity is up significantly: the total number of Diggs increased 40% after launch.”
  • “Friend activity/friends added is up 24%.”
  • “Commenting is up 11% since launch.”

What I find particularly interesting here is the historical road that “collaborative filtering” has taken. The term “collaborative filtering” was first coined by Xerox PARC more than 15 years ago. Researchers at PARC had a system called Tapestry. It allowed users to “collaborate to help one another perform filtering by recording their reactions to documents they read.” This, in fact, was a precursor to today’s Digg and Delicious.

Soon after PARC created Tapestry, automated collaborative filtering (ACF) was invented. The emphasis was on automating everything and making usage effortless. Votes were implied by purchases or other behavior, and recommendations were computed in a “people like you have also bought” style. This style of recommendation was so successful that it has completely taken over the term “collaborative filtering” ever since.
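A minimal Python sketch of the “people like you have also bought” idea follows. It assumes purchases count as implicit votes and measures how alike two users are with Jaccard overlap of their purchase sets. The data, the `jaccard` and `recommend` functions, and the choice of similarity measure are all my own illustrative assumptions, not how any particular ACF system (or Digg) actually works.

```python
from collections import defaultdict

# Hypothetical purchase histories: user -> set of items bought.
purchases = {
    "alice": {"camera", "tripod", "memory card"},
    "bob": {"camera", "memory card", "camera bag"},
    "carol": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Overlap between two item sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, top_n=3):
    """Rank items the user has not bought by the similarity of their buyers."""
    mine = purchases[user]
    scores = defaultdict(float)
    for other, theirs in purchases.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0.0:
            continue  # users with nothing in common contribute no votes
        for item in theirs - mine:
            scores[item] += similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]
```

Here `recommend("alice", purchases)` suggests the camera bag, because bob shares two of alice’s purchases and bought it too, while carol, who shares nothing with anyone, gets no recommendations. Nobody had to cast an explicit vote, which is exactly the effortlessness that ACF was after.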

In the Web 2.0 wave, companies like Digg and Delicious revived the Tapestry style of collaborative filtering. (Although I’d be surprised if those companies had done so as a conscious effort.) They were in a sense stripped-down versions of Tapestry, blown up to web scale, and made extremely easy to use. (The original Tapestry required one to write database-like queries.)

Now Digg, which one can think of as Tapestry 2.0, is adding ACF back into its style of recommendation and getting extremely positive results. Everything seems to have moved forward, and at the same time it seems to have come full circle.
