Data Strategy

June 27, 2009

Netflix Prize has been won!!

Filed under: Uncategorized — chucklam @ 1:31 am

News via Geeking with Greg. Team BellKor’s Pragmatic Chaos has achieved a greater than 10% improvement in the Netflix Prize.


April 2, 2009

Amazon Elastic MapReduce, and other stuff I don’t have time to grok yet

Filed under: Infrastructure, Uncategorized — chucklam @ 4:54 am

Lots of good stuff has been coming to my attention lately.

  • Amazon just announced its Amazon Elastic MapReduce service. It sounds like the main point of this service is to simplify setting up a Hadoop cluster in the cloud, and Amazon charges a little extra on top of the normal EC2 and S3 costs for it. It’s not clear to me yet why people would pay the extra cost instead of running their own instance of Hadoop on EC2. I mean, you can just read Chapter 4 of my book and do this all by yourself easily 😉 I hope to look more into this service over the weekend. At the very least, this is a sign that a meaningful number of Amazon Web Services’ customers are using the EC2 cloud to run Hadoop, and that Amazon has decided to focus on making it easier.
  • The March issue of the IEEE Data Engineering Bulletin is a special issue on data management on cloud computing platforms. It has papers by academics as well as by authors from Yahoo and IBM. I haven’t had time to read it yet, but it looks like Hadoop and Amazon EC2 are mentioned a lot.
  • Just heard about the open source Sector-Sphere project, which is a system for distributed storage and computation using commodity computers. In other words, it’s an alternative framework to Hadoop but it has a lot of architectural differences. It seems to be just the work of a few academics so far. I hope to play around with it… when I can find time from work and writing the book…

March 6, 2009

Quick notes

Filed under: Uncategorized — chucklam @ 3:22 am

Stephen Wolfram, creator of Mathematica (which I used to use a lot) and author of A New Kind of Science, blogged about his current project, Wolfram|Alpha. The web site won’t fully launch till May, and his blog post is lacking in details, but the project seems to be some combination of search engine, natural language processing, and expert-curated semantic models. Definitely something to watch.

Yahoo Developer Network has a new video on how they use Hadoop to analyze and filter spam. This is part 1 of a series, and it’s just background info on the size of their spam problem and how Hadoop is more scalable than their previous DB solution. Hopefully future episodes will have more meat.

February 12, 2009

My book on Hadoop

Filed under: Uncategorized — chucklam @ 4:08 am

Posting here has been light for a while. Lately my writing time has gone to a new book on Hadoop. It will be published by Manning with the title Hadoop in Action. Yesterday Manning released it in their early access program. You can check it out and pre-order it at

August 29, 2008

OpenCalais: Semantic Processing as a Web Service

Filed under: Uncategorized — chucklam @ 2:57 am

I’ve recently discovered OpenCalais and find its concept really interesting. OpenCalais is a web service created by Thomson Reuters for extracting semantic entities from natural language text. The quickest way to understand it is to check out the demo app here. You can copy-and-paste some text into the entry box and see how OpenCalais does its semantic processing on the text. For example, I pasted in this sentence I had just read in the New York Times: “Judy Estrin, who has built several Silicon Valley companies and was the chief technology officer of Cisco Systems, says Silicon Valley is in trouble.” The demo app picks out the references to a “Person” named “Judy Estrin” as well as a “Company” named “Cisco Systems.” In addition, it picks out a “Quotation” by the person “Judy Estrin” with the quote “Silicon Valley is in trouble.” It also picks out a “Person Professional Past” relationship linking the person “Judy Estrin,” the position “chief technology officer,” and the company “Cisco Systems.”

Now imagine that kind of natural language processing capability available as a Web service API; that is OpenCalais.

The OpenCalais team will be presenting at various events in September, including Palo Alto on the 3rd and San Francisco on the 4th.

January 12, 2008

The piracy root of Hollywood

Filed under: Uncategorized — chucklam @ 6:22 pm

Via a post by Matt Mason on TorrentFreak:

[Thomas] Edison… went on to invent filmmaking, and demanded a licensing fee from those making movies with his technology. This caused a band of filmmaking pirates, including a man named William, to flee New York for the then still wild West, where they thrived, unlicensed, until Edison’s patents expired. These pirates continue to operate there, albeit legally now, in the town they founded: Hollywood. William’s last name? Fox.

November 27, 2007

Under the Hood of Google’s G-Phone

Filed under: Uncategorized — chucklam @ 1:34 am

Got a vague email announcement about a talk at Stanford this Wednesday on the Google G-Phone.

Stanford EE Computer Systems Colloquium
4:15PM, Wednesday, Nov 28, 2007
NEC Auditorium, Gates Computer Science Building B03

Topic: The Google G-Phone

Speaker: Speaker to be announced.

About the talk:

A speaker from Google will discuss the recently announced g-phone system. The details of this talk are still in flux; an abstract will be distributed when it becomes available.

The web site right now names Richard Miner as the speaker. I found a description of him on the web as “a key member of Android’s technical staff and a co-founder of the namesake company Google acquired in 2005.” A webcast of his talk should be available afterward.

November 24, 2007

PARC speaker series on Going Beyond Web 2.0

Filed under: Uncategorized — chucklam @ 1:27 pm

The PARC Forum for the winter season will focus on “going beyond Web 2.0.” This series started last week with the following lineup:

  • November 15 — Ross Mayfield, SocialText
  • November 29 — Garrett Camp, StumbleUpon
  • December 6 — Charlene Li, Forrester Research
  • December 13 — Guy Kawasaki, Truemors, Garage Ventures
  • January 10 — Bernardo Huberman, HP Labs
  • January 17 — Chris Anderson, Long Tail
  • February 7 — Premal Shah,
  • February 21 — Andrew McAfee, Harvard Business School
  • March 20 — Lisa Petrides, Amee Evans; OER Commons
  • March 27 — Ed Chi, PARC Augmented Social Cognition

More information here. Some of the forum talks are archived here. The most recently archived video right now is a talk by John Warnock, founder of Adobe, on “Reinventing the Media Businesses.”

August 29, 2007

My blog is censored in China :(

Filed under: Uncategorized — chucklam @ 5:10 pm

I thought posting would be light while I was traveling in China. I didn’t know that I wouldn’t be able to access my blog at all. At first I thought my blog was so important that the Chinese government had gone out of its way to block it, but it turned out that I just couldn’t access anything on I didn’t get a 404; the connection just timed out. It was a similar experience trying to get to

I know the locals have special browsers and software to tunnel around and gain access to these “forbidden” sites. I wasn’t motivated enough (and I didn’t speak enough Mandarin) to figure it all out. Besides, I was already able to read all the blogs that I regularly follow, as I normally use Yahoo’s RSS Reader, which functions as a proxy. I had no problem accessing any of the American news sites (e.g. NYT, WSJ) either, as they were not blocked.

At any rate, I’m back and will start posting regularly again.

August 18, 2007

Posting will be light…

Filed under: Uncategorized — chucklam @ 5:39 pm

I’m traveling to China for the Int’l Conference on Intelligent Computing, so posting will be light for the next week or so.

