June 27, 2009
April 2, 2009
Lots of good stuff has been coming to my attention lately.
- Amazon just announced their Amazon Elastic MapReduce program. It sounds like the main point of this service is to simplify setting up a Hadoop cluster in the cloud, and Amazon charges you a little extra above the normal EC2 and S3 costs for it. It's not clear to me yet why people would pay the extra cost instead of running their own instance of Hadoop on EC2. I mean, you can just read Chapter 4 of my book and do this all by yourself easily 😉 I hope to look more into this service over the weekend. At the very least, this is a sign that a meaningful number of Amazon Web Services’ customers are using the EC2 cloud to run Hadoop, so Amazon has decided to focus on making it easier.
- The March issue of the IEEE Data Engineering Bulletin is a special issue on data management on cloud computing platforms. It has papers written by academics as well as from Yahoo and IBM. Haven’t had time to read it yet, but it looks like Hadoop and Amazon EC2 are mentioned a lot.
- Just heard about the open source Sector-Sphere project, which is a system for distributed storage and computation using commodity computers. In other words, it’s an alternative framework to Hadoop but it has a lot of architectural differences. It seems to be just the work of a few academics so far. I hope to play around with it… when I can find time from work and writing the book…
March 6, 2009
Stephen Wolfram, creator of Mathematica (which I used to use a lot) and author of A New Kind of Science, blogged about his current project, Wolfram|Alpha. The web site won’t fully launch till May, and his blog post is lacking in details, but the project seems to be some combination of search engine, natural language processing, and expert-curated semantic models. Definitely something to watch.
Yahoo Developer Network has a new video on how they use Hadoop to analyze and filter spam. This is part 1 of a series, and it’s just background info on the size of their spam problem and how Hadoop is more scalable than their previous DB solution. Hopefully future episodes will have more meat.
February 12, 2009
Posting here has been light for a while. Lately my writing time has gone to a new book on Hadoop. It will be published by Manning with the title Hadoop in Action. Yesterday Manning released it in their early access program. You can check it out and pre-order it at http://www.manning.com/lam/.
August 29, 2008
I’ve recently discovered OpenCalais and found its concept to be really interesting. OpenCalais is a web service created by Thomson Reuters for extracting semantic entities from natural language text. The quickest way to understand it is to check out the demo app here. You can copy and paste some text into the entry box and see OpenCalais do its semantic processing on the text. For example, I pasted this sentence I just read in the New York Times, “Judy Estrin, who has built several Silicon Valley companies and was the chief technology officer of Cisco Systems, says Silicon Valley is in trouble.” The demo app picks out the references to a “Person” named “Judy Estrin” as well as a “Company” named “Cisco Systems.” In addition, it picks out a “Quotation” by person “Judy Estrin” with quote “Silicon Valley is in trouble.” It also picks out a “Person Professional Past” relationship between a person of “Judy Estrin”, a position of “chief technology officer”, and a company of “Cisco Systems.”
Now, just imagine that kind of natural language processing capability available as a Web service API, and that is OpenCalais.
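To make the example concrete, here is a minimal Python sketch of how the results described above could be held as data. The class names, field names, and the `relations_for` helper are my own illustration and are not the actual OpenCalais response format or API; the values are simply the ones the demo app reported for the Judy Estrin sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A typed mention found in the text, e.g. a Person or a Company."""
    type: str
    name: str


@dataclass
class Relation:
    """A typed relationship whose named roles hold text or entity names."""
    type: str
    roles: dict = field(default_factory=dict)


# Hypothetical representation of what the demo app picked out;
# the real OpenCalais output format is not reproduced here.
entities = [
    Entity("Person", "Judy Estrin"),
    Entity("Company", "Cisco Systems"),
]
relations = [
    Relation("Quotation", {
        "person": "Judy Estrin",
        "quote": "Silicon Valley is in trouble.",
    }),
    Relation("Person Professional Past", {
        "person": "Judy Estrin",
        "position": "chief technology officer",
        "company": "Cisco Systems",
    }),
]


def relations_for(name, relations):
    """Return the types of all relations in which `name` fills some role."""
    return [r.type for r in relations if name in r.roles.values()]


print(relations_for("Judy Estrin", relations))
```

Once the output is structured like this, a question such as "which relationships involve Cisco Systems?" becomes a simple lookup rather than a text search.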
January 12, 2008
[Thomas] Edison… went on to invent filmmaking, and demanded a licensing fee from those making movies with his technology. This caused a band of filmmaking pirates, including a man named William, to flee New York for the then still wild West, where they thrived, unlicensed, until Edison’s patents expired. These pirates continue to operate there, albeit legally now, in the town they founded: Hollywood. William’s last name? Fox.
November 27, 2007
Got a vague email announcement about a talk at Stanford this Wednesday on the Google G-Phone.
Stanford EE Computer Systems Colloquium
NEC Auditorium, Gates Computer Science Building B03
Speaker: Speaker to be announced.
About the talk:
A speaker from [Google] will discuss the recently announced the
g-phone system. The details of this talk are still in flux; an
abstract will distributed when it becomes available.
[ 1 ] http://ee380.stanford.edu
The web site right now names Richard Miner as the speaker. I found on the web a description of him as “a key member of Android’s technical staff and a co-founder of the namesake company Google acquired in 2005.” A webcast of his talk should be available afterward.
November 24, 2007
The PARC Forum for the winter season will focus on “going beyond Web 2.0.” This series started last week with the following lineup:
- November 15 — Ross Mayfield, Socialtext
- November 29 — Garrett Camp, StumbleUpon
- December 6 — Charlene Li, Forrester Research
- December 13 — Guy Kawasaki, Truemors, Garage Ventures
- January 10 — Bernardo Huberman, HP Labs
- January 17 — Chris Anderson, Long Tail
- February 7 — Premal Shah, Kiva.org
- February 21 — Andrew McAfee, Harvard Business School
- March 20 — Lisa Petrides, Amee Evans; OER Commons
- March 27 — Ed Chi, PARC Augmented Social Cognition
August 29, 2007
I thought posting would be light while I was traveling in China. I didn’t know that I wouldn’t be able to access my blog at all. At first I thought my blog was so important that the Chinese government went out of their way to block it, but it turned out that I just couldn’t access anything on wordpress.com. I didn’t get a 404 but the connection just timed out. It was a similar experience trying to get to wikipedia.com.
I know the locals have special browsers and software to tunnel around and gain access to these “forbidden” sites. I wasn’t motivated enough (and I didn’t speak enough Mandarin) to figure it all out. Besides, I was already able to read all the blogs that I regularly follow, as I normally use Yahoo’s RSS Reader, which functions as a proxy. I had no problem accessing any of the American news sites (e.g. NYT, WSJ) either, as they were not blocked.
At any rate, I’m back and will start posting regularly again.
August 18, 2007
I’m traveling to China for the Int’l Conference on Intelligent Computing, so posting will be light for the next week or so.