Data Strategy

November 28, 2007

UC Irvine to offer a certificate program in web analytics

Filed under: Datamining, People and Data — chucklam @ 3:48 am

Looks like web analytics is growing to be a big enough field for universities to start offering a certificate program in it. A brief description of the UC Irvine program below. The full press release is here.

The program provides a foundational knowledge of practical Web analysis to experienced marketing and business professionals in light of the changing face of e-commerce. The impetus for the program is a growing need for Web site evaluation spurred by companies challenged in their search for new customers due primarily to an explosive growth in Web-based sales and marketing.


November 27, 2007

Under the Hood of Google’s G-Phone

Filed under: Uncategorized — chucklam @ 1:34 am

Got a vague email announcement about a talk at Stanford this Wednesday on the Google G-Phone.

Stanford EE Computer Systems Colloquium
4:15PM, Wednesday, Nov 28, 2007
NEC Auditorium, Gates Computer Science Building B03[1]

Topic: The Google G-Phone

Speaker: Speaker to be announced.

About the talk:

A speaker from Google will discuss the recently announced
g-phone system. The details of this talk are still in flux; an
abstract will be distributed when it becomes available.

Embedded Links:
[ 1 ]

The website currently names Richard Miner as the speaker. I found a description of him on the web as “a key member of Android’s technical staff and a co-founder of the namesake company Google acquired in 2005.” A webcast of his talk should be available afterward.

November 24, 2007

PARC speaker series on Going Beyond Web 2.0

Filed under: Uncategorized — chucklam @ 1:27 pm

The PARC Forum for the winter season will focus on “going beyond Web 2.0.” This series started last week with the following lineup:

  • November 15 — Ross Mayfield, SocialText
  • November 29 — Garrett Camp, StumbleUpon
  • December 6 — Charlene Li, Forrester Research
  • December 13 — Guy Kawasaki, Truemors, Garage Ventures
  • January 10 — Bernardo Huberman, HP Labs
  • January 17 — Chris Anderson, Long Tail
  • February 7 — Premal Shah,
  • February 21 — Andrew McAfee, Harvard Business School
  • March 20 — Lisa Petrides, Amee Evans; OER Commons
  • March 27 — Ed Chi, PARC Augmented Social Cognition

More information here. Some of the forum talks are archived here. The most recently archived video right now is a talk by John Warnock, founder of Adobe, on “Reinventing the Media Businesses.”

November 19, 2007

IEEE Computer special issue on search

Filed under: Advertising, Collective wisdom, People and Data, Personalization, Search — chucklam @ 6:08 pm

I’m quite behind on my reading, so I only got around to the August special issue of IEEE Computer on search this weekend. (Abstracts are free, but the actual PDFs require an expensive subscription or an expensive purchase.) It includes the following articles:

  • Search Engines that Learn from Implicit Feedback
  • A Community-Based Approach to Personalizing Web Search
  • Sponsored Search: Is Money a Motivator for Providing Relevant Results?
  • Deciphering Trends in Mobile Search
  • Toward a PeopleWeb

The articles were written by a mix of university academics and researchers from Google and Yahoo. They seem aimed at giving the general practitioner a sampling of current research, rather than being comprehensive in any specific domain or deep in a particular research area.

For me, the most interesting article is “Search Engines that Learn from Implicit Feedback” by Thorsten Joachims and Filip Radlinski of Cornell University. It’s a very accessible summary of the research those two have been doing in the last few years. To start off their research, they used eye-tracking experiments to characterize how people react to search engine rankings. They found that the ranking order strongly biases what people view and therefore what they click on. The top-ranked result will often get more clicks than a better result at rank 2 or 3, since some users never look beyond the first result. The straightforward assumption that a click is equivalent to a positive vote is therefore naive.

Instead, they look at results that were skipped, that is, results ranked above a clicked result but not clicked themselves. For example, if the results at ranks 3 and 4 are clicked but the result at rank 2 is not, one can infer that the result at rank 2 is probably worse than the ones at ranks 3 and 4, and use that knowledge to improve the search engine. Note that a click on the result at rank 1 teaches nothing new. People are so biased towards clicking the first result that only the absence of a click on it is informative.
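To make the rank 2 versus ranks 3 and 4 example concrete, here is a minimal Python sketch of how such preference pairs could be pulled out of a single query's click log. This is my own illustration, not code from the article; the function name `skip_above_preferences` and the document labels are made up for the example.

```python
from typing import List, Set, Tuple


def skip_above_preferences(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs for one result page.

    Each clicked result is treated as preferred over every unclicked
    result that was ranked above it. A click at rank 1 yields no pairs,
    because nothing was skipped to reach it.
    """
    pairs: List[Tuple[str, str]] = []
    skipped: List[str] = []  # unclicked results seen so far, in rank order
    for doc in ranking:
        if doc in clicked:
            pairs.extend((doc, s) for s in skipped)
        else:
            skipped.append(doc)
    return pairs


# Results at ranks 3 and 4 are clicked, the one at rank 2 is not.
ranking = ["r1", "r2", "r3", "r4"]
print(skip_above_preferences(ranking, {"r1", "r3", "r4"}))
# [('r3', 'r2'), ('r4', 'r2')]

# Only the top result is clicked: nothing is learned.
print(skip_above_preferences(ranking, {"r1"}))
# []
```

Pairs like these are relative judgments ("r3 beats r2"), not absolute relevance labels, which is what makes them usable despite the position bias.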

Under that model, they can interleave the results from two different search engines (or algorithms) and evaluate which one is better based on users’ clickthroughs. This insight led them to develop a ranking SVM model to learn search engine rankings. The new algorithm was shown to create a better meta-search engine as well as a better domain-specific search engine.
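For readers wondering what interleaving looks like in practice, below is a rough Python sketch of one well-known variant, usually called team-draft interleaving. I am not claiming this is the exact scheme described in the article; it is just one simple way to mix two rankings and credit clicks back to the ranker that contributed each result. All names (`team_draft_interleave`, `credit_clicks`, the document labels) are assumptions for illustration.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def team_draft_interleave(
    a: Sequence[str], b: Sequence[str], rng: random.Random
) -> Tuple[List[str], Dict[str, str]]:
    """Merge rankings a and b; return the merged list and each doc's team."""
    interleaved: List[str] = []
    team: Dict[str, str] = {}  # doc -> "A" or "B", whichever ranker picked it
    count = {"A": 0, "B": 0}
    sources = {"A": a, "B": b}

    def next_unused(ranking: Sequence[str]) -> Optional[str]:
        # Highest-ranked document not yet placed in the merged list.
        return next((d for d in ranking if d not in team), None)

    while True:
        candidate = {name: next_unused(r) for name, r in sources.items()}
        available = [n for n in ("A", "B") if candidate[n] is not None]
        if not available:
            break
        if len(available) == 1:
            pick = available[0]
        elif count["A"] != count["B"]:
            # The ranker with fewer picks so far goes next.
            pick = "A" if count["A"] < count["B"] else "B"
        else:
            # Tie: flip a coin for who picks first in this round.
            pick = rng.choice(["A", "B"])
        doc = candidate[pick]
        interleaved.append(doc)
        team[doc] = pick
        count[pick] += 1
    return interleaved, team


def credit_clicks(team: Dict[str, str], clicked: Iterable[str]) -> Dict[str, int]:
    """Count clicks per ranker; the ranker with more credited clicks wins."""
    wins = {"A": 0, "B": 0}
    for doc in clicked:
        if doc in team:
            wins[team[doc]] += 1
    return wins


merged, team = team_draft_interleave(["x", "y", "z"], ["y", "w", "x"], random.Random(0))
print(merged, credit_clicks(team, ["w"]))
```

Because the user sees a single merged list, both rankers are exposed to the same position bias, and the click tally aggregated over many queries gives a relative comparison between them.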

November 15, 2007

Data mining doesn’t cure stupidity

Filed under: Datamining, People and Data, Privacy — chucklam @ 3:48 am

Data mining, when done correctly, can improve understanding and provide insight, but data mining just doesn’t work under stupid assumptions. Check out the following paragraph in a Wall Street Journal blog. Apparently some FBI agents assume hummus sales to be predictive of terrorist activity.

The FBI obtained and mined sales data that San Francisco-area grocery stores collected in 2005 and 2006, according to CQ Politics. The agents were looking for a sudden spike in hummus sales that might indicate an Iranian sleeper cell in the Bay Area. An FBI higher-up killed the program, CQ Politics reports, on the grounds that it might be illegal and that it was, well, just ridiculous.

November 14, 2007

Yahoo! Announces Distributed Computing Academic Program

Filed under: Infrastructure — chucklam @ 2:06 am

Story via Read/WriteWeb.

Yahoo!… announced an academic research partnership with Carnegie Mellon University that will give students access to Hadoop and other open source tools running in a supercomputing-class data center. The data center… is a 4,000-processor cluster supercomputer with 3 terabytes of memory and 1.5 petabytes of diskspace… CMU and Yahoo! also plan to hold a Hadoop Summit in the first half of 2008…

This is so cool. I just hope the summit will be out here in Silicon Valley rather than at CMU.

(For more background on Hadoop and its extensions, see my blog post here.)

reCAPTCHA gone awry…

Filed under: Data Collection, Pattern recognition, People and Data — chucklam @ 12:33 am

I think reCAPTCHA is a very clever idea for layering data collection on top of an authentication system. However, sometimes the security check is just a bit too puzzling. I came across this today on Facebook. How am I supposed to type the answer in?? 😉


November 9, 2007

List of accepted papers for WSDM’08

The first ACM conference on Web Search and Data Mining (WSDM), to be held at Stanford on Feb. 11-12, has released its list of accepted papers. A total of 24 papers will be presented. The following ones already sound interesting based on their titles.

  • An Empirical Analysis of Sponsored Search Performance in Search Engine Advertising – Anindya Ghose and Sha Yang
  • Ranking Web Sites with Real User Traffic – Mark Meiss, Filippo Menczer, Santo Fortunato, Alessandro Flammini and Alessandro Vespignani
  • Identifying the Influential Bloggers – Nitin Agarwal, Huan Liu, Lei Tang and Philip Yu
  • Can Social Bookmarks Improve Web Search? – Paul Heymann, Georgia Koutrika and Hector Garcia-Molina
  • Entropy of Search Logs: How Hard is Search? With Personalization? With Backoff? – Qiaozhu Mei and Kenneth Church

November 7, 2007

MySpace looking for a search architect

Filed under: Information Retrieval, Search — chucklam @ 2:46 pm

Not sure if I should read too much into this, but MySpace is currently looking for a search architect. The candidate will have “years of experience and expertise in high volume storage, indexing, and searching to help architect, scale, and optimize the engines and API’s behind vertical search applications. Knowledge of search technologies such as Lucene, Lucene.Net and Xapian is important…”

November 5, 2007

Panel on Web 3.0

Filed under: Collective wisdom, Data Collection, People and Data — chucklam @ 1:04 am

Not sure if I’ll have time to go to this, but this seems like the one event to figure out what “Web 3.0” is about.

The MIT/Stanford Venture Lab presents:
“Web 3.0: New Opportunities on the Semantic Web”

Date: Tuesday, November 20
Time: 6:00 PM
Location: Bishop Auditorium, Stanford University

* Robert Cook, Co-founder and Executive VP of Product Development, Metaweb
* Nova Spivack, CEO and Founder, Radar Networks
* Alex Iskold, CEO and Founder, Adaptive Blue
* Paul Kedrosky, Venture Partner, Ventures West

We are well into the current era of the Web, commonly referred to as Web 2.0. What lies on the horizon? Will Web 3.0 usher in the long awaited vision of the semantic web, as proposed by “Father of the Web” Tim Berners-Lee more than ten years ago?

Join us for a lively panel session where some of the best emerging companies in the semantic web space present their different approaches to realizing the vision. The panel will address questions such as: How can we best implement the vision of the semantic web? What will we do with the web once it is structured with semantic information? What new applications will appear? Where is the consumer value and how should it be marketed? What new businesses can be built on top of the semantic web that are not possible today? Will the semantic web ultimately bring about a new intelligence that surpasses that of humanity, sparking a new era of non-biological evolution?

Join us and bring questions of your own – help us uncover the future of the web!

Cost: $30 pre-registered; $40 at the door

