Data Strategy

September 6, 2007

Spelling corrector that learns from query logs

Filed under: Information Retrieval — chucklam @ 1:59 am

I had previously written a post pointing to an article by Peter Norvig (Director of Research at Google) on how to write a statistical spelling corrector. That post noted that Peter’s article didn’t explain how the spelling suggestions of search engines (such as Google) learn from query logs. Well, it turns out that Silviu Cucerzan and Eric Brill over at Microsoft had already published a great paper at EMNLP 2004 called “Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users” (pdf). It explains some of the unique challenges of spelling correction in search queries. For example, new and unique (but correct) terms show up in queries all the time, so one can’t just construct a dictionary of correct spellings. Simple frequency counting doesn’t work either, since certain misspelled queries (“britny spears”) occur very often. And a misspelled query may be composed entirely of correctly spelled terms (“golf war”), so checking words one at a time misses it. Fortunately, Silviu and Eric show how clever use of the query log can overcome these and other problems.
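To make the idea a bit more concrete, here is a minimal Python sketch of correcting a query iteratively against a query log. It is only an illustration of the general idea and not the method from the paper, whose model is more elaborate. The log counts are invented toy data, and the names `edit_distance`, `correct`, and `TOY_LOG` are hypothetical. The assumption is that at each step the query may move to a more frequent logged query at most one edit away, and that this repeats until nothing better is nearby.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def correct(query, log_counts, max_dist=1, max_iters=10):
    """Repeatedly jump to the most frequent logged query within max_dist edits.

    Each jump requires a strictly higher count, so the loop terminates.
    """
    current = query
    for _ in range(max_iters):
        best, best_count = current, log_counts.get(current, 0)
        for cand, count in log_counts.items():
            if count > best_count and edit_distance(current, cand) <= max_dist:
                best, best_count = cand, count
        if best == current:
            break  # no more frequent neighbor: treat as correct
        current = best
    return current


# Invented toy counts (assumption, not real log data).
TOY_LOG = {
    "britney spears": 1000,
    "britny spears": 120,   # frequent misspelling
    "britny speers": 5,
    "gulf war": 800,
    "golf war": 30,         # both words are valid, the query is not
    "golf club": 500,
}

print(correct("britny speers", TOY_LOG))  # two steps, via "britny spears"
print(correct("golf war", TOY_LOG))
print(correct("golf club", TOY_LOG))
```

No dictionary is involved: “golf club” survives because nothing close to it is more popular, while “golf war” gets pulled to “gulf war” even though each of its words is spelled correctly. “britny speers” is two edits away from the popular spelling, so it only gets there by passing through the frequent misspelling “britny spears”, which is where iterating helps.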

1 Comment »

  1. Spelling corrector that learns from query logs

    Chuck Lam, at Data Strategy, reports an interesting pointer to a scientific paper published by Microsoft researchers explaining how their spelling corrector automatically extracts information from the logs to identify misspellings.

    Trackback by Intelligent Machines — September 7, 2007 @ 6:03 am

