It is written by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, all noted experts in statistical natural language processing and information retrieval. (Chris and Prabhakar are both professors at Stanford. Prabhakar is also Head of Yahoo! Research.) The book will be published by Cambridge University Press sometime in 2008. Fortunately, for those of us who can’t wait, an advance draft of the book is available at www.informationretrieval.org.
The book will be a welcome introduction to building today’s information retrieval systems. It’s the first coherent textbook to bring together in one place techniques that tend to be taught in different areas. It covers basic topics in classical IR, such as indexing, the vector space model, and relevance feedback. It also covers machine learning and statistical techniques, such as Naive Bayes, support vector machines, clustering, and latent semantic indexing. Finally, it includes web-specific techniques such as crawling and link analysis. It’s intended as a textbook for advanced undergraduates, and it should be a useful addition to a search engine practitioner’s library as well.