The Hakia blog has an interesting post on creating query sets to test semantic (i.e. “natural language”) search. The main point is that the query set must have sufficient coverage to test four aspects of natural language search: query type, query length, content type, and word sense disambiguation.
Testing of a semantic search engine requires at least 330 queries just to scratch the surface. hakia’s internal fitness tests, for example, use a couple of thousand queries. Therefore, if you see any report or article about the evaluation of a search engine using a dozen queries, even if it includes valuable insight, it will tell you nothing about the overall state of that search engine.
The distribution of test queries will ultimately be based on real usage, which of course is limited at this point. In general, the more sophisticated function a system is supposed to perform, the more test cases one needs to evaluate it. For something as sophisticated as natural language search (or even plain keyword search), testing is definitely non-trivial.
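To make the coverage idea concrete, here is a minimal sketch of how one might check a labeled query set for gaps along the four dimensions. Only the dimension names come from the post; the category labels, the `LabeledQuery` type, and the `coverage_gaps` helper are my own hypothetical placeholders, not Hakia's actual taxonomy or tooling.

```python
from collections import Counter
from dataclasses import dataclass

# The four dimensions come from the post. The category labels are
# hypothetical placeholders, not Hakia's real taxonomy.
DIMENSIONS = {
    "query_type": ["what", "how", "who", "keyword"],
    "query_length": ["short", "medium", "long"],
    "content_type": ["news", "reference", "forum"],
    "word_sense": ["unambiguous", "ambiguous"],
}


@dataclass(frozen=True)
class LabeledQuery:
    """A test query hand-labeled along each of the four dimensions."""
    text: str
    query_type: str
    query_length: str
    content_type: str
    word_sense: str


def coverage_gaps(queries, min_per_category=1):
    """Return {dimension: [categories with fewer than min_per_category queries]}."""
    gaps = {}
    for dim, categories in DIMENSIONS.items():
        counts = Counter(getattr(q, dim) for q in queries)
        missing = [c for c in categories if counts[c] < min_per_category]
        if missing:
            gaps[dim] = missing
    return gaps


# A two-query "test set" leaves most categories untested.
sample = [
    LabeledQuery("who invented the telephone", "who", "short", "reference", "unambiguous"),
    LabeledQuery("java", "keyword", "short", "reference", "ambiguous"),
]
gaps = coverage_gaps(sample)
print(gaps)
```

Raising `min_per_category` is a crude stand-in for demanding more queries per category; it only counts each dimension separately and does not check combinations across dimensions, which would require far more queries.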
Powerset’s first public demo was limited in all four of those aspects. Of course, keep in mind that the demo was just to give the public a peek at Powerset’s search technology, not an exhaustive evaluation of it. It was also meant to illustrate the difference between Powerset’s approach and Google’s approach, not to compare and contrast Powerset with Hakia. Personally, I’m just glad that both startups are starting to reveal some of their inner workings.