What's That Noise?! [Ian Kallen's Weblog]

Friday June 29, 2007

Power Bet

Last night I was among an invited group that Powerset brought in to witness how their natural language search sausage is made. It was actually kinduva cold cut platter: not exactly a meal, but an interesting variety was offered for consumption.

When I was a kid, I thought that by 2007 we'd all have flying cars and rocket packs, and that computers would be all-seeing/all-knowing accoutrements on our wrists. I think all of us who ever watched Scotty verbally ask the Enterprise questions and get responsive answers in English sentences have had hunger pangs for satisfying natural language search. Powerset is trying to bring human-computer interfaces a little closer to that satisfaction, leapfrogging previous efforts by licensing Xerox PARC's technology and hiring a buncha heavy hitters to make it real.

Powerset COO Steve Newcomb introduced some of the sluggers in their line-up, walked attendees through the thinking behind their PR and release strategy and provided a peek into their search capabilities.

Among the impressive powersetters are people who have been-there/done-that with scaled-up search, such as ex-Yahoo!s Chad Walters and Tim Converse (read Tim's post the other day about term proximity and linguistics, great stuff), as well as experts in natural language search with backgrounds at PARC and Ask Jeeves. As a company, they're not just-another-Web-2.0 Rails app built by two guys trying to get to the next level. Powerset is more of a bold, bottled-lightning science experiment that embraces Ruby on Rails as a way to get it in front of people.

Powerset has signed up 10K people since announcing the availability of updates and previews on PowerLabs a few weeks ago. Newcomb characterized their labs preview effort as a way to use social software to guide product management decisions, "a mashup of Digg, Facebook and Google apps." I'm a big fan of transparency and community inclusion; it will be interesting to see how inclusive or closed this effort turns out to be.

OK, so after all of that, the "Where's the beef?" moment arrived. A side-by-side comparison interface was demonstrated with Powerset results on the left and Google results on the right. Since the test index was scoped to Wikipedia, the goog results were similarly scoped down. The Powerset use case was demonstrated with a query like "What politicians were killed by disease?" On goog, the results match the query terms (and variants on their stems): "politicians", "killed" and "disease". Powerset instead matches semantically similar tokens and their grammatical relationships.

So Powerset's top result for that query highlighted "Sir Edward Heath died from pneumonia" on Wikipedia's page for Edward Heath. Highlighting a completely different snippet that accurately answers the query (none of the query terms matched, but the semantics did) is very impressive. Powerset uses Freebase's ontology and WordNet's synonym mappings to connect indexed sentence structures to the query. They do all of this analysis and mapping at index time, which undoubtedly raises the cost of indexing tremendously. They're making a big bet that higher-quality search results will pay those costs back.
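To make the keyword-versus-semantics distinction concrete, here's a toy sketch in Python. It is emphatically not how Powerset works: the `RELATED` table is a hypothetical, hand-built stand-in for WordNet-style synonym mappings, `ENTITY_TYPES` is a hypothetical stand-in for a Freebase-style ontology, `crude_stem` is a naive suffix stripper rather than a real stemmer, and grammatical relationships are ignored entirely. It only shows why a literal term match finds nothing in the Edward Heath sentence while a lookup through related words and entity types can match all three query terms.

```python
import re

# Hypothetical stand-in for WordNet-style mappings: query term -> related words.
RELATED = {
    "killed": {"killed", "died", "death"},
    "disease": {"disease", "pneumonia", "illness"},
}

# Hypothetical stand-in for a Freebase-style ontology: entity -> types.
ENTITY_TYPES = {
    "heath": {"politicians"},
}


def tokens(text):
    """Lowercased alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Naive suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def keyword_hits(query_terms, sentence):
    """Query terms whose stems appear literally in the sentence."""
    stems = {crude_stem(w) for w in tokens(sentence)}
    return sorted(q for q in query_terms if crude_stem(q) in stems)


def semantic_hits(query_terms, sentence):
    """Query terms matched via related words or entity types."""
    words = set(tokens(sentence))
    # Add the ontology types of any known entities mentioned in the sentence.
    for w in list(words):
        words |= ENTITY_TYPES.get(w, set())
    return sorted(q for q in query_terms if RELATED.get(q, {q}) & words)


query = ["politicians", "killed", "disease"]
sentence = "Sir Edward Heath died from pneumonia."
print(keyword_hits(query, sentence))   # []
print(semantic_hits(query, sentence))  # ['disease', 'killed', 'politicians']
```

The real trick, of course, is building those mappings and sentence structures at scale, which is exactly the index-time cost Powerset is betting on.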

When asked about the computational horsepower required to index web documents with sentence-structure decomposition and semantic mappings, Newcomb hedged at first ("Barney's gonna kill me," referring to CEO Barney Pell). But alas, he convinced himself (or did a good job of method-acting conviction) that it was safe to reveal that it takes them about a second to grammatically analyze and index a typical document. As he lamented his confession again, someone in the audience quipped the query, "Which CEO killed Steve Newcomb?" Yeah, he didn't search their index for that.
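Newcomb's figure of about a second per document makes the cost of index-time analysis easy to estimate. This is back-of-the-envelope arithmetic only: the one-billion-document corpus and the worker counts are hypothetical numbers picked for illustration, not anything Powerset disclosed, and the sketch assumes the work parallelizes perfectly.

```python
SECONDS_PER_DOC = 1.0  # Newcomb's rough per-document figure
SECONDS_PER_DAY = 86_400


def indexing_days(num_docs, num_workers, seconds_per_doc=SECONDS_PER_DOC):
    """Wall-clock days to index num_docs, assuming perfectly parallel workers."""
    return num_docs * seconds_per_doc / num_workers / SECONDS_PER_DAY


# Hypothetical web-scale corpus of one billion documents.
docs = 1_000_000_000
for workers in (1, 1_000, 10_000):
    print(f"{workers:>6} workers: {indexing_days(docs, workers):,.1f} days")
```

On a single worker that's over 11,000 days (roughly 31.7 years); a thousand workers bring it down to about eleven and a half days, before counting any re-indexing as the analysis improves.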

On the subject of Google comparisons, Newcomb kinda squirmily described Powerset as reverent of ("not cocky about") what Google has accomplished but taking a different approach to web search. Doing side-by-side comparisons with Google as their demo does is pretty ballsy and it seems to get them in trouble; being positioned as a "Google killer" by their audience of search wonks and journalists when things are still very much at a proof-of-concept level seems rather premature. I think Powerset needs to reel that in lest they awaken a sleeping giant and fill him with a terrible resolve while they're still on the tarmac. If you've designed a new aircraft, you don't trumpet about revolutionizing aeronautics before the test pilots have taken off. Particularly if folks are proclaiming that Boeing is in trouble. When Powerset indexes a real web corpus, it will be interesting to see how successfully they can overlay web graph, clustering/disambiguation, time and other relevance components. I think that will provide a real moment-of-truth.

Powerset is making a big bet on natural language search as a transformative technology. They've got a lot of great people and a lot of great technology. All in all, the presentation felt a little dog-and-ponyish with the limited corpus, but I'm looking forward to hearing more from them later this year when they release a major iteration.

(Jun 29 2007, 10:41:46 AM PDT)