Are semantic search engines passé?
by Keith Ng
12-Sep-08, 13:45
Technically, they should be superior to other online search mechanisms, but semantic search engines may have been too slow to make their mark when they had the chance.
Is Microsoft's US$100 million acquisition of semantic search engine Powerset going to revolutionise search as we know it?
In a word: no.
A semantic search engine tries to understand the words and sentences of a search query and of webpages. It works out the linguistic relationship between the different components to figure out their meaning. In other words, it reads.
Semantic search engines were touted as the future of search even before Google was around, but, unfortunately for them, that future was about five years ago.
If semantic search engines ‘read’, then keyword-based search engines ‘count’, using statistical relationships to guess at meaning. This is the approach that Google uses, and in theory it’s an inferior method. But as it turns out, a finely tuned combination of many statistical measures works remarkably well.
Rather than trying to work out what a word means and what it relates to, just knowing that it is often used along with certain other words allows Google to understand context. When you search for ‘butter chicken’, Google knows that you are looking for a curry, not a tub of butter or a free-range chicken.
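The ‘counting’ idea can be sketched in a few lines of Python. This is only a toy illustration, not Google’s actual method: the six-line corpus and the `related_terms` function are invented for the example. It simply counts which other words turn up in documents that contain every word of the query.

```python
from collections import Counter

# Toy corpus, invented purely for illustration.
DOCS = [
    "butter chicken curry recipe with rice",
    "best butter chicken curry in town",
    "creamy butter chicken curry and naan",
    "free range chicken farm eggs",
    "tub of butter for baking bread",
    "chicken farm sells eggs",
]

def related_terms(query, docs):
    """Count the words that co-occur with ALL the query words (hypothetical helper)."""
    query_words = set(query.lower().split())
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        # Only documents that contain every query word contribute.
        if query_words <= words:
            counts.update(words - query_words)
    return counts

print(related_terms("butter chicken", DOCS).most_common(1))  # [('curry', 3)]
```

No step here knows what a curry is. ‘curry’ comes out on top only because it keeps appearing next to ‘butter’ and ‘chicken’ together, while ‘farm’ and ‘baking’ never do.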
Understanding relationships and context is the promise of semantic search engines, but that’s the problem. Google already does it extremely well, while Powerset’s technology doesn’t even fully work yet. Powerset has managed to dazzle some pundits with its test platform, which produces high-quality, relevant results most of the time. Which is all very impressive until you realise that it only searches Wikipedia and Freebase. These are well-structured, well-defined databases, written in a formal, standardised style - a monkey with a Rolodex could find high-quality, relevant search results from these sources.
When comparing Wikipedia- or Freebase-only searches, Google still arguably does better.
The Factz function in Powerset shows the real failings of the semantic engine, say experts. For example, a search for Beijing showed 14 semantic relationships. These are supposed to be facts that the search engine understands about a given subject. But only two of those (‘Beijing hosted Olympics’ and ‘Beijing hosted Paralympics’) were actually useful.
The rest were relationships such as ‘Beijing-hosted event’ or ‘Beijing followed population’. It also ‘knows’ that Bill Gates ‘founded Paul Newman’ and ‘made vice-president’. The vast majority of the semantic relationships discovered by Powerset were, in fact, junk.
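To see how this kind of junk can arise, consider a deliberately naive fact extractor. This is a sketch under stated assumptions, not Powerset’s actual technology: the `TRIPLE` pattern, the `extract_triples` function and the sample sentences are all invented for illustration. It treats any capitalised word followed by a known verb as a subject-verb-object ‘fact’.

```python
import re

# Hypothetical pattern: capitalised subject, a space or hyphen, one of a few
# known verbs, an optional article, then the next word as the object.
TRIPLE = re.compile(
    r"\b([A-Z]\w+)[ -](hosted|founded|followed) (?:the |a |an )?(\w+)"
)

def extract_triples(text):
    """Return (subject, verb, object) tuples matched by the naive pattern."""
    return TRIPLE.findall(text)

# A real fact, extracted correctly.
print(extract_triples("Beijing hosted the Olympics."))
# [('Beijing', 'hosted', 'Olympics')]

# An adjective phrase misread as a fact.
print(extract_triples("A Beijing-hosted event drew crowds."))
# [('Beijing', 'hosted', 'event')]
```

The second result is grammatically plausible and factually empty. Telling the two apart is exactly the reading comprehension a semantic engine has to get right.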
Another long-standing issue that Powerset has to deal with is natural-language search: rather than searching for ‘iPhone 3G price’, it wants users to ask, ‘What is the price of a 3G iPhone?’ That gives the semantic engine more information to work with, but it also requires a substantial change in user behaviour.
Writing in sentences might have been the natural tendency for users back in the late 90s, but after a decade of the internet, are there still users who are uncomfortable typing keyword queries into a search engine?
There is potential for semantic search to outstrip statistics-based keyword search one day, and its ability to extract facts from webpages might turn into more useful machine-generated content in the future.
In the meantime, it has the potential to improve Microsoft’s search capabilities and contextual ad targeting, giving Microsoft a real chance to compete with Google AdSense.
Assuming it can get Powerset to work like it’s supposed to.
The upshot
- Keep an eye on Microsoft’s ad solutions - the Powerset technology is bound to be incorporated into that service.
- Don’t be fooled by declarations that semantic search is the future. It’s been the future since the 90s.
- Don't bother trying to optimise for a semantic search engine yet. It's likely to change dramatically before it goes live.