Did you mean...?

This is just an example of how search engines use statistics to help them "understand" words. Until today, iPad wasn't a word (or at least, it wasn't a name). If you go to Google (for now at least) and search for ipad, you see something like this: a suggestion asking whether you meant ipod.

iPad appears very infrequently on the web but is similar to a word that appears very frequently: "iPod". iPad appears infrequently enough, and the discrepancy between the two is large enough, that Google assumes I probably made a mistake typing iPad. Google's actual algorithm for spelling correction is more complicated than this, but this is the basic idea behind spelling correction in search engines. As iPad starts to show up all over the web, Google will stop making that suggestion, since iPad will become more plausible (statistically) as a word. Or maybe somebody at Google will just add iPad to an exception list so it stops making the suggestion.
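To make the basic idea concrete, here is a minimal sketch in Python. It is not Google's algorithm: the word counts are made up, and the names `edits1`, `did_you_mean`, and the `min_ratio` threshold are hypothetical choices for illustration. It suggests a correction only when a word one edit away is far more frequent than the word that was typed.

```python
from collections import Counter
import string


def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def did_you_mean(query, counts, min_ratio=100):
    """Suggest a similar word if it is at least min_ratio times more frequent.

    counts maps words to how often they were seen; min_ratio is an
    arbitrary, assumed threshold for "the discrepancy is large enough".
    """
    query = query.lower()
    own = counts.get(query, 0)
    candidates = [w for w in edits1(query) if w in counts and w != query]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: counts[w])
    # Treat an unseen query as if it had been seen once, to avoid a zero ratio.
    if counts[best] >= min_ratio * max(own, 1):
        return best
    return None


# Made-up counts for the day of the announcement.
counts = Counter({"ipod": 500_000, "ipaq": 20_000, "ipad": 50})
print(did_you_mean("ipad", counts))  # suggests "ipod"

# Once "ipad" shows up all over the web, the suggestion goes away.
counts["ipad"] = 400_000
print(did_you_mean("ipad", counts))  # no suggestion (None)
```

A real system would also weigh how likely each typo is and look at the surrounding query words, but the frequency comparison above is the part described here.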

As a side note (no pun intended), the sponsored ads on the side of the page are all for the iPAQ, since various vendors have put in bids to get listed for ipad as a misspelling of ipaq. In this case, I'm fairly sure these are strictly companies that asked for their ads to be shown for the word ipad (as well as ipaq).

Comments:

I stumbled upon this post because I was wondering whether exactly this would have happened, but Google has caught on since this post was released :)

I know that Google never keeps a list of exceptions that they manually include; they are purists about their algorithm, and will stick to it (or update it if necessary). But it was probably dealt with automatically as news sources came out and got indexed by Google.

Good post :)

Posted by Collin Li on January 27, 2010 at 04:37 PM EST

Given all languages and phrases, what is a good algorithm for detecting a significant deviation in a sequence of words compared to all previously encountered sequences?

Posted by Kristofer Pettersson on February 07, 2010 at 10:02 PM EST

About

Jeff Alexander is a member of the Information Retrieval and Machine Learning group in Oracle Labs.
