About Search Engines
   
      There are others besides Google
 

The main thing to know about search engines is that they all index different pages, and none of them gets close to indexing the whole web. Therefore, if you're really trying to research something it's worth using more than one. Don't let Google be your only window to the world. (Although image searches can be useful, I am concentrating on regular text searches in the comments below.)

GOOGLE (www.google.com)
Google probably has the largest database, and it has a nice spare interface. It does have some downsides, however. It seems slower to index new sites than many of its competitors (web designers complain about a "sandbox" effect that can keep new sites from ranking high in results for as much as a year). Also, I haven't seen any comment on this, but its popularity-based ranking system tends to "mainstream" its results (remember how popularity-based rankings were a feature of high school?). So alternative viewpoints may be downgraded in Google's results. And since high web popularity requires a lot of back links, this additionally skews Google toward older sites. Some say Google also has a built-in bias toward commercial sites.
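To make the back-link point concrete, here is a minimal Python sketch of a link-popularity score. It is a simplified PageRank-style iteration, not Google's actual (secret) formula. The page names, the damping value of 0.85, and the function name `link_popularity` are all assumptions made up for illustration.

```python
def link_popularity(links, damping=0.85, rounds=50):
    """Score pages by incoming links (simplified PageRank-style iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(rounds):
        # every page starts each round with a small baseline score
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its score on to the pages it links to
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # a page with no outgoing links spreads its score evenly
                share = damping * score[page] / n
                for p in pages:
                    new[p] += share
        score = new
    return score


# Hypothetical web: three established sites link to each other,
# and a new site links out but has no back links yet.
web = {
    "old-a": ["old-b", "old-c"],
    "old-b": ["old-a"],
    "old-c": ["old-a"],
    "new-site": ["old-a"],
}
scores = link_popularity(web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy web the new site ends up last no matter how good its content is, because nothing links to it yet. That is the skew toward older sites in miniature.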

Google's exact mechanisms are secret, but in essence it consists of three elements:

1. Googlebot, its web crawler. The crawler does not actually scan pages, but simply retrieves them for the indexer. How frequently the crawler visits pages seems to be based mostly on page rank and frequency of content change.
2. An Indexer, which sorts the words on the page and places them in an enormous database (said to include more than 8 billion URLs). The indexer is constantly making small adjustments as Googlebot retrieves new information: this is called "everflux." Changes in search results based on everflux tend to be minor and unstable. Approximately once a month Google updates its index, resulting in more significant and lasting changes: this is called the "Google dance."
3. A Query Processor, which relates searches to the index. Google does not reveal exactly how this is done, and in fact the formula used seems to be constantly changing. It is known, however, that it considers such things as page popularity, position of search terms in the page, proximity of search terms to each other, and order of search terms. It's also said that the size of the search item is considered (a heading is worth more than plain text), but this seems so stupid I can't believe it's given much weight.
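The three elements above can be sketched as a toy pipeline in Python. This is only an illustration of the division of labor, under stated assumptions: the "web" is a hypothetical in-memory dictionary rather than real pages fetched over HTTP, the scoring weights are invented, and only the first occurrence of each search term is considered. Nothing here reflects Google's real formula.

```python
import re
from collections import defaultdict

# Hypothetical "web": URL -> page text. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "a.example/tea": "green tea brewing guide: how to brew green tea at home",
    "b.example/shop": "buy kettles and cups; we also stock tea, mostly green",
    "c.example/cars": "used cars for sale",
}


def crawl(urls):
    """Crawler: only retrieves pages and hands them to the indexer."""
    for url in urls:
        yield url, FAKE_WEB[url]


def build_index(pages):
    """Indexer: map each word to {url: [positions of the word on that page]}."""
    index = defaultdict(dict)
    for url, text in pages:
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].setdefault(url, []).append(pos)
    return index


def search(index, query):
    """Query processor: rank the pages that contain every search term.

    The score rewards terms that appear early on the page and close
    together (made-up weights, first occurrence of each term only).
    """
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    results = []
    for url in candidates:
        first = [index[t][url][0] for t in terms]  # first position of each term
        earliness = 1.0 / (1 + min(first))
        proximity = 1.0 / (1 + max(first) - min(first))
        results.append((earliness + proximity, url))
    return [url for score, url in sorted(results, reverse=True)]


index = build_index(crawl(FAKE_WEB))
hits = search(index, "green tea")
```

Here the page that opens with "green tea" outranks the page that mentions both words far apart near the end, and the car page is not returned at all.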

ASK (www.ask.com)
Teoma has merged with Ask (of Ask Jeeves; Jeeves has bitten the dust). Ask's database is smaller than Google's, but it tends to yield relevant results, which can be further refined using subtopics. I've noticed that since the merger it is difficult to distinguish sponsored links from regular ones, which is disappointing.

LIVE (www.live.com)
As of today (13 Sep 2006) MSN Search is now Live Search. The evil empire has actually produced a fair search engine. It's very fast and quick to update its index. It doesn't have the advanced search features that Google offers. The most disappointing thing about Microsoft's search engine is how slow it has been to evolve. Its algorithms still seem a little crude. "Keyword.blogspot.com" is almost guaranteed a top ranking for keyword, regardless of how thin and unoriginal the content is on the page. MSN is not gaining market traction. It accounts for only 4% of searches at rightreading.com.

YAHOO SEARCH (www.yahoo.com)
In 2002 Yahoo switched its default from directory to search engine. Until 2004 it simply used Google to deliver search results, but it now uses its own search engine. Yahoo is reputed to search deeper within pages than Google. It's also said to be good at recognizing spam sites as a result of its experience with Yahoo Mail, which is widely used.

CLUSTY (www.clusty.com) and VIVISIMO (www.vivisimo.com)
Clusty "clusters" search results by grouping similar results together. I haven't tested it much yet. Vivisimo combines clustering with metasearching (see below). From preliminary tests Vivisimo's results seem good.
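As a rough idea of what clustering search results means, here is a minimal Python sketch that groups result titles under the word they share with the most other titles. This is an assumption-laden toy, not the method Clusty or Vivisimo actually use; the titles, the stopword list, and the function name `cluster_results` are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def cluster_results(titles, query):
    """Group result titles under the shared word that appears in the most titles."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))

    def words(title):
        found = set(re.findall(r"[a-z]+", title.lower()))
        # ignore filler words and the query itself (every result contains it)
        return found - STOPWORDS - query_words

    # count in how many titles each word appears
    doc_freq = Counter(w for t in titles for w in words(t))
    clusters = defaultdict(list)
    for title in titles:
        shared = [w for w in words(title) if doc_freq[w] > 1]
        # pick the most widely shared word; break ties alphabetically
        label = min(shared, key=lambda w: (-doc_freq[w], w)) if shared else "other"
        clusters[label].append(title)
    return dict(clusters)


titles = [
    "Jaguar cars for sale",
    "Used Jaguar cars",
    "Jaguar habitat in the rainforest",
    "Rainforest animals: the jaguar",
    "Jaguar guitar review",
]
clusters = cluster_results(titles, "jaguar")
```

For an ambiguous search like "jaguar", the car pages and the wildlife pages land in separate groups, and the lone guitar review falls into "other".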

ALTAVISTA, ALL THE WEB, LYCOS
AltaVista was once a leading search engine, but it took a wrong turn when it tried to recast itself as a portal site. The fall of Lycos shows how things have changed since the early days of the web. AltaVista and All the Web were both purchased by Yahoo.

METASEARCHES (Dogpile, Ixquick)
Metasearch engines such as Dogpile and Ixquick collect top results from multiple search engines. This can be convenient if you're in a hurry and want to go beyond a single engine such as Google, but obviously the results are simply shadows of what the underlying engines would return on their own. It seems to me Ixquick has overtaken Dogpile, and it is worth trying. See also Vivisimo, above.
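One simple way a metasearch engine could combine top results is to give each URL credit according to its rank in every engine that returns it. The Python sketch below does this with a 1/rank score. It is an illustrative assumption, not how Dogpile or Ixquick actually merge results, and the engine names and URLs are hypothetical.

```python
from collections import defaultdict


def merge_rankings(results_by_engine, top_n=10):
    """Combine ranked URL lists; a URL scores 1/rank in each engine that returns it."""
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / rank
    # highest combined score first; ties broken alphabetically for stable output
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    return ordered[:top_n]


# Hypothetical top results from three engines for the same query
results = {
    "engine_one": ["x.example", "y.example", "z.example"],
    "engine_two": ["y.example", "w.example", "x.example"],
    "engine_three": ["y.example", "x.example"],
}
merged = merge_rankings(results)
```

A page that several engines rank highly rises to the top, while a page known to only one engine sinks, which is also why a metasearch only ever sees the top slice of each index.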

DIRECTORIES
Directories are a good alternative to search engines in some cases. Examples include:

 
