The main thing to know about search engines is that they all index different
pages, and none of them gets close to indexing the whole web. Therefore,
if you're really trying to research something it's worth using more
than one. Don't let Google be your only window to the world. (Although
image searches can be useful, I am concentrating on regular text searches
in the comments below.)
GOOGLE
(www.google.com)
Google probably has the largest database, and it has a nice, spare interface.
It does have some downsides, however. It seems slower to index new sites
than many of its competitors (web designers complain about a "sandbox"
effect that can keep new sites from ranking high in results for
as much as a year). Also, though I haven't seen any comment on this, its
popularity-based ranking system tends to "mainstream" its
results (remember how popularity-based rankings were a feature of high
school?). So alternative viewpoints may be pushed down in Google's results.
And since high web popularity requires a lot of back links, this additionally
skews Google toward older sites. Some say Google also has a built-in
bias toward commercial sites.
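To make the back-link point concrete, here is a minimal sketch of link-popularity scoring in the style of the published PageRank idea. It is not Google's actual ranking, which is secret; the function name, the damping value of 0.85, the number of rounds, and the sample sites are all assumptions made up for illustration.

```python
def link_popularity(links, damping=0.85, rounds=50):
    """Score pages by back links. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical sites: the old site has collected back links, the new one has none.
LINKS = {
    "old-site": ["blog-a"],
    "blog-a": ["old-site"],
    "blog-b": ["old-site"],
    "blog-c": ["old-site", "blog-a"],
    "new-site": ["old-site"],
}
RANKS = link_popularity(LINKS)
```

In this toy web the new site scores no higher than any other unlinked page, however good its content, which is the skew toward older sites described above.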
Google's exact mechanisms are secret, but in essence it consists of three elements:
1. Googlebot, its web crawler. The crawler does not actually scan pages, but
simply retrieves them for the indexer. How frequently the crawler
visits pages seems to be based mostly on PageRank and frequency of
content change.
2. An Indexer, which sorts the words on the page and places
them in an enormous database (said to include more than 8 billion
URLs). The indexer is constantly making small adjustments as Googlebot
retrieves new information: this is called "everflux." Changes
in search results based on everflux tend to be minor and unstable.
Approximately once a month Google updates its index, resulting in
more significant and lasting changes: this is called the "Google
dance."
3. A Query Processor, which relates searches to the index.
Google does not reveal exactly how this is done, and in fact the formula
used seems to be constantly changing. It is known, however, that it
considers such things as page popularity, position of search terms
in the page, proximity of search terms to each other, and order of
search terms. It's also said that the size of the text a search term
appears in is considered (a heading is worth more than plain text), but
this seems so stupid I can't believe it's given much weight.
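The indexer and query processor described above can be sketched in a few lines. This is only a toy under stated assumptions: the sample pages, the function names, and the scoring formula are invented for illustration, and the score uses just two of the factors mentioned (position of terms in the page and their proximity to each other). Google's real formula is not public.

```python
from collections import defaultdict

# Hypothetical pages that a crawler has already retrieved.
PAGES = {
    "a.example/tea": "green tea brewing guide for green tea lovers",
    "b.example/paint": "green paint and tea towels",
    "c.example/coffee": "coffee brewing guide",
}

def build_index(pages):
    """Indexer: map each word to {url: [positions where the word occurs]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(position)
    return index

def search(index, query):
    """Query processor: return URLs containing every term, best match first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Only pages that contain all of the search terms are candidates.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    scored = []
    for url in candidates:
        first_hits = [index[term][url][0] for term in terms]
        earliness = 1.0 / (1 + min(first_hits))  # terms near the top score higher
        spread = max(first_hits) - min(first_hits)
        proximity = 1.0 / (1 + spread)           # terms close together score higher
        scored.append((earliness + proximity, url))
    # Highest score first; ties are broken alphabetically by URL.
    return [url for _, url in sorted(scored, key=lambda item: (-item[0], item[1]))]

INDEX = build_index(PAGES)
RESULTS = search(INDEX, "green tea")
```

Here `RESULTS` puts the tea page ahead of the paint page because "green" and "tea" sit next to each other on the former. A real engine would also fold in popularity and term order.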
ASK
(www.ask.com)
Teoma has merged with Ask (of Ask Jeeves; Jeeves has bitten the dust).
Ask's database is smaller than Google's, but it tends to yield relevant
results, which can be further refined using subtopics. I've noticed
that since the merger it is difficult to distinguish sponsored links
from regular ones, which is disappointing.
LIVE
(www.live.com)
As of today (13 Sep 2006) MSN Search is now Live Search. The evil empire
has actually produced a fair search engine. It's very fast and quick to
update its index. It doesn't have the advanced search features that Google
offers. The most disappointing thing about Microsoft's search engine is
how slow it has been to evolve. Its algorithms still seem a little crude.
"Keyword.blogspot.com" is almost guaranteed a top ranking for keyword,
regardless of how thin and unoriginal the content is on the page. MSN is
not gaining market traction. It accounts for only 4% of searches at
rightreading.com.
YAHOO SEARCH
(www.yahoo.com)
In 2002 Yahoo switched its default from directory to search engine.
Until 2004 it simply used Google to deliver search results, but it now
uses its own search engine. Yahoo is reputed to search deeper within
pages than Google. It's also said to be good at recognizing spam sites
as a result of its experience with Yahoo Mail, which is widely used.
CLUSTY
(www.clusty.com) and VIVISIMO (www.vivisimo.com)
Clusty "clusters" search results by grouping similar results
together. I haven't tested it much yet. Vivisimo combines clustering
with metasearching (see below). From preliminary tests Vivisimo's results
seem good.
ALTAVISTA, ALL THE WEB, LYCOS
AltaVista was once a leading search engine, but it took a wrong turn
when it tried to recast itself as a portal site. The fall of Lycos shows
how things have changed since the early days of the web. AltaVista and
All the Web both ended up owned by Yahoo (through its purchase of
Overture); Lycos changed hands separately.
METASEARCHES
(Dogpile, Ixquick)
Metasearch engines such as Dogpile and Ixquick collect top results
from multiple search engines. This can be convenient if you're in a
hurry and want to go beyond a single engine such as Google, but obviously
the results are only shadows of what each underlying engine would return.
It seems to me Ixquick has overtaken Dogpile, and it is worth trying.
See also Vivisimo, above.
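As a rough illustration of what a metasearch does, the sketch below merges the top results from several engines into one list. It is an assumption-laden toy, not how Dogpile or Ixquick actually work: the engine names, the sample URLs, and the scoring rule (each engine contributes 1/rank for a result, and the contributions are added up) are all invented here.

```python
from collections import defaultdict

def merge_results(ranked_lists, top_n=10):
    """Merge ranked URL lists from several engines into a single ranking."""
    scores = defaultdict(float)
    for results in ranked_lists.values():
        # Only each engine's top few results are kept; the rest are never seen.
        for rank, url in enumerate(results[:top_n], start=1):
            scores[url] += 1.0 / rank
    # Highest combined score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from three made-up engines.
SAMPLE = {
    "engine_one": ["x.example", "y.example", "z.example"],
    "engine_two": ["y.example", "x.example", "w.example"],
    "engine_three": ["y.example", "z.example"],
}
MERGED = merge_results(SAMPLE)
```

The `top_n` cutoff is why metasearch results are only a partial view: anything an engine ranks below the cutoff is dropped before the merge even starts.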
DIRECTORIES
Directories are a good alternative to search engines in some cases.
Examples include: