A History of Search Engines
PLEASE contact the author
before reproducing any of this material,
or if you know of any additional information not mentioned here. Thank
you.
Prologue
In the beginning, before the creation of the Galaxy,
there was the Wanderer...
Spinning a lingering web in its wake
were the robots, spiders, Worms(WWW), and Crawlers(Web)...
Bringing order to the Chaos of the newly formed Galaxy
were a bunch of Yahoos.
The wisdom and greatness of the elder Yahoos has gone
unquestioned and unchallenged for an eternity...
the Seekers of Information could not
obtain all of the great one's secrets,
nor could the shockwaves from the great Excitement rattle
the sacred altar of Yahoo!.
But now, there is a new force in the universe...
the Hotomi Bot(1) threatens
to tear apart the very fabric of the web!
Personal Sob Story Introduction
My earliest encounter with the World Wide Web was during my freshman year
at MIT. I was working late one night at the Media Lab, a lowly research
assistant trying to learn the ropes of a technology that nobody outside
of the lab, let alone myself, knew anything about. Suddenly an e-mail from
one of the other late-night vigilantes (it was 4 A.M., but most of the workstations
were occupied) came across the lab network. The message was about some new
global document-sharing network. By adding a few header tags to research
papers, the message claimed, one could publish them in such a way that scientists
in other labs all around the world could access them easily. I hadn't written
anything worth reading yet, and I'd already read enough cryptic research
papers for one night, so I deleted the e-mail and started searching the
file server for cool games.
That was around April or May of 1993. About that same time, www.mit.edu
went online as one of the first 100 web servers in the world. Naturally,
this was not MIT's official homepage(2), because at that
point, nobody had homepages. It was actually a server set up by a bunch
of students that collectively called themselves SIPB (the Student Information
Processing Board). Their pages provided a central starting place for exploring
MIT's web sites, providing helpful information for "surfers" who were
still confused by the whole concept.(3)
Bow down and give thanks to Archie
The grandfather of all search engines was Archie, created in 1990 by Alan
Emtage, a student at McGill University in Montreal. The author originally
wanted to call the program "archives," but had to shorten it to comply with
the Unix world standard of assigning programs and files short, cryptic names
such as grep, cat, troff, sed, awk, perl, and so on. For more information
on where Archie is today, see:
http://www.bunyip.com/products/archie/
At the early date of 1990, there was no World Wide Web. Around this
time, Tim Berners-Lee probably had a bad dream in which a scary monster
with "HTTP" etched into its hide slowly ate up all of the Earth's resources.
Nonetheless, there was still an Internet, and many files were scattered
all over the vast network.
The primary method of storing and retrieving files was via the File
Transfer Protocol (FTP). This was (and still is) a system that specified
a common way for computers to exchange files over the Internet. It works
like this: Some administrator decides that he wants to make files available
from his computer. He sets up a program on his computer, called an FTP
server. When someone on the Internet wants to retrieve a file from this
computer, he or she connects to it via another program called an FTP client.
Any FTP client program can connect with any FTP server program as long
as the client and server programs both fully follow the specifications
set forth in the FTP protocol.
Initially, anyone who wanted to share a file had to set up an FTP server
in order to make the file available to others. Later, "anonymous" FTP
sites became repositories for files, allowing all users to post and retrieve
them.
Even with archive sites, many important files were still scattered on
small FTP servers. Unfortunately, these files could be located only by
the Internet equivalent of word of mouth: Somebody would post an e-mail
to a message list or a discussion forum announcing the availability of
a file.
Archie changed all that. It combined a script-based data gatherer, which
fetched site listings of anonymous FTP files, with a regular expression
matcher for retrieving file names matching a user query. (4)
In other words, Archie's gatherer scoured FTP sites across the Internet
and indexed all of the files it found. Its regular expression matcher
provided users with access to its database.
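To make the gatherer-plus-matcher idea concrete, here is a minimal sketch in Python. It is not Archie's actual code: the host names, the file listings, and the functions build_index and search are all invented for illustration, and the listings are hard-coded rather than fetched from real FTP sites.

```python
import re

# Hypothetical listings, as a gatherer might have collected them:
# host name -> file paths found on that anonymous FTP site.
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/games/rogue.tar.Z"],
    "archive.example.org": ["/mirrors/gnu/emacs-19.22.tar.gz", "/docs/README"],
}

def build_index(listings):
    """Flatten the per-site listings into (file name, host, path) records."""
    index = []
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # keep only the file name
            index.append((name, host, path))
    return index

def search(index, pattern):
    """Return (host, path) for every file whose name matches the regex."""
    regex = re.compile(pattern)
    return sorted((host, path) for name, host, path in index if regex.search(name))

index = build_index(LISTINGS)
hits = search(index, r"^emacs-.*\.tar")
for host, path in hits:
    print(host, path)
```

Note that the matcher only ever sees file names. If a file's name gave no hint of its contents, no regular expression was going to find it.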
Veronica and Jughead - but where is Betty?
Gopher is like FTP, but for documents instead of files. Gopher servers contain
plain-text documents (no images, no hypertext) that can be retrieved. Archie's
popularity had grown such that in 1993, the University of Nevada System
Computing Services group developed Veronica(5) (the grandmother
of search engines). It was created as a type of searching device similar
to Archie but for Gopher files. Another Gopher search service, called Jughead,
appeared a little later, probably for the sole purpose of rounding out the
comic-strip triumvirate. Jughead is an acronym for Jonzy's Universal Gopher
Hierarchy Excavation and Display, although, like Veronica, it is probably
safe to assume that the creator backed into the acronym. Jughead's functionality
was pretty much identical to Veronica's, although it appears to be a little
rougher around the edges.
The lone Wanderer
If Archie was the grandfather of search tools and Veronica the grandmother,
their child, and thus the mother of all search engines, was Matthew Gray's
World Wide Web Wanderer. The Wanderer was the first robot on the web and
was designed to track the web's growth. Initially, the Wanderer counted
only Web servers, but shortly after its introduction, it started to capture
URLs as it went along. The database of captured URLs became the Wandex,
the first web database.
Matthew Gray's Wanderer created quite a controversy at the time, partially
because early versions of the software ran rampant through the Net and
caused a noticeable netwide performance degradation. This degradation
occurred because the Wanderer would access the same page hundreds of times
a day. The Wanderer soon amended its ways, but the controversy over whether
robots were good or bad for the Internet remained.
What's a Robot got to do with the Internet?
The term robot has
special significance to programmers. Their version of the term is
mostly unrelated to the metallic lumbering creatures of Asimov lore.
A synonym for robot, "automaton," is actually more enlightening. Computer
robots are programs that automatically perform a repetitive task at
speeds that would be impossible for humans to match, just like the
tasks today's robots perform in factories.
On the Internet, the term robot or bot has become
a bit broader. For the most part, it refers to programs that explore
the Internet for some sort of information. Web robots search the
Internet for web pages, usually for the purpose of compiling a large,
searchable database. This category of robot is often called a spider.
The spider robot falls right into the standard definition of performing
a repetitive task.
Other types of robots on the Internet push the interpretation
of the automated task definition. The chatterbot variety
is a perfect example. These robots are designed to communicate with
humans about some topic in a human-like manner. Some of them are
fairly convincing; others are obviously quickly written computer
programs. Chatterbots are sometimes used as an intuitive way to
communicate certain basic information to users. An example is the
milk robot, which can answer lots of questions about milk.
One could force this type of program into the definition above by
saying that it performs the repetitive task of communicating with
clueless people.
The ALIWEB Strikes Back!
In response to the Wanderer, Martijn Koster created Archie-Like Indexing
of the Web, or ALIWEB, in October 1993. As the name implies, ALIWEB was
the HTTP equivalent of Archie, and because of this, it is still unique in
many ways. ALIWEB does not have a web-searching robot. Instead, webmasters
of participating sites post their own index information for each page they
want listed. The advantage to this method is that users get to describe
their own site, and a robot doesn't run about eating up Net bandwidth.
Unfortunately, the disadvantages of ALIWEB are more of a problem today.
The primary disadvantage is that a special indexing file must be submitted.
Most users do not understand how to create such a file, and therefore
they don't submit their pages. This leads to a relatively small database,
which means that users are less likely to search ALIWEB than one of the
large bot-based sites. This Catch-22 has been somewhat offset by incorporating
other databases into the ALIWEB search, but it still does not have the
mass appeal of search engines such as Yahoo! or Lycos.
Invasion of the Spiders!
As the web grew, it became more and more difficult to sort through all of
the new web pages added each day. Matthew Gray’s Wanderer inspired a number
of programmers to follow up on the idea of web robots, or spiders, as they
are now called. These programs systematically scour the web for pages by
exploring all of the links on a starter site, which is a page that contains
many links to other pages. The concept was that by definition, every page
on the web must be linked to another page. By searching through a large
number of pages and following all of the links, a user will discover new
pages that have their own collection of links. The hope is that most of
the web can be explored through the continuous repetition of this process.
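The link-following process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" here is a made-up dictionary of page names rather than real URLs fetched over HTTP, and crawl is a hypothetical function, not the code of any spider mentioned in this chapter.

```python
from collections import deque

# A made-up, in-memory "web": page -> pages it links to.
LINKS = {
    "starter": ["a", "b"],
    "a": ["b", "c"],
    "b": ["starter"],
    "c": [],
    "island": ["a"],  # no page links here, so a spider never finds it
}

def crawl(start, links):
    """Breadth-first traversal; returns pages in the order discovered."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # visit each page only once
                seen.add(target)
                queue.append(target)
    return order

found = crawl("starter", LINKS)
print(found)  # ['starter', 'a', 'b', 'c']
```

Two things are worth noticing. The page called "island" is never discovered, because nothing reachable from the starter page links to it. And the seen set is all that stands between a polite spider and one that fetches the same page over and over.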
This process caused a great deal of controversy because some poorly
written spiders were creating huge loads on the network by repeatedly
accessing the same series of pages. Most network administrators thought
they were a bad thing, so naturally programmers created even more of them.
By December 1993, the web had a case of the creepy crawlies. Three search
engines powered by robots had made their debut: JumpStation, the World
Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider.
JumpStation’s web bot gathered information about the title and header
from Web pages and used a very simple search and retrieval system for
its web interface. The system searched a database linearly, matching keywords
as it went. Needless to say, as the web grew larger, JumpStation became
slower and slower, finally grinding to a halt.
The WWW Worm indexed only the titles and URLs of the pages it visited.
It used regular expressions to search the index. Results from JumpStation
and the Worm came out in the order that the search found them, meaning
that the order of the results was completely irrelevant. The RBSE spider
was the first to improve on this process by implementing a ranking system
based on relevance to the keyword string.(5)
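The difference between "in the order found" and "in order of relevance" is easy to show with a toy example. The sketch below is an assumption-laden illustration: the pages are invented, and the scoring rule (count how often the query words appear in the title) is a stand-in chosen for simplicity, not the ranking function any of these engines actually used.

```python
# Hypothetical index entries in the order a spider found them: (URL, title).
PAGES = [
    ("http://example.org/a", "Surf report"),
    ("http://example.org/b", "Surf shop: surf boards and surf wax"),
    ("http://example.org/c", "Gopher menus"),
    ("http://example.org/d", "Board games"),
]

def words(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    return [w.strip(".,:;!?") for w in text.lower().split()]

def linear_search(pages, query):
    """Scan every entry; report matches in discovery order."""
    terms = words(query)
    return [url for url, title in pages if any(t in words(title) for t in terms)]

def ranked_search(pages, query):
    """Same scan, but order matches by how often the query terms occur."""
    terms = words(query)
    scored = []
    for url, title in pages:
        score = sum(words(title).count(t) for t in terms)
        if score:
            scored.append((score, url))
    # Highest score first; sorted() is stable, so ties keep discovery order.
    return [url for score, url in sorted(scored, key=lambda s: -s[0])]

print(linear_search(PAGES, "surf boards"))
print(ranked_search(PAGES, "surf boards"))
```

Both functions still walk the whole list for every query, which is exactly why a linear search slows down as the database grows. Only the second one puts the surf shop, which mentions both query words, ahead of the surf report.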
The Easily Excitable Spider
The popular public search engine, Excite, has roots that extend rather far
back in the history of the web. Initially, the project was called Architext;
it was started by six Stanford undergraduates in February 1993. Their idea
was to use statistical analysis of word relationships in order to provide
more efficient searches through the large amount of information on the Internet.
Their project was fully funded by mid-1993. Once funding was secured,
they released a version of their search software for webmasters to use
on their own web sites. At the time, the software was called Architext,
but it now goes by the name of Excite for Web Servers.
Billions and billions of categorized links...
Unfortunately, these spiders all lacked the intelligence to understand what
it was that they were indexing. Therefore, if you didn’t specifically know
what it was that you were looking for, it was unlikely that you’d find it.
This deficiency prompted the creation of EINet Galaxy, now known as the Tradewave
Galaxy, which is the oldest browsable/searchable web directory. Because
it is a directory, Galaxy links are organized into hierarchical categories.
For example, a top-level category might be called "Computers." Within the
Computers category there might be subcategories for "IBM," "Sun Microsystems,"
"Digital Equipment Corporation," and so on. Within each of these subcategories
would be further subcategories, although these would be more or less consistent
across the various machine types. As an example, all of the computer company
categories might contain the subcategories of "Hardware" and "Software."
This method of organization allows users to more effectively explore the
contents of the database by narrowing the field of interest.
The Galaxy went online in January 1994. It contained Gopher and Telnet
search features in addition to the web-searching features. Interestingly
enough, Gopher was vastly popular as a document-sharing tool when the
web was born. The Gopher search capability was probably the primary reason
for the creation of the EINet Galaxy. (There weren’t really very many
web pages to search through in January 1994!) The web page search capability
was simply an additional feature.
To this day, Tradewave (www.tradewave.com) still clings to its
directory-based roots; it uses no bots or spiders to seek out new URLs.
Therefore, the Galaxy is a true directory in the sense that it lists only
URLs that have been submitted to it, and all categorization and review
of the submitted URLs is done by hand. This results in higher-quality
pages and more relevant searches, but far fewer pages to search through.
Yahoo! and a Yippity tai-yai-yay!
At this stage in the game, people were creating pages of links to their
favorite documents. In April 1994, two Stanford University Ph.D. candidates,
David Filo and Jerry Yang, created some pages that became rather popular.
They called the collection of pages Yahoo! Their official explanation for
the name choice was that they considered themselves to be a pair of yahoos.
As the number of links grew and their pages began to receive thousands
of hits a day, the team created ways to better organize the data. In order
to aid in data retrieval, Yahoo! (www.yahoo.com) became a searchable directory.
The search feature was a simple database search engine. Because Yahoo!
entries were entered and categorized manually, Yahoo! was not really classified
as a search engine. Instead, it was generally considered to be a searchable
directory. Yahoo! has since automated some aspects of the gathering and
classification process, blurring the distinction between engine and directory.
The Wanderer captured only URLs, which made it difficult to find things
that weren’t explicitly described by their URL. Because URLs are rather
cryptic to begin with, this didn’t help the average user. Searching Yahoo!
or the Galaxy was much more effective because they contained additional
descriptive information about the indexed sites.
Brian's WebCrawler: Some Spider!
As bots got better and better, one rose above the pack with its unique
ability to index the entire text of a web page. Other bots were storing
the title and the URL, and the first 100 or so words of a document, but
it was WebCrawler that first allowed the user to search the full text of
entire documents.
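To see why full-text indexing matters, compare a title-only index with a full-text one over the same documents. This is a sketch under assumptions: the documents are invented, and the inverted index (a map from each word to the pages containing it) is a common textbook structure picked for illustration; nothing here says how WebCrawler actually stored its data.

```python
from collections import defaultdict

# Hypothetical documents: URL -> (title, body text).
DOCS = {
    "http://example.org/1": ("Home page", "Notes on wolf spiders and their prey"),
    "http://example.org/2": ("Spiders", "A list of web robots"),
}

def build_index(docs, full_text):
    """Map each word to the set of URLs whose indexed text contains it."""
    index = defaultdict(set)
    for url, (title, body) in docs.items():
        text = title + " " + body if full_text else title
        for word in text.lower().split():
            index[word].add(url)
    return index

title_only = build_index(DOCS, full_text=False)
everything = build_index(DOCS, full_text=True)

print(sorted(title_only.get("prey", set())))  # []
print(sorted(everything.get("prey", set())))  # ['http://example.org/1']
```

A search for "prey" finds nothing in the title-only index, because the word appears only in the body of a page titled "Home page." The full-text index finds it, at the cost of storing every word of every document.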
The history of WebCrawler is best told by those responsible:
"In early 1994, students and faculty in the Department
of Computer Science and Engineering [of the University of Washington]
gathered in an informal seminar to discuss the early popularity of
the Internet and the World-Wide Web. Students typically try out their
ideas in small projects in these seminars, and several interesting
projects were started. The WebCrawler was Brian Pinkerton's project,
and began as a small single-user application to find information on
the Web.
Fellow students persuaded Pinkerton to build the Web interface
to the WebCrawler that became widely usable. In that first release
on April 20, 1994, the WebCrawler's database contained documents
from just over 6000 different servers on the Web. The WebCrawler
quickly became an Internet favorite, receiving an average of 15,000
queries per day in October, 1994 when Pinkerton delivered a paper
describing the WebCrawler."
Eventually, the demand for WebCrawler devastated the network resources at
the University of Washington. Although a number of companies invested in
server equipment to ease the load on the WebCrawler servers, there was no
solution to the bandwidth issue. At one point, the service became entirely
unusable during the daytime hours. Finally, America Online (AOL) saved the
day by purchasing the WebCrawler system and running it on its own network.
In 1997, Excite bought out WebCrawler, and now AOL is using an Excite derivative
as the engine behind its own NetFind.
The most important point about WebCrawler is that it was the first full-text
search engine on the Internet. Until its debut, a user could search through
only URLs or descriptions. The descriptions were sometimes created by
the engines themselves or reviewers trying to rate the sites.
A final word about WebCrawler from the company itself: "Several competitors
emerged within a year of WebCrawler’s debut: Lycos, Infoseek, and OpenText.
They all improved on WebCrawler’s basic functionality, though they did
nothing revolutionary. WebCrawler’s early success made their entry into
the market easier, and legitimized businesses that today constitute a
small industry in Web resource discovery."(www.webcrawler.com)
Mellon-Mania: The Birth of Lycos
Lycos was indeed the next big kid on the block, bursting out of the labs
at Carnegie Mellon University in July 1994. The person responsible
for unleashing this force onto the world is Michael Mauldin. He is currently
on leave from CMU, acting as Chief Scientist at Lycos, Inc. In a paper describing
design decisions made while programming Lycos, he gives a very nice history
of the service.
"Work on the Lycos spider began in May 1994, using John
Leavitt's LongLegs program as a starting point. (Lycos was named for
the wolf spider, Lycosidae lycosa, which catches its prey by pursuit,
rather than in a web.) In July 1994, I added the Pursuit retrieval
engine to allow user searching of the Lycos catalog (although Pursuit
was written from scratch for the Lycos project, it was based on experience
gained from the ARPA Tipster Text Program in dealing with retrieval
and text processing in very large text databases (9) ). On July 20,
1994, Lycos went public with a catalog of 54,000 documents. In addition
to providing ranked relevance retrieval, Lycos provided prefix matching
and word proximity bonuses. But Lycos' main difference was the sheer
size of its catalog: by August 1994, Lycos had identified 394,000
documents; by January 1995, the catalog had reached 1.5 million documents;
and by November 1996, Lycos had indexed over 60 million documents
-- more than any other Web search engine. In October 1994, Lycos ranked
first on Netscape's list of search engines by finding the most hits
on the word ‘surf.’"(6)
Hide and Seek
Representatives of Infoseek, another major search engine, say that they
founded their corporation in January 1994. Although this may be true, the
search engine itself was not accessible until much later that year.
Initially, Infoseek was just another search engine. It borrowed conceptually
from Yahoo! and Lycos, not really innovating in any particular way. Yet
the history of Infoseek and its current critical acclaim show that being
the first or most original isn’t always that important. Infoseek’s user-friendly
interface and the numerous additional services (such as UPS tracking,
News, a directory, and the like) have garnered kudos, but it was Infoseek’s
strategic deal with Netscape in December 1995 that brought it to the forefront
of the search engine line. Infoseek convinced Netscape (with the help
of quite a bit of cash) to have its engine pop up as the default when
people hit the Net Search button on the Netscape browser. Prior to this,
Yahoo! was Netscape’s default search service.
Return of the DEC
Digital Equipment Corporation’s (DEC) AltaVista was a latecomer to the scene;
it had its online debut in December 1995. Nonetheless, it had a number of
innovative features that quickly catapulted it to the top. The least of
the features was its speed. Run on a bunch of DEC Alphas, it had the horsepower
to handle millions of hits per day without slowing down in the slightest.
The rest of its features, all available from introduction, changed the
face of search engines forever. AltaVista was the first to use natural
language queries, meaning a user could type in a sentence like "What is
the weather like in Tokyo?" and not get a million pages containing the
word "What." Additionally, it was the first to implement advanced searching
techniques, such as the use of Boolean operators (AND, OR, NOT, etc.).
Furthermore, a user could search newsgroup articles and retrieve them
via the web as well as specifically search for text in image names, titles,
Java applets, and ActiveX objects. Additionally, AltaVista claims to be
the first search engine to allow users to add to and delete their own
URLs from the index, placing them online within 24 hours.
One of the most interesting new features AltaVista provided was the
ability to search for all of the sites that link to a particular URL.
This was very useful for web designers who were trying to get some popularity
for their pages; they could frequently check to see how many other pages
were referencing them.
On the user interface end, AltaVista made a number of innovations. It
put "tips" below the search field to help the user better formulate a
search. These tips constantly change, so that after using the search for
a few times, users see a number of interesting features that they possibly
did not know about. This system became widely adopted by the other search
engines.
In 1997, AltaVista created LiveTopics, a graphical representation system
to help users sort through the thousands of results that a typical AltaVista
search generates. LiveTopics is interesting as a search tool, but conceptually
it is more confusing than the standard search format. Although its innovative
qualities are uncontested, its effectiveness remains to be seen (altavista.software.digital.com/search/showcase/two/index.htm).
A Spider Named "Slurp!": The Powerful HotBot
On May 20, 1996, Inktomi Corporation was formed, and HotBot was unleashed
upon the world. This is the youngest of all of the major search services,
but even at its young age, it has already caused quite a stir in the online
community. According to the company: "Pronounced ‘ink-to-me’, the company
name is derived from a mythological spider of the Plains Indians known for
bringing culture to the people. Inktomi was founded in January 1996 by Eric
Brewer, an assistant professor of computer science at the University of
California at Berkeley, and Paul Gauthier, a graduate student in the computer
science Ph.D. program, with a desire to commercialize the highly-effective
technologies developed during their research. (www.inktomi.com/press/icf-pr.html)"
The Inktomi search engine was quickly licensed to Wired magazine’s web
site, HotWired. This site’s popularity accounted for much of the initial
fervor over HotBot. Wired’s reputation as the oracle of the Net made promoting
the site fairly straightforward.
So what’s the big deal? Just another search engine? Well, yes and no.
HotBot is probably the most powerful of the search engines, with a spider
that can supposedly index 10 million pages per day. According to the Wired
web site, HotBot should soon be able to reindex its entire database on
a daily basis. This will ensure that the pages returned from a search
are not out of date, which is now common with other search engines.
Additionally, HotBot makes extensive use of cookie technology to store
personal search preference information. A cookie is a small file that
a site can store on your computer. This file can be read only by the site
that generates it. It can hold a small amount of text or binary information.
This information is often used by sites to store customization information
or to store user demographic data.
HotBot recently won the PC Computing Search Engine Challenge, a contest
between the major search engines. Representatives from each company were
asked questions that could be answered only by a web search. The engine
that most effectively led the representative to the right answer won the
question. Although this challenge proved very little more than the searching
abilities of the various representatives, it still garnered quite a bit
of critical acclaim for HotBot, further increasing its popularity.
Information Overload: METAbolic Shutdown
What the PC Computing Challenge did show was that different engines pull
up completely different sets of materials for similar searches. This makes
it extremely frustrating to find what you want on the web, because a query
that has little effect using one engine may turn up a gold mine of information
on another. Additionally, the little differences between the engines, especially
regarding the support of Boolean operators, have a large impact on the type
of query format that works most effectively.
The current solution to this problem is the META engine. META engines
forward search queries to all of the major web engines at once. The first
of these engines was MetaCrawler. MetaCrawler searches Lycos, AltaVista,
Yahoo!, Excite, WebCrawler, and Infoseek simultaneously.
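The basic META engine idea fits in a short Python sketch. Everything in it is an assumption made for illustration: engine_a and engine_b are stubs that return canned results instead of contacting real search services, and the merge rule (pages reported by more engines come first) is one plausible choice, not MetaCrawler's documented behavior.

```python
# Stand-ins for real engines: each takes a query and returns a ranked URL list.
def engine_a(query):
    return ["http://example.org/x", "http://example.org/y"]

def engine_b(query):
    return ["http://example.org/y", "http://example.org/z"]

def meta_search(query, engines):
    """Forward the query to every engine and merge the answers into one list."""
    merged = {}  # URL -> names of the engines that returned it
    for name, engine in engines.items():
        for url in engine(query):
            merged.setdefault(url, []).append(name)
    # URLs reported by more engines come first; ties keep first-seen order.
    return sorted(merged.items(), key=lambda item: -len(item[1]))

results = meta_search("surf", {"A": engine_a, "B": engine_b})
for url, sources in results:
    print(url, sources)
```

This sketch calls the engines one after another. A real META engine would send the requests concurrently, since otherwise the user waits for the sum of every engine's response time.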
MetaCrawler was developed in 1995 by Eric Selberg, a master's student
at the University of Washington (the same place where WebCrawler was developed
a few years earlier). Like WebCrawler, MetaCrawler soon grew too large
for its university britches and had to be moved to another site. Here,
Eric tells the story of how MetaCrawler became the go2net search engine:
MetaCrawler was conceived in spring of 1995 by myself
and my advisor, Oren Etzioni, as my master's degree project. It grew
rapidly in popularity once we released it publicly, gaining many new
users after Forbes mentioned us in a cover-page article. Use jumped
after C|Net reviewed all the major search services, ranking us No.
1, with AltaVista No. 2 and Yahoo No. 3...
In May of 1996, I (along with most of the rest of the AI department
at UW) created NETbot. ...When I left NETbot to return to research
at UW… MetaCrawler was now under 7 × 24 monitoring service, the
code was as reliable as ever, and we had made several performance
improvements. ...
There was a realization that Netbot was ill-equipped to handle
negotiations with the search services for continued MetaCrawler
use. Thus, the decision was made to license MetaCrawler to go2net,
who could provide the resources necessary to make MetaCrawler viable
as well as negotiate with the search services toward mutually beneficial
arrangements. (www.metacrawler.com/selberg-history.html)
MetaCrawler functions by reformatting the output of the various search
engines that it queries onto one concise page. Throughout
MetaCrawler’s history, the search engine companies that it worked with
did not entirely approve of this procedure. The most common complaint
was that the advertising banners that the search engines had on their
sites were not appearing when a user employed MetaCrawler. This meant
that their ads were not reaching the intended audience, reducing their
ad revenues.
The move to go2net heralded MetaCrawler’s concession to these concerns.
Now MetaCrawler displays the ads from each search site right above the
results. MetaCrawler users were not thrilled by this change because it
increased the time it took for the result page to download. However, skillful
design of the result pages now causes the text to load first, calming
the restless native users.
Are You Savvy Enough to Search with Me?
Colorado State University also has a tool called Savvy Search that searches
up to 20 engines at once, including a number of topic-specific directories
such as Four11 (e-mail addresses), FTPSearch95 (files on the Net), and DejaNews
(UseNet database). It’s faster but less reliable than MetaCrawler. SavvySearch’s
solution to the problem of differing types of search engine query formats
is to ignore them all. Users should not try to enter complex search strings
into SavvySearch. MetaCrawler at least tries to tackle this problem by creating
its own search syntax (using + to indicate AND, - to indicate AND NOT) and
by converting this syntax into the equivalent command for each engine. However,
neither MetaCrawler nor SavvySearch lets you tap the full power of the advanced
search syntaxes offered by most engines.
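The translation step can be sketched as follows. Only the + and - markers come from the description above; the function names, the two target formats, and the decision to treat unmarked words like + words are assumptions made for this example, not MetaCrawler's real conversion rules.

```python
def parse(query):
    """Split a '+word -word' query into included and excluded terms.

    Assumption: a word with no marker is treated like a '+' word.
    """
    included, excluded = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            word = token.lstrip("+")
            if word:
                included.append(word)
    return included, excluded

def to_boolean(query):
    """Render for a hypothetical engine that wants AND / AND NOT keywords.

    Assumes the query has at least one included term.
    """
    included, excluded = parse(query)
    expr = " AND ".join(included)
    for term in excluded:
        expr += f" AND NOT {term}"
    return expr

def to_plain(query):
    """Render for a hypothetical engine with no operators at all."""
    included, excluded = parse(query)
    return " ".join(included)  # exclusions cannot be expressed, so they are dropped

print(to_boolean("+spider -wolf web"))  # spider AND web AND NOT wolf
print(to_plain("+spider -wolf web"))    # spider web
```

The second function shows the catch: when a target engine has no way to say "AND NOT," the exclusion is silently lost, and the merged results are only as precise as the least capable engine allows.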
One Click, DoubleClick, Red Click, Blue Click
We’ve already briefly touched upon the relationship between advertisers
and search engines, but the area is now of such importance that it deserves
its own section. It wasn’t long after the advent of search engines—especially
when Yahoo! made its much publicized move from the servers at Stanford to
those at Netscape—before advertisers noticed that search engine sites were
receiving orders of magnitude more hits than any other
type of site on the web. Receiving daily hits in the millions, search engines
seemed like advertising gold mines. This realization prompted the creation
of many of the other current search engines.
"Intra"-ducing...
Netscape, severely shaken and battered by Microsoft’s free release of a
competing web browser (Internet Explorer), decided to concentrate on the
new phenomenon of the intranet. Corporations wanted to use web technology
to facilitate document sharing within their own corporate networks. These
corporations also wanted to be able to hide these documents from the rest of
the web, yet provide their employees with the same search capabilities offered
on the web. Search engine companies, which had initially relied on advertising
for revenue, now had a market for the product itself. Although
there were a number of freely available search engines, corporations such
as Digital Equipment and Infoseek capitalized on the lack of programmers
who understood web administration and priced technical support and service
into their commercial search engine packages.
Soon, another reason for having a "private" search engine became apparent.
Unlike most other media, a web page is constantly updated, and new pages
are added to and removed from sites every day. None of the major web-based
search engines could search the entire web on a daily basis. Therefore,
the search databases would often contain out-of-date references or would
miss entire sections of web sites. The larger sites began indexing their
own sites and providing search engines that would primarily search through
their own materials. Some allowed the user to search the rest of the web
as well by linking the engine into one of the larger web databases such
as AltaVista.
Many relatively small sites are now providing search engines for their
own sites. This is because search engines are becoming easier and easier
to use and incorporate within a web site, and because the rapid growth
of the web has led to an incredible amount of "junk" in the form of out-of-date
pages, pages with misleading descriptions, pages deliberately designed
to confuse search engines, and so on. Additionally, it is often difficult
to know what to search for, and many users have a hard time expressing
what it is they wish to find in a language that search engines can effectively
understand. Using a site-specific search engine narrows the possibilities
enough that a poorly formulated search may still return the intended result.
Summary
Now that we’ve finished our search engine history lesson, you should
be somewhat familiar with a number of the key players in the search engine
area. Additionally, you should be starting to get a feeling for some of
the issues that search engines face.
The next chapter takes a closer look at some of the engines mentioned
here as well as a few others. You’ll learn how users interact with each
engine. Ultimately, you’ll understand the strengths and limitations of
today’s search techniques and what users have come to expect from a search
engine. This knowledge is extremely important when choosing a search engine
for your own web site. It will help you determine if a particular engine
can handle the task you need it to accomplish. You’ll also be able to
better understand how your users will interact with the engine you choose.
References
1. Hotomi sounds better than HOinkTomi, don’t you think?
2. This fact did not thrill MIT network administrators
when the web became popular a year later. Although they made an attempt
to wrestle the URL away from SIPB, the students prevailed, and to this
day MIT’s own homepage is located at http://web.mit.edu. There is an interesting
allegory relating to this at the bottom of SIPB’s main page at http://www.mit.edu
for those that are curious.
3. such as the document, "Inessential Refrigerator
Restocking," which is still available at: http://www.mit.edu:8001/sipb/documents/
4. Michael Mauldin, "Lycos: Design choices in an Internet
search service," 1997
5. The name Veronica officially expands to Very Easy
Rodent-Oriented Netwide Index to Computerized Archives -- somehow I think
they worked the expansion out afterwards, but you decide.
6. Michael Mauldin, "Lycos: Design choices in an Internet
search service," 1997
Copyright © 1997 Wes
Sonnenreich. All Rights Reserved.