A History Of Search Engines

For about a decade, finding things online was a different kind of activity. There was no single box you typed into. There was no algorithm that quietly understood what you meant. There was a crowded marketplace of search engines, web directories, and cartoon mascots, one of them a butler, and you had to know which one to use for which kind of question. Then in 1998 two grad students published a paper about ranking pages by their backlinks, and within five years the whole landscape was gone.

This article is a tour of that lost web. It walks through the search tools that came before Google, how they actually worked, what the broader internet around them felt like, and how a single algorithm rolled over almost all of them. If you ever had a Yahoo email address, a GeoCities homepage, or asked Jeeves a question, this is your era.

The Internet Before the Web

To understand early search, you have to remember that the internet existed before the World Wide Web did. By the late 1980s, there was already a global network of universities and research labs, and there were already mountains of files scattered across it. They just lived on FTP servers, not websites.

If you wanted a file, you had to know which server it was on and connect to that server directly. This was tedious. You spent hours typing FTP commands to browse directories one at a time, hoping the file you wanted was in there somewhere. The community swapped lists of known servers on bulletin boards.

This was the problem that produced the first search tool, and it happened a year before the Web existed.

Archie (1990): The First Search Engine

In September 1990, a computer science student at McGill University in Montreal named Alan Emtage built a program to crawl public FTP servers and index their file listings. He called it Archie, which is "archive" without the "v." (Emtage has said the name had nothing to do with Archie Comics, but the association stuck and inspired the names of the tools that followed.)

Archie did one thing well. It periodically logged in to FTP sites, downloaded their directory listings, and let you search the combined index by filename. It did not look at the contents of files, only their names and locations. By modern standards that is barely a search engine at all. By 1990 standards it was magic. Universities all over the world started running their own Archie servers, and the proprietary software was eventually released so others could spin up regional copies.

Archie ran for years. A single Archie server is still maintained for posterity at the University of Warsaw, more than thirty years after Emtage built the first one.

Archie's Query Form from 1990

Veronica and Jughead (1992 to 1993): The Gopher Years

In 1991, the University of Minnesota released a protocol called Gopher. Gopher organized the internet's documents into a hierarchy of menus that you navigated by selecting numbered items. It was text-only and looked like nothing, but for a brief moment it was the dominant way to read documents over the internet. Tim Berners-Lee's World Wide Web was launched the same year but took a few years to overtake Gopher.

Gopher needed its own search tools, and following the precedent Archie set, they got cartoon names. Veronica, launched in November 1992 at the University of Nevada, Reno, indexed Gopher menu titles across the entire Gopher-space. Its full name was Very Easy Rodent-Oriented Net-wide Index to Computer Archives, which is the kind of name only an early-90s graduate student could love. Jughead, developed by Rhett Jones at the University of Utah in 1993, did the same job at a smaller, more localized scale. Jughead stood for Jonzy's Universal Gopher Hierarchy Excavation and Display.

Neither tool lasted, because Gopher itself did not last. By 1994 the World Wide Web had eaten Gopher's audience. But Archie, Veronica, and Jughead established the basic pattern: a piece of software crawls the network, builds an index, and gives users a search box. Every search engine since has been a variation on that idea.

The Web Arrives and Search Explodes (1993 to 1995)

The Web's invitation to anyone, anywhere, to publish a page produced an explosion of content and an immediate, urgent need for a way to find any of it.

W3Catalog appeared in September 1993, developed by Oscar Nierstrasz at the University of Geneva. It was arguably the first true web search engine, although it worked by aggregating existing manually-curated link lists rather than crawling pages itself.

ALIWEB, announced by Martijn Koster in November 1993, took a different approach. Site owners would submit a description of their site, and ALIWEB would index those descriptions. No crawler. The catch was that almost nobody bothered to submit, so its coverage was thin.

JumpStation, the World Wide Web Worm, and the RBSE spider all appeared in late 1993 and were the first true crawler-based search engines for the Web, indexing pages automatically by following links. They were primitive: they typically indexed only page titles or URLs, not the full text of pages, and they ranked results by simple measures like keyword frequency.

The real expansion came in 1994 and 1995, which is when most people first remember encountering search.

The Big Names of the Mid-90s

By the time most ordinary people got online, usually through a dial-up account on AOL, CompuServe, or Prodigy, the search market had filled with competitors. Each one tried to solve the problem a slightly different way.

Yahoo (1994): The Directory

Yahoo is the brand that most defines pre-Google search, partly because it survived. It was started in early 1994 by two Stanford PhD students, Jerry Yang and David Filo, who built a hand-curated list of links called "Jerry and David's Guide to the World Wide Web." They renamed it Yahoo in March 1994 and incorporated as a company in 1995.

The crucial thing about Yahoo in its first years is that it was not a search engine in the modern sense. It was a directory, a hierarchical catalog of websites organized by category, the way a library organizes books. You browsed it like a tree: Arts > Movies > Genres > Horror. Real humans, called "surfers," reviewed each submitted site and decided where it belonged. The Yahoo Directory was the front page of the internet for millions of people through the late 90s.

Yahoo's directory homepage from the mid-90s

Yahoo eventually added search functionality on top, but its core asset was the curated taxonomy. The company became one of the most valuable tech firms of the 90s on the strength of that model.

Lycos (1994): The Crawler

Founded at Carnegie Mellon University by Michael Mauldin in 1994 and named after Lycosidae, the scientific name of the wolf spider family (a hint at its method), Lycos was an early and aggressive crawler. By the end of its first year it had indexed over a million documents, an absurd number for the time. It went public in 1996, briefly became the most visited destination on the web in 1999, and was sold to the Spanish telecom Terra in May 2000 for $12.5 billion at the peak of the dot-com bubble. Terra sold it four years later to a South Korean company for around $100 million. The crash was that fast.

The Lycos Homepage in 1997

Excite (1995): The Statistician

Originally a Stanford research project called Architext, founded by six students in 1994 with $1.5 million in seed funding, Excite launched as a public search engine in 1995. Its angle was using statistical analysis of word relationships rather than just keyword matching to deliver better results. Excite went public in 1996 at a $177 million valuation, bought rivals Magellan and WebCrawler in 1996, was the first major portal to offer free email in summer 1997, and merged with @Home Networks in January 1999 in a $6.7 billion deal. Excite@Home filed for bankruptcy in October 2001. The remnants of the brand still exist.

The Excite Homepage July 1998

AltaVista (1995): The Power User's Engine

AltaVista deserves special attention because for a while it was the best search engine in the world. Launched in December 1995 by Digital Equipment Corporation, AltaVista was built to demonstrate the power of DEC's Alpha 64-bit processors. It used a fast crawler called Scooter and indexed the entire text of pages rather than just titles or summaries. It supported Boolean operators (AND, OR, NOT) for precise queries, natural language questions, image search, and multilingual results. It processed tens of millions of queries a day at its peak.

The AltaVista search interface, the technical favorite of the mid-90s

If you were a technical user in 1996 or 1997, AltaVista was probably your search engine. Then the corporate fortunes turned: Compaq acquired DEC in 1998 for $9.6 billion and tried to turn AltaVista into a Yahoo-style portal, which diluted what made it special. It was sold to CMGI in 1999 for $2.3 billion, the IPO was killed by the dot-com crash, and the once-supreme engine was sold to Overture in 2003 for $140 million. Yahoo, which bought Overture later that same year, finally shut AltaVista down in July 2013.

Infoseek (1995): The Subscription Experiment

Infoseek launched in 1995 with an unusual model: it tried to charge a subscription for search. That did not work. It pivoted to free, ad-supported search, became Disney's search engine through the Go.com portal, and quietly disappeared.

HotBot (1996): The Wired Engine

Launched in May 1996, HotBot was initially powered by Inktomi's search technology and was the official search engine of Wired magazine. Its homepage had a distinctly 90s look: bright colors, drop-down menus for advanced search filters, and an aggressively modern feel for the era. It briefly had the largest search index in the world.

Ask Jeeves (1996): The Butler

Of all the pre-Google engines, Ask Jeeves is the one that lives in cultural memory. Founded in 1996 by Garrett Gruener and David Warthen in Berkeley, California, Ask Jeeves promised something different: you could type a question in plain English, and Jeeves, an animated cartoon butler in tuxedo and tails, would deliver the answer.

Jeeves the butler, mascot of Ask Jeeves

The technology under the hood was a blend of conventional keyword search and a database of pre-written answers to common questions. The results were inconsistent, but the experience was friendly. Jeeves was modeled on the valet from P.G. Wodehouse's novels, and he became one of the most recognizable mascots of the late 90s internet. A 70-foot Jeeves balloon floated in Macy's Thanksgiving Day Parade. The company spent more than $100 million building the brand around him.

Ask Jeeves went public in 1999 and saw its stock surge nearly fivefold on its first trading day. By 2000 the warnings were already starting. The company was acquired by InterActiveCorp in July 2005 for $1.85 billion. In February 2006, IAC chairman Barry Diller announced that Jeeves was being retired. "Not that I don't like that fat butler," Diller said, but research showed users associated the butler with the inconsistent old technology rather than the improved engine. The site became Ask.com. The mascot moved to the corporate-mascot retirement home alongside the Pillsbury Doughboy and the Energizer Bunny.

Dogpile and MetaCrawler (1996): The Meta-Search Approach

Dogpile, launched in November 1996, took a different angle. Instead of running its own crawler, it sent your query to multiple other search engines and aggregated the results. The pitch was that no single engine indexed the whole web, so combining them gave better coverage. The site's early visual identity was loud, casual, almost Nickelodeon-grade clip art. MetaCrawler, started in 1994 at the University of Washington, was the first major meta-search engine and worked on the same principle. Both still exist in some form today, both now aggregating from Google, Yahoo, and Yandex.
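The aggregation step can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method Dogpile or MetaCrawler actually used: the engine names, the URLs, and the reciprocal-position scoring rule are all assumptions made up for the example.

```python
from collections import defaultdict

def merge_results(results_by_engine):
    """Combine ranked URL lists, favoring URLs that several engines rank highly."""
    scores = defaultdict(float)
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls, start=1):
            # Assumed scoring rule: first place earns 1 point, second 1/2, and so on.
            scores[url] += 1.0 / position
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from three underlying engines.
results_by_engine = {
    "engine_one": ["a.example", "b.example", "c.example"],
    "engine_two": ["b.example", "d.example"],
    "engine_three": ["b.example", "a.example"],
}
merged = merge_results(results_by_engine)
print(merged)
```

A page that only one engine knows about still shows up in the merged list, which is the coverage argument in miniature.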

Sherlock (1998): The Desktop Hybrid

This one is worth a special mention because it was different from the rest. Sherlock was not a website. It was a piece of software Apple introduced with Mac OS 8.5 in October 1998, named after Sherlock Holmes. It searched your local Mac and the web from the same interface, using plug-ins that talked to existing search engines like AltaVista, Excite, Infoseek, Lycos, and Yahoo. Sherlock 2, shipped with Mac OS 9 in 1999, added channels for shopping, news, stocks, and movie listings. Sherlock 3, in Mac OS X 10.2 in 2002, expanded those channels further.

Sherlock was forward-thinking, and it also produced a piece of tech vocabulary that survives today. When Apple's Sherlock 3 absorbed the functionality of a popular third-party app called Watson, the developer felt the company had copied his work. The verb "to be Sherlocked" entered the language to describe what happens when a platform owner builds a feature that wipes out a third-party app.

What the Web Around Them Looked Like

The search engines did not exist in a vacuum. They were the front door to a different kind of internet, and remembering that internet is half the point.

Dial-Up

You connected by plugging your computer into a phone line. The modem made a sequence of escalating chirps and screeches that anyone over thirty-five can still hum from memory. While you were online, the phone was busy, and your mother yelled at you to get off so she could make a call. A photograph took thirty seconds to load, top to bottom, line by line, and you watched it appear.


Kids explaining dial-up internet

Web Portals

Yahoo, Excite, Lycos, and AOL all converged on the same product idea by the late 90s: the portal. A portal was a homepage you set as your browser's default, and it bundled a search box with email, weather, stock quotes, news headlines, horoscopes, sports scores, and games. The pitch to users was convenience. The pitch to investors was eyeball time. Excite was the first major search property to offer free email in summer 1997, with Yahoo and Lycos following in October. AOL itself was the most extreme version of this idea: for millions of users, AOL was the internet, a walled garden of chat rooms, AIM, message boards, and curated channels, with the actual Web grafted on as an afterthought.

The AOL homepage in the 90s

GeoCities, Tripod, and the Personal Homepage

Before social media, regular people who wanted a presence on the web built a homepage. GeoCities, founded in 1994 by David Bohnett and John Rezner, was the biggest of the free homepage services. It organized user pages into themed "neighborhoods" with names like Hollywood (entertainment), SiliconValley (computing), Heartland (family), and Area51 (sci-fi). At its peak GeoCities had millions of pages.

A typical GeoCities homepage in 1997

These pages had a specific look. Tiled background patterns. Comic Sans. Animated GIFs of dancing skeletons, flaming dividers, and "under construction" signs with little workmen. Visitor counters at the bottom that the page owner refreshed obsessively. Guestbooks where strangers signed in to say "cool site!" MIDI files that started playing the moment the page loaded and could not be turned off. The Hampster Dance (1998 to 1999) was the canonical example of the form, an entire site that was just rows of identical animated hamsters with a looping calliope soundtrack, and it became one of the first viral memes on the web.

Yahoo bought GeoCities in January 1999 for $3.57 billion in stock and shut the US version down in 2009. The Archive Team rescued a huge portion of its content, and you can still wander those streets if you know where to look.

Webrings

Because search was unreliable, people invented a different way to find related sites: the webring. A webring was a set of websites about a single topic, linked into a circular chain. At the bottom of each member's page sat a navigation box with buttons: Previous, Next, Random, and a link to the ring's hub. You'd land on a fan page for Twin Peaks or Calvin and Hobbes or vintage typewriters, click Next, and end up on another page on the same theme. WebRing.org, the main coordinating service, was bought by GeoCities in 1998 and rolled into Yahoo a year later. Webrings were the social graph before the social graph existed.

The WebRing.org homepage

The Era's Other Texture

The Netscape Navigator browser launched in 1994 and dominated until Microsoft's Internet Explorer caught up around 1998. ICQ, AOL Instant Messenger, and MSN Messenger were how teenagers talked to each other after school. Email signatures had elaborate ASCII art. Sites posted "Best Viewed in 800x600 with Internet Explorer 4.0" badges. The phrase "information superhighway" was used unironically. Banner ads, usually for X10 cameras or "punch the monkey" games, blinked across the top of every page.

Netscape Navigator in 1995

How These Engines Actually Worked, and Why Their Results Were Bad

The pre-Google search engines all worked some variation of the same way. A crawler (sometimes called a spider or a bot) walked the web by following links from page to page, downloading the HTML it found. An indexer processed those downloads and built a giant inverted index, basically a dictionary mapping every word to the list of pages that contained it. When you searched, the engine looked up your terms in the index and ranked the matching pages by some heuristic.
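A toy version of that inverted index fits in a few lines of Python. This is a minimal sketch, not any particular engine's implementation: the page URLs and text are invented, and the tokenizer is a deliberately crude assumption.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages standing in for what a crawler downloaded.
pages = {
    "example.com/a": "Vintage typewriters and ribbon repair",
    "example.com/b": "Twin Peaks fan page with typewriters trivia",
    "example.com/c": "Calvin and Hobbes comic archive",
}
index = build_inverted_index(pages)
print(sorted(search(index, "typewriters")))
```

The lookup only answers which pages match. Deciding which match to show first is a separate step.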

The ranking heuristic is where it all fell apart.

Most engines ranked pages mostly by keyword frequency (how many times your search term appeared on the page) and meta tags (information the page author included in the HTML header to describe the page). Both were trivial to game. By the mid-90s a whole cottage industry of "search engine optimization" was stuffing pages with hidden keywords, often as white text on a white background so users could not see them but the indexer could. If you searched for "Madonna" in 1997, the top result might be a porn site that had repeated the word a thousand times in white-on-white text.
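To see how easy the gaming was, here is a rough Python sketch of ranking by raw keyword count. The two pages and the scoring function are hypothetical, and real engines of the era combined more signals than this, but the weakness is the same.

```python
import re

def keyword_frequency(text, term):
    """Count how many times the term appears as a whole word in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(term.lower())

def rank_by_frequency(pages, term):
    """Order matching pages by raw count of the search term, highest first."""
    scored = [(keyword_frequency(text, term), url) for url, text in pages.items()]
    matches = [(score, url) for score, url in scored if score > 0]
    return [url for score, url in sorted(matches, key=lambda pair: -pair[0])]

# Hypothetical pages: a genuine fan site and a page stuffed with a repeated keyword.
pages = {
    "fansite.example": "Madonna discography, tour dates, and Madonna news",
    "spam.example": "buy now " + "madonna " * 1000,  # stands in for hidden white-on-white text
}
print(rank_by_frequency(pages, "Madonna"))
```

The stuffed page wins with a count of 1000 against 2, and nothing in the score reflects whether the page is actually about the topic.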

The directories (Yahoo, the early DMOZ Open Directory Project) sidestepped this by using humans to vet entries. The cost was scale. Human editors could not possibly keep up with a web that was doubling in size every few months. By 1998, the directory model was visibly failing.

Everyone in the industry knew the problem. The race was on for a better ranking signal.

The Disruption: PageRank, BackRub, and the Rise of Google

In March 1995, two Stanford computer science students met during a campus visit for prospective PhD students: Larry Page, who was being shown around, and Sergey Brin, a current student who was showing him around. They started arguing immediately, and they became friends.

Page got interested in the structure of links on the web. His insight was that hyperlinks were a form of citation, and citation patterns had been used in academic publishing for decades to measure the importance of a paper. A page linked to by many other pages was likely important, and a page linked to by other important pages was likely very important. The math became recursive in a satisfying way.

Page started a project at Stanford in 1996 called BackRub, the name a joke on its method of analyzing backward links. Brin joined him. They wrote a crawler, indexed about 24 million pages, and built an algorithm to rank them by this link-based authority. They called the score PageRank, which was both Larry Page's name and a description of what it ranked.
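The recursive idea can be sketched as a short Python loop that repeatedly passes each page's score along its outgoing links. This is a simplified, normalized variant for illustration, not the code BackRub ran: the four-page link graph is invented, the handling of pages with no outgoing links is an assumption, and the damping factor of 0.85 is the value the 1998 paper says the authors usually used.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute link-based scores by repeated redistribution (power iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: three pages all link to "hub".
links = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph "hub" comes out on top because every other page links to it, and "c" comes out last because nothing links to it. Repeating a keyword a thousand times does nothing to this score, since it depends only on who links to whom.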

The early version ran on `google.stanford.edu`, on a tower of borrowed and scrounged hardware that the pair assembled in their dorm room and later in a friend's garage in Menlo Park. They published a paper in early 1998 called "The Anatomy of a Large-Scale Hypertextual Web Search Engine," which is still one of the most-cited papers in computer science.

The results were obviously, dramatically better. Searching Google in 1998 returned the page you actually wanted at the top, where every other engine returned the page that had gamed its keyword count the hardest.

Google's original 1998 homepage, a sparse white page in a sea of cluttered portals

The homepage was a second piece of the puzzle. Every other search engine had piled itself into a portal of email, weather, news, and games. Google's homepage was a logo, a search box, two buttons, and nothing else. It loaded instantly even on dial-up, and it implicitly said: this is a tool, not a destination. You came, you searched, you left. The competitors did not yet understand that this was a feature.

Page and Brin tried to license the technology to existing search companies. Excite, Yahoo, and others passed. Yahoo's leadership reportedly worried that better search would make users leave the portal faster, which was the exact opposite of the engagement business model. So in September 1998, with $100,000 in seed money from Sun Microsystems co-founder Andy Bechtolsheim, Page and Brin incorporated Google Inc. in a garage in Menlo Park.

The Collapse of the Old Order

What happened next happened with stunning speed.

By 1999, Google was already the search engine of choice for technical users who valued result quality. By 2000, it was indexing over a billion pages and serving 18 million queries a day. Yahoo made Google its default search provider that same year, a tacit admission that it had been wrong to pass, and AOL switched its underlying search to Google in 2002. Yahoo then scrambled to acquire Inktomi and Overture, which by then owned AltaVista, in a frantic attempt to build its own engine. The portals tried to bolt better search onto themselves; Google had been built as nothing but better search from the start.

The dot-com crash in 2000 and 2001 accelerated the rout. Excite@Home went bankrupt in October 2001. AltaVista's IPO was killed by the market collapse and the property was sold for pennies on its peak valuation. Lycos changed hands three times, each one cheaper than the last. Infoseek became part of Disney and was quietly euthanized. Ask Jeeves abandoned its butler and merged into a holding company. By the mid-2000s, the only pre-1998 search engine that still mattered was Yahoo, and even Yahoo had run Google's results behind the scenes for years before building its own engine again.

Google added AdWords in 2000, went public in 2004 at $85 a share, and the rest is the present.

The two big factors that finished off the old engines, beyond Google's better algorithm, were that the portals chose engagement over result quality at exactly the wrong moment, and that the dot-com crash starved the also-rans of the capital they would have needed to catch up. Google emerged from that wreckage as effectively the only well-capitalized search company with a working business model, and it has held that position for two decades.

NYSE Floor in the late 90s

What Was Lost

The story of search engines is usually told as a story of progress, which it largely is. Google's results were genuinely better, and pretending otherwise is nostalgia masquerading as critique.

But the era around early search had texture that the current internet does not. When search did not work, you found things by stumbling: following a webring from a fan page to a fan page, hitting "Random" until you landed somewhere weird, reading the personal homepage of someone you'd never heard of because their GeoCities URL was at the bottom of a forum signature. The web was a landscape you wandered, not a database you queried. Curation was distributed across thousands of hand-built link pages and obsessive subject directories.

That model could not have scaled. The web of 1998 had a few million sites. The web of today has more than a billion. No directory of humans could keep up with that, and no webring covers a meaningful fraction of anything. The old way was lovely partly because the web was small enough to fit inside it.

What we lost was the sense that the internet was a place people made, rather than a service people used. Personal homepages, with their broken HTML and looping MIDI files, were the work of individuals. The portals replaced them with templated profiles. Social media finished the job. Search became one box on one site, and the box's results were ranked by an algorithm that no one outside Google fully understood.

It is worth remembering that this was not always how it worked. For about a decade, finding things online meant choosing between half a dozen different engines, knowing the quirks of each one, browsing a directory like it was a library, signing a guestbook to leave a trace. The whole landscape vanished into one search box.

But Jeeves is out there somewhere, in the corporate mascot retirement home, still wearing his tuxedo, waiting for a question.