U.C. Berkeley

Library Web
Finding Information on the Internet: A Tutorial
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
Recommended Search Engines
UC Berkeley - Teaching Library Internet Workshops

Recommended Search Engines: Tables of Features
Google has one of the largest databases of Web pages, and it also includes many other types of web documents: blog posts, wiki pages, group discussion threads, and document formats such as PDFs, Word or Excel documents, and PowerPoint presentations. Despite this mix of formats, Google's popularity ranking often brings the pages most worth looking at near the top of search results. Our web searching workshop reflects our view that Google is currently the leading web search engine, so people need to learn to use it well.

Google alone is not always sufficient, however. Less than half of the searchable Web is fully searchable in Google. Overlap studies show that more than 80% of the pages in a major search engine's database exist only in that database. Getting a "second opinion" is therefore often worth your time. For this purpose, we recommend Ask.com or Yahoo! Search. We no longer recommend using any meta-search engines.
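The "second opinion" advice follows from overlap: if most of one engine's pages appear in no other engine's database, a second engine can turn up pages the first never indexed. A minimal Python sketch of that calculation follows. The page sets and the names engine_a, engine_b, and engine_c are invented placeholders, not data from the overlap studies or from any real engine.

```python
# Sketch of the "overlap" idea: what share of one engine's pages appear in
# no other engine? The page sets below are invented placeholders, not study data.


def unique_fraction(engine: set[str], others: list[set[str]]) -> float:
    """Fraction of `engine`'s pages that are found in none of the `others`."""
    found_elsewhere = set().union(*others)
    return len(engine - found_elsewhere) / len(engine)


engine_a = {"p1", "p2", "p3", "p4", "p5"}
engine_b = {"p1", "p6", "p7"}
engine_c = {"p2", "p8"}

# p3, p4 and p5 exist only in engine A, so 3 of its 5 pages are unique to it.
print(unique_fraction(engine_a, [engine_b, engine_c]))  # 0.6
```

With real data the sets would hold URLs and be far larger, but the reasoning is the same: the higher this fraction, the more a second engine is likely to add.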

Features in common among the search engines we recommend
Search engines have become somewhat standardized, allowing us to use some common search techniques in all of them:

Things You CAN Do in Google, Yahoo!, and Ask.com
Things NOT Supported in Google, Yahoo!, or Ask.com

Some Ways the Recommended Search Engines Differ:

Search engines compared: Google (www.google.com), Yahoo! Search (search.yahoo.com), and Ask.com (www.ask.com). Each engine also provides its own help pages.

Size, type
Google: HUGE. Size not disclosed in any way that allows comparison. Probably the biggest.
Yahoo! Search: HUGE. Claims over 20 billion total "web objects."
Ask.com: LARGE. Claims to have 2 billion fully indexed, searchable pages.

Noteworthy features
Google: Popularity ranking using PageRank™ emphasizes pages most heavily linked from other pages. Many additional databases, including Book Search, Scholar (journal articles), Blog Search, Patents, Images, etc.
Yahoo! Search: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.
Ask.com: Subject-Specific Popularity™ ranking. Suggests broader and narrower terms. AskEraser privacy option.

Boolean logic
Google: Partial. AND is assumed between words. Capitalize OR. Parentheses ( ) are accepted but not required. In Advanced Search, partial Boolean logic is available through the search boxes.
Yahoo! Search: Accepts AND, OR, NOT, or AND NOT, which must be capitalized. Parentheses ( ) are accepted but not required.
Ask.com: Partial. AND is assumed between words. Capitalize OR. A minus sign (-) excludes a word. No parentheses or nesting.

+Requires/-Excludes
Google: A minus sign (-) excludes a word. A plus sign (+) lets you retrieve "stop words" (e.g., +in).
Yahoo! Search: A minus sign (-) excludes a word. A plus sign (+) lets you search common words (e.g., "+in truth").
Ask.com: A minus sign (-) excludes a word. A plus sign (+) lets you retrieve "stop words" (e.g., +in).

Sub-Searching
All three: The search box at the top of the results page shows your current search. Modify it (e.g., add more terms at the end).

Results Ranking
Google: Based on page popularity, measured in links to a page from other pages: a page ranks high if many other pages link to it. Fuzzy AND is also invoked.
Yahoo! Search: Matching and ranking are based on a "cached" version of pages, which may not be the most recent version.
Ask.com: Automatic Fuzzy AND. Based on Subject-Specific Popularity™, that is, links to a page from related pages.

Field limiting
Google: link:, site:, intitle:, inurl:. Also offers U.S. Gov't Search and other special searches, including Patent Search.
Yahoo! Search: link:, site:, intitle:, inurl:, url:, hostname:.
Ask.com: intitle:, inurl:, site:, last:[time period].

Truncation, Stemming
Google: No truncation. Stems some words. Search variant endings and synonyms separately, joined with OR (capitalized), for example:
airline OR airlines
Yahoo! Search: Neither. Search variants with OR, as in Google.
Ask.com: Neither. Search variants with OR, as in Google.

Language
Google: Yes. Major Romanized and non-Romanized languages, in Advanced Search.
Yahoo! Search: Yes. Major Romanized and non-Romanized languages.
Ask.com: Yes. Major Romanized languages. Use Advanced Search to limit.

Translation
Google: Yes, through a "Translate this page" link shown after some results. Translates to and sometimes from English, major European languages, Chinese, Japanese, and Korean. Uses its own translation software with user feedback.
Yahoo! Search: Yes.
Ask.com: No.
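The query rules summarized in the table above (AND assumed between words, a capitalized OR, "-" to exclude a word, "+" to keep a stop word, and OR to cover variant endings such as airline OR airlines) can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the stop-word list and the sample pages are made up, and it ignores phrases, parentheses, field limits, and ranking, so it is not how Google, Yahoo!, or Ask.com actually parse queries.

```python
# Toy model of common query rules: implicit AND, capitalized OR,
# "-word" excludes, "+word" keeps a stop word. Not any engine's real parser.

STOP_WORDS = {"in", "the", "of", "a", "or"}  # hypothetical stop-word list


def parse(query: str) -> tuple[list[list[str]], list[str]]:
    """Split a query into AND-ed groups of OR-ed alternatives, plus exclusions."""
    groups: list[list[str]] = []
    excluded: list[str] = []
    pending_or = False
    for tok in query.split():
        if tok == "OR":                       # only capitalized OR is an operator
            pending_or = True
            continue
        if tok.startswith("-") and len(tok) > 1:
            excluded.append(tok[1:].lower())  # "-word" excludes pages with word
        else:
            word = tok.lower()
            if word.startswith("+"):
                word = word[1:]               # "+in" forces a stop word in
            elif word in STOP_WORDS:
                pending_or = False
                continue                      # unforced stop words are dropped
            if pending_or and groups:
                groups[-1].append(word)       # alternative for the previous word
            else:
                groups.append([word])         # new word, AND-ed with the rest
        pending_or = False
    return groups, excluded


def matches(query: str, text: str) -> bool:
    words = set(text.lower().split())
    groups, excluded = parse(query)
    return (all(any(w in words for w in group) for group in groups)
            and not any(w in words for w in excluded))


docs = {  # hypothetical pages
    "d1": "airline fares to tokyo",
    "d2": "cheap airlines in europe",
    "d3": "airline safety in the news",
}


def search(query: str) -> list[str]:
    return [name for name, text in docs.items() if matches(query, text)]


print(search("airline OR airlines"))          # ['d1', 'd2', 'd3']
print(search("airline OR airlines -safety"))  # ['d1', 'd2']
print(search("airline or airlines"))          # [] (lowercase "or" is not OR)
print(search("+in airlines"))                 # ['d2']
```

The third query shows why OR must be capitalized: in this model a lowercase "or" is treated as an ordinary (stop) word, so both airline and airlines become required.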


You may also wish to consult "What Makes a Search Engine Good?" - a table (PDF file) summarizing useful factors for evaluating search engines.
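The Results Ranking row in the table above describes Google's link popularity (PageRank™): a page ranks higher when many other pages link to it. A simplified version of the published PageRank idea can be computed by repeatedly passing each page's score along its links. The Python sketch below assumes a hypothetical four-page link graph, a damping factor of 0.85 (a commonly cited value), and 50 iterations; it is an illustration of the idea, not Google's actual ranking code.

```python
# A simplified PageRank-style popularity score, computed by power iteration.
# Assumptions (not from the tutorial): toy link graph, damping factor 0.85,
# 50 iterations. An illustration only, not any engine's production ranking.


def popularity_scores(links: dict[str, list[str]], damping: float = 0.85,
                      iterations: int = 50) -> dict[str, float]:
    """Score pages so that those linked from many well-linked pages score higher."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score, split evenly, to the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new[p] += damping * scores[page] / n
        scores = new
    return scores


# Hypothetical mini-web: three pages link to "hub", so it should rank first.
web = {
    "a": ["hub"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "hub": ["a"],
}
scores = popularity_scores(web)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:4} {score:.3f}")
```

Note that "a" also scores well even though only two pages link to it, because one of them is the highly ranked "hub": links from popular pages count for more.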

How do Search Engines Work?

Search engines for the general web (like all those listed above) do not really search the World Wide Web directly. Each one searches a database of the full text of web pages, automatically harvested from the billions of web pages residing on servers. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's results, you retrieve the current version of the page from its server.
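The difference between searching the engine's stored copy and clicking through to the live page can be shown with two dictionaries. In this minimal Python sketch, the URL example.org/fares and both versions of its text are hypothetical; a real engine stores far more than plain words, but the timing problem is the same.

```python
# Hypothetical page: the engine stored a copy when its spider last visited,
# and the live page has changed since then.
index_copy = {"example.org/fares": "fares from 150 dollars"}
live_web = {"example.org/fares": "fares from 200 dollars"}


def search_copy(term: str) -> list[str]:
    """Match a word against the engine's stored copies, not the live pages."""
    return [url for url, text in index_copy.items() if term in text.split()]


def follow_link(url: str) -> str:
    """Clicking a result retrieves whatever the server holds now."""
    return live_web[url]


hits = search_copy("150")
print(hits)                  # ['example.org/fares']  (matched the old copy)
print(follow_link(hits[0]))  # fares from 200 dollars  (what you actually see)
print(search_copy("200"))    # []  (the new wording is not searchable yet)
```

The page is found by words it no longer contains and missed by words it now contains, until the spider revisits it.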

Search engine databases are selected and built by computer robot programs called spiders. These "crawl" the web, finding pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about"). They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on the web about it. (Computers are getting more sophisticated all the time, but they are still brainless.)
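Mechanically, "following the links in the pages they already have" is a graph traversal. Below is a minimal Python sketch of a breadth-first crawl, assuming a hypothetical in-memory WEB dictionary in place of real page fetching and HTML parsing; the scheduling and politeness rules of a real spider are not shown.

```python
from collections import deque

# Hypothetical toy web: each page name maps to the pages it links to.
WEB = {
    "home": ["news", "about"],
    "news": ["story1", "home"],
    "about": [],
    "story1": ["story2"],
    "story2": [],
}


def crawl(seeds: list[str], web: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl: start from known pages and follow links only."""
    seen = set(seeds)
    queue = deque(seeds)
    visited_order = []
    while queue:
        page = queue.popleft()
        visited_order.append(page)        # stands in for fetching the page
        for link in web.get(page, []):    # links are the only source of new URLs
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited_order


print(crawl(["home"], WEB))  # ['home', 'news', 'about', 'story1', 'story2']
```

Nothing in the loop invents a URL or "decides" to look something up: the crawler only ever visits pages that appear as links in pages it has already seen.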

If a web page is never linked to in any other page, search engine spiders cannot find it. The only way a brand new page - one that no other page has ever linked to - can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included. All search engine companies offer ways to do this.
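Which pages a link-following spider can never discover on its own can be checked by looking for pages with no inbound links. A minimal Python sketch, using a hypothetical three-page site in which "new-page" has just been published and nothing links to it yet:

```python
def never_linked(web: dict[str, list[str]]) -> set[str]:
    """Pages that no *other* page links to (self-links do not count).

    A spider that only follows links cannot discover these pages; someone
    has to submit their URLs to the search engine directly.
    """
    linked_to = {target
                 for source, targets in web.items()
                 for target in targets
                 if target != source}
    return set(web) - linked_to


# Hypothetical site: "home" and "news" link to each other; "new-page" is brand new.
web = {"home": ["news"], "news": ["home"], "new-page": []}
print(never_linked(web))  # {'new-page'}
```

Once another page links to "new-page", it drops out of this set and a spider can find it without a manual submission.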

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in each page and stores it in the search engine's database files, so that the database can be searched by keyword (and by whatever more advanced approaches are offered) and the page will be found if your search matches its content.
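One common way to make stored pages searchable by keyword is an inverted index, which maps each word to the pages that contain it. The tutorial does not say what structure any particular engine uses, so treat this Python sketch as an assumption-based illustration; the page names and texts are hypothetical.

```python
import re
from collections import defaultdict


def words_in(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of pages that contain it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in words_in(text):
            index[word].add(url)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every query word (AND assumed between words)."""
    words = words_in(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


pages = {  # hypothetical pages handed over by the spider
    "site-a/library": "UC Berkeley library catalogs and databases",
    "site-b/spiders": "How search engine spiders crawl the web",
    "site-c/catalog": "Searching a library catalog on the web",
}
index = build_index(pages)
print(sorted(index["library"]))                      # ['site-a/library', 'site-c/catalog']
print(sorted(keyword_search(index, "library web")))  # ['site-c/catalog']
```

Because the index is built once and then looked up, a search never has to re-read every page; it also means "catalog" does not match "catalogs" unless the indexer stems words.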

Many web pages are excluded from most search engines by policy. The contents of most of the searchable databases mounted on the web, such as library catalogs and article databases, are excluded because search engine spiders cannot access them. All this material is referred to as the "Invisible Web" -- what you don't see in search engine results.
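One reason database contents stay invisible is that a spider collects links, but a catalog's records are reached by typing into a search form. The Python sketch below uses the standard library's html.parser to extract the href of every a tag from a hypothetical library page; the form's action URL (/catalog/search) is noted but never submitted, so nothing behind it is discovered. This illustrates only the access problem, not the policy exclusions the tutorial also mentions.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag; record, but never submit, any <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.forms: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])         # a spider can follow this
        elif tag == "form":
            self.forms.append(attr_map.get("action", ""))  # records behind it stay unseen


# Hypothetical library home page with two links and a catalog search form.
page = """
<p>Library home</p>
<a href="/hours.html">Hours</a>
<a href="/about.html">About</a>
<form action="/catalog/search" method="get">
  <input name="q"> <button>Search the catalog</button>
</form>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/hours.html', '/about.html']
print(parser.forms)  # ['/catalog/search']
```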

Copyright (C) 2008 by the Regents of the University of California. All rights reserved.
Comments and questions welcome.
Last update 01/27/08.