Search Engines for Family History
- Overview
This class will provide an introduction to the
most popular Search Engines and teach you details about using
them effectively in locating family history information.
We will also look at general principles for searching at
specific family history web sites.
- When I use the Internet to search for completed research or to
extend my database, I first go to web sites that have the larger
user-contributed or extracted databases, and then to one or more of the
major search engines.
- familysearch.org
- rootsweb.com
(including mailing lists and message boards)
- ancestry.com
- major search engines (typically
Google and/or
AltaVista)
- Every site or search engine has unique rules for effectively
searching the information that site has available.
- Types of search engines / search sites
- Site Specific Search Tools (most major informational sites have them)
- Each site's rules are different.
- During the latter part of this presentation we will look at
the first three sites noted above.
- The majority of information on the Internet is at sites with
large databases that cannot be accessed by outside search
engines.
- Directory or Subject Searches
(yahoo.com and/or
dmoz.com)
- The included web sites are evaluated by editors.
- These have the least coverage - less than 1% of the web.
- Every Word (General) Search Engines (Google is the most popular)
- These have the largest coverage - less than 10% of the web.
- Multiple Search Engines/Sites
(dogpile.com and/or
metacrawler.com)
- These search multiple every word search engines.
- They have fewer results than the General Search Engines.
- They cannot take advantage of specialized search techniques.
- How do Every Word (General) Search Engines work?
- Users or web masters notify the search engine of new pages.
- Search engines have 'worms', 'robots' or 'spiders' that go out
and follow all the links (connections) they can find.
- They maintain an index of the contents of every page they find.
- They periodically check out old connections to make sure they
still exist.
- A recent survey showed that the roughly 3-4 billion web pages
indexed by the largest search engines include less than 15%
of the estimated pages in existence
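The steps above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the page addresses, page text, and the `crawl_and_index` function are all made up for the example. It shows why a page that nothing links to never reaches the index.

```python
from collections import deque

# Hypothetical in-memory "web": page address -> (page text, links on that page).
PAGES = {
    "a.example/home": ("smith family history", ["a.example/smith", "b.example/ruf"]),
    "a.example/smith": ("john smith born 1820 alabama", ["a.example/home"]),
    "b.example/ruf": ("gerhard ruf genealogy germany", []),
    "c.example/orphan": ("unlinked page about smith", []),
}

def crawl_and_index(start):
    """Follow links breadth-first from `start` and build a word -> pages index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        # Record every word on the page in the index.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Queue every link that has not been visited yet.
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("a.example/home")
print(sorted(index["smith"]))
```

The orphan page mentions "smith" but is never found, because the spider only reaches pages it can follow a link to.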
- How do search engines determine how to rank their results? - Relevancy
- Each search engine has its own prioritizing or weighting schemes.
- They typically include
- The content of the page's <TITLE> tag (shown in the browser's title bar)
- Nearness of search word(s) to the top of the page
- Search word(s) frequency on the page, except at the end of the page
- The first word in a search is ranked more important than a later word.
- Keywords in META tags may still be used by some search engines.
- How many other sites link to this page (unique to Google).
- Some search engines allow sites to pay for a higher ranking
- Therefore, it is important to use more than one search engine.
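To make the idea of weighting concrete, here is a toy scoring function in Python. The weights (10 for a title hit, up to 5 for nearness to the top, 1 per occurrence) are invented for illustration only; real search engines keep their formulas private and they differ from one engine to the next.

```python
def relevancy(title, body, query_words):
    """Toy score: title hits count most, then early position, then frequency."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    for word in query_words:
        word = word.lower()
        if word in title_words:
            score += 10.0  # assumed weight for a match in the page title
        if word in body_words:
            # An earlier first occurrence earns a larger bonus (at most 5).
            score += 5.0 / (1 + body_words.index(word))
            # Each occurrence on the page adds one point.
            score += body_words.count(word)
    return score

page_a = ("Smith Genealogy", "smith family records and more smith data")
page_b = ("Local News", "weather report mentions smith once")
print(relevancy(*page_a, ["smith"]))  # 17.0
print(relevancy(*page_b, ["smith"]))  # 2.25
```

Changing any one weight can reorder the results, which is one reason the same search gives different top pages at different search engines.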
- Which General Search Engine(s) should I use?
- The most popular search engines by Nielsen NetRatings (July 2005) are
(My own search results taken on 1 Dec 2005 are listed beside them.)
| Site | comScore Rank | genealogy | "family history" | "john smith" | map | URL |
|---|---|---|---|---|---|---|
| Google | #1 (61.5%) | 91,700,000 | 41,700,000 | 6,200,000 | 1,250,000,000 | http://www.google.com/ |
| Yahoo | #2 (20.9%) | 136,000,000 | 73,300,000 | 16,800,000 | 8,320,000,000 | http://www.yahoo.com/ also accessible from http://www.altavista.com/ |
| Microsoft Search | #3 (92%) | 48,900,000 | 14,300,000 | 4,990,000 | 844,000,000 | http://www.live.com/ |
| Ask Network | #4 (4.3%) | 10,310,000 | 5,847,000 | 726,400 | 499,840,000 | http://www.ask.com/ |
- The basic search features applicable to most search engines - search engine math
(Using Google and AltaVista as examples)
- Upper case and lower case letters in search words
- Most search engines no longer differentiate case.
exception: AltaVista IS case sensitive within quotes
and when using Advanced Search.
- This also applies to FamilySearch, RootsWeb, and Ancestry searches.
- Avoid case sensitivity by always using lower case characters.
- Use multiple words to limit the search.
- Web pages with all the words will be ranked at the top of the results.
(In Google and AltaVista only pages with all the words are displayed.)
- This does not apply to FamilySearch, RootsWeb, or Ancestry.
- Use your browser's search feature to find one of the search terms on a particular page.
- In Internet Explorer use Edit/Find (on This Page) or <CTRL>f
- In Netscape or Firefox use Edit/Find in Page or <CTRL>f
- If you don't find a search term, the page may have been changed
since it was indexed, or the 'term' is 'hidden' on the page.
- Use the Google Toolbar in Internet Explorer to highlight and find
search terms.
- These techniques work on all normal web pages viewed.
- Forcing the inclusion and exclusion of search terms
- use a + in front of a search term to make it mandatory
The word or phrase must be somewhere on the page.
Google requires use of a + for single letter words or simple words
like the, of, and, or ... otherwise they are ignored
- use a - in front of a word to exclude pages with that word.
This also works for phrases with Google.
The word or phrase may not be anywhere on the page.
example: +germany +genealogy -history
+kodak +john +alabama -camera -film
- There is no space between the + or - and the word
- These marks may not be used at FamilySearch, Rootsweb, or Ancestry.
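The + and - rules can be pictured as a simple filter. The Python sketch below uses hypothetical function names (`parse_query`, `matches`) and assumes that plain words without a mark are also required, as described above for Google and AltaVista. It ignores phrases and punctuation to stay short.

```python
def parse_query(query):
    """Split a query into required (+word or plain) and excluded (-word) terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)  # assumption: unmarked words are required too
    return required, excluded

def matches(page_text, query):
    """True if the page has every required word and none of the excluded ones."""
    words = set(page_text.lower().split())
    required, excluded = parse_query(query)
    return all(w in words for w in required) and not any(w in words for w in excluded)

query = "+kodak +john +alabama -camera -film"
print(matches("john kodak family of alabama", query))     # True
print(matches("kodak camera sold by john alabama", query))  # False
```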
- Using wildcards (*) (stemming) (Not supported by Google)
gold* will find gold, golden, goldy, etc.
german* will find german, germans, germany, etc.
genealog* will find genealogy, genealogist, genealogical, etc.
useful for finding spelling variations in the ending of a name
typically may not be used in the middle or beginning of the word
Wildcards may not be used at FamilySearch or RootsWeb.
Special rules apply to using wildcards at Ancestry.
Some database searches, like Message Board Title searches at Ancestry,
use stemming without an asterisk.
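A trailing wildcard behaves much like the pattern matching in Python's standard `fnmatch` module. The sketch below is only an analogy; `wildcard_matches` is a made-up helper, and each search site applies its own wildcard rules.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, words):
    """Return the words that fit a pattern such as 'genealog*', ignoring case."""
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]

names = ["Genealogy", "genealogist", "genealogical", "general", "gene"]
print(wildcard_matches("genealog*", names))
# ['Genealogy', 'genealogist', 'genealogical']
```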
- Searching for phrases
Phrases are defined with quotation marks
A few search engines assume multiple words are phrases
examples: "smith, john" "john a. smith"
This cannot be used at FamilySearch, Rootsweb, or Ancestry.
- Looking for ranges of numbers (only in Google)
Use two periods between numbers to set a range of numbers
Any number in the range will be searched for.
1920..1930
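As a rough picture of what a number range search does, this Python sketch checks whether any whole number on a page falls inside the range. The helper names are hypothetical, and this is not a description of Google's actual method.

```python
import re

def parse_range(term):
    """Split a term such as '1920..1930' into the pair (1920, 1930)."""
    low, high = term.split("..")
    return int(low), int(high)

def has_number_in_range(text, term):
    """True if any whole number in `text` lies within the range, inclusive."""
    low, high = parse_range(term)
    return any(low <= int(n) <= high for n in re.findall(r"\d+", text))

print(has_number_in_range("John Smith, born 1925 in Ohio", "1920..1930"))  # True
print(has_number_in_range("John Smith, born 1919 in Ohio", "1920..1930"))  # False
```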
- Each search engine has an advanced search page with additional features
- Features on the Google Advanced search page include
- Most of the above features
- language, file format, date, where on page, specific domains
- Features on the AltaVista Advanced search page
- Most of the above features
- date, file type, location, boolean searching
- Additional specialized search commands for the basic search page
- Search commands are separated from the search term by a colon(:) and no spaces.
- Search within the web page title
Google: intitle, allintitle
AltaVista: title
example: intitle:text
- At a web site (limit search results to a particular web site)
Google: site
AltaVista: host, domain
example: host:buy.edu, domain:uk
- URL search (looks for pages whose web address includes the term)
Google: inurl, allinurl
AltaVista: url
example: inurl:usgenweb
- dictionary
Google only: define
example: define:scurvy
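If you want to save or share a search, the commands above can be assembled into a web address. The Python sketch below is one possible way to do it; `build_query` and `google_url` are made-up helper names, and the `/search?q=` address form is an assumption about Google's basic search page that may change.

```python
from urllib.parse import urlencode

def build_query(terms, **commands):
    """Join commands (name:value, no spaces) with ordinary search terms."""
    parts = [f"{name}:{value}" for name, value in commands.items()]
    return " ".join(parts + list(terms))

def google_url(query):
    """Encode the query so it is safe to place in a web address."""
    return "http://www.google.com/search?" + urlencode({"q": query})

query = build_query(["ruf", "genealogy"], intitle="family", site="rootsweb.com")
print(query)
print(google_url(query))
```

The colon is written with no space on either side, exactly as it would be typed in the search box.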
- Advanced Searching with Boolean Operators (commands)
- use AND(&), OR(vertical bar), AND NOT(!), NEAR(~) with words and phrases
- operators are not case sensitive but words and phrases are
- combined words are treated as phrases (no quotation marks required)
- Google only supports the OR operator
- AltaVista only supports these operators on the Advanced Search screen
- BOOLEAN Search Terms (Operators) and what they do
AND
similar to + in regular searching
example: family history AND purrington
OR
similar to not using any notation in regular searching
AND NOT
similar to - in regular searching
example: (genealogy AND kodak) AND NOT film AND NOT camera*
NEAR
no equivalent in regular searching
finds words or phrases within 10 words of each other (at AltaVista)
example: gerhard NEAR ruf (will find it with or without middle names)
()
use parentheses to group operations
example: (ruf OR ruff OR roufe OR rouffe) NEAR (gerhard OR gerhart OR gerhardt)
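The last example combines OR groups with NEAR. A minimal Python sketch of that idea follows. It assumes "within 10 words" means the word positions differ by at most 10, and the `near` function is an illustration rather than AltaVista's actual logic.

```python
def near(text, first_variants, second_variants, distance=10):
    """True if a word from each group appears within `distance` words of the other."""
    # Lower-case the text and strip common punctuation from each word.
    words = [w.strip(".,;:()\"'") for w in text.lower().split()]
    first_pos = [i for i, w in enumerate(words) if w in first_variants]
    second_pos = [i for i, w in enumerate(words) if w in second_variants]
    return any(abs(i - j) <= distance for i in first_pos for j in second_pos)

surnames = {"ruf", "ruff", "roufe", "rouffe"}
given = {"gerhard", "gerhart", "gerhardt"}

print(near("Gerhard Wilhelm Ruf was born in Germany.", surnames, given))  # True
print(near("gerhart " + "x " * 20 + "ruff", surnames, given))             # False
```

The first sentence matches even though a middle name sits between the two words, which is the advantage of NEAR over a quoted phrase.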
- Other common features on advanced search screens
- Limit the search to certain specified file types, PDF, XLS, PPT, DOC, RTF, etc.
- Limit the search to time periods, languages, domains, and other factors.
- Special Features at Google and AltaVista
- Protect yourself from finding pornographic or other objectionable sites.
Google: go to Preferences, make selection in Safe Search Filtering
AltaVista: go to Settings, select Family Friendly Filter (password protected)
- Toolbar (allows searching from within the browser)
Google has a toolbar for either Internet Explorer or Firefox
at toolbar.google.com/
Features: popup blocker, highlighting, find on page, etc.
AltaVista has a toolbar for Internet Explorer only
at www.altavista.com/toolbar/
Features: popup blocker, highlighter, translator, etc.
- Translation of web sites
Google: go to Language Tools
AltaVista: go to Translate, or use AltaVista Toolbar
- Google also has a Deskbar which resides in the Taskbar
at deskbar.google.com/
Searches while you are online without a browser running
Loads search results into your default browser
- Common Rules for effective searching at web sites like FamilySearch, RootsWeb and Ancestry
- Use global search (searches all databases) when you start searching for someone
new or when you don't know everything that is available at the site.
- Use database specific searching to take advantage of additional search fields.
You have to search specifically in the IGI to use Batch numbers.
You must go to the SSDI database to be able to use birth and death dates or
Social Security Numbers.
- Use wildcard and Soundex options, when available, to find hard-to-spell names.
- Use the help screens and Frequently Asked Questions (FAQ) at each site.
- Try variations in spelling.
- Try variations in the combinations of data you use to find an individual.
- On many search screens you do not have to enter a name to find individuals.
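To see why a Soundex option helps with hard-to-spell names, here is a short Python sketch of the standard American Soundex coding. It assumes the usual published rules (first letter kept, similar-sounding consonants share a digit, code padded to four characters); the Soundex option at a particular site may differ in detail.

```python
# Letters that sound alike share a digit; vowels, h, w and y have no digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)

def soundex(name):
    """Return the 4-character Soundex code for a surname."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets `previous`, so a repeated code counts again
    return (result + "000")[:4]

for name in ["Ruf", "Ruff", "Rouffe", "Gerhard", "Gerhardt"]:
    print(name, soundex(name))
```

All the Ruf spellings share the code R100 and the Gerhard spellings share G663, so a Soundex search on one spelling finds the others.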
- Learn More About Using and Evaluating Search Engines at:
Summary
Search engines provide a powerful method to find specific
information on the ever changing World Wide Web. Learning to use a global or site
specific search engine effectively will decrease the amount of time you spend looking
for family history information.