Search Engines for Family History

by Gerhard Ruf, gruf@byu.edu

© copyright 1996 - 2008
updated 24 August 2008
http://www.xmission.com/~gruf/classes/searchenginesfh.htm

 
  1. Overview
        This class will provide an introduction to the most popular Search Engines and teach you details about using them effectively in locating family history information.  We will also look at general principles for searching at specific family history web sites.
     
  2. When I use the Internet to search for completed research or to extend my database, I first go to web sites that have the larger user-contributed or extracted databases and then to one or more of the major search engines.
    1. familysearch.org
    2. rootsweb.com (including mailing lists and message boards)
    3. ancestry.com
    4. major search engines (typically Google and/or AltaVista)
    5. Every site or search engine has unique rules for effectively searching the information that site has available.

     
  3. Types of search engines / search sites
    1. Site Specific Search Tools (most major informational sites have them)
      • Each site's rules are different.
      • During the latter part of this presentation we will look at the first three sites noted above.
      • The majority of information on the Internet is at sites with large databases that cannot be accessed by outside search engines.
    2. Directory or Subject Searches (yahoo.com and/or dmoz.org)
      • The included web sites are evaluated by editors.
      • These have the least coverage - less than 1% of the web.
    3. Every Word (General) Search Engines (Google is the most popular)
      • These have the largest coverage - less than 10% of the web.
    4. Multiple Search Engines/Sites (dogpile.com and/or metacrawler.com)
      • These search multiple every word search engines.
      • They have fewer results than the General Search Engines.
      • They cannot take advantage of specialized search techniques.

     
  4. How do Every Word (General) Search Engines work?
    1. Users or web masters notify the search engine of new pages.
    2. Search engines have 'worms', 'robots' or 'spiders' that go out and follow all the links (connections) they can find.
    3. They maintain an index of the contents of every page they find.
    4. They periodically check out old connections to make sure they still exist.
    5. A recent survey showed that the roughly 3-4 billion web pages indexed by the largest search engines include less than 15% of the estimated pages in existence.
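
The steps above (following links, indexing every word, and noticing dead links) can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the three page addresses, their text, and the function names `crawl` and `build_index` are all made up for the example, and the "web" is just a small dictionary in memory.

```python
from collections import defaultdict, deque

# A tiny made-up "web": page address -> (page text, links to other pages).
# These addresses and texts are hypothetical, for illustration only.
PAGES = {
    "home.htm": ("smith family history", ["tree.htm", "links.htm"]),
    "tree.htm": ("john smith born 1825 in alabama", ["home.htm"]),
    "links.htm": ("genealogy links for alabama", ["tree.htm", "gone.htm"]),
}

def crawl(start):
    """Follow links outward from a starting page, like a spider."""
    seen, dead, queue = set(), set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url in dead:
            continue
        if url not in PAGES:          # a link to a page that no longer exists
            dead.add(url)
            continue
        seen.add(url)
        queue.extend(PAGES[url][1])   # queue every link found on this page
    return seen, dead

def build_index(urls):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index[word].add(url)
    return index

found, dead_links = crawl("home.htm")
index = build_index(found)
print(sorted(index["alabama"]))   # pages containing the word "alabama"
print(sorted(dead_links))         # links that led nowhere
```

Looking up a word in `index` is what makes an every word search fast: the engine consults its index rather than reading the pages again.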

     
  5. How do search engines determine how to rank their results? - Relevancy
    1. Each search engine has its own prioritizing or weighting schemes.
    2. They typically include
      • The content of the page's <TITLE> tag (shown in the browser's title bar)
      • Nearness of search word(s) to the top of the page
      • Search word(s) frequency on the page, except at the end of the page
      • The first word in a search is ranked more important than a later word.
      • Keywords in <META> tags may still be used by some search engines.
      • How many other sites link to this page (unique to Google).
    3. Some search engines allow sites to pay for a higher ranking.
    4. Therefore, it is important to use more than one search engine.
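
To make the idea of weighting concrete, here is a minimal sketch of a relevance score that combines some of the factors listed above: words in the title, nearness to the top of the page, word frequency, and the order of the search words. The function name `score`, the sample pages, and every numeric weight are assumptions chosen for illustration; real search engines do not publish their formulas.

```python
def score(page_title, page_text, query_words):
    """Toy relevance score; the weights are arbitrary assumptions."""
    title_words = page_title.lower().split()
    body_words = page_text.lower().split()
    total = 0.0
    for rank, word in enumerate(query_words):
        word = word.lower()
        weight = 1.0 / (rank + 1)     # earlier search words count for more
        points = 0.0
        if word in title_words:
            points += 5.0                       # word appears in the title
        if word in body_words:
            first = body_words.index(word)
            points += 2.0 / (first + 1)         # nearer the top scores higher
            points += body_words.count(word)    # frequency on the page
        total += weight * points
    return total

# Two hypothetical pages: (title, text).
pages = [
    ("Alabama Recipes", "recipes from alabama cooks named smith"),
    ("Smith Family Genealogy", "genealogy of the smith family of alabama"),
]
query = ["smith", "genealogy"]
ranked = sorted(pages, key=lambda p: score(p[0], p[1], query), reverse=True)
for title, _ in ranked:
    print(title)
```

Changing any of the weights changes the order of the results, which is one reason different search engines return different rankings for the same search.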

     
  6. Which General Search Engine(s) should I use?
     
  7. The basic search features applicable to most search engines - search engine math
    (Using Google and AltaVista as examples)
    1. Upper case and lower case letters in search words
      • Most search engines no longer differentiate case.
            exception: AltaVista IS case sensitive within quotes and when using Advanced Search.
      • This also applies to FamilySearch, RootsWeb, and Ancestry searches.
      • Avoid case sensitivity by always using lower case characters.

       
    2. Use multiple words to limit the search.
      • Web pages with all the words will be ranked at the top of the results.
        (In Google and AltaVista only pages with all the words are displayed.)
      • This does not apply to FamilySearch, RootsWeb, or Ancestry.

       
    3. Use your browser's search feature to find one of the search terms on a particular page.
      • In Internet Explorer use Edit/Find(on This Page) or <CTRL>f
      • In Netscape or Firefox use Edit/Find in Page or <CTRL>f
      • If you don't find a search term, the page may have been changed since it was indexed, or the 'term' is 'hidden' on the page.
      • Use the Google Toolbar in Internet Explorer to highlight and find search terms.
      • These techniques work on all normal web pages viewed.

       
    4. Forcing the inclusion and exclusion of search terms
      • use a + in front of a search term to make it mandatory
            The word or phrase must be somewhere on the page.
            Google requires a + in front of single-letter words and common words such as the, of, and, or; otherwise they are ignored.
      • use a - in front of a word to exclude pages with that word.
            This also works for phrases with Google.
            The word or phrase may not be anywhere on the page.
            example: +germany +genealogy -history
                +kodak +john +alabama -camera -film
      • There is no space between the + or - and the word
      • These marks may not be used at FamilySearch, Rootsweb, or Ancestry.

       
    5. Using wildcards (*) (stemming) (Not supported by Google)
          gold* will find gold, golden, goldy, etc.
          german* will find german, germans, germany, etc.
          genealog* will find genealogy, genealogist, genealogical, etc.
          useful for finding spelling variations in the ending of a name
          typically may not be used in the middle or beginning of the word
      Wildcards may not be used at FamilySearch or RootsWeb.
      Special rules apply to using wildcards at Ancestry.
      Some database searches, like Message Board Title searches at Ancestry, use stemming without an asterisk.
       
    6. Searching for phrases
          Phrases are defined with quotation marks
          A few search engines assume multiple words are phrases
          examples: "smith, john"     "john a. smith"
      This cannot be used at FamilySearch, Rootsweb, or Ancestry.
       
    7. Looking for ranges of numbers (only in Google)
          Use two periods between numbers to set a range of numbers
          Any number in the range will be searched for.
              1920..1930
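
The search engine math in this section can be imitated with a small page matcher. The sketch below is a simplified illustration only: the function names (`parse_query`, `term_matches`, `page_matches`) and the sample page are made up, phrases are matched as plain text, and no real search engine is claimed to work this way. It handles + and - signs, a trailing * wildcard, quoted phrases, and number ranges written with two periods.

```python
import re

def parse_query(query):
    """Split a query into (sign, term) pairs; a term is a word or "quoted phrase"."""
    return re.findall(r'([+-]?)("[^"]+"|\S+)', query.lower())

def term_matches(term, text, words):
    """Check one term against a page given as lower-case text and word list."""
    if term.startswith('"'):                     # "phrase search"
        return term.strip('"') in text
    range_match = re.fullmatch(r"(\d+)\.\.(\d+)", term)
    if range_match:                              # number range, e.g. 1920..1930
        low, high = int(range_match.group(1)), int(range_match.group(2))
        return any(w.isdigit() and low <= int(w) <= high for w in words)
    if term.endswith("*"):                       # trailing wildcard, e.g. german*
        return any(w.startswith(term[:-1]) for w in words)
    return term in words                         # ordinary single word

def page_matches(query, page_text):
    """True if the page has every wanted term and none of the excluded ones."""
    text = page_text.lower()                     # ignore upper/lower case
    words = re.findall(r"[a-z0-9]+", text)
    for sign, term in parse_query(query):
        found = term_matches(term, text, words)
        if sign == "-" and found:                # an excluded term is present
            return False
        if sign != "-" and not found:            # a wanted term is missing
            return False
    return True

# A hypothetical page used only for this example.
page = "John A. Smith, born 1925 in Alabama, worked for Kodak in Germany."
print(page_matches("+kodak +john +alabama -camera -film", page))
print(page_matches("german* 1920..1930", page))
print(page_matches('"john a. smith" -alabama', page))
```

The first two searches match the sample page; the third does not, because the page contains the excluded word alabama.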

     
  8. Each search engine has an advanced search page with additional features
    1. Features on the Google Advanced search page include
      • Most of the above features
      • language, file format, date, where on page, specific domains
    2. Features on the AltaVista Advanced search page
      • Most of the above features
      • date, file type, location, boolean searching

     
  9. Additional specialized search commands for the basic search page
    1. Search within the web page title
          Google: intitle, allintitle
          AltaVista: title
          example: intitle:text
    2. At a web site (limit search results to a particular web site)
          Google: site
          AltaVista: host, domain
          example: host:byu.edu, domain:uk
    3. URL search (looks for pages whose web address includes the term)
          Google: inurl, allinurl
          AltaVista: url
          example: inurl:usgenweb
    4. dictionary
          Google only: define
          example: define:scurvy
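
If you run the same kind of search often, the specialized commands above can be assembled into a search address with a short script. This is a minimal sketch: the function name `google_search_url` and the sample search are made up for illustration, and it assumes Google accepts searches at `https://www.google.com/search` with the search text in a `q` parameter.

```python
from urllib.parse import urlencode

def google_search_url(terms, site=None, intitle=None, inurl=None):
    """Assemble search words and optional commands into a Google search address."""
    parts = list(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")   # word must be in the page title
    if inurl:
        parts.append(f"inurl:{inurl}")       # word must be in the page address
    if site:
        parts.append(f"site:{site}")         # limit results to one web site
    query = " ".join(parts)
    # urlencode escapes spaces and colons so the address is valid.
    return "https://www.google.com/search?" + urlencode({"q": query})

url = google_search_url(["smith", "genealogy"], site="rootsweb.com", intitle="alabama")
print(url)
```

Typing `smith genealogy intitle:alabama site:rootsweb.com` into the Google search box is the equivalent manual search.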

     
  10. Advanced Searching with Boolean Operands (commands)
  11. Other common features on advanced search screens
     
  12. Special Features at Google and AltaVista
     
  13. Common Rules for effective searching at web sites like FamilySearch, RootsWeb and Ancestry
     
  14. Learn More About Using and Evaluating Search Engines at:

Summary

    Search engines provide a powerful method to find specific information on the ever changing World Wide Web. Learning to use a global or site specific search engine effectively will decrease the amount of time you spend looking for family history information.



