Section 1 - Definitions and Abbreviations
Section 2 - Introduction
Section 3 - Requirements
Section 4 - Registration
Section 5 - Installation
Section 6 - Logfile Format
Section 7 - Configuration
Section 8 - Running Statbot
Section 9 - Customizing Statbot
Section 10 - Interpreting Page Statistics
Section 11 - Advanced Topics
Section 12 - Common Questions and Problems
Appendix I - Quick Start
Appendix II - Error Messages
Appendix III - Acknowledgments
Section 1 - Definitions and Abbreviations
The following definitions and abbreviations are used throughout this document:
Browser
A program used to display pages written in HTML that are accessed using the
World Wide Web. Two of the most popular browsers are Mosaic and Netscape.
CGI
Abbreviation for 'Common Gateway Interface'. The
Common Gateway Interface is a standard that lets a World Wide Web page
server execute programs or scripts in response to information provided by
users. CGI programs are commonly used to handle forms processing and Email
requests.
HTML
Abbreviation for 'HyperText Markup Language'.
The text language used to create special documents with information tags
embedded in the body of the document that describe how the document should
be displayed and how different documents are linked together.
HTTPD
Abbreviation for 'HyperText Transfer Protocol
Daemon'. This program runs on the host and is responsible for serving
WWW pages and processing WWW transactions.
Host
The computer system that provides the User with Internet connectivity. Hosts
come in a wide variety of sizes and operating systems.
WWW
Abbreviation for the 'World Wide Web'.
Section 2 - Introduction
This document describes Statbot, a WWW utility that gathers, maintains, and displays page access statistics. It has been designed to be easy to install and configure, with no prior programming experience required. The User doesn't need to know how to write Perl scripts, program in 'C', or have access to the CGI. The User simply edits a configuration file and runs Statbot from their own local directory.
Statbot is a passive statistics-gathering utility. It runs outside the normal HTTPD environment and does not need to be installed to run under the CGI. This feature solves the problems of Host security, program maintenance, and the need to have super-user access.
Statbot supports several different Host configurations:
Statbot gathers the information it uses by reading the Host's logfiles. These logfiles are updated by the Host each time a request for a WWW page occurs. The logfiles contain information regarding the time and date of access, the identification of the site making the access, file(s) accessed, and other pertinent information.
Because Host logfiles can become very large, old logs are routinely removed from the system. This can cause problems for users who want to gather statistics for accesses spanning more than just a few weeks. Statbot gets around this limitation by maintaining its own site information database. As it scans the Host's logfiles, any new references of interest to Statbot are added to the local database. When the Host's logfiles are deleted, Statbot's own database remains intact.
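The bookkeeping described above can be pictured with a short sketch. The Python below is only an illustration, not Statbot's actual code or database format: the dictionary layout, the merge_entries name, and the rule of skipping anything not newer than a site's last recorded access are all assumptions made for the example.

```python
from datetime import datetime

def merge_entries(database, entries):
    """Add new (site, timestamp) pairs to a site -> timestamps mapping.

    Hypothetical layout: 'database' maps a site name to a sorted list of
    access times. Entries that are not newer than the last recorded access
    for that site are skipped, so re-scanning the same logfile does not
    count an access twice.
    """
    added = 0
    for site, when in sorted(entries, key=lambda e: e[1]):
        seen = database.setdefault(site, [])
        if seen and when <= seen[-1]:
            continue  # already recorded during an earlier scan
        seen.append(when)
        added += 1
    return added
```

Because the mapping is kept separately from the Host's logfiles, removing an old logfile leaves the recorded accesses untouched.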
Statbot displays the statistics that it gathers by automatically creating an HTML-compliant 'stats' page that can be displayed by any browser or 'linked to' by other HTML documents on the WWW. It is expected that the User will create a link in their home page to the 'stats' page. When someone is viewing the User's home page, all they need to do to see the page statistics is to select the link.
The format of the 'stats' page as well as its content can be customized by the User without the need for code modifications and re-compiling. A configuration file is used by Statbot as a reference in building the 'stats' page. To customize the page, the User simply changes the appropriate entries in the configuration file. The next time Statbot is run, the new changes are added immediately.
The rest of this document describes the installation, configuration, operation and maintenance of Statbot. This manual provides complete information on Statbot, so some sections may not be of interest to all users. As you use Statbot and become more familiar with how it works, the information here will help you get the most out of the utility.
Section 3 - Requirements
The requirements for installing, configuring, and running Statbot are described below:
Host Requirements
Statbot must be run from the Host that is responsible for serving the
User's WWW pages. This is usually the system on which the User's login
account is found. There are a few situations where WWW pages are uploaded to the
Host and the User does not have an account on the system. In this case, the
User will need to contact Technical Support at the Host and make
arrangements to have Statbot installed and configured for them.
Logfile Permissions
The Host's logfile directory and the logfiles found inside must be
"readable" by the User. If the logfiles can't be read by the User, the User
will need to contact Technical Support at the Host to have the permissions
changed.
Text Editor
The User must have access to a text editor of some type. This is needed
for the User to make changes to the configuration file.
Section 4 - Registration
Statbot is shareware. Try it for 30 days; if you find it useful and wish to continue using it, you should register your copy.
When you register your copy of Statbot, you are automatically entitled to many benefits and free services. Most of these services are found in The Penthouse, a private area that is part of Club Statbot, the primary support site. All registered users will receive a free account and password that will give them access to this area.
The Penthouse will contain items not available to non-registered users:
Individual Registration - $10
To register your personal copy of Statbot, fill out
this form,
use your Browser to print it, and send it along with $10 to the address
provided. As soon as your registration has been processed, you will receive
a welcome message that contains your unique login name and password for
access to The Penthouse.
Business Registration - $20
To register your business/commercial copy of Statbot, fill out
this form,
use your Browser to print it, and send it along with $20 to the address
provided. As soon as your registration has been processed, you will receive
a welcome message that contains your unique login name and password for
access to The Penthouse.
Quantity discounts and site licenses are available. Charitable organizations may request a free license and registration. Contact the author at one of the addresses below for more information:
Dave Tubbs
Email: dtubbs@xmission.com
Web: http://www.xmission.com/~dtubbs/
Section 5 - Installation
The process of installing Statbot is very simple:
For DOS Hosts:
> pkunzip -d sb200dos.zip
For Sun Hosts running 4.1.X:
% uncompress sb200sun.tar.Z
% tar -xvf sb200sun.tar
For Sun Hosts running Solaris 2.X:
% uncompress sb200sol.tar.Z
% tar -xvf sb200sol.tar
statbot - Statbot executable binary.
statbot.cfg - Statbot configuration file.
country.txt - Country code file.
readme.1st - Last-minute information and installation notes.
legal.txt - Legal notices.
register.txt - Statbot registration form.
open.txt - Example text file.
close.txt - Example text file.
ps.gif - Example transparent GIF file (small version).
psh.gif - Example transparent GIF file (large version).
Section 6 - Logfile Format
Before Statbot can be configured, a brief introduction to the Host logfile format will help the User understand the issues involved in properly setting up their configuration file.
The Host logfile is a text file that contains one log entry per line. The log entry consists of 10 fields separated by space codes. The following line is an example of a typical log entry:
slc6.xmission.com - - [04/Feb/1995:04:58:28 -0700] "GET /~dtubbs/ HTTP/1.0" 200 4502
In the example above, the log entry is broken down into the 10 fields in the following manner:
Remote Name
This field contains the name of the site making access. In this example, the
name of the site is slc6.xmission.com.
Remote Logname
This field contains the login name of the user who owns the account making
access. This is one of two fields in the logfile that can be enabled or
disabled by the Host. In this example it is disabled, so it is represented
by the '-' place holder. This field is disabled on most systems because the
information can be used to gain unauthorized access.
User
This field contains the full name of the user who owns the account making
access. This is the other field in the logfile that can be enabled or
disabled by the Host. In this example it is disabled, so it is represented
by the '-' place holder. This field is disabled on most systems because the
information can be used to gain unauthorized access.
Date and Time
This field contains the time and date of the access. In this example, the
access occurred on February 4, 1995 at 4:58:28 in the morning. The time
will be displayed in 24 hour time format.
GMT Offset
This field contains the signed offset from Greenwich Mean Time, which is the
international time reference. In this example, the Date and Time information
in the previous field is 7 hours and 0 minutes earlier than GMT.
Operation
This field contains the type of operation requested. For WWW page accesses,
this field will always be GET.
File
This field contains the path and filename of the WWW page being accessed.
There are three types of path/filename combinations:
Server Protocol
This field contains the type of protocol used to access the page. In this
example, HTTP version 1.0 protocol was used.
Status
This field contains the status code generated during the access. In this
example, the value 200 was generated.
File Size
This field contains the size of the file accessed in bytes. In this example,
the file was 4,502 bytes in size.
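The field breakdown above can be sketched as a small parser. This is a minimal Python illustration, not the way Statbot itself reads logfiles: the pattern, the group names, and the parse_entry function are hypothetical, and the month abbreviation is assumed to be in English (as in the example entry).

```python
import re
from datetime import datetime

# Pattern for the 10-field entry described above. The group names are
# illustrative labels, not identifiers used by Statbot itself.
LOG_ENTRY = re.compile(
    r'(?P<remote_name>\S+) (?P<remote_logname>\S+) (?P<user>\S+) '
    r'\[(?P<date_time>[^\s\]]+) (?P<gmt_offset>[+-]\d{4})\] '
    r'"(?P<operation>\S+) (?P<file>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<file_size>\d+)'
)

def parse_entry(line):
    """Split one logfile line into its 10 fields, or return None."""
    match = LOG_ENTRY.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Combine the date/time and GMT offset into one timezone-aware value.
    fields["timestamp"] = datetime.strptime(
        fields["date_time"] + " " + fields["gmt_offset"],
        "%d/%b/%Y:%H:%M:%S %z",
    )
    return fields

example = ('slc6.xmission.com - - [04/Feb/1995:04:58:28 -0700] '
           '"GET /~dtubbs/ HTTP/1.0" 200 4502')
entry = parse_entry(example)
```

For the example entry, entry["remote_name"] holds the site name, entry["file"] holds /~dtubbs/, and entry["timestamp"] carries the offset of 7 hours earlier than GMT.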
In many cases, the Host is configured to access the User's home page using implied path and filenames. These can be difficult to analyze because the User must know how the Host WWW Server has been configured. Here are some examples of home page references using implied path and filenames:
Section 7 - Configuration
Before Statbot can run, it needs to know about its environment: the location of the logfiles, the login name of the user, where to copy the resulting HTML file, etc. This information is conveyed to Statbot by means of a configuration file called statbot.cfg.
The configuration file contains a series of keywords and associated text and works in a manner similar to the DOS and UNIX environment variables. There are several rules that must be followed when creating or editing the configuration file:
When Statbot is executed, the first thing it does is try to locate the configuration file. It does this using the following sequence:
Once Statbot has found the configuration file, it reads the file and extracts all the information it needs to run. It also reads any optional entries and then customizes itself accordingly.
To properly configure Statbot, the following entries in the configuration file must be set:
WWW_LOG_DIRECTORY
This keyword specifies the directory Statbot will search when it reads the
Host logfiles.
WORK_DIRECTORY
This keyword specifies the directory Statbot will use when running. This
directory is where Statbot will keep its site database and where it writes
all of its temporary files. This is also the location where any optional
files must be placed.
HTML_DIRECTORY
This keyword specifies the directory where Statbot will place the resulting
'stats' page when it is finished. This usually references the directory
where the User stores all of their WWW pages. If you do not want the file to
be copied, set this value to the same value associated with the
WORK_DIRECTORY keyword.
WWW_LOG_LIST
This keyword specifies the name(s) of the Host logfiles. Up to 16 logfiles
may be included in the list. On UNIX systems, the default logfile name is
set to 'access_log'. On DOS systems, it is set to 'access.log'.
WWW_LOG_TYPE
This keyword specifies the format of the Host logfiles. At the present time,
there is only one type supported. This value is set to 'default'.
USER_ID
This keyword specifies the login name of you, the User. This entry is used
to scan the logfiles for implied references to your WWW pages.
OUTPUT_HTML
This keyword specifies the name of the 'stats' page. The default name is set
to 'ps.html'.
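Before running Statbot, the User may want to confirm that none of the required entries above has been left out. The Python sketch below shows one way to do that. It is not part of Statbot, and the parsing rules it uses are assumptions inferred from the examples in this manual: a line starting with '*' is commented out, a keyword is an upper-case name followed by ':', and any other line continues the previous value. The function names are hypothetical.

```python
REQUIRED_KEYWORDS = (
    "WWW_LOG_DIRECTORY", "WORK_DIRECTORY", "HTML_DIRECTORY", "WWW_LOG_LIST",
    "WWW_LOG_TYPE", "USER_ID", "OUTPUT_HTML",
)

def read_config(text):
    """Collect 'KEYWORD: value' pairs from configuration file text.

    Assumed rules: a line that starts with '*' is commented out, and a
    keyword is an upper-case name followed by ':'. Any other non-blank
    line is treated as a continuation of the previous value.
    """
    settings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        name, sep, value = line.partition(":")
        if sep and name.replace("_", "").isalpha() and name.isupper():
            current = name
            settings[current] = value.strip()
        elif current is not None:
            settings[current] = (settings[current] + " " + line).strip()
    return settings

def missing_keywords(settings):
    """Return the required keywords that are absent or empty."""
    return [key for key in REQUIRED_KEYWORDS if not settings.get(key)]
```

Reading the text of statbot.cfg with read_config and passing the result to missing_keywords gives an empty list when all seven required entries are present.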
There are several factors involved in choosing a default configuration. Select the one that seems to be the best match with your current Host configuration and try it out. If the results aren't what you expected, try one of the other configurations. Placeholder entries such as yourLoginName indicate text that the User should change in the configuration file.
Single User
If you are running your own WWW server on a workstation or personal
computer, use this configuration (note the space code after the '/'):
USER_ID: null
WWW_PAGE_LIST: "/ " , "My Home Page Title"
Internet Host/Personal Account
USER_ID: yourLoginName
*WWW_PAGE_LIST: commented out (note the '*' at the first of the line)
Internet Host/Page Server
USER_ID: null
WWW_PAGE_LIST: yourDir/yourHomePageName , "My Home Page Title"

USER_ID: null
WWW_PAGE_LIST:
dir/user1HomePageName , "User1 Home Page Title",
dir/user2HomePageName , "User2 Home Page Title",
.
. (other user entries)
.
dir/userNHomePageName , "UserN Home Page Title"
Section 8 - Running Statbot
Running Statbot is as simple as typing statbot on the command line, followed by 'return'. In normal execution, there are no command-line parameters that need to be entered. All important information resides in the configuration file that Statbot locates and loads when it first starts.
When Statbot is executed, a number of user messages are displayed to inform the user about Statbot's progress:
Loading the configuration...
Statbot has found the configuration file and is loading it.
Checking for another running copy...
Statbot is checking to see if another copy is running.
Loading the site database...
Statbot is loading the local site database file.
Processing new logfile entries...
Statbot is scanning all of the specified logfiles for new entries to be
added to the local database. The number of entries will be dynamically
displayed on the command line as they are processed.
Saving the new site database...
Statbot is saving a new copy of the local site database file.
Building HTML page...
Statbot has started the process of 'stats' page construction.
Copying HTML page to "/dir/dir/ ... /dir/outputfile"...
Statbot is copying the resulting HTML document to the user-specified
directory.
Once Statbot has completed its processing, it displays the new page access statistics. These are the same totals that are included in the HTML document. The HTML-compliant page that Statbot generated will have been copied into the user-specified directory and should be available for access.
Section 9 - Customizing Statbot
The basic Statbot statistics report is fairly common; nothing but a few digits indicating the number of accesses for the day. One of Statbot's main features is the power and flexibility of custom report generation. By enabling optional report types, the User can add interesting information to their page statistics report:
The following keywords are all optional. They may be enabled by including them in the configuration file, disabled by commenting out the line(s) containing them, or omitted from the configuration file completely.
COUNTRY_ID
This keyword contains a 2-character country identification found in the country.txt file, which is located in the work directory. This keyword is used to suppress the addition of a single country name to the visitor lists.
For example, if a User lives in the United Kingdom and wishes to include the country extensions, they would enable the COUNTRY_NAMES keyword. This would display all country origins (including the United Kingdom). If the User sets COUNTRY_ID to UK, all country extensions would be displayed except for the United Kingdom.
If this keyword is enabled, the following keywords must also be included:
DAILY_SORT_KEY
This keyword specifies the type of sorting to be done. If the value is set to alphanumeric, the list will be sorted alphabetically. If the value is set to date, the list will be sorted by the access time and date. If the value is set to visits, the list will be sorted by the number of visits the site has made.
DAILY_SORT_ORDER
This keyword specifies the direction of sorting. If the value is set to forward, the sort will be done in ascending order. For example, an alphabetical list would start with Adams and end with Williams. If the value is set to reverse, the sort will be done in descending order - the list would start with Williams and end with Adams.
If this keyword is enabled, the following keywords must also be included:
FULL_SORT_KEY
This keyword specifies the type of sorting to be done. If the value is set to alphanumeric, the list will be sorted alphabetically. If the value is set to date, the list will be sorted by the access time and date. If the value is set to visits, the list will be sorted by the number of visits the site has made.
FULL_SORT_ORDER
This keyword specifies the direction of sorting. If the value is set to forward, the sort will be done in ascending order. For example, an alphabetical list would start with Adams and end with Williams. If the value is set to reverse, the sort will be done in descending order - the list would start with Williams and end with Adams.
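The effect of the sort key and sort order values can be illustrated with a short Python sketch. The visitor records, the site names, and the sort_visitors function are made up for this example; only the value names (alphanumeric, date, visits, forward, reverse) come from the keyword descriptions above.

```python
from datetime import datetime

# Hypothetical visitor records: (site name, last access, number of visits).
visitors = [
    ("williams.example.com", datetime(1995, 2, 4, 4, 58), 3),
    ("adams.example.com", datetime(1995, 2, 6, 9, 30), 1),
    ("miller.example.com", datetime(1995, 2, 5, 12, 0), 7),
]

# One extraction function per sort key value.
SORT_KEYS = {
    "alphanumeric": lambda v: v[0],
    "date": lambda v: v[1],
    "visits": lambda v: v[2],
}

def sort_visitors(records, sort_key, sort_order):
    """Order records as the sort key and sort order values describe."""
    return sorted(records, key=SORT_KEYS[sort_key],
                  reverse=(sort_order == "reverse"))

names = [v[0] for v in sort_visitors(visitors, "alphanumeric", "forward")]
```

With alphanumeric and forward, the list starts with the adams site and ends with the williams site; with visits and reverse, the most frequent visitor comes first.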
WWW_PAGE_LIST:
dtubbs/ps.html,"Page Statistics",
george/cs.html,"Club Statbot",
dtubbs/db.html,"<A HREF=\"db.html\">Downloadable
Binaries</A>"
From the configuration file text above, the following entries would be made in the 'stats' page:
Individual Web Page Visits:
Page Statistics: 24
Club Statbot: 12
Downloadable Binaries: 28
Make note of the fact that a hypertext link was included as part of the text in the 'Downloadable Binaries' entry, and that the \" escape sequence was used to format it properly.
BACKGROUND_TEXT: mybkgnd.gif
Background files can also be in directories other than WORK_DIRECTORY. Just use a relative path. The following example shows a relative path:
BACKGROUND_TEXT: ../images/mybkgnd.gif
To add the document attributes, you must build a complex text string. This includes using the \" sequence to insert the double-quote character. Here is an example:
BACKGROUND_TEXT: "mybkgnd.gif text=\"#ffffff\" link=\"#ff0000\" vlink=\"#00ff00\" alink=\"#00ffff\""
Note that each time the string required an embedded " character, the escape sequence \" was used. Also notice that the entire string is enclosed by double-quotes. All the document attributes may be used; the only requirement is that the entire HTML line must be contained on a single line in the statbot.cfg file. Line-wrapping is not allowed.
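To see what text a quoted entry stands for, the quoting rules described above can be applied by hand or with a few lines of Python. The sketch below is an assumption about how such a value might be decoded, not Statbot's own routine, and decode_value is a hypothetical name.

```python
def decode_value(raw):
    """Remove the enclosing double-quotes and unescape embedded ones."""
    text = raw.strip()
    # Drop the outer pair of double-quotes, if the value is quoted.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        text = text[1:-1]
    # Each backslash-quote pair stands for one literal double-quote.
    return text.replace('\\"', '"')
```

An unquoted value such as mybkgnd.gif passes through unchanged, while a quoted value loses its outer quotes and has each \" turned back into a plain double-quote character.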
The following example would exclude my site (XMission):
WWW_EXCLUDE_LIST: xmission.com
The following example would exclude my site, and anyone from Netscape:
WWW_EXCLUDE_LIST: xmission.com,netscape.com
The following example would exclude the education and military domains:
WWW_EXCLUDE_LIST: .edu,.mil
The following example would exclude everyone above, as well as any site from the United Kingdom (not that I've got anything against them!):
WWW_EXCLUDE_LIST: xmission.com,netscape.com,.edu,.mil,.uk
There is no limit on the number of sites, groups, or domains that can be listed under the WWW_EXCLUDE_LIST keyword.
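The exclusion examples above suggest that each entry is compared against the end of the visiting site's name. The Python sketch below illustrates that idea. The matching rule (a case-insensitive suffix match on whole name components) is an assumption, not a description of Statbot's actual comparison, and is_excluded is a hypothetical name.

```python
def is_excluded(site, exclude_list):
    """Return True if the site name ends with any entry in the list.

    Assumption: entries are compared as case-insensitive suffixes, so
    'xmission.com' matches 'slc6.xmission.com' and '.edu' matches any
    name in the education domain.
    """
    site = site.lower()
    for entry in exclude_list:
        entry = entry.strip().lower()
        if not entry:
            continue
        # Match only on a component boundary, so that 'xmission.com'
        # does not also exclude a name like 'notxmission.com'.
        suffix = entry if entry.startswith(".") else "." + entry
        if site == entry or site.endswith(suffix):
            return True
    return False

exclude = "xmission.com,netscape.com,.edu,.mil,.uk".split(",")
```

With this list, a visit from slc6.xmission.com or from any site ending in .uk would be left out of the statistics, while a site in another domain would still be counted.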
Section 10 - Interpreting Page Statistics
Statbot's database contains entries for every site that accesses the User's WWW page(s). It also contains a list of timestamps for each site. This information is processed and data regarding 'unique' and 'total' accesses is determined. The difference between these two types of accesses is best illustrated using an example:
If tonto@lone.ranger.com accessed the User's WWW page at 10am and later that afternoon at 2pm; and sword@zorro.com accessed the User's WWW page at 9:30am, 12:30pm, and then again at 11:30pm, there would be 5 total accesses. There would also be 2 unique accesses, one for each visiting site.
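The same count can be worked through in a few lines of Python. This is only an illustration of the arithmetic in the example above; the list layout is made up for the sketch.

```python
# (visiting site, time of access) pairs taken from the example above.
accesses = [
    ("lone.ranger.com", "10:00"), ("lone.ranger.com", "14:00"),
    ("zorro.com", "09:30"), ("zorro.com", "12:30"), ("zorro.com", "23:30"),
]

# Every access counts toward the total.
total_accesses = len(accesses)
# Each visiting site counts once toward the unique figure.
unique_accesses = len({site for site, _ in accesses})

print(total_accesses, unique_accesses)  # prints: 5 2
```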
Section 11 - Advanced Topics
Running Statbot Automatically
Up until now, you have been running Statbot manually - you type
statbot on the command line, hit return, and watch as Statbot gathers
the WWW statistics. At times, it can become inconvenient to run Statbot
manually, and it makes sense to let the Operating System automatically execute
it.
Under UNIX, the two programs of choice are at and cron. The 'at' command allows a single, one-time execution of any UNIX command at a specified time or date. The 'cron' command is a program scheduler that allows the User to specify the time(s) during the day and day(s) of the week for program execution. For information on these commands, refer to your Host 'man' pages or reference manuals.
When using cron to invoke Statbot automatically, the configuration file MUST be in the User's home directory. Because of the environment that cron establishes before executing the script, this is the only location in which Statbot will be able to find the file.
Section 12 - Common Questions and Problems
"I want to use Statbot, but I can't find the binary archive for my particular computer system. Where can I get it?"
New binary archives are added as soon as they become available. If the one you need is not found in the list, contact the author (especially if you know software).
"The first time I run Statbot, it seems to take forever. I give up after 5 minutes. What is happening?"
The first time Statbot is run, it must parse all Host logfiles and create a new database file. It is not uncommon to have Host logfiles greater than 50 Megabytes in size (or roughly 1,500,000 logfile entries). Depending on your system's load and the size and quantity of the logfiles, it can take Statbot 10 minutes or longer to run the first time. After this, the speed-up algorithms kick in and Statbot only requires a few seconds to update the database.
"Statbot seems to run correctly, but reports zero accesses. I know people have visited my home page. Why is this?"
This is the most common problem new Statbot users encounter, and it is usually due to missing information in the configuration file. Refer back to the Configuration section for more information, and pay particular attention to the WWW_PAGE_LIST keyword. If you've already done this, check out the online support areas in Club Statbot.
"I don't want to count visits from my own site in the statistics. Is there any way to exclude my own site?"
The WWW_EXCLUDE_LIST keyword was added for this very purpose. Refer to the Customizing Statbot section above for all the details.
"How many WWW pages can Statbot monitor?"
Statbot has the capability of monitoring an unlimited number of pages. The only real limitation is the size of your disk and the amount of system memory available.
"How many hits/day can Statbot process? How fast does it run?"
Statbot is running on sites that are visited more than 2,000,000 times each day. It can generate statistics on-the-fly due to the intelligent processing algorithms that are used. When Statbot is run once every 5 or 10 minutes, it takes less than 10 seconds each time to update the statistics.
"I really like the page title GIF files that came with Statbot. Are any more of them available?"
Dozens of 2D and 3D GIF files can be found in The Penthouse. They are only available to registered Statbot users.
"Is source available? If so, where is it located?"
The very latest ANSI 'C' source code to Statbot can be found in The Penthouse. It is only available to registered Statbot users.
"I tried to visit The Penthouse, but I couldn't get in. Why not?"
The Penthouse is a protected WWW page that requires a username and password to gain access to it. Only registered users are given a password. This page contains the Beta release of the next version of Statbot, as well as free software utilities, special offers, dozens of free GIF's designed especially for Statbot users, and other assorted goodies.
"What does the future hold for Statbot?"
The next release of Statbot will include full support for auto-generated GIF images. These GIF's will show your statistics graphically using a variety of bar charts, pie charts, line graphs, and other visual methods. The alpha release will be online in September, and the beta-release will be available to all registered users before the end of October.
After the next release, the long-awaited and often-delayed database utilities will be finished. These utilities allow you to access the Statbot database and extract information such as visitor lists. There will also be ways to edit the database. For example, the User could delete all entries prior to a given date.
"I still can't get Statbot to work properly - where do I get technical support?"
The first place to check for help would be Club Statbot. There will be several online support areas devoted to configuration issues, bug lists, and last-minute announcements. If the information found here doesn't help, contact the author at the following Email address or use the Email form found on the WWW page:
Dave Tubbs
Email: dtubbs@xmission.com
Web: http://www.xmission.com/~dtubbs/
Appendix I - Quick Start
This procedure is designed to help experienced users install and configure Statbot as quickly as possible. Several 'default' conditions are assumed, so some of the more interesting options may not be enabled after this 'Quick Start' process is completed. Once Statbot has been installed and is running, it is strongly recommended that the User read the rest of the manual, particularly the Customizing Statbot and Advanced Topics sections.
Follow these instructions step-by-step and Statbot should be up and running in just a few minutes:
Appendix II - Error Messages
***** Unable to locate "statbot.cfg"
This message is displayed if Statbot is unable to locate the configuration
file. Refer to the Configuration section for information regarding
the proper location of this file.
***** Unable to load the parameters...
This message is displayed if an error occurred while Statbot was opening or
reading the configuration file. This error occurs if the formatting or text
content of the file is incorrect. It can also occur if the file permissions
have been set to prevent reading the file. Refer to the Configuration
section for information regarding the proper formatting and text content of
the configuration file.
***** Statbot is already running - TERMINATED!
This message is displayed if Statbot found another Statbot process running
on the system. This message can appear on multitasking operating systems
like UNIX and Windows NT. It can also occur if the previous Statbot process
was prematurely aborted. If you are positive there are no other running
copies of Statbot, delete the 'statbot.pid' file in the work directory and
try again.
***** Unable to load the site database file...
This message is displayed if an error occurred while Statbot was opening or
reading the database file. This error occurs if the database file has been
corrupted or if the WORK_DIRECTORY keyword in the configuration file
is set incorrectly. It can also occur if the file permissions have been set
to prevent reading of the file.
***** Unable to save the site database file...
This message is displayed if an error occurred trying to update the database
file. This error can occur if the WORK_DIRECTORY keyword in the
configuration file is set incorrectly. It can also occur if the file
permissions have been set to prevent writing into the specified directory.
***** Unable to create the HTML page...
This message is displayed if an error occurs while trying to create or write
the HTML output page. This error can occur if the HTML_DIRECTORY
keyword in the configuration file is set incorrectly. It can also occur if
the file permissions have been set to prevent writing into the specified
directory.
Appendix III - Acknowledgments
Lawrence Landweber and the Internet Society for creating and maintaining the international connectivity table from which the 'country.txt' file is derived.