
The ECN No Name Newsletter is no longer being published.
This is an archived issue.
ECN Analog For Web Page Statistics
Kyler Laird
kyler@purdue.edu
Along with the introduction of our new HTTP server, Apache, ECN is
introducing a new Web page statistics program, Analog. Analog is a
replacement for the current statistics gathering tool, Pagecount.
Pagecount was created to satisfy the user need for estimates of how often
individual Web pages were being viewed. In order to function, Pagecount
required installation as part of every page being counted. This application
was a good first generation statistics program; however, it did not offer
enough information to please many users and all too frequently malfunctioned,
resulting in total data loss.
Analog was created at the University of Cambridge Statistical Laboratory by
Stephen Turner. Its HTTP (Web) interface has been modified
for use on the Engineering network so that users of Netscape Version 3.0
or 3.01 can easily perform long-term log analysis.
Differences
Unlike Pagecount, Analog is a log analyzer which works in cooperation with
the HTTP server. This offers several advantages:
- There is no setup for basic Analog usage. As soon as your HTTP server
is running Apache, page-use statistics are gathered automatically.
- Access statistics about collections of pages can be analyzed in a single
query to Analog. Pagecount required logs for individual pages. With Analog,
complex queries can be made.
- There's more information available. Analog analyzes the entire line from
the log file for each access. You can now see, for example, which pages
referred users to your pages.
- Analog's data files are simple compressed text files that are easily
manipulated. Instead of the single large files of Pagecount, Analog uses
gzipped text files to store the HTTP server logs. Each file holds
all of the entries for a single day. This allows efficient access to the
log entries of interest, and it simplifies "trimming" of old logs.
- Analog's data can be stored centrally or in a user-chosen location. Each
host will store its archived logs for a limited time period. Storage life of
this data is dependent on the available space and the level of HTTP activity.
If you only need to look at access statistics for the past week, you will
probably never need to store any of your own data.
If you want to store long-term access data, you may use our HTTP log
extraction utility, which pulls the log entries you specify each day. This
data can be stored (compressed) in any ECN directory where you have write
permissions. Look for more information about this tool in an upcoming issue.
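Because each gzipped file holds one day of entries, selecting a date range amounts to opening only the files for those days. The following Python sketch illustrates the idea. The file naming scheme (YYYYMMDD.gz), the directory argument, and the function names are assumptions made for illustration; the article does not state how ECN names its archived log files.

```python
import gzip
import os
from datetime import date, timedelta


def daily_log_names(first, last):
    """Return one hypothetical file name (YYYYMMDD.gz) per day from first to last."""
    days = (last - first).days
    return [(first + timedelta(days=n)).strftime("%Y%m%d") + ".gz"
            for n in range(days + 1)]


def read_entries(log_dir, first, last):
    """Yield log lines from each daily file in the range, skipping missing days."""
    for name in daily_log_names(first, last):
        path = os.path.join(log_dir, name)
        if not os.path.exists(path):
            continue  # that day was trimmed or never archived
        with gzip.open(path, "rt", encoding="latin-1") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

With this layout, trimming old logs is simply a matter of deleting the files for the days no longer needed.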
Queries
Run Analog on the host serving the pages of interest. For example, to get
statistics about pages served from the host "ce", you would start with the
form at
http://ce.ecn.purdue.edu/tools/analog/
Report Choices
Complete the section by checking the box to the left of each report that you
would like generated. Some reports also have settings which you can change.
In the directory report, for example, you can set the depth to which the
report will print. If you would like an overview of which sections are
getting the most "hits," choose a low number like 1 or 2. If you want to
see specifically which directories are getting hit, choose a higher number.
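To make the effect of the depth setting concrete, here is a small Python sketch that groups request paths by their leading directories. It is only an illustration of the idea, not Analog's own code; the function names are hypothetical, and Analog may label top-level files differently.

```python
from collections import Counter


def directory_at_depth(path, depth):
    """Truncate a request path to its first `depth` directory components."""
    # Drop the final component (the file name, or "" after a trailing slash).
    parts = [p for p in path.split("/")[:-1] if p]
    if not parts:
        return "/"
    return "/" + "/".join(parts[:depth]) + "/"


def count_by_directory(paths, depth):
    """Count hits per directory, reported down to the given depth."""
    return Counter(directory_at_depth(p, depth) for p in paths)
```

At depth 1, requests for "/~me/research/paper.html" and "/~me/index.html" are both counted under "/~me/". At depth 2, the first is counted under "/~me/research/" instead.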
Hit Filter
Here you can specify which accesses you want Analog to use when creating
its statistics.
The "from date" and "to date" fields accept dates of the form "19970428"
(YYYYMMDD) and also relative dates such as "-7" (for "one week ago"). To
analyze last week's data, use a "from date" of "-7" and a "to date" of "-1".
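The two date forms can be illustrated with a short Python sketch that turns either form into a calendar date. It assumes that a relative date counts whole days back from the current day; the function name is hypothetical and this is not Analog's own parsing code.

```python
from datetime import date, datetime, timedelta


def resolve_date(spec, today):
    """Convert "YYYYMMDD" or a relative form such as "-7" into a date."""
    spec = spec.strip()
    if spec.startswith("-"):
        # Relative form: the number of days before today (an assumption).
        return today - timedelta(days=int(spec[1:]))
    return datetime.strptime(spec, "%Y%m%d").date()
```

For example, on 5 May 1997 a "from date" of "-7" and a "to date" of "-1" would cover 28 April through 4 May.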
The log archive directory is used to specify the location of archived log
files. It will be explained more in a later ECN newsletter.
The include paths and exclude paths fields are used to narrow the scope of
the analysis based on the path to the documents (the URLs). To see
statistics for a single user, "me", set the include path to "/~me/*" and
leave the exclude path blank. To exclude a directory such as "icons", set
the exclude path to "*/icons/*". For statistics about a single document,
like the index file for a research project, set the include path to
"/~me/research/". Multiple paths may be given for each field using
comma-separated lists.
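The way the include and exclude fields combine can be sketched in Python as follows. This is an approximation for illustration only: it uses Python's fnmatch wildcards, which also treat "?" and "[" specially and may not match Analog's rules exactly, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def split_patterns(field):
    """Split a comma-separated field into a list of non-empty patterns."""
    return [p.strip() for p in field.split(",") if p.strip()]


def path_selected(path, include="", exclude=""):
    """Return True if the path passes the include list and no exclude matches."""
    includes = split_patterns(include)
    excludes = split_patterns(exclude)
    # A blank include field is taken to mean "include everything".
    if includes and not any(fnmatchcase(path, p) for p in includes):
        return False
    return not any(fnmatchcase(path, p) for p in excludes)
```

With an include path of "/~me/*" and an exclude path of "*/icons/*", a request for "/~me/index.html" is counted, while "/~me/icons/a.gif" is not.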
Layout
The title and URL fields are used only in the title and top heading of the
generated report. They do not affect the content of the report.
Usage
It is likely you will develop frequently used queries. Once you make a query,
store the URL for the resulting report in a bookmark list or HTML page for
easy access. If you want users to see statistics concerning your page, consider
putting a link at the bottom of the page to a specific Analog report.
webmaster@ecn.purdue.edu
Last modified: Monday, 13-Jul-98 07:52:30 EST