AWStats: Overview of Apache/IIS log analysis tools (ZZ)

Document directory
  • Download/install
  • Configuration file naming rules: awstats.sitename.conf
  • Minimal configuration file modification: LogFile SiteDomain LogFormat
  • Log Analysis
  • Statistical Output
  • Automatic Running of log statistics
  • Multi-site log statistics
  • Statistical indicator description
  • Hacking AWStats

Author: chelong Email: chedongATbigfoot.com/chedongATchedong.com

Written: 2003/04
Last updated: 02/22/2006 14:42:55
Feed Back> (Read this before you ask question)

Copyright notice: You may reprint this document freely. When reprinting, you must credit the original source and the author, and include this notice, with a hyperlink to the original:
http://www.chedong.com/tech/awstats.html

Keywords: awstats web log analysis apache iis log analysis open source

You do not need to read the whole document patiently; brief installation instructions follow:
Install
====
Download the installation package from http://sourceforge.net/projects/awstats/
GNU/Linux: tar zxf awstats-version.tgz
By default, the awstats scripts and static files are stored in the wwwroot directory. Deploy the awstats.pl program to /home/apache/cgi-bin/awstats/:
mv awstats-version/wwwroot/cgi-bin /path/to/apache/cgi-bin/awstats
# Copy the icon directory and the other static file directories to the web server's HTML publishing directory: /home/apache/htdocs/
More batch update scripts are in the tools directory; they can be placed in the cgi-bin/awstats/ directory.
Updated definitions of the major Chinese search engines: http://www.chedong.com/tech/search_engines.pm

Configuration
====
Rename the default awstats.model.conf to common.conf.
In it, set:
LoadPlugin="decodeutfkeys"

Create a data directory under the awstats directory for the statistical data output.

Set the site configuration file as follows:
Include "common.conf"
LogFile="/home/apache/logs/access_log.%YYYY-24%MM-24%DD-24"
SiteDomain="www.chedong.com"
HostAliases="chedong.com"
DefaultFile="index.html"
DirData="/home/cgi-bin/awstats/data/"

Summary: An introduction to AWStats and some configuration improvements. Happily, starting with AWStats 6.3, Chinese users basically only need to enable LoadPlugin="decodeutfkeys" in the configuration file, and statistics for Chinese search engines then work with essentially no problems. The stock definitions list only three engines under "# minor Chinese search engines": 'baidu\.com', 'search\.sina\.com' and 'search\.sohu\.com'. A patch covering the major Chinese search engines and portal websites is available (unpack it and overwrite the files in the lib/ directory under the original program directory).

A log statistics system plays an important role in analyzing the behavior of a site's users. Keyword statistics for visits arriving from search engines are an especially effective data source for that analysis. As the Internet has developed over the years, web log statistics tools have become mature and feature-rich. Many of them are open source, and AWStats is an excellent one.

AWStats: Advanced Web Statistics

AWStats is a fast-growing Perl-based web log analysis tool on SourceForge. Compared with Webalizer, another excellent open-source log analysis tool, AWStats has the following advantages:

  1. User-friendly interface: the interface language is selected automatically from the browser's language setting (a Simplified Chinese version is included).
    Sample output: http://www.chedong.com/cgi-bin/awstats/awstats.pl?config=chedong
  2. Perl-based: it runs on GNU/Linux and on Windows (once ActivePerl is installed), and it analyzes the Apache combined log format directly as well as the IIS format (after the IIS log fields are adjusted). Webalizer also has a Windows version, but it is poorly maintained.
    With AWStats, a single system can therefore produce statistics for a site's different web servers: GNU/Linux/Apache and Windows/IIS.
  3. Efficient: AWStats reports far more statistics than Webalizer, yet still runs at about 1/3 of Webalizer's speed. That is fast enough for sites with millions of visits per day.
  4. Easy to configure and customize: the defaults are flexible but sensible. No more than 3 or 4 default settings need to be changed before the first run, and many plug-ins are available for modification and extension.
  5. Focused on human visits: AWStats is designed to count "human visits" precisely, so it filters out the robots of many search engines. Its figures may therefore be lower than those of other log statistics tools. Visits from inside the company can also be excluded with IP address filters.
  6. Extended parameter statistics: the ExtraXXXX family of options generates parameter analysis for specific applications, which is very useful for product analysis.

For a comparison with Webalizer and Analog, see:
http://awstats.sourceforge.net/#COMPARISON

AWStats installation notes

AWStats runs in two stages:

  1. Log analysis: each run archives the log statistics into an AWStats database (plain text);
  2. Output, in one of two forms:
    • a CGI program reads the statistics database and renders the report on request;
    • a background script exports the report to static files.

The following are two single-site examples:
one uses CGI-based output on GNU/Linux;
the other uses static page export on Windows 2000.

Download/install

Download the installation package from http://sourceforge.net/projects/awstats/

GNU/Linux: tar zxf awstats-version.tgz
By default, the awstats scripts and static files are stored in the wwwroot directory. Deploy the awstats.pl program to /home/apache/cgi-bin/awstats/:
mv awstats-version/wwwroot/cgi-bin /path/to/apache/cgi-bin/awstats
# Copy the icon directory and the other static file directories to the web server's HTML publishing directory: /home/apache/htdocs/
More batch update scripts are in the tools directory; they can be placed in the cgi-bin/awstats/ directory.

Windows 2000: the script runs in background mode. Unpack the package and move it to the D:/AWStats directory.
Copy the icon directory to the IIS publishing directory: inetpub/icon

Data source: log format and daily rotation rules
  1. For Apache: set the log format to combined. To rotate the logs by day, install the cronolog tool:
    CustomLog "|/usr/local/sbin/cronolog /path/to/apache/logs/access_log.%Y%m%d" combined
    For example: logs/access_log.20030326
    If the logs are compressed, they can be unpacked with gzip -d
  2. For IIS: the default daily log rotation is fine, but the default IIS log fields do not suit AWStats,
    so it is best to clear all log fields and then select exactly the following, in this order:
    • Date: date
    • Time: time
    • Client IP address: c-ip
    • User name: cs-username
    • Method: cs-method
    • URI resource: cs-uri-stem
    • Protocol status: sc-status
    • Bytes sent: sc-bytes
    • Protocol version: cs-version
    • User agent: cs(User-Agent)
    • Referer: cs(Referer)

    Compared with the IIS default settings:
    Removed:

    • Server IP address
    • Server port
    • URI query

    Added:

    • Bytes sent
    • Protocol version
    • Referer

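To make the field order above concrete, here is a minimal Python sketch that splits one log line into those eleven fields. It is only an illustration of the layout, not how AWStats itself parses logs; the function name `parse_iis_line` and the sample line are made up for this example.

```python
# Field order taken from the list above (W3C extended log field names).
IIS_FIELDS = [
    "date", "time", "c-ip", "cs-username", "cs-method", "cs-uri-stem",
    "sc-status", "sc-bytes", "cs-version", "cs(User-Agent)", "cs(Referer)",
]


def parse_iis_line(line):
    """Return a dict for one log line, or None for directives and malformed lines."""
    if line.startswith("#"):  # directive lines such as "#Fields: ..."
        return None
    values = line.split()
    if len(values) != len(IIS_FIELDS):
        return None
    return dict(zip(IIS_FIELDS, values))


# Hypothetical sample line, invented for illustration.
sample = (
    "2003-03-26 15:43:58 123.123.123.123 - GET /index.html 200 192 HTTP/1.1 "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) "
    "http://www.google.com/search?q=chedong"
)
record = parse_iis_line(sample)
print(record["cs-uri-stem"], record["sc-status"], record["cs(Referer)"])
```

IIS writes spaces inside field values as "+", which is why a plain whitespace split is enough for a line with exactly these fields.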
Configuration file naming rules: awstats.sitename.conf

awstats.pl automatically picks the configuration file awstats.sitename.conf that matches the site name.
For example, running ./awstats.pl -config=chedong uses the awstats.chedong.conf configuration file in the same directory;
if -config is not specified, awstats.conf in the current directory or /etc/awstats.conf is used as the default configuration file.
So it is best to rename the default awstats.model.conf to awstats.yoursite.conf, for example awstats.chedong.conf.

For statistics on multiple sites, the configuration file inclusion feature of AWStats (supported from version 5.4) is very useful. Put the shared settings in one file, include it with the Include directive at the top of each site-specific configuration file, and then override individual shared settings with the lines that follow, for example:
Include "common.conf"
LogFile="/path/to/bbs/access_log"
SiteDomain="bbs.chedong.com"

Minimal configuration file modification: LogFile SiteDomain LogFormat

For statistics on Apache logs under GNU/Linux, only two options need to be changed: LogFile and SiteDomain.

  1. GNU/Linux: LogFile="/path/to/apache/logs/access_log.%YYYY-24%MM-24%DD-24"
    Windows 2000: LogFile="d:/iis_logs/W3SV3/ex%YY-24%MM-24%DD-24.log"
    This setting builds the log file name from the year, month and day as they were 24 hours ago;
  2. SiteDomain="www.chedong.com"
    The site name. The default value is empty, and AWStats refuses to run while it is empty;
  3. For IIS statistics, one more option must be changed:
    LogFormat=2
    The default value is 1 (Apache log); 2 is the IIS log format.
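The date tags in LogFile can be illustrated with a short Python sketch. It imitates only the four tags used above (%YYYY-n, %YY-n, %MM-n, %DD-n, where n is a number of hours to subtract); the function name `expand_logfile` is hypothetical, and AWStats itself supports more tags than this.

```python
import re
from datetime import datetime, timedelta

# Tags handled by this sketch, mapped to strftime codes.
_FORMATS = {"YYYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d"}


def expand_logfile(pattern, now):
    """Replace tags such as %YYYY-24 with date parts taken N hours before `now`."""
    def substitute(match):
        tag, hours = match.group(1), int(match.group(2))
        return (now - timedelta(hours=hours)).strftime(_FORMATS[tag])

    # YYYY is listed before YY so that the longer tag is tried first.
    return re.sub(r"%(YYYY|YY|MM|DD)-(\d+)", substitute, pattern)


# A run at 08:10 on 2003-03-27 picks up the log of 2003-03-26.
run_time = datetime(2003, 3, 27, 8, 10)
print(expand_logfile("/path/to/apache/logs/access_log.%YYYY-24%MM-24%DD-24", run_time))
```

Subtracting 24 hours rather than using today's date means a morning run always reads the complete log of the previous day.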

Other notes:
If .swf files are not filtered out, AWStats counts each .swf request as a page view. If the swf files on the site are mainly advertisements, it is best to filter them out.

Log Analysis

./awstats.pl -update -config=sitename -lang=cn
For example: ./awstats.pl -update -config=chedong
This automatically uses the awstats.chedong.conf configuration file.

Statistical Output

GNU/Linux: http://localhost/cgi-bin/awstats.pl?config=chedong
Windows 2000: http://localhost/awstats/awstats.chedong.html

Automatic running of log statistics

On GNU/Linux: crontab -e, run at 08:10 every day
# Update awstats
10 8 * * * (cd /path/to/apache/cgi-bin/awstats/; ./awstats.pl -update -config=chedong)

On Windows 2000: schedule the following to run at 08:10 every day
D:/perl/bin/perl.exe D:/AWStats/tools/awstats_buildstaticpages.pl -update -config=chedong -lang=cn -dir=C:/inetpub/AWStats/ -awstatsprog=D:/AWStats/wwwroot/cgi-bin/awstats.pl

Multi-site log statistics

AWStats comes with a batch tool, tools/awstats_updateall.pl, which walks through all configuration files in a directory and runs the statistics for each of them. The remaining work is therefore mainly log synchronization.

For multiple sites, many configuration options are repeated, and maintaining them separately in every configuration file is tedious. Starting with version 5.4, AWStats supports including one configuration file in another, so the shared settings can go into a general configuration file such as common.conf.

Each site's configuration then only overrides the options that differ from the shared defaults:
awstats.bbs.chedong.conf
Include "chedong.common.conf"
LogFile="/path/to/bbs_log"
SiteDomain="bbs.chedong.com"

awstats.www.chedong.conf
Include "chedong.common.conf"
LogFile="/path/to/www_log"
SiteDomain="www.chedong.com"
HostAliases="chedong.com"
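The include-then-override behavior described above can be modeled in a few lines of Python. This is a deliberately simplified sketch and not the AWStats parser: it assumes one `Name="value"` setting per line, keeps only the last value for each name, and reads included files from a dictionary (`files`, a stand-in for the file system). The file contents and the `DefaultFile="index.php"` override are invented for illustration.

```python
import re


def parse_config(text, files):
    """Parse Name="value" lines; an Include "file" line pulls in that file first."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        include = re.match(r'include\s+"([^"]+)"', line, re.IGNORECASE)
        if include:
            # Settings from the included file become the defaults so far.
            settings.update(parse_config(files[include.group(1)], files))
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        # A later line with the same name replaces the earlier value.
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Hypothetical shared and site-specific files.
files = {"common.conf": 'LogFormat=1\nDefaultFile="index.html"\n'}
site = (
    'Include "common.conf"\n'
    'LogFile="/path/to/bbs_log"\n'
    'SiteDomain="bbs.chedong.com"\n'
    'DefaultFile="index.php"\n'
)
config = parse_config(site, files)
print(config)
```

Because the Include line comes first, everything after it wins over the shared file, which is why the site files above only list what differs.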

Statistical indicator description
  • Visitors: counted by unique visitor IP address; one IP address represents one visitor;
  • Number of visits: a visitor may come several times in one day (for example, once in the morning and once in the afternoon), so visits are counted as the number of distinct IP addresses within a given time window (for example, 1 hour);
  • Pages: the total number of page requests, excluding images, CSS and JavaScript files. If a page uses several frames, each frame counts as one page request;
  • Number of files: the total number of file requests from browser clients, including images, CSS, JavaScript and so on. When a user requests a page that contains images and other resources, the browser sends several file requests, so the number of files is generally far greater than the number of pages;
  • Bytes: the total data traffic sent to clients;
  • Data from the referer: the Referer field in the log records the address the user was on before reaching the requested page. If the user clicked through from a search engine's result page, the log therefore contains the user's query URL on that search engine, and the URL can be parsed to extract the keywords of the query:
    For example:
    15:43:58 123.123.123.123 - GET /index.html 200 192 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) http://www.google.com/search?q=chedong
    AWStats is thorough in its search engine key phrase and keyword statistics: it recognizes a large number of web crawlers from around the world, most mainstream international search engines, and local-language search engines in many regions.

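The indicators above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not AWStats' own logic: the extension list and the engine table are tiny hand-written stand-ins (the "q" and "word" parameters are the ones shown for Google and Baidu in this document), the function names are hypothetical, and the time-window logic for visits is left out.

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative assumptions, far smaller than the real AWStats definitions.
NOT_PAGE = (".gif", ".jpg", ".png", ".css", ".js")
ENGINE_PARAM = {"google.com": "q", "baidu.com": "word"}


def search_keyword(referer):
    """Return the search phrase carried by a search-engine referer, else None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    for domain, param in ENGINE_PARAM.items():
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None


def summarize(records):
    """records: iterable of (ip, url, bytes_sent, referer) tuples."""
    visitors, pages, files, traffic, keywords = set(), 0, 0, 0, {}
    for ip, url, sent, referer in records:
        visitors.add(ip)      # one IP address counts as one visitor
        files += 1            # every request counts as a file
        traffic += sent
        if not url.lower().endswith(NOT_PAGE):
            pages += 1        # images, CSS and JavaScript are not pages
        phrase = search_keyword(referer)
        if phrase:
            keywords[phrase] = keywords.get(phrase, 0) + 1
    return {"visitors": len(visitors), "pages": pages, "files": files,
            "bytes": traffic, "keywords": keywords}


# Hypothetical records, invented for illustration.
records = [
    ("123.123.123.123", "/index.html", 192, "http://www.google.com/search?q=chedong"),
    ("123.123.123.123", "/logo.gif", 1024, "http://www.chedong.com/index.html"),
    ("10.0.0.1", "/tech/awstats.html", 2048, "-"),
]
stats = summarize(records)
print(stats)
```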
Hacking AWStats

IIS patch for GMT: awstats.pl
IIS logs record time in Greenwich Mean Time (GMT), and local time in China is 8 hours ahead of GMT. Converting the times directly with the timezone plug-in reduces performance by 40%, so here is a patch that instead shifts the hour axis of the report to local time:

7696d7695
< my $TIME_ZONE = 8;
7698,7702c7697
< my $ix_local = $ix + $TIME_ZONE;
< if ($ix_local >= 24) {
< $ix_local = $ix_local - 24;
< }
< print "$ix_local\n"; # width=19 instead of 18 to avoid a MacOS browser bug.
---
> print "$ix\n"; # width=19 instead of 18 to avoid a MacOS browser bug.
7708,7712c7703
< my $ix_local = $ix + $TIME_ZONE;
< if ($ix_local >= 24) {
< $ix_local = $ix_local - 24;
< }
< my $hr = $ix_local + 1; if ($hr > 12) { $hr = $hr - 12; }
---
> my $hr = ($ix + 1); if ($hr > 12) { $hr = $hr - 12; }
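
The idea behind the patch, relabeling each GMT hour as the local hour eight positions later and wrapping around at 24, can be sketched in Python. The function name `shift_hours` is hypothetical, and like the patch this sketch only moves hours within the 24 hourly slots; it does not move requests to the neighboring day.

```python
def shift_hours(hourly_counts, offset):
    """Re-index 24 per-hour counters from GMT to local time (GMT + offset)."""
    if len(hourly_counts) != 24:
        raise ValueError("expected 24 hourly counters")
    local = [0] * 24
    for gmt_hour, count in enumerate(hourly_counts):
        # Same wrap-around as the patch: 16 + 8 = 24 becomes hour 0.
        local[(gmt_hour + offset) % 24] = count
    return local


gmt = [0] * 24
gmt[0] = 7    # requests logged at 00:xx GMT
gmt[16] = 5   # requests logged at 16:xx GMT
local = shift_hours(gmt, 8)
print(local[8], local[0])
```

Shifting 24 counters at report time is cheap, whereas converting the timestamp of every log line costs time on each record, which is the trade-off the patch makes.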


Definitions for the major Chinese search engines were added after AWStats 5.5. Here is the complete list after my additions (including the main portal searches and search portals):
62600.
< "baidu\.com","search\.sina\.com","search\.sohu\.com",
---
> "baidu\.com","sina\.com","3721\.com","163\.com","tom\.com","sohu\.com",

153c144
< "baidu\.com","word=","search\.sina\.com","word=","search\.sohu\.com","word=",
---
> "baidu\.com","word=","sina\.com","word=","3721\.com","name=","163\.com","q=","tom\.com","word=","sohu\.com","word=",

250c234
< "baidu\.com","Baidu","search\.sina\.com","Sina","search\.sohu\.com","Sohu",
---
> "baidu\.com","Baidu","sina\.com","Sina","3721\.com","3721","163\.com","NetEase","tom\.com","Tom","sohu\.com","Sohu",

Google's Unicode queries need an additional patch:
With IE on Windows 2000 and later, queries are sent to Google in UTF-8 by default, while most other search engines receive them in the system's local encoding, GB2312. The query URI therefore has to be decoded and, if it turns out to be UTF-8, transcoded to GB2312. Otherwise the same word appears twice in the statistics: once in UTF-8 and once in GB2312.

I added the following function, which decodes Google's UTF-8 characters and also decodes queries of the form "\xc4\xbe\xd7\xd3\xc3\xc0":
use URI::Escape;   # provides uri_unescape
use Encode;        # provides decode and encode

sub Utf8_To_Ascii {
    my $string = shift;
    my $encoding = shift;

    # Change \xc4\xbe\xd7\xd3\xc3\xc0 into %c4%be%d7%d3%c3%c0
    $string =~ s/\\x(\w{2})/%$1/gi;

    # URI unescape
    $string = uri_unescape($string);

    # Transcode only if the bytes form a well-formed UTF-8 sequence
    if ($string =~ m/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])*$/)
    {
        $string = decode("UTF-8", $string);
        $string = encode($encoding, $string);
    }

    # Trim leading and trailing whitespace
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

    # Remove ";" and turn the remaining whitespace back into "+"
    $string =~ s/;+//g;
    $string =~ s/\s+/\+/g;

    # print $string . "\n";
    return $string;
}
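
The same "try UTF-8 first, fall back to the local encoding" decision can be shown in Python. This is a minimal sketch rather than a port of the function above: it returns a decoded string instead of re-encoding to GB2312, and the name `normalize_keyword` and its `fallback` parameter are made up for the example.

```python
from urllib.parse import unquote_to_bytes


def normalize_keyword(raw, fallback="gb2312"):
    """Decode a percent-encoded query value so both encodings give one keyword."""
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        # Strict decoding fails on bytes that are not well-formed UTF-8.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(fallback, errors="replace")
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


utf8_query = "%E4%B8%AD%E6%96%87"   # a two-character Chinese word sent as UTF-8
gb_query = "%D6%D0%CE%C4"           # the same word sent as GB2312
print(normalize_keyword(utf8_query) == normalize_keyword(gb_query))
```

One caveat of this heuristic: a short GB2312 byte sequence can occasionally also be well-formed UTF-8, in which case it would be decoded as UTF-8 by mistake.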

Here are more patches for GOOGLE UTF-8 queries.

Installing the geographic information plug-ins:

GeoIP and Geo::IPfree (awstats 5.5+)
Both GeoIP and Geo::IPfree can be used free of charge, and looking up a visitor's location with them is more accurate and faster than relying on reverse DNS resolution. The GeoIP API and its default database are free; its vendor charges for the data update service. Geo::IPfree publishes not only its code but also its database, so you can customize the data yourself. I once considered building a mapping from Chinese cities to IP addresses.

GeoIP installation:
Download the C library first (GeoIP C). After unpacking it:
% ./configure; make
# make install

Download the Perl library (GeoIP Perl):
% perl Makefile.PL; make
# make install

Geo::IPfree installation:
Download Geo::IPfree
% perl Makefile.PL
% make
# make install

Configuration: enable the GeoIP or Geo::IPfree plug-in in the configuration file.

References:

AWStats
http://awstats.sourceforge.net/

Webalizer
http://www.webalizer.org/

Log analysis tools
http://directory.google.com/Top/Computers/Software/Internet/Site_Management/Log_Analysis/

Commercial log statistics/analysis tools
http://directory.google.com/Top/Computers/Software/Internet/Site_Management/Log_Analysis/Commercial/

Merging the logs of multiple sites:
http://www.chedong.com/tech/rotate_merge_log.html

Log statistics matter a great deal when analyzing the impact of search engines on a site:
http://www.chedong.com/tech/google.html

AWStats also has many contributed plug-ins, including summary reports across multiple sites, IIS log time conversion, URL title mapping and others:
http://awstats.sourceforge.net/awstats_contrib.html
