Full solution for Web server log statistical analysis (2)

Source: Internet
Author: User
Tags: apache, access log, apache log

4.2 Using rotatelogs, provided by Apache, to implement log rotation

Apache can send its logs to another program through a pipe instead of writing them directly to a file, which greatly enhances log-processing capability. The program at the other end of the pipe can be anything, such as a log analyzer or a log compressor. To write logs to a pipe, simply replace the log file name in the configuration with "|program name". For example:


# Compressed logs

CustomLog "|/usr/bin/gzip -c > /var/log/access_log.gz" common
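The piped-compression idea can be checked outside Apache. This sketch (file names are illustrative) streams a couple of lines through gzip -c into a .gz file, exactly as the CustomLog directive above would stream log lines, then reads them back:

```shell
# Stream lines through gzip -c into a compressed file, as Apache would
# through the pipe, then decompress to verify nothing was lost.
printf 'line1\nline2\n' | gzip -c > /tmp/access_log.gz
gzip -dc /tmp/access_log.gz
```

Note that with this setup the file can only be read after decompressing it with gzip -dc or zcat.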



In the same way, you can use the rotatelogs tool that ships with Apache to rotate log files. rotatelogs can rotate logs by time or by size.


CustomLog "|/www/bin/rotatelogs /www/logs/secfocus/access_log 86400" common



In the preceding example, the Apache access log is piped to the rotatelogs program. rotatelogs writes the log to /www/logs/secfocus/access_log and rotates it every 86400 seconds (one day). After rotation, the file is named /www/logs/secfocus/access_log.nnnn, where nnnn is the time at which that log file began. To align the logs with calendar days, the service should be started at midnight, so that each rotation produces a complete one-day log that can be handed to the access-statistics program. A log started at time 0000, for example, is rotated out as access_log.0000.
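The naming rule can be illustrated with a small sketch (this is an illustration of the suffix arithmetic, not rotatelogs itself): the nnnn suffix is the epoch second at which the current rotation period began, i.e. the current time rounded down to a multiple of the rotation interval.

```shell
# Compute the suffix rotatelogs would use for the current rotation period:
# current epoch time rounded down to a multiple of the interval.
interval=86400                     # daily rotation, as in the example
now=$(date +%s)
suffix=$(( now - now % interval )) # start of the current period
echo "access_log.$suffix"
```

This also shows why starting at midnight matters: with a 86400-second interval the period boundaries fall on day boundaries (in UTC), so each rotated file covers one whole day.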

4.3 Using cronolog for log rotation

First, download and install cronolog; the latest version is available from http://www.cronolog.org. After downloading, extract and install it as follows:


[root@mail root]# tar xvfz cronolog-1.6.2.tar.gz

[root@mail root]# cd cronolog-1.6.2

[root@mail cronolog-1.6.2]# ./configure

[root@mail cronolog-1.6.2]# make

[root@mail cronolog-1.6.2]# make check

[root@mail cronolog-1.6.2]# make install



This completes the configuration and installation of cronolog. By default, cronolog is installed under /usr/local/sbin.

Modify the Apache log configuration as follows:


CustomLog "|/usr/local/sbin/cronolog /www/logs/secfocus/%w/access_log" combined



Here %w means that logs are saved in a different directory according to the day of the week, so one week of logs is kept. For log analysis, the log file needs to be copied (or moved, if you do not want to keep a week of logs) to a fixed location every day so that the statistics program can process it. Add a scheduled task for this:


5 0 * * * /bin/mv /www/logs/secfocus/`date -v-1d +\%w`/access_log /www/logs/secfocus/access_log_yesterday
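The command substitution above asks date for yesterday's weekday number, which is the %w directory cronolog wrote to. A small portability sketch (the echoed mv is what the cron job performs; it is only printed here, not executed): BSD date spells yesterday as -v-1d, as in the crontab, while GNU date uses -d yesterday.

```shell
# Yesterday's %w value (weekday number, 0 = Sunday), trying the GNU form
# first and falling back to the BSD form used in the crontab above.
Y=$(date -d yesterday +%w 2>/dev/null || date -v-1d +%w)
echo "mv /www/logs/secfocus/$Y/access_log /www/logs/secfocus/access_log_yesterday"
```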



The log statistics and analysis program can then process the file access_log_yesterday.

Large websites that use load balancing face the additional problem of merging the access logs of multiple servers. In this case, each server cannot simply name its moved log file access_log_yesterday; a server identifier, such as the server's IP address, should be included in the name to distinguish them. You can then run the mirror-and-backup service rsyncd on each server (see "Using rsync for website mirroring and backup", http://www.linuxaid.com.cn/engineer/ideal/article/rsync.htm) and pull each server's daily log file via rsync to a server dedicated to access statistics and analysis, where the files are merged.

Merge the log files of multiple servers, for example log1, log2, and log3, into log_all:


sort -m -t " " -k 4 -o log_all log1 log2 log3

-m uses the merge algorithm (the inputs must already be sorted); -t " " sets the field separator to a space; -k 4 sorts on the fourth field, the timestamp; -o writes the result to the specified file.
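A tiny end-to-end check of the merge (the hostnames h1..h3 and the sample timestamps are made up): three one-line, pre-sorted common-log files are merged on field 4, the [timestamp field.

```shell
# Create three pre-sorted sample logs in common log format and merge them.
cd /tmp
echo 'h1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.0" 200 10' > log1
echo 'h2 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.0" 200 20' > log2
echo 'h3 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.0" 200 30' > log3
sort -m -t ' ' -k 4 -o log_all log1 log2 log3
wc -l < log_all   # 3
```

Since -m only merges already-sorted inputs, it is fast even for large per-server files; each daily file is naturally in time order.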


V. Installation and configuration of the log statistics and analysis program webalizer

Webalizer is an efficient, free web server log analysis program. Its results are HTML files, which can be browsed conveniently through the web server. Many websites on the Internet use webalizer for web server log analysis. Webalizer has the following features:

1. It is written in C and is highly efficient: webalizer can process on the order of 10,000 records per second, so a 40 MB log file takes only about 15 seconds to analyze.

2. webalizer supports the standard Common Logfile Format as well as several Combined Logfile Format variants, which makes it possible to gather statistics on visitors' browsers and operating systems. webalizer also supports the wu-ftpd xferlog format and the squid log file format.

3. Supports command line configuration and configuration files.

4. It supports multiple languages, and you can localize it yourself.

5. Supports multiple platforms, such as UNIX, linux, NT, OS/2, and MacOS.

The first page of the access statistics report generated by webalizer contains tables and bar charts of the average traffic for each month. Clicking on a month leads to detailed statistics for each day of that month.

5.1 Installation

Before installation, make sure the gd library is installed on the system. You can run:

[root@mail root]# rpm -qa | grep gd
gd-devel-1.8.4-4
gdbm-devel-1.8.0-14
gdbm-1.8.0-14
sysklogd-1.4.1-8
gd-1.8.4-4

to confirm that the two rpm packages gd-devel and gd are installed.

You can install webalizer either by downloading the source code or directly using the rpm package.

Installing from rpm is very simple: find the webalizer package on rpmfind.net, download it, and install it with:

rpm -ivh webalizer-2.01_10-1.i386.rpm

To install from source, download the source package from http://www.mrunix.net/webalizer/ and extract it:

tar xvzf webalizer-2.01-10-src.tgz

The extracted directory contains a lang subdirectory holding the language files; there is only a Traditional Chinese version, which you can convert to Simplified Chinese or re-translate yourself. Then enter the extracted directory:

cd webalizer-2.01-10
./configure --with-language=Chinese
make
make install

After compilation, a webalizer executable is installed in the /usr/local/bin/ directory.

5.2 configure and run

To control webalizer running, you can use the configuration file or specify parameters in the command line. The configuration file is simple and flexible, and is suitable for the application environment of automatic web server log statistics and analysis.

The default configuration file of webalizer is /etc/webalizer.conf. When started without the "-f" option, webalizer looks for /etc/webalizer.conf; you can also use "-f" to specify a configuration file (when the server has virtual hosts, you need several different webalizer configuration files, one per virtual host). The options to modify in webalizer.conf are as follows:

LogFile /www/logs/secfocus/access_log

Indicates the path of the log file; webalizer uses this log file as the input for statistical analysis.

OutputDir /www/htdocs/secfocus/usage

Indicates the directory in which the generated statistics reports are saved. An alias was configured earlier so that users can view the reports at http://www.secfocus.com/usage/.

HostName www.secfocus.com

Indicates the host name, which is referenced in the statistical report.

The other options need no modification. After editing the configuration file, you need to run webalizer periodically so that it generates the daily statistics.
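Putting the three options together, a minimal configuration file for this setup (all paths and the host name are the examples used in this article) can be written out and inspected as follows; in production it would live at /etc/webalizer.conf or a per-virtual-host file:

```shell
# Write a minimal webalizer configuration combining the three options
# discussed above; /tmp is used here only for illustration.
cat > /tmp/webalizer.conf <<'EOF'
LogFile   /www/logs/secfocus/access_log
OutputDir /www/htdocs/secfocus/usage
HostName  www.secfocus.com
EOF
cat /tmp/webalizer.conf
```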

Run crontab -e as root to enter the scheduled-task editor and add the following tasks:

5 0 * * * /usr/local/bin/webalizer -f /etc/secfocus.webalizer.conf
15 0 * * * /usr/local/bin/webalizer -f /etc/tomorrowtel.webalizer.conf

This assumes the system runs two virtual hosts, with log-analysis configuration files secfocus.webalizer.conf and tomorrowtel.webalizer.conf respectively, so that the secfocus logs are analyzed at 00:05 and the tomorrowtel logs at 00:15 every day.

The statistics for each day can then be viewed the following day.

6. Protect log statistical analysis reports from unauthorized user access

We certainly do not want our website access statistics to be freely browsed by others, so the usage directory must be protected to allow access only by legitimate users. Apache's built-in basic authentication can be used here. Once it is configured, a username and password must be supplied before the page at this address can be accessed:

1. Prerequisites

In the Apache configuration file, make sure the following is set for the directory "/":

DocumentRoot /www/htdocs/secfocus/
AccessFileName .htaccess
AllowOverride All

2. Requirements

Requirement: restrict access to http://www.secfocus.com/usage/ so that a username and password are required. Set the user to "admin" and the password to "12345678".

3. Use htpasswd to create a User File

htpasswd -c /www/.htpasswd admin

The program prompts for the password of user "admin"; enter "12345678" twice to confirm, and the entry takes effect.
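If you want to script this step rather than answer the interactive prompt, an htpasswd-compatible entry can also be generated with openssl (assuming openssl is installed): its -apr1 scheme is the Apache MD5 format that AuthUserFile accepts. The /tmp path here is only for illustration; the article's file is /www/.htpasswd.

```shell
# Build an AuthUserFile entry for user admin, password 12345678, using
# openssl's Apache-compatible apr1 MD5 hash, without any prompt.
HASH=$(openssl passwd -apr1 12345678)
echo "admin:$HASH" > /tmp/.htpasswd
cat /tmp/.htpasswd
```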

4. Create a .htaccess file

Use vi to create a file named .htaccess in the /www/htdocs/secfocus/usage/ directory (the OutputDir configured above) and write the following lines into it:

AuthName "admin-only"
AuthType Basic
AuthUserFile /www/.htpasswd
require user admin

5. Test

Now access http://www.secfocus.com/usage/ through a browser; a dialog asks for the username and password. Enter them to view the log statistics report.
