Multi-server log merge statistics (2)

Source: Internet
Author: User
Statistics for a single server are simpler. To rotate the log without affecting the service, first copy it, then empty it:

cp /path/to/apache/log/access_log /path/to/apache/log/access_log_yesterday
echo > /path/to/apache/log/access_log

A careful analyst who does this will notice a problem:

cp cannot guarantee a cut at exactly 00:00:00. If the copy takes 6 seconds, access_log_yesterday will also contain the entries written during the copy, up to 00:00:06. For statistics on a single log, these few hundred extra lines per day do no harm. When several logs are merged, however, the sort breaks on the one day that spans a month boundary, the 1st of the month:

[31/Mar/2002:23:59:59 +0800]
[31/Mar/2002:23:59:59 +0800]
[01/Apr/2002:00:00:00 +0800]
[01/Apr/2002:00:00:00 +0800]

Note that the [01/Apr/2002:00:00:00 field cannot simply be sorted across days. The date is written as dd/mon/yyyy with an English month abbreviation, so an alphabetical sort will very likely produce the following, wrongly ordered log:

[01/apr/2002:00:00:00 +0800]
[01/apr/2002:00:00:00 +0800]
[01/apr/2002:00:00:00 +0800]
[01/apr/2002:00:00:00 +0800]
[01/apr/2002:00:00:00 +0800]
[01/apr/2002:00:00:00 +0800]
[01/apr/2002:00:00:00 +0800]
[31/mar/2002:59:59:59 +0800]
[31/mar/2002:59:59:59 +0800]
[31/mar/2002:23:59:59 +0800]
[31/mar/2002:59:59:59 +0800]
[31/mar/2002:23:59:59 +0800]
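The effect is easy to reproduce. The following is a minimal sketch, not part of the original article: the function name lexical_sort_demo is made up for illustration, and LC_ALL=C is set only so that the byte-wise ordering is the same on every system.

```shell
#!/bin/sh
# Feed two Apache-style date stamps to a plain alphabetical sort.
# The 1 April entry is chronologically later, but "01" sorts before "31".
lexical_sort_demo() {
  printf '%s\n' \
    '[31/Mar/2002:23:59:59 +0800]' \
    '[01/Apr/2002:00:00:00 +0800]' | LC_ALL=C sort
}

lexical_sort_demo
```

The April line comes out first, which is exactly the ordering that confuses a tool expecting entries in chronological order.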

For analysis tools such as Webalizer, this out-of-order cross-day data is like swallowing a bug: the tool may discard all of the previous month's data. Processing the log for the last day of a month therefore carries considerable risk.

There are several ways to solve the problem:

1) Post-processing

One after-the-fact fix is to use grep, on the 1st of each month, to strip the entries that crossed into the new month, for example:

grep -v "01/Apr" access_log_04_01 > access_log_new

The modified log can then be sorted directly, because all cross-day data has been removed. Post-processing the log this way is one option. The sort command does have a date option, -M (note: capital M), that orders the specified field by English month name rather than alphabetically, but for Apache logs, cutting the month field out for sort is cumbersome. (I tried using "/" as the separator and sorting on "month", then "year:time".) A Perl script could certainly do it, but I finally gave up: it does not meet a system administrator's design principle of generality. You also need to keep asking yourself: is there a simpler way?

One is to change the log format to use a numeric timestamp (Squid's log does not have this problem because it already records time that way), but I cannot guarantee that every log tool will recognize a special format in the date field.
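For readers who do want to sort on the Apache date field, one possible approach is to prepend a sortable key instead of fighting with sort's field options. This is a minimal sketch and not the author's method. The function name sort_by_time is hypothetical, and it assumes every line carries a [dd/Mon/yyyy:HH:MM:SS zone] stamp and that all servers log in the same time zone, so the zone offset is ignored.

```shell
#!/bin/sh
# Sort access-log lines chronologically.
# Each line gets a yyyymmddHHMMSS key plus its input line number, is sorted
# on those two columns, and then has both helper columns cut off again.
sort_by_time() {
  tab=$(printf '\t')
  awk '
    BEGIN {
      n = split("jan feb mar apr may jun jul aug sep oct nov dec", name, " ")
      for (i = 1; i <= n; i++) month[name[i]] = sprintf("%02d", i)
    }
    {
      key = "00000000000000"   # lines without a stamp sort first
      if (match($0, /\[[0-9][0-9]\/[A-Za-z][A-Za-z][A-Za-z]\/[0-9][0-9][0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/)) {
        ts = substr($0, RSTART + 1, RLENGTH - 1)   # dd/Mon/yyyy:HH:MM:SS
        ymd = substr(ts, 8, 4) month[tolower(substr(ts, 4, 3))] substr(ts, 1, 2)
        hms = substr(ts, 13, 2) substr(ts, 16, 2) substr(ts, 19, 2)
        key = ymd hms
      }
      print key "\t" NR "\t" $0
    }
  ' "$@" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n | cut -f3-
}
```

A call such as `sort_by_time access_log.server1 access_log.server2 > access_log.merged` (file names invented for the example) would merge two logs in time order; entries with the same second keep their input order because the line number is the second sort key.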

2) Optimize the data source

The best approach is to optimize the data source: make sure the logs are rotated daily and that each day's log contains only entries from that date. After that, whatever tool (commercial or free) you use to analyze the logs, you no longer need a complicated log preprocessing mechanism.
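Whether a daily file really contains a single date can be checked before it is handed to an analysis tool. The sketch below is an illustration only: the function name dates_in_log is hypothetical, and it assumes the standard [dd/Mon/yyyy:HH:MM:SS zone] stamp in every entry.

```shell
#!/bin/sh
# Print "count date" for every distinct dd/Mon/yyyy stamp found in the
# given log files (or on standard input when no file is named).
dates_in_log() {
  awk '
    match($0, /\[[0-9][0-9]\/[A-Za-z][A-Za-z][A-Za-z]\/[0-9][0-9][0-9][0-9]:/) {
      seen[substr($0, RSTART + 1, 11)]++
    }
    END { for (d in seen) print seen[d], d }
  ' "$@" | LC_ALL=C sort -k2
}
```

Running `dates_in_log access_log_yesterday` on a cleanly rotated file should print one line; a second date in the output means entries spilled over during the cut.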

Your first thought may be to control when the log is cut, for example strictly at 00:00. But cutting one minute before midnight or one minute after makes no real difference: you still cannot prevent a log from spanning two days, and you cannot predict how long the archiving process will take.

So it is worth considering a log rotation tool, which should meet the following requirements:

1. It does not interrupt the web service: stop Apache => move the log => restart Apache is not acceptable;
2. It archives each day's log by day: one log per day, covering 00:00:00-23:59:59;
3. It is unaffected by Apache restarts: a tool that starts a new log every time Apache restarts does not meet the requirement;
4. It is simple to install and configure.

First, consider the rotation tool that ships in the Apache bin directory: rotatelogs. This tool basically rotates the log by time or size; it cannot control when the log is cut or how it is archived by day.

Then consider the logrotate background service: logrotate is a dedicated service that rotates various system logs (syslogd, mail). Its configuration is rather complex, so I gave up on it. In fact, it also works by sending a -HUP restart signal to the corresponding service process so that the log file can be cut.
