Apache log files explained, with practical analysis commands (Linux)

Source: Internet
Author: User

A. Log analysis
With the default configuration, Apache writes two log files to the logs/ directory under its server root: access_log and error_log.
1) access_log
access_log is the access log: it records every request made to the Apache server. Its location and content are controlled by the CustomLog directive, and the LogFormat directive defines the content and format of each entry.
For example, one of my servers is configured as follows:

CustomLog "|/usr/sbin/rotatelogs /var/log/apache2/%Y_%m_%d_other_vhosts_access.log 86400" vhost_combined


-rw-r--r-- 1 root root 22310750 12-05 23:59 2010_12_05_other_vhosts_access.log
-rw-r--r-- 1 root root 26873180 12-06 23:59 2010_12_06_other_vhosts_access.log
-rw-r--r-- 1 root root 26810003 12-07 23:59 2010_12_07_other_vhosts_access.log
-rw-r--r-- 1 root root 24530219 12-08 23:59 2010_12_08_other_vhosts_access.log
-rw-r--r-- 1 root root 24536681 12-09 23:59 2010_12_09_other_vhosts_access.log
-rw-r--r-- 1 root root 14003409 12-10 14:57 2010_12_10_other_vhosts_access.log

With this CustomLog directive, rotatelogs starts a new log file every 86400 seconds, i.e. once a day. I also wrote a scheduled task that deletes log files older than one week, so each day's log stays in its own file and old logs are cleared out automatically. LogFormat defines the format of each log record:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedproxy
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
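The scheduled cleanup task mentioned above is not shown in the original. Below is a minimal sketch, assuming the rotated logs sit in /var/log/apache2 and follow the *_other_vhosts_access.log naming from the CustomLog line; the function name cleanup_old_logs is hypothetical, and -maxdepth/-delete require GNU (or BSD) find.

```shell
#!/bin/sh
# Hypothetical helper: delete rotated access logs older than N days.
# Assumes GNU/BSD find (for -maxdepth and -delete) and the file naming
# produced by the rotatelogs CustomLog line above.
cleanup_old_logs() {
    dir=${1:-/var/log/apache2}   # log directory (assumed default)
    days=${2:-7}                 # keep roughly one week of logs
    # -mtime +N matches files whose age, rounded down to whole days,
    # is greater than N; only regular files in $dir itself are checked.
    find "$dir" -maxdepth 1 -type f \
        -name '*_other_vhosts_access.log' \
        -mtime +"$days" -delete
}
```

It could be run once a day from cron. The name pattern ensures that only the rotated access logs are removed, and -maxdepth 1 keeps find from descending into subdirectories.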

Tailing an access_log file gives a typical access record like this one:

218.19.140.242 - - [10/Dec/2010:09:31:17 +0800] "GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1" 200 1933 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"

The record consists of nine fields. Taking them apart one by one:

218.19.140.242
-
-
[10/Dec/2010:09:31:17 +0800]
"GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1"
200
1933
"-"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"

1) 218.19.140.242: The IP address of the client that made the request. By default Apache logs only the remote host's IP address. Setting HostnameLookups to On makes Apache resolve host names as well, but this is not recommended because it slows the server down considerably. Note also that this is not necessarily the address of the client host itself: if the client goes through a proxy server, the address logged is the proxy's, not the originating machine's.
2) -: This field is empty and shown as "-". It is meant to identify the visitor through identd running on the client machine, but Apache does not even try to obtain it unless IdentityCheck is set to On, so in practice it is almost always empty. (PS: I do not fully understand this one; the original Apache documentation reads:)
The "hyphen" in the output indicates that the requested piece of information is not available. In this case, the information that is not available is the RFC 1413 identity of the client determined by identd on the clients machine. This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.
3) -: This field is also empty here. It holds the user name supplied through HTTP authentication: if a site requires users to authenticate, this field records the authenticated user's identity.
4) [10/Dec/2010:09:31:17 +0800]: The time the request was received, in the format [day/month/year:hour:minute:second zone]. The trailing +0800 means the server is in the UTC+8 time zone.
5) "GET /...haizhu_tianhe.xml HTTP/1.1": The request line, the most useful part of the whole record. It tells us, first, that the server received a GET request; second, which resource path the client requested; and third, that the client used the HTTP/1.1 protocol. The same information can be logged as "%m %U%q %H", i.e. "method path protocol".
6) 200: The status code the server sent back to the client. It tells us whether the request succeeded, was redirected, or ran into an error. A value of 200 means the server responded to the request successfully. In general, codes beginning with 2 indicate success, 3 a redirection, 4 an error on the client side, and 5 an error on the server side; see section 10 of the HTTP specification (RFC 2616) for details. [http://www.w3.org/Protocols/rfc2616/rfc2616.txt]
7) 1933: The number of bytes the server sent to the client, not counting the response headers. Summing this field in log analysis gives the total amount of data the server sent over a given period.
8) "-": The HTTP Referer, which tells the server which page linked to the requested one. "-" means there was no referring page, for example because the user opened the URL directly.
9) "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)": The User-Agent, which mainly records the client's browser information.
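The field breakdown above can also be done mechanically. Below is a minimal sketch, assuming the combined format and that no field contains an embedded double quote: splitting on `"` yields the request, referer and user agent directly, and the rest is split on spaces. parse_combined is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: split combined-format access_log lines into the
# nine fields described above, printed tab-separated in the same order.
# Assumes no field contains an embedded double quote.
parse_combined() {
    awk -F'"' '{
        # With " as separator: $1 = host ident user [time], $2 = request,
        # $3 = status bytes, $4 = referer, $6 = user agent
        split($1, a, " ")
        # a[4] = "[10/Dec/2010:09:31:17", a[5] = "+0800]": strip the brackets
        time = substr(a[4], 2) " " substr(a[5], 1, length(a[5]) - 1)
        split($3, b, " ")
        printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
            a[1], a[2], a[3], time, $2, b[1], b[2], $4, $6
    }' "$@"
}
```

For example, `parse_combined access_log | cut -f6 | sort | uniq -c` would count responses per status code. Reading from standard input also works when no file is given.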
2) error_log
error_log is the error log: it records any errors encountered while processing requests. Its location is set by the ErrorLog directive and its verbosity by LogLevel. When something goes wrong with the server, this is the first file to check, which makes it one of the most important log files.
Tailing error_log and picking a record at random:

[Fri Dec 10 15:03:59 2010] [error] [client 218.19.140.242] File does not exist: /home/htmlfile/tradedata/favicon.ico

It also breaks down into several fields:

[Fri Dec 10 15:03:59 2010]
[error]
[client 218.19.140.242]
File does not exist: /home/htmlfile/tradedata/favicon.ico

1) [Fri Dec 10 15:03:59 2010]: The time the error occurred. Note that its format differs from the timestamp in access_log.
2) [error]: The severity level of the message. Which levels are logged is controlled by the LogLevel directive; the 404 above is logged at the error level.
3) [client 218.19.140.242]: The IP address of the client.
4) File does not exist: /home/htmlfile/tradedata/favicon.ico: A description of the error. Here the client requested a file that does not exist (or used a wrong path), which results in a 404 response.
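error_log can be summarised from the shell in the same way. Below is a minimal sketch, assuming the Apache 2.2-style layout shown above; Apache 2.4 writes the level together with the module name (e.g. [core:error]), which the level pattern here would not match. error_levels and missing_files are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for an Apache 2.2-style error_log, whose lines look like
#   [Fri Dec 10 15:03:59 2010] [error] [client 218.19.140.242] message

# Count entries per log level (error, warn, notice, ...), most frequent first.
error_levels() {
    sed -n 's/^\[[^]]*] \[\([a-z]*\)].*/\1/p' "$@" | sort | uniq -c | sort -nr
}

# List the 10 most frequently requested missing files (the 404 case above).
missing_files() {
    sed -n 's/.*File does not exist: //p' "$@" | sort | uniq -c | sort -nr | head -n 10
}
```

A favicon.ico near the top of the missing_files output, as in the sample record, usually just means the site has no favicon.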

B. Practical log analysis commands and scripts

With the meaning of each log field understood, here are some log analysis commands I have collected from the web.

1. Count the running Apache processes (the [h] pattern keeps grep from counting itself)
ps aux | grep '[h]ttpd' | wc -l
2. Count requests per client IP for a given day
cat default-access_log | grep "10/Dec/2010" | awk '{print $1}' | sort | uniq -c | sort -nr
3. See which URLs a given IP requested that day
cat default-access_log | grep "10/Dec/2010" | grep "218.19.140.242" | awk '{print $7}' | sort | uniq -c | sort -nr
4. List the 10 most requested URLs of the day
cat default-access_log | grep "10/Dec/2010" | awk '{print $7}' | sort | uniq -c | sort -nr | head -n 10
5. See what a given IP is doing
cat default-access_log | grep 218.19.140.242 | awk '{print $1 "\t" $7}' | sort | uniq -c | sort -nr | less
6. Find the busiest minutes (hotspots); characters 14-18 of the time field are the hour and minute
awk '{print $4}' default-access_log | cut -c 14-18 | sort | uniq -c | sort -nr | head
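The one-liners above can be combined into a small daily report that also uses the status code (field 6) and byte count (field 7) explained in the first section. Below is a minimal sketch, assuming the combined format, where whitespace splitting puts the status in $9 and the byte count in $10 (this breaks if a request line does not have exactly three words). day_summary is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: per-day traffic summary from a combined-format
# access_log. Usage: day_summary 10/Dec/2010 default-access_log
day_summary() {
    day=$1
    shift
    # Match the date at the start of the [time] field only; -h drops
    # file name prefixes if several log files are given.
    grep -h "\[$day:" "$@" | awk '
        {
            requests++
            status["status " substr($9, 1, 1) "xx"]++   # 2xx, 3xx, 4xx, 5xx
            if ($10 != "-") bytes += $10                # "-" means no body sent
        }
        END {
            for (k in status) printf "%s %d\n", k, status[k]
            printf "requests %d\n", requests
            printf "bytes %.0f\n", bytes
        }' | sort
}
```

The output has one `status Nxx count` line per status class, plus the day's request count and total bytes sent; sorting at the end makes the order stable, since awk's `for (k in ...)` order is unspecified.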

