Recommended Apache log analysis methods and tools

Source: Internet
Author: User
Tags http authentication apache log

I. Log analysis
If the default configuration is used when Apache is installed, two files, access_log and error_log, are generated in the logs directory.
1. access_log
access_log is the access log. It records every request made to the Apache server. Its location and content are controlled by the CustomLog directive, and the LogFormat directive can be used to define the content and format of the log records.
For example, one of my servers is configured as follows:

CustomLog "|/usr/sbin/rotatelogs /var/log/apache2/%Y_%m_%d_other_vhosts_access.log 86400 480" vhost_combined

-rw-r-1 root 22310750 12-05 23:59 2010_12_05_other_vhosts_access.log
-rw-r-1 root 26873180 12-06 23:59 2010_12_06_other_vhosts_access.log
-rw-r-1 root 26810003 12-07 23:59 2010_12_07_other_vhosts_access.log
-rw-r-1 root 24530219 12-08 2010_12_08_other_vhosts_access.log
-rw-r-1 root 24536681 12-09 2010_12_09_other_vhosts_access.log
-rw-r-1 root 14003409 12-10 2010_12_10_other_vhosts_access.log
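
The dated file names in this listing follow from the rotatelogs arguments in the CustomLog line: a strftime-style file name pattern, a rotation interval of 86400 seconds (one day), and an offset of 480 minutes from UTC (UTC+8). The Python sketch below illustrates how such a name could be derived from a request time. The function name rotated_log_name and the alignment arithmetic are assumptions made for illustration, not the actual rotatelogs implementation.

```python
import time
from datetime import datetime, timedelta, timezone

# Pattern, interval, and offset copied from the CustomLog line above.
PATTERN = "/var/log/apache2/%Y_%m_%d_other_vhosts_access.log"
UTC8 = timezone(timedelta(hours=8))


def rotated_log_name(pattern, when, rotation_seconds=86400, utc_offset_minutes=480):
    """Return the log file name for the rotation period containing `when`.

    Assumption: periods are aligned to multiples of the rotation interval
    after shifting the clock by the UTC offset, so a daily rotation with a
    480-minute offset switches files at local midnight in UTC+8.
    `when` must be a timezone-aware datetime.
    """
    shifted = int(when.timestamp()) + utc_offset_minutes * 60
    period_start = (shifted // rotation_seconds) * rotation_seconds
    return time.strftime(pattern, time.gmtime(period_start))


request_time = datetime(2010, 12, 10, 9, 31, 17, tzinfo=UTC8)
print(rotated_log_name(PATTERN, request_time))
# prints /var/log/apache2/2010_12_10_other_vhosts_access.log
```

Without the offset, the daily boundary would fall at midnight UTC, which is 08:00 local time on this server.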

Here the CustomLog directive pipes the access log through rotatelogs so that a separate log file is written for each day. A scheduled job also cleans up old log files, which keeps the log directory tidy. The record format of each log is defined by LogFormat:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedproxy
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

Tail any access_log file and you will see records like this typical one:

218.19.140.242 - - [10/Dec/2010:09:31:17 +0800] "GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1" 200 1933 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"
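
A record in this format can also be split mechanically. The following Python sketch uses a regular expression that mirrors the combined LogFormat shown earlier. The group names (host, ident, user, and so on) and the function name parse_access_line are chosen for illustration, and the pattern assumes that quoted fields contain no escaped double quotes.

```python
import re

# One named group per field of the "combined" format; names are illustrative.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of the nine fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


sample = (
    '218.19.140.242 - - [10/Dec/2010:09:31:17 +0800] '
    '"GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/'
    'haizhu_tianhe.xml HTTP/1.1" 200 1933 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) '
    'Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"'
)

fields = parse_access_line(sample)
for name, value in fields.items():
    print(f"{name:8} {value}")
```

All values come back as strings; converting the status, size, and time fields is left to the caller.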

The record has nine fields. Split one per line, they are:

218.19.140.242
-
-
[10/Dec/2010:09:31:17 +0800]
"GET /query/trendxml/district/todayreturn/month/2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1"
200
1933
"-"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)"

1) 218.19.140.242 is the IP address of the client that sent the request to the Apache server. By default, the first field is the IP address of the remote host. If you want Apache to look up the host name instead, you can set HostnameLookups to On, but this is not recommended because it slows the server down considerably. In addition, the address here is not necessarily that of the client's own machine: if the client goes through a proxy server, this is the proxy's IP address, not the original client's.

2) - This field is empty, so "-" is written in its place. The field is meant to identify the visitor, using information from identd on the client machine. Apache does not try to obtain this information unless IdentityCheck is On, so in practice the field is almost always empty. The Apache documentation explains it as follows:
The "hyphen" in the output indicates that the requested piece of information is not available. In this case, the information that is not available is the RFC 1413 identity of the client determined by identd on the clients machine. This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.

3) - This field is also empty here. It records the user name from HTTP authentication: if a site requires users to authenticate, the authenticated user's identity is logged in this field.

4) [10/Dec/2010:09:31:17 +0800] The fourth field is the time of the request, in the format [day/month/year:hour:minute:second zone]. The trailing +0800 means the server's time zone is UTC+8.

5) "GET /.. haizhu_tianhe.xml HTTP/1.1" is the most useful information in the whole record. It shows, first, that the server received a GET request; second, the resource path the client requested; and third, that the client used the HTTP/1.1 protocol. The whole field has the format "%m %U%q %H", that is, "request method, access path, protocol".

6) 200 is the status code the server sent back to the client. It tells us whether the request succeeded, was redirected, or ran into an error. The value here is 200, which means the server handled the request successfully. In general, codes starting with 2 indicate success, codes starting with 3 indicate redirection, codes starting with 4 indicate client errors, and codes starting with 5 indicate server errors. For details, see section 10 of the HTTP specification (RFC 2616). [http://www.w3.org/Protocols/rfc2616/rfc2616.txt]

7) 1933 is the number of bytes the server sent to the client, excluding the response headers. When compiling statistics during log analysis, you can sum these values to find the total amount of data the server sent over a given period.

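8) "-" According to the LogFormat definition above, this field is the Referer header sent by the client, that is, the page from which the request was made. The "-" here means the client did not send one.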

9) "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)" is the User-Agent header, which mainly records the client's browser information.
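
Several of these fields become more useful once converted from text. The Python sketch below shows one possible way to interpret the time, request, status, and size fields described above. All function names are hypothetical, and the time parsing assumes English month abbreviations (the default C locale).

```python
from datetime import datetime


def parse_time(stamp):
    """Parse a %t value such as '10/Dec/2010:09:31:17 +0800' (brackets removed)."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def split_request(request):
    """Split a well-formed request line into method, path, and protocol."""
    method, path, protocol = request.split(" ", 2)
    return method, path, protocol


def status_class(code):
    """Map a status code to the broad classes listed in item 6."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(int(code) // 100, "other")


def bytes_sent(size):
    """Convert a %b value to an int; %b logs '-' when no body bytes were sent."""
    return 0 if size == "-" else int(size)


when = parse_time("10/Dec/2010:09:31:17 +0800")
print(when.isoformat())  # prints 2010-12-10T09:31:17+08:00

request = ("GET /query/trendxml/district/todayreturn/month/"
           "2009-12-14/2010-12-09/haizhu_tianhe.xml HTTP/1.1")
print(split_request(request)[0], status_class("200"), bytes_sent("1933"))
# prints GET success 1933
```

Summing bytes_sent over all records in a time window gives the traffic total mentioned in item 7. Malformed request lines (for example, from scanners) may not split into three parts, so real code would need to handle that case.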

2. error_log
error_log is the error log. It records any errors encountered while processing requests. Its location is controlled by the ErrorLog directive. When something goes wrong on the server, this is usually the first log file to check.

Tail the error_log and pick any record:

[Fri Dec 10 15:03:59 2010] [error] [client 218.19.140.242] File does not exist: /home/htmlfile/tradedata/favicon.ico

This record also breaks down into several fields:

[Fri Dec 10 15:03:59 2010]
[error]
[client 218.19.140.242]
File does not exist: /home/htmlfile/tradedata/favicon.ico

1) [Fri Dec 10 15:03:59 2010] records when the error occurred. Note that the time format differs from the one used in access_log above.

2) [error] is the severity level of the message. Which levels are written to the log is controlled by the LogLevel directive. The 404 above is logged at the error level.

3) [client 218.19.140.242] records the IP address of the client.

4) File does not exist: /home/htmlfile/tradedata/favicon.ico is the description of the error. Here the client requested a file that does not exist or used a wrong path, so a 404 error was returned.
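
The same breakdown can be done in code. The following Python sketch parses an error_log record of the shape shown above. The regular expression, group names, and the function parse_error_line are assumptions for illustration. Newer Apache versions write additional bracketed fields (such as module and process ID), which this pattern does not cover.

```python
import re
from datetime import datetime

# Matches records shaped like the sample above; the [client ...] part is optional
# because server-level messages do not include it.
ERROR_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)'
)


def parse_error_line(line):
    """Return a dict with time, level, client, and message, or None if no match."""
    match = ERROR_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The error log time has no zone and uses a different layout than access_log.
    entry["time"] = datetime.strptime(entry["time"], "%a %b %d %H:%M:%S %Y")
    return entry


sample = ("[Fri Dec 10 15:03:59 2010] [error] [client 218.19.140.242] "
          "File does not exist: /home/htmlfile/tradedata/favicon.ico")
entry = parse_error_line(sample)
print(entry["level"], entry["client"], entry["message"])
```

Filtering on entry["level"] or grouping by entry["message"] is then a quick way to see which errors occur most often.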

II. Practical log analysis scripts
With the log fields understood, here are some practical commands for log analysis.

1. View the number of Apache processes (the pattern [h]ttpd keeps grep from counting its own process)
ps aux | grep '[h]ttpd' | wc -l

2. Count the requests from each IP address for the day (the IP address is the first field of the record shown above)
cat default-access_log | grep "10/Dec/2010" | awk '{print $1}' | sort | uniq -c | sort -nr

3. View the URLs accessed by a specified IP address for the day
cat default-access_log | grep "10/Dec/2010" | grep "218.19.140.242" | awk '{print $7}' | sort | uniq -c | sort -nr

4. View the top 10 URLs for the day
cat default-access_log | grep "10/Dec/2010" | awk '{print $7}' | sort | uniq -c | sort -nr | head -n 10

5. See what a specified IP address has been doing (with the record format shown above, the requested URL is field 7)
cat default-access_log | grep 218.19.140.242 | awk '{print $1 "\t" $7}' | sort | uniq -c | sort -nr | less

6. Find the busiest minutes (the traffic hotspots)
awk '{print $4}' default-access_log | cut -c 14-18 | sort | uniq -c | sort -nr | head
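
For anything beyond one-off checks, the same counts can be gathered in a single pass over the file. The Python sketch below mirrors commands 2, 4, and 6 using whitespace-split fields, as awk does. The function name field_counts and the sample lines (including the address 192.0.2.7 and the paths /a.xml and /b.xml) are made up for illustration.

```python
from collections import Counter


def field_counts(lines, day):
    """Count requests per IP and per URL for `day`, and per minute overall."""
    ips, urls, minutes = Counter(), Counter(), Counter()
    for line in lines:
        fields = line.split()          # same splitting as awk's default
        if len(fields) < 7:
            continue                   # skip lines that are too short to parse
        minutes[fields[3][13:18]] += 1  # like cut -c 14-18 on awk's $4 -> HH:MM
        if day in line:                 # like grep "10/Dec/2010"
            ips[fields[0]] += 1         # awk's $1
            urls[fields[6]] += 1        # awk's $7
    return ips, urls, minutes


# Illustrative records only; in practice, pass an open log file instead.
sample_lines = [
    '218.19.140.242 - - [10/Dec/2010:09:31:17 +0800] "GET /a.xml HTTP/1.1" 200 1933 "-" "-"',
    '218.19.140.242 - - [10/Dec/2010:09:31:40 +0800] "GET /b.xml HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.7 - - [10/Dec/2010:09:32:05 +0800] "GET /a.xml HTTP/1.1" 404 209 "-" "-"',
]

ips, urls, minutes = field_counts(sample_lines, "10/Dec/2010")
print(ips.most_common(10))      # like sort | uniq -c | sort -nr | head -n 10
print(urls.most_common(10))
print(minutes.most_common(10))
```

Reading the file once avoids re-scanning a large log for every question, which matters when daily logs reach tens of megabytes as in the listing earlier.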

III. Use AWStats to analyze logs automatically
Of course, the simplest and most intuitive way to analyze logs is still to use a tool. A popular choice at present is AWStats, a Perl-based web log analysis tool that also supports other servers such as IIS.
http://awstats.sourceforge.net
