Apache Log Analysis

Source: Internet
Author: User

I. Log Analysis

With a default installation, Apache generates two log files when the server runs: access_log and error_log.

With the default installation, these files are found under /usr/local/apache/logs.

access_log is the access log, which records every request made to the web server. The following is a typical access log record:

192.168.1.100 - - [17/Mar/2004:01:29:21 -0800] "GET / HTTP/1.1" 200 429 "http://tw.search.yahoo.com/search/kimo?P=%AB%D8%B3%5d%5d%B6%7D%b5o%AE%D7&U=B" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"

This line consists of nine items. In the example above two of them are blank (shown as "-"), but the line is still divided into nine items.

The first item is the address of the remote host, that is, who is accessing the website. In the example above, the host is 192.168.1.100. By default the first item is just the IP address of the remote host. Apache can be asked to resolve host names and log them in place of IP addresses, but this is generally not recommended: the DNS lookups slow down request handling and reduce the efficiency of the whole site. Besides, many tools can convert the IP addresses in a log file into host names after the fact, so having Apache do it at logging time is not worth the cost.

If Apache really must look up the remote host name, use the following directive:

HostnameLookups on

If HostnameLookups is set to double instead of on, Apache performs a reverse lookup and then a forward lookup on the resulting host name, to verify that the name actually points back to the original IP address. By default, HostnameLookups is set to off.
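The double lookup described above can be sketched in Python. This is only an illustration of the idea, not Apache's implementation; the function name double_reverse_lookup and its injectable resolver parameters are assumptions made here so that the logic can be exercised without a live DNS server.

```python
import socket

def double_reverse_lookup(ip, reverse=None, forward=None):
    """Return the host name for ip only if that name resolves back to ip."""
    # Default resolvers use the system DNS; they can be replaced for testing.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse(ip)         # IP address -> host name
        addresses = forward(hostname)  # host name -> list of IP addresses
    except OSError:                    # covers socket.herror and socket.gaierror
        return None
    return hostname if ip in addresses else None
```

A name is accepted only when the forward lookup includes the original address, which is the verification step that distinguishes double from on.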

The second item in the example record is blank, represented by a "-" placeholder, as it is most of the time. This position records the visitor's identity: not just a login name, but possibly the visitor's email address or another unique identifier. The information is returned by identd or directly by the browser. Back when Netscape 0.9 was dominant, the browser user's email address was often recorded here. Once people began harvesting these addresses to send spam, the feature was short-lived, and almost all browsers have since removed it. The chance of seeing an email address in the second field today is therefore very slim.

The third item is also blank. This position records the name the visitor supplied for authentication. It will not be blank if parts of the website require users to authenticate, but for most websites this field is blank in most log records.

The fourth item is the request time, enclosed in square brackets, in the so-called "Common Log Format" time format (also called "standard English format"). The trailing "-0800" indicates that the server's time zone is 8 hours behind UTC.
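As a small sketch of how this time field can be read programmatically, the following Python parses it with the standard datetime module. The helper name parse_clf_time is chosen here for illustration, and the month abbreviation is matched using the current locale, so an English or C locale is assumed.

```python
from datetime import datetime, timedelta

def parse_clf_time(field):
    """Parse a time field such as '[17/Mar/2004:01:29:21 -0800]'."""
    # %z consumes the numeric UTC offset, giving a timezone-aware datetime.
    return datetime.strptime(field.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

stamp = parse_clf_time("[17/Mar/2004:01:29:21 -0800]")
print(stamp.isoformat())
```

Keeping the offset makes it possible to compare records from servers in different time zones.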

The fifth item is perhaps the most useful information in the entire record: it tells us what request the server received. Its typical format is "METHOD RESOURCE PROTOCOL".

In the example above the method is GET; other methods that appear frequently are POST and HEAD. Many other methods are valid, but these three are the main ones.

RESOURCE is the document or URL the browser requests from the server. In this example the visitor requests "/", the home page or root of the website. In most cases "/" maps to the index.html document in the DocumentRoot directory, but it may point to another file depending on the server configuration.

PROTOCOL is usually HTTP, followed by the version number. In logs like this one the version is either 1.0 or 1.1. HTTP is the protocol on which the Web runs; HTTP/1.0 is the earlier version and HTTP/1.1 is its successor. The request in the example above uses HTTP/1.1.

The sixth item is the status code. It tells us whether the request succeeded or what kind of error was encountered. Most of the time this value is 200, which means the server responded to the browser's request successfully and everything is normal. In general, a status code starting with 2 indicates success, one starting with 3 indicates that the request was redirected elsewhere for some reason, one starting with 4 indicates a client error, and one starting with 5 indicates a server error. A list of status codes and their meanings is given in the appendix.
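The grouping of status codes by their first digit can be sketched in a few lines of Python. The names STATUS_CLASSES, status_class and summarize are illustrative choices, not part of any Apache tool.

```python
from collections import Counter

# First digit of the status code -> class, as described above.
STATUS_CLASSES = {
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}

def status_class(code):
    """Classify a status code (int or str) by its first digit."""
    return STATUS_CLASSES.get(str(code)[0], "other")

def summarize(codes):
    """Count how many responses fall into each class."""
    return Counter(status_class(c) for c in codes)

print(summarize([200, 200, 304, 404, 500]))
```

A summary like this is often the quickest way to spot a sudden rise in 4xx or 5xx responses.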

The seventh item is the number of bytes sent to the client. Comparing it with the size of the requested file shows whether the transfer was interrupted. Summing these values across log records tells you how much data the server sent in a day, a week, or a month.
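A minimal Python sketch of that summation, grouped by day, might look as follows. The regular expression assumes the field layout shown in the example record; the second and third sample lines are hypothetical and exist only to exercise the code.

```python
import re
from collections import defaultdict

# Captures the date inside the brackets and the byte count after the status code.
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" \d{3} (\d+|-)')

def bytes_per_day(lines):
    """Sum the bytes-sent field for each calendar day found in the log lines."""
    totals = defaultdict(int)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not look like access records
        day, size = m.groups()
        totals[day] += 0 if size == "-" else int(size)  # "-" means no body sent
    return dict(totals)

sample = [
    '192.168.1.100 - - [17/Mar/2004:01:29:21 -0800] "GET / HTTP/1.1" 200 429',
    '192.168.1.100 - - [17/Mar/2004:01:30:02 -0800] "GET /a.gif HTTP/1.1" 304 -',
    '192.168.1.101 - - [18/Mar/2004:09:00:00 -0800] "GET / HTTP/1.1" 200 429',
]
print(bytes_per_day(sample))
```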

The eighth item, "http://tw.search.yahoo.com/search/kimo?P=%AB%D8%B3%5d%5d%B6%7D%b5o%AE%D7&U=B", is the referrer: the page the visitor came from. Here it shows that the visitor arrived from a Yahoo search results page, and the URL carries the search string that was used.

The ninth item, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)", is the User-Agent string, which identifies the visitor's browser and operating system.
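To make the nine items concrete, here is a hedged Python sketch that splits a record into named fields with a regular expression. The field names and the function parse_access_line are illustrative, the sample line shortens the referrer and User-Agent from the example record, and quoted fields containing escaped quotes are not handled.

```python
import re

COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '        # items 1-3
    r'\[(?P<time>[^\]]+)\] '                              # item 4
    r'"(?P<request>[^"]*)" '                              # item 5
    r'(?P<status>\d{3}) (?P<size>\d+|-) '                 # items 6-7
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'            # items 8-9
)

def parse_access_line(line):
    """Return a dict of the nine items, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Split "METHOD RESOURCE PROTOCOL"; malformed requests may have fewer parts.
    parts = fields["request"].split()
    if len(parts) == 3:
        fields["method"], fields["resource"], fields["protocol"] = parts
    return fields

line = ('192.168.1.100 - - [17/Mar/2004:01:29:21 -0800] "GET / HTTP/1.1" 200 429 '
        '"http://tw.search.yahoo.com/search/kimo" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
record = parse_access_line(line)
print(record["host"], record["method"], record["resource"], record["status"])
```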

The error log file is named error_log. In most cases, its contents fall into two categories: document errors and CGI errors.

1. Document Error

Document errors correspond to the 4xx series of status codes in the server's responses. The most common is the 404 error, Document Not Found. Besides 404, user authentication failure is another common error.

A 404 error occurs when the resource the user requested (that is, the URL) does not exist. The cause may be a mistyped URL, or a document on the server that has been deleted or moved.

When a user cannot open a document on the server, the error log contains the following records:

[Fri Aug 18 22:36:26 2000] [error] [client 192.168.1.6] File does not exist: /usr/local/apache/bugletdocs/img/south-korea.gif

As you can see, just like the access log access_log file, the error log records are also divided into multiple items.

An error record starts with a date/time stamp. Note that its format differs from the date/time format in access_log, even though the access_log format is the one called "standard English format". This is probably an accident of history that is now too late to change.

The second item of the error record is the level of the record, which indicates the severity of the problem. It can be any of the levels listed in the documentation for the LogLevel directive. The error level lies between warn and crit. A 404 is logged at the error level, which means a problem was encountered but the server can keep running.

The third item in the error record indicates the IP address used by the user to send the request.

The last item is the actual error message. For a 404 error it includes the full path of the file the server tried to access. This is useful when a file we expect to exist at the target location produces a 404. In that case the cause is often a server misconfiguration, the file actually living under a different virtual host than we expected, or some other unexpected circumstance.
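The four items just described can be pulled apart with a short Python sketch. It assumes the older error log layout shown in this article, where the level appears alone in brackets; newer Apache releases add further bracketed fields, which this pattern does not try to handle. The names ERROR_RE and parse_error_line are illustrative.

```python
import re

ERROR_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?'   # client address is not always present
    r'(?P<message>.*)'
)

def parse_error_line(line):
    """Split an error record into time, level, client and message."""
    m = ERROR_RE.match(line)
    return m.groupdict() if m else None

rec = parse_error_line(
    "[Fri Aug 18 22:36:26 2000] [error] [client 192.168.1.6] "
    "File does not exist: /usr/local/apache/bugletdocs/img/south-korea.gif"
)
print(rec["level"], rec["client"], rec["message"])
```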

The error records due to user authentication problems are as follows:

[Tue Apr 11 22:13:21 2000] [error] [client 192.168.1.3] user rbowen@rcbowen.com: authentication failure for "/cgi-bin/hirecareers/company.cgi": password mismatch

Note: Because document errors are the direct results of user requests, they will also be recorded in the access log.

2. CGI Error

Probably the main use of the error log is diagnosing misbehaving CGI programs. To make further analysis easier, everything a CGI program writes to stderr (standard error) goes directly into the error log. This means that when a problem occurs in any well-written CGI program, the error log will give us detailed information about it.

Sending CGI errors to the error log also has a disadvantage, however. The error log ends up containing a lot of content with no standard format, which makes it quite difficult for automatic log analysis programs to extract useful information.

The following is an example of what appears in the error log while debugging a Perl CGI script:

[Wed Jun 14 16:16:37 2000] [error] [client 192.168.1.3] Premature end of script headers: /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi

Global symbol "$RV" requires explicit package name at /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi line 81.

Global symbol "%details" requires explicit package name at /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi line 84.

Global symbol "$config" requires explicit package name at /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi line 133.

Execution of /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi aborted due to compilation errors.

The first line of the CGI error has the same format as the 404 error above: date/time, error level, client address, and error message. However, the CGI error message spans several lines, which often confuses error log analysis software.
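One way an analysis script might cope with such multi-line messages is to treat any line that does not begin with a bracketed timestamp as a continuation of the previous record. The Python below is a sketch of that heuristic, not a feature of Apache; the names RECORD_START and group_records are illustrative.

```python
import re

# A new record starts with a timestamp such as "[Wed Jun 14 16:16:37 2000]".
RECORD_START = re.compile(r'^\[\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\]')

def group_records(lines):
    """Group log lines so that CGI stderr output stays with its record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)  # continuation line, e.g. Perl diagnostics
    return records

log = [
    '[Wed Jun 14 16:16:37 2000] [error] [client 192.168.1.3] Premature end of script headers: /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi',
    'Global symbol "$config" requires explicit package name at /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi line 133.',
    'Execution of /usr/local/apache/cgi-bin/hypercalpro/announcement.cgi aborted due to compilation errors.',
    '[Fri Aug 18 22:36:26 2000] [error] [client 192.168.1.6] File does not exist: /usr/local/apache/bugletdocs/img/south-korea.gif',
]
grouped = group_records(log)
print(len(grouped), [len(r) for r in grouped])
```

The heuristic fails if a CGI program happens to print a line that itself looks like a timestamped record, so it should be treated as a best-effort grouping.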

With this error message, even someone unfamiliar with Perl can learn a lot about the problem, for example which lines of code are at fault. Perl has a sound mechanism for reporting program errors. Of course, what ends up in the error log differs from one programming language to another.

II. Defining the Log Format

Customizing the log format involves two directives in the httpd.conf file: LogFormat and CustomLog.

LogFormat defines a format and gives it a name, which can then be referenced directly. CustomLog sets the log file and specifies the format that file uses.

For example, the default httpd.conf file contains the following line:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

This directive creates a log format named "common". The format itself is the content enclosed in double quotation marks. Each variable in the format string stands for one piece of information, and the pieces are written to the log file in the order given by the format string.

The Apache documentation lists all the variables that can be used in format strings, together with their meanings:

----------------------------------------------------------------------

%...a: remote IP address

%...A: local IP address

%...B: number of bytes sent, excluding HTTP headers

%...b: number of bytes sent, excluding HTTP headers, in CLF format: when no bytes are sent, a '-' is written instead of 0

%...{Foobar}e: content of the environment variable Foobar

%...f: file name

%...h: remote host

%...H: request protocol

%...{Foobar}i: content of the Foobar header line in the request sent to the server

%...l: remote login name (from identd, if provided)

%...m: request method

%...{Foobar}n: content of the note "Foobar" from another module

%...{Foobar}o: content of the Foobar header line in the response

%...p: port the server uses to serve the request

%...P: process ID of the child process that served the request

%...q: query string (prefixed with "?" if a query string exists, otherwise an empty string)

%...r: first line of the request

%...s: status. For requests that were internally redirected, this is the status of the *original* request; use %...>s for the final request.

%...t: time, in Common Log Format time format (standard English format)

%...{format}t: time, in the specified format

%...T: time taken to serve the request, in seconds

%...u: remote user (from auth; may be bogus if the returned status (%s) is 401)

%...U: URL path requested by the user

%...v: ServerName of the server serving the request

%...V: server name according to the UseCanonicalName setting

------------------------------------------------------------------

In all the variables listed above, "..." stands for an optional condition. If no condition is specified, the variable is always replaced by its value. Looking back at the LogFormat example from the default httpd.conf file, we can see that it creates a log format named "common" made up of: remote host, remote login name, remote user, request time, first line of the request, request status, and number of bytes sent.
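To illustrate how a format string breaks down into variables, the following Python sketch tokenizes the "common" format and names each field. It is a simplified reading of the syntax described above (optional status conditions, an optional "<" or ">" modifier, an optional {argument}, then a letter); the table covers only the seven variables used by "common", and all names in the code are illustrative.

```python
import re

# Only the variables used by the "common" format are described here.
DIRECTIVES = {
    "h": "remote host",
    "l": "remote login name",
    "u": "remote user",
    "t": "request time",
    "r": "first line of the request",
    "s": "status",
    "b": "bytes sent (CLF)",
}

# %  [conditions]  [< or >]  [{argument}]  letter
TOKEN_RE = re.compile(
    r'%(?P<cond>!?\d{3}(?:,\d{3})*)?(?P<mod>[<>])?'
    r'(?:\{(?P<arg>[^}]*)\})?(?P<letter>[A-Za-z])'
)

def describe(format_string):
    """List the meaning of each variable in a LogFormat string, in order."""
    return [DIRECTIVES.get(m.group("letter"), "unknown")
            for m in TOKEN_RE.finditer(format_string)]

fields = describe(r'%h %l %u %t \"%r\" %>s %b')
print(fields)
```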

Sometimes we want to record a piece of information only in specific circumstances, and this is where "..." comes in. If one or more HTTP status codes are placed between "%" and the variable, the content the variable stands for is recorded only when the returned status code is one of those specified. For example, to record all broken links pointing into a website, you can use:

----------------------------------------------------

LogFormat %404{Referer}i brokenlinks

---------------------------------------------------

To record requests whose status code is not one of the specified values, simply add a "!" in front of the status codes:

LogFormat %!200U somethingwrong
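The effect of such conditions can be modelled with a small Python function. This is a sketch of the behavior described above, not Apache code: when the condition is not met, a "-" placeholder is written in place of the value. The function name conditional_value and its parameters are illustrative, as is the example.com URL.

```python
def conditional_value(status, value, codes=(), negate=False):
    """Return value if the status condition holds, otherwise the '-' placeholder."""
    if not codes:            # no condition given: the value is always logged
        return value
    matched = status in codes
    if negate:               # a leading "!" inverts the condition
        matched = not matched
    return value if matched else "-"

# Like %404{Referer}i: the referrer is logged only for 404 responses.
print(conditional_value(404, "http://example.com/page.html", codes=(404,)))
print(conditional_value(200, "http://example.com/page.html", codes=(404,)))

# A negated condition such as !200 logs the value for every status except 200.
print(conditional_value(500, "/broken.cgi", codes=(200,), negate=True))
```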

Appendix: HTTP status codes

Status codes fall into five classes:

1xx Informational: a provisional response from the server.

100 Continue: the initial part of the request has been accepted by the server, and the browser should continue sending the rest of the request.

101 Switching Protocols: the server is switching to another protocol, as requested by the client.

2xx Success: the browser's request was processed successfully.

200 OK: everything is normal.

201 Created: the server has created a document; the Location header gives its URL.

202 Accepted: the request has been accepted, but processing is not yet complete.

203 Non-Authoritative Information: the document was returned normally, but some response headers may be incorrect because a copy of the document is being used.

204 No Content: there is no new document; the browser should continue to display the original one. This is very similar to 304 below.

205 Reset Content: there is no new content, but the browser should reset the content it displays. Used to force the browser to clear form input.

206 Partial Content: the client sent a GET request with a Range header and the server fulfilled it. Range requests are what make resumable downloads possible.

3xx Redirection.

300 Multiple Choices: the requested document can be found in several locations, which are listed in the returned document. If the server has a preferred choice, it should be given in the Location response header.

301 Moved Permanently: the requested document is elsewhere. The new URL is given in the Location header, and the browser should automatically follow it.

302 Found: similar to 301, but the new URL should be treated as a temporary replacement rather than a permanent one. In HTTP 1.0 the corresponding status text is "Moved Temporarily". Because the browser follows the new URL automatically, this is a very useful status code. 301 and 302 are sometimes used interchangeably: for example, if the browser requests http://host/~user without the trailing slash, some servers return 301 and others return 302. Strictly speaking, the browser can only be assumed to redirect automatically when the original request is a GET. See 307.

303 See Other: similar to 301/302, except that if the original request was a POST, the redirect target given in the Location header should be retrieved with GET.

304 Not Modified: the client has a cached document and issued a conditional request (typically with an If-Modified-Since header, indicating that it only wants the document if it has changed since the given date). The server tells the client that the cached copy can still be used.

305 Use Proxy: the requested document should be retrieved through the proxy server given in the Location header.

307 Temporary Redirect: the same as 302 (Found). Many browsers mistakenly followed the redirect on a 302 response even when the original request was a POST, although a POST should only be redirected automatically in response to a 303. HTTP 1.1 therefore added 307 to remove the ambiguity: on a 303 response the browser may follow the redirect for both GET and POST requests; on a 307 response it may follow the redirect automatically only for GET requests.

4xx Client Error

400 Bad Request: the request has a syntax error.

401 Unauthorized: the client tried to access a password-protected page without proper authorization. The response includes a WWW-Authenticate header; the browser displays a username/password dialog accordingly and resends the request with an appropriate Authorization header.

403 Forbidden: the resource is unavailable. The server understands the request but refuses to fulfil it, usually because of permission settings on files or directories on the server.

404 Not Found: no resource exists at the specified location. This is a common response.

405 Method Not Allowed: the request method (GET, POST, HEAD, DELETE, PUT, TRACE, etc.) is not allowed for the specified resource.

406 Not Acceptable: the specified resource was found, but its MIME type is incompatible with those the client listed in the Accept header.

407 Proxy Authentication Required: similar to 401, but the client must first be authorized by the proxy server.

408 Request Timeout: the client did not send a request within the time the server was prepared to wait. The client may repeat the same request later.

409 Conflict: usually related to PUT requests. The request cannot succeed because it conflicts with the current state of the resource.

410 Gone: the requested document is no longer available, and the server does not know where to redirect to. It differs from 404 in that 410 means the document has permanently left the specified location, whereas 404 means the document is unavailable for unknown reasons.

411 Length Required: the server cannot process the request unless the client sends a Content-Length header.

412 Precondition Failed: one of the preconditions given in the request headers was not met.

413 Request Entity Too Large: the request entity is larger than the server is willing to process. If the server thinks it can process the request later, it should provide a Retry-After header.

414 Request-URI Too Long: the URI is too long.

416 Requested Range Not Satisfiable: the server cannot satisfy the Range header the client specified in the request.

5xx Server Error

500 Internal Server Error: the server encountered an unexpected condition and cannot complete the request (typically an error in a server-side program).

501 Not Implemented: the server does not support the functionality needed to fulfil the request. For example, the client sent a PUT request that the server does not support.

502 Bad Gateway: while acting as a gateway or proxy, the server received an invalid response from the next server it contacted to fulfil the request.

503 Service Unavailable: the server cannot respond because of maintenance or overload. For example, a servlet may return 503 when the database connection pool is full. The server may provide a Retry-After header along with the 503.

504 Gateway Timeout: used by a server acting as a proxy or gateway to indicate that it did not receive a timely response from the remote server.

505 HTTP Version Not Supported: the server does not support the HTTP version specified in the request.
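When processing logs in Python, the standard reason phrase for a status code can be looked up with the standard library's http.HTTPStatus enumeration instead of a hand-maintained table. The helper name reason_phrase below is illustrative.

```python
from http import HTTPStatus

def reason_phrase(code):
    """Return the standard reason phrase for a status code, or 'unknown'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:       # not a status code known to the standard library
        return "unknown"

for code in (200, 404, 503):
    print(code, reason_phrase(code))
```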

HTTP Error Code details

The entries with a sub-status code after the decimal point (such as 401.1 or 403.6) are specific to Microsoft IIS and do not appear in Apache logs.

"100": Continue
"101": Switching Protocols
"200": OK
"201": Created
"202": Accepted
"203": Non-Authoritative Information
"204": No Content
"205": Reset Content
"206": Partial Content
"300": Multiple Choices
"301": Moved Permanently
"302": Found
"303": See Other
"304": Not Modified
"305": Use Proxy
"307": Temporary Redirect
HTTP 400 - invalid request
HTTP 401.1 - unauthorized: logon failed
HTTP 401.2 - unauthorized: logon failed due to server configuration
HTTP 401.3 - unauthorized: access to the resource denied by ACL
HTTP 401.4 - unauthorized: authorization denied by filter
HTTP 401.5 - unauthorized: ISAPI or CGI authorization failed
HTTP 403 - access prohibited
HTTP 403 - access to Internet Service Manager (HTML) is limited to localhost
HTTP 403.1 - access prohibited: execute access prohibited
HTTP 403.2 - access prohibited: read access prohibited
HTTP 403.3 - access prohibited: write access prohibited
HTTP 403.4 - access prohibited: SSL required
HTTP 403.5 - access prohibited: SSL 128 required
HTTP 403.6 - access prohibited: IP address rejected
HTTP 403.7 - access prohibited: client certificate required
HTTP 403.8 - access prohibited: site access denied
HTTP 403.9 - access prohibited: too many users connected
HTTP 403.10 - access prohibited: invalid configuration
HTTP 403.11 - access prohibited: password change
HTTP 403.12 - access prohibited: mapper denied access
HTTP 403.13 - access prohibited: client certificate revoked
HTTP 403.15 - access prohibited: client access licenses exceeded
HTTP 403.16 - access prohibited: client certificate is untrusted or invalid
HTTP 403.17 - access prohibited: client certificate has expired or is not yet valid
HTTP 404.1 - web site not found
HTTP 404 - file not found
HTTP 405 - method not allowed for the resource
HTTP 406 - not acceptable
HTTP 407 - proxy authentication required
HTTP 410 - permanently unavailable (gone)
HTTP 412 - precondition failed
HTTP 414 - Request-URI too long
HTTP 500 - internal server error
HTTP 500.100 - internal server error: ASP error
HTTP 500.11 - server shutting down
HTTP 500.12 - application restarting
HTTP 500.13 - server too busy
HTTP 500.14 - invalid application
HTTP 500.15 - requests for global.asa not allowed
HTTP 501 - not implemented
HTTP 502 - gateway error
