1. Define the log format
Originally there was only one log file format, the Common Log Format, and many people became used to it. Custom log formats appeared later and now seem to be more popular, even though the Common Log Format itself is simply redefined as one of the custom formats. This article describes how to customize the log file format so that the log records exactly the information you need.
Customizing the log format involves two directives: LogFormat and CustomLog. The default httpd.conf file provides several examples of both.
The LogFormat directive defines a format and gives it a name, which can then be referenced directly. The CustomLog directive sets the log file and specifies the format that file uses (usually by format name).
For example, the default httpd.conf file contains the following line:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
This directive creates a log format named "common". The format itself is the content enclosed in double quotation marks. Each variable in the format string represents a specific piece of information, and the values are written to the log file in the order given by the format string.
The Apache documentation lists all the variables that can be used in a format string, and their meanings:
%...a: Remote IP address
%...A: Local IP address
%...B: Number of bytes sent, excluding the HTTP headers
%...b: Number of bytes sent, excluding the HTTP headers, in CLF format: if no data is sent, '-' is written instead of 0
%...{FOOBAR}e: Content of the environment variable FOOBAR
%...f: File name
%...h: Remote host
%...H: Request protocol
%...{Foobar}i: Content of Foobar, a header line in the request sent to the server
%...l: Remote login name (from identd, if provided)
%...m: Request method
%...{Foobar}n: Content of the note "Foobar" from another module
%...{Foobar}o: Content of Foobar, a header line in the response
%...p: Port the server uses to respond to the request
%...P: ID of the child process that responds to the request
%...q: Query string (prefixed with "?" if a query string exists, otherwise an empty string)
%...r: First line of the request
%...s: Status. For internally redirected requests, this is the status of the *original* request; use %...>s for the final request
%...t: Time, in common log format (standard English format)
%...{format}t: Time, in the specified format
%...T: Time taken to respond to the request, in seconds
%...u: Remote user (from auth; may be bogus if the returned status (%s) is 401)
%...U: URL path requested by the user
%...v: ServerName of the server responding to the request
%...V: Server name according to the UseCanonicalName setting
In all the variables listed above, "..." stands for an optional condition. If no condition is given, the variable is always replaced by its value. Looking again at the LogFormat example from the default httpd.conf file, we can see that it creates a log format named "common" containing: remote host, remote login name, remote user, request time, the first line of the request, the request status, and the number of bytes sent.
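As a rough illustration of how the "common" format string maps onto the seven fields just listed, the following Python sketch splits a format string into its directives and looks each one up. The names DIRECTIVE_MEANINGS, ITEM and describe_format are made up for this example, and the table covers only the directives that "common" uses.

```python
import re

# Meanings of the seven directives used by the "common" format.
# This table is deliberately incomplete; it is only an illustration.
DIRECTIVE_MEANINGS = {
    "h": "remote host",
    "l": "remote login name",
    "u": "remote user",
    "t": "request time",
    "r": "first line of request",
    "s": "status",
    "b": "bytes sent (CLF)",
}

# One format item: "%", an optional condition such as 404 or !200,304,
# an optional "<" or ">", an optional {argument}, then the directive letter.
ITEM = re.compile(r'%(?:!?[\d,]+)?[<>]?(?:\{[^}]*\})?([A-Za-z])')


def describe_format(fmt):
    """Return the meaning of each directive in fmt, in order."""
    return [
        DIRECTIVE_MEANINGS.get(letter, "unknown: %" + letter)
        for letter in ITEM.findall(fmt)
    ]


# The format string as Apache sees it, after httpd.conf unescapes \" to ".
COMMON = '%h %l %u %t "%r" %>s %b'
fields = describe_format(COMMON)
```

The list comes back in the same order as the directives in the format string, which is also the order of the fields in each log line.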
Sometimes we only want to log a piece of information in specific, well-defined cases. This is what the "..." condition is for. If one or more HTTP status codes are placed between "%" and the variable, the content of the variable is recorded only when the returned status code is one of those listed. For example, to record all broken links on a website, you can use:
LogFormat "%404{Referer}i" BrokenLinks
To record requests whose status code is not one of the listed values, put a "!" in front of the status codes. For example, the item %!200,304,302{Referer}i logs the Referer of every request that did not return one of those three codes.
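The conditional behaviour can be modelled in a few lines of Python. This is only a sketch of the semantics, not Apache's implementation: render_item is a hypothetical helper, and it relies on the documented rule that an item whose condition is not met is written as "-".

```python
def render_item(value, status, codes=None, negate=False):
    """Model a conditional format item such as %404{Referer}i.

    value  -- what the variable would normally log (may be empty)
    status -- HTTP status code of the response
    codes  -- status codes listed between "%" and the variable, or None
    negate -- True when the list is preceded by "!"
    """
    if codes is None:
        # No condition: always log the value ("-" if there is none).
        return value if value else "-"
    matches = status in codes
    if negate:
        matches = not matches
    return value if (matches and value) else "-"


referer = "http://example.com/old-page.html"  # made-up sample value
broken = render_item(referer, 404, codes={404})
normal = render_item(referer, 200, codes={404})
```

With the BrokenLinks format, only responses with status 404 carry the Referer value; every other response logs "-" in that position.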
Apache log: access log (1)
Do you want to know when someone browsed the website? You can view Apache access logs. Access logs are Apache standard logs. This document describes the access log content and configuration of related options.
1. Access log format
Apache has a built-in facility for recording server activity: its logging function. This Apache log series covers the access log, the error log, how to analyze log data, how to customize Apache logs, and how to generate statistical reports from log data.
With a default Apache installation, two log files are created when the server runs: access_log (access.log on Windows) and error_log (error.log on Windows). With the default installation method, these files are found under /usr/local/apache/logs. On Windows, they are saved in the logs subdirectory of the Apache installation directory. Different package managers put log files in different places, so you may need to look elsewhere or check the configuration file to see where the logs are written.
As its name suggests, the access log access_log records all access to the Web server. The following is a typical access log record:
216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654
This line consists of seven fields. Two of them are blank in this example, but the line is still divided into seven fields.
The first field is the address of the remote host, that is, who is accessing the website. In the example above, the visiting host is 216.35.116.91. As it happens, this address belongs to a machine named si3001.inktomi.com (you can find this out with a DNS lookup tool such as nslookup), and inktomi.com is a company that makes Web search software. So even the first field of a log record can tell us a good deal about a visitor.
By default, the first field is just the IP address of the remote host. We can ask Apache to look up host names and write them to the log file in place of IP addresses, but this is generally not recommended: it greatly slows down logging and therefore reduces the performance of the whole website. In addition, many tools can convert the IP addresses in a log file into host names afterwards, so having Apache record host names instead of IP addresses costs more than it is worth.
However, if Apache really must look up remote host names, use the following directive:
HostnameLookups On
If HostnameLookups is set to Double instead of On, the server also performs a forward lookup on the host name it found, to verify that the name really points back to the original IP address. By default, HostnameLookups is set to Off.
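The double lookup can be sketched in Python as a reverse lookup followed by a forward check. This is an illustration of the idea, not Apache's code: the function names are invented for this example, and the resolvers are passed in as parameters so the check can be tried without network access.

```python
import socket


def reverse_lookup(ip):
    """Return the host name registered for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(name):
    """Return the set of IPv4 addresses that name resolves to."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:
        return set()


def double_lookup(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return the host name for ip only if it resolves back to ip."""
    name = reverse(ip)
    if name is None:
        return None
    return name if ip in forward(name) else None


# Example with stand-in resolvers (192.0.2.x is a documentation range).
verified = double_lookup(
    "192.0.2.1",
    reverse=lambda ip: "host.example",
    forward=lambda name: {"192.0.2.1"},
)
```

Each lookup is a separate DNS round trip, which is why doing this for every request slows logging down so much.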
The second field in the example record is blank and is replaced by a "-" placeholder. In practice this is almost always the case. The field is meant to record the visitor's identity: not just a login name, but possibly an email address or another unique identifier. The information is returned by identd, or directly by the browser. Back when Netscape 0.9 was dominant, this field often contained the email address of the person browsing. However, once people started using it to harvest addresses for spam, the feature did not last long, and almost all browsers dropped it long ago. The chance of seeing an email address in the second field today is therefore very small.
The third field is also blank. It records the name the visitor supplied for authentication. It will of course not be blank if some content on the website requires users to authenticate, but on most websites it is still blank in most log records.
The fourth field is the time of the request. It is enclosed in square brackets and uses the so-called "common log format" or "standard English format". The record in the example above therefore shows that the request was made at 14:47:37 on August 19, 2000. The trailing "-0400" indicates that the server's time zone is four hours behind UTC.
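To make the time format concrete, here is a small Python sketch that parses the timestamp from the example record and converts it to UTC. CLF_TIME_FORMAT and parse_clf_time are names chosen for this example; the sketch also assumes the default C/English locale, because %b matches month abbreviations such as "Aug".

```python
from datetime import datetime, timezone

# Day/month-abbreviation/year:hour:minute:second, then the UTC offset.
CLF_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_clf_time(text):
    """Parse the text between the square brackets of a log record."""
    return datetime.strptime(text, CLF_TIME_FORMAT)


stamp = parse_clf_time("19/Aug/2000:14:47:37 -0400")
as_utc = stamp.astimezone(timezone.utc)
```

Because the offset is part of the record, log lines from servers in different time zones can be compared after converting them all to UTC.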
The fifth field is probably the most useful information in the whole record. It tells us what kind of request the server received. Its typical form is "METHOD RESOURCE PROTOCOL".
In the example above, METHOD is GET. Other methods that appear frequently are POST and HEAD. Many other methods are valid, but these three are the main ones.
RESOURCE is the document or URL that the browser requested from the server. In this example the visitor requested "/", that is, the home page or root of the website. In most cases "/" refers to the index.html document in the DocumentRoot directory, but depending on the server configuration it may point to another file.
PROTOCOL is usually HTTP followed by a version number, either 1.0 or 1.1. HTTP is the protocol on which the Web is built; HTTP/1.0 is the earlier version and HTTP/1.1 the later one. When this article was written, most Web clients still used HTTP/1.0.
The sixth field is the status code. It tells us whether the request succeeded or what kind of error occurred. Most of the time this value is 200, meaning that the server responded to the browser's request successfully and everything is normal. A complete list of status codes and their meanings is not given here; see the relevant references. In general, a status code starting with 2 means the request succeeded, one starting with 3 means the request was redirected elsewhere for some reason, one starting with 4 means there was an error on the client side, and one starting with 5 means the server encountered an error.
The seventh field is the total number of bytes sent to the client. Comparing it with the size of the requested file shows whether the transfer was interrupted. By adding up these values across log records, you can find out how much data the server sent in a day, a week, or a month.
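The seven fields described above can be pulled apart with a regular expression. The following Python sketch is one possible way to do it, under simplifying assumptions: CLF_PATTERN, parse_clf and total_bytes are names invented for this example, and the pattern does not handle escaped quotation marks inside the request line.

```python
import re

# One named group per field of the common log format.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line):
    """Return the seven fields of a log line as a dict, or None."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # "-" means no bytes were sent; count it as zero.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def total_bytes(lines):
    """Add up the bytes-sent field over all lines that parse."""
    records = (parse_clf(line) for line in lines)
    return sum(r["size"] for r in records if r is not None)


sample = ('216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] '
          '"GET / HTTP/1.0" 200 654')
record = parse_clf(sample)
```

Passing the lines of a day's access log to total_bytes gives the amount of body data sent that day; lines that do not match the pattern are skipped rather than raising an error.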
2. Configure access logs
The location of the access log file is a configuration option. If we look at the httpd.conf configuration file, we can see that it contains the following line:
CustomLog /usr/local/apache/logs/access_log common
Note that this line may look slightly different on older Apache servers, which may use the TransferLog directive instead of CustomLog. If your server is one of these, we recommend upgrading it as soon as possible.
The CustomLog directive specifies where the log file is saved and which format it uses. How to customize the format and content of the log file is covered in later articles of this Apache log series. The line above specifies the common log format, which has been the standard format ever since Web servers first appeared. This also explains why the access log still keeps the second field even though almost no client programs supply identity information to the server.
The path in the CustomLog directive is the path of the log file. Note: the log file is opened by the user that starts the server (normally root, before Apache switches to the account named in the User directive), so you must make sure this path is secure and that the file cannot be overwritten or replaced at will.
Supplement
1. Rotating Apache logs by date
In the Apache configuration file, find:
ErrorLog logs/error_log
CustomLog logs/access_log common
Configuration on Linux: change these lines to
ErrorLog "|/usr/local/apache/bin/rotatelogs /home/logs/www/%Y_%m_%d_error_log 86400 480"
CustomLog "|/usr/local/apache/bin/rotatelogs /home/logs/www/%Y_%m_%d_access_log 86400 480" common
Configuration on Windows (remove the leading # to enable the lines):
#ErrorLog "|bin/rotatelogs.exe logs/vicp_net_error-%y%m%d.log 86400 480"
#CustomLog "|bin/rotatelogs.exe logs/vicp_net_access-%y%m%d.log 86400 480" common
At first I did not know that the 480 parameter was needed, and the times used for the log files were 8 hours off from the server time. It turns out that rotatelogs has an offset parameter giving the difference from UTC in minutes. China is in the UTC+8 time zone, so the difference is 480 minutes. 86400 is the number of seconds in one day.
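The effect of the offset can be illustrated with a short Python sketch. It models the arithmetic as I understand it (shift the current time by the offset, round down to a multiple of the rotation time, then format the result); this is an assumption about how rotatelogs behaves, not its actual code, and rotated_log_name is a made-up helper.

```python
from datetime import datetime, timezone


def rotated_log_name(pattern, now_epoch, rotation_seconds, offset_minutes=0):
    """Return the log file name that would be in use at now_epoch."""
    # Shift UTC seconds by the configured offset to get "local" seconds.
    shifted = int(now_epoch) + offset_minutes * 60
    # Round down to the start of the current rotation period.
    period_start = shifted - (shifted % rotation_seconds)
    # Format the shifted period start without any further time zone change.
    moment = datetime.fromtimestamp(period_start, tz=timezone.utc)
    return moment.strftime(pattern)


# 18:00 UTC on 19 August 2000 is already 02:00 on 20 August in UTC+8.
now = datetime(2000, 8, 19, 18, 0, 0, tzinfo=timezone.utc).timestamp()
name_utc = rotated_log_name("%Y_%m_%d_access_log", now, 86400, 0)
name_local = rotated_log_name("%Y_%m_%d_access_log", now, 86400, 480)
```

Without the offset the file is still named for 19 August; with an offset of 480 it is named for 20 August, matching the local date in China.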
Appendix: rotatelogs description
rotatelogs logfile [ rotationtime [ offset ]] | [ filesizeM ]
Options
logfile
The path and base name of the log file. If logfile contains '%' characters, it is treated as a strftime(3) format string; otherwise a .nnnnnnnnnn suffix, a time in seconds, is added automatically. In both cases the time used is the start of the current log period.
rotationtime
The interval between log file rotations, in seconds.
offset
The time difference from UTC, in minutes. If omitted, it is assumed to be 0 and UTC is used. For example, to use local time in a zone that is 5 hours behind UTC, this parameter should be -300.
filesizeM
The maximum file size in megabytes, followed by the letter M. Use this to rotate by size instead of specifying a rotation time and offset.