Preface
Almost all servers and proxies log summaries of the HTTP transactions they process. This is done for a variety of reasons: usage tracking, security, billing, error detection, and so on. This article gives a brief introduction to logging.
What to log
Logging is mostly done for two reasons: to look for problems on the server or proxy (for example, which requests are failing), and to generate statistics about how the Web site is accessed. Statistics are useful for marketing, billing, and capacity planning (for example, deciding whether to add servers or bandwidth).
All of the headers in an HTTP transaction could be logged, but for servers and proxies that process millions of transactions per day, the sheer volume of data would quickly get out of hand. You would also end up logging a lot of information that you don't really care about and may never even look at.
Typically, only the basics of a transaction are logged. Commonly logged fields include: the HTTP method; the HTTP version of the client and server; the URL of the requested resource; the HTTP status code of the response; the size of the request and response messages (including any entity bodies); the timestamp of when the transaction occurred; and the Referer and User-Agent header values.
The HTTP method and URL describe what the request was trying to do, for example, getting a resource or posting an order. The URL can also be used to track the popularity of pages on the Web site.
The version strings give hints about the client and server, which are useful when debugging strange or unexpected interactions between them. For example, if requests are failing at a higher-than-expected rate, the version information may point to a new browser release that cannot interact properly with the server.
The HTTP status code describes the execution status of the request: whether it was executed successfully, whether the authentication request failed, whether the resource was found, etc.
The request and response sizes and the timestamp are used mainly for accounting purposes, that is, to track how many bytes flowed into, out of, or through the application. The timestamp can also be used to correlate observed problems with the requests that were made at the time.
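As a rough illustration of the bookkeeping described above, the following Python sketch tallies page popularity, status codes, and bytes transferred from a few already-parsed records. The Transaction type, its field names, and the sample data are hypothetical and exist only for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Transaction:
    """One logged HTTP transaction (hypothetical field names)."""
    method: str
    url: str
    status: int
    request_bytes: int
    response_bytes: int


def summarize(transactions):
    """Return (hits per URL, hits per status code, total bytes moved)."""
    transactions = list(transactions)  # allow any iterable, including generators
    url_hits = Counter(t.url for t in transactions)
    status_hits = Counter(t.status for t in transactions)
    total_bytes = sum(t.request_bytes + t.response_bytes for t in transactions)
    return url_hits, status_hits, total_bytes


# Made-up sample data, for illustration only.
sample = [
    Transaction("GET", "/index.html", 200, 215, 1024),
    Transaction("GET", "/index.html", 200, 215, 1024),
    Transaction("POST", "/order", 401, 300, 120),
]

url_hits, status_hits, total_bytes = summarize(sample)
print(url_hits.most_common(1))  # most requested URL and its hit count
print(status_hits[401])         # number of failed-authentication responses
print(total_bytes)              # bytes in plus bytes out
```

Counting by URL answers the popularity question, while the byte total is the kind of figure that feeds billing and capacity planning.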
Log formats
Most commercial and open source HTTP applications support logging in one or more common formats. Many of them also let administrators configure log formats and create their own custom formats.
One of the main benefits of supporting these more standard formats is the ability to use the tools that have been built to process them and generate basic statistics. Many open source and commercial packages exist to crunch logs for reporting purposes, and by using a standard format, an application and its administrators can take advantage of these packages.
"Common Log Format"
One of the most common log formats in use today is, fittingly, the Common Log Format. It was originally defined by NCSA, and many servers use it by default. Most commercial and open source servers can be configured to use this format, and many commercial and free tools exist to help parse Common Log Format files. The following table lists the fields of the Common Log Format, in order.
Several common log format entries are listed below
In these examples, the fields are assigned as follows
Note that the remotehost field can be either a hostname, such as http-guide.com, or an IP address, such as 209.1.32.44.
The dashes in the second (username) and third (auth-username) fields indicate that those fields are empty. This means that either no ident lookup took place (second field empty) or no authentication was performed (third field empty).
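To make the field layout concrete, here is a minimal Python sketch that parses one Common Log Format line with a regular expression. The names remotehost, username, and auth-username come from the text above; the remaining group names, the function name, and the sample entry are assumptions made for this example, and real-world logs may need a more forgiving pattern.

```python
import re

# remotehost username auth-username [timestamp] "request-line" status size
CLF_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<username>\S+) (?P<auth_username>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request_line>[^"]*)" '
    r'(?P<response_code>\d{3}) (?P<response_size>\d+|-)$'
)


def parse_common_log(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # A dash marks an empty field; represent it as None.
    for key in ("username", "auth_username"):
        if entry[key] == "-":
            entry[key] = None
    entry["response_code"] = int(entry["response_code"])
    size = entry["response_size"]
    entry["response_size"] = None if size == "-" else int(size)
    return entry


# Hypothetical entry, written for this example.
line = '209.1.32.44 - - [03/Oct/2016:14:16:00 -0400] "GET / HTTP/1.0" 200 1024'
entry = parse_common_log(line)
print(entry["remotehost"], entry["request_line"], entry["response_code"])
```

The timestamp is kept as a string here because servers differ slightly in how they write the time zone offset.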
"Combined log Format"
Another commonly used log format is the Combined Log Format, which is supported by servers such as Apache. The Combined Log Format is very similar to the Common Log Format; in fact, it mirrors it exactly, with the addition of two fields. The User-Agent field tells which HTTP client application made the logged request, and the Referer field provides more detail about where the requestor found the URL.
The newly added combined log format fields are listed below
Field        Description
Referer      Contents of the Referer header
User-Agent   Contents of the User-Agent header
The following example shows an entry for a combined log format
The values for the Referer field and the User-agent field are as follows
The first seven fields of the Combined Log Format entry above are exactly the same as in the Common Log Format. The two new fields, Referer and User-Agent, are appended to the end of the log entry.
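Because the Combined Log Format only appends two quoted fields, a parser can treat it as a common-format prefix plus a tail. The Python sketch below does just that; the function name, the sample entry, the example.com URL, and the ExampleBrowser agent string are all made up for illustration, and the pattern assumes the two header values contain no embedded double quotes.

```python
import re

# Seven common-format fields, followed by "referer" and "user-agent".
COMBINED_PATTERN = re.compile(
    r'^(?P<common>\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def split_combined_log(line):
    """Return (common-format prefix, referer, user agent), or None if no match."""
    match = COMBINED_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("common"), match.group("referer"), match.group("user_agent")


# Hypothetical entry, written for this example.
line = ('209.1.32.44 - - [03/Oct/2016:14:16:00 -0400] "GET / HTTP/1.0" 200 1024 '
        '"http://www.example.com/start.html" "ExampleBrowser/1.0"')
common, referer, user_agent = split_combined_log(line)
print(referer)
print(user_agent)
```

The returned prefix is itself a valid common-format line, so it can be handed to any existing Common Log Format tool.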
"Netscape Extended Log Format"
When Netscape entered the commercial HTTP application space, it defined log formats for its servers that many other HTTP application developers went on to adopt. Netscape's formats derive from the NCSA Common Log Format, but they extend it to incorporate fields relevant to HTTP applications such as proxies and Web caches.
The first 7 fields of the Netscape Extended log format are exactly the same as those in the common log format. The following table lists the new fields introduced by the Netscape Extended log format
An entry for the Netscape Extended log format is given below
209.1.32.44 - - [03/Oct/2016:14:16:00-0400] "GET / HTTP/1.0" 200 1024 200 1024 0 0 215 260 279 254 3
In this example, the value of the extended field is as follows
The first seven fields in the Netscape Extended Log Format entry above mirror those of the Common Log Format example.
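One way to check this layout is to peel the seven common-format fields off the front of an entry and count what remains. The Python sketch below does so for a cleaned-up copy of the sample entry; the function name is hypothetical, and the extension fields are returned unnamed because their meanings depend on the field table.

```python
import re

# The seven fields shared with the Common Log Format.
COMMON_PREFIX = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)\s*'
)


def split_extended_log(line):
    """Return (common_fields, extension_fields) for an extended-format line."""
    line = line.strip()
    match = COMMON_PREFIX.match(line)
    if match is None:
        return None
    common_fields = list(match.groups())
    # Everything after the common prefix is a whitespace-separated extension field.
    extension_fields = line[match.end():].split()
    return common_fields, extension_fields


line = ('209.1.32.44 - - [03/Oct/2016:14:16:00-0400] "GET / HTTP/1.0" '
        '200 1024 200 1024 0 0 215 260 279 254 3')
common_fields, extension_fields = split_extended_log(line)
print(len(common_fields), len(extension_fields))  # 7 9
```

Seven common fields plus nine extension fields give the 16 fields of a Netscape Extended Log Format entry.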
Another Netscape log format, the Netscape Extended 2 Log Format, takes the Extended Log Format and adds further information relevant to HTTP proxy and Web caching applications. These extra fields help paint a better picture of the interactions between an HTTP client and an HTTP proxy application.
The Netscape Extended 2 Log Format derives from the Netscape Extended Log Format, and its initial fields are identical to those of the Netscape Extended Log Format.
The following table lists the additional fields of the Netscape Extended 2 Log Format, in order.
The following example shows a Netscape Extended 2 Log Format entry
209.1.32.44 - - [03/Oct/2016:14:16:00-0400] "GET / HTTP/1.0" 200 1024 200 1024 0 0 215 260 279 254 3 DIRECT FIN FIN WRITTEN
The value of the extended field in this example is as follows
The first 16 fields in the Netscape Extended 2 Log Format entry above mirror those of the Netscape Extended Log Format example.
The following table lists the valid Netscape routing codes
The following table lists the valid Netscape completion status codes
The following table lists the valid Netscape cache codes
Like many other HTTP applications, Netscape applications also have other log formats, including a Flexible Log Format and a way for administrators to output custom log fields. These formats give administrators greater control over their logs by letting them choose which parts of the HTTP transaction (headers, status, sizes, and so on) are reported.
The ability to configure custom formats was added because it is difficult to predict what information administrators will want to get from their logs. Many other proxies and servers can also emit custom logs.
"Squid Proxy Log Format"
The Squid proxy cache (http://www.squid-cache.org) is a venerable part of the Web. Its roots trace back to one of the early Web proxy cache projects (ftp://ftp.cs.colorado.edu/pub/techreports/schwartz/Harvest.Conf.ps.Z). Squid is an open source project that has been extended and enhanced by the open source community over the years. Many tools have been written to help administer Squid, including tools that help process, audit, and mine its logs. Many later proxy caches adopted the Squid format for their own logs so that they could take advantage of these tools.
The format of Squid log entries is fairly simple. The following table summarizes the fields for this log format
An example of a Squid log format entry is given below
The values for these fields are as follows
The following table lists the various Squid result codes
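The simplicity of the format shows in how little code is needed to read it. The Python sketch below assumes the ten whitespace-separated fields of Squid's native access log (timestamp, elapsed time, client, result-code/status, size, method, URL, ident, hierarchy/peer, content type). The dictionary keys, the function name, and the sample entry are assumptions made for this example, not names taken from Squid itself.

```python
def parse_squid_log(line):
    """Parse one Squid native access-log line into a dict, or return None."""
    parts = line.split()
    if len(parts) != 10:
        return None
    # Two of the fields are pairs joined by a slash.
    result_code, _, status = parts[3].partition("/")
    hierarchy, _, peer = parts[8].partition("/")
    return {
        "timestamp": float(parts[0]),   # seconds since the epoch
        "elapsed_ms": int(parts[1]),
        "client": parts[2],
        "result_code": result_code,     # for example, TCP_MISS
        "status": int(status),          # HTTP status code
        "size": int(parts[4]),
        "method": parts[5],
        "url": parts[6],
        "ident": None if parts[7] == "-" else parts[7],
        "hierarchy": hierarchy,         # for example, DIRECT
        "peer": peer,
        "content_type": parts[9],
    }


# Hypothetical entry, written for this example.
line = ("1475518560.123 3001 209.1.32.44 TCP_MISS/200 4087 GET "
        "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
entry = parse_squid_log(line)
print(entry["result_code"], entry["status"], entry["url"])
```

Splitting on arbitrary whitespace rather than single spaces matters here, since Squid pads some columns for alignment.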
Hit Metering
Origin servers often keep detailed logs for billing purposes. Content providers need to know how often their URLs are accessed, advertisers want to know how often their ads are shown, and Web authors want to know how popular their content is. When clients visit Web servers directly, logging works well for tracking these things.
However, caches stand between clients and servers and prevent many requests from ever reaching the server (that is the very purpose of a cache). Because caches handle many HTTP requests and satisfy them without contacting the origin server, the server has no record that a client accessed its content, which leaves omissions in its log files.
Because of the missing log data, content providers resort to cache busting for their most important pages. Cache busting means that a content provider intentionally makes certain content uncacheable, so that all requests for it must go to the origin server. This lets the origin server log the access. Defeating caching may yield better logs, but it slows down requests and increases the load on the origin server and the network.
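As a small illustration of cache busting, the Python sketch below picks response headers based on whether the content provider wants every hit on a path to reach the origin server. Cache-Control, Pragma, and Expires are standard HTTP headers; the set of metered paths, the function name, and the one-hour lifetime are hypothetical choices for this example.

```python
# Hypothetical set of paths whose every hit the content provider wants to log.
METERED_PATHS = {"/ads/banner.gif", "/index.html"}


def response_cache_headers(path):
    """Return caching headers for a response to the given path."""
    if path in METERED_PATHS:
        # Cache busting: forbid caching so each request reaches the origin server.
        return {"Cache-Control": "no-store", "Pragma": "no-cache", "Expires": "0"}
    # Everything else may be cached for an hour (arbitrary example lifetime).
    return {"Cache-Control": "public, max-age=3600"}


print(response_cache_headers("/index.html"))
print(response_cache_headers("/images/logo.png"))
```

Every path added to the metered set trades cache performance for log completeness, which is exactly the cost described above.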
Proxy caches (and some clients) keep their own logs. If servers could get access to these logs, or at least had a crude way to determine how often their content is served by a proxy cache, cache busting could be avoided. The Hit Metering protocol, an extension to HTTP, offers a solution to this problem. It requires caches to periodically report cache access statistics to origin servers.
The Hit Metering protocol defines an HTTP extension with a few basic facilities that caches and servers can implement to share access information and to regulate how many times a cached resource may be used.
Hit Metering is not a complete solution to the problem that caches pose for access logging, but it does provide a basic means of obtaining the metrics that servers want to track. The Hit Metering protocol has not been widely implemented or deployed, and may never be. That said, a cooperative scheme like Hit Metering holds some promise of providing accurate access statistics while retaining the performance gains of caching. Hopefully, that will be motivation to implement the Hit Metering protocol instead of marking content uncacheable.
"Meter Header"
The Hit Metering extension proposes a new header, Meter, that caches and servers can use to pass each other directives about usage and reporting, much as the Cache-Control header is used to exchange caching directives.
The following table lists the defined directives and who can pass them in the Meter header
Consider an example of Hit Metering in action. The first part of the transaction is just a normal HTTP transaction between a client and a proxy cache, but note the Meter header inserted in the proxy's request and the response from the server. Here, the proxy is informing the server that it is capable of Hit Metering, and the server in turn is asking the proxy to report its hit counts.
From the client's point of view, the request completes as it normally would, and the proxy begins tracking hits to the requested resource on behalf of the server. Later, the proxy tries to revalidate the resource with the server. It embeds the metering information it has been tracking in the conditional request it sends to the server.
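The proxy's side of this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directive spellings will-report-and-limit and count=uses/reuses follow my reading of the Hit Metering specification (RFC 2227), the HitMeter class and its methods are hypothetical, and a real proxy would also have to honor the server's reply directives and any usage limits.

```python
from collections import defaultdict


class HitMeter:
    """Tracks cache hits per URL so they can be reported to the origin server."""

    def __init__(self):
        # url -> [uses, reuses]
        self._counts = defaultdict(lambda: [0, 0])

    def record_hit(self, url, conditional=False):
        """Count one response served from cache (conditional=True for a 304 reply)."""
        self._counts[url][1 if conditional else 0] += 1

    def initial_request_headers(self):
        """Headers announcing that this proxy is willing to report hits."""
        # Meter is hop-by-hop, so it is also listed in the Connection header.
        return {"Meter": "will-report-and-limit", "Connection": "Meter"}

    def revalidation_headers(self, url):
        """Headers carrying the accumulated counts; the counts reset once reported."""
        uses, reuses = self._counts.pop(url, [0, 0])
        return {"Meter": f"count={uses}/{reuses}", "Connection": "Meter"}


meter = HitMeter()
meter.record_hit("http://www.example.com/")
meter.record_hit("http://www.example.com/")
meter.record_hit("http://www.example.com/", conditional=True)
headers = meter.revalidation_headers("http://www.example.com/")
print(headers["Meter"])  # count=2/1
```

Resetting the counters when they are reported keeps the server from counting the same hits twice on the next revalidation.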
Privacy
Logging is really an administrative function performed by servers and proxies, so the whole operation is invisible to the user. Often, users are not even aware that their HTTP transactions are being logged.
Web application developers and administrators need to be aware of the implications of tracking a user's HTTP transactions. Much can be learned about users from what they retrieve. This information can obviously be put to bad use: discrimination, harassment, blackmail, and so on. Web servers and proxies that log must take care to protect the privacy of their end users.
In some settings, such as a workplace, tracking users' activity to make sure they are not slacking off may be appropriate, but administrators should also make it public that people's transactions are being monitored.
In short, logging is a very useful tool for administrators and developers. Just be aware that logs can infringe on the privacy of the users whose actions are recorded when this happens without their permission or knowledge.