Most web hosting companies give customers access to Web site statistics, but you may find that the status information generated by the server is not comprehensive enough. For example, an incorrectly configured Web server does not recognize certain file types, and files of those types do not appear in the statistics. Fortunately, you can use PHP to write a custom statistics collection program so that you get exactly the information you need.
Structure of the Common Logfile Format (CLF)
CLF was originally designed for NCSA HTTPd (World Wide Web server software). CERN httpd is a public domain Web server maintained by the World Wide Web Consortium (W3C), and the W3C Web site lists the log file specification. Both Microsoft-based and UNIX-based Web servers can generate log files in CLF format. The CLF format is as follows:
host ident authuser time_stamp "request" status_code file_size
For example:
21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237
The following is a breakdown of the log entries:
host is the IP address or DNS name of the site visitor. In the example above, it is 21.53.48.83.
ident is the remote identity of the visitor (RFC 931). A dash means "unspecified."
authuser is the user ID, if the Web server has verified the identity of the site visitor. A dash means that no authentication took place.
time_stamp is the date and time reported by the server, in the format day/month/year:hour:minute:second followed by the time zone offset.
request is the site visitor's HTTP request line: the method (such as GET or POST), the requested resource, and the protocol version.
status_code is the status code returned by the server. For example, 200 means "OK: the browser's request succeeded."
file_size is the size of the file requested by the user, in bytes. In this example, it is 8237 bytes.
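The field breakdown above can be checked in code. The following is a minimal sketch, not part of the original application: parseClfLine is a hypothetical helper that splits one CLF line into its seven fields with a regular expression, assuming the fields contain no embedded double quotes.

```php
<?php
// Hypothetical helper (an assumption for illustration): split one CLF line
// into the seven named fields described above. Returns null on a mismatch.
function parseClfLine(string $line): ?array
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // Not a well-formed CLF entry
    }
    return [
        'host'        => $m[1],
        'ident'       => $m[2],
        'authuser'    => $m[3],
        'time_stamp'  => $m[4],
        'request'     => $m[5],
        'status_code' => (int) $m[6],
        // A dash in the size field is treated as zero bytes
        'file_size'   => $m[7] === '-' ? 0 : (int) $m[7],
    ];
}

// Parse the example entry shown earlier
$entry = parseClfLine('21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237');
```

With the example entry, $entry['host'] holds the visitor address and $entry['request'] holds the full request line.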
Server Status Codes
You can find the server status code specification, developed by the W3C, in the HTTP standard. The status codes generated by the server indicate whether the data transfer between the browser and the server succeeded. The code is typically passed to the browser (for example, the well-known 404 "Not Found" error) or added to the server log.
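As a rough illustration of how these codes are grouped, the first digit of a status code identifies its class. The sketch below uses a hypothetical helper, statusClass, that is not part of the original application.

```php
<?php
// Hypothetical helper (an assumption for illustration): classify an HTTP
// status code by its first digit, following the classes in the HTTP standard.
function statusClass(int $code): string
{
    switch (intdiv($code, 100)) {
        case 1:
            return 'informational';
        case 2:
            return 'success';
        case 3:
            return 'redirection';
        case 4:
            return 'client error';
        case 5:
            return 'server error';
        default:
            return 'unknown';
    }
}

echo statusClass(200), "\n"; // the request succeeded
echo statusClass(404), "\n"; // the page was not found
```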
Collect Data
The first step in creating our custom application is to get the user data. Whenever a user requests a resource on the Web site, we want to create a corresponding log entry. Fortunately, server variables let us obtain data about the user's browser and request.
Server variables carry information about the request, much of it passed from the browser to the server in the headers. REMOTE_ADDR is an example of a server variable. This variable returns the user's IP address:
Example output: 27.234.125.222
The following PHP code will display the current user's IP address:
<?php echo $_SERVER['REMOTE_ADDR']; ?>
Let's look at the code for our PHP application. First, we need to define the site resources we want to track and specify the file size:
// Get the name of the file we want to record
$fileName = "Cnet-banner.gif";
$fileSize = "92292";
You don't have to hard-code these values in variables. If you want to track many resources, you can save them in an array or a database. In that case, you might identify each resource through an external link, as follows:
<a href="weblogger.php?bannerid=123"></a>
Here "123" identifies the record for "Cnet-banner.gif". Next, we read the server variables for the current request. This gives us the data we need to add a new entry to our log file:
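// Get the CLF information of the website visitor
// (REMOTE_IDENT and REMOTE_USER are often not set, so default them to "")
$host = $_SERVER['REMOTE_ADDR'];
$ident = isset($_SERVER['REMOTE_IDENT']) ? $_SERVER['REMOTE_IDENT'] : "";
$auth = isset($_SERVER['REMOTE_USER']) ? $_SERVER['REMOTE_USER'] : "";
$timeStamp = date("d/M/Y:H:i:s O");
$reqType = $_SERVER['REQUEST_METHOD'];
$servProtocol = $_SERVER['SERVER_PROTOCOL'];
$statusCode = "200";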
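The array-based lookup mentioned earlier (the bannerid=123 link) could be sketched as follows. The names $banners and findBanner are hypothetical, and only entry 123 comes from the article. A database query could take the place of the array.

```php
<?php
// Hypothetical lookup table (an assumption for illustration):
// banner ID => tracked file name and its size in bytes.
$banners = [
    123 => ['fileName' => 'Cnet-banner.gif', 'fileSize' => 92292],
];

// Hypothetical helper: return the record for a banner ID, or null if the
// ID is not a valid integer or is not in the table.
function findBanner(array $banners, $bannerId): ?array
{
    $id = filter_var($bannerId, FILTER_VALIDATE_INT);
    if ($id === false || !isset($banners[$id])) {
        return null;
    }
    return $banners[$id];
}

// In weblogger.php the ID would come from the query string; the fallback
// value '123' is used here only so the sketch runs on its own.
$banner = findBanner($banners, $_GET['bannerid'] ?? '123');
```

Validating the ID before the lookup keeps arbitrary query-string input out of the log file.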
We then check whether any of these values is empty. According to the CLF specification, empty values should be replaced with dashes. The next block of code therefore looks for empty values and replaces each one with a dash:
// Replace empty values with a dash (as required by the specification)
if ($host == "") { $host = "-"; }
if ($ident == "") { $ident = "-"; }
if ($auth == "") { $auth = "-"; }
if ($reqType == "") { $reqType = "-"; }
if ($servProtocol == "") { $servProtocol = "-"; }
Once we have the necessary information, we assemble the values into a string that conforms to the CLF specification:
// Create a string in CLF format
$clfString = $host . " " . $ident . " " . $auth . " [" . $timeStamp . "] \"" . $reqType . " /" . $fileName . " " . $servProtocol . "\" " . $statusCode . " " . $fileSize . "\r\n";
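The collection, dash replacement, and formatting steps can also be folded into a single function. This is a minimal sketch under stated assumptions: dashIfEmpty and buildClfEntry are hypothetical names, and sprintf replaces the string concatenation used above.

```php
<?php
// Hypothetical helper (an assumption for illustration): replace a missing
// or empty value with a dash, as the CLF specification requires.
function dashIfEmpty($value): string
{
    return ($value === null || $value === '') ? '-' : (string) $value;
}

// Hypothetical helper: build one CLF entry from an array shaped like
// $_SERVER, the tracked file, the status code, and a Unix timestamp.
function buildClfEntry(array $server, string $fileName, int $fileSize, int $statusCode, int $time): string
{
    return sprintf(
        "%s %s %s [%s] \"%s /%s %s\" %d %d\r\n",
        dashIfEmpty($server['REMOTE_ADDR'] ?? null),
        dashIfEmpty($server['REMOTE_IDENT'] ?? null),
        dashIfEmpty($server['REMOTE_USER'] ?? null),
        date('d/M/Y:H:i:s O', $time),
        dashIfEmpty($server['REQUEST_METHOD'] ?? null),
        $fileName,
        dashIfEmpty($server['SERVER_PROTOCOL'] ?? null),
        $statusCode,
        $fileSize
    );
}

// Build an entry for the current request
$clfString = buildClfEntry($_SERVER, 'Cnet-banner.gif', 92292, 200, time());
```

Passing the server array and the timestamp as parameters, rather than reading $_SERVER and the clock inside the function, makes the formatter easy to test with fixed values.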
Create a custom log file
Now the formatted data can be stored in our custom log file. First, we define a file naming convention and write code that produces a new log file every day. In the example presented in this article, each file name starts with "weblog-", followed by the month, day, and year, and ends with the .log extension, which typically denotes a server log file. (In fact, most log analyzers search for .log files.)
// Name the log file after the current date
$logPath = "./log/";
$logFile = $logPath . "weblog-" . date("mdy") . ".log";
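The naming convention can be wrapped in a small function so that the date is easy to vary. This is a sketch only: dailyLogFile is a hypothetical name, and the two-digit month, day, and year layout is an assumption about the intended date format.

```php
<?php
// Hypothetical helper (an assumption for illustration): build the daily
// log file name "weblog-MMDDYY.log" inside the given directory.
function dailyLogFile(string $logPath, ?int $time = null): string
{
    $time = $time ?? time(); // Default to the current date
    return rtrim($logPath, '/') . '/weblog-' . date('mdy', $time) . '.log';
}

// File name for today's log
$logFile = dailyLogFile('./log/');
```

Because the file name changes with the date, the first request of each day automatically starts a new log file.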
Now we need to determine whether the current log file exists. If it does, we append an entry to it. Otherwise, the application creates a new log file. (A new log file is typically created when the date changes, because the file name changes.)
// Check if the log file already exists
if (file_exists($logFile)) {
    // If present, open the existing log file for appending
    $fileWrite = fopen($logFile, "a");
} else {
    // Otherwise, create a new log file
    $fileWrite = fopen($logFile, "w");
}
If you receive a "Permission denied" error message when you write or append to the file, change the permissions on the target log folder to allow write operations. The default permissions on most Web servers are "readable and executable." You can change the permissions of a folder with the chmod command or with an FTP client.
We then lock the file so that when two or more users access the log file at the same time, only one of them can write to it. Writing requires an exclusive lock (LOCK_EX), because a shared lock (LOCK_SH) can be held by several processes at once:
// Lock the file exclusively for the write operation
flock($fileWrite, LOCK_EX);
Finally, we write the contents of the entry:
// Write the CLF entry
fwrite($fileWrite, $clfString);
// Release the file lock
flock($fileWrite, LOCK_UN);
// Close the log file
fclose($fileWrite);
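The open, lock, write, unlock, and close sequence above can be condensed with PHP's built-in file_put_contents function, which accepts the FILE_APPEND and LOCK_EX flags. The sketch below is an alternative to the code above, not a replacement mandated by the article. The name appendLogEntry is hypothetical, and creating the log folder on demand is an added assumption.

```php
<?php
// Hypothetical helper (an assumption for illustration): append one entry to
// the log file under an exclusive lock, creating the folder if needed.
function appendLogEntry(string $logFile, string $entry): bool
{
    $dir = dirname($logFile);
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false; // Could not create the log folder
    }
    // FILE_APPEND creates the file when it does not exist yet;
    // LOCK_EX holds an exclusive lock for the duration of the write.
    return file_put_contents($logFile, $entry, FILE_APPEND | LOCK_EX) !== false;
}

// Example call (commented out so the sketch has no side effects):
// appendLogEntry('./log/weblog-042202.log', $clfString);
```

Because append mode creates a missing file, the separate file_exists check is not needed in this variant.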
Processing log data
Once the custom logging is in place, the customer will want a detailed statistical analysis of the data collected about visitors. Because all of the custom log files follow a standard format, any log analyzer can process them. A log analyzer is a tool that reads large log files and produces pie charts, histograms, and other statistical graphics. Log analyzers are also used to summarize information such as which users visited your site and how many clicks each resource received.
A few of the more popular log analyzers are listed below:
WebTrends is a very good Log analyzer for large-scale Web sites and enterprise-class networks.
Analog is a fairly popular free log analyzer.
Webalizer is a free analysis program. It can generate HTML reports so that most Web browsers can view its reports.
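For a quick check without a full analyzer, the log entries can also be summarized directly in PHP. The following is a minimal sketch, not a substitute for the tools above: countHits is a hypothetical helper that counts successful (2xx) requests per resource, and the sample lines are made up for illustration.

```php
<?php
// Hypothetical helper (an assumption for illustration): count successful
// (2xx) requests per requested resource, most requested first.
function countHits(array $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        // Capture the requested path and the status code of each entry
        if (preg_match('/"\S+ (\S+) [^"]*" (\d{3}) /', $line, $m) && $m[2][0] === '2') {
            $hits[$m[1]] = ($hits[$m[1]] ?? 0) + 1;
        }
    }
    arsort($hits); // Sort by hit count, descending
    return $hits;
}

// Made-up sample entries; a real run would read them with file($logFile)
$sample = [
    '21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET /a.gif HTTP/1.0" 200 8237',
    '21.53.48.84 - - [22/Apr/2002:22:19:13 -0500] "GET /b.gif HTTP/1.0" 404 0',
    '21.53.48.85 - - [22/Apr/2002:22:19:14 -0500] "GET /a.gif HTTP/1.0" 200 8237',
    '21.53.48.86 - - [22/Apr/2002:22:19:15 -0500] "GET /b.gif HTTP/1.0" 200 512',
];
$hits = countHits($sample);
```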
Follow the standard
We can easily extend the application to support other log formats. This lets you capture more data, such as the browser type and the referrer (the previous page, which linked to the current page). The lesson here is that following standards or conventions in your programming will eventually simplify your work.
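One such extension is the widely used combined log format, which appends the referrer and the browser's user agent, each in double quotes, to a CLF entry. The sketch below assumes that format. The name toCombinedEntry is hypothetical, and stripping embedded double quotes is a simplification chosen for this example.

```php
<?php
// Hypothetical helper (an assumption for illustration): extend a CLF entry
// with the referrer and user agent, as in the combined log format.
function toCombinedEntry(string $clfEntry, array $server): string
{
    // Remove double quotes so the quoted fields stay parseable,
    // and use a dash for empty values
    $clean = function (string $value): string {
        $value = str_replace('"', '', $value);
        return $value === '' ? '-' : $value;
    };
    $referer   = $clean($server['HTTP_REFERER'] ?? '');
    $userAgent = $clean($server['HTTP_USER_AGENT'] ?? '');

    return rtrim($clfEntry, "\r\n") . ' "' . $referer . '" "' . $userAgent . "\"\r\n";
}

// Extend a made-up CLF entry with data from the current request
$combined = toCombinedEntry("21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] \"GET /cnet.gif HTTP/1.0\" 200 8237\r\n", $_SERVER);
```

Because the first seven fields are unchanged, a tool that reads plain CLF can often still process the extended entries, although that depends on the tool.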