Although a log file contains a great deal of useful information, that information becomes fully useful only after it has been mined in depth. This article first discusses what information can and cannot be obtained from a log file, then introduces several good log analysis tools, and finally shows how to write your own programs to analyze log files.
1. What information can be obtained
In the previous articles in this Apache Log Series, we discussed Apache's standard log files (the access log and the error log) and how to customize log files. This article discusses how to analyze log files to extract valuable statistics.
The problem we face is that, although a log file contains a great deal of information, it does not directly help us manage and plan a website. To do that, we need to know how many people have visited the site, what they looked at, how long they stayed, how they found the site, and so on. All of this information is hidden (or at least may be hidden) in the log file.
Website operators would also like to know each visitor's name, address, shoe size, and credit card number, but such information cannot be obtained from log files. As technical staff, we therefore have to explain to these operators that this information is not only impossible to get from log files, but that the only way to obtain it is to ask the visitors directly, and to be prepared for them to refuse.
Log files do record a great deal of useful information, including:
Remote machine address: the remote machine address is related to "who is browsing the website", but it is not the same thing. It tells us which machine a request came from, for example buglet.rcbowen.com or proxy01.aol.com.
Access time: when do visitors come to the website? The answer tells us a lot. If most visits occur during working hours, we can reasonably conclude that most visitors browse the site from work; if most access records fall in the evening, up to midnight, the visitors are probably browsing from home. A single access record reveals very little, of course, but thousands of records taken together yield useful and important statistics.
Resources accessed: which parts of the website are most popular? The most popular parts are the ones we should keep developing. Which parts are consistently ignored? They may be buried too deep in the site, or they may simply not offer much, so we need to find ways to improve them. Of course, some content, such as legal notices, is rarely visited but still should not be changed casually.
Broken links and errors: log files can also tell us what is not working as intended. Are there broken links on the website? Do other websites link to us with incorrect URLs? Is there a CGI program that fails to run properly? Is a search engine's crawler sending thousands of requests per second and degrading normal service? The answers to these questions can be found in the log file.
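The kinds of statistics listed above can be pulled out of an access log with a short script. What follows is a minimal Python sketch, not a finished tool, assuming the access log uses Apache's Combined Log Format (Common Log Format lines also parse; they simply lack the referer and user-agent fields). The names `LOG_PATTERN`, `parse_line`, and `summarize` are illustrative and not part of Apache or any analysis package. The script counts requests per remote host, per hour of day, and per successfully served resource; groups 404 responses by the referring page (which points at the page holding the bad link); lists 5xx failures such as broken CGI programs; and reports the busiest host-per-second combinations, which can expose an over-eager crawler.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Hypothetical parser for Apache's Combined Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326 "ref" "agent"
# The trailing referer/user-agent pair is optional, so Common Log Format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700 (server's clock plus UTC offset).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def summarize(lines, top_n=10):
    """Compute the statistics discussed above from an iterable of log lines."""
    entries = [e for e in map(parse_line, lines) if e is not None]

    # Remote machine address: which hosts send the most requests.
    hosts = Counter(e["host"] for e in entries)
    # Access time: requests per hour of day, as recorded by the server.
    hours = Counter(e["time"].hour for e in entries)
    # Resources accessed: successfully served paths (status below 400).
    resources = Counter(e["path"] for e in entries if e["status"] < 400)
    # Broken links: 404s grouped by path and the referring page that linked to it.
    broken = Counter(
        (e["path"], e["referer"] or "-") for e in entries if e["status"] == 404
    )
    # Server-side failures, such as CGI programs that did not run correctly.
    server_errors = Counter(e["path"] for e in entries if e["status"] >= 500)
    # Requests per host within the same second; large counts suggest a runaway crawler.
    bursts = Counter((e["host"], e["time"]) for e in entries)

    return {
        "hosts": hosts.most_common(top_n),
        "hours": sorted(hours.items()),
        "resources": resources.most_common(top_n),
        "broken_links": broken.most_common(top_n),
        "server_errors": server_errors.most_common(top_n),
        "bursts": bursts.most_common(top_n),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python logstats.py /path/to/access_log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        report = summarize(log_file)
    for section, rows in report.items():
        print(section)
        for key, count in rows:
            print(f"  {count:8d}  {key}")
```

Two caveats apply to the numbers this produces. The hour buckets use the server's clock as written in each timestamp, not the visitor's local time, so visitors in other time zones blur the working-hours-versus-evening picture. And the host field is an IP address unless Apache's `HostnameLookups` is enabled, and it may belong to a proxy (such as proxy01.aol.com) rather than to an individual visitor.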