User Behavior Analysis: Apache Log Analysis (Part I)

Source: Internet
Author: User
Tags: apache log

Apache Log Analysis

Our company's Apache logs use the "Combined Log Format", which is Apache's official name for it. The structure is as follows:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

For more detailed information about Apache logs, see: http://httpd.apache.org/docs/1.3/logs.html

I. Interpretation of %h %l %u %t \"%r\" %>s %b

An example to explain:

216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654

%h 216.35.116.91: the address of the remote host, that is, who is visiting the site;

%l %u - -: the remote logname (from identd) and the authenticated user name; "-" means the value is not available;

%t [19/Aug/2000:14:47:37 -0400]: the time of the request;

\"%r\" GET / HTTP/1.0: the request line, made up of the HTTP method, the requested page and the protocol; here the request is for the site root rather than a specific page;

%>s 200: the status code (other values include 500, 404 and so on);

%b 654: the number of bytes sent to the client, not counting the response headers (so for a static file the value matches the file size). Summing these values over the log records tells you how much data the server sent in a day, week, or month.
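The field breakdown above can be sketched as a small parser. This is only a minimal sketch under stated assumptions, not the tooling used on the original logs: the regular expression and the names `COMMON_RE`, `parse_common` and `bytes_per_day` are hypothetical, and every line is assumed to follow the layout shown above.

```python
import re
from collections import defaultdict

# Hypothetical pattern for: %h %l %u %t "%r" %>s %b
COMMON_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_common(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMMON_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # %b is logged as "-" when no body bytes were sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["status"] = int(fields["status"])
    return fields


def bytes_per_day(lines):
    """Sum %b per calendar day, keyed by the date part of %t."""
    totals = defaultdict(int)
    for line in lines:
        fields = parse_common(line)
        if fields is not None:
            day = fields["time"].split(":", 1)[0]  # e.g. "19/Aug/2000"
            totals[day] += fields["size"]
    return dict(totals)


sample = '216.35.116.91 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654'
print(parse_common(sample))
print(bytes_per_day([sample, sample]))
```

Grouping by the date part of %t is one possible choice; weekly or monthly totals would only need a different key.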

II. \"%{Referer}i\" \"%{User-agent}i\"

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "" "Mozilla/4.08 [en] (Win98; I; NAV)"

\"%{Referer}i\" is, in plain terms, the source of the request: the page the client reports having come from.

\"%{User-agent}i\" carries identifying information about the client's browser.

A crawler's identifying information is generally exposed in these two fields, as described in detail below.
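Pulling these two fields out of a line might look like the sketch below. The pattern and the names `COMBINED_RE` and `referer_and_agent` are hypothetical, and the referer URL in the sample is a placeholder chosen for illustration, not a value taken from the logs discussed here.

```python
import re

# Hypothetical pattern: the common fields followed by the two quoted headers.
# match() anchors only at the start, so extra trailing fields are tolerated.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_and_agent(line):
    """Return (referer, user agent), or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    return m.group("referer"), m.group("agent")


# Illustrative line; the referer URL is a placeholder, not from a real log
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I; NAV)"'
)
print(referer_and_agent(sample))
```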


Look at log sample one:

10.29.101.4 - - [26/Dec/2008:00:01:37 +0800] "GET /group_list.php?groupclass=6&countryid=1&provinceid=33 HTTP/1.0" 200 20044 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)" 401 20393

The "-" in the referer position is a typical trait left by a crawler. You might object that the "msnbot/1.1" right after the "-" is even more obvious: it contains "bot", so one look tells you it is a robot, a crawler. But consider the program's execution overhead: the less text it has to compare, the better. Checking whether the \"%{Referer}i\" content is "-" is a cheap way to judge whether a request comes from a crawler. At the least it avoids examining \"%{User-agent}i\", which needs regular expressions to compare, as introduced below.

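The cheap referer check can be sketched without any regular expression. The function name `referer_is_dash` is hypothetical, and the sketch assumes the request line itself contains no double quote. Direct visits also log "-" as the referer, so the result is only a first-pass hint.

```python
def referer_is_dash(line):
    """Cheap check: True when the referer field is "-".

    Splitting on the double quote yields
    [prefix, request, status/size, referer, " ", user agent, rest],
    so index 3 is the referer. Assumes no quote inside the request line.
    """
    parts = line.split('"')
    return len(parts) > 3 and parts[3] == "-"


sample_one = (
    '10.29.101.4 - - [26/Dec/2008:00:01:37 +0800] '
    '"GET /group_list.php?groupclass=6&countryid=1&provinceid=33 HTTP/1.0" '
    '200 20044 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)" 401 20393'
)
sample_two = (
    '10.29.101.4 - - [26/Dec/2008:00:10:14 +0800] '
    '"GET /group_view.php?hid=329956 HTTP/1.0" 200 4576 '
    '"http://group.woyo.com/group_view.php?hid=329956" '
    '"Sosospider+ (+http://help.soso.com/webspider.htm)" 362 4960'
)
print(referer_is_dash(sample_one), referer_is_dash(sample_two))
```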
So what if the \"%{Referer}i\" content is not "-"? See sample two:

10.29.101.4 - - [26/Dec/2008:00:10:14 +0800] "GET /group_view.php?hid=329956 HTTP/1.0" 200 4576 "http://group.woyo.com/group_view.php?hid=329956" "Sosospider+ (+http://help.soso.com/webspider.htm)" 362 4960

Here the \"%{Referer}i\" content is not "-", yet one look still shows that this is a crawler, as the \"%{User-agent}i\" content reveals: "Sosospider+ (+http://help.soso.com/webspider.htm)" contains "spider". This case is more troublesome to handle. You need to collect the signatures of all crawlers, and a new crawler that has not been collected yet will slip through. Nor can a single regular expression cover all the collected crawler information; you may have to write many.

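One way to keep the collected signatures manageable is to hold them in a list and compile them into a single alternation. The sketch below assumes this approach; the list `CRAWLER_SIGNATURES` is hypothetical and deliberately incomplete, and the generic entries ("bot", "spider", "crawler") are a guess at catching crawlers that have not been collected yet.

```python
import re

# Hypothetical, incomplete signature list; extend it as new crawlers show up
CRAWLER_SIGNATURES = ["msnbot", "Sosospider", "bot", "spider", "crawler"]

# Escape each signature and join them into one case-insensitive pattern
CRAWLER_RE = re.compile(
    "|".join(re.escape(s) for s in CRAWLER_SIGNATURES),
    re.IGNORECASE,
)


def is_crawler_agent(agent):
    """True when the user-agent string contains any known signature."""
    return CRAWLER_RE.search(agent) is not None


print(is_crawler_agent("msnbot/1.1 (+http://search.msn.com/msnbot.htm)"))
print(is_crawler_agent("Sosospider+ (+http://help.soso.com/webspider.htm)"))
print(is_crawler_agent("Mozilla/4.08 [en] (Win98; I; NAV)"))
```

Compiling the pattern once and reusing it keeps the per-line cost down, but a crawler that sends a browser-like user agent will still pass unnoticed.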





