Status codes for website log analysis (200/301/404/302/500)

Source: Internet
Author: User
Tags win32 apache log


1. Introduction to spider names

In website logs, the common spider names are: Baidu -> Baiduspider, Google -> Googlebot, MSN -> msnbot, Yahoo -> Slurp, Youdao -> YoudaoBot, and Sogou -> Sogou+web+spider. To see what a given spider has crawled, search the log for its name.
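The name search described above can be sketched in Python. This is a minimal illustration, not a complete crawler database: the SPIDER_TOKENS table and the identify_spider function are hypothetical names introduced here, and the matching is a plain case-insensitive substring test on the user-agent field.

```python
# Substrings that identify each spider in the user-agent field.
# The spider names come from the list above; the table and function
# names are hypothetical and exist only for this illustration.
SPIDER_TOKENS = {
    "baiduspider": "Baidu",
    "googlebot": "Google",
    "msnbot": "MSN",
    "slurp": "Yahoo",
    "youdaobot": "Youdao",
    "sogou": "Sogou",
}


def identify_spider(user_agent):
    """Return the search engine name for a user-agent string, or None."""
    ua = user_agent.lower()
    for token, engine in SPIDER_TOKENS.items():
        if token in ua:
            return engine
    return None


ua = "Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html)"
print(identify_spider(ua))  # prints: Baidu
```

A user-agent string can be forged, so a match only shows what the client claims to be.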

2. Status codes returned to crawlers

Each time a spider requests a page, the server returns an HTTP status code, which is recorded in the log. By checking the status codes in the log, you can see the result of each crawl. The main HTTP status codes are:

(1) Code 200 indicates that the page was crawled normally.

(2) Code 304 indicates that the content has not changed since the previous crawl. It is often returned for website images.

(3) Code 404 indicates that the requested link is a dead link. Dead links arise in two ways: a page that once existed was deleted, or the page never existed but other sites link to it.

(4) Code 302 indicates a temporary redirect.

(5) Code 301 indicates a permanent redirect.

(6) Code 500 indicates a server-side program error.
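The list above can be kept as a small lookup table when reading a log. The sketch below is an illustration only: STATUS_MEANINGS and describe_status are hypothetical names, and codes not listed above fall back to the general class given by their first digit.

```python
# Meanings of the status codes discussed above.
STATUS_MEANINGS = {
    200: "crawled normally",
    301: "permanent redirect",
    302: "temporary redirect",
    304: "not modified since the previous crawl",
    404: "dead link (page not found)",
    500: "server-side program error",
}

# General HTTP status classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a short description for an HTTP status code."""
    if code in STATUS_MEANINGS:
        return STATUS_MEANINGS[code]
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (200, 304, 404, 403):
    print(code, describe_status(code))
```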

3. Log code interpretation

# Software: Microsoft Internet Information Services 6.0

# Version: 1.0

# Date: 16:00:39

# Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken

date is the date of the request;

time is the time of the request;

s-sitename is the identifier of the site (virtual host) that served the request;

s-ip is the IP address of the server;

cs-method is the request method. The two common methods are GET, used when opening a URL, and POST, used when submitting a form;

cs-uri-stem is the file that was requested;

cs-uri-query is the query string attached to the address, such as id=12 after the ? in an .asp URL. If there are no parameters, it is shown as -;

s-port is the server port that was accessed;

cs-username is the user name of the visitor;

c-ip is the IP address of the visitor;

cs(User-Agent) identifies the visiting client, such as a browser or a spider;

sc-status is the HTTP status code: 200 indicates success, 403 indicates no permission, 404 indicates that the page cannot be found, and 500 indicates a program error;

sc-substatus is the HTTP substatus code, and sc-win32-status is the Windows status code;

sc-bytes is the number of bytes the server sent to the client;

cs-bytes is the number of bytes the client sent to the server.
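Because the field order is declared in the log itself, a reader can take the column names from the Fields line instead of hard-coding positions. The following is a minimal sketch under stated assumptions: parse_w3c_log is a hypothetical helper, values are assumed to be separated by single spaces with no embedded spaces, and SAMPLE_LOG is invented data with fewer columns than a real log, used only to keep the example short.

```python
import re

# Matches the Fields directive with or without a space after "#".
FIELDS_RE = re.compile(r"#\s*Fields:\s*(.*)")


def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the Fields line."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = FIELDS_RE.match(line)
        if match:
            fields = match.group(1).split()
        elif line.startswith("#"):
            continue  # other directives such as Software, Version, Date
        elif fields:
            values = line.split()
            if len(values) == len(fields):  # skip malformed entries
                yield dict(zip(fields, values))


# Hypothetical sample, not taken from a real log.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status",
    "2013-12-22 18:47:12 GET /index.asp id=12 200",
    "2013-12-22 18:47:15 GET /missing.asp - 404",
]

for entry in parse_w3c_log(SAMPLE_LOG):
    print(entry["cs-uri-stem"], entry["cs-uri-query"], entry["sc-status"])
```

Reading the names from the directive means the same code keeps working if the server is configured to log a different set of fields.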

Case study:

2013-12-22 18:47:12 W3SVC2137573334 D-901195C886694 119.147.151.150 GET /.aspx id=2230&TypeId=91 80 - 123.125.71.28 HTTP/1.1 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) - - www.111cn.net 200 0 0 59004 243 2250

This entry records a visit by Baiduspider, as shown by the user-agent field. GET /.aspx id=2230&TypeId=91 means that the spider requested the file /.aspx with the query string id=2230&TypeId=91, and the server returned status 200.
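The same reading can be done programmatically by pairing each value in the entry with its field name. This sketch assumes the entry is space-separated in the order given by the Fields line, with the empty cookie and referer columns each written as a single "-"; FIELDS, ENTRY, and record are names chosen for this illustration.

```python
# Field names in the order declared by the Fields line of the log.
FIELDS = (
    "date time s-sitename s-computername s-ip cs-method cs-uri-stem "
    "cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) "
    "cs(Cookie) cs(Referer) cs-host sc-status sc-substatus "
    "sc-win32-status sc-bytes cs-bytes time-taken"
).split()

# The case-study entry, with spacing assumed as described above.
ENTRY = (
    "2013-12-22 18:47:12 W3SVC2137573334 D-901195C886694 119.147.151.150 "
    "GET /.aspx id=2230&TypeId=91 80 - 123.125.71.28 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+Baiduspider/2.0;"
    "++http://www.baidu.com/search/spider.html) "
    "- - www.111cn.net 200 0 0 59004 243 2250"
)

values = ENTRY.split()
if len(values) != len(FIELDS):
    raise ValueError("entry does not match the declared fields")
record = dict(zip(FIELDS, values))

# IIS writes spaces inside a value as "+", so restore them for display.
user_agent = record["cs(User-Agent)"].replace("+", " ")

print("client:", record["c-ip"])
print("agent: ", user_agent)
print("request:", record["cs-method"], record["cs-uri-stem"], record["cs-uri-query"])
print("status:", record["sc-status"], "bytes sent:", record["sc-bytes"])
```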


Tips

If you want to analyze website logs more accurately, you can use dedicated tools, such as IIS log analysis tools and Apache log analysis tools.
