How to analyze search engine crawler logs

Source: Internet
Author: User

When your site runs into a problem, there may be many possible causes, but the first thing to check is whether search engine crawlers have left any record of visiting your site. If they have not, your links are not attracting crawlers. If they have, look at the status codes that were returned and analyze the other possible causes from there. Finding the root cause lets you solve the problem more effectively.

To look for Baidu's crawler, open the log file in a text editor and search for "Baiduspider". We take the following line and analyze it in detail:
2011-07-22 15:02:40 CDKKIS111 198.16.12.1 GET /index.html - 80 - 61.135.168.50 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 64 - - bbs.szr.com
This IIS log entry says: at 2011-07-22 15:02:40, the Baidu search engine spider (IP address 61.135.168.50; the URL http://www.baidu.com/search/spider.htm that follows Baiduspider+ is Baidu's page about the spider) accessed the site bbs.szr.com on the server with IP 198.16.12.1 and crawled the home page /index.html. The log record is saved in the CDKKIS111 folder. Two points are worth explaining. The first is the names of the major search engine crawlers (spiders), collected here for reference:
1. Google crawler names
1.1 Googlebot: crawls web pages for Google's web index and news index.
1.2 Googlebot-Mobile: crawls web pages for Google's mobile index.
1.3 Googlebot-Image: crawls web pages for Google's image index.
1.4 Mediapartners-Google: crawls web pages to determine AdSense content. Google uses this crawler on your site only if you display AdSense ads on it.
1.5 AdsBot-Google: crawls web pages to measure the quality of AdWords landing pages. Google uses this crawler only if you advertise your site with Google AdWords.
2. Baidu crawler name: Baiduspider
3. Yahoo crawler name: Yahoo! Slurp
4. Youdao (Yodao) spider name: YodaoBot
5. Sogou spider name: Sogou spider
6. MSN spider name: msnbot
Other search engines are used so rarely that we can ignore them.
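
As a rough illustration, the names above can be turned into a small lookup that labels a user-agent string taken from a log line. This is only a sketch: the function name `identify_crawler`, the table `CRAWLER_TOKENS`, and the display labels are made up for this example, and the shortened tokens `slurp` and `sogou` are an assumption made because IIS writes spaces in the user-agent field as `+`.

```python
# Lowercase substrings to look for in the user-agent field, based on the
# crawler names listed above. More specific Google names come first so that
# "Googlebot-Mobile" is not reported as plain "Googlebot".
CRAWLER_TOKENS = [
    ("googlebot-mobile", "Google (mobile index)"),
    ("googlebot-image", "Google (image index)"),
    ("mediapartners-google", "Google (AdSense)"),
    ("adsbot-google", "Google (AdWords)"),
    ("googlebot", "Google (web and news index)"),
    ("baiduspider", "Baidu"),
    ("slurp", "Yahoo"),      # assumption: short token for "Yahoo! Slurp"
    ("yodaobot", "Youdao"),
    ("sogou", "Sogou"),      # assumption: short token for "Sogou spider"
    ("msnbot", "MSN"),
]


def identify_crawler(user_agent):
    """Return a label for a known crawler, or None for any other client."""
    ua = user_agent.lower()
    for token, label in CRAWLER_TOKENS:
        if token in ua:
            return label
    return None


if __name__ == "__main__":
    print(identify_crawler("Baiduspider+(+http://www.baidu.com/search/spider.htm)"))
```

Matching on a substring is deliberately loose; a user-agent string can be forged, so a match only suggests which crawler made the request.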

The second is the code 200. This is the HTTP status code returned to the search engine spider, and it means the page was crawled successfully.
The meaning of each numeric code is as follows:
2xx success
200 OK: the request completed normally.
201 Created: the resource was created, typically following a POST request.
202 Accepted: the request was accepted for processing, but processing has not yet completed.
203 Non-Authoritative Information: the returned information comes from a copy rather than from the origin server.
204 No Content: the request was received, but there is no information to send back.

3xx redirect
301 Moved Permanently: the requested data has a new location and the change is permanent.
302 Found: the requested data temporarily resides under a different URI.
303 See Other: the response to the request can be found under another URI and should be retrieved with the GET method.
304 Not Modified: the document has not been modified since the client last retrieved it.
305 Use Proxy: the requested resource must be accessed through the proxy given in the Location field.
306 (Unused): no longer in use; the code is reserved.

4xx client error
400 Bad Request: the request contains a syntax problem or cannot be satisfied.
401 Unauthorized: the client is not authorized to access the data.
402 Payment Required: reserved for future use; originally intended for billing schemes.
403 Forbidden: access is refused, and authorization will not help.
404 Not Found: the server cannot find the given resource; the document does not exist.
407 Proxy Authentication Required: the client must first authenticate itself with the proxy.
410 Gone: the requested web page no longer exists (permanently).
415 Unsupported Media Type: the server refuses the request because the format of the request entity is not supported.

5xx server error
500 Internal Server Error: the server cannot complete the request because of an unexpected condition.
501 Not Implemented: the server does not support the functionality needed to fulfill the request.
502 Bad Gateway: the server received an invalid response from the upstream server.
503 Service Unavailable: the server cannot process the request because of temporary overload or maintenance.
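
When many crawler requests are involved, it can help to group the returned codes by their first digit, following the four classes above. The sketch below is illustrative only; `status_class` and `summarize` are hypothetical helper names, and the list of codes in the usage line is invented sample input, not data from a real log.

```python
from collections import Counter

# First digit of the status code mapped to the classes described above.
STATUS_CLASSES = {
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a status code (int or numeric string) to its class name."""
    return STATUS_CLASSES.get(int(code) // 100, "other")


def summarize(codes):
    """Count how many responses fall into each status class."""
    return Counter(status_class(code) for code in codes)


if __name__ == "__main__":
    # Invented sample codes, for illustration only.
    print(summarize([200, 200, 301, 404, 503]))
```

A large share of 4xx or 5xx responses to a crawler is usually the first thing worth investigating.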

After all this, many people who are new to SEO may still not know where to find the site logs. Logging has to be configured on the IIS server. The steps are listed below and are quick to learn:

Step 1: Open the IIS manager and open the properties of the site you want to configure. Select "Enable logging" and choose "W3C Extended Log File Format".
Step 2: Click the "Properties" button next to the log format. On the General tab, set the new log schedule to "Daily" (you can choose another schedule if you prefer), then choose the directory in which to save the log files. (Note: it is best to keep the log files in a directory together with the site they belong to, to avoid confusion with the logs of other sites.)
Step 3: Select the Advanced tab. Check User Agent (cs(User-Agent)) along with the other options you need; I generally also check the three options at the bottom. This lets you see crawler names such as Baiduspider in the log.
Step 4: Select the Home Directory tab and check the "Log visits" checkbox. Your IIS logs will now be recorded properly.
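
Once logging is enabled, the log files can also be read with a short script instead of a text editor. The sketch below assumes the W3C extended format, in which a `#Fields:` header line names the space-separated columns of the lines that follow. The function name `read_w3c_log` and the `SAMPLE` lines are made up for illustration; the sample uses a shortened field set built from the values in the entry discussed earlier and is not a real log.

```python
def read_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines start with '#'; only #Fields is needed here.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip entries that do not match the most recent header.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample with a reduced field set, for illustration only.
SAMPLE = [
    "#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status",
    "2011-07-22 15:02:40 GET /index.html 61.135.168.50 "
    "Baiduspider+(+http://www.baidu.com/search/spider.htm) 200",
]

if __name__ == "__main__":
    for entry in read_w3c_log(SAMPLE):
        if "baiduspider" in entry["cs(User-Agent)"].lower():
            print(entry["date"], entry["time"], entry["cs-uri-stem"], entry["sc-status"])
```

For a real file, pass an open file object to `read_w3c_log` in place of `SAMPLE`; the columns available depend on which options were checked on the Advanced tab.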

We hope these methods help you understand search engines better and get twice the result for half the effort when optimizing your site.
