How to set up IIS logging and find search engine spiders in the logs

Author: User

Something happened yesterday that left me quite frustrated. The snapshot of one of my websites had been stuck on the 9th, so I went to the server to look at the site's log records. To my surprise there were no spider records at all, and I thought the site was about to be dropped! Looking more carefully, it turned out that the IIS log properties for this site had never been set to record crawler information. I could not find an article about this by searching Baidu, so to save others from wasting valuable time, I am writing out the complete set of settings.

I. Configuring the website log in IIS

Open IIS and open the properties of the site you want to configure. The site properties window pops up.

Check "Enable logging" and select "W3C Extended Log File Format" as the log format.

Next, click the "Properties" button beside it. On the General tab, set the new log schedule to "Daily" (you can of course choose another interval), and choose the directory in which the log files are saved.
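With a daily schedule, IIS writes one log file per day into the directory you chose. As a rough sketch, the file for a given day can be located like this. It assumes the usual exYYMMDD.log naming for daily W3C extended logs, which is not stated in this article, so check the names in your own log directory. The function name daily_log_name is made up for illustration.

```python
from datetime import date


def daily_log_name(day: date) -> str:
    """Return the assumed file name of the daily log for `day`."""
    # Assumption: daily W3C extended logs are named exYYMMDD.log.
    # Verify this against the files in your own log directory.
    return "ex" + day.strftime("%y%m%d") + ".log"


print(daily_log_name(date(2010, 3, 9)))
```

Joining this name to the log directory set on the General tab gives the file to search for spider visits on that day.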


In most cases the settings above are enough to get logging working, but on some hosts you still cannot find any trace of search engine crawlers in the log, and entries such as Baiduspider+ never show up. In that case we need to enable the remaining three options!

Select the Advanced tab and check the three options there, starting with User Agent (cs(User-Agent)). With these enabled, Baidu Spider shows up in the log!
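To confirm that the setting took effect, you can look at the "#Fields:" directive that a W3C extended log writes at the top of each file and check that cs(User-Agent) is listed. The snippet below is a minimal sketch of that check. The sample header lines and the function names are invented for illustration; a real file lists whichever fields you ticked.

```python
def logged_fields(lines):
    """Return the field names from the last '#Fields:' directive in `lines`."""
    fields = []
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
    return fields


def logs_user_agent(lines):
    """True if the log header says the user agent is being recorded."""
    return "cs(User-Agent)" in logged_fields(lines)


# Hypothetical header, shaped like the top of a W3C extended log file.
sample_header = [
    "#Version: 1.0",
    "#Fields: time cs-method cs-uri-stem cs-uri-query c-ip cs(User-Agent) sc-status",
]

print(logs_user_agent(sample_header))
```

If the check comes back false for a log written after the change, the user agent option was probably not saved for that site.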


II. How to analyze spiders in the website's IIS logs

First, get to know the names of the mainstream search engine spiders in China:

1. Google crawler names

1) Googlebot: crawls web pages for Google's web index and news index.

2) Googlebot-Mobile: crawls web pages for Google's mobile index.

3) Googlebot-Image: crawls web pages for Google's image index.

4) Mediapartners-Google: crawls web pages to determine AdSense content. Google uses this crawler on your site only if you display AdSense ads on it.

5) AdsBot-Google: crawls web pages to measure the quality of AdWords landing pages. Google uses this crawler only if you advertise your site with Google AdWords.

2. Baidu crawler name: Baiduspider

3. Yahoo crawler name: Yahoo! Slurp

4. Youdao (Yodao) crawler name: YodaoBot

5. Sogou crawler name: Sogou spider
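The names above can be turned into a small lookup that labels a logged user agent string. This is only a sketch: the table SPIDER_TOKENS and the function identify_spider are made up for this example, the matching is a plain case-insensitive substring test, and a user agent string can be forged, so treat a match as a hint rather than proof.

```python
# (token to look for, label) pairs. The specific Googlebot variants come
# first so that the generic "Googlebot" token does not shadow them.
SPIDER_TOKENS = [
    ("Googlebot-Mobile", "Google (mobile index)"),
    ("Googlebot-Image", "Google (image index)"),
    ("Mediapartners-Google", "Google (AdSense)"),
    ("AdsBot-Google", "Google (AdWords)"),
    ("Googlebot", "Google (web and news index)"),
    ("Baiduspider", "Baidu"),
    ("Slurp", "Yahoo"),
    ("YodaoBot", "Youdao"),
    ("Sogou", "Sogou"),
]


def identify_spider(user_agent):
    """Return a label for a known spider, or None for any other client."""
    ua = user_agent.lower()
    for token, label in SPIDER_TOKENS:
        if token.lower() in ua:
            return label
    return None


print(identify_spider("Baiduspider+(+"))
print(identify_spider("Googlebot-Image/1.0"))
print(identify_spider("Mozilla/4.0+(compatible;+MSIE+6.0)"))
```

Matching on a short token rather than the whole string keeps the lookup working when a crawler changes its version number.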

Because in China we care most about Baidu, we pick out the Baidu spider separately. Search the log that was just recorded for "baiduspider+" and select one entry:

00:00:06 GET /class/class.asp id=38 baiduspider+ (+ 200 0 214

The log entry above shows that the page /class/class.asp with the query id=38 was requested just after midnight, at 00:00:06. The full log line also records the spider's IP address, which is not reproduced here. The 200 is the HTTP status code returned to the search engine spider, and it means the page was crawled successfully.
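Picking these entries out by hand gets tedious on a busy site, so here is a minimal sketch of doing it in Python. It reads the field order from the "#Fields:" directive and keeps the requests whose user agent contains "baiduspider". The sample lines are hypothetical: the field list is only one possible selection, and 203.0.113.x are documentation-only addresses standing in for real client IPs.

```python
def parse_w3c_log(lines):
    """Yield one dict per request line, keyed by the names in '#Fields:'."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            # Skip lines that do not match the declared field count.
            if len(values) == len(fields):
                yield dict(zip(fields, values))


# Hypothetical log content; a real file comes from your log directory.
sample_log = [
    "#Fields: time cs-method cs-uri-stem cs-uri-query c-ip cs(User-Agent) sc-status",
    "00:00:06 GET /class/class.asp id=38 203.0.113.5 Baiduspider+(+ 200",
    "00:00:09 GET /index.asp - 203.0.113.77 Mozilla/4.0+(compatible;+MSIE+6.0) 200",
]

baidu_hits = [
    entry for entry in parse_w3c_log(sample_log)
    if "baiduspider" in entry["cs(User-Agent)"].lower()
]

for hit in baidu_hits:
    print(hit["time"], hit["cs-uri-stem"], hit["cs-uri-query"], hit["sc-status"])
```

To run it on a real log, pass an open file object instead of sample_log. Splitting on whitespace works here because IIS writes spaces inside a user agent as "+".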

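Common status codes are listed below:

2xx success

200 OK - normal; the request succeeded.

201 Created - normal; the request succeeded and a new resource was created.

202 Accepted - normal; accepted for processing, but processing has not yet completed.

203 Non-Authoritative Information - normal; the information returned comes from a copy rather than from the origin server.

204 No Content - normal; the request was received, but there is no content to send back.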

3xx redirect

301 Moved Permanently - the requested data has a new location and the change is permanent.

302 Found - the requested data temporarily resides under a different URI.

303 See Other - the response to the request can be found under another URI and should be retrieved with the GET method.

304 Not Modified - the document has not changed since the copy the client already holds.

305 Use Proxy - the requested resource must be accessed through the proxy given in the Location field.

306 Unused - no longer in use.

4xx client error

400 Bad Request - the request has a syntax problem or cannot be satisfied.

401 Unauthorized - the request requires authentication; the client has not been authorized to access the data.

402 Payment Required - reserved for future use.

403 Forbidden - the server refuses access, and authenticating will not help.

404 Not Found - the server could not find the given resource.

407 Proxy Authentication Required - the client must first authenticate itself with the proxy.

410 Gone - the requested page no longer exists (permanent).

415 Unsupported Media Type - the server refuses the request because the format of the request entity is not supported.

5xx server error

500 Internal Server Error - the server cannot complete the request because of an unexpected condition.

501 Not Implemented - the server does not support the functionality needed to fulfil the request.

502 Bad Gateway - the server received an invalid response from the upstream server.

503 Service Unavailable - the server cannot process the request because of temporary overload or maintenance.
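Once the spider's requests have been pulled out of the log, the status codes can be grouped by the classes listed above to see at a glance how the crawl went. The following is a small sketch of that tally. The helper names and the list of status codes are invented for illustration and do not come from a real log.

```python
from collections import Counter

STATUS_CLASSES = {
    "2": "2xx success",
    "3": "3xx redirect",
    "4": "4xx client error",
    "5": "5xx server error",
}


def status_class(code):
    """Map a status code (int or str) to its class label."""
    return STATUS_CLASSES.get(str(code)[:1], "other")


def summarize(status_codes):
    """Count how many responses fall into each status class."""
    return Counter(status_class(code) for code in status_codes)


# Hypothetical sc-status values taken from spider requests.
spider_statuses = ["200", "200", "304", "404", "503", "200"]

summary = summarize(spider_statuses)
for label, count in sorted(summary.items()):
    print(label, count)
```

A rising share of 4xx or 5xx responses to a spider is usually the first thing to investigate, since those pages are not being crawled successfully.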

Understanding the traces that spiders leave when they crawl helps us analyze our website. My knowledge is limited, so this is only a simple summary!

