Finding Search Engine Spiders' Preferences in Your Site's IIS Logs



Abstract: Not every problem that arises during site optimization can be read directly from webmaster tools; often those tools only surface a problem after it has already occurred. As SEOers, we need to learn to read the hidden information of the site, and the site's IIS logs are where that information can be found.



Not every problem that arises during site optimization can be read directly from webmaster tools; more often, those tools only surface a problem after it has already occurred. As SEOers, we need to learn to dig out the site's hidden information. For example: what effect have the external links built over the past few days actually had? Which parts of our content are most attractive to search engine spiders? How actively are spiders crawling our site? These key pieces of information are hidden inside the site, are difficult to extract with webmaster tools, and can all be answered by the site's IIS logs.



One: Why IIS logs matter when analyzing a site's hidden information



1: Through the log we can clearly analyze how search engine spiders crawl the site, including their crawl routes and crawl depth. From this data we can gauge the effect of recently built external links: links are like the silk threads a spider crawls along, and if link building is going well, spiders will naturally crawl frequently, while the log records which "entrances" they come in through most often.
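As an illustration, here is a minimal Python sketch (not from the original article) that estimates crawl depth from an IIS log. It assumes the default W3C field order (date, time, s-ip, cs-method, cs-uri-stem, ...) and a hypothetical file name, and it uses the number of path segments as a rough proxy for depth; adjust the field index to match the #Fields: header of your own log.

    from collections import Counter

    depth_counter = Counter()
    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            # Skip W3C header lines and anything that is not the Baidu spider
            if line.startswith("#") or "Baiduspider" not in line:
                continue
            url = line.split()[4]  # cs-uri-stem, e.g. /news/page.html
            depth_counter[url.rstrip("/").count("/")] += 1

    for depth, hits in sorted(depth_counter.items()):
        print(f"depth {depth}: {hits} spider requests")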



2: The site's content updates and spider crawling are related: as long as we update steadily and frequently, spiders will crawl more frequently. We can therefore use the spider visit frequency recorded in the log to fine-tune the site's content update schedule.
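For instance, a small sketch along the same lines (file name and field position are again assumptions) that tallies Baidu spider visits per day, so the crawl rhythm can be compared against the publishing schedule:

    from collections import Counter

    visits_per_day = Counter()
    with open("ex1204.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#") or "Baiduspider" not in line:
                continue
            visits_per_day[line.split()[0]] += 1  # W3C date field, e.g. 2012-04-05

    for day in sorted(visits_per_day):
        print(day, visits_per_day[day])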



3: Through the log we can find hosting problems that webmaster tools may not be able to detect. For example, in a recent and widely discussed incident, a hosting provider's technician made a mistake that caused its servers to block the Baidu spider; webmasters who had analyzed their logs early on might have discovered the error in time.



Two: How to obtain the log file, and points to note



1: To obtain log files, our hosting space needs to have IIS logging enabled. If it does, log files are generally written to a folder named Weblog, and we can download our site's log files directly from that folder.



2: When using this feature, we need to pay attention to the log rollover schedule. My suggestion: if the site is small, have a new log file created once a day; if it is a comparatively large site, roll the log every hour, so that individual files do not grow too large to handle.
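As a small supporting sketch (my own addition: it assumes the downloaded logs sit in a local Weblog folder), you can keep an eye on file sizes before deciding whether daily rollover is still manageable:

    import glob
    import os

    # List each downloaded log file with its size in megabytes
    for path in sorted(glob.glob("Weblog/*.log")):
        size_mb = os.path.getsize(path) / (1024 * 1024)
        print(f"{path}: {size_mb:.1f} MB")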



Three: How to analyze and interpret spider behavior



We can open our site's log file with Notepad and use Notepad's search function to look for the Baidu and Google spiders, whose user-agent names are Baiduspider and Googlebot respectively.
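The same search can be scripted. Here is a minimal Python sketch (the file name is hypothetical) that prints every log line left by either spider:

    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            # The user-agent field of spider requests contains these names
            if "Baiduspider" in line or "Googlebot" in line:
                print(line.rstrip())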









[Screenshot: Baiduspider entries in the IIS log]

[Screenshot: Googlebot entries in the IIS log]



We can break one of these log entries down field by field:



2012-04-05 00:47:10 is the timestamp: the moment at which the spider crawled into our site.



116.255.109.63 is the IP address of our own site (the server).



GET is immediately followed by the page the spider requested; from this field we can see which of our pages have been crawled recently.



220.187.51.144 is the IP address of the search engine spider, and of course the address that appears here may turn out to be either genuine or fake. So how do we know whether this address really belongs to a spider or to an impostor? Let me share a small method of my own: open a command window and run nslookup followed by the suspected spider's address (for example, nslookup 220.187.51.144). If it is a real spider, the lookup will return the search engine's own server name; if it is not, no such record will be found.
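The same check can be scripted as a reverse DNS lookup. Below is a minimal Python sketch; the hostname suffixes are my assumptions (genuine Baidu crawlers generally reverse-resolve to hosts under baidu.com and Google's to hosts under googlebot.com), so verify them against each engine's own documentation.

    import socket

    def is_real_spider(ip, suffixes=(".baidu.com", ".googlebot.com", ".google.com")):
        """Reverse-resolve the IP and check which domain the hostname belongs to."""
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)
        except OSError:
            return False  # no PTR record at all: almost certainly fake
        return hostname.endswith(suffixes)

    print(is_real_spider("220.187.51.144"))  # the address from the log line above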









[Screenshot: nslookup result for a real spider]

[Screenshot: nslookup result for a fake spider]



So why do fake spiders appear in the log at all? Because other sites send out crawlers disguised as search engine spiders to scrape your content. If these fake spiders are allowed to run wild, they will consume a real share of the site's server resources. We can use the method above to find and block them, but we still need to proceed carefully: shutting a genuine spider out would be worse.
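Building on the reverse-DNS idea, here is a sketch (the c-ip field index and the file name are assumptions based on the default W3C layout) that gathers the client IPs claiming to be Baiduspider but failing the lookup, giving a candidate block list for careful manual review:

    import socket

    # Collect every distinct client IP whose user-agent claims "Baiduspider"
    claimed_ips = set()
    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#") or "Baiduspider" not in line:
                continue
            claimed_ips.add(line.split()[8])  # c-ip in the default W3C layout

    # Keep only the addresses whose reverse DNS does not resolve under baidu.com
    fakes = []
    for ip in sorted(claimed_ips):
        try:
            hostname = socket.gethostbyaddr(ip)[0]
        except OSError:
            fakes.append(ip)  # no PTR record: suspicious
            continue
        if not hostname.endswith(".baidu.com"):
            fakes.append(ip)

    print("candidate fake spiders:", fakes)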



200 0 0 are the status codes of a normal request: the HTTP status code (200) followed by IIS's substatus and Win32 status codes. Of course other values appear as well; for example, 500 indicates an internal server error. We can use these status codes to analyze how the site's hosting space has been performing recently.
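A sketch of that analysis (assuming, as in the sample line above, that each entry ends with sc-status sc-substatus sc-win32-status):

    from collections import Counter

    status_counter = Counter()
    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#"):
                continue
            status_counter[line.split()[-3]] += 1  # sc-status, e.g. 200 or 500

    # A sudden rise in 5xx codes points at server-side trouble
    for status, count in status_counter.most_common():
        print(status, count)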



We can also pick out the pages spiders visit most frequently in the log file, record them, and then work out the internal and external reasons why the spiders favor those pages.
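A final sketch (same field and file-name assumptions as above) that ranks the URLs the two spiders request most often:

    from collections import Counter

    top_pages = Counter()
    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#"):
                continue
            if "Baiduspider" in line or "Googlebot" in line:
                top_pages[line.split()[4]] += 1  # cs-uri-stem

    for url, hits in top_pages.most_common(10):
        print(hits, url)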



As webmasters, most of us are familiar with intuitive data such as traffic, indexed pages, and backlink analysis, while log file analysis may feel unfamiliar. But the log matters a great deal to a site, and I hope this article helps you analyze your log files better.




