Abstract: Not every problem with a site can be read directly from webmaster tools; often those tools only surface information after the problem has already occurred. As SEOers, we need to learn to read the site's hidden information.
Not every problem with a site can be read directly from webmaster tools; often, by the time those tools report something, the problem has already happened. As SEOers, we need to learn to read the site's hidden information. For example: what effect has the link building of the past few days had? Which areas of our content are most attractive to search engine spiders? How actively are the spiders crawling our site? These key pieces of information are hidden inside the site, and they are difficult to analyze with webmaster tools alone; the answers can be found in the site's IIS logs.
One: Why IIS logs are so important for analyzing a site's hidden information
1: From the logs we can see much more clearly how search engine spiders crawl the site, including their crawl paths and crawl depth. From this data we can judge the effect of our recent external link building, because external links are like the silk a spider crawls along: if the link building is done well, the spiders naturally come more often, and we can record which "entrances" they enter through most frequently.
2: Spider crawling is related to how the site's content is updated; generally, as long as our updates are steady and frequent, the spiders will crawl more often. We can use the visit frequency recorded in the logs to fine-tune the site's content update schedule.
3: Through the logs we can also spot problems with the hosting space, some of which webmaster tools cannot detect. For example, in the recent, widely discussed incident in which a hosting provider's technician mistakenly blocked the Baidu spider, webmasters who had analyzed their space's logs in advance might have discovered the error.
Two: How to obtain the log file, and what to watch out for
1: To obtain the log file, our hosting space needs to have IIS logging enabled. If it does, the log files are generally written to a Weblog folder, and we can download the site's log files directly from that folder.
2: When using this feature we need to pay attention to how often a new log file is created. My suggestion: for a small site, generate one file per day; for a relatively large site, rotate the file hourly, so that the resulting files do not grow too large. (A small download sketch follows.)
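Where the host only exposes the Weblog folder over FTP, a short script can fetch the newest file for analysis. This is a minimal sketch using Python's standard ftplib; the server address, the credentials and the "weblog" folder name are placeholders, so substitute the values from your own hosting panel.

    # Download the newest IIS log file from the host's Weblog folder over FTP.
    from ftplib import FTP

    ftp = FTP("ftp.example.com")          # hypothetical FTP address of the space
    ftp.login("username", "password")     # hypothetical credentials
    ftp.cwd("weblog")                     # folder where the host writes IIS logs

    names = sorted(ftp.nlst())            # IIS log names usually sort by date
    latest = names[-1]
    with open(latest, "wb") as f:
        ftp.retrbinary("RETR " + latest, f.write)
    ftp.quit()
    print("Downloaded", latest)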
Three: How to analyze and interpret spider behavior
We can open the site's log file with Notepad and use its search function to look for the Baidu and Google spiders; their names in the log are Baiduspider and Googlebot respectively.
(Screenshot: Baidu spider entries in the log)
(Screenshot: Google spider entries in the log)
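Notepad is fine for a quick look, but once the log grows large it is easier to filter it with a short script. Below is a minimal sketch in Python; the file name ex120405.log is hypothetical (IIS typically names its files exYYMMDD.log), and the script simply keeps any line that mentions either spider.

    # Print only the log lines produced by the Baidu and Google spiders.
    spiders = ("Baiduspider", "Googlebot")

    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#"):      # skip the W3C header lines
                continue
            if any(name in line for name in spiders):
                print(line.rstrip())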
Taking one log entry as an example, we can break it down field by field:
2012-04-05 00:47:10 is the point in time at which the spider crawled our site.
116.255.109.63 is the IP address of our own site's server.
GET is immediately followed by the page the spider requested; from this we can see which of our pages have been crawled recently.
220.187.51.144 is the IP address of the search engine spider. Of course, the address that appears here can be either genuine or faked, so how do we know whether it really belongs to a spider or to an impostor? Here is a small method of my own: open a command window and run nslookup followed by the suspected spider's address. If it is a real spider, the lookup resolves to the search engine's own server name; if not, no such information can be found. (A scripted version of this check is sketched below.)
(Screenshot: reverse lookup of a real spider)
(Screenshot: reverse lookup of a fake spider)
So why do fake spiders appear in the log? The reason is that other sites send crawlers disguised as search engine spiders to scrape your content. If these fake spiders are left to run wild, they consume a certain amount of the server's resources. We can use the method above to find them and block them, but we still need to handle this carefully, because shutting out the real spiders would be worse.
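The same nslookup check can be scripted so that every suspicious address in the log is verified in one pass. This is a sketch using Python's standard socket module for the reverse (PTR) lookup; the two sample IP addresses are the ones from the log entry above, and a stricter check would also resolve the returned hostname back to the original IP before trusting it.

    # Reverse-lookup an IP the same way nslookup does, then check whether the
    # resulting hostname belongs to the search engine's own domain.
    import socket

    def looks_like_real_spider(ip):
        try:
            hostname = socket.gethostbyaddr(ip)[0]   # reverse (PTR) lookup
        except (socket.herror, socket.gaierror):
            return False, None                       # no reverse record: treat as suspect
        # Real Baidu/Google spiders resolve to hostnames under these domains.
        genuine = hostname.endswith((".baidu.com", ".baidu.jp",
                                     ".googlebot.com", ".google.com"))
        return genuine, hostname

    for ip in ("220.187.51.144", "116.255.109.63"):  # sample addresses from the log above
        ok, host = looks_like_real_spider(ip)
        print(ip, "->", host, "(looks real)" if ok else "(suspect)")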
200 0 0 are the status code, sub-status code and Win32 status of the request; 200 means the page was returned normally. Of course other values appear as well, such as 500, which indicates a server error. We can use these status codes to analyze how the site's hosting space has been performing recently.
We can also pick out the pages that spiders visit most frequently in the log file, record them, and work out the internal and external reasons why the spiders favour them. (Both checks are sketched below.)
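Both checks, the status codes the space returned and the pages the spiders favour, can be pulled from a single pass over the file. A minimal sketch follows; the column indexes assume the common default W3C field order (cs-uri-stem as the 5th field, sc-status as the 11th), which is an assumption, so adjust them to match the #Fields header at the top of your own log.

    # Tally status codes and the most frequently crawled pages for spider requests.
    from collections import Counter

    spiders = ("Baiduspider", "Googlebot")
    statuses, pages = Counter(), Counter()

    with open("ex120405.log", encoding="utf-8", errors="ignore") as log:
        for line in log:
            if line.startswith("#") or not any(s in line for s in spiders):
                continue
            fields = line.split()
            if len(fields) < 11:
                continue
            # Assumed default field order: field 4 = cs-uri-stem, field 10 = sc-status.
            pages[fields[4]] += 1
            statuses[fields[10]] += 1

    print("Status codes:", statuses.most_common())
    print("Most-crawled pages:", pages.most_common(10))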
As webmasters, most of us are probably familiar with intuitive data such as traffic, indexed pages and backlink analysis, while log file analysis may feel unfamiliar. But the logs are very important to a site, and I hope this article helps you analyze your log files better.