Ever since I started doing SEO site optimization, I have gradually come to pay close attention to site logs. Analyzing them has become the first task of my day: the first thing I do when I get to the company is analyze yesterday's log. Some webmasters may not know how to analyze a site log; they can refer to my earlier article "A Little Understanding of Web Site Log Analysis, the Site More Secure." A smaller group feels that log analysis is a waste of time and wonders what use staring at that data every day could possibly be. In fact, there are at least four things we can learn from it.
First, determine whether spiders have crawled the site
There are two ways to check whether spiders have come to crawl:
1. Read the raw entries in the site log directly; this suits experienced analysts.
2. Use a site-log analysis tool; this is more suitable for beginners.
With a log analysis tool you can see at a glance which spiders have crawled the site and which pages they fetched.
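As an illustration, here is a minimal Python sketch of what such a tool does under the hood: it scans log lines for user-agent strings that name a known spider. The sample lines (in Apache combined log format) and the spider list are made up for illustration; your server's log format may differ.

```python
import re

# Hypothetical sample lines in Apache combined log format.
SAMPLE_LOG = [
    '123.125.71.12 - - [10/Oct/2023:06:25:01 +0800] "GET /post/1.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '66.249.66.1 - - [10/Oct/2023:06:26:02 +0800] "GET / HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:06:27:03 +0800] "GET /about.html HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows NT 10.0) Chrome/117.0"',
]

SPIDER_NAMES = ("Baiduspider", "Googlebot", "360Spider", "Sogou")

def spider_hits(lines):
    """Return (ip, url, spider_name) for every line whose user-agent names a known spider."""
    hits = []
    for line in lines:
        m = re.match(
            r'(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*" \d+ \S+ "[^"]*" "([^"]*)"',
            line,
        )
        if not m:
            continue  # line does not match the expected log format
        ip, url, ua = m.groups()
        for name in SPIDER_NAMES:
            if name.lower() in ua.lower():
                hits.append((ip, url, name))
    return hits

for ip, url, name in spider_hits(SAMPLE_LOG):
    print(f"{name} from {ip} crawled {url}")
```

Running this over the samples reports the Baiduspider and Googlebot visits and skips the ordinary browser visit.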
Second, identify the spider type from the client IP
1. 220.181.108.* — Baidu spider (weighted spider, crawls quality pages)
2. 123.125.71.* — Baidu spider (crawls low-quality article content)
3. 123.125.68.* — Baidu spider (inspection spider)
4. 117.28.255.* — fake Baidu spider
5. …
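The segment-to-role mapping above can be turned into a small lookup. Note that this mapping comes from webmaster experience rather than anything Baidu documents, so treat it as a heuristic; the following Python sketch just matches an IP against those prefixes.

```python
# Baidu spider IP segments and the roles the article ascribes to them.
# This mapping is webmaster folklore, not official documentation.
BAIDU_SEGMENTS = {
    "220.181.108.": "weighted spider (quality pages)",
    "123.125.71.": "low-quality content spider",
    "123.125.68.": "inspection spider",
    "117.28.255.": "likely fake spider",
}

def classify_baidu_ip(ip):
    """Return the role guessed for this IP's segment, or None if the segment is unknown."""
    for prefix, role in BAIDU_SEGMENTS.items():
        if ip.startswith(prefix):
            return role
    return None

print(classify_baidu_ip("220.181.108.75"))  # weighted spider (quality pages)
print(classify_baidu_ip("8.8.8.8"))         # None
```

For an authoritative check, a reverse-DNS lookup is more reliable: a genuine Baidu spider IP resolves to a *.baidu.com hostname.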
Third, check whether page statuses are normal
From the URLs and server response codes in the log we can see directly which of our pages are fine and which have problems. In general, a status code of 200 means the page is normal; a 404 means there is a problem with the page (it does not exist).
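A quick way to survey page health is to tally the status codes across the whole log. This Python sketch (sample lines are hypothetical) counts how often each code appears:

```python
import re
from collections import Counter

# Hypothetical log lines; only the status-code field after the quoted request matters here.
LINES = [
    '1.2.3.4 - - [.] "GET /a.html HTTP/1.1" 200 512 "-" "-"',
    '1.2.3.4 - - [.] "GET /gone.html HTTP/1.1" 404 0 "-" "-"',
    '1.2.3.4 - - [.] "GET /a.html HTTP/1.1" 304 0 "-" "-"',
    '1.2.3.4 - - [.] "GET /gone2.html HTTP/1.1" 404 0 "-" "-"',
]

def status_counts(lines):
    """Tally the HTTP status code that follows the quoted request string."""
    codes = re.findall(r'" (\d{3}) ', "\n".join(lines))
    return Counter(codes)

print(status_counts(LINES))  # Counter({'404': 2, '200': 1, '304': 1})
```

A sudden spike in 404s in this tally is the signal to go hunting for dead pages.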
Fourth, gauge how friendly search engines are to the site
From the site log we can see directly how many times spiders have crawled our site; the more crawls, the friendlier the spiders are to our site.
The figure above shows the spiders' crawl counts directly, but some of those crawls come from fake spiders, so we also need to use the client IP to confirm which spiders are genuine and which are counterfeit. The earlier article "A Little Understanding of Web Site Log Analysis, the Site More Secure" includes an illustrated tutorial on telling real spiders from fake ones, so I will not repeat it here.
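As a quick first pass before any reverse-DNS verification, you can simply tally crawls per client IP so that suspicious segments stand out. The visit data below is made up for illustration:

```python
from collections import Counter

# Hypothetical (ip, user_agent) pairs already extracted from a day's log.
VISITS = [
    ("220.181.108.75", "Baiduspider"),
    ("220.181.108.75", "Baiduspider"),
    ("117.28.255.9", "Baiduspider"),   # claims to be Baiduspider, but from a suspect segment
    ("220.181.108.75", "Baiduspider"),
]

def crawl_counts(visits):
    """Crawl count per client IP; heavy traffic from a suspect segment hints at a fake spider."""
    return Counter(ip for ip, _ua in visits)

for ip, n in crawl_counts(VISITS).most_common():
    print(f"{ip}: {n} crawls")
```

The per-IP totals can then be cross-checked against the segment list from section two.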
Having covered those points, Crown Network (hg-seo.com) will now examine each of the four in more detail:
For the first point: we can see directly which pages have been crawled and which have not. As the algorithms keep being updated, the probation period for new sites is getting longer, so many new webmasters publish articles, check inclusion with the site: domain-name command, and find nothing indexed. This is largely because the search engine is holding the pages back rather than releasing them promptly.
For the second point: we can use the client IP to judge the site's security and the quality of its article content.
Different IP segments tell us what state the site is in. Here are the common Baidu spider IPs:
1. 123.125.68.* visits often while the others rarely come: the site has likely entered the sandbox, or the chance of being demoted is very high.
2. 220.181.68.* visits increase every day and never decrease: an omen of entering the sandbox or being demoted.
3. 220.181.7.*, 123.125.66.*: the search engine has started crawling content.
4. 121.14.89.*: the site has come out of the new-site probation period.
5. 203.208.60.*: the site is in an abnormal state.
6. 210.72.225.*: this IP segment patrols sites continuously.
7. 220.181.108.*: crawls of high-quality article pages or the homepage.
Generally, a successful crawl returns 200 0 0. A return status of 304 0 0 means the site was not updated, so the spider visited but did not fetch anything new. If you see 200 0 64, don't worry: this usually just appears on crawls of dynamic pages. (In IIS logs these three numbers are the HTTP status, sub-status, and Win32 status; 64 is the Win32 code for a connection that closed before the response finished.)
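These triplets come from IIS W3C logs, where each request records sc-status, sc-substatus, and sc-win32-status. A small Python sketch can translate them into plain language; the sample lines and field positions below assume a typical field layout and are illustrative only.

```python
# Hypothetical IIS W3C log lines, trimmed to: date time method url sc-status sc-substatus sc-win32-status.
SAMPLE = [
    "2023-10-10 06:25:01 GET /post/1.html 200 0 0",
    "2023-10-10 06:26:02 GET /post/1.html 304 0 0",
    "2023-10-10 06:27:03 GET /tag/seo 200 0 64",
]

def explain(line):
    """Translate the sc-status / sc-substatus / sc-win32-status triplet into plain language."""
    parts = line.split()
    url, status, _sub, win32 = parts[3], parts[4], parts[5], parts[6]
    if status == "304":
        return f"{url}: not modified since last crawl"
    if status == "200" and win32 == "64":
        # Win32 error 64 (ERROR_NETNAME_DELETED): connection closed before the response completed.
        return f"{url}: served OK, but connection dropped mid-response"
    if status == "200":
        return f"{url}: crawled successfully"
    return f"{url}: status {status}"

for line in SAMPLE:
    print(explain(line))
```

This makes it easy to scan a day's log for the 200 0 0 / 304 0 0 / 200 0 64 patterns described above.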
For the third point: the server's return status tells the story, e.g. 200 means normal access, 404 means the page does not exist, and 304 means the page has not been updated. All of these can be read directly from the codes in the site log. If there are large numbers of 404s, it is well worth acting on those pages: we can use the robots.txt protocol to block them so that search engines no longer crawl them.
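For example, assuming the dead URLs were /old-page.html and pages under /removed-dir/ (hypothetical paths; substitute your own 404 URLs), the robots.txt entries might look like this:

```
User-agent: *
Disallow: /old-page.html
Disallow: /removed-dir/
```

Note that Disallow only stops crawling; for pages you never want indexed again, returning a clean 404 (or 410) is what eventually removes them.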
For the fourth point: the more visits from real spiders, the better!
OK, that wraps up this introduction to site logs. You can obtain the logs in the following two ways:
1. The log folder in your FTP space
2. Log in to your site's server; the logs are usually located at C:\Windows\System32\LogFiles
Respect originality and credit your sources: that is your own contribution to a cleaner Internet environment. This article was compiled and published by the Operations Department of Crown Network (http://www.hg-seo.com/huangguanseo/120.html). It first appeared on Crown Marketing Network, a blog focused on SEO/SEM optimization trends and marketing promotion skills, and was then posted to the A5 webmaster network. It is an original article; when reprinting, please keep the original link. Thank you!