To do SEO well, you need to understand search engine spiders, and to understand spiders you have to analyze your logs carefully. Below, I (I Love Mules) attempt a brief analysis of spider behavior based on one site's logs. The screenshot below shows an earlier analysis of the site's log. We mainly look at Baidu Spider and Google's spider; other search engines are simply ignored.
First, crawl depth and breadth
As the figure shows, Baidu Spider made 98600 visits and crawled 224896 pages, while Google's spider made 31157 visits and crawled 172790 pages. A little school arithmetic gives the average number of pages crawled per visit. Baidu Spider: 224896 / 98600 ≈ 2.28. Google's spider: 172790 / 31157 ≈ 5.55. So Baidu's crawl breadth is somewhat better than Google's (it visits about three times as often), while Google's crawl depth is clearly greater: each Baidu visit looks at only about two pages on average. This is why sites with a fairly large amount of data often see Google index many pages while Baidu indexes very few. If you are working on your Baidu index count and the site has a lot of data, the random-article links on inner pages must be done well. How exactly to randomize them is up to you.
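The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the visit and page counts are the ones quoted in the text, while the names `spider_stats` and `pages_per_visit` are made up for illustration.

```python
# Visit and crawl counts as quoted in the article's log screenshot.
spider_stats = {
    "Baidu": {"visits": 98600, "pages": 224896},
    "Google": {"visits": 31157, "pages": 172790},
}


def pages_per_visit(visits: int, pages: int) -> float:
    """Average number of pages fetched during one spider visit."""
    return pages / visits


for name, stats in spider_stats.items():
    ratio = pages_per_visit(stats["visits"], stats["pages"])
    print(f"{name}: {ratio:.2f} pages per visit")
```

A higher ratio points to deeper crawling per visit; a higher visit count with a low ratio points to frequent but shallow visits.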
Second, crawling of bad links
The image above shows a random sample of pages for which the spiders received a 404 status code. As the callouts in the figure show, even if a site has no dead links, spiders will still request some URLs that do not exist. Baidu Spider tends to request only part of a URL, as if it had been lured elsewhere halfway through, and Google sometimes requests a URL with an .htm suffix where the real page ends in .html. Relatively speaking, though, Google's crawl errors are still very few, while Baidu's are numerous. Probably only the engineers at the two companies know the reason, and we do not need to worry about it. The practical point is this: whether or not your site has dead links, add a 404 error page if at all possible, because spiders will request missing pages either way.
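To pull the same kind of 404 sample out of your own log, something like the following sketch may help. It assumes the server writes the common "combined" access log format and that the spiders identify themselves with `Baiduspider` and `Googlebot` in the user-agent string. The sample lines, IP addresses, dates, and paths are invented for illustration and are not taken from the site discussed here.

```python
import re
from collections import Counter

# Assumed layout: combined log format ("request" status bytes "referer" "agent").
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# User-agent tokens mapped to the names used in this article.
SPIDERS = {"Baiduspider": "Baidu", "Googlebot": "Google"}


def spider_404s(lines):
    """Count (spider, path) pairs for requests that returned 404."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match or match.group("status") != "404":
            continue
        for token, name in SPIDERS.items():
            if token in match.group("agent"):
                hits[(name, match.group("path"))] += 1
    return hits


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [15/Jun/2012:10:00:00 +0800] "GET /post/13 HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '192.0.2.2 - - [15/Jun/2012:10:05:00 +0800] "GET /post/138.htm HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [15/Jun/2012:10:06:00 +0800] "GET /post/138.html HTTP/1.1" 200 9000 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for (spider, path), count in spider_404s(sample).items():
    print(spider, path, count)
```

Matching on the user-agent string alone is a simplification, since anyone can fake that header.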
Third, crawling of new sites
From the logs of several new sites I have observed, Baidu Spider generally crawls a new site frantically on the first day and then goes quiet for a while, whereas Google is steadier and works step by step, crawling a little more each time. The picture above is the June log of one of my new sites. On June 15, less than an hour after the site went live, Baidu had picked up the home page and crawled frantically, about 5,500 times. From the next day on it fell silent after that burst, while Google started with a small amount of crawling and slowly increased it. (Note: during the 10 days covered by the log above, the site built no external links, so the results were not affected by outside factors.) I often see people who are surprised by heavy Baidu crawling on their site's first day and conclude that the site has high weight. Then, the next day, the spider makes a 180-degree turn and looks at only a few pages a day, and they become very depressed, thinking that Baidu does not like their site. In fact, that is not the case; this is simply how Baidu Spider behaves.
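The day-by-day pattern described above can be checked by counting spider requests per day. The sketch below assumes the usual `[dd/Mon/yyyy:hh:mm:ss zone]` timestamp found in common and combined access logs, English month abbreviations, and the `Baiduspider` and `Googlebot` user-agent tokens. The function name `daily_crawls` and the sample lines are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout: [15/Jun/2012:10:00:00 +0800]
DATE_PATTERN = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")

# User-agent tokens mapped to the names used in this article.
SPIDERS = {"Baiduspider": "Baidu", "Googlebot": "Google"}


def daily_crawls(lines):
    """Return {spider: {date: request_count}} for the given log lines."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = DATE_PATTERN.search(line)
        if not match:
            continue
        day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
        for token, name in SPIDERS.items():
            if token in line:
                counts[name][day] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [15/Jun/2012:10:00:00 +0800] "GET / HTTP/1.1" 200 100 "-" "Baiduspider/2.0"',
    '192.0.2.1 - - [15/Jun/2012:10:00:05 +0800] "GET /a HTTP/1.1" 200 100 "-" "Baiduspider/2.0"',
    '192.0.2.2 - - [15/Jun/2012:11:00:00 +0800] "GET / HTTP/1.1" 200 100 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [16/Jun/2012:09:00:00 +0800] "GET /a HTTP/1.1" 200 100 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [16/Jun/2012:09:30:00 +0800] "GET /b HTTP/1.1" 200 100 "-" "Googlebot/2.1"',
]

for spider, per_day in daily_crawls(sample).items():
    for day in sorted(per_day):
        print(spider, day.isoformat(), per_day[day])
```

A first-day spike followed by a sharp drop for Baidu, next to a slowly rising line for Google, would match the pattern described in this section.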
This is only a brief analysis of how Baidu's and Google's spiders crawl a site, and it is just the tip of the iceberg; there is much more in a site's logs that deserves analysis. This article is from: Fuzhou SEO @ I Love Mules. Original link: http://www.52luo.com/post/138.html. Please credit the source when reprinting.