I wanted to observe how search engine spiders crawl a website, but the server I rent does not provide access logs. So I spent a good deal of time writing a dedicated PHP program to analyze spider visits. After three months of observing several target sites, I have a few small findings to share. My research is limited, so there are bound to be gaps and mistakes; please go easy on me.
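When the host gives you no access log, the usual approach is to have the site itself record each request and recognize spiders by the User-Agent header. The sketch below is written in Python rather than PHP and is not the author's program; the token list and the function name `identify_spider` are assumptions chosen for illustration, based on substrings these crawlers commonly send.

```python
from typing import Optional

# Substrings commonly seen in crawler User-Agent headers.
# This list is an assumption for illustration, not the author's list.
SPIDER_TOKENS = {
    "Baiduspider": "Baidu",
    "Googlebot": "Google",
    "Sosospider": "Soso",
    "Sogou": "Sogou",
    "YoudaoBot": "Youdao",
    "Slurp": "Yahoo",
    "msnbot": "MSN",
}


def identify_spider(user_agent: str) -> Optional[str]:
    """Return the search engine name for a crawler User-Agent, else None."""
    ua = user_agent.lower()
    for token, engine in SPIDER_TOKENS.items():
        # Case-insensitive substring match against the known tokens.
        if token.lower() in ua:
            return engine
    return None


if __name__ == "__main__":
    samples = [
        "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0",
    ]
    for ua in samples:
        print(identify_spider(ua), "<-", ua)
```

A User-Agent string can be forged, so counts gathered this way are only an approximation of real spider traffic.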
I. Baidu Spider
During this period I launched two new websites and found that Baidu Spider generally reaches the home page within 1 to 3 days. At first it crawls very aggressively, which lasts roughly two days to a week. About three days in, a site: query on Baidu returns the home page. Although the spider crawls tens of thousands of pages, Baidu often indexes only a handful of them.

After two weeks, Baidu crawls the home page only once or twice a day and rarely touches other pages. This phase lasts anywhere from a few days to a few months, yet the number of indexed pages still grows during it. It may be an evaluation period. One of my sites was dropped ("K'd") by Baidu during this phase, and the spider stopped coming.

Once this phase is over, Baidu Spider's visits stabilize. Two of my sites are crawled only 200 to 300 times a day, with little variation. Another site, shop.hhbmw.com, is visited more diligently, probably because it has more external links: over the past month it has received 20,000 to 80,000 visits a day, with fairly large swings. Even so, its site: count on Baidu is not high; the effect may only show up after Baidu's next major update.
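Figures such as "200 to 300 times a day" come from tallying logged visits per spider per day. Here is a minimal sketch of that tally, assuming each logged visit has already been reduced to a (day, spider) pair; the function names and the sample data are made up for illustration and are not the author's measurements.

```python
from collections import Counter, defaultdict


def daily_counts(visits):
    """Tally visits per spider per day from (day, spider) pairs."""
    counts = defaultdict(Counter)
    for day, spider in visits:
        counts[spider][day] += 1
    return counts


def daily_range(counts, spider):
    """Return (fewest, most) visits in a single day for one spider."""
    per_day = counts[spider].values()
    return min(per_day), max(per_day)


if __name__ == "__main__":
    # Hypothetical sample records, not real log data.
    sample = [
        ("day1", "Baidu"), ("day1", "Baidu"), ("day1", "Google"),
        ("day2", "Baidu"), ("day2", "Google"), ("day2", "Google"),
    ]
    counts = daily_counts(sample)
    for spider in sorted(counts):
        low, high = daily_range(counts, spider)
        print(f"{spider}: {low} to {high} visits per day")
```

Comparing the low and high values per spider gives a rough sense of how stable its crawl rate is.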
When Baidu Spider requests a page, it converts the percent-encoded characters in the URL back into Chinese characters. For example, http://shop.hhbmw.com/proview/%E9%99%86%E5%BB%BA%E5%86%9B88/6c318ea2660bcc4b73b220e16edf96b3.htm becomes http://shop.hhbmw.com/proview/陆建军88/6c318ea2660bcc4b73b220e16edf96b3.htm; that is, "%E9%99%86%E5%BB%BA%E5%86%9B88" is converted to "陆建军88". This can cause a problem: if the host does not support Chinese characters in URLs, Baidu's indexing of those pages may suffer.
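The conversion described here is ordinary percent-decoding of UTF-8 bytes. A short sketch using Python's standard `urllib.parse` module shows that the encoded path segment from the example decodes to the Chinese characters 陆建军 (Lu Jianjun) followed by "88", and encodes back to the same string. It only illustrates the encoding itself and does not reproduce how Baidu Spider builds its requests.

```python
from urllib.parse import quote, unquote

# The percent-encoded path segment from the example URL.
encoded = "%E9%99%86%E5%BB%BA%E5%86%9B88"

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)

# quote() turns the Chinese characters back into percent-escapes.
print(quote(decoded))

# Hex digits in percent-escapes are case-insensitive, so the
# lowercase form decodes to the same text.
print(unquote(encoded.lower()) == decoded)
```

A server that only matches the percent-encoded form of a path may fail to find the page when it receives the raw Chinese form, which is the risk noted above.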
Baidu Spider's visits to a site also follow certain patterns; in many cases it appears to request pages in the sort order of their Chinese characters.
II. Google Spider
Google's spider finds new sites quickly, but indexing proceeds at a steadier pace, and the number of pages crawled per day is also fairly stable. The higher a site's PR and the more external links it has, the faster it is updated; conversely, sites with a low Google PR are updated more slowly.
III. Soso, Sogou, and Youdao Spiders
These update fairly quickly but are not very stable. Their daily visit counts fluctuate a lot, and they are harder to predict than Baidu. One of my sites was dropped by both Soso and Sogou, leaving only the home page indexed.
IV. Yahoo, MSN
Yahoo updates quickly but indexes few pages. MSN updates extremely slowly.
As for robots.txt, the spiders from Baidu, Google, Soso, Sogou, Yahoo, and MSN support it fairly well, and they also honor the Crawl-delay directive well. The Youdao spider, by contrast, basically ignores the Crawl-delay directive in robots.txt.
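For readers unfamiliar with Crawl-delay, the sketch below shows what such a rule looks like and how a well-behaved crawler would read it, using Python's standard `urllib.robotparser`. The robots.txt content and the delay values are made-up examples, not taken from any of the sites discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one rule for Baidu's spider,
# and a default rule for every other crawler.
ROBOTS_TXT = """\
User-agent: Baiduspider
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) that each crawler is asked to leave between requests.
print(parser.crawl_delay("Baiduspider"))
print(parser.crawl_delay("YoudaoBot"))

# The default rule also blocks /admin/ for crawlers without their own entry.
print(parser.can_fetch("YoudaoBot", "/admin/"))
```

Crawl-delay is only a request: nothing enforces it, so a spider that ignores the directive will simply keep crawling at its own pace.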
Attached is a screenshot of today's access log: