Hello everyone, this is the Yaan SEO optimization blog. Today we'll look at the crawling strategies a search engine uses during the indexing process.
After the spider finishes reading the robots.txt file, it checks whether the page it has entered meets its standards; if it does, the spider extracts the page's content and links. Crawling that one page is not the end: the spider then follows the extracted links to the next page, and from that page's links on to the next, and so on.
Because the web's link structure is extremely complex, a spider needs to adopt some strategy to reach all the pages it can. At their simplest, search engine crawling strategies come in two kinds:
1. Depth Priority strategy
As the diagram above illustrates, the spider simply follows one chain of links vertically downward, going as deep as it can before backtracking, until the task is complete.
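The idea can be sketched in a few lines of Python. This is a minimal illustration, not a real crawler: the `links` dictionary below is a made-up link graph standing in for pages and the links extracted from them.

```python
# A hypothetical link graph: each key is a page, each value is the
# list of links extracted from that page (illustrative data only).
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}

def crawl_depth_first(start):
    """Follow each branch of links to its end before backtracking."""
    visited = []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.append(page)
        # Push children in reverse so the first link on the page
        # is the next one explored.
        stack.extend(reversed(links[page]))
    return visited

print(crawl_depth_first("A"))  # depth-first order: A, B, D, C, E
```

Notice that the spider reaches D (two levels deep) before it ever touches C, the second link on the start page: it exhausts one vertical chain before moving sideways.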
2. Breadth Priority strategy
As the diagram above illustrates, the spider first crawls all the links on a given page, then crawls the links found on each of those pages, working outward level by level.
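Using the same hypothetical link graph as before, a breadth-first sketch only differs in using a queue instead of a stack, so pages are visited level by level:

```python
from collections import deque

# Same illustrative link graph: page -> links extracted from it.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}

def crawl_breadth_first(start):
    """Visit all links on a page before descending a level deeper."""
    visited = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.append(page)
        queue.extend(links[page])
    return visited

print(crawl_breadth_first("A"))  # breadth-first order: A, B, C, D, E
```

Here both links on the start page (B and C) are visited before any second-level page, which is exactly the "parallel, level by level" behavior described above.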
In practice, the two strategies are used together. In theory, given enough time, a search engine spider could crawl every page on the web. But a spider's bandwidth and time are not unlimited, so it can only crawl each site for a certain period; the higher a site's weight, the longer the spider will naturally crawl it.
The spider's purpose is to discover valuable pages and index them, which is why high-weight sites are crawled both longer and deeper. We therefore suggest that a new site keep its link hierarchy shallow, so that the spider can reach every page within its limited crawl time.
Once the spider has finished crawling, it hands the collected page data over to the data analysis system, and the collection phase is complete. OK, that's it for today's SEO basics.
This article is from: http://www.lxmseo.com/search-engines3.html