As webmasters, we all know that once a site is built, Baidu's spiders have to crawl its pages before they can be included in the search engine. What you may not know is which principles the spiders follow when they crawl your site's content. Spider crawling can be divided into the following four working principles.
1. Depth-first crawl
What is a depth-first crawl? When a spider crawls a page on your site, it picks a link on that page and keeps following it. If the page it reaches has links, it follows one of them down to the next page, and if that page has links it continues along them in the same way, until every page has been crawled. This is the spider's depth-first crawl principle.
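The depth-first principle described above can be sketched in a few lines of Python. This is only an illustration, not Baidu's actual crawler: the `SITE` dictionary is a hypothetical in-memory site that maps each page to the links found on it, and `depth_first_crawl` is an assumed helper name. No network requests are made.

```python
# Hypothetical site: each page maps to the links found on it, in page order.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/news/1", "/news/2"],
    "/news/1": [],
    "/news/2": ["/about"],
    "/about": [],
}


def depth_first_crawl(links, start):
    """Return pages in the order a depth-first crawler would visit them."""
    visited = []      # crawl order
    seen = set()      # pages already crawled
    stack = [start]   # pages waiting to be crawled; the newest is taken first
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        visited.append(page)
        # Push links in reverse so the first link on the page is followed first.
        for link in reversed(links.get(page, [])):
            if link not in seen:
                stack.append(link)
    return visited


print(depth_first_crawl(SITE, "/"))
# ['/', '/news', '/news/1', '/news/2', '/about']
```

Note how the crawler goes all the way down the `/news` branch before it ever looks at `/about`, even though both links sit on the home page.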
2. Breadth-first crawl
What is a breadth-first crawl? The spider first crawls all the links on one page of your site in a single pass, and only then moves on to crawl all the links on the pages at the next level. With breadth-first crawling, a page should not carry too many external and internal links. If there are too many, the spider will have difficulty crawling and indexing all of them.
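For comparison, here is a minimal sketch of a breadth-first (width-first) crawl over the same kind of hypothetical in-memory site. Again, `SITE` and `breadth_first_crawl` are illustrative names assumed for this example, not part of any real spider.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it, in page order.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/news/1", "/news/2"],
    "/news/1": [],
    "/news/2": ["/about"],
    "/about": [],
}


def breadth_first_crawl(links, start):
    """Return pages in the order a breadth-first crawler would visit them."""
    visited = []            # crawl order
    seen = {start}          # pages already queued or crawled
    queue = deque([start])  # pages waiting to be crawled; the oldest is taken first
    while queue:
        page = queue.popleft()
        visited.append(page)
        # Queue every link on this page before going any deeper.
        for link in links.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(breadth_first_crawl(SITE, "/"))
# ['/', '/news', '/about', '/news/1', '/news/2']
```

Every link on a page joins the queue, so a page with a very large number of links makes the queue grow quickly. That is one way to picture why too many links on a page can be hard for a spider to get through.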
3. Weight-first crawl
What is a weight-first crawl? In general, spiders combine depth-first and breadth-first crawling, since each has its own characteristics. If your site's weight is good, the spider will generally crawl depth-first. If the weight is mediocre, the spider will instead crawl breadth-first. How does the spider measure a site's weight? First, it looks at the quantity and quality of the site's external links. Second, it looks at how many levels the site has. A level refers to the number of directories between the site root and a page. If there are many directories and the links are cluttered, that counts against the site. A site with a clear, orderly hierarchy is easier for spiders to crawl and index.
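The idea of choosing a crawl strategy by site weight can be illustrated with a toy calculation. Everything here is an assumption made for the sake of the example: the scoring formula, the `threshold` value, and the function names `directory_depth`, `site_weight`, and `choose_strategy` are invented, and Baidu does not publish how it actually computes weight. The sketch only captures the two factors named above: external links raise the score, and a deep directory hierarchy lowers it.

```python
def directory_depth(path):
    """Count the directories between the site root and the page."""
    parts = [p for p in path.split("/") if p]
    return max(len(parts) - 1, 0)


def site_weight(external_links, avg_link_quality, page_paths):
    """Toy score (assumed formula): more and better external links raise it,
    a deeper directory hierarchy lowers it."""
    deepest = max((directory_depth(p) for p in page_paths), default=0)
    return external_links * avg_link_quality / (1 + deepest)


def choose_strategy(weight, threshold=50.0):
    """Good weight -> depth-first, otherwise breadth-first (threshold is arbitrary)."""
    return "depth-first" if weight >= threshold else "breadth-first"


# A well-linked site with a shallow hierarchy.
strong = site_weight(200, 0.5, ["/index.html", "/news/a.html"])
# A poorly linked site with pages buried four directories deep.
weak = site_weight(30, 0.5, ["/a/b/c/d/page.html"])

print(strong, choose_strategy(strong))
print(weak, choose_strategy(weak))
```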
4. Revisit crawl
What is a revisit crawl? When a spider that crawled your pages today comes back tomorrow to crawl them again, that is a revisit crawl. Revisit crawls can be divided into full revisits and single revisits. A full revisit means the spider crawls the entire site again. A single revisit targets an individual page that is updated frequently. It usually happens on large sites, where new articles appear often and the spider comes back to crawl just those pages. By contrast, if a page is not updated even once a month, and the spider finds nothing new when it comes back on day 1 and again on day 2, it will wait longer before its next crawl. Sometimes it waits a month and then crawls everything again. That is why so many webmasters ask why Baidu has not crawled or indexed their site for a long time: the site is not being updated regularly, so the spider stops coming. The articles you post in the meantime are only picked up, all at once, the next time the spider visits.
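The revisit behaviour described above resembles an adaptive schedule: visit often while a page keeps changing, and back off when it does not. The sketch below is one way to model that, under stated assumptions. The doubling rule, the 1-day and 30-day bounds, and the names `fingerprint`, `next_interval`, and `simulate` are all hypothetical choices for illustration, not Baidu's documented policy.

```python
import hashlib

MIN_DAYS = 1    # assumed shortest gap between visits
MAX_DAYS = 30   # assumed longest gap (roughly "one month")


def fingerprint(content):
    """Hash the page content so two visits can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def next_interval(current_days, changed):
    """Come back soon if the page changed; otherwise wait twice as long."""
    if changed:
        return MIN_DAYS
    return min(current_days * 2, MAX_DAYS)


def simulate(snapshots):
    """Given the content seen on successive visits, return the wait (in days)
    chosen after each visit."""
    interval = MIN_DAYS
    last = None
    waits = []
    for content in snapshots:
        current = fingerprint(content)
        changed = current != last   # the first visit always counts as new
        interval = next_interval(interval, changed)
        waits.append(interval)
        last = current
    return waits


# The page stays the same for three revisits, then gets a new article.
print(simulate(["v1", "v1", "v1", "v1", "v2"]))
# [1, 2, 4, 8, 1]
```

In this model a site that never changes quickly drifts toward the monthly maximum, while a single update pulls the spider back to daily visits.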
In fact, Baidu's algorithms and strategies have always worked this way, and the web keeps moving forward, so spiders may well crawl pages in some other way in the future. Either way, webmasters should update their site's content whenever they have time, so that spiders keep visiting often!