The main principles of how a search engine crawls pages

Source: Internet
Author: User
Keywords: website construction, personal site building
Tags: content, links, website construction, personal site building, search engine

We all know that after we build a site, Baidu's spider crawls its pages so that they can be included in the search engine's index. But you may not know what principles the spider follows when it crawls your site's content. A spider's page crawling can be divided into the following four working principles.

One: depth-first crawling

What is depth-first crawling? When the spider crawls a page on your site and finds a link, it keeps following that link: it crawls the next page, and if that page has links it follows those as well, going deeper and deeper until everything reachable has been crawled. That is the spider's depth-first crawling principle.
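As a rough illustration of that idea, here is a minimal depth-first crawler sketch in Python. The LinkExtractor helper, the fetch_links function, the max_pages limit and the https://example.com start URL are all illustrative assumptions, not part of any real spider; an actual search engine crawler is far more sophisticated.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_links(url):
    """Download a page and return the absolute URLs it links to."""
    try:
        html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
    except Exception:
        return []
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(url, link) for link in parser.links]


def depth_first_crawl(start_url, max_pages=50):
    """Follow each newly discovered link immediately (stack = LIFO)."""
    stack, seen = [start_url], set()
    while stack and len(seen) < max_pages:
        url = stack.pop()  # take the most recently found link first, so we go deep
        if url in seen:
            continue
        seen.add(url)
        print("crawling:", url)
        stack.extend(fetch_links(url))
    return seen


if __name__ == "__main__":
    depth_first_crawl("https://example.com")  # placeholder start URL
```

Because the newest link is always taken first, the crawler dives down one chain of links before backing up, which is exactly the depth-first behaviour described above.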

Two: breadth-first crawling

What is breadth-first crawling? The spider first crawls the whole of the current page in one go, then moves on to crawl all of the pages it links to, one level at a time. For this reason, your site's external links and internal links should not be too numerous; if there are too many, the spider will have a hard time capturing them all for inclusion.
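A minimal breadth-first counterpart, sketched against a toy in-memory site graph instead of real HTTP so it stays self-contained; the SITE dictionary and its paths are made-up stand-ins for a site's pages and their outgoing links.

```python
from collections import deque

# Toy site graph standing in for real pages and their outgoing links
# (assumed structure, for illustration only).
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/post-1", "/post-2"],
    "/team": [], "/post-1": [], "/post-2": [],
}


def breadth_first_crawl(start="/"):
    """Crawl level by level: finish every page at the current depth
    before moving on to the pages those pages link to (queue = FIFO)."""
    queue, seen = deque([start]), set()
    while queue:
        page = queue.popleft()  # oldest discovered page first, so we go wide
        if page in seen:
            continue
        seen.add(page)
        print("crawling:", page)
        queue.extend(SITE.get(page, []))
    return seen


if __name__ == "__main__":
    breadth_first_crawl()
```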

Three: weight-first crawling

What is weight-first crawling? In general, the spider combines depth-first and breadth-first crawling, but each has its own characteristics: if your site's weight is good, the spider tends to use depth-first crawling, and if the weight is only so-so, it tends to use breadth-first crawling instead. How does the spider judge a site's weight? First, by the quantity and quality of the site's external links; second, by how many levels the site has. What is a level? It refers to the number of directory layers between the site root and a page. Many directories and a cluttered internal-link structure are relatively bad; a site with a relatively clear hierarchy is easier for the spider to crawl and include.
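One simple way to picture weight-first crawling is a priority queue in which the highest-weight page discovered so far is always expanded next. In the sketch below, the WEIGHT scores and the SITE graph are invented for illustration; a real engine derives weight from many signals, such as external-link quantity and quality.

```python
import heapq

# Assumed per-page "weight" scores (e.g. loosely based on backlink counts);
# these numbers are illustrative only.
WEIGHT = {"/": 10, "/blog": 8, "/about": 3, "/post-1": 6, "/post-2": 1}
SITE = {
    "/": ["/about", "/blog"],
    "/blog": ["/post-1", "/post-2"],
    "/about": [], "/post-1": [], "/post-2": [],
}


def weight_first_crawl(start="/"):
    """Always expand the highest-weight discovered page next
    (max-heap behaviour via negated scores on Python's min-heap)."""
    heap, seen = [(-WEIGHT.get(start, 0), start)], set()
    while heap:
        neg_score, page = heapq.heappop(heap)
        if page in seen:
            continue
        seen.add(page)
        print(f"crawling {page} (weight {-neg_score})")
        for link in SITE.get(page, []):
            heapq.heappush(heap, (-WEIGHT.get(link, 0), link))
    return seen


if __name__ == "__main__":
    weight_first_crawl()
```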

Four: revisit crawling

What is revisit crawling? When the spider crawls some pages today and comes back tomorrow to crawl the same pages again, that is a revisit crawl. Revisit crawls can be divided into full revisits and single revisits. A full revisit means the spider crawls the entire website over again. A single revisit targets individual pages that update quickly; it generally happens on large sites that publish new articles frequently, where the spider comes back just to crawl those pages. If a page does not update even once a month, the spider may come on day one, then day two, find nothing new, and gradually lengthen the interval between visits, sometimes crawling it again only after a month or so. This is why many webmasters ask why Baidu has not crawled and included their pages for such a long time: because they do not update constantly, the spider stops coming, and the accumulated articles are only picked up the next time it returns after an update!
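The revisit idea can be sketched as a simple scheduler that rechecks a page only after its revisit interval has elapsed, with shorter intervals for pages that historically update more often. The REVISIT_INTERVAL values below are illustrative guesses, not anything a real spider publishes.

```python
import time

# Assumed revisit intervals in seconds: shorter for pages that update
# often, much longer for pages that rarely change (illustrative only).
REVISIT_INTERVAL = {"/news": 60, "/blog": 3600, "/about": 30 * 24 * 3600}
last_crawled = {}


def pages_due_for_revisit(now=None):
    """Return the pages whose revisit interval has elapsed and mark them crawled."""
    now = now if now is not None else time.time()
    due = []
    for page, interval in REVISIT_INTERVAL.items():
        if now - last_crawled.get(page, 0.0) >= interval:
            due.append(page)
            last_crawled[page] = now
    return due


if __name__ == "__main__":
    print("due now:", pages_due_for_revisit())    # first pass: everything is due
    print("due again:", pages_due_for_revisit())  # nothing due until intervals elapse
```

A page that never changes keeps getting a longer effective wait between crawls, while a frequently updated page stays on a short cycle, which mirrors the behaviour described above.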

In fact, Baidu's algorithm strategy has always worked roughly this way, but the web keeps evolving, and the spider may well crawl pages in other ways in the future. So webmasters should use their free time to keep updating their site's content and let the spider visit often!

