The search engine indexing process

Source: Internet
Author: User
Keywords: SEO


These are detailed answers that Sky Blade gave in the "Website Promotion & SEO" group to two questions from webmasters: why spiders had crawled their sites many times without the pages being included in the index, and why their site logs showed several different spiders crawling the site. I am publishing the content with Sky Blade's permission, so I suppose it counts as original too.

The general process by which search engine spiders crawl pages is as follows.

First, the URLs of the pages to be indexed are collected.

Search engine spiders generally fall into two categories. The job of the first kind is to collect the valid URLs found in web pages. They constantly scan Internet resources to update the search engine's large URL list, which the second kind of spider then uses. In other words, when a spider of the first kind visits our page, it is not indexing the page; it is looking for all the valid links in it.
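The link-collecting step can be sketched in Python as follows. This is only a minimal illustration using the standard library, not how any real search engine implements it; the names `LinkCollector` and `collect_links` and the example.com URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute http(s) links from <a> tags; page text is ignored."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web_link = urlparse(absolute).scheme in ("http", "https")
                if is_web_link and absolute not in self.links:
                    self.links.append(absolute)


def collect_links(base_url, html):
    """Return the valid links found in one page, in order of appearance."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content, for illustration only.
page = (
    '<a href="/about.html">About</a> '
    '<a href="news/1.html#top">News</a> '
    '<a href="mailto:a@example.com">Mail</a>'
)
print(collect_links("http://example.com/index.html", page))
```

Note that the sketch never looks at the page text, which matches the point above: this kind of visit gathers links and does not index content.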

On webmasters finding several spider IPs crawling their site in their own access logs:

The major search engines process hundreds of millions of pages a day, and every large search engine company (Google, Baidu) runs tens of thousands of servers that share the work. Search engines therefore have several data centers, which means it is normal for multiple robots to crawl your site. This applies only to the first kind of spider, however: when it comes to indexing pages, the search engine assigns one particular data center's spiders to your site. That is why webmasters often see, in their server access logs, spiders from different IPs visiting the site frequently within a very short period. But don't get too excited: they may not be indexing your pages at all, just scanning for URLs.

For reference, here are several IP prefixes commonly used by Baidu spiders:

220.181.19
159.226.50
202.108.11
202.108.22
202.108.23
202.108.249
202.108.250
61.135.145
61.135.146
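To check how often these addresses show up in your own log, something like the following Python sketch could be used. It assumes the client IP is the first whitespace-separated field of each log line (as in the common Apache/Nginx log layouts); the function names and sample log lines are invented for the example, and whether the prefixes are still in use has not been verified.

```python
from collections import Counter

# Prefixes copied from the list above; not verified as current.
BAIDU_PREFIXES = [
    "220.181.19", "159.226.50", "202.108.11", "202.108.22", "202.108.23",
    "202.108.249", "202.108.250", "61.135.145", "61.135.146",
]


def matching_prefix(ip, prefixes=BAIDU_PREFIXES):
    """Return the listed prefix that the IP falls under, or None."""
    for prefix in prefixes:
        # Compare whole octets so "202.108.229.x" does not match "202.108.22".
        if ip.startswith(prefix + "."):
            return prefix
    return None


def count_spider_hits(log_lines):
    """Count requests per prefix; assumes the client IP is the first field."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        prefix = matching_prefix(fields[0])
        if prefix is not None:
            hits[prefix] += 1
    return hits


# Hypothetical log lines, for illustration only.
sample_log = [
    '202.108.22.5 - - [01/Jan/2008:10:00:00 +0800] "GET / HTTP/1.1" 200 512',
    '202.108.229.7 - - [01/Jan/2008:10:00:01 +0800] "GET / HTTP/1.1" 200 512',
    '61.135.145.20 - - [01/Jan/2008:10:00:02 +0800] "GET /a.html HTTP/1.1" 200 90',
    '202.108.22.9 - - [01/Jan/2008:10:00:03 +0800] "GET /b.html HTTP/1.1" 200 75',
]
print(count_spider_hits(sample_log))
```

A high count here only shows that the spiders visited; as explained above, it does not by itself show that the pages were indexed.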

In addition, the information recorded by the first kind of spider consists mainly of each page's URL and its last modification time.
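A record of this kind can be pictured as a simple mapping from URL to last modification time. The Python sketch below is an assumption made for illustration: the source states only which two fields are recorded, so the `UrlList` class and the idea of using the timestamp to flag new or changed pages are hypothetical.

```python
from datetime import datetime


class UrlList:
    """Maps each URL to the last modification time seen for it."""

    def __init__(self):
        self.entries = {}

    def record(self, url, last_modified):
        """Store the entry; return True if the URL is new or has changed."""
        previous = self.entries.get(url)
        if previous is None or last_modified > previous:
            self.entries[url] = last_modified
            return True
        return False


# Hypothetical usage with made-up URLs and dates.
url_list = UrlList()
first = url_list.record("http://example.com/a.html", datetime(2008, 1, 1, 9, 0))
again = url_list.record("http://example.com/a.html", datetime(2008, 1, 1, 9, 0))
later = url_list.record("http://example.com/a.html", datetime(2008, 1, 2, 9, 0))
print(first, again, later)
```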

A former Yesky editor asked: I think that after a spider crawls a page, the page does not appear in search right away; caching and content screening happen first. Different sites carry different weights, so this delay is not the same for everyone. The most typical example is Yesky itself, which has a high weight: a link added to the Yesky home page in the morning can show up in Baidu's search results by the afternoon.

Of course, content cannot appear in the results immediately after it is crawled. As you said a few days ago, there is a release process after a page is indexed.

Q: There is another phenomenon. On many small sites, a spider is seen crawling a new page, yet for a short time the page does not appear in the search results, even though it can be found on the search engine's cache server.

For some websites, once the second kind of spider starts indexing a page, the page appears in the search engine's index even though the whole process has not finished. For example, when we query our own site, we often see pages marked as supplemental results that show only the URL, or pages that show only the title and URL with no description. That is the normal state of a page at this stage. Once the search engine has actually read, analyzed, and cached the page, the supplemental result can display the normal information from the cache.
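The stages described above can be illustrated with a small Python sketch in which a result shows only the fields that have been filled in so far. The `IndexEntry` fields and the `render_result` function are hypothetical and are not taken from any real search engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    url: str
    title: Optional[str] = None        # assumed to be filled once the page is read
    description: Optional[str] = None  # assumed to be filled once analyzed and cached


def render_result(entry):
    """Show whatever is available: URL only, title and URL, or the full result."""
    lines = []
    if entry.title:
        lines.append(entry.title)
    if entry.description:
        lines.append(entry.description)
    lines.append(entry.url)
    return "\n".join(lines)


# Hypothetical entries at three stages of processing.
url_only = IndexEntry("http://example.com/new.html")
with_title = IndexEntry("http://example.com/new.html", title="New page")
complete = IndexEntry(
    "http://example.com/new.html",
    title="New page",
    description="A short summary taken from the cached copy.",
)
for entry in (url_only, with_title, complete):
    print(render_result(entry))
    print("---")
```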
