First, let's talk about the origins of the GOOGLE spider:
When GOOGLE's search engine was first established, it ran on a very powerful server
that released large numbers of spiders every day. We call this the No. 1 spider. Its crawling speed is very fast;
that it could collect information from the entire Internet daily shows just how fast the server was. In fact, the most important point is that GOOGLE
has since spread its servers across many cities, which is why you will find that GOOGLE's computing speed stays ahead of everyone else's.
The server classifies and sorts the collected information into a set of large databases.
One of these databases is used to store website domain names:
as long as a domain name is indexed by the search engine, it is automatically stored in this database.
This database is the core of the No. 1 spider.
It is divided into 10 small databases, one for each PR level. They may be small databases, but they are formidable!
The databases of the 10 levels have different crawl cycles.
Basically, for a website with PR = 4, the No. 1 spider's crawl cycle is about 7 days.
Therefore, you will also find that your site's record is refreshed on some day within those seven days.
Careful webmasters will notice that the 7-day cycle is sometimes quite accurate, but only for PR = 4:
the higher the PR, the shorter the cycle; the lower the PR, the longer the cycle.
Of course, at this point many webmasters may have doubts: they may feel that the spider sometimes collects their website on a daily basis.
That is where the next topic, the No. 2 spider, comes in.
The No. 2 spider is usually released during the No. 1 spider's crawl;
it is mainly used to follow the external links of the websites that the No. 1 spider has crawled.
PS: that being so, the No. 2 spider's scale must be much smaller than the No. 1 spider's.
★Of course, there is not only a No. 2 spider but also a No. 3.
The so-called No. 3 spider arises like this: the No. 1 spider crawls site A and follows its link to site B, where the No. 2 spider takes over; the No. 2 spider crawls site B and follows its link to site C, which the No. 3 spider then crawls.
At present, in order to prevent this chain from circulating infinitely, GOOGLE divides the spider into only three levels, and sets a clear crawl-rate standard for each level.
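This three-level limit is essentially a cap on link-following depth. Here is a minimal Python sketch of how such a cap could work; the URL and the fetch_links() helper are hypothetical stand-ins, not anything GOOGLE has published:

```python
from collections import deque

MAX_LEVEL = 3  # spiders No. 1, No. 2 and No. 3 -- nothing deeper

def fetch_links(url):
    """Placeholder: a real crawler would download `url` and parse out its links."""
    return []

def crawl(seed_urls):
    seen = set(seed_urls)
    queue = deque((url, 1) for url in seed_urls)  # (url, spider level)
    while queue:
        url, level = queue.popleft()
        print(f"spider No. {level} crawls {url}")
        if level == MAX_LEVEL:
            continue  # no spider No. 4, so the chain cannot circulate forever
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))

crawl(["http://site-a.example"])
```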
In addition, the No. 2 and No. 3 spiders have the characteristic of crawling in chronological order.
★For example:
Suppose the last article on website A that the No. 1 spider collected was published on a certain date.
When website A is later crawled by a No. 2 spider arriving from another website,
the articles published since then, such as the one from May 30, may receive a second or third visit,
and the spider then collects whatever was published after its last visit. If your website has no updates at all, it simply crawls the changes of the last month again.
And if more No. 2 and No. 3 spiders arrive from outside, the same article may be crawled several times.
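This "chronological" behaviour can be pictured as a simple date filter: the spider keeps the date of its last collection and only re-fetches what appeared after it. A rough Python sketch, with entirely invented article data and dates:

```python
from datetime import date

# Hypothetical article list for website A: (publish date, title)
articles = [
    (date(2008, 5, 20), "an older post"),
    (date(2008, 5, 30), "the May 30 post"),
    (date(2008, 6, 2), "the newest post"),
]

last_collected = date(2008, 5, 25)  # when the No. 1 spider last visited

# A No. 2 / No. 3 spider re-collects only what was published afterwards,
# in chronological order.
for published, title in sorted(articles):
    if published > last_collected:
        print(f"re-collect {title} ({published})")
```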
The following are official data provided by GOOGLE: <Secret>
★No. 1 spider
The basic crawl rate is 5% ~ 10%.
PR = 0 with no inbound links: after submission, the crawl cycle may be 6 ~ 12 months.
PR = 1 with no inbound links: the crawl cycle may be 4 ~ 8 months.
PR = 2 with no inbound links: after submission, the crawl cycle may be 2 ~ 4 months.
PR = 3 with no inbound links, or by submission: the crawl cycle may be 1 ~ 2 months.
PR = 4 with no inbound links: the crawl cycle may be one week ~ 1 month.
Of course, a website without any inbound links cannot reach PR = 4;
the maximum it can reach is PR = 3.
The above figures are only a baseline provided by GOOGLE.
This means that the No. 1 spider crawls your website proactively, on the cycles above,
while the No. 2 and No. 3 spiders crawl your website through your inbound links.
That is why you will sometimes find your website being collected every day.
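To make the cycles above easier to scan, here is the same data restated as a small Python lookup table; the table and function are my own restatement, with months rounded to 30 days, not code GOOGLE provides:

```python
# Quoted No. 1 spider crawl cycles, in days (months taken as 30 days).
SPIDER1_CYCLE_DAYS = {
    0: (180, 360),  # PR = 0, no inbound links: 6 ~ 12 months
    1: (120, 240),  # PR = 1, no inbound links: 4 ~ 8 months
    2: (60, 120),   # PR = 2, no inbound links: 2 ~ 4 months
    3: (30, 60),    # PR = 3: 1 ~ 2 months
    4: (7, 30),     # PR = 4: one week ~ 1 month
}

def spider1_cycle(pr):
    lo, hi = SPIDER1_CYCLE_DAYS[pr]  # the quoted data only covers PR 0 ~ 4
    return f"PR = {pr}: crawled every {lo} ~ {hi} days"

print(spider1_cycle(4))  # -> PR = 4: crawled every 7 ~ 30 days
```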
★No. 2 spider
The basic crawl rate is 2.5% ~ 5% <it re-collects based on the data records crawled by the No. 1 spider, and re-visits around the last collection date>
★No. 3 spider
The basic crawl rate is 1.25% ~ 2.5% <it re-collects the data records of the No. 1 and No. 2 spiders, and re-visits the data from around the last collection date>
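Notice that each level's quoted rate is exactly half the one above it. That halving pattern can be written in one line; this is my own observation, not a formula GOOGLE provides:

```python
def crawl_rate(level):
    """(low, high) crawl-rate range for spider level 1, 2 or 3."""
    return 0.05 / 2 ** (level - 1), 0.10 / 2 ** (level - 1)

for level in (1, 2, 3):
    lo, hi = crawl_rate(level)
    print(f"spider No. {level}: {lo:.2%} ~ {hi:.2%}")
    # prints 5.00% ~ 10.00%, then 2.50% ~ 5.00%, then 1.25% ~ 2.50%
```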
GOOGLE currently has three levels of spider.
Of course, GOOGLE also has different kinds of spider for different content;
here I have covered only the webpage spider, because that is the only one I am interested in.