It has been more than 20 days since I wrote "Search Engine Spider Crawling Rules, Part 1: How Spiders Crawl Links". I had meant to keep writing the series, but after the first article I suddenly ran out of ideas. Today I was talking with friends about the timeliness of external links, that is, whether an external link ever expires.
This article does not go back over the theory. Instead it gives some examples that support the first article, and it also discusses the timeliness of links.
First: if the page carrying an external link has been deleted, is the link still effective?
The answer is that an external link on a deleted page is still effective. The evidence is as follows:
My blog on China blog was deleted long ago (probably in 2006, because it exceeded its traffic limit), but Baidu still has snapshots of it. The snapshot of the next page is gone as of today, but the snapshots of the article pages still exist. The snapshot dates show 2006, or even earlier.
In other words, although the page has been deleted for 5 years, Baidu has not deleted its snapshot. So would you say the spider no longer crawls the links on it?
I feel it should still be crawling them. On that blog I had placed a link to domain A, which at the time was only set up to redirect to the blog home page. Later, when I started using domain A for a blog of its own, it gained good weight immediately, and new articles were easily indexed within seconds. I believe that link from 5 years ago played a large part in this.
Second: if the page carrying the external link has no snapshot in the search engine, is the link still effective?
The answer may surprise many people: an external link on a page without a snapshot can still be effective. The reason is explained in the article "How Spiders Crawl Links". When a spider crawls a page, it separates the content from the links. Each link, being a URL, is added to a URL index library, and the spider sets off on its crawls from this URL index library.
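As a rough illustration of the mechanism described above, here is a minimal Python sketch. It is only my assumption of how such a pipeline could look, not how Baidu or Google actually work. The names UrlIndex and process_page and the example URLs are hypothetical. The point it shows is that links are added to the URL index whether or not a snapshot of the page is kept.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


class UrlIndex:
    """Hypothetical URL index library: known URLs plus the pages that linked to them."""

    def __init__(self):
        self.sources = {}     # url -> set of source pages where the link was seen
        self.queue = deque()  # URLs the spider will set off from later

    def add(self, url, source):
        if url not in self.sources:
            self.sources[url] = set()
            self.queue.append(url)
        self.sources[url].add(source)


def process_page(index, page_url, html, keep_snapshot):
    """Split a fetched page into content and links.

    Every link goes into the URL index, independent of the snapshot decision.
    Returns the snapshot text, or None when no snapshot is stored.
    """
    extractor = LinkExtractor(page_url)
    extractor.feed(html)
    for link in extractor.links:
        index.add(link, source=page_url)
    return html if keep_snapshot else None


index = UrlIndex()
snapshot = process_page(
    index,
    "http://old-blog.example/post/1",
    '<p>Hello</p><a href="http://domain-a.example/">my site</a>',
    keep_snapshot=False,  # assumed case: the engine stores no snapshot of this page
)
print(snapshot)           # no snapshot was kept
print(list(index.queue))  # but the link is still queued for crawling
```

In this sketch the page leaves no snapshot behind, yet the URL it linked to remains in the index together with a record of where it was found.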
First, the evidence, which comes from Google Webmaster Tools:
This screenshot is from the 404 report in the diagnostics section of Google Webmaster Tools. I used to run a BBS under the original site, which of course was deleted N years ago. Yet for these nonexistent pages, the source addresses that Google's spider reports crawling from are, unexpectedly, also nonexistent pages. A Google search shows no snapshot of these pages (pictured below). Does that mean the outbound links on pages that have been returning 404 for years are still effective?
Third: so are external links time-sensitive for search engines?
Obviously, they should be. My guess is that an external link expires for one of two reasons: the page carrying the link is deleted, or the link itself is deleted.
1. When the page is deleted, the search engine should keep crawling the external links on that page until the page has returned 404 for a certain period of time. Only then is a command sent to the search engine's URL index library to delete the external link.
2. When the page is changed, the search engine should also keep crawling the external link until every snapshot containing that link has been completely deleted from the search engine. Only then is a command sent to the URL index library to delete the link. This is because a page containing the link may be saved as N snapshots from different times, depending on the circumstances, which is also why the snapshot of a page sometimes differs when you search for different words.
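The two guesses above can be written down as a small decision rule. This is only a sketch of my own speculation, not a documented search engine behavior. LinkRecord, should_delete and the 180-day GRACE_DAYS threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

GRACE_DAYS = 180  # assumed threshold; the real period, if one exists, is unknown


@dataclass
class LinkRecord:
    """What a URL index library might remember about one external link."""
    target_url: str
    source_days_404: int = 0      # consecutive days the source page has returned 404
    on_live_page: bool = True     # link is still present on the current page
    snapshots_with_link: int = 0  # stored snapshots that still contain the link


def should_delete(record):
    """Return True when the link should be removed from the URL index."""
    # Guess 1: the page was deleted. Keep the link until the 404 has lasted long enough.
    if record.source_days_404 > 0:
        return record.source_days_404 >= GRACE_DAYS
    # Guess 2: the link was edited out. Keep it while any snapshot still contains it.
    if not record.on_live_page:
        return record.snapshots_with_link == 0
    # The page is live and still carries the link.
    return False


target = "http://domain-a.example/"
records = {
    "page deleted a month ago": LinkRecord(target, source_days_404=30),
    "page deleted long ago": LinkRecord(target, source_days_404=400),
    "link removed, old snapshots kept": LinkRecord(
        target, on_live_page=False, snapshots_with_link=2
    ),
    "link removed, no snapshots left": LinkRecord(
        target, on_live_page=False, snapshots_with_link=0
    ),
}
for label, record in records.items():
    print(label, "->", "delete" if should_delete(record) else "keep")
```

Under these assumptions a link survives both a recent page deletion and a page edit, and is only dropped once the 404 has persisted or the last snapshot containing it is gone.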
In short, external links are time-sensitive, but modifying a link or deleting a page does not mean the link becomes invalid right away. Of course, search engines run complex internal calculations, and the real process will not be as simple as I have described.