"Remove URLs from Google Webmaster tools, or use a robots file to screen spiders to crawl a type of link, then Google will automatically remove the content from the index library," There must be a lot of people think so, including me, but in fact this is not entirely true.
First of all, the URL removal tool in Google Webmaster Tools is mainly used to remove two types of URLs: URLs that return a 404 error, and invalid addresses that show up in large numbers in the log files as spiders crawl them (internal site links, links with parameters, and so on). Once we submit a removal request, neither type of address will appear in the search results any longer. In the search engine's index, however, the two behave differently: pages that return an explicit 404 Not Found status code are no longer kept in the index, while the second type, the invalid addresses that spiders crawl, actually remain in the index.
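To tell which category a URL falls into, you can check the HTTP status code a spider would receive. Here is a minimal Python sketch using only the standard library; the example.com URL is hypothetical:

```python
import urllib.error
import urllib.request

def crawler_status(url):
    """Return the HTTP status code a spider would see for this URL."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as e:
        return e.code

# A URL returning 404 is dropped from the index after a removal request;
# an invalid parameterized URL that still returns 200 only disappears
# from the search results, not from the index.
print(crawler_status("https://www.example.com/deleted-page.html"))
```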
Second, using a robots.txt file to block spiders from crawling a type of link does stop Google from indexing those links, but the effect depends on whether you apply the block before or after Google has indexed them; the two orders of operation give different results. In the first case, you block the links you do not want crawled before Google indexes them: since they are never crawled, they will never be included in Google's index. In the second case, you block a type of link after Google has already indexed it: those links will still exist in Google's index, but spiders will no longer crawl them and they will no longer show in the search results; this also lets spiders spend their limited crawl budget on more meaningful links.
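To make this concrete, here is a small Python sketch using the standard library's urllib.robotparser to show how a compliant spider interprets such a block; the robots.txt rules and URLs are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt blocking parameterized and private URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant spider refuses to crawl the blocked URLs...
print(rp.can_fetch("Googlebot", "https://www.example.com/private/old.html"))  # False
# ...but still crawls everything else.
print(rp.can_fetch("Googlebot", "https://www.example.com/about.html"))        # True
```

Note that the parser only decides what gets crawled; it says nothing about what is already in the index, which is exactly why blocking after indexing leaves those links sitting in the index.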
All of the above simply shows that removing content from Google's index and removing content from the search results are two different concepts: what is removed from the index will certainly not show in the search results, but content removed from the search results may still exist in the index. This distinction affects our ability to get a meaningful count of how many of a site's pages are indexed.
Therefore, if you want to remove content from Google's index, the links to that content must remain crawlable by spiders and must not be blocked by the robots.txt file. There are then three ways to handle it:
(1) META tag
You can add <meta name="robots" content="noindex,follow"> to a page's <head> to prevent the page from being indexed while still letting spiders follow its links.
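As a sketch, assuming a Flask server (the /to-be-removed.html route is a hypothetical stand-in for any page you want out of the index; plain static HTML works the same way):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/to-be-removed.html")
def to_be_removed():
    # noindex: ask Google to drop this page from its index;
    # follow: still let spiders follow the links on the page.
    return """<html>
<head><meta name="robots" content="noindex,follow"></head>
<body>This page stays crawlable but will leave the index.</body>
</html>"""
```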
(2) 404 or 410 status codes
404 -- Not Found: the server found no file or URL matching the request.
410 -- Gone: the resource is no longer available on the server, and there is no forwarding address.
These two status codes are usually treated as the same, but there is still a slight difference: a URL that returns 410 is generally not crawled again, so a link marked 410 will have its content removed from Google's index faster than a link marked 404. In practice the nuance is not that important, but if you are able to return a proper 410, it is a good choice.
When spiders crawl these 404/410 error links, the errors show up in the site logs and in the Crawl Errors section of Webmaster Tools; you can then remove them with the URL removal tool, and the removed content will no longer be indexed.
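As an illustration, here is a minimal Flask sketch that returns both codes; the URL lists are hypothetical:

```python
from flask import Flask, abort

app = Flask(__name__)

# Hypothetical sets of URLs: missing by accident vs. removed on purpose.
MISSING = {"/deleted-by-mistake.html"}
GONE = {"/discontinued-product.html"}

@app.route("/<path:page>")
def serve(page):
    url = "/" + page
    if url in GONE:
        abort(410)  # 410 Gone: intentionally removed, no forwarding address
    if url in MISSING:
        abort(404)  # 404 Not Found: nothing matches this URL
    return "<html><body>A normal page</body></html>"
```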
(3) 301 Redirect
A 301 redirect is also a good choice for removing content from Google's index, and it can pass most of the old link's weight to the new link. However, the process is relatively long, and Google has never made clear how long it takes for the old link's weight to pass to the new one, or what proportion of the weight is transferred.
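A sketch of such a redirect, again assuming Flask and hypothetical URLs:

```python
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/old-article.html")
def old_article():
    # 301 Moved Permanently: the old URL drops out of Google's index
    # over time, and most of its link weight passes to the new URL.
    return redirect("/new-article.html", code=301)
```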
The above is the author's personal opinion; if anything is wrong, corrections and discussion are welcome.