"Remove URLs from Google Webmaster tools, or use a robots file to screen spiders to crawl a type of link, then Google will automatically remove the content from the index library," There must be a lot of people think so, including me, but in fact this is not entirely true.
First of all, the URL removal tool in Google Webmaster Tools is mainly used to remove two types of URLs: URLs that return a 404 error, and invalid addresses that show up in large numbers in the log files as spiders crawl them (internal site links, links with parameters, and so on). Once we submit a removal request, neither type of address will appear in the search results any longer. In the search engine's index, however, the two behave differently: pages that return an explicit 404 Not Found status code are no longer kept in the index, while the second type, the invalid addresses that spiders crawl, actually remain in the index.
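To tell which category a URL falls into, you can check the HTTP status code a spider would receive. Here is a minimal Python sketch using only the standard library; the example.com URL is hypothetical:

```python
import urllib.error
import urllib.request

def crawler_status(url):
    """Return the HTTP status code a spider would see for this URL."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as e:
        return e.code

# A URL returning 404 is dropped from the index after a removal request;
# an invalid parameterized URL that still returns 200 only disappears
# from the search results, not from the index.
print(crawler_status("https://www.example.com/deleted-page.html"))
```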
Second, using a robots.txt file to block spiders from crawling a type of link does stop Google from indexing those links, but the effect depends on whether you apply the block before or after Google has indexed them; the two orders of operation give different results. In the first case, you block the links you do not want crawled before Google indexes them: since they are never crawled, they will never be included in Google's index. In the second case, you block a type of link after Google has already indexed it: those links will still exist in Google's index, but spiders will no longer crawl them and they will no longer show in the search results; this also lets spiders spend their limited crawl budget on more meaningful links.
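To make this concrete, here is a small Python sketch using the standard library's urllib.robotparser to show how a compliant spider interprets such a block; the robots.txt rules and URLs are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt blocking parameterized and private URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant spider refuses to crawl the blocked URLs...
print(rp.can_fetch("Googlebot", "https://www.example.com/private/old.html"))  # False
# ...but still crawls everything else.
print(rp.can_fetch("Googlebot", "https://www.example.com/about.html"))        # True
```

Note that the parser only decides what gets crawled; it says nothing about what is already in the index, which is exactly why blocking after indexing leaves those links sitting in the index.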
All of the above simply shows that removing content from Google's index and removing content from the search results are two different concepts: what is removed from the index will certainly not show in the search results, but content removed from the search results may still exist in the index. This distinction affects our ability to get a meaningful count of how many of a site's pages are indexed.
Therefore, if you want to remove content from Google's index, the links to that content must remain crawlable by spiders and must not be blocked by the robots.txt file. There are then three ways to handle it:
(1) META tag
You can add <meta name="robots" content="noindex,follow"> to a page's <head> to prevent the page from being indexed while still letting spiders follow its links.
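As a sketch, assuming a Flask server (the /to-be-removed.html route is a hypothetical stand-in for any page you want out of the index; plain static HTML works the same way):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/to-be-removed.html")
def to_be_removed():
    # noindex: ask Google to drop this page from its index;
    # follow: still let spiders follow the links on the page.
    return """<html>
<head><meta name="robots" content="noindex,follow"></head>
<body>This page stays crawlable but will leave the index.</body>
</html>"""
```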
(2) 404 or 410 status codes
404 -- Not Found: the server found no file or URL matching the request.
410 -- Gone: the resource is no longer available on the server, and there is no forwarding address.
These two status codes are usually treated as the same, but there is still a slight difference: a URL that returns 410 is generally not crawled again, so a link marked 410 will have its content removed from Google's index faster than a link marked 404. In practice the nuance is not that important, but if you are able to return a proper 410, it is a good choice.
When spiders crawl these 404/410 error links, the errors show up in the site logs and in the Crawl Errors section of Webmaster Tools; you can then remove them with the URL removal tool, and the removed content will no longer be indexed.
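As an illustration, here is a minimal Flask sketch that returns both codes; the URL lists are hypothetical:

```python
from flask import Flask, abort

app = Flask(__name__)

# Hypothetical sets of URLs: missing by accident vs. removed on purpose.
MISSING = {"/deleted-by-mistake.html"}
GONE = {"/discontinued-product.html"}

@app.route("/<path:page>")
def serve(page):
    url = "/" + page
    if url in GONE:
        abort(410)  # 410 Gone: intentionally removed, no forwarding address
    if url in MISSING:
        abort(404)  # 404 Not Found: nothing matches this URL
    return "<html><body>A normal page</body></html>"
```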
(3) 301 Redirect
A 301 redirect is also a good choice for removing content from Google's index, and it can pass most of the old link's weight to the new link. However, the process is relatively long, and Google has never made clear how long it takes for the old link's weight to pass to the new one, or what proportion of the weight is transferred.
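A sketch of such a redirect, again assuming Flask and hypothetical URLs:

```python
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/old-article.html")
def old_article():
    # 301 Moved Permanently: the old URL drops out of Google's index
    # over time, and most of its link weight passes to the new URL.
    return redirect("/new-article.html", code=301)
```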
The above is the author's personal opinion; if anything is wrong, corrections and discussion are welcome.