We know that many webmasters look for ways to keep spiders from crawling certain pages on their sites, usually by means of a robots.txt file. While this is indeed good practice, it also creates a problem: confusion about how to use robots.txt to stop Google/Yahoo!/MSN or other search engine spiders from crawling. Here is a brief rundown of the options:
Blocking crawling with robots.txt: the spider will not fetch the blocked URLs, but those URLs can still be indexed and appear on the search engine results page.
Blocking indexing with the noindex meta tag: the page remains accessible to spiders, but it will not be listed in the search results (see the sketch after this list).
Blocking crawling by removing the links to a page: this is not a very sensible move, because other links elsewhere can still lead spiders to crawl and index that page. (If you don't mind spiders wasting their time on the page you can do this, but don't expect it to keep the page out of the search engine results pages.)
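For the noindex method, the tag goes in the page's head. A minimal sketch, assuming you want to address all spiders rather than a specific one:

<head>
  <!-- page may be crawled, but should not appear in search results -->
  <meta name="robots" content="noindex">
</head>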
Here is a simple example of a robots.txt that limits spider crawling while the blocked URLs still appear in Google's search results.
(A robots.txt file works for subdomains in the same way.)
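The example itself was shown as a screenshot and is not reproduced here; a robots.txt along these lines at the site root would behave as described (the path is taken from the about.com example discussed next):

User-agent: *
Disallow: /library/nosearch/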
We can see that the about.com/library/nosearch/ directory has been blocked. The figure below shows what Google returns when we search for URLs in that directory:
Note that Google still shows about 2,760 results for the blocked directory. It has not crawled these pages, so each result is just a bare URL with no title and no description, because Google cannot see the page content.
Now imagine you have a large number of pages that you don't want search engines to crawl. Those URLs still accumulate link equity and whatever other independent ranking factors apply, but because the spider cannot crawl through them, the links flowing out of them are never seen. Please look at the following figure:
Here are two practical ways around this:
1. Conserve that link equity by adding the nofollow attribute to links that point into directories blocked by robots.txt.
2. If you know which of the blocked pages receive a steady flow of link equity (especially from external links), consider using meta noindex,follow instead. The spider will then pass link equity through those pages and spend its time retrieving more of the pages you actually want indexed on your site. (See the sketch after this list.)
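A rough sketch of both options (the file names here are placeholders):

For option 1, on pages that link into the blocked directory, add nofollow to the link:
<a href="/library/nosearch/some-page.htm" rel="nofollow">some page</a>

For option 2, in the head of a page you want spiders to pass through without listing it:
<meta name="robots" content="noindex,follow">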
This article comes from Reamo's personal SEO and network promotion blog: http://www.aisxin.cn. Please credit the source when reprinting.