Sometimes you will be surprised to find pages you have blocked in robots.txt showing up in search engine results, most easily spotted with a site: query. When this happens, don't panic, and don't start doubting your robots.txt syntax.
Why do pages blocked in robots.txt appear in search results?
A file that robots.txt forbids crawling will not be fetched or crawled by search engines. Note, however, that a URL blocked by robots.txt may still appear in search results: as long as some inbound link points to the URL, the search engine knows it exists, and although it will not crawl the page content, the URL may show up in results in one of the following forms:
Only the URL is displayed, with no title or description.
The title and description are taken from an open directory (such as DMOZ) or an important directory such as the Yahoo Directory.
The anchor text of inbound links is displayed as the title and description.
The main reason is that the search engine has not indexed the page's content, but because so many links point to the page, it judges the page to be valuable and probably closely related to what the user is searching for, so it shows the URL; out of respect for the webmaster, it does not display any details of the page.
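As a minimal sketch (the /jump/ directory is a hypothetical path, not taken from the original site), a rule like the following blocks crawling, but by itself it does not keep the URL out of the results list:

# Blocks crawling of /jump/ for all spiders; the URL can still
# surface in results if other pages link to it
User-agent: *
Disallow: /jump/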
As shown in the image above, Google still displays one of my site's jump links in its search results, with the note "because of this site's robots.txt, the system does not provide ..." under the URL. The title of the result is not the <title> of the jump page itself but the anchor text of links pointing to it. You can try the address above to verify the effect.
How to truly keep pages out of the index
What we want here is not "blocked from crawling" but "not indexed." In the situation above, crawling is blocked, yet the search engine still indexes the URL and returns it when it believes the information is useful to users. To completely remove such pages from the search results list, beyond adding rules to robots.txt, there are several things we can do.
1. Use the meta robots tag
Add the following code to the <head> of any page you do not want indexed:
<meta name= "ROBOTS" content= "noindex,nofollow,noarchive"/>
Here noindex forbids indexing this page: the search engine will not return it as a result. noarchive means do not keep a cached snapshot; Baidu supports it, though Baidu does not appear to support noindex. nofollow means the spider will not follow the links on this page to continue crawling, and will not pass this page's weight through them. Note that following links and transferring weight are separate from indexing: if you use only the following code, the page will not be returned in search results, but the spider will still crawl the links on the page and weight will still flow through them.
<meta name= "ROBOTS" content= "noindex,noarchive"/>
2. Add rel="nofollow" to links pointing to the page
Since you have already blocked crawling of this page in robots.txt, you presumably do not want other links leading to it either, so you can add rel="nofollow" to those links. The spider will then not follow them to the page you have blocked, and will not pass weight to it. But if a link was written by someone else on their own website, there is nothing you can do about it; only the first method works in that case.
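A sketch of such a link (the URL is a hypothetical example):

<!-- the spider will not follow this link or pass weight through it -->
<a href="http://www.example.com/jump/1" rel="nofollow">partner site</a>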
What impact does this phenomenon have on SEO?
First, we need to be clear about whether the page should be blocked at all. There are many reasons to block a page: perhaps you do not want others to see private content, perhaps the page content is unimportant, or perhaps, like mine, it is a jump page that is better left out. What concerns me is whether exclusion is good or bad for SEO: if being indexed does no harm and the pages do not matter, we might as well let them be indexed. In fact, exclusion cuts both ways, sometimes good and sometimes bad, depending on how you operate.
If the page has no value, losing its index entry is no loss in itself. But if blocking the page also cuts off the link flow that passed through it, leaving broken paths inside the site so that weight transfer is interrupted or even lost there, the effect is bad. For example, a webmaster may feel an online message page is unimportant and not want its content shown in search results, so he blocks it; but if part of the site structure can only be reached through that page, those pages can no longer be crawled by search engines. Precisely because this page strings the site structure together, it is a critical page, and blocking it costs the site dearly. Another situation: a large number of pages link to this page, concentrating a great deal of weight on it, yet it is not indexed, so the weight it collects is wasted and passed on to nothing. This is a weight black hole.
Of course, there are good effects too. Take the jump pages on my site mentioned above: I block them from being indexed, so they do not appear in search results, and users cannot land on these meaningless pages, which do not lead into my site but only pause for half a second before sending visitors to someone else's website. In addition, I added rel="nofollow" to the links pointing to these jump pages, which keeps them from being crawled and avoids passing weight on to other websites.
Some people, however, use this robots.txt behavior to deceive search engines. For example, someone builds a page with unsavory content, uses robots.txt to hide it from crawling without keeping it out of the results, and then builds external links on other sites whose anchor text has nothing to do with the page's actual content. Under this kind of operation, the result shown in the image above appears: the search result's title is the anchor text, while the actual page content is something else entirely, deceiving both the search engine and the user.