Be careful not to let robots.txt block the crawling of your links

We know that many webmasters look for ways to keep spiders from crawling certain pages on their sites, and using a robots.txt file is a common way to do it. While this can be good practice, it also creates a problem: confusion about what actually happens when robots.txt is used to stop Google, Yahoo!, MSN, or other search engine spiders from crawling. A brief explanation of the options follows:

Blocking with robots.txt: the spider will not fetch the blocked URLs, but those URLs can still be indexed and appear on the search engine results page.

Blocking with the noindex meta tag: the page can still be accessed and crawled, but it will not be indexed or listed in search results.
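
A minimal example of this tag, assuming it is placed in the <head> of the page you want kept out of the index:

  <meta name="robots" content="noindex">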

Blocking by removing the links to a page: this is not a very sensible move, because other links (including backlinks from other sites) can still lead spiders to crawl and index the page. (If you don't mind spiders wasting crawl time on your site, you can do this, but don't expect it to keep the page out of the search engine's results pages.)

Here is a simple example of a robots.txt file that blocks spiders from crawling a directory whose URLs nevertheless still appear in Google's search results.

[Figure: the site's robots.txt file blocking the directory]

(The robots.txt file is also valid for subdomains.)
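
The exact contents of that robots.txt file are not reproduced here, but a minimal sketch of the kind of directive that blocks such a directory would look like this (the path simply mirrors the blocked directory discussed below):

  User-agent: *
  Disallow: /library/nosearch/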

We can see that the about.com/library/nosearch/ directory has been blocked. The following figure shows what Google returns when we search for URLs from that directory:

[Figure: Google search results for URLs in the blocked /library/nosearch/ directory]

Note that Google still shows 2,760 results for this supposedly blocked directory. Google did not crawl these pages, so each result is just a bare URL with no title and no description, because Google could not see the content of the pages.

Now imagine that you have a large number of pages you don't want search engines to crawl. These URLs still accumulate link equity (and other unknown, independent ranking factors), but because spiders cannot crawl through them, the links pointing out of those pages are never seen, and that link equity goes to waste. Please look at the following figure:

[Figure: link equity flowing into blocked pages and going no further]

Here are two convenient ways to deal with this:

1. Preserve this link equity by adding the nofollow attribute to links that point into a directory blocked in robots.txt, so that no equity flows into pages that cannot pass it on (see the sketch after this list).

2. If you know which of these blocked pages receive significant link equity (especially from external links), consider using the meta noindex,follow tag instead, so the spider passes that link equity onward and spends its time retrieving more of the pages you actually need on your site.
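
Here is a minimal HTML sketch of both options; the link URL is made up for illustration:

  <!-- Option 1: don't pass link equity into a robots.txt-blocked directory -->
  <a href="/library/nosearch/page.html" rel="nofollow">Blocked page</a>

  <!-- Option 2: let the page be crawled but not indexed, while its outgoing links still pass equity -->
  <meta name="robots" content="noindex,follow">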

This article is from Reamo's personal SEO and network promotion blog: http://www.aisxin.cn. Please credit the source when reprinting.
