Robots.txt is the first file a search engine looks at when it visits a website. The robots.txt file tells the spider which files on the server may be viewed. When a search spider accesses a site, it first checks whether a robots.txt file exists in the site's root directory; if it does, the spider determines its crawling scope according to the file's contents. If the file does not exist, all search spiders can access every page on the site that is not password protected. Robots.txt must be placed in the root of the site, and the filename must be all lowercase. Syntax: the simplest robots.txt file uses two rules: User-agent (the bots the following rules apply to) and Disallow (the web pages to block).
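As a minimal sketch of those two rules (the directory name here is a hypothetical placeholder), the following file tells every bot to stay out of one directory and leaves the rest of the site crawlable:
User-agent: *
Disallow: /private/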
The search engine robots protocol is a robots.txt text file placed in the site's root directory; in this file you can set the rules that search engine spiders follow when crawling your content. Below, Seoer gives examples of the rules and their meanings:
To begin, create a robots.txt text file, place it in the root of the website, and start editing the protocol file:
First, to allow all search engine spiders to crawl all directories and files (an empty file also means all spiders are allowed), set the code as follows:
User-agent: *
Disallow:
Or
User-agent: *
Allow: /
Second, to prohibit a specific search engine spider from crawling any directory or file, set the code as follows:
User-agent: msnbot
Disallow: /
For example, the setting above bans the MSN spider from crawling the site; Msnbot is the name of MSN's spider. To prohibit a different search engine instead, replace the spider name with one of the names below (an illustrative swap follows the list):
Baidu spider: Baiduspider
Google spider: Googlebot
Tencent Soso spider: Sosospider
Yahoo spider: Yahoo! Slurp
MSN spider: Msnbot
AltaVista spider: Scooter
Lycos spider: Lycos_Spider_(T-Rex)
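For instance, swapping in Baidu's spider name bans Baidu instead of MSN (a sketch of the same rule with a different name):
User-agent: Baiduspider
Disallow: /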
Third, to prohibit search engine spiders from crawling certain directories, set the code as follows:
User-agent: *
Disallow: /directory-name-1/
Disallow: /directory-name-2/
Disallow: /directory-name-3/
Replace the directory names with the directories you want to prohibit search engine spiders from crawling; any directory not listed remains open to crawling, as the sketch below shows.
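As a concrete sketch (both directory names are hypothetical placeholders), the following keeps all spiders out of an admin area and a temp area while everything else stays crawlable:
User-agent: *
Disallow: /admin/
Disallow: /tmp/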
Four, to prohibit a specific directory from being crawled by a specific search engine spider, set the code as follows:
User-agent: spider name (use one of the spider names listed above)
Disallow: /directory name/ (set this to the directory the spider is prohibited from crawling)
For example, to ban the MSN spider from crawling the admin folder, set the code as follows:
User-agent: msnbot
Disallow: /admin/
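One detail worth noting: under the robots exclusion standard, a spider obeys only the most specific User-agent group that matches it. In the hypothetical file below, msnbot follows its own group and ignores the * group, so it is blocked from /admin/ but not from /tmp/:
User-agent: *
Disallow: /tmp/
User-agent: msnbot
Disallow: /admin/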
Five, to prohibit search engine spiders from crawling a certain type of file, set the code as follows:
User-agent: *
Disallow: /*.htm (here ".htm" means search engine spiders are prohibited from crawling all files with the ".htm" suffix)
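The same wildcard pattern works for other file types; on engines that support wildcards (Google and Baidu both document * and $ in robots.txt), appending $ anchors the match to the end of the URL. A hypothetical variant that blocks exactly the ".jpg" files on a site:
User-agent: *
Disallow: /*.jpg$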
Six, to allow all search engine spiders to crawl web pages with a certain extension, a rule of the following shape can be used.
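A minimal sketch, assuming the ".htm" extension from example five as a placeholder and using the $ end-of-URL wildcard so that only pages with that extension are allowed while everything else is blocked (the more specific Allow rule wins over the blanket Disallow):
User-agent: *
Allow: /*.htm$
Disallow: /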