How to Write robots.txt

Source: Internet
Author: User
Keywords: search engine, crawl, prohibit

robots.txt is the first file a search engine looks at when it visits a website. The robots.txt file tells spiders which files on the server may be viewed.
When a search spider visits a site, it first checks whether robots.txt exists in the site's root directory. If it does, the spider determines its crawling scope from the file's contents; if it does not, all search spiders can access every page on the site that is not password protected.
robots.txt must be placed in the root directory of the site (so that it is reachable at, for example, http://www.example.com/robots.txt), and the filename must be all lowercase.
Syntax: the simplest robots.txt file uses two rules:
User-agent: the bots that the following rules apply to
Disallow: the URLs to block
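
Combining the two rules, a minimal robots.txt might look like this (the "/private/" directory name is only an illustration):

# Keep all spiders out of one directory
User-agent: *
Disallow: /private/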

The search engine robots protocol is a text file named robots.txt placed in the site's root directory; in it you can set the rules that search engine spiders follow when crawling your content. The examples below show the rules and their meanings.

To begin, create a robots.txt text file, place it in the root directory of the website, and start editing the protocol file:

First, allow all search engine spiders to crawl all directories and files. An empty robots.txt file also means all spiders are allowed. Set the code as follows:

User-agent: *
Disallow:

Or

User-agent: *
Allow: /

Second, prohibit a particular search engine spider from crawling any files. Set the code as follows:

User-agent: msnbot
Disallow: /


The setting above bans the MSN spider from crawling; msnbot is the name of MSN's spider. To prohibit a different search engine instead, replace the spider name with one of the following:

Baidu spider: Baiduspider

Google spider: Googlebot

Tencent Soso spider: Sosospider

Yahoo spider: Yahoo! Slurp

MSN spider: msnbot

AltaVista spider: Scooter

Lycos spider: Lycos_Spider_(Rex)
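
For instance, applying the same pattern to another spider from the list, the following blocks Baidu's spider from the entire site:

# Keep Baiduspider out of the whole site
User-agent: Baiduspider
Disallow: /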

Third, prohibit all search engine spiders from crawling certain directories. Set the code as follows:

User-agent: *
Disallow: /directory-name-1/
Disallow: /directory-name-2/
Disallow: /directory-name-3/

Replace the directory names with the directories you want to keep search engine spiders out of; any directory that is not listed remains open to crawling.

Fourth, prohibit a particular search engine spider from crawling a particular directory. Set the code as follows:

User-agent: spider name (use a spider name from the list above)
Disallow: /directory-name/ (the directory to block from crawling)

For example, to ban the MSN spider from crawling the admin folder, set the code as follows:

User-agent: msnbot
Disallow: /admin/

Fifth, prohibit all search engine spiders from crawling a certain type of file. Set the code as follows:

User-agent: *
Disallow: /*.htm

(Here "/*.htm" means search engine spiders are prohibited from crawling all files with the ".htm" suffix.)
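
Note that "/*.htm" as written also matches URLs that merely contain ".htm" somewhere in the path. If the intent is to block only URLs that end in ".htm", major engines such as Google, Bing, and Baidu honor a "$" end-of-URL anchor (a wildcard extension beyond the original robots.txt standard):

User-agent: *
# $ anchors the pattern to the end of the URL
Disallow: /*.htm$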

Sixth, allow all search engine spiders to crawl web addresses with a certain extension, as shown in the sketch below:
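
A minimal sketch, assuming the goal is to allow ".html" pages while blocking all other paths ("Allow" and "$" are, again, wildcard extensions supported by major engines, and the more specific rule takes precedence):

User-agent: *
# Allow pages ending in .html; block everything else
Allow: /*.html$
Disallow: /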
