Robots.txt is the first file a search engine looks at when it visits a website. The robots.txt file tells the spider which files on the server may be viewed. When a search spider accesses a site, it first checks whether a robots.txt file exists in the site's root directory; if it does, the spider determines its crawling scope according to the file's contents. If the file does not exist, all search spiders can access every page on the site that is not password protected. Robots.txt must be placed in the root of the site, and the filename must be all lowercase. Syntax: the simplest robots.txt file uses two rules: User-agent (the bots the following rules apply to) and Disallow (the web pages to block).
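As a minimal sketch of those two rules (the directory name here is a hypothetical placeholder), the following file tells every bot to stay out of one directory and leaves the rest of the site crawlable:
User-agent: *
Disallow: /private/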
The search engine robots protocol is a robots.txt text file placed in the site's root directory; in this file you can set the rules that search engine spiders follow when crawling your content. Below, Seoer gives examples of the rules and their meanings:
To begin, create a robots.txt text file, place it in the root of the website, and start editing the protocol file:
First, to allow all search engine spiders to crawl all directories and files (an empty file also means all spiders are allowed), set the code as follows:
User-agent: *
Disallow:
Or
User-agent: *
Allow: /
Second, to prohibit a specific search engine spider from crawling any directory or file, set the code as follows:
User-agent: msnbot
Disallow: /
For example, the setting above bans the MSN spider from crawling the site; Msnbot is the name of MSN's spider. To prohibit a different search engine instead, replace the spider name with one of the names below (an illustrative swap follows the list):
Baidu spider: Baiduspider
Google spider: Googlebot
Tencent Soso spider: Sosospider
Yahoo spider: Yahoo! Slurp
MSN spider: Msnbot
AltaVista spider: Scooter
Lycos spider: Lycos_Spider_(T-Rex)
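For instance, swapping in Baidu's spider name bans Baidu instead of MSN (a sketch of the same rule with a different name):
User-agent: Baiduspider
Disallow: /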
Third, to prohibit search engine spiders from crawling certain directories, set the code as follows:
User-agent: *
Disallow: /directory-name-1/
Disallow: /directory-name-2/
Disallow: /directory-name-3/
Replace the directory names with the directories you want to prohibit search engine spiders from crawling; any directory not listed remains open to crawling, as the sketch below shows.
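As a concrete sketch (both directory names are hypothetical placeholders), the following keeps all spiders out of an admin area and a temp area while everything else stays crawlable:
User-agent: *
Disallow: /admin/
Disallow: /tmp/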
Four, to prohibit a specific directory from being crawled by a specific search engine spider, set the code as follows:
User-agent: spider name (use one of the spider names listed above)
Disallow: /directory name/ (set this to the directory the spider is prohibited from crawling)
For example, to ban the MSN spider from crawling the admin folder, set the code as follows:
User-agent: msnbot
Disallow: /admin/
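One detail worth noting: under the robots exclusion standard, a spider obeys only the most specific User-agent group that matches it. In the hypothetical file below, msnbot follows its own group and ignores the * group, so it is blocked from /admin/ but not from /tmp/:
User-agent: *
Disallow: /tmp/
User-agent: msnbot
Disallow: /admin/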
Five, to prohibit search engine spiders from crawling a certain type of file, set the code as follows:
User-agent: *
Disallow: /*.htm (here ".htm" means search engine spiders are prohibited from crawling all files with the ".htm" suffix)
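The same wildcard pattern works for other file types; on engines that support wildcards (Google and Baidu both document * and $ in robots.txt), appending $ anchors the match to the end of the URL. A hypothetical variant that blocks exactly the ".jpg" files on a site:
User-agent: *
Disallow: /*.jpg$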
Six, to allow all search engine spiders to crawl web pages with a certain extension, a rule of the following shape can be used.
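A minimal sketch, assuming the ".htm" extension from example five as a placeholder and using the $ end-of-URL wildcard so that only pages with that extension are allowed while everything else is blocked (the more specific Allow rule wins over the blanket Disallow):
User-agent: *
Allow: /*.htm$
Disallow: /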