Robots.txt is the first file a search engine looks at when it visits a website. When a search spider arrives at a site, it first checks whether a robots.txt file exists in the site's root directory; if it does, the spider decides the scope of its crawl according to the contents of that file. If the file does not exist, every search spider can access all pages on the site that are not password protected. As webmasters, we can use it to block error pages and pages we do not want spiders to crawl. So how do we write robots.txt? The file lives in the site's root directory, and its syntax comes down to only five points, explained one by one below.
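To start from the simplest case, here is a minimal sketch of a file that allows everything (the site address, e.g. http://www.example.com/robots.txt, is only a placeholder):

# Let every spider crawl the whole site
User-agent: *
Disallow: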
1. User-agent defines which search engine the rules apply to. In general a site addresses all of them with User-agent: *, where * means all, i.e. the rules apply to every search engine. If I want to target only Baidu, I write User-agent: Baiduspider; to target Google, User-agent: Googlebot.
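For example, the three cases look like this (each group is followed by its own Allow/Disallow lines; an empty Disallow means nothing is blocked):

# Rules for every search engine
User-agent: *
Disallow:

# Rules only for Baidu's spider
User-agent: Baiduspider
Disallow:

# Rules only for Google's spider
User-agent: Googlebot
Disallow: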
2. Disallow forbids crawling. For example, if I want to stop spiders from crawling my help folder, that is Disallow: /help/. To forbid crawling only the page shenmein.html inside the help folder, write Disallow: /help/shenmein.html.
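Put into a file, that looks like this (help and shenmein.html are simply the example names used above):

User-agent: *
# Block the entire help directory
Disallow: /help/
# Or, to block only the single page instead, the line would be:
# Disallow: /help/shenmein.html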
3. Allow permits crawling. As we all know, everything is allowed by default, so why do we need this directive at all? Here is an example: I want to block every file under the help folder except the .html pages. How do you write that? We could list each file with Disallow one by one, but that takes far too much time and energy. This is where Allow solves the problem simply: write Allow: /help/*.html$ together with Disallow: /help/.
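A minimal sketch of that combination (the * wildcard is explained in point 5 below):

User-agent: *
# Allow the .html pages under help to be crawled
Allow: /help/*.html$
# Block everything else under the help directory
Disallow: /help/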
4. $ is the terminator, matching the end of the URL. Example: Disallow: /*.aspx$. This line blocks every file ending in .aspx, no matter what comes before it; /a/b/ad/ba/ddddd/eee/index.aspx is included as well.
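In a file:

User-agent: *
# Block every URL that ends in .aspx, whatever path it sits under
Disallow: /*.aspx$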
5. * matches zero or more of any character. Example: Disallow: /*?* — this blocks every URL containing a "?", which also means blocking all dynamic paths.
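For example (the URL in the comment is only illustrative):

User-agent: *
# Block every URL containing "?", e.g. dynamic paths such as /list.aspx?id=3
Disallow: /*?*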
Once you understand the five points above, I believe writing robots.txt will give you no trouble. But the following three points of attention must be remembered, otherwise the effort may be wasted.
First, the order of Allow and Disallow. Baidu's rule is to put Allow first and Disallow after it, while Google's rule is Disallow first and Allow after it. Read as Chinese, Baidu's ordering feels a little more natural, but in practice the result is the same.
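The two orderings look like this, reusing the help example from above (these are two alternative ways of writing the same file, not one combined file):

# Baidu-style: Allow first, then Disallow
User-agent: *
Allow: /help/*.html$
Disallow: /help/

# Google-style: Disallow first, then Allow
User-agent: *
Disallow: /help/
Allow: /help/*.html$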
Second, there is a single space after the ":" in User-agent, Allow and Disallow. Look at the screenshots of Baidu's robots.txt and Google's robots.txt.
In the screenshots above you can clearly see that one space character follows the colon; remember it, and be sure to write it that way.
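In other words, write the directives like this:

# Correct: one space after each colon
User-agent: *
Disallow: /help/
# Not like this (no space after the colon):
# Disallow:/help/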
Third, Disallow: /help/ blocks the whole help directory, not only the pages sitting directly inside it: every file and sub-folder beneath it, such as /help/a/b/, is blocked as well. So be very careful when writing blocking rules; a small sketch follows, and then look at the screenshot from Google's robots.txt.
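Assuming that rule, the coverage looks roughly like this (the paths are only illustrative):

User-agent: *
Disallow: /help/
# Blocked:     /help/index.html, /help/a/b/page.html
# Not blocked: /helpdesk.html, /news/index.html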
From Google's screenshot we can see that Google has blocked the places folder under its root directory and nothing else. With the explanation above, writing robots.txt should no longer be a problem. Used properly, robots.txt can do a website a great deal of good, especially where the site's 404 error pages are concerned. Make good use of robots.txt and the site's optimization will be done that much better.
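For instance, assuming the site's error page lives at /404.html (a placeholder path; use your own), it can be kept away from spiders like this:

User-agent: *
# Keep the custom 404 error page out of the crawl
Disallow: /404.html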
This article comes from www.szkaiyi.com. Please keep this link when reprinting, thank you!