Robots.txt is the first file a search engine looks at when it visits a website. When a search spider arrives at a site, it first checks whether a robots.txt file exists in the site's root directory; if it does, the spider decides the scope of its crawl according to the contents of that file. If the file does not exist, every search spider can access all pages on the site that are not password protected. As webmasters, we can use it to block error pages and pages we do not want spiders to crawl. So how do we write robots.txt? The file lives in the site's root directory, and its syntax comes down to only five points, explained one by one below.
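To start from the simplest case, here is a minimal sketch of a file that allows everything (the site address, e.g. http://www.example.com/robots.txt, is only a placeholder):

# Let every spider crawl the whole site
User-agent: *
Disallow: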
1. User-agent defines which search engine the rules apply to. In general a site addresses all of them with User-agent: *, where * means all, i.e. the rules apply to every search engine. If I want to target only Baidu, I write User-agent: Baiduspider; to target Google, User-agent: Googlebot.
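For example, the three cases look like this (each group is followed by its own Allow/Disallow lines; an empty Disallow means nothing is blocked):

# Rules for every search engine
User-agent: *
Disallow:

# Rules only for Baidu's spider
User-agent: Baiduspider
Disallow:

# Rules only for Google's spider
User-agent: Googlebot
Disallow: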
2. Disallow forbids crawling. For example, if I want to stop spiders from crawling my help folder, that is Disallow: /help/. To forbid crawling only the page shenmein.html inside the help folder, write Disallow: /help/shenmein.html.
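Put into a file, that looks like this (help and shenmein.html are simply the example names used above):

User-agent: *
# Block the entire help directory
Disallow: /help/
# Or, to block only the single page instead, the line would be:
# Disallow: /help/shenmein.html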
3. Allow permits crawling. As we all know, everything is allowed by default, so why do we need this directive at all? Here is an example: I want to block every file under the help folder except the .html pages. How do you write that? We could list each file with Disallow one by one, but that takes far too much time and energy. This is where Allow solves the problem simply: write Allow: /help/*.html$ together with Disallow: /help/.
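A minimal sketch of that combination (the * wildcard is explained in point 5 below):

User-agent: *
# Allow the .html pages under help to be crawled
Allow: /help/*.html$
# Block everything else under the help directory
Disallow: /help/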
4. $ is the terminator, matching the end of the URL. Example: Disallow: /*.aspx$. This line blocks every file ending in .aspx, no matter what comes before it; /a/b/ad/ba/ddddd/eee/index.aspx is included as well.
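In a file:

User-agent: *
# Block every URL that ends in .aspx, whatever path it sits under
Disallow: /*.aspx$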
5. * matches zero or more of any character. Example: Disallow: /*?* — this blocks every URL containing a "?", which also means blocking all dynamic paths.
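For example (the URL in the comment is only illustrative):

User-agent: *
# Block every URL containing "?", e.g. dynamic paths such as /list.aspx?id=3
Disallow: /*?*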
Once you understand the five points above, I believe writing robots.txt will give you no trouble. But the following three points of attention must be remembered, otherwise the effort may be wasted.
First, the order of Allow and Disallow. Baidu's rule is to put Allow first and Disallow after it, while Google's rule is Disallow first and Allow after it. Read as Chinese, Baidu's ordering feels a little more natural, but in practice the result is the same.
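The two orderings look like this, reusing the help example from above (these are two alternative ways of writing the same file, not one combined file):

# Baidu-style: Allow first, then Disallow
User-agent: *
Allow: /help/*.html$
Disallow: /help/

# Google-style: Disallow first, then Allow
User-agent: *
Disallow: /help/
Allow: /help/*.html$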
Second, there is a single space after the ":" in User-agent, Allow and Disallow. Look at the screenshots of Baidu's robots.txt and Google's robots.txt.
In the screenshots above you can clearly see that one space character follows the colon; remember it, and be sure to write it that way.
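In other words, write the directives like this:

# Correct: one space after each colon
User-agent: *
Disallow: /help/
# Not like this (no space after the colon):
# Disallow:/help/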
Third, Disallow: /help/ blocks the whole help directory, not only the pages sitting directly inside it: every file and sub-folder beneath it, such as /help/a/b/, is blocked as well. So be very careful when writing blocking rules; a small sketch follows, and then look at the screenshot from Google's robots.txt.
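Assuming that rule, the coverage looks roughly like this (the paths are only illustrative):

User-agent: *
Disallow: /help/
# Blocked:     /help/index.html, /help/a/b/page.html
# Not blocked: /helpdesk.html, /news/index.html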
From Google's screenshot we can see that Google has blocked the places folder under its root directory and nothing else. With the explanation above, writing robots.txt should no longer be a problem. Used properly, robots.txt can do a website a great deal of good, especially where the site's 404 error pages are concerned. Make good use of robots.txt and the site's optimization will be done that much better.
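For instance, assuming the site's error page lives at /404.html (a placeholder path; use your own), it can be kept away from spiders like this:

User-agent: *
# Keep the custom 404 error page out of the crawl
Disallow: /404.html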
This article comes from www.szkaiyi.com. Please keep this link when reprinting, thank you!