Robots.txt, as an essential file for any website, is something every webmaster has dealt with. The robots file controls how spiders crawl a site: it can keep files you do not want exposed hidden, and it can steer the crawl path, so that the site presents itself better in the search engine SERPs. This article walks through some practical robots.txt cases and explains the techniques behind them.
(1): The order of Allow and Disallow
First, a robots.txt statement the author once wrote:
User-agent: *
Allow: /
Disallow: /abcd/
From this fragment, the intent is clearly to screen off the /abcd/ folder so that spiders do not crawl the files inside it. On analysis, though, the actual result is quite different: spiders will still crawl the /abcd/ folder. The reason is that spiders read the rules from top to bottom, and a rule written below cannot defeat one written above. Put in terms of scope: "Allow: /" is set first and its effect is global, granting spiders access to every file on the site; the "Disallow: /abcd/" on the third line falls inside that earlier scope, since the abcd folder sits under the site root already covered by the Allow. The third line is therefore ineffective, and with this robots.txt the spider can still crawl the /abcd/ folder.
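As a quick check, this behavior can be reproduced with Python's built-in urllib.robotparser, which applies rules in their order of appearance (first match wins), the reading described above. This is a sketch of one parser's behavior, not every engine's: Google, for instance, uses the most specific matching rule instead. The example.com URL is hypothetical.

from urllib import robotparser

# The rules from the example above: the global Allow comes first.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /",
    "Disallow: /abcd/",
])

# First matching rule wins: "Allow: /" matches every path,
# so the later "Disallow: /abcd/" never takes effect.
print(rp.can_fetch("*", "http://example.com/abcd/page.html"))  # True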
User-agent: *
Disallow: /abcd/
Allow: /
This example simply reverses the order, and now the spider is kept out of the /abcd/ folder. If you followed the explanation of the previous example, you can see why: because "Disallow: /abcd/" comes first, the /abcd/ folder falls under its scope before anything else applies. The "Allow: /" on the third line has global scope, but it cannot hinder the earlier statement. The spider might want to reach the whole site through the third line, yet since that line cannot defeat the second one, it can only give up and access the folders other than /abcd/.
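Under the same first-match reading, reversing the order flips the result; a minimal sketch, again with a hypothetical example.com:

from urllib import robotparser

# Same rules with the order reversed: the Disallow now comes first.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /abcd/",
    "Allow: /",
])

print(rp.can_fetch("*", "http://example.com/abcd/page.html"))  # False: the Disallow matches first
print(rp.can_fetch("*", "http://example.com/other.html"))      # True: falls through to "Allow: /"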
User-agent: *
Allow: /cgi-bin/see
Allow: /tmp/hi
Allow: /~joe/look
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
Baidu's official robots documentation contains an example with the title "Example 7. Allow access to some URLs in a specific directory", given as the statement above. To explain: in this file the spider can reach the URLs matched by lines two, three and four, namely /cgi-bin/see, /tmp/hi and /~joe/look, because those Allow lines come first. The Disallow lines that follow block the rest of each directory, but since a later rule cannot constrain an earlier one, the spider may still access the URLs that the Allow lines specify.
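The same first-match reading can be tested against this Example 7 ruleset; a sketch, with example.com standing in for a real host:

from urllib import robotparser

# Baidu's "Example 7": allow a few specific URLs, block the rest of each directory.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /cgi-bin/see",
    "Allow: /tmp/hi",
    "Allow: /~joe/look",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
    "Disallow: /~joe/",
])

print(rp.can_fetch("*", "http://example.com/cgi-bin/see"))    # True: an Allow line matches first
print(rp.can_fetch("*", "http://example.com/cgi-bin/other"))  # False: only "Disallow: /cgi-bin/" matches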
(2): Application of the slash "/"
User-agent: *
Allow: /cgi-bin/
Disallow: /cgi-bin
The above is an example the author made up to illustrate how the slash works here. Compare the second and third lines: one ends with "/" and one does not. Because the Allow statement carries the trailing "/", it permits spiders to crawl the files under the /cgi-bin folder, but it says nothing about the folder path itself; the trailing "/" governs the files inside a folder, not the folder as such. On the third line the author uses Disallow to keep spiders away from the /cgi-bin folder itself, but that rule cannot override the permission already granted by the second statement; it can only control the folder. So in the end the spider can crawl the files inside /cgi-bin, but not the bare /cgi-bin directory path.
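A first-match parser such as urllib.robotparser reproduces this trailing-slash behavior, since "Allow: /cgi-bin/" only matches paths that continue past the slash; a sketch with a hypothetical example.com:

from urllib import robotparser

# The trailing-slash example: the Allow has the slash, the Disallow does not.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /cgi-bin/",
    "Disallow: /cgi-bin",
])

print(rp.can_fetch("*", "http://example.com/cgi-bin/file.html"))  # True: "Allow: /cgi-bin/" matches first
print(rp.can_fetch("*", "http://example.com/cgi-bin"))            # False: only "Disallow: /cgi-bin" matches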
User-agent: *
Disallow: regnew.asp   (correct form: Disallow: /regnew.asp)
The author wrote the statement above because many friends overlook the leading "/". If regnew.asp is the registration page and you omit the "/" that anchors the file's location, the spider may be unable to match the rule to the file, causing confusion. The author once tried to block a root-level file this way, writing only "Disallow: ***.asp", and found it did not stop the spider at all; only after noticing the missing "/" did the rule take effect. So whenever you target a file, whether it sits in the root or in a subdirectory, remember to include the leading "/".
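The effect of the missing leading slash can be checked with the same parser; the blocked() helper below is my own illustration, not part of the article:

from urllib import robotparser

def blocked(rule_path, url):
    # Build a one-rule robots.txt and ask whether the URL is blocked.
    rp = robotparser.RobotFileParser()
    rp.parse(["User-agent: *", "Disallow: " + rule_path])
    return not rp.can_fetch("*", url)

# Without the leading "/", the rule never matches the URL path, so nothing is blocked.
print(blocked("regnew.asp", "http://example.com/regnew.asp"))   # False: rule is ineffective
print(blocked("/regnew.asp", "http://example.com/regnew.asp"))  # True: the page is blocked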
I believe that through the cases above, readers will come away with a deeper understanding of robots.txt. Observe real-world cases carefully and put them to use on your own site. From http://www.5405.net/; please keep the link when reprinting. Original article first published on A5.