The robots file is a "gentleman's agreement" between a website and spider programs: it not only saves the site's resources, but also helps spiders crawl the site more effectively, which in turn improves rankings.
1: Only allow Googlebot
If you want to block all crawlers except Googlebot:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
The empty Disallow line permits Googlebot to crawl everything, while the * group blocks every other spider.
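If you want to sanity-check a rule set like this before deploying it, Python's standard urllib.robotparser can parse it from memory. A minimal sketch (the path /page.html is just an illustration):

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())
print(rp.can_fetch("Googlebot", "/page.html"))    # True: Googlebot is allowed
print(rp.can_fetch("Baiduspider", "/page.html"))  # False: every other spider is blocked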
2: The difference between "/folder/" and "/folder"
For example:
User-agent: *
Disallow: /folder/
Disallow: /folder
"Disallow: /folder/" blocks the directory: no file under /folder/ may be crawled, but /folder.html can still be crawled, because its URL does not begin with /folder/.
"Disallow: /folder" blocks every URL that begins with /folder, so neither the files under /folder/ nor /folder.html can be crawled.
3: "*" matches any character
user-agent:*
It means shielding all spiders. When we do the pseudo static processing, will be at the same time dynamic Web pages and static pages, Web content exactly the same as the mirror state page, so we want to screen out the dynamic page, you can use the * to screen the Dynamic Web page
user-agent:*
disallow:/?*?/
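Note that Python's built-in robotparser follows the original prefix-only specification and does not understand the * and $ extensions, so to illustrate the matching, here is a tiny hand-rolled translation to a regular expression (robots_match is our own sketch, not a library function):

import re

def robots_match(pattern, path):
    # Translate the robots.txt extensions: "*" matches any character
    # sequence and a trailing "$" anchors the end of the URL
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, path) is not None

print(robots_match("/*?*", "/index.php?id=3"))  # True: the dynamic URL is blocked
print(robots_match("/*?*", "/index.html"))      # False: the static page stays crawlable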
4: "$" matches the end of the URL
If you want to block URLs that end with a certain string, use $. For example, to block URLs ending in .asp:
User-agent: *
Disallow: /*.asp$
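The same robots_match sketch from section 3 shows what the anchor changes:

print(robots_match("/*.asp$", "/old/page.asp"))  # True: the URL ends in .asp
print(robots_match("/*.asp$", "/page.aspx"))     # False: the URL does not end in .asp
print(robots_match("/*.asp",  "/page.aspx"))     # True: without $, .aspx pages would be blocked too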
You can also open a well-optimized site, see how its robots file is written, and then adapt it to your own needs. A good robots file lets spiders spend more of their time crawling your actual content, so it is well worth optimizing.
This article is from Dongyang Gaofu: http://mygaofu.com. Please credit the link when reprinting.