Sometimes we run into a problem like this: pages we never wanted search engines to index get "ruthlessly" indexed anyway. Type a query such as "admin site:www.***.com" into Google and your site's back-end admin address is laid bare, and site security goes out the window. When this happens, how do we stop search engines from indexing files we want kept private?
Generally there are two methods: edit the robots.txt file, or add a meta name="robots" tag to the head of each page you do not want indexed.
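For the second method, the tag looks like this (a minimal sketch; place it inside the page's head section):

```html
<head>
  <!-- Tell all compliant spiders: do not index this page and do not follow its links -->
  <meta name="robots" content="noindex,nofollow">
</head>
```

The rest of this article focuses on the first method, the robots.txt file.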
The robots.txt file is the first file a search engine looks for when it visits your site, and it is where you set the rules for how the engine may index the site. Through this file you tell the search engine which files on your site may be indexed and which must be excluded.
On many sites, webmasters simply ignore robots.txt. They reason that their site has nothing to hide, that they are not fluent in robots.txt syntax, and that a mistake in the file would cause more trouble than it prevents, so it is simpler to do without one.
In fact, this reasoning is flawed. As we saw in a previous article, when a site returns a large number of "file not found" errors (404), search engines may reduce the site's weight. Since robots.txt is the first file a spider requests from a site, a missing robots.txt means the search engine records yet another 404 on its index servers.
Baidu's help documentation does say: "Please note that you only need a robots.txt file if your site contains content you do not want search engines to index. If you want search engines to index everything on your site, do not create a robots.txt file." Even so, I personally think creating robots.txt is still worthwhile, even if it is just an empty text file. Our site will be crawled not only by Baidu but by other search engines as well, so uploading a robots.txt file does no harm.
How to write a reasonable robots.txt file?
First we need to understand some basic syntax for the robots.txt file.
Below, each desired effect is followed by the directives that achieve it.
Allow all search engines to access every part of the site (or simply create an empty text file named robots.txt):
User-agent: *
Disallow:
Or:
User-agent: *
Allow: /
Prohibit all search engines from accessing any part of the site:
User-agent: *
Disallow: /
Prohibit Baidu from indexing your site:
User-agent: Baiduspider
Disallow: /
Prohibit Google from indexing your site:
User-agent: Googlebot
Disallow: /
Prohibit all search engines except Google from indexing your site:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
Prohibit all search engines except Baidu from indexing your site:
User-agent: Baiduspider
Disallow:
User-agent: *
Disallow: /
Prohibit spiders from accessing certain directories (for example, keep /admin/, /css/, and /images/ out of the index):
User-agent: *
Disallow: /css/
Disallow: /admin/
Disallow: /images/
Allow access to certain specific URLs inside otherwise blocked directories (note that the Allow lines come before the Disallow lines):
User-agent: *
Allow: /css/my
Allow: /admin/html
Allow: /images/index
Disallow: /css/
Disallow: /admin/
Disallow: /images/
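You can sanity-check how such Allow/Disallow combinations behave with Python's standard urllib.robotparser module (the domain example.com and the file names below are illustrative, not from the original text):

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above: allow specific path prefixes
# inside otherwise-blocked directories. Allow lines come first,
# because many parsers apply the first rule that matches.
rules = """\
User-agent: *
Allow: /css/my
Allow: /admin/html
Allow: /images/index
Disallow: /css/
Disallow: /admin/
Disallow: /images/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# /css/mystyle.css matches the Allow: /css/my prefix -> fetchable
print(rp.can_fetch("*", "http://example.com/css/mystyle.css"))
# /css/other.css only matches Disallow: /css/ -> blocked
print(rp.can_fetch("*", "http://example.com/css/other.css"))
```

This prints True for the allowed URL and False for the blocked one; URLs that match no rule at all (for example the home page) remain fetchable.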
Use "*" to block URLs with a certain suffix
(for example, block every .htm file under the /admin/ directory):
User-agent: *
Disallow: /admin/*.htm
Use "$" to allow access only to URLs with a certain suffix
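The source text breaks off here without its example. The conventional pattern for this rule (my reconstruction, not from the original) anchors the suffix with "$" and disallows everything else, for instance permitting only .htm pages:

```
User-agent: *
Allow: /*.htm$
Disallow: /
```

Note that "*" and "$" wildcards are extensions honored by major engines such as Google and Baidu, not part of the original robots.txt standard, so not every spider will respect them.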