Some time ago, one of my websites stopped being indexed because a few characters in its robots.txt file were written incorrectly. It seemed very strange at the time; I checked and rechecked but could not find the problem. Only after logging in to Google's webmaster center and running the site diagnostics did I discover that the robots file was blocking every search engine spider. After fixing it, indexing returned to normal.
How much do you know about the robots.txt file? Compare the following examples:
1. Block all search engines from accessing any part of the site, i.e., prevent any search engine from indexing it:
User-agent: *
Disallow: /
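Any of these files can be sanity-checked with Python's standard urllib.robotparser module. A minimal sketch for example 1 (example.com and the spider names are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# Example 1: block every crawler from the whole site.
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# No spider may fetch anything, not even the home page.
print(rp.can_fetch("Googlebot", "http://example.com/"))             # False
print(rp.can_fetch("Baiduspider", "http://example.com/page.html"))  # False
```

The same parse-then-can_fetch pattern works for every example below.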
2. Allow all search engines to access every part of the site (the most common usage):
User-agent: *
Disallow:
Or:
User-agent: *
Allow: /
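Both spellings in example 2 behave the same way, which can be confirmed with urllib.robotparser (the URL is a made-up placeholder):

```python
from urllib.robotparser import RobotFileParser

def allows_everything(robots_txt: str) -> bool:
    """Return True if the given robots.txt lets a spider fetch a sample page."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", "http://example.com/any/page.html")

# Both forms from example 2 permit full access.
print(allows_everything("User-agent: *\nDisallow:\n"))  # True
print(allows_everything("User-agent: *\nAllow: /\n"))   # True
```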
3. Block only one particular search engine from your website. Baidu's spider is Baiduspider; Google's is Googlebot:
User-agent: Baiduspider
Disallow: /
Or:
User-agent: Googlebot
Disallow: /
4. Allow only one particular search engine to visit your website. Again, Baidu is Baiduspider and Google is Googlebot:
User-agent: Baiduspider
Disallow:
Or:
User-agent: Googlebot
Disallow:
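Note that the block in example 4 only sets rules for that one spider; to truly let in just one engine, it is normally combined with a catch-all block like example 1. A sketch of that combination, checked with urllib.robotparser (example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Let only Baiduspider in: a per-spider allow plus a catch-all block.
robots_txt = """\
User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Baiduspider", "http://example.com/page.html"))  # True
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))    # False
```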
5. If you don't want certain directories of your site to be indexed, write:
User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
Note: you cannot combine them as Disallow: /directory1/ /directory2/; each directory must go on its own Disallow line.
6. Block search engines from all of the site's dynamic pages (that is, any URL containing "?"):
User-agent: *
Disallow: /*?*
7. Allow search engines to access only pages with a particular file extension:
User-agent: *
Allow: /*.html$ (or .htm, .php, and so on)
Disallow: /
8. Block search engines from pages with a particular file extension:
User-agent: *
Disallow: /*.html (or .htm, .php, and so on)
9. Allow search engines to access pages only in specific directories:
User-agent: *
Allow: /directory1/directory2/ (allow access to pages in directory2)
Allow: /directory3/directory4/ (allow access to pages in directory4)
Allow: /directory5/directory6/ (allow access to pages in directory6)
Disallow: /directory1/
Disallow: /directory3/
Disallow: /directory5/
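The pattern in example 9 (a broad Disallow with a narrower Allow) can also be checked with urllib.robotparser. One caveat: Python's simple parser applies rules in file order rather than by longest match, so the Allow line must come first, as above. The directory names and URL are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Example 9, reduced to one pair: open /dir1/dir2/ while closing the rest of /dir1/.
robots_txt = """\
User-agent: *
Allow: /dir1/dir2/
Disallow: /dir1/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "http://example.com/dir1/dir2/page.html"))  # True
print(rp.can_fetch("*", "http://example.com/dir1/secret.html"))     # False
print(rp.can_fetch("*", "http://example.com/other.html"))           # True
```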
10. Block search engines from a particular file format on the site (note: files, not pages):
User-agent: *
Disallow: /*.gif$ (or .jpg, and so on)
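The wildcard rules in examples 6 through 10 use * (any run of characters) and $ (end of URL). Python's built-in urllib.robotparser does not understand these wildcards, so here is a minimal hand-rolled matcher, purely to illustrate how such patterns apply (the sample paths are made up):

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt path pattern (with * and $) matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"  # '$' pins the pattern to the end of the path
    return re.match(regex, path) is not None

# Example 6: block every dynamic URL (anything containing '?').
print(rule_matches("/*?*", "/list.php?page=2"))  # True
print(rule_matches("/*?*", "/about.html"))       # False

# Example 10: block a file format such as .gif site-wide.
print(rule_matches("/*.gif$", "/images/logo.gif"))      # True
print(rule_matches("/*.gif$", "/images/logo.gif?v=3"))  # False
```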
These are the most commonly used forms; the exact rules depend on each site's needs. When writing robots.txt, also pay attention to the following points:
1. robots.txt is a plain-text file saved in .txt format.
2. robots.txt must be placed in the root directory of the site. The top-level robots.txt file must be reachable at a URL like this: http://www.wanseo.com/robots.txt
3. Follow the capitalization conventions shown above when writing robots.txt; path matching is case-sensitive.
4. If your site is fairly simple, the formats above are usually all you need. If it is more complex, with some areas crawlable and others not, certain files blocked, pages whose URLs contain "?" handled specially, and so on, then you will need to combine the formats above and work out the robots.txt that fits your site.
5. A robots.txt file placed in a subdirectory is ignored by crawlers; only the robots.txt in the site's top-level (root) directory takes effect.
6. You only need a robots.txt file when your site contains content that you do not want search engines to index. If you want search engines to index everything on the site, do not create a robots.txt file at all, and do not create one with empty content either. This is often overlooked, and an empty robots.txt file is actually unfriendly to search engines.
7. If you would rather not write robots.txt by hand, Google can help: its webmaster platform can generate a robots.txt file for you.
8. User-agent: *
Disallow: /
This format does more than block crawling. More importantly, if your site is already indexed and you then change robots.txt to the above, your site will be removed from the search engine entirely.
9. The robots META tag is optional for most sites, but it gives you a per-page alternative: for example, <meta name="robots" content="noindex,nofollow"> in a page's head section tells spiders not to index that page or follow its links.
10. To have a search engine remove specific pages it has already indexed, see http://www.google.com/support/webmasters/bin/answer.py?answer=35301
It looks like Google handles exactly that.
This article is from Anhui SEO (www.wanseo.com). Original: http://www.wanseo.com/node/47. Please retain the source and the original address when reproducing.