Some time ago, one of my websites stopped being indexed because a few characters in its robots.txt file were written incorrectly. It puzzled me at the time; I checked again and again and could not find the problem. Only after logging in to Google Webmaster Central and running its site diagnostics did I see that the robots.txt file was blocking every search engine spider from the site. Once the file was fixed, indexing returned to normal.
Robots.txt is the first file a search engine looks at when it visits a website: it tells spiders which files on the server may be crawled. When a search spider arrives at a site, it first checks whether a robots.txt file exists in the site's root directory. If the file exists, the spider limits its crawl to the scope the file allows; if the file does not exist, spiders may visit every page on the site that is not password-protected.
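This check can be reproduced with Python's standard-library robots.txt parser. The rules and URLs below are illustrative placeholders, not taken from any real site:

```python
# Minimal sketch of the access check a crawler performs before fetching a
# page, using the stdlib robots.txt parser. Rules and URLs are made up.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Pages outside the blocked directory are allowed...
print(parser.can_fetch("*", "http://example.com/index.html"))      # True
# ...while anything under /private/ is not.
print(parser.can_fetch("*", "http://example.com/private/a.html"))  # False
```

A real crawler would call `parser.set_url(...)` and `parser.read()` to fetch the live robots.txt instead of parsing an inline string.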
How much do you know about the robots.txt file? Compare the following examples:
1. Block all search engines from any part of the site, i.e., prevent any search engine from indexing it:
User-agent: *
Disallow: /
2. Allow all search engines to access any part of the site (the common default):
User-agent: *
Disallow:
Or:
User-agent: *
Allow: /
3. Block only one specific search engine from the site. Baidu's spider is named Baiduspider and Google's is Googlebot:
User-agent: Baiduspider
Disallow: /
Or:
User-agent: Googlebot
Disallow: /
4. Allow only one specific search engine to visit the site (again, Baiduspider for Baidu, Googlebot for Google). Note that a rule blocking all other spiders is also needed:
User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /
Or:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
5. To keep certain directories of the site out of search engines, write:
User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
Note: this cannot be written as Disallow: /directory1//directory2/ on a single line; each directory needs its own Disallow line.
6. Block search engines from all dynamic pages on the site (a dynamic page being any URL that contains a "?"):
User-agent: *
Disallow: /*?*
7. Allow search engines to access only pages with a specific file suffix:
User-agent: *
Allow: /*.html$ (similarly .htm, .php, etc.)
Disallow: /
8. Block search engines from pages with a specific file suffix:
User-agent: *
Disallow: /*.html$ (similarly .htm, .php, etc.)
9. Allow search engines to access pages only in specific subdirectories:
User-agent: *
Allow: /directory1/directory2/ (allow access to pages in directory2)
Allow: /directory3/directory4/ (allow access to pages in directory4)
Allow: /directory5/directory6/ (allow access to pages in directory6)
Disallow: /directory1/
Disallow: /directory3/
Disallow: /directory5/
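As a rough check of this Allow/Disallow combination, the stdlib parser can be fed an equivalent rule set. The directory names are placeholders; real crawlers such as Googlebot resolve Allow/Disallow conflicts by longest match, but listing the Allow rule before its broader Disallow works under either convention:

```python
# Sketch: an Allow for a subdirectory carved out of a Disallowed parent.
# Placeholder paths; not from any real site.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /directory1/directory2/
Disallow: /directory1/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The carved-out subdirectory is crawlable...
print(parser.can_fetch("*", "http://example.com/directory1/directory2/page.html"))  # True
# ...but the rest of the parent directory is not.
print(parser.can_fetch("*", "http://example.com/directory1/page.html"))             # False
```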
10. Block search engines from a particular file format on the site (note: files, not web pages):
User-agent: *
Disallow: /*.gif$ (similarly .jpg, etc.)
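The * and $ wildcards used in examples 6 through 10 are an extension popularized by Googlebot rather than part of the original robots.txt specification, so simple parsers (including Python's stdlib one) match them literally. As an illustration only, such a pattern can be approximated by translating it to a regular expression; the helper function name here is made up:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Illustrative translation of a Googlebot-style robots.txt pattern:
    # '*' matches any run of characters, a trailing '$' anchors the end
    # of the URL path, and everything else matches literally.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

# Example 10: block a file format anywhere on the site.
rule = robots_pattern_to_regex("/*.gif$")
print(bool(rule.match("/images/logo.gif")))  # True
print(bool(rule.match("/page.html")))        # False

# Example 6: block dynamic pages (any path containing "?").
dyn = robots_pattern_to_regex("/*?*")
print(bool(dyn.match("/product?id=42")))     # True
```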
These are some of the most commonly used patterns; the exact rules to write depend on each website's requirements.