A fairly complete introduction to robots.txt

Source: Internet
Author: User


Some time ago one of the author's websites stopped being indexed because a few characters in its robots.txt file were written incorrectly. It was baffling at the time: repeated checks turned up nothing. Only after logging in to Google Webmaster Center and running the site diagnostics did it become clear that the robots file was blocking every search engine spider. Once the file was fixed, indexing returned to normal.

How much do you know about the robots.txt file? Consider the following examples:

1. Block all search engines from every part of the site, that is, keep every search engine from indexing the site.

User-agent: *

Disallow: /
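As a quick sanity check, rules like these can be tested locally before they are deployed. The following is a minimal sketch using Python's standard urllib.robotparser module; example.com and the page path are placeholders, not part of the original article.

```python
from urllib.robotparser import RobotFileParser

# Rules from example 1: every crawler is blocked from the whole site.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com and the page path are placeholders for illustration only.
blocked_home = not parser.can_fetch("Googlebot", "http://example.com/")
blocked_page = not parser.can_fetch("Baiduspider", "http://example.com/news/1.html")
print(blocked_home, blocked_page)
```

A check like this would have caught the kind of mistake described at the start of the article, where a few wrong characters shut out every spider.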

2. Allow all search engines to access every part of the site. This is the common usage.

User-agent: *

Disallow:

Or

User-agent: *

Allow: /

3. Block only one particular search engine. Baidu's crawler is Baiduspider and Google's is Googlebot.

User-agent: Baiduspider

Disallow: /

Or

User-agent: Googlebot

Disallow: /

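4. Allow only one particular search engine. Again, Baidu's crawler is Baiduspider and Google's is Googlebot. The empty Disallow for that crawler must be paired with a group that blocks every other crawler; on its own it blocks nobody.

User-agent: Baiduspider

Disallow:

User-agent: *

Disallow: /

Or

User-agent: Googlebot

Disallow:

User-agent: *

Disallow: /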
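Allowing a single crawler only has an effect when every other crawler is blocked in the same file. The sketch below shows one way to check that pairing with Python's standard urllib.robotparser; the catch-all group and the example.com URL are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

# One group admits Baiduspider everywhere; the catch-all group
# blocks every crawler that has no group of its own.
rules = """\
User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com is a placeholder host.
url = "http://example.com/page.html"
baidu_ok = parser.can_fetch("Baiduspider", url)
google_ok = parser.can_fetch("Googlebot", url)
print(baidu_ok, google_ok)
```

Without the second group, a crawler that finds no matching group would be free to fetch everything.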

5. If there are directories on your site that you do not want search engines to index, write:

User-agent: *

Disallow: /directory-1/

Disallow: /directory-2/

Disallow: /directory-3/

Note: do not combine them on a single line such as "Disallow: /directory-1/ /directory-2/". Each directory needs its own Disallow line.
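The difference between one directory per line and several directories on one line can be seen with a small experiment. This is a sketch using Python's standard urllib.robotparser; the directory names, the "AnyBot" crawler name, and example.com are hypothetical. It shows how this particular parser reads the merged line and says nothing definite about how any real search engine handles it.

```python
from urllib.robotparser import RobotFileParser

def blocked(rules: str, path: str) -> bool:
    """Return True if the rules forbid a generic crawler from fetching path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # "AnyBot" and example.com are placeholders, not real identifiers.
    return not parser.can_fetch("AnyBot", "http://example.com" + path)

# Correct form: each directory on its own Disallow line.
one_per_line = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

# Incorrect form: two directories squeezed onto one line.
merged = """\
User-agent: *
Disallow: /private/ /tmp/
"""

print(blocked(one_per_line, "/private/a.html"))
print(blocked(one_per_line, "/tmp/b.html"))
print(blocked(one_per_line, "/public/c.html"))
print(blocked(merged, "/tmp/b.html"))
```

With the merged line, this parser treats the whole value as a single path, so /tmp/ is no longer protected.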

6. Block search engines from all dynamic pages of the site (a dynamic page is any page whose URL contains "?").

User-agent: *

Disallow: /*?*

7. Allow search engines to access only pages with a specific file suffix (for example .html, .htm, or .php). The example uses .html:

User-agent: *

Allow: /*.html$

Disallow: /

8. Block search engines from pages with a specific file suffix (for example .html, .htm, or .php). The example uses .php:

User-agent: *

Disallow: /*.php

9. Allow search engines to access pages in specific subdirectories while the rest of each parent directory stays blocked.

User-agent: *

Allow: /directory-1/directory-2/

Allow: /directory-3/directory-4/

Allow: /directory-5/directory-6/

Disallow: /directory-1/

Disallow: /directory-3/

Disallow: /directory-5/
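Listing the Allow lines before the Disallow lines matters for parsers that apply the first matching rule in file order, which is how Python's standard urllib.robotparser behaves. The sketch below checks that combination; the /docs/ paths and example.com are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The subdirectory is opened first, then its parent is closed.
rules = """\
User-agent: *
Allow: /docs/public/
Disallow: /docs/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com and the paths are placeholders for illustration.
public_ok = parser.can_fetch("Googlebot", "http://example.com/docs/public/index.html")
other_ok = parser.can_fetch("Googlebot", "http://example.com/docs/internal.html")
print(public_ok, other_ok)
```

Crawlers that pick the longest matching rule instead reach the same answer here, because the Allow path is longer than the Disallow path.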

10. Block search engines from files of a particular format on the site (note: files, not web pages), for example .gif or .jpg. The example uses .gif:

User-agent: *

Disallow: /*.gif$

These are the most commonly used forms. The exact rules depend on what each site needs. When writing robots.txt, also keep the following points in mind:
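Python's standard urllib.robotparser compares paths by plain prefix and does not interpret the wildcard characters, so the patterns with "*" and "$" in the examples above need separate handling when tested locally. The following is a minimal sketch of one way to match them, assuming "*" stands for any run of characters and a trailing "$" marks the end of the URL. The function name and sample paths are hypothetical.

```python
import re

def matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the given URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every '*' match any characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        # '$' means the pattern must cover the URL path to its very end.
        return re.fullmatch(regex, path) is not None
    # Without '$' the pattern only has to match from the start of the path.
    return re.match(regex, path) is not None

print(matches("/*?*", "/list.php?page=2"))
print(matches("/*?*", "/about.html"))
print(matches("/*.gif$", "/img/logo.gif"))
print(matches("/*.gif$", "/img/logo.gif?v=2"))
```

The last call shows why the "$" matters: a pattern ending in "$" no longer matches once a query string follows the suffix.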

1. The robots.txt file is a txt file saved in plain text format.

2. robots.txt must be placed in the root directory of the website. The top-level robots.txt file must be reachable like this: http://www.wanseo.com/robots.txt

3. When writing robots.txt, follow the capitalization shown in the examples above strictly.

4. If your site is relatively simple, the forms above are usually enough. If it is more complicated, with some areas to open and others to close, some files to block and others to allow, or specific pages with "?" in the URL to permit, you will need to combine the forms above and work out carefully which robots.txt suits your site.

5. A robots.txt file can also be placed in a subdirectory, but if it conflicts with the robots.txt in the top-level directory, the rules in the top-level file prevail.

6. You need a robots.txt file only when your website contains content that you do not want search engines to index. If you want search engines to index everything on the site, do not create a robots.txt file at all, and do not create one with empty content either. The second point is often overlooked: according to the author, an empty robots.txt file is very unfriendly to search engines.

7. If you would rather not write robots.txt by hand, let Google write it for you. Log in to Google's webmaster platform, which has a feature for generating robots.txt files.

8. User-agent: *

Disallow: /

This form does more than stop pages from being crawled. More importantly, if your site is already indexed and you then change robots.txt to the form above, the site will be deleted from the search engine's index, in its entirety.

9. For most sites the META tag is dispensable, but you should still understand it:

10. If you need to delete specific pages that a search engine has already indexed, see http://www.google.com/support/webmasters/bin/answer.py?answer=35301

It looks like Google is doing just that.
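Note 2 above says that robots.txt must sit in the root directory of the site. As a small illustration, the location a crawler would request can be derived from any page URL on the same site. This sketch uses Python's standard urllib.parse; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.wanseo.com/node/47"))
```

Whatever the page path is, the result points at the top-level file, which is why a robots.txt stored only in a subdirectory is not the one that governs the site.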

This article is from Anhui SEO (www.wanseo.com). Original: HTTP://WWW.WANSEO.COM/NODE/47. If you reproduce it, please retain the source and the original address.
