How to Write a Standard robots.txt File

Source: Internet
Author: User


A website may or may not have a robots.txt file. If it does have one, the file must follow the standard. Below, based on personal experience, is how to write a robots.txt file.

The robots.txt directives include:

Disallow - tells the spider not to crawl certain files or directories. The following code prevents spiders from crawling any file on the website:

User-agent: *
Disallow: /
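To check what a group of rules actually blocks, Python's standard-library urllib.robotparser module can parse robots.txt text directly. A minimal sketch, where the Googlebot user agent and the example.com URLs are placeholders chosen for illustration:

```python
from urllib.robotparser import RobotFileParser

# The rule group from the example above: every crawler, every path blocked.
# urllib.robotparser discards a User-agent line that is followed by a blank
# line, so "User-agent" and "Disallow" are kept on consecutive lines.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder user agent and URLs, for illustration only.
for url in ("http://www.example.com/", "http://www.example.com/news/index.htm"):
    print(url, parser.can_fetch("Googlebot", url))  # prints False for both
```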

Allow - tells the spider that certain files may be crawled. Used together with Disallow, it lets you block most of a directory while still allowing part of it to be crawled. The following code stops spiders from crawling the ab directory, except for the files under its cd subdirectory (paths are case-sensitive, so /ab/ and /AB/ are different directories):

User-agent: *
Disallow: /ab/
Allow: /ab/cd/
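When an Allow and a Disallow rule both match a URL, RFC 9309 says the most specific rule (the longest matching path) wins, and Allow wins a tie. Not every parser does this: Python's urllib.robotparser simply applies the first matching rule in file order, so it would block /ab/cd/ in the example above. A minimal sketch of the longest-match precedence, ignoring wildcards; the function and rule list are hypothetical names for illustration:

```python
# A minimal sketch of "most specific rule wins" precedence (RFC 9309).
# Wildcards are ignored here; each rule is a plain path prefix.
RULES = (
    ("disallow", "/ab/"),
    ("allow", "/ab/cd/"),
)


def is_allowed(path: str, rules=RULES) -> bool:
    """Return True if `path` may be crawled under `rules`."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, prefix in rules:
        # An empty value ("Disallow:") matches nothing.
        if not prefix or not path.startswith(prefix):
            continue
        # The longest matching prefix wins; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


for p in ("/ab/index.htm", "/ab/cd/page.htm", "/other.htm"):
    print(p, is_allowed(p))
# /ab/index.htm False
# /ab/cd/page.htm True
# /other.htm True
```

Because precedence depends on rule length rather than order, the Allow line could equally be written before the Disallow line.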

$ wildcard - matches the end of the URL. The following code allows spiders to access URLs ending in .htm:

User-agent: *
Allow: /*.htm$

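* wildcard - matches any sequence of characters. The following code prevents spiders from crawling htm files. Because a rule without $ only has to match the beginning of the path, it also blocks any URL that merely contains .htm, such as .html pages:

User-agent: *
Disallow: /*.htm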
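The two wildcards can be understood by translating a rule into a regular expression: * becomes "any characters", a trailing $ becomes an end anchor, and everything else is matched literally from the start of the path. A minimal sketch using Python's re module; rule_matches is a hypothetical helper name, and the example paths are made up:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches `path`.

    '*' matches any sequence of characters (including none), and a trailing
    '$' anchors the pattern to the end of the URL. Without '$', the pattern
    only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    # re.match anchors at the start of the string, giving prefix semantics.
    return re.match(regex, path) is not None


for pattern in ("/*.htm", "/*.htm$"):
    for path in ("/news/a.htm", "/news/a.html"):
        print(f"{pattern:8} {path:14} {rule_matches(pattern, path)}")
# /*.htm   /news/a.htm    True
# /*.htm   /news/a.html   True
# /*.htm$  /news/a.htm    True
# /*.htm$  /news/a.html   False
```

The output shows why Disallow: /*.htm also catches .html pages, while adding $ restricts the rule to URLs that really end in .htm.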

Sitemap location - tells the spider where your sitemap is. The format is the Sitemap directive followed by the sitemap's full URL:

Sitemap: <full URL of your sitemap>
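A Sitemap line is not tied to any User-agent group, so it can appear anywhere in the file. In Python 3.8 and later, urllib.robotparser collects these lines and returns them from site_maps() (or None if there are none). A minimal sketch; the sitemap URL is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; example.com stands in for your own domain.
rules = """\
User-agent: *
Disallow: /ab/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())
print(parser.site_maps())  # ['http://www.example.com/sitemap.xml']
```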

META tags supported by all three search engines (Google, Yahoo and MSN) include:

NOINDEX - tells the spider not to index the page.

NOFOLLOW - tells the spider not to follow the links on the page.

NOSNIPPET - tells the spider not to show a descriptive snippet in the search results.

NOARCHIVE - tells the spider not to show a cached snapshot of the page.

NOODP - tells the spider not to use the title and description from the Open Directory Project (DMOZ).
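These values go in the page's head, for example as <meta name="robots" content="noindex, nofollow">. A minimal sketch of how a crawler might read them, using Python's standard html.parser module; the RobotsMetaParser class and the sample page are hypothetical, and only the generic name="robots" tag is handled:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Only the generic name="robots" tag is handled in this sketch.
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content value is a comma-separated, case-insensitive list.
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


page = ('<html><head><meta name="robots" content="NOINDEX, NOFOLLOW">'
        "</head><body></body></html>")
meta_parser = RobotsMetaParser()
meta_parser.feed(page)
print(sorted(meta_parser.directives))  # ['nofollow', 'noindex']
```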

All of the directives and tags above are now supported by all three search engines, although the wildcard characters apparently were not supported by Yahoo and Microsoft in the past. Baidu now also supports Disallow, Allow and both wildcard characters. I could not find any official statement on whether Baidu supports these meta tags.

Meta tags supported only by Google:

unavailable_after - tells the spider when the page expires. After this date, the page should no longer appear in search results.

NOIMAGEINDEX - tells the spider not to index the images on the page.

NOTRANSLATE - tells the spider not to offer a translation of the page content.

Yahoo also supports:

Crawl-delay - sets how long the spider should wait between successive requests (this is a robots.txt directive rather than a meta tag).

NOYDIR - similar to NOODP, but refers to the Yahoo Directory instead of the Open Directory.

robots-nocontent - a class value that marks parts of the HTML as not belonging to the page's main content; by elimination, this tells the spider which part is the main content (the content you want indexed).

MSN also supports:

Crawl-delay
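Crawl-delay is set per User-agent group in robots.txt. Python's urllib.robotparser (3.6 and later) exposes it through crawl_delay(), returning None when no delay applies. A minimal sketch; Slurp (Yahoo's crawler name) and the 10-second value are example choices, and note that this particular parser only accepts whole numbers:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a delay for one crawler, plain rules for the rest.
rules = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow: /ab/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())
print(parser.crawl_delay("Slurp"))      # 10
print(parser.crawl_delay("Googlebot"))  # None (the * group sets no delay)
```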

One more reminder: if the request for robots.txt returns a 404 error, the spider treats this as permission to crawl all content. But if fetching robots.txt fails with some other error (for example a server error or a timeout), the search engine may stop including the website. The spider cannot tell whether robots.txt exists or what it contains, and that is not the same as confirming that the file does not exist.
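This distinction can be written as a small decision function. The sketch follows the general rules of RFC 9309: a 4xx response means the file is unavailable, so crawling is allowed; a server or network error means the file is unreachable, so the crawler assumes everything is disallowed. The enum and function names are hypothetical, redirects are assumed to have been followed already, and individual search engines may refine these rules:

```python
from enum import Enum


class RobotsPolicy(Enum):
    PARSE_RULES = "parse rules"    # robots.txt fetched successfully
    ALLOW_ALL = "allow all"        # robots.txt confirmed unavailable
    DISALLOW_ALL = "disallow all"  # robots.txt state unknown


def policy_for_status(status):
    """Map the outcome of a robots.txt fetch to a crawl policy.

    `status` is the final HTTP status code, or None when the request failed
    at the network level (DNS error, timeout, refused connection).
    """
    if status is None:
        return RobotsPolicy.DISALLOW_ALL
    if 200 <= status < 300:
        return RobotsPolicy.PARSE_RULES
    if 400 <= status < 500:
        # 404, 410 and other client errors: no robots.txt, crawl freely.
        return RobotsPolicy.ALLOW_ALL
    # 5xx server errors, and anything this sketch does not classify:
    # the file may exist but could not be read, so crawl nothing.
    return RobotsPolicy.DISALLOW_ALL


for outcome in (200, 404, 503, None):
    print(outcome, policy_for_status(outcome).value)
```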

