Analysis of robots.txt rule misunderstandings, and how to use the Baidu and Google robots tools


A while ago I wrote about how to write a robots.txt file, but from what I have observed since, some readers still misunderstand how robots.txt rules work.

For example, a lot of people write this:

User-agent: *
Allow: /
Disallow: /mulu/

I wonder if you have spotted the problem: this rule does not actually work. The first directive, Allow: /, permits spiders to crawl all content; the second, Disallow: /mulu/, forbids everything under /mulu/.

On the surface, the rule looks like it should let spiders crawl every page on the site except those under /mulu/.

But the search engine spider applies the rules from top to bottom, and because Allow: / already matches every URL, the Disallow directive below it never takes effect.

The correct rule should be:

User-agent: *
Disallow: /mulu/
Allow: /

That is, the Disallow directive is evaluated first and the Allow directive afterwards, so the block on /mulu/ no longer fails.
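Under the top-down matching described above, you can reproduce the difference locally. Here is a minimal sketch using Python's standard-library urllib.robotparser, which also applies the first matching rule; example.com and the /mulu/ paths are just placeholders:

from urllib.robotparser import RobotFileParser

# The misordered rules: Allow: / comes before Disallow: /mulu/.
wrong = RobotFileParser()
wrong.parse([
    "User-agent: *",
    "Allow: /",
    "Disallow: /mulu/",
])

# The corrected rules: Disallow: /mulu/ comes first.
right = RobotFileParser()
right.parse([
    "User-agent: *",
    "Disallow: /mulu/",
    "Allow: /",
])

url = "http://example.com/mulu/page.html"  # placeholder URL
print(wrong.can_fetch("*", url))  # True:  Allow: / matched first, so /mulu/ is NOT blocked
print(right.can_fetch("*", url))  # False: Disallow: /mulu/ is checked first, as intended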

There is another easy mistake to make with Baidu Spider: the paths in Disallow and Allow directives must begin with a slash (/). Some people write Disallow: *.html, which is wrong for Baidu Spider; it should be written Disallow: /*.html.
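To illustrate why the leading slash matters, here is a toy matcher, a sketch for illustration only and not Baidu's real implementation: it treats * as "any character sequence", matches rules against the start of the URL path, and, per the requirement above, rejects rules that do not begin with /:

import re

def baidu_style_match(rule, path):
    # Toy matcher (an illustration, not Baidu's actual code): rules
    # that do not begin with "/" are rejected, mirroring the rule above.
    if not rule.startswith("/"):
        return False
    # "*" matches any character sequence; a rule matches any path
    # that starts with it.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, path) is not None

print(baidu_style_match("*.html", "/mulu/page.html"))   # False: no leading slash, rule is invalid
print(baidu_style_match("/*.html", "/mulu/page.html"))  # True:  anchored at the path root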

The rules we write sometimes contain problems we do not notice, so it is worth testing them with Baidu Webmaster Tools (zhanzhang.baidu.com) and Google Webmaster Tools.

Comparatively speaking, Baidu's webmaster tool is the simpler of the two:


The Baidu robots tool can only check whether each directive is syntactically valid; it does not test the actual effect of the rules or the crawl logic.

Google's robots tool is considerably more useful:


In Google Webmaster Tools the feature is called Crawler access, and it reports how many URLs were blocked when Google crawled the site's pages.


You can also edit the robots rules online and test the effect of the changes. The edits there are for testing only; once everything looks right, you can generate a robots.txt file, or copy the directives into a robots.txt text file and upload it to the site's root directory.


Google's tester differs from Baidu's in one big way: it lets you enter one or more URLs and test whether the Google spider is allowed to crawl them.


The result tells you whether each of those URLs can be crawled by the Google spider, which makes it easy to verify how the rules in a robots file apply to a specific URL.
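You can approximate this batch URL check locally. Here is a minimal sketch, again using urllib.robotparser; note that the standard-library parser implements only plain prefix rules (no * wildcards in paths), so its verdicts can differ from Baidu's or Google's own matching. The domain and URLs are placeholders:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (placeholder domain).
rp = RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()

# Test a batch of URLs, the way Google's tool does.
for url in [
    "http://example.com/",
    "http://example.com/mulu/page.html",
]:
    verdict = "allowed" if rp.can_fetch("Baiduspider", url) else "blocked"
    print(verdict, url)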

The two tools are best used in combination; between them, you should know exactly what your robots.txt ought to be.

When reprinting, please credit the Carefree Blog; the original article is at http://liboseo.com/1170.html

Unless otherwise noted, articles on the Carefree Blog are original; when reproducing them, please cite the source and link!
