Common misunderstandings of robots.txt rules and how to use the Google and Baidu webmaster tools


Everyone knows what the robots.txt file does for a website, but I have noticed that some webmasters still misunderstand how its rules work.

For example, a lot of people write this:

User-agent: *

Allow: /

Disallow: /mulu/

I wonder whether you have noticed it: this rule does not actually work as intended. The first directive, Allow: /, means spiders may crawl all content; the second, Disallow: /mulu/, is meant to forbid everything under /mulu/.

On the surface, the intention is to let spiders crawl every page of the site except those under /mulu/. But the search engine spider applies the rules from top to bottom, so the first directive already covers everything and the second one never takes effect.

The correct rule should be:

User-agent: *

Disallow: /mulu/

Allow: /

That is, the Disallow directive is executed first and the Allow directive afterwards, so the Disallow rule no longer fails. Another mistake that is easy to make with Baidu Spider is forgetting that the paths after the Disallow and Allow directives must begin with a slash (/). Some people therefore write Disallow: *.html, which is wrong for Baidu Spider; it should be written as Disallow: /*.html.
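As an illustration (this sketch is not from the original article), the two orderings can be compared with Python's standard urllib.robotparser, whose rule matching is first-match from top to bottom, much like the spider behaviour described above; example.com and the test paths are placeholders. Note that this simple parser does not understand wildcard patterns such as /*.html, so the sketch only covers the ordering issue.

from urllib import robotparser

WRONG = """\
User-agent: *
Allow: /
Disallow: /mulu/
"""

CORRECT = """\
User-agent: *
Disallow: /mulu/
Allow: /
"""

def is_blocked(rules, path):
    rp = robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    # can_fetch() returns True when the URL may be crawled.
    return not rp.can_fetch("*", "http://example.com" + path)

print(is_blocked(WRONG, "/mulu/page.html"))     # False: Allow: / matches first, so /mulu/ is not blocked
print(is_blocked(CORRECT, "/mulu/page.html"))   # True: Disallow: /mulu/ now takes effect
print(is_blocked(CORRECT, "/other/page.html"))  # False: the rest of the site stays crawlable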

Sometimes the rules we write contain problems we have not noticed; they can now be tested with Baidu Webmaster Tools (zhanzhang.baidu.com) and Google Webmaster Tools. Comparatively speaking, the Baidu webmaster tool is the simpler of the two.


The Baidu robots tool can only check whether each directive is syntactically valid; it does not test the actual effect of the rules or the crawling logic.

Google's robots tool is considerably more useful.


In Google Webmaster Tools the feature is called crawler access, and it reports how many URLs were blocked when Google crawled the site's pages.


You can also modify the robots rules and test the effect online. Of course, the changes made there are for testing only; once everything looks right, you can generate a robots.txt file, or copy the directives into a robots.txt text file yourself, and upload it to the site's root directory.
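Once the file has been uploaded, a quick way to confirm that the live copy behaves as intended is to fetch it from the site root and query it with the same urllib.robotparser module. This is only a sketch: example.com stands in for your own domain, and the two test paths are assumptions.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")  # robots.txt must sit in the root directory
rp.read()                                    # download and parse the live file

# Spot-check one path that should be blocked and one that should stay open.
print(rp.can_fetch("*", "http://example.com/mulu/page.html"))  # expected: False
print(rp.can_fetch("*", "http://example.com/index.html"))      # expected: True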


Google's test differs greatly from Baidu's: it lets you enter one or more URLs and check whether the Google spider would crawl them.


The result shows whether each of these URLs would be crawled by the Google spider, so the test is well suited to checking how the robots file's rules apply to a particular URL. The two tools are best used in combination; that way you will know exactly what your robots.txt should contain.
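For a similar batch check outside the webmaster tools, the same idea can be sketched in Python: feed your rules to urllib.robotparser and report, for a list of URLs, whether a spider that obeys robots.txt would crawl or skip each one. The rules and sample URLs below are placeholders, not taken from the article.

from urllib import robotparser

RULES = """\
User-agent: *
Disallow: /mulu/
Allow: /
"""

urls = [
    "http://example.com/",
    "http://example.com/mulu/post.html",
    "http://example.com/about.html",
]

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

for url in urls:
    verdict = "crawlable" if rp.can_fetch("*", url) else "blocked"
    print(verdict, url)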

Reprinted from the Carefree blog. Original article: http://liboseo.com/1170.html. When reproducing, please credit the source and include a link!


