The robots.txt file format for websites

Source: Internet
Author: User

Many people who have just started building websites do not know what robots.txt is, or know of it but not its file format. This article, from the e-mentor network, explains that format.

The "robots.txt" file contains one or more records separated by blank lines (with CR, CR/NL, or NL as the line terminator). Each line of a record has the following format:

"<field>:<optional space><value><optional space>"

You can use # for comments in this file, following the same convention as in UNIX. A record typically starts with one or more User-agent lines, followed by several Disallow and Allow lines. The details are as follows:
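The record layout described above can be sketched in Python. This is a minimal illustration, not a complete parser: the function name parse_robots_records and the sample text are assumptions made for this example, and lines without a colon are simply skipped.

```python
def parse_robots_records(text):
    """Split robots.txt text into records, each a list of (field, value) pairs."""
    records, current = [], []
    # splitlines() accepts CR, CR/NL and NL as line terminators.
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = []
            continue
        # Everything after "#" is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are lowercased; optional spaces around the value are dropped.
        current.append((field.strip().lower(), value.strip()))
    if current:
        records.append(current)
    return records


# Hypothetical sample file with two records.
SAMPLE = (
    "# demo file\n"
    "User-agent: SomeBot\n"
    "Disallow: /private/  # keep out\n"
    "\n"
    "User-agent: *\n"
    "Disallow:\n"
)

for record in parse_robots_records(SAMPLE):
    print(record)
```

Each record comes back as an ordered list, so the User-agent lines and the Disallow and Allow lines that follow them stay together.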

User-agent:

The value of this field is the name of a search engine robot. If the "robots.txt" file contains more than one User-agent record, then more than one robot is restricted by the file; the file must contain at least one User-agent record. If the value is set to *, the record applies to every robot, and the file may contain only one "User-agent: *" record. If the file contains "User-agent: SomeBot" followed by several Disallow and Allow lines, then the robot named "SomeBot" is restricted only by the Disallow and Allow lines that follow "User-agent: SomeBot".
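As a rough sketch of this selection rule, the Python function below picks the rules for a named robot and falls back to the "*" record. The function name select_rules, the record layout, and the exact, case-insensitive name comparison are assumptions for illustration; real crawlers may match robot names more loosely.

```python
def select_rules(records, robot_name):
    """Return the rules of the record naming robot_name, else those of "*".

    records is a list of (agent_names, rules) pairs; an empty list is
    returned when no record applies, meaning nothing is restricted.
    """
    fallback = []
    wanted = robot_name.lower()
    for agent_names, rules in records:
        lowered = [name.lower() for name in agent_names]
        if wanted in lowered:
            # A record that names the robot takes priority over "*".
            return rules
        if "*" in lowered:
            fallback = rules
    return fallback


# Hypothetical records: one for SomeBot, one for every other robot.
RECORDS = [
    (["SomeBot"], [("disallow", "/private/")]),
    (["*"], [("disallow", "/tmp/")]),
]

print(select_rules(RECORDS, "somebot"))
print(select_rules(RECORDS, "OtherBot"))
```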

Disallow:

The value of this field describes a set of URLs that you do not want robots to access. It can be a complete path or a non-empty path prefix; a robot will not access any URL that begins with the value. For example, "Disallow: /help" prevents robots from accessing /help.html, /helpabc.html, and /help/index.html, whereas "Disallow: /help/" lets robots access /help.html and /helpabc.html but not /help/index.html. A "Disallow:" line with an empty value allows robots to access every URL on the site. The "/robots.txt" file must contain at least one Disallow record. If "/robots.txt" does not exist or is an empty file, the site is open to all search engine robots.
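The prefix rule can be checked with a few lines of Python. This is a minimal sketch that covers plain prefixes only; the function name is_disallowed is an assumption for this example.

```python
def is_disallowed(path, disallow_values):
    """Return True if path begins with any non-empty Disallow value."""
    # An empty value ("Disallow:") blocks nothing, so it is skipped.
    return any(value != "" and path.startswith(value) for value in disallow_values)


PATHS = ["/help.html", "/helpabc.html", "/help/index.html"]

for value in ["/help", "/help/", ""]:
    blocked = [path for path in PATHS if is_disallowed(path, [value])]
    print(repr(value), "blocks", blocked)
```

With "/help" all three paths are blocked, with "/help/" only /help/index.html is blocked, and with an empty value nothing is blocked.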

Allow:

The value of this field describes a set of URLs that robots are allowed to access. As with Disallow, the value can be a complete path or a path prefix; a robot may access any URL that begins with the value. For example, "Allow: /hibaidu" allows robots to access /hibaidu.htm, /hibaiducom.html, and /hibaidu/com.html. All URLs on a site are allowed by default, so Allow is usually combined with Disallow to permit access to a subset of pages while blocking all other URLs.
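One way to try out an Allow and Disallow combination is Python's standard urllib.robotparser module, sketched below. The text above does not say which line wins when both match a URL. Python's parser applies the first matching line in file order, so the Allow line is listed first here; other crawlers may resolve such conflicts differently. The rule set and the robot name "AnyBot" are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(lines):
    """Load robots.txt lines into the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# Hypothetical rules: open one prefix, block everything else.
RULES = [
    "User-agent: *",
    "Allow: /hibaidu",
    "Disallow: /",
]

parser = build_parser(RULES)
for path in ["/hibaidu.htm", "/hibaiducom.html", "/hibaidu/com.html", "/other.html"]:
    print(path, parser.can_fetch("AnyBot", path))
```

The three /hibaidu paths are reported as fetchable and /other.html is not.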

Use "*" and "$":

Baiduspider supports the wildcard characters "*" and "$" for fuzzy matching of URLs.

"$" matches the end of the line, that is, the end of the URL.

"*" matches zero or more arbitrary characters.

Note: Baiduspider strictly follows the robots protocol. Pay attention to letter case in the directories that you do not want crawled or indexed: the paths written in robots.txt are matched exactly against those directories, and if they do not match exactly, the robots rules will not take effect.
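The wildcard rules and the case-sensitivity note can be illustrated by translating a pattern into a regular expression. This is a minimal Python sketch of the idea, not Baiduspider's actual implementation; the function names pattern_to_regex and matches are assumptions for this example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with "*" and "$" into a regex."""
    # A trailing "$" anchors the pattern at the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" stands for zero or more arbitrary characters; all other
    # characters are matched literally and case-sensitively.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    """Return True if path matches pattern, starting at the first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.jpg$", "/images/a.jpg"))      # pattern ends where the URL ends
print(matches("/*.jpg$", "/images/a.jpg?x=1"))  # URL continues past ".jpg"
print(matches("/Private/*", "/private/x"))      # letter case differs
```

Without a trailing "$" the pattern behaves like a prefix, which is consistent with the Disallow and Allow descriptions earlier in the article.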

The format is not very complex. If you have some coding background you should be able to understand it; if not, you may need to read it through a few times.
