In fact, many people who have just started building websites do not know what robots.txt is, let alone the format of a robots.txt file. Today I would like to share this with you. This article comes from the e-mentor network.
The "robots.txt" file contains one or more records separated by blank lines (with CR, CR/NL, or NL as the line terminator). Each record has the following format:
"<field>:<optional space><value><optional space>"
You can use # for comments in this file, following the same convention as in UNIX. The records in this file typically begin with one or more User-agent lines, followed by several Disallow and Allow lines. The details are as follows:
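As an illustration of this record structure, here is a minimal robots.txt (the directory names are hypothetical examples, not taken from any real site):

```text
# This is a comment, introduced by "#" as in UNIX.
User-agent: *
Disallow: /cgi-bin/
Allow: /cgi-bin/public/
```

Each record, like the one above, starts with a User-agent line and is followed by its Disallow and Allow lines.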
User-agent:
The value of this field describes the name of a search-engine robot. If the "robots.txt" file contains more than one User-agent record, then more than one robot is constrained by the file; there must be at least one User-agent record. If the value of this field is set to "*", the record applies to every robot, and there can be only one "User-agent: *" record in the "robots.txt" file. If you add "User-agent: SomeBot" followed by several Disallow and Allow lines, then only the robot named "SomeBot" is subject to the Disallow and Allow lines that follow "User-agent: SomeBot".
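For example, a file that singles out one robot while leaving a default for everyone else might look like this (the robot name "SomeBot" and the paths are placeholders used in the text above, not real crawler names):

```text
# Only the robot named "SomeBot" obeys this record.
User-agent: SomeBot
Disallow: /private/

# Every other robot falls through to this default record,
# which blocks nothing.
User-agent: *
Disallow:
```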
Disallow:
The value of this field describes a set of URLs that you do not want accessed. The value can be a complete path or a path prefix; any URL that begins with the Disallow value will not be accessed by the robot. For example, "Disallow: /help" prevents robots from accessing /help.html, /helpabc.html, and /help/index.html, while "Disallow: /help/" allows robots to access /help.html and /helpabc.html but not /help/index.html. An empty "Disallow:" line allows robots to access every URL on the site. The "/robots.txt" file must contain at least one Disallow record. If "/robots.txt" does not exist or is an empty file, the site is open to all search-engine robots.
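The prefix rule described above can be sketched in a few lines of Python. This is a simplified illustration of the matching logic only, not a full robots.txt parser:

```python
def is_blocked(url_path: str, disallow_value: str) -> bool:
    """A URL path is blocked if it begins with the Disallow value.

    An empty Disallow value blocks nothing, matching the rule that
    "Disallow:" with no value allows access to every URL.
    """
    return bool(disallow_value) and url_path.startswith(disallow_value)


# "Disallow: /help" blocks /help.html and /help/index.html ...
print(is_blocked("/help.html", "/help"))         # True
print(is_blocked("/help/index.html", "/help"))   # True
# ... while "Disallow: /help/" does not block /help.html.
print(is_blocked("/help.html", "/help/"))        # False
print(is_blocked("/help/index.html", "/help/"))  # True
```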
Allow:
The value of this field describes a set of URLs that you want to allow access to. As with Disallow, the value can be a full path or a path prefix; any URL that begins with the Allow value may be accessed by the robot. For example, "Allow: /hibaidu" lets robots access /hibaidu.htm, /hibaiducom.html, and /hibaidu/com.html. All URLs on a website are allowed by default, so Allow is usually used together with Disallow to permit access to a subset of pages while blocking all other URLs.
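The typical combined pattern looks like this (the directory names are hypothetical). Note that different crawlers resolve Allow/Disallow conflicts differently, so it is safest to make the Allow rule more specific than the Disallow rule, as here:

```text
# Block everything under /help/ except the /help/public/ subtree.
User-agent: *
Allow: /help/public/
Disallow: /help/
```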
Use "*" and "$":
Baiduspider supports using the wildcard characters "*" and "$" to match URLs:
"$" matches the end of the URL.
"*" matches 0 or more arbitrary characters.
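The two wildcards can be sketched by translating a rule into a regular expression. This is an illustrative approximation of the matching behavior described above, not Baiduspider's actual implementation:

```python
import re


def rule_matches(rule: str, url_path: str) -> bool:
    """Check whether a robots.txt rule containing "*" and "$"
    matches a URL path.

    "*" is translated to "zero or more arbitrary characters";
    a trailing "$" anchors the rule to the end of the URL.
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, url_path) is not None


# A rule like "/*.jpg$" blocks any URL ending in ".jpg" ...
print(rule_matches("/*.jpg$", "/images/photo.jpg"))       # True
# ... but not one with a query string after it.
print(rule_matches("/*.jpg$", "/images/photo.jpg?x=1"))   # False
# Without "$", the rule is an ordinary prefix-with-wildcard match.
print(rule_matches("/help*", "/helpabc.html"))            # True
```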
Note: We will strictly abide by the robots protocol. Please make sure that the directories you do not want crawled or indexed are written in robots.txt so that they exactly match those directories; otherwise the robots directives will not take effect.
None of this is very complex. If you have some coding background, you should be able to follow it; if not, just read it through a few times.