A supplement on blocking search engines from indexing your site
1. What is the robots.txt file?
A search engine robot (also known as a spider) is a program that automatically visits webpages on the Internet and collects webpage information.
You can create a plain-text file named robots.txt on your website, which declares the parts of the site you do not want robots to visit. In this way, some or all of the site's content can be kept out of search engine indexes, or you can allow a specified search engine to index only specified content.
2. Where should the robots.txt file be placed?
The robots.txt file must be placed in the root directory of the website. For example, when a robot visits a website (such as http://www.abc.com), it first checks whether http://www.abc.com/robots.txt exists. If the robot finds this file, it determines the scope of its access based on the file's contents.
Website URL                  Corresponding robots.txt URL
http://www.w3.org/           http://www.w3.org/robots.txt
http://www.w3.org:80/        http://www.w3.org:80/robots.txt
http://www.w3.org:1234/      http://www.w3.org:1234/robots.txt
http://w3.org/               http://w3.org/robots.txt
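The URL mapping in the table above can be sketched with Python's standard library. Note that `robots_url` is a hypothetical helper name introduced here for illustration, not part of any standard:

```python
from urllib.parse import urlparse, urlunparse

def robots_url(site_url):
    """Derive the robots.txt URL for a site: keep the scheme,
    host, and port, and fix the path to /robots.txt."""
    parts = urlparse(site_url)
    return urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))

print(robots_url("http://www.w3.org/"))       # http://www.w3.org/robots.txt
print(robots_url("http://www.w3.org:1234/"))  # http://www.w3.org:1234/robots.txt
```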
3. robots.txt File Format
The robots.txt file contains one or more records, separated by blank lines (terminated by CR, CR/LF, or LF). Each record has the following format:
"<field>:<optional space><value><optional space>"
Comments can be added with #, following the same convention as in UNIX. A record typically begins with one or more User-agent lines, followed by one or more Disallow lines, as detailed below:
User-agent:
The value of this field names the search engine robot that the record applies to. If the robots.txt file contains multiple User-agent records, then multiple robots are restricted by this protocol; the file must contain at least one User-agent record. If the value is set to *, the record applies to all robots, and only one "User-agent: *" record may appear in the file.
Disallow:
The value of this field describes a URL that robots should not visit. It may be a complete path or a path prefix: any URL whose path begins with the Disallow value will not be visited by the robot. For example, "Disallow: /help" blocks search engine access to both /help.html and /help/index.html, whereas "Disallow: /help/" allows the robot to access /help.html but not /help/index.html.
A Disallow record with an empty value means that every part of the website may be visited. At least one Disallow record is required in the "/robots.txt" file. If "/robots.txt" is an empty file, the website is open to all search engine robots.
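The prefix-matching rules above can be checked with Python's standard-library `urllib.robotparser`; this is only a sketch, and the robot name "mybot" is an illustrative assumption:

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /help" blocks every path that begins with /help
rp1 = RobotFileParser()
rp1.parse(["User-agent: *", "Disallow: /help"])
print(rp1.can_fetch("mybot", "http://www.abc.com/help.html"))        # False
print(rp1.can_fetch("mybot", "http://www.abc.com/help/index.html"))  # False

# "Disallow: /help/" only blocks paths under the /help/ directory
rp2 = RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /help/"])
print(rp2.can_fetch("mybot", "http://www.abc.com/help.html"))        # True
print(rp2.can_fetch("mybot", "http://www.abc.com/help/index.html"))  # False
```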
4. Examples of robots.txt Usage
Example 1. Prohibit all search engines from accessing any part of the website
User-agent: *
Disallow: /
Example 2. Allow all robots to access the entire website
(Alternatively, create an empty "/robots.txt" file.)
User-agent: *
Disallow:
Example 3. Block a specific search engine
User-agent: badbot
Disallow: /
Example 4. Allow only a specific search engine
User-agent: baiduspider
Disallow:

User-agent: *
Disallow: /
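As a sketch of how Example 4 behaves, `urllib.robotparser` can verify that only baiduspider is admitted (the second robot name, "otherbot", is a hypothetical stand-in for any other crawler):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: baiduspider",
    "Disallow:",          # empty value: baiduspider may visit everything
    "",
    "User-agent: *",
    "Disallow: /",        # every other robot is shut out
])
print(rp.can_fetch("baiduspider", "http://www.abc.com/page.html"))  # True
print(rp.can_fetch("otherbot", "http://www.abc.com/page.html"))     # False
```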
Example 5. A simple example
In this example, the website restricts search engine access to three directories; that is, search engines must not visit these three directories.
Note that each directory must be declared on its own Disallow line, not combined as "Disallow: /cgi-bin/ /tmp/".
Also note that "*" after "User-agent:" has a special meaning, standing for "any robot", so the file must not contain records such as "Disallow: /tmp/*" or "Disallow: *.gif".
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
What is the name of Baidu's spider in robots.txt?
It is "baiduspider", written in all lowercase letters.