A supplement on blocking search engines from indexing your site
1. What is the robots.txt file?
A search engine robot (also known as a spider) is a program that automatically visits webpages on the Internet and collects webpage information.
You can create a plain-text file named robots.txt on your website, which declares the parts of the site you do not want robots to visit. In this way, some or all of the site's content can be kept out of search engine indexes, or you can allow a specified search engine to index only specified content.
2. Where should the robots.txt file be placed?
The robots.txt file must be placed in the root directory of the website. For example, when a robot visits a website (such as http://www.abc.com), it first checks whether http://www.abc.com/robots.txt exists. If the robot finds this file, it determines the scope of its access based on the file's contents.
Website URL                  Corresponding robots.txt URL
http://www.w3.org/           http://www.w3.org/robots.txt
http://www.w3.org:80/        http://www.w3.org:80/robots.txt
http://www.w3.org:1234/      http://www.w3.org:1234/robots.txt
http://w3.org/               http://w3.org/robots.txt
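The URL mapping in the table above can be sketched with Python's standard library. Note that `robots_url` is a hypothetical helper name introduced here for illustration, not part of any standard:

```python
from urllib.parse import urlparse, urlunparse

def robots_url(site_url):
    """Derive the robots.txt URL for a site: keep the scheme,
    host, and port, and fix the path to /robots.txt."""
    parts = urlparse(site_url)
    return urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))

print(robots_url("http://www.w3.org/"))       # http://www.w3.org/robots.txt
print(robots_url("http://www.w3.org:1234/"))  # http://www.w3.org:1234/robots.txt
```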
3. robots.txt File Format
The robots.txt file contains one or more records, separated by blank lines (terminated by CR, CR/LF, or LF). Each record has the following format:
"<field>:<optional space><value><optional space>"
Comments can be added with #, following the same convention as in UNIX. A record typically begins with one or more User-agent lines, followed by one or more Disallow lines, as detailed below:
User-agent:
The value of this field names the search engine robot that the record applies to. If the robots.txt file contains multiple User-agent records, then multiple robots are restricted by this protocol; the file must contain at least one User-agent record. If the value is set to *, the record applies to all robots, and only one "User-agent: *" record may appear in the file.
Disallow:
The value of this field describes a URL that robots should not visit. It may be a complete path or a path prefix: any URL whose path begins with the Disallow value will not be visited by the robot. For example, "Disallow: /help" blocks search engine access to both /help.html and /help/index.html, whereas "Disallow: /help/" allows the robot to access /help.html but not /help/index.html.
A Disallow record with an empty value means that every part of the website may be visited. At least one Disallow record is required in the "/robots.txt" file. If "/robots.txt" is an empty file, the website is open to all search engine robots.
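The prefix-matching rules above can be checked with Python's standard-library `urllib.robotparser`; this is only a sketch, and the robot name "mybot" is an illustrative assumption:

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /help" blocks every path that begins with /help
rp1 = RobotFileParser()
rp1.parse(["User-agent: *", "Disallow: /help"])
print(rp1.can_fetch("mybot", "http://www.abc.com/help.html"))        # False
print(rp1.can_fetch("mybot", "http://www.abc.com/help/index.html"))  # False

# "Disallow: /help/" only blocks paths under the /help/ directory
rp2 = RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /help/"])
print(rp2.can_fetch("mybot", "http://www.abc.com/help.html"))        # True
print(rp2.can_fetch("mybot", "http://www.abc.com/help/index.html"))  # False
```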
4. Examples of robots.txt Usage
Example 1. Prohibit all search engines from accessing any part of the website
User-agent: *
Disallow: /
Example 2. Allow all robots to access the entire website
(Alternatively, create an empty "/robots.txt" file.)
User-agent: *
Disallow:
Example 3. Block a specific search engine
User-agent: badbot
Disallow: /
Example 4. Allow only a specific search engine
User-agent: baiduspider
Disallow:

User-agent: *
Disallow: /
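As a sketch of how Example 4 behaves, `urllib.robotparser` can verify that only baiduspider is admitted (the second robot name, "otherbot", is a hypothetical stand-in for any other crawler):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: baiduspider",
    "Disallow:",          # empty value: baiduspider may visit everything
    "",
    "User-agent: *",
    "Disallow: /",        # every other robot is shut out
])
print(rp.can_fetch("baiduspider", "http://www.abc.com/page.html"))  # True
print(rp.can_fetch("otherbot", "http://www.abc.com/page.html"))     # False
```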
Example 5. A simple example
In this example, the website restricts search engine access to three directories; that is, search engines must not visit these three directories.
Note that each directory must be declared on its own Disallow line, not combined as "Disallow: /cgi-bin/ /tmp/".
Also note that "*" after "User-agent:" has a special meaning, standing for "any robot", so the file must not contain records such as "Disallow: /tmp/*" or "Disallow: *.gif".
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
What is the name of Baidu's spider in robots.txt?
It is "baiduspider", written in all lowercase letters.