Where can I write robots.txt?

Last Update:2018-12-05 Source: Internet

Author: User


Introduction to robots.txt

Example: http://www.baidu.com/robots.txt
robots.txt is a plain text file in which a website administrator can declare which parts of the site should not be accessed by robots, or specify that a search engine should include only certain content.

When a search robot (also called a search spider) visits a site, it first checks whether robots.txt exists in the site's root directory. If it does, the robot determines its access range from the contents of the file. If the file does not exist, the robot simply crawls along the links it finds.

In addition, robots.txt must be placed in the root directory of the site, and the file name must be all lowercase.
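Because the file always lives in the root directory, its address can be derived from any page URL on the same site. The following is a minimal sketch of that idea; the function name robots_txt_url is hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.baidu.com/some/dir/page.html?x=1"))
```

A robot would request this address before fetching anything else from the host.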

Robots.txt writing syntax

First, let's look at a robots.txt example: http://www.seovip.cn/robots.txt

The specific content of robots.txt is as follows:

# Robots.txt file from http://www.seovip.cn
# All robots will spider the domain

User-agent: *
Disallow:

The above text indicates that all search robots are allowed to access all files under www.seovip.cn.

Syntax breakdown: text after # is a comment. User-agent is followed by the name of the search robot, and * stands for all search robots. Disallow is followed by the file or directory that must not be accessed; an empty Disallow value means nothing is prohibited.
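To see how a robot might interpret the file above, here is a small sketch using Python's standard urllib.robotparser module. The robot name "AnyBot" and the helper is_allowed are assumptions made for this illustration; real crawlers may apply additional rules.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, in conventional "Field: value" form.
SEOVIP_ROBOTS = """\
# Robots.txt file from http://www.seovip.cn
# All robots will spider the domain

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # An empty Disallow value prohibits nothing, so this should be permitted.
    print(is_allowed(SEOVIP_ROBOTS, "AnyBot", "http://www.seovip.cn/any/page.html"))
```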

The following examples show common uses of robots.txt:

Allow access by all robots

User-agent: *
Disallow:

Alternatively, you can create an empty "/robots.txt" file.

Prohibit all search engines from accessing any part of the website

User-agent: *
Disallow: /

Prohibit all search engines from accessing certain parts of the website (the 01, 02, and 03 directories in the following example)

User-agent: *
Disallow: /01/
Disallow: /02/
Disallow: /03/
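The directory rules above work by path prefix. As a rough sketch, assuming Python's standard urllib.robotparser module and a made-up robot name "ExampleBot", the effect can be checked like this:

```python
from urllib.robotparser import RobotFileParser

DIRECTORY_RULES = """\
User-agent: *
Disallow: /01/
Disallow: /02/
Disallow: /03/
"""


def is_blocked(path, user_agent="ExampleBot"):
    """Return True if the rules above forbid user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(DIRECTORY_RULES.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/01/page.html", "/03/", "/04/page.html", "/"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Paths that do not start with one of the listed prefixes, such as /04/page.html or the site root, remain accessible.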

Prohibit access by a specific search engine (badbot in the following example)

User-agent: badbot
Disallow: /

Allow access by only one search engine (Crawler in the following example)

User-agent: Crawler
Disallow:

User-agent: *
Disallow: /
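When a file contains several records, a robot uses the record that names it and falls back to the * record otherwise. The sketch below illustrates this with Python's standard urllib.robotparser module; "OtherBot" is a hypothetical robot name used only to show the fallback.

```python
from urllib.robotparser import RobotFileParser

ONLY_CRAWLER_RULES = """\
User-agent: Crawler
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(user_agent, path):
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ONLY_CRAWLER_RULES.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "Crawler" matches its own record; every other robot gets the * record.
    for agent in ("Crawler", "OtherBot"):
        print(agent, can_crawl(agent, "/index.html"))
```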

In addition, it is worth extending the discussion to introduce the robots meta tag:

The robots meta tag mainly targets specific pages. Like other meta tags (such as the language used, page description, and keywords), robots meta tags are also placed in the <head> section of the page.

Syntax of the robots meta tag:

The robots meta tag is case-insensitive. name="robots" applies to all search engines; to target a specific search engine, use its name instead, for example name="baiduspider". The content attribute has four directives: index, noindex, follow, and nofollow. Directives are separated by commas.

The index directive tells the search robot to index the page;

The follow directive tells the search robot that it can continue crawling along the links on the page;

noindex and nofollow mean the opposite of index and follow, respectively.

The default values of the robots meta tag are index and follow, except for Inktomi, whose default values are index and nofollow.

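In this way, there are four combinations:

<meta name="robots" content="index, follow">
<meta name="robots" content="noindex, follow">
<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Here, <meta name="robots" content="index, follow"> can also be written as <meta name="robots" content="all">, and <meta name="robots" content="noindex, nofollow"> can also be written as <meta name="robots" content="none">.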
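As an illustration of how a robot might read these directives, here is a minimal sketch built on Python's standard html.parser module. The class and function names are hypothetical, and the defaults (index and follow) follow the description above; actual search engines may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect content directives from <meta> tags with a given name."""

    def __init__(self, robot_name):
        super().__init__()
        self.robot_name = robot_name.lower()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").strip().lower() != self.robot_name:
            return
        for token in attr_map.get("content", "").split(","):
            if token.strip():
                self.tokens.add(token.strip().lower())


def robots_meta_policy(html, robot_name="robots"):
    """Return (may_index, may_follow) for the given HTML text."""
    collector = RobotsMetaCollector(robot_name)
    collector.feed(html)
    tokens = set(collector.tokens)
    if "none" in tokens:  # shorthand for noindex, nofollow
        tokens |= {"noindex", "nofollow"}
    # index and follow are the defaults, so only the negative
    # directives change the outcome ("all" needs no special handling).
    return ("noindex" not in tokens, "nofollow" not in tokens)


if __name__ == "__main__":
    page = '<html><head><meta name="Robots" content="NOINDEX, follow"></head></html>'
    print(robots_meta_policy(page))
```

Matching is done in lowercase because the tag is case-insensitive.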

Currently, the vast majority of search engine robots comply with the robots.txt rules. Support for the robots meta tag is less widespread, but it is gradually increasing. Google, for example, fully supports it and also adds the "archive" directive to control whether Google retains a snapshot of the page. For example:

<meta name="googlebot" content="index, follow, noarchive">

This indicates that the page is indexed and the links on it are followed, but no snapshot of the page is retained by Google.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
