Search Engine Guide: robots.txt Files

Source: Internet
Author: User
Tags: access, search engine

There is, in fact, a force that has quietly worked its way into a huge number of sites and pages. We usually don't see them, and most people don't even know they exist. Don't misunderstand; I am talking about search engine crawlers and robots. Every day, hundreds of these crawlers scurry across sites, whether it is Google intent on indexing the entire web or a spam robot harvesting email addresses in bulk. As a website owner, you can control what robots are allowed to do through a file called robots.txt.

Creating a robots.txt File

Okay, now let's get to work. Create a text file called robots.txt, and make sure it has exactly that file name. The file must be uploaded to the root of your site, not a subdirectory (for example, it belongs at http://www.mysite.com, not under http://www.mysite.com/stuff). Only when both conditions are met, that is, the filename is correct and the path is correct, will search engines follow the rules in the file; otherwise robots.txt is just an ordinary file with no effect.
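To check that the file really is reachable at the root, you can simply fetch it yourself. Here is a minimal sketch using Python's standard library; www.mysite.com is the placeholder domain from the example above, and this check is my own addition rather than part of the original article.

from urllib.request import urlopen

# Fetch the file from the site root; urlopen raises HTTPError on a 404,
# which would mean the file is missing or was uploaded to the wrong place.
with urlopen("http://www.mysite.com/robots.txt") as response:
    print(response.status)                  # expect 200
    print(response.read().decode("utf-8"))  # the rules search engines will see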

Now that you know how to name the file and where to upload it, you need to learn the commands to put inside it; search engines follow a convention called the Robots Exclusion Protocol. Its format is very simple and covers most control needs: first comes a User-agent line identifying the crawler being addressed, followed by one or more Disallow lines restricting that crawler's access to parts of the site.

1) Basic robots.txt settings

User-agent: *

Disallow: /

According to the statement above, no crawler (represented here by the *) is allowed to index any part of your site; the / stands for all pages. Normally this is not what you want, but it illustrates the concept.

2) Now let's make a small change. Although every webmaster loves Google, you may not want Google's Image robot crawling your site's images and making them searchable online, or you may simply want to save the bandwidth of the server your site runs on. The following statement does exactly that:

User-agent: Googlebot-Image

Disallow: /

3) The following code stops every search engine and robot from crawling the listed directories and pages:

User-agent: *

Disallow: /cgi-bin/

Disallow: /privatedir/

Disallow: /tutorials/blank.htm

4) You can also set different rules for different robots, as in the following code:

User-agent: *

Disallow: /

User-agent: Googlebot

Disallow: /cgi-bin/

Disallow: /privatedir/

This setting is quite interesting: it forbids every search engine from crawling the site except Google, which may access everything apart from /cgi-bin/ and /privatedir/. It shows that rules can be tailored per robot, but they are not inherited; a robot obeys only the most specific record that matches it and ignores the rest.
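To see that non-inheritance in action, here is a minimal sketch using Python's standard-library urllib.robotparser, which interprets robots.txt rules much as real crawlers do. The snippet and the bot name SomeOtherBot are my own illustration, not part of the original article.

from urllib import robotparser

rules = """
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot matches its own record and ignores the catch-all * record
print(rp.can_fetch("Googlebot", "/index.html"))     # True
print(rp.can_fetch("Googlebot", "/cgi-bin/form"))   # False
# Any other robot falls back to the * record and is shut out entirely
print(rp.can_fetch("SomeOtherBot", "/index.html"))  # False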

5) There is yet another way to use Disallow: leaving it empty grants access to all of the site's content. Simply put nothing after the colon:

User-agent: *

Disallow: /

User-agent: ia_archiver

Disallow:

Here, every crawler except Alexa's ia_archiver is barred from searching our site.

6) Finally, some crawlers now support an Allow rule, most notably Google. As its name suggests, "Allow:" lets you state precisely which files or folders may be accessed. However, this rule is not currently part of the formal robots.txt protocol, so I recommend using it only when strictly necessary, because less sophisticated crawlers may treat it as an error.

The following comes from Google's FAQ for webmasters. If you want every robot except Google's to stay off your site, this code is a good choice:

User-agent: *

Disallow: /

User-agent: Googlebot

Allow: /
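Python's urllib.robotparser also understands Allow lines, so the same kind of sketch as before can confirm this behavior; again, the check and the bot name OtherBot are my own illustration rather than part of Google's FAQ.

from urllib import robotparser

rules = """
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "/any/page.html"))  # True: Allow: / wins
print(rp.can_fetch("OtherBot", "/any/page.html"))   # False: blocked by *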

Original: http://javascriptkit.com/howto/robots.shtml
Translator: Tony Qu, Blueprint Translation Team


