How to Block Spiders with the robots.txt File

Source: Internet
Author: User

 

The robots.txt file is how a website tells search engine spiders that follow the robots protocol what they may crawl.

Let's start with an example that lets search engines crawl all of the content. The code is as follows:

 
User-agent: *
Allow: /

Here, User-agent is followed by the name of the spider. To apply the rule to all spiders, use * in place of a name; to target one specific spider, give that spider's name. If you do not want the spider to crawl, change Allow to Disallow. The path after the colon (here /, the whole site) is the content that is allowed or prohibited from being crawled.
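
To see how a protocol-following spider reads these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The helper can_crawl, the agent name ExampleBot, and the example.com URL are assumptions made for illustration; real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two variants described above: allow everything, or prohibit everything.
ALLOW_ALL = "User-agent: *\nAllow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

URL = "http://www.example.com/page.html"  # hypothetical page
print(can_crawl(ALLOW_ALL, "ExampleBot", URL))  # True
print(can_crawl(BLOCK_ALL, "ExampleBot", URL))  # False
```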

Sometimes crawlers visit too frequently. In that case, add the Crawl-delay directive, which tells the spider how many seconds to wait between requests. Crawl-delay is an unofficial extension, and not every spider honors it. Here is an example:

User-agent: *
Crawl-delay: 500

 

The User-agent line works the same way as before. The difference is that Crawl-delay must be followed by a number, and that number must be a positive integer.
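
As a rough illustration of the integer requirement, the sketch below reads the delay with Python's urllib.robotparser (the crawl_delay method is available from Python 3.6). The helper read_crawl_delay and the agent name ExampleBot are hypothetical; this shows how this particular parser behaves, and other spiders may treat non-integer values differently.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def read_crawl_delay(robots_txt: str, agent: str) -> Optional[int]:
    """Return the Crawl-delay that applies to `agent`, or None if there is none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


# A whole number of seconds is accepted by this parser.
print(read_crawl_delay("User-agent: *\nCrawl-delay: 500\n", "ExampleBot"))  # 500

# A fractional value is not a plain integer, so this parser ignores it.
print(read_crawl_delay("User-agent: *\nCrawl-delay: 1.5\n", "ExampleBot"))  # None
```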

Common directives include User-agent, Disallow, Allow, and Crawl-delay.

 

The best practice is to try the robots.txt file first. Taking Baidu's spider as an example, add the following content to the robots.txt file on the website:

User-agent: Baiduspider
Disallow: /
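
A rule group that names one spider applies only to that spider. The following sketch checks this with Python's urllib.robotparser; the helper blocked_agents, the list of agent names, and the example.com URL are assumptions for illustration only.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents: List[str], url: str) -> List[str]:
    """Return the agents from `agents` that are not allowed to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


ROBOTS_TXT = "User-agent: Baiduspider\nDisallow: /\n"

# Only the named spider is blocked; agents without a matching group stay allowed.
print(blocked_agents(ROBOTS_TXT, ["Baiduspider", "Googlebot"], "http://www.example.com/"))
# ['Baiduspider']
```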

If this method cannot completely block Baidu's crawlers, that is, if the spider does not comply with the robots protocol, you can block it at the server by adding some statements to the .htaccess file. Two methods are described below.

Method 1:

# Return 403 Forbidden when the User-Agent contains "Baiduspider" (case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F]

Method 2:

# Apache 2.2-style access control (needs mod_access_compat on Apache 2.4)
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
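
Both methods decide by matching a pattern against the User-Agent request header. The Python sketch below only imitates that match with the re module (Apache uses its own regex engine, so this is an approximation). The helper is_blocked is hypothetical and the two sample header strings are assumptions for illustration. It shows that a pattern anchored with ^ matches only when the header begins with the spider's name, whereas an unanchored pattern matches the name anywhere in the header.

```python
import re


def is_blocked(user_agent: str, pattern: str) -> bool:
    """Case-insensitive search, similar in spirit to [NC] or SetEnvIfNoCase."""
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


# Sample headers (assumed for illustration): one starts with the spider name,
# the other embeds it after a browser-style prefix.
UA_PREFIXED = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
UA_EMBEDDED = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

print(is_blocked(UA_PREFIXED, r"^baiduspider"))  # True
print(is_blocked(UA_EMBEDDED, r"^baiduspider"))  # False: the name is not at the start
print(is_blocked(UA_EMBEDDED, r"baiduspider"))   # True
```

If the spider you want to block sends a browser-style header, an unanchored pattern is the safer choice.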

 

Anhui Children's Network Co., http://www.ahyuer.com.
