A week ago, I shared an article, "SEO Diagnosis: Spotting a Dying Website Through Its Logs," with two suggestions for improvement attached. Due to practical constraints, I ultimately went with the robots.txt blocking method. First, let's look at how the spiders changed a week later. The total crawl volume from the three major search engine spiders has dropped sharply, proving that the robots file has taken effect. But judging from the number of visits, total stay time, and total pages fetched on the graph, there is still a long way to go.
However, starting on November 11, many non-mainstream spiders began showing up in the site logs: YandexBot, from the well-known Russian search engine Yandex, plus two unidentified "flying objects," AhrefsBot and Ezooms.Bot. I added Disallow rules for them and thought that was the end of it, but this morning I opened the logs for the last three days and found the spiders crawling even more frequently and aggressively, especially Ezooms.Bot.
In general, search engines treat a robots.txt file as valid for about 2-7 days before re-reading it. But given how rampantly Ezooms.Bot kept crawling, I had to suspect it was simply violating the robots protocol.
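For reference, the Disallow rules I added looked roughly like this (a sketch; the user-agent tokens are inferred from the log entries and may need adjusting for the real crawler names):

```
# robots.txt - ask these non-mainstream crawlers to stay away
User-agent: YandexBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Ezooms
Disallow: /
```

Of course, robots.txt is only a request: a well-behaved crawler honors it, but nothing forces a rogue bot to.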
Today, I am going to share how to block spiders through the .htaccess file.
Below are the site logs from the 14th, imported into Excel for filtering and analysis. The day's log contained as many as 342 access records (real users and spiders combined), most of them from AhrefsBot and Ezooms.Bot.
(Note: in this example I filtered on "bot". Foreign spiders are generally named Bot, while domestic ones are generally named Spider.)
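If you prefer scripting to Excel, the same "bot" filter can be sketched in a few lines of Python. The log lines below are simplified stand-ins for a real Apache access log, not my actual data:

```python
# Count log lines whose user-agent contains "bot" (case-insensitive),
# mirroring the Excel filter described above. Sample lines are illustrative.
sample_log = [
    '1.2.3.4 - - [14/Nov/2012:10:01:02 +0800] "GET / HTTP/1.1" 200 512 "-" "Ezooms/1.0 (ezooms.bot@gmail.com)"',
    '5.6.7.8 - - [14/Nov/2012:10:02:03 +0800] "GET /page HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; AhrefsBot/4.0)"',
    '9.9.9.9 - - [14/Nov/2012:10:03:04 +0800] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

bot_hits = [line for line in sample_log if "bot" in line.lower()]
print(len(bot_hits))  # 2 of the 3 sample lines come from bots
```

With a real log you would replace `sample_log` with the file's lines; the filtering logic stays the same.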
So what is Ezooms.Bot? I searched Baidu for related records, but the results were disappointing: Baidu had nothing relevant at all. No choice but to turn to Google, where every page is in English; if that makes your head swim, just chew through it slowly.
I read seven or eight foreign blog posts about Ezooms.Bot, and none gives a definitive account of it. Some think it is SEOmoz's bot, others call it an article scraper, but nobody speaks well of it; it is portrayed as a vampire or a leech. Here is one foreign comment:
From what I could find out, Ezooms.Bot contributes nothing useful to a website, so I resolved to block it. Since Ezooms.Bot does not comply with the robots.txt protocol, I thought of the method of blocking IP segments through the .htaccess file, which foreign blog posts also mention many times.
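For completeness, IP-segment blocking in .htaccess looks roughly like this. The address range below is purely illustrative (it is a documentation-only range, not a confirmed Ezooms.Bot segment):

```
# Illustrative only - deny a whole IP segment (Apache 2.2 access-control syntax)
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
```

With `Order Allow,Deny`, the Deny directive takes precedence, so everything is allowed except the listed segment.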
The IP segment shown in the figure is largely consistent with the IPs from which Ezooms.Bot crawled my site, so that would be one workable approach. However, blocking an entire IP segment may cause false positives (after all, you cannot confirm that every IP in the range belongs to Ezooms.Bot). Is there a safer way? I finally found a solution via .htaccess. The rules are as follows:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Ezooms/1.0 [NC]
RewriteRule ^(.*)$ http://getlostbadbot/ [R,L]
As for the details of Apache's RewriteCond rule parameters, I am still a newbie myself. Source: http://www.seowhy.com/bbs/forum.php?mod=viewthread&tid=2945114
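A gentler variant of the rules above simply returns 403 Forbidden instead of redirecting the bot. This annotated sketch uses standard mod_rewrite flags ([NC] for case-insensitive matching, [F] for Forbidden):

```
RewriteEngine On
# Match any user-agent string beginning with "Ezooms", ignoring case
RewriteCond %{HTTP_USER_AGENT} ^Ezooms [NC]
# Serve 403 Forbidden and stop processing further rules
RewriteRule ^ - [F,L]
```

Matching on the user-agent is lighter-weight than maintaining IP lists, though a truly hostile bot can of course forge its user-agent.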