A week ago, I shared an article, "SEO Diagnosis: Spotting a Dying Website Through Its Logs," with two suggestions for improvement attached. Due to practical constraints, I ultimately went with the robots.txt blocking method. First, let's look at how the spiders changed a week later. The total crawl volume from the three major search engine spiders has dropped sharply, proving that the robots file has taken effect. But judging from the number of visits, total stay time, and total pages fetched on the graph, there is still a long way to go.
However, starting on November 11, many non-mainstream spiders began showing up in the site logs: YandexBot, from the well-known Russian search engine Yandex, plus two unidentified "flying objects," AhrefsBot and Ezooms.Bot. I added Disallow rules for them and thought that was the end of it, but this morning I opened the logs for the last three days and found the spiders crawling even more frequently and aggressively, especially Ezooms.Bot.
In general, search engines treat a robots.txt file as valid for about 2-7 days before re-reading it. But given how rampantly Ezooms.Bot kept crawling, I had to suspect it was simply violating the robots protocol.
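For reference, the Disallow rules I added looked roughly like this (a sketch; the user-agent tokens are inferred from the log entries and may need adjusting for the real crawler names):

```
# robots.txt - ask these non-mainstream crawlers to stay away
User-agent: YandexBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Ezooms
Disallow: /
```

Of course, robots.txt is only a request: a well-behaved crawler honors it, but nothing forces a rogue bot to.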
Today, I am going to share how to block spiders through the .htaccess file.
Below are the site logs from the 14th, imported into Excel for filtering and analysis. The day's log contained as many as 342 access records (real users and spiders combined), most of them from AhrefsBot and Ezooms.Bot.
(Note: in this example I filtered on "bot". Foreign spiders are generally named Bot, while domestic ones are generally named Spider.)
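If you prefer scripting to Excel, the same "bot" filter can be sketched in a few lines of Python. The log lines below are simplified stand-ins for a real Apache access log, not my actual data:

```python
# Count log lines whose user-agent contains "bot" (case-insensitive),
# mirroring the Excel filter described above. Sample lines are illustrative.
sample_log = [
    '1.2.3.4 - - [14/Nov/2012:10:01:02 +0800] "GET / HTTP/1.1" 200 512 "-" "Ezooms/1.0 (ezooms.bot@gmail.com)"',
    '5.6.7.8 - - [14/Nov/2012:10:02:03 +0800] "GET /page HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; AhrefsBot/4.0)"',
    '9.9.9.9 - - [14/Nov/2012:10:03:04 +0800] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

bot_hits = [line for line in sample_log if "bot" in line.lower()]
print(len(bot_hits))  # 2 of the 3 sample lines come from bots
```

With a real log you would replace `sample_log` with the file's lines; the filtering logic stays the same.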
So what is Ezooms.Bot? I searched Baidu for related records, but the results were disappointing: Baidu had nothing relevant at all. No choice but to turn to Google, where every page is in English; if that makes your head swim, just chew through it slowly.
I read seven or eight foreign blog posts about Ezooms.Bot, and none gives a definitive account of it. Some think it is SEOmoz's bot, others call it an article scraper, but nobody speaks well of it; it is portrayed as a vampire or a leech. Here is one foreign comment:
From what I could find out, Ezooms.Bot contributes nothing useful to a website, so I resolved to block it. Since Ezooms.Bot does not comply with the robots.txt protocol, I thought of the method of blocking IP segments through the .htaccess file, which foreign blog posts also mention many times.
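For completeness, IP-segment blocking in .htaccess looks roughly like this. The address range below is purely illustrative (it is a documentation-only range, not a confirmed Ezooms.Bot segment):

```
# Illustrative only - deny a whole IP segment (Apache 2.2 access-control syntax)
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
```

With `Order Allow,Deny`, the Deny directive takes precedence, so everything is allowed except the listed segment.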
The IP segment shown in the figure is largely consistent with the IPs from which Ezooms.Bot crawled my site, so that would be one workable approach. However, blocking an entire IP segment may cause false positives (after all, you cannot confirm that every IP in the range belongs to Ezooms.Bot). Is there a safer way? I finally found a solution via .htaccess. The rules are as follows:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Ezooms/1.0 [NC]
RewriteRule ^(.*)$ http://getlostbadbot/ [R,L]
As for the details of Apache's RewriteCond rule parameters, I am still a newbie myself. Source: http://www.seowhy.com/bbs/forum.php?mod=viewthread&tid=2945114
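A gentler variant of the rules above simply returns 403 Forbidden instead of redirecting the bot. This annotated sketch uses standard mod_rewrite flags ([NC] for case-insensitive matching, [F] for Forbidden):

```
RewriteEngine On
# Match any user-agent string beginning with "Ezooms", ignoring case
RewriteCond %{HTTP_USER_AGENT} ^Ezooms [NC]
# Serve 403 Forbidden and stop processing further rules
RewriteRule ^ - [F,L]
```

Matching on the user-agent is lighter-weight than maintaining IP lists, though a truly hostile bot can of course forge its user-agent.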