True experience sharing: blocking bad spiders with the .htaccess file

Source: Internet
Author: User


A week ago, I shared an article on diagnosing website problems through log analysis, which ended with two suggestions for improvement. Because of practical constraints, I ultimately went with the robots.txt blocking method. Looking first at the spider activity one week later: the total crawl volume of the three major spiders has dropped sharply, showing that the file has taken effect. Judging by the number of visits, total stay time, and total crawl volume, there is progress, but there is still a long way to go.

[Screenshot: crawl statistics for the three major spiders over the past week]

However, starting on the 11th, the site logs began to show visits from numerous spiders that are non-mainstream for Chinese sites, including the well-known Russian search engine spider YandexBot, as well as two unknown "flying objects", AhrefsBot and Ezooms.bot. Following the usual spider-blocking routine, I instinctively disallowed all of the spiders above in the robots.txt file (for Chinese-site SEO, these are junk spiders). I thought that would be the end of it, but when I opened the logs for the last three days this morning, the junk spiders were crawling even more frequently and ferociously, especially Ezooms.bot.

Generally speaking, search engines take a certain period, roughly 2-7 days, to honor a robots.txt file. But given how rapidly Ezooms.bot ramped up, I have to suspect it is a robots protocol violator.
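For reference, the robots.txt rules described above would look roughly like this. This is a minimal sketch, not the author's original file; the exact user-agent tokens are my assumption based on the bot names:

```
# Block each unwanted spider from the whole site
User-agent: Yandex
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Ezooms
Disallow: /
```

Note that robots.txt is purely advisory: a bot that ignores the protocol, as Ezooms.bot appears to, will crawl regardless.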

What I want to share today is how to block bad spiders through the .htaccess file.

The following is the site log for the 14th, imported into Excel for filtering and analysis. Among all the day's access records (including real users and spiders), there were as many as 342 entries, most of them from AhrefsBot and Ezooms.bot.

(Note: this example filters on "bot"; foreign spiders are generally named "bot", while domestic ones are generally named "spider".)

[Screenshot: Excel filtering of the day's access log]

What is Ezooms.bot? I searched Baidu for related records, but the results were disappointing; Baidu had nothing relevant. With no other option, I turned to Google, where everything was in English. Head aching, I gritted my teeth and chewed through it slowly.

I read seven or eight foreign blog posts about Ezooms.bot. There is no clear consensus on what the bot is: some think it is the SEOmoz bot, others think it is an article scraper. But the reviews of it are uniformly bad; it is portrayed as a vampire, a leech, and so on. Quoting one foreign comment:

[Screenshot: a foreign user's comment on Ezooms.bot]

From what I learned, Ezooms.bot's crawling does the site no good at all, so I resolved to block it. Since Ezooms.bot does not comply with the robots.txt protocol, this reminded me of blocking its IP ranges through the .htaccess file, a method foreign bloggers have also mentioned many times.

[Screenshot: the IP ranges reported for Ezooms.bot by foreign bloggers]
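Blocking an entire IP range in .htaccess, as the foreign bloggers suggest, can be done with Apache's access-control directives. A minimal sketch; the 192.0.2.0/24 range here is a documentation placeholder, not the bot's real range:

```
# Deny an entire IP range (placeholder range shown) - Apache 2.2 syntax
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
```

On Apache 2.4 the equivalent is the Require directive, e.g. `Require not ip 192.0.2.0/24` inside a `<RequireAll>` block.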

The IP ranges in the screenshot are largely consistent with the Ezooms.bot crawl IPs in my site's records, so this should be a workable method. But blocking an entire IP range may cause some collateral damage (after all, there is no way to confirm that every IP in the range belongs to Ezooms.bot). Is there a safer way? After digging through more material, I finally found a solution via .htaccess. The rules are as follows:

RewriteEngine On
# Match any User-Agent beginning with "Ezooms" (this also covers "Ezooms/1.0");
# [NC] makes the match case-insensitive
RewriteCond %{HTTP_USER_AGENT} ^Ezooms [NC]
# Redirect every request from the bot to a dead address
RewriteRule ^(.*)$ http://getlostbadbot/ [R,L]

As for why it is written this way: there are plenty of introductions to Apache RewriteCond rule parameters online, and I am a rookie still learning them myself.
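A common variant (my addition, not part of the original post) returns 403 Forbidden outright instead of redirecting to a dead address, so the bot is refused immediately without a follow-up request:

```
RewriteEngine On
# Match either bad bot anywhere in the User-Agent string, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (Ezooms|AhrefsBot) [NC]
# "-" means no substitution; [F] returns 403 Forbidden, [L] stops rule processing
RewriteRule .* - [F,L]
```

This also makes the block visible in the logs as a string of 403 responses, which is handy for confirming the rule is working.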

The above is my personal, real experience of blocking bad spiders; I hope it helps. SEO means constant learning and progress. This article was originally contributed by the Weight Loss Products List, Www.shou68.net. You are welcome to reprint it; please keep this link. Thank you for your cooperation!


