Blocking specific user-agent spiders from accessing a site with Nginx

Source: Internet
Author: User

I run a site aimed at domestic (Chinese) visitors, so I don't want foreign spiders crawling it, and especially not certain junk spiders that visit extremely frequently. All of this junk traffic wastes a great deal of the server's bandwidth and resources. By checking the User-Agent header, Nginx can block these spiders, which saves some traffic and prevents some malicious access.

Steps

1. Enter the Nginx configuration directory, e.g. cd /usr/local/nginx/conf

2. Create an agent_deny.conf configuration file with the following contents:

# Block crawling by tools such as Scrapy, curl and HttpClient
if ($http_user_agent ~* (Scrapy|curl|HttpClient)) {
    return 403;
}

# Block the listed user agents, and requests with an empty user agent
if ($http_user_agent ~ "FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|LinkpadBot|Ezooms|^$") {
    return 403;
}

# Block request methods other than GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
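Before deploying, you can sanity-check a blocklist pattern locally with grep -E, whose extended regular expressions behave comparably and whose -i flag mimics Nginx's case-insensitive ~* operator. This is my own quick illustration, not part of the original article:

```shell
# Simulate the first blocklist rule locally.
pattern='Scrapy|curl|HttpClient'

check() {
    # Prints "blocked" if the user agent matches the pattern, else "allowed".
    printf '%s' "$1" | grep -Eiq "$pattern" && echo blocked || echo allowed
}

check "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"   # a normal browser -> allowed
check "Scrapy/2.11 (+https://scrapy.org)"           # a listed tool    -> blocked
```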

3. Insert the line "include agent_deny.conf;" into the relevant configuration file of the website, for example inside the PHP location block:

location ~ [^/]\.php(/|$) {
    try_files $uri =404;
    fastcgi_pass unix:/tmp/php-cgi.sock;
    fastcgi_index index.php;
    include fastcgi.conf;
    include agent_deny.conf;
}
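If a site does not route requests through a PHP location block, the same include can instead sit at the server level so the blocklist applies to every request. A minimal sketch; the listen port, server_name and root here are placeholders of my own, not from the article:

```nginx
server {
    listen 80;
    server_name example.com;   # placeholder domain
    root /var/www/html;        # placeholder web root

    # Apply the user-agent blocklist to all requests for this server
    include agent_deny.conf;
}
```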

4. Reload Nginx

/etc/init.d/nginx reload

Test

Simulate spider access with curl and a spoofed User-Agent header.

[email protected]:~# curl -I -A "Baiduspider" www.sijitao.net
HTTP/1.1 200 OK
Server: nginx
Date: Mon, ... 03:37:20 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.19
Vary: Accept-Encoding, Cookie
Cache-Control: max-age=3, must-revalidate
WP-Super-Cache: Served supercache file from PHP

[email protected]:~# curl -I -A "JikeSpider" www.sijitao.net
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, ... GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive

[email protected]:~# curl -I -A "" www.sijitao.net
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, ... GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive

In the Nginx access log, the blocked requests are recorded with a 403 status. (The original article showed a log screenshot here.)

With this, Nginx blocks spiders from the site by inspecting the User-Agent header. You can add, remove or modify the spider entries in agent_deny.conf according to your own situation.
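To decide which spiders are worth adding to agent_deny.conf, it helps to tally the most frequent user agents in the access log. A sketch for the default "combined" log format; the log path and sample entries below are made-up placeholders of my own:

```shell
# Count user-agent strings in a combined-format access log.
# /tmp/sample_access.log stands in for the real log, e.g.
# /usr/local/nginx/logs/access.log.
log=/tmp/sample_access.log
cat > "$log" <<'EOF'
1.2.3.4 - - [10/Oct/2024:13:55:36 +0800] "GET / HTTP/1.1" 200 612 "-" "Scrapy/2.11"
5.6.7.8 - - [10/Oct/2024:13:55:37 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"
1.2.3.4 - - [10/Oct/2024:13:55:38 +0800] "GET / HTTP/1.1" 200 612 "-" "Scrapy/2.11"
EOF

# In the combined format the user agent is the sixth double-quoted field.
awk -F'"' '{print $6}' "$log" | sort | uniq -c | sort -rn
```

The busiest agents appear first; anything suspicious can then be appended to the blocklist regex.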

      • This article is from: Linux Learning Network
