My site serves a domestic audience, so I do not want foreign spiders visiting it, especially the assorted garbage spiders, which crawl particularly frequently. All of this junk traffic wastes a great deal of the server's bandwidth and resources. By checking the User-Agent header, Nginx can block these spiders, saving some traffic and preventing some malicious access.
Steps
1. Enter the Nginx configuration directory, for example: cd /usr/local/nginx/conf
2. Create an agent_deny.conf configuration file with the following contents:
# Block crawling by Scrapy and similar tools
if ($http_user_agent ~* (Scrapy|curl|HttpClient)) {
    return 403;
}

# Block the listed user agents as well as requests with an empty UA
if ($http_user_agent ~ "FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|LinkpadBot|Ezooms|^$") {
    return 403;
}

# Block request methods other than GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
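Note that the ~ operator matches case-sensitively while ~* matches case-insensitively, so the long UA rule above only catches the exact spellings listed. A minimal illustrative variant (a sketch, not part of the original configuration; the list is shortened here) that matches regardless of case:

# Case-insensitive variant of the UA check (shortened list for illustration only)
if ($http_user_agent ~* "FeedDemon|JikeSpider|MJ12bot|^$") {
    return 403;
}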
3. Insert the line "include agent_deny.conf;" into the relevant block of the website's configuration file, for example:
location ~ [^/]\.php(/|$) {
    try_files $uri =404;
    fastcgi_pass unix:/tmp/php-cgi.sock;
    fastcgi_index index.php;
    include fastcgi.conf;
    include agent_deny.conf;
}
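If the rules should cover the whole site rather than only PHP requests, the include line can instead sit at the server level. A minimal sketch, assuming a typical virtual host; the domain and paths are placeholders:

server {
    listen 80;
    server_name www.example.com;        # placeholder domain
    root /data/www;                     # placeholder web root
    include agent_deny.conf;            # UA rules now apply to every location below
    location / {
        index index.html index.php;
    }
}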
4. Reload Nginx
/etc/init.d/nginx reload
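It is worth validating the configuration before reloading, since a syntax error in agent_deny.conf would otherwise take the site down. Assuming a source build with the default binary path (on systemd distributions, systemctl reload nginx is the equivalent):

# Check the configuration for syntax errors, then reload only if the check passes
/usr/local/nginx/sbin/nginx -t && /etc/init.d/nginx reload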
Test
Simulate spider access by setting the User-Agent with curl.
# curl -I -A "Baiduspider" www.sijitao.net
HTTP/1.1 200 OK
Server: nginx
Date: Mon, ... GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.19
Vary: Accept-Encoding, Cookie
Cache-Control: max-age=3, must-revalidate
WP-Super-Cache: Served supercache file from PHP

# curl -I -A "JikeSpider" www.sijitao.net
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, ... GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive

# curl -I -A "" www.sijitao.net
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, ... GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
In the Nginx access log, the blocked requests show up as 403 responses.
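To watch the blocked requests in real time, tail the access log and filter on the status code. A small sketch, assuming the default combined log format (the status code is the ninth field) and a typical log path; adjust both to your setup:

# Follow the access log and print only requests answered with 403
tail -f /usr/local/nginx/logs/access.log | awk '$9 == 403'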
At this point, blocking spider access to the site by User-Agent in Nginx is complete. You can add, remove, or modify the spider entries in agent_deny.conf to suit your own situation.
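For example, to block one more crawler, add another rule (or extend the existing regex) in agent_deny.conf and reload Nginx; SemrushBot here is only an illustration, not part of the original list:

# Illustrative extra rule: block SemrushBot as well (case-insensitive match)
if ($http_user_agent ~* "SemrushBot") {
    return 403;
}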