Simulate a crawl:
curl -I -A 'Baiduspider' hello.net
The response:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, ... 07:26:48 GMT
A 200 response like the above means the server currently allows this crawler.
If the response is HTTP/1.1 403 Forbidden, the crawler is being blocked.
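Once one of the methods below is in place, the same test can be repeated to confirm the rule works; a quick sketch, using the article's placeholder domain hello.net:

# Normal browser UA: expect HTTP/1.1 200 OK
curl -I -A 'Mozilla/5.0' hello.net
# Blocked crawler UA: expect HTTP/1.1 403 Forbidden
curl -I -A 'Baiduspider' hello.net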
----------------------------------------------------------------------------------------
Method 1: block by User-Agent
Add the following to the server block; list multiple user agents separated by a pipe (|). The ~* operator makes the match case-insensitive, so Baiduspider, baiduspider, and BaiduSpider are all caught:
server {
    if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot") {
        return 403;
    }
}
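After editing, check and reload the configuration so the rule takes effect. A sketch, assuming Nginx is installed under /usr/local/nginx as in Method 3 below:

/usr/local/nginx/sbin/nginx -t        # test the configuration syntax
/usr/local/nginx/sbin/nginx -s reload # reload without dropping connections
curl -I -A 'Baiduspider' hello.net    # should now return 403 Forbidden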
To refuse wget by its User-Agent, add the following:
## Block HTTP user agent: wget ##
if ($http_user_agent ~* (Wget)) {
    return 403;
}
## Block software download user agents ##
if ($http_user_agent ~* LWP::Simple|BBBike|wget) {
    return 403;
}
Method 2: robots.txt
Use a robots.txt file. For example, to ask all crawlers to stay away from the entire site (this relies on crawlers honoring the file, so the effect is limited):
User-agent: *
Disallow: /
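robots.txt can also target a single crawler rather than all of them; for example, a variant that asks only Baiduspider to stay away while allowing everyone else:

User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: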
Method 3: a separate configuration file
Go to the conf directory under the Nginx installation directory and save the following code as agent_deny.conf:
cd /usr/local/nginx/conf
vim agent_deny.conf
# Block crawling by tools such as Scrapy
if ($http_user_agent ~* (Scrapy|curl|HttpClient)) {
    return 403;
}
# Block the listed user agents as well as requests with an empty user agent
if ($http_user_agent ~ "FeedDemon|JikeSpider|^$") {
    return 403;
}
# Block request methods other than GET, HEAD, and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
Then, in the site's configuration, insert the following line right after location / {:
include agent_deny.conf;
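For context, a minimal sketch of where the include lands in a site configuration; the listen port, server_name, and paths are illustrative assumptions, not from the original setup:

server {
    listen 80;
    server_name hello.net;

    location / {
        include agent_deny.conf;  # rejects bad user agents with 403
        root /usr/local/nginx/html;
        index index.html;
    }
}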
In the end, Method 1 is the recommended way to block crawlers in Nginx.