A tutorial for blocking specific user agents in Nginx


The modern internet has spawned a vast array of malicious robots and web crawlers, such as malware bots, spammers, and content scrapers, which surreptitiously scan your site, probing for vulnerabilities, harvesting e-mail addresses, or simply stealing your content. Most of these robots can be identified by their "User-Agent" signature string.

As a first line of defense, you can try to keep these bots away from your site by adding their user agent strings to the robots.txt file. Unfortunately, this only works for "well-behaved" robots that are designed to honor the robots.txt specification. Many malware bots simply ignore robots.txt and crawl your site anyway.
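For example, a well-behaved crawler can be asked to stay away entirely with a robots.txt entry like the following (the bot name "BadBot" is just a placeholder for whatever user agent you want to exclude):

```text
User-agent: BadBot
Disallow: /
```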

Another way to block a particular robot is to configure your web server to deny requests carrying a specific user agent string. This article describes how to block specific user agents on an Nginx web server.

Blacklist specific user agents in Nginx

To configure the user agent block list, open the Nginx configuration file of your site and locate the server block. The file may live in a different place depending on your Nginx setup or Linux distribution (e.g., /etc/nginx/nginx.conf, /etc/nginx/sites-enabled/<your-site>, /usr/local/nginx/conf/nginx.conf, /etc/nginx/conf.d/<your-site>).

server {
    listen 80 default_server;
    server_name xmodulo.com;
    root /usr/share/nginx/html;
    ....
}

After you open the configuration file and locate the server block, add the following if statements somewhere inside that block.

server {
    listen 80 default_server;
    server_name xmodulo.com;
    root /usr/share/nginx/html;

    # case-sensitive matching
    if ($http_user_agent ~ (antivirx|Arian)) {
        return 403;
    }

    # case-insensitive matching
    if ($http_user_agent ~* (netcrawl|npbot|malicious)) {
        return 403;
    }
    ....
}

As you would expect, these if statements use regular expressions to match bad user agent strings, and return an HTTP 403 status code for any match. $http_user_agent is a built-in variable that contains the user agent string of an HTTP request. The '~' operator performs case-sensitive regular expression matching against the user agent string, while the '~*' operator matches case-insensitively. The '|' operator is a logical OR, so you can put as many user agent keywords in an if statement as you want, and block them all.
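If you want to experiment with how these two operators treat the same user agent string without touching a live server, here is a rough shell sketch that approximates them with grep (purely for illustration; Nginx uses PCRE internally, not grep):

```shell
#!/bin/sh
# Illustration only: approximate Nginx's regex operators with grep -E.
#   '~'  -> case-sensitive match   (grep -E)
#   '~*' -> case-insensitive match (grep -iE)
matches_cs() { printf '%s' "$1" | grep -qE  'netcrawl|npbot|malicious'; }
matches_ci() { printf '%s' "$1" | grep -qiE 'netcrawl|npbot|malicious'; }

# 'NetCrawl' fails the case-sensitive pattern but matches case-insensitively
ua="NetCrawl/1.0"
if matches_cs "$ua"; then echo "~  blocks $ua"; else echo "~  allows $ua"; fi
if matches_ci "$ua"; then echo "~* blocks $ua"; else echo "~* allows $ua"; fi
```

Running this prints that '~' allows NetCrawl/1.0 while '~*' blocks it, which is exactly why the case-insensitive form is usually the safer choice for a blacklist.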

After modifying the configuration file, you must reload Nginx to activate the blocking:

 $ sudo /path/to/nginx -s reload

You can test user agent blocking with wget's "--user-agent" option.

 $ wget --user-agent "malicious bot" http://<nginx-ip-address>

Manage the user agent blacklist in Nginx

So far, I have shown how to block HTTP requests from a few user agents in Nginx. What if you have many different types of crawler robots to block?

Since the user agent blacklist can grow very large, it is not a good idea to keep it inside Nginx's server block. Instead, you can create a separate file that lists all the blocked user agents. For example, let's create /etc/nginx/useragent.rules and define a map of all blocked user agents in the following format.

  $ sudo vi /etc/nginx/useragent.rules

map $http_user_agent $badagent {
    default      0;
    ~*malicious  1;
    ~*backdoor   1;
    ~*netcrawler 1;
    ~antivirx    1;
    ~arian       1;
    ~webbandit   1;
}

Similar to the previous configuration, '~*' matches a keyword case-insensitively, while '~' matches it with a case-sensitive regular expression. The 'default 0' line means that any user agent not listed in the file maps to 0, i.e., is allowed.
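The map's behavior can be sketched in shell for illustration (again using grep as a stand-in for Nginx's PCRE matching; the patterns mirror the rules file above):

```shell
#!/bin/sh
# Sketch: emulate the $badagent map. A user agent maps to 1 (blocked)
# if it matches any pattern, otherwise falls through to the default 0.
badagent() {
    if printf '%s' "$1" | grep -qiE 'malicious|backdoor|netcrawler'; then
        echo 1    # case-insensitive patterns (~*)
    elif printf '%s' "$1" | grep -qE 'antivirx|arian|webbandit'; then
        echo 1    # case-sensitive patterns (~)
    else
        echo 0    # default: allowed
    fi
}

badagent "NetCrawler/2.0"   # prints 1: matched case-insensitively
badagent "Mozilla/5.0"      # prints 0: not listed, default applies
```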

Next, open the Nginx configuration file of your site, find the http block, and add the following line to it.

http {
    .....
    include /etc/nginx/useragent.rules;
}

Note that the include directive must appear outside any server block, since the map directive is only valid in the http context (which is why we added it to the http section).

Now, open the server block of your site's Nginx configuration, and add the following if statement:

server {
    ....
    if ($badagent) {
        return 403;
    }
    ....
}

Finally, reload Nginx:

 $ sudo /path/to/nginx -s reload

Any user agent containing a keyword listed in /etc/nginx/useragent.rules will now be automatically banned by Nginx.
