How to block specific user agents in Nginx
This article explains how to block specific user agents in Nginx and how to keep the blocked agents in a blacklist for easier management.
The modern Internet is home to a wide variety of malicious robots and web crawlers, such as malware bots, spam programs, and content scrapers, which quietly scan your website to probe for potential vulnerabilities, harvest email addresses, or simply steal your content. Most of these robots can be identified by their "user agent" signature strings.
As a first line of defense, you can try to keep malicious robots away from your website by adding their user agent strings to the robots.txt file. Unfortunately, this only works for well-behaved robots that are designed to obey robots.txt. Many malicious robots simply ignore robots.txt and scan your website at will.
Another way to block a specific robot is to configure your web server to refuse requests that carry a particular user agent string. This article describes how to block specific user agents on the nginx web server.
Blacklist specific user agents in Nginx
To configure the user agent blocking list, open the nginx configuration file of your website and find the server definition section. This file may be stored in different places, depending on your nginx setup or Linux distribution (for example, /etc/nginx/nginx.conf, /etc/nginx/sites-enabled/, /usr/local/nginx/conf/nginx.conf, /etc/nginx/conf.d/).
The code is as follows:
server {
    listen 80 default_server;
    server_name xmodulo.com;
    root /usr/share/nginx/html;
    ....
}
After opening the configuration file and finding the server section, add the following if statements somewhere inside that section.
The code is as follows:
server {
    listen 80 default_server;
    server_name xmodulo.com;
    root /usr/share/nginx/html;
    # Case-sensitive matching
    if ($http_user_agent ~ (Antivirx|Arian)) {
        return 403;
    }
    # Case-insensitive matching
    if ($http_user_agent ~* (Netcrawl|npbot|malicious)) {
        return 403;
    }
    ....
}
As you can guess, these if statements match the user agent string against a regular expression and return a 403 HTTP status code for any request that matches. $http_user_agent is the nginx variable that holds the user agent string of the HTTP request. The '~' operator performs a case-sensitive match against the user agent string, while the '~*' operator performs a case-insensitive match. The '|' operator is a logical OR, so you can put as many user agent keywords as you like in one if statement and block them all.
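One detail worth adding: a pattern that contains a space, a semicolon, or a closing brace has to be wrapped in quotes, because nginx otherwise splits it into separate tokens. The following is a minimal sketch of that case and of an anchored match. It is not part of the original configuration; the server name example.com and the keywords "evil bot" and "bad spider" are placeholders invented for illustration.

```nginx
server {
    listen 80;
    server_name example.com;   # placeholder name, not from the article

    # The pattern contains a space, so the whole regex is quoted.
    # "evil bot" and "bad spider" are hypothetical keywords.
    if ($http_user_agent ~* "(evil bot|bad spider)") {
        return 403;
    }

    # A leading ^ anchors the match to the start of the string,
    # so this blocks only agents whose string begins with "Antivirx".
    if ($http_user_agent ~ ^Antivirx) {
        return 403;
    }
}
```

Without an anchor, a keyword matches anywhere inside the user agent string, which is usually what a blacklist wants.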
After modifying the configuration file, you must re-load nginx to activate blocking:
$ sudo /path/to/nginx -s reload
You can use wget with the "--user-agent" option to test user agent blocking.
$ wget --user-agent "malicious bot" http://<nginx-ip-address>
Manage the user agent blacklist in Nginx
So far, I have demonstrated how to block HTTP requests from a few user agents in nginx. What if you have many different types of web crawlers to block?
Since the user agent blacklist can grow very large, it is not a good idea to keep it inside the nginx server section. Instead, you can create a separate file that lists all blocked user agents. For example, let's create /etc/nginx/useragent.rules and use the following format to define a map of all blocked user agents.
$ sudo vi /etc/nginx/useragent.rules
The code is as follows:
map $http_user_agent $badagent {
    default 0;
    ~*malicious 1;
    ~*backdoor 1;
    ~*netcrawler 1;
    ~Antivirx 1;
    ~Arian 1;
    ~Webbandit 1;
}
As in the previous configuration, '~*' matches a keyword case-insensitively, while '~' matches it as a case-sensitive regular expression. The "default 0" line means that any user agent not matched by one of the other entries is allowed.
Next, open the nginx configuration file of your website, find the http section, and add the following line somewhere inside it.
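Because the regular expressions in a map are tried in the order they are listed and the first match wins, ordering can be used to carve out exceptions. The sketch below is a hypothetical variant of the rules file: it assumes a friendly crawler called goodcrawler whose name would otherwise be caught by a broader crawler pattern. Neither keyword, nor "evil bot", comes from the original list. It also shows that a pattern containing a space needs quotes.

```nginx
map $http_user_agent $badagent {
    default            0;
    # Exception first: regexes are tested in order of appearance.
    ~*goodcrawler      0;
    # Broader pattern that would otherwise also match "goodcrawler".
    ~*crawler          1;
    # Quoted because the pattern contains a space.
    "~*evil bot"       1;
}
```

Keeping exceptions at the top of the file makes it easier to see which agents are deliberately let through as the list grows.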
The code is as follows:
http {
    .....
    include /etc/nginx/useragent.rules;
}
Note that the map directive is only allowed at the http level, so the include line must sit directly in the http section, outside any server section. Here it is placed before the server definitions that use $badagent.
Now open the nginx configuration file that contains your server section and add the following if statement:
The code is as follows:
server {
    ....
    if ($badagent) {
        return 403;
    }
    ....
}
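Taken together, the pieces might be laid out as in the following abridged sketch. It assumes the map lives in /etc/nginx/useragent.rules as described above and reuses the server settings from the earlier example. The empty events block is there only so the file is complete enough to load; logging, other includes, and the rest of a real nginx.conf are omitted.

```nginx
# Abridged sketch of a main nginx configuration file
events {}

http {
    # Defines $badagent; map is only valid at the http level.
    include /etc/nginx/useragent.rules;

    server {
        listen 80 default_server;
        server_name xmodulo.com;
        root /usr/share/nginx/html;

        # $badagent is "1" for blacklisted agents and "0" otherwise;
        # nginx treats "0" and the empty string as false.
        if ($badagent) {
            return 403;
        }
    }
}
```

With this layout, adding or removing a blocked agent only touches the rules file, and the server section stays unchanged.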
Finally, reload nginx.
$ sudo /path/to/nginx -s reload
Now, any user agent containing one of the keywords listed in /etc/nginx/useragent.rules will be automatically blocked by nginx.