This article explains how to block specific user agents in Nginx and how to keep the blocked agents in a blacklist for easier management.
The modern internet has spawned a vast array of malicious robots and web crawlers, such as malware bots, spam programs, and content scrapers. They surreptitiously scan your site to detect potential website vulnerabilities, harvest e-mail addresses, or simply steal your content. Most robots can be identified by their "user agent" signature string.
As a first line of defense, you can try to keep these bots away from your site by adding their user agent strings to the robots.txt file. Unfortunately, this works only for well-behaved robots that are designed to honor the robots.txt specification. Many malicious bots simply ignore robots.txt and scan your site at will.
Another way to block a particular robot is to configure your web server to deny content requests that carry a specific user agent string. This article describes how to block specific user agents on an Nginx web server.
Blacklist specific user agents in Nginx
To configure the user agent blacklist, open the Nginx configuration file for your site and locate the server definition section. The file's location depends on your Nginx configuration and Linux distribution (e.g., /etc/nginx/nginx.conf, /etc/nginx/sites-enabled/, /usr/local/nginx/conf/nginx.conf, or /etc/nginx/conf.d/).
The code is as follows:
server {
    listen 80 default_server;
    server_name xmodulo.com;
    root /usr/share/nginx/html;
    # ...
}
Once you have located the server section, add the following if statements anywhere inside it.
The code is as follows:
server {
    listen 80 default_server;
    server_name xmodulo.com;
    root /usr/share/nginx/html;

    # Case-sensitive matching
    if ($http_user_agent ~ (antivirx|Arian)) {
        return 403;
    }

    # Case-insensitive matching
    if ($http_user_agent ~* (netcrawl|npbot|malicious)) {
        return 403;
    }
    # ...
}
As you might expect, these if statements use regular expressions to match unwanted user agent strings and return HTTP status code 403 (Forbidden) to any matching request. $http_user_agent is the variable that holds the User-Agent header of the HTTP request. The '~' operator matches the user agent string case-sensitively, while '~*' matches it case-insensitively. The '|' operator is a logical OR, so you can list many user agent keywords in a single if statement and block them all.
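To make the matching rules concrete, here is the same pair of rules annotated with how each would treat a few sample user agent strings. The strings in the comments are made-up examples for illustration, not real bot signatures.

```nginx
server {
    # "~*" ignores case and matches the keywords anywhere in the string, so
    # this rule returns 403 for e.g. "NPBot/1.0" and
    # "Mozilla/5.0 (compatible; NetCrawl)", but not for an ordinary browser
    # string such as "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0".
    if ($http_user_agent ~* (netcrawl|npbot|malicious)) {
        return 403;
    }

    # "~" respects case, so this rule returns 403 for "antivirx/2.0" and
    # "Arian-Bot", but lets "AntivirX/2.0" and "arian-bot" through.
    if ($http_user_agent ~ (antivirx|Arian)) {
        return 403;
    }
}
```

Because the patterns are not anchored, each keyword matches as a substring anywhere in the header, so very short or generic keywords can also catch legitimate clients.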
After modifying the configuration file, reload Nginx to activate the blocking:
$ sudo /path/to/nginx -s reload
You can test the blocking with wget, using its "--user-agent" option to send a specific user agent string:
$ wget --user-agent "malicious bot" http://<nginx-ip-address>
Manage the user agent blacklist in Nginx
So far, I have shown how to block HTTP requests from a few user agents in Nginx. What if you have many different kinds of web crawlers to block?
Since the user agent blacklist can grow very large, keeping it inside the Nginx server section is not a good idea. Instead, you can create a separate file that lists all blocked user agents. For example, create /etc/nginx/useragent.rules and define a map of all blocked user agents in the following format.
$ sudo vi /etc/nginx/useragent.rules
The code is as follows:
map $http_user_agent $badagent {
    default 0;
    ~*malicious 1;
    ~*backdoor 1;
    ~*netcrawler 1;
    ~antivirx 1;
    ~arian 1;
    ~webbandit 1;
}
As in the previous configuration, '~*' matches a keyword case-insensitively, while '~' matches it with a case-sensitive regular expression. The "default 0" line means that any user agent not listed in the file gets the value 0 and is therefore allowed.
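When the list grows, the same map can be written more compactly by grouping keywords with '|'. This is a sketch that assumes the file path and the $badagent variable used above; it yields 1 for the same agents and 0 for everything else. Quoting the patterns is optional here but becomes necessary as soon as a pattern contains spaces.

```nginx
# /etc/nginx/useragent.rules (alternative, equivalent layout)
map $http_user_agent $badagent {
    # Value used when no pattern below matches: the request is allowed.
    default 0;

    # Case-insensitive keywords.
    "~*(malicious|backdoor|netcrawler)" 1;

    # Case-sensitive keywords.
    "~(antivirx|arian|webbandit)" 1;
}
```

Nginx checks regular expression entries in the order they appear and uses the first one that matches; since every entry here maps to 1, the order does not matter.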
Next, open your Nginx configuration file, find the http section, and add the following line to it.
The code is as follows:
http {
    # ...
    include /etc/nginx/useragent.rules;
}
Note that the include line must go in the http section, before the server sections, because the map directive is only allowed in the http context.
Now, go to the server section of your configuration and add the following if statement:
The code is as follows:
server {
    # ...
    if ($badagent) {
        return 403;
    }
    # ...
}
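Putting the pieces together, the relevant parts of the configuration look roughly like the sketch below. Other directives are omitted, and the listen, server_name, and root values are simply carried over from the earlier example; adjust them for your own site.

```nginx
http {
    # map is only allowed at http level, so the rules file is included
    # here, ahead of the server blocks that use $badagent.
    include /etc/nginx/useragent.rules;

    server {
        listen 80 default_server;
        server_name xmodulo.com;
        root /usr/share/nginx/html;

        # $badagent is "1" for listed agents and "0" otherwise;
        # an if condition treats "0" (and an empty string) as false.
        if ($badagent) {
            return 403;
        }
    }
}
```

Running nginx -t before reloading checks the combined configuration for syntax errors.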
Finally, reload Nginx:
$ sudo /path/to/nginx -s reload
Nginx will now automatically block any user agent that matches a keyword listed in /etc/nginx/useragent.rules.