Preface
I recently applied for an SSL certificate for fun, so my blog now has HTTPS enabled as well (partly because there were too many spam comments; perhaps HTTPS will keep out some of the simpler comment-spamming bots?).
Then a friend reminded me that Baidu and some other search engine crawlers do not crawl HTTPS pages, which could cause indexing problems.
Start
After some thought, I worked out how to solve this with NGINX rules. I looked up the user agents of the major search engine crawlers and wrote the rules below.
Note that NGINX conditions do not support logical AND/OR, nor can if blocks be nested, unlike ordinary programming code, so multiple conditions have to be combined through variables, as in the sketch below.
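For instance, a common workaround for emulating a logical AND is to append to a marker variable in several separate if blocks and then test the combined value. This is only a minimal sketch of the pattern (the variable name $need_redirect and the domain example.com are placeholders, not part of the actual rules used here):

set $need_redirect "";
if ($scheme = http) {
    set $need_redirect "A";
}
if ($http_user_agent !~* "Baiduspider") {
    set $need_redirect "${need_redirect}B";
}
# both conditions hold only when the value is "AB"
if ($need_redirect = "AB") {
    return 301 https://example.com$request_uri;
}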
The code is as follows:
set $flag 0;
if ($host != 'jerry.hk') {
    set $flag 1;
}
if ($server_port = 80) {
    set $flag 1;
}
if ($scheme = http) {
    set $flag 1;
}
if ($http_user_agent ~* "(Baiduspider|googlebot|soso|bing|sogou|yahoo|sohu-search|yodao|YoudaoBot|robozilla|msnbot|MJ12bot|NHN|Twiceler)") {
    set $flag 2;
}
if ($flag = 1) {
    rewrite ^/(.*)$ https://jerry.hk/$1 redirect;
}
error_page 497 https://jerry.hk$request_uri;
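For context, here is a minimal sketch of the kind of server block these rules might sit in. The listen directives and certificate paths are assumptions for illustration, not from the original configuration; the point is that a single block serving both port 80 and port 443 is where the non-standard 497 status ("plain HTTP request sent to HTTPS port") can occur, which is what the error_page line above catches:

server {
    listen 80;
    listen 443 ssl;
    server_name jerry.hk www.jerry.hk;

    # assumed certificate paths, for illustration only
    ssl_certificate     /etc/nginx/ssl/jerry.hk.crt;
    ssl_certificate_key /etc/nginx/ssl/jerry.hk.key;

    # ... the set/if/rewrite rules shown above go here ...

    # 497 is returned when a plain HTTP request arrives on the SSL port;
    # redirect such requests to the HTTPS address
    error_page 497 https://jerry.hk$request_uri;
}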
Effect
These rules distinguish search engine crawlers from ordinary users, with the following results:
Normal users (regular access)
1. Requests to the www domain (www.jerry.hk) are redirected to the bare domain (jerry.hk)
2. HTTPS is enforced (visitors arriving over HTTP are redirected to HTTPS)
Search engine crawlers
Crawlers can access the HTTP pages directly, without being redirected to HTTPS
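To verify which branch a given request actually takes, one option (not from the original post) is to expose the $flag variable in an access log. A minimal sketch, assuming the log_format directive is placed in the http block and using an illustrative log path:

# log the user agent, scheme, and the $flag value set by the rules above
log_format flag_check '$remote_addr "$http_user_agent" scheme=$scheme flag=$flag';
access_log /var/log/nginx/flag_check.log flag_check;

Requests with a crawler user agent should be logged with flag=2 and receive the plain HTTP page, while ordinary HTTP visitors should show flag=1 and get redirected to HTTPS.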