This article closes the book on a problem I could not solve years ago in a search engine project. The solution is not particularly elegant, but it makes up for an old regret.
Here is the scenario. Normally people type a query into the search box and search, but some users like to be clever: they copy the URL from the address bar, change a parameter, and request it directly. The URL looks like http://www.xxx.com/search?keyword=%e4%b8%ad%e6%96%87 (this is how IE displays it; Chrome and Firefox show the Chinese characters in the address bar). When the user then submits http://www.xxx.com/search?keyword=中文 under IE, the server (the web back end) does not recognize the characters at all. The reason is that when a browser submits a request to the back end, the parameters are expected to be URL-encoded (percent-encoded) following the ISO-8859-1 convention. IE sends what was typed into the address bar as-is, so for IE we have to do the conversion ourselves, whereas Chrome and Firefox encode the parameters automatically when they send the request.
The back end failing to recognize the characters is what we usually call garbled text (mojibake). The garbling is the result of a decoding error: our web container (a framework such as Java's Jetty/Tomcat/JBoss, or a Python one) automatically URL-decodes the parameter string. The unencoded characters submitted by IE get decoded as well, and as you can imagine, there is no getting them back afterwards (I wonder how many people, like me, have felt sick at the sight of this garbage).
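To make the failure concrete, here is a minimal Python sketch of what happens on the back end. It is only an illustration, not the code of any real container: backend_decode is a hypothetical stand-in for the container's decoding step, and it assumes the back end decodes as UTF-8 while IE sends the raw bytes in GBK.

```python
from urllib.parse import quote, unquote_to_bytes


def backend_decode(raw_value: bytes, charset: str = "utf-8") -> str:
    """Hypothetical stand-in for a web container: undo percent-escapes,
    then interpret the resulting bytes in the back end's charset."""
    return unquote_to_bytes(raw_value).decode(charset, errors="replace")


keyword = "中文"

# Well-behaved browser: UTF-8 bytes, percent-encoded before sending.
encoded = quote(keyword, encoding="utf-8").encode("ascii")

# Assumed IE address-bar case: raw GBK bytes, no percent-encoding at all.
raw_gbk = keyword.encode("gbk")

print(backend_decode(encoded))   # the original keyword survives
print(backend_decode(raw_gbk))   # replacement characters: the text is lost
```

Once the raw GBK bytes have been decoded as UTF-8, the original characters cannot be recovered from the result, which is why the fix has to happen before the container sees the request.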
OK, there are two ways to solve this problem. The first is to act before the request reaches the web back end (the JS layer is no help, because the user presses Enter directly in the address bar): preprocess the request on the front-end server (Nginx) and URL-encode any characters that are not encoded yet. The second is to modify and recompile the logic in the web container that decodes servlet parameters, so that it first determines whether URL-decoding is needed at all.
Given the difficulty of implementation, I chose the first: handle it in Nginx, using Lua to transcode the parameters, and then reverse-proxy the request to the web back end.
Depending on your project, there are a few things to watch out for here, such as whether your project uses UTF-8 or GBK, and whether the client environment is UTF-8 or GBK; each combination needs different handling. In my case the browsers run on Windows, so the client encoding is GBK, while my project is UTF-8, so I need to convert from GBK to UTF-8 before URL-encoding.
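The order of the two steps matters, so here is a small Python sketch of it. This is an illustration of the idea rather than what runs in my Nginx; reencode_param is a made-up helper name, and it assumes the client really sent GBK bytes.

```python
from urllib.parse import quote


def reencode_param(raw_value: bytes, client_charset: str = "gbk") -> str:
    """Convert raw client bytes to UTF-8 first, then percent-encode them."""
    text = raw_value.decode(client_charset)      # GBK bytes -> text
    return quote(text.encode("utf-8"), safe="")  # UTF-8 bytes -> %XX escapes


print(reencode_param("中文".encode("gbk")))
```

If the percent-encoding were applied directly to the GBK bytes, the back end would receive well-formed escapes that still decode to the wrong characters under UTF-8.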
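set_by_lua $arg_name '
    local name = ngx.var.arg_name
    -- a "%" means the value is already URL-encoded: leave it alone
    if not name or string.find(name, "%", 1, true) then
        return name or ""
    end
    local iconv = require("luaiconv")
    local cd = iconv.new("utf-8", "gbk")  -- to UTF-8, from GBK
    local converted, err = cd:iconv(name)
    return ngx.escape_uri(converted or name)
';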
In this scenario my parameter is called name, and I use the luaiconv library to do the conversion. Admittedly my logic is not very rigorous: I do not detect the actual encoding of the input, and I decide whether encoding is needed only by checking whether the string contains %.
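For comparison, a somewhat stricter check could look like the Python sketch below. It is not what I deployed, just one possible approach under stated assumptions: normalize_param is a hypothetical helper, raw non-ASCII bytes are taken as the sign that the value was not encoded, and the bytes are tried as UTF-8 first with GBK as an assumed fallback. It is still a heuristic, since some GBK byte sequences also happen to be valid UTF-8.

```python
from urllib.parse import quote


def normalize_param(raw: bytes, fallback_charset: str = "gbk") -> str:
    """Return a percent-encoded, UTF-8 based form of a raw parameter value."""
    # Pure ASCII input is either plain text or already percent-encoded.
    if raw.isascii():
        return raw.decode("ascii")
    # Raw non-ASCII bytes: guess the charset, preferring strict UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback_charset, errors="replace")
    # Escape only the non-ASCII characters so existing %XX escapes
    # and other ASCII characters are left untouched.
    return "".join(ch if ch.isascii() else quote(ch, safe="") for ch in text)


print(normalize_param("中文".encode("gbk")))
print(normalize_param(b"%e4%b8%ad%e6%96%87"))
```

Escaping only the non-ASCII characters avoids double-encoding a value that is partly encoded already, which a plain "contains %" test cannot distinguish.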
Three years ago, Google was the only search engine that handled Chinese typed by hand into the IE address bar; today many companies do.