First, let's look at what each of the encoding settings in JSP/Servlet does.
In JSP/Servlet, the encoding can be set in four main places: pageEncoding="UTF-8", contentType="text/html; charset=UTF-8", request.setCharacterEncoding("UTF-8") and response.setCharacterEncoding("UTF-8"). The first two can only be used in JSP; the last two can be used in both JSP and servlets.
1. pageEncoding="UTF-8" sets the encoding used when the JSP is compiled into a servlet.
As is well known, a JSP is first compiled into a servlet on the server. pageEncoding="UTF-8" tells the JSP compiler which encoding to use when reading the JSP file while compiling it into a servlet. Much of the garbled text that is defined in the JSP itself (written directly in the JSP, as opposed to data submitted from the browser) is caused by setting this parameter incorrectly. For example, if your JSP file is saved in GBK but declares pageEncoding="UTF-8", the text defined inside the JSP will be garbled.
In addition, when the JSP neither specifies contentType nor calls response.setCharacterEncoding, this parameter also determines the encoding of the server response.
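To make the pageEncoding mismatch concrete, here is a minimal sketch that imitates only the compiler's read step with plain JDK calls. The class and method names (PageEncodingMismatch, saveFile, compile) are hypothetical, and it assumes the JDK includes the GBK charset, as most full JDK builds do. The GBK bytes of "汉字" are read back once as GBK and once as UTF-8.

```java
import java.nio.charset.Charset;

// Hypothetical illustration: models only the step where the JSP compiler
// reads the source file using the declared pageEncoding.
class PageEncodingMismatch {

    // Bytes as the editor wrote them to disk in fileCharset.
    static byte[] saveFile(String text, String fileCharset) {
        return text.getBytes(Charset.forName(fileCharset));
    }

    // The compiler turns those bytes back into text using pageEncoding.
    static String compile(byte[] fileBytes, String pageEncoding) {
        return new String(fileBytes, Charset.forName(pageEncoding));
    }

    public static void main(String[] args) {
        byte[] onDisk = saveFile("汉字", "GBK");
        // Declared encoding matches the file: the text survives.
        System.out.println(compile(onDisk, "GBK"));
        // File is GBK but pageEncoding says UTF-8: the text comes out garbled.
        System.out.println(compile(onDisk, "UTF-8"));
    }
}
```

Only the second call corresponds to the misconfigured page described above; the fix is to make the declared pageEncoding match the encoding the file was actually saved in.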
2. contentType="text/html; charset=UTF-8" specifies the encoding of the server response.
If response.setCharacterEncoding is not called, this parameter determines the encoding of the server response.
3. request.setCharacterEncoding("UTF-8") sets the encoding used to decode client requests.
This method specifies the encoding used to decode the data sent by the browser.
4. response.setCharacterEncoding("UTF-8") specifies the encoding of the server response.
The server uses this encoding to encode the data before sending it to the browser.
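As a small illustration of this step, the sketch below (hypothetical class name ResponseEncoding, plain JDK only) turns the same text into bytes with different response encodings. The ISO-8859-1 case is included because it is the Servlet default when no encoding is set; it cannot represent Chinese characters, so String.getBytes substitutes '?' for each one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the server turning response text into bytes.
class ResponseEncoding {

    static byte[] encodeResponse(String body, String responseCharset) {
        return body.getBytes(Charset.forName(responseCharset));
    }

    public static void main(String[] args) {
        System.out.println(encodeResponse("汉字", "UTF-8").length); // 6: three bytes per character
        System.out.println(encodeResponse("汉字", "GBK").length);   // 4: two bytes per character
        // ISO-8859-1 has no Chinese characters; each one is replaced by '?'
        byte[] latin1 = encodeResponse("汉字", "ISO-8859-1");
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??
    }
}
```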
Next, let's look at how the browser encodes the data it receives and sends.
response.setCharacterEncoding("UTF-8") specifies the encoding of the server response, and the browser also uses this value (sent to it in the Content-Type response header) to decode the data it receives. Therefore, whether you call response.setCharacterEncoding("UTF-8") or response.setCharacterEncoding("GBK") in the JSP, the browser displays Chinese characters correctly, provided that the data you send is itself encoded correctly (for example, because pageEncoding is set correctly). You can try this yourself: call response.setCharacterEncoding("UTF-8") in a JSP, open the page in IE and choose "View (V)" > "Encoding (D)" from the menu, and you will see "Unicode (UTF-8)". Then call response.setCharacterEncoding("GBK") instead, open the page in IE, and the same menu shows "Simplified Chinese (GB2312)".
When the browser sends data, it URL-encodes the URL and the parameters, and it uses the same response encoding to encode any Chinese characters in the parameters. Take Baidu and Google as examples: searching for "汉字" ("Chinese characters") on Baidu sends "%BA%BA%D7%D6", while the same search on Google sends "%E6%B1%89%E5%AD%97". This is because Baidu's response encoding is GBK, while Google's is UTF-8.
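The two query strings above can be reproduced with java.net.URLEncoder, which percent-encodes a string using whatever charset it is given. This is only an approximation of what a browser does (browsers apply their own form-encoding rules), and the class name BrowserUrlEncoding is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Approximates how a browser percent-encodes a query parameter using the
// page's encoding. Class and method names are hypothetical.
class BrowserUrlEncoding {

    static String encodeQuery(String text, String pageCharset) {
        try {
            return URLEncoder.encode(text, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeQuery("汉字", "GBK"));   // %BA%BA%D7%D6
        System.out.println(encodeQuery("汉字", "UTF-8")); // %E6%B1%89%E5%AD%97
    }
}
```

Each Chinese character becomes two bytes in GBK and three in UTF-8, which is why the Google query string is longer.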
The browser uses the same encoding to decode data received from the server and to encode data sent to it. By default this is the response encoding of the JSP page (set by response.setCharacterEncoding, or by the contentType and pageEncoding parameters); we call it the browser encoding. You can change the browser encoding manually in IE ("View (V)" > "Encoding (D)"), but on a page that already displays correctly this normally produces garbled text. An interesting experiment: open Google's home page in IE and switch the browser encoding to "Simplified Chinese (GB2312)". The Chinese text on the page becomes garbled; ignore that, type "汉字" into the search box and submit it. Google receives it encoded as "%BA%BA%D7%D6". This shows that the browser uses the browser encoding when it URL-encodes Chinese characters.
Now that we know how the browser encodes data when receiving and sending it, let's look at how the server encodes data when receiving and sending it.
When sending data, the server encodes it with the first encoding that is set, in this priority order: response.setCharacterEncoding, then contentType, then pageEncoding.
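The priority order can be written as a tiny decision function. This is a hypothetical, simplified model of the rule stated above, not Tomcat's or Jasper's actual code; the names ResponseCharsetResolver, charsetOf and resolve are made up for illustration, and the ISO-8859-1 fallback (the JSP/Servlet default when nothing is set) is an assumption added here.

```java
// Hypothetical, simplified model of the response-encoding priority.
// Not Tomcat's or Jasper's actual code.
class ResponseCharsetResolver {

    // Extracts the charset parameter from a value such as "text/html; charset=UTF-8".
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                return param.substring(eq + 1).trim();
            }
        }
        return null; // no charset parameter present
    }

    // Priority: response.setCharacterEncoding > contentType charset > pageEncoding.
    // ISO-8859-1 is assumed as the fallback when none of them is set.
    static String resolve(String setCharacterEncoding, String contentType, String pageEncoding) {
        if (setCharacterEncoding != null) {
            return setCharacterEncoding;
        }
        String fromContentType = charsetOf(contentType);
        if (fromContentType != null) {
            return fromContentType;
        }
        if (pageEncoding != null) {
            return pageEncoding;
        }
        return "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "text/html; charset=UTF-8", "GBK"));   // UTF-8
        System.out.println(resolve("GBK", "text/html; charset=UTF-8", "UTF-8")); // GBK
    }
}
```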
There are three cases for receiving data: data submitted directly in the URL, and data submitted by a form using GET or POST.
Because web servers handle these three cases differently, we take Tomcat 5.0 as an example.
Whichever method is used, if the parameters contain Chinese characters, the browser URL-encodes them using its current browser encoding.
For data submitted by a form with POST, it is enough to call request.setCharacterEncoding with the browser encoding in the JSP that receives the data (before any parameter is read); the parameters are then decoded correctly. How do you know the browser encoding? As explained above, by default it is the value set by response.setCharacterEncoding in the JSP page that was sent back in response to the earlier request. Therefore, for POST form data, request.setCharacterEncoding must be set to the same value as response.setCharacterEncoding on the JSP page that generated the form.
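A rough simulation of the POST case, under stated assumptions: URLEncoder stands in for the browser encoding the form field with the form page's encoding, and URLDecoder stands in for the container decoding the body with the value given to request.setCharacterEncoding. The class and parameter names are hypothetical; no servlet container is involved.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical simulation of a POST form round trip.
class PostFormRoundTrip {

    // formPageCharset: response encoding of the page that rendered the form
    //                  (the browser encoding).
    // requestCharset:  value passed to request.setCharacterEncoding on the
    //                  receiving page.
    static String submit(String value, String formPageCharset, String requestCharset) {
        try {
            String body = URLEncoder.encode(value, formPageCharset); // browser side
            return URLDecoder.decode(body, requestCharset);          // server side
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(submit("汉字", "UTF-8", "UTF-8")); // encodings match
        System.out.println(submit("汉字", "UTF-8", "GBK"));   // mismatch: garbled
    }
}
```

Only when the two charsets agree does the original text come back, which is the rule stated in the paragraph above.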
For data submitted in the URL and by GET forms, calling request.setCharacterEncoding in the receiving JSP has no effect, because by default Tomcat 5.0 decodes such data with ISO-8859-1 and ignores that setting. To fix this, set the useBodyEncodingForURI or URIEncoding attribute on the Connector element in Tomcat's configuration file (server.xml). useBodyEncodingForURI controls whether the value given to request.setCharacterEncoding is also used to decode data submitted in the URL and by GET forms; it defaults to false (in Tomcat 4.0 the default was true). URIEncoding specifies a single encoding for decoding all GET requests, including data submitted in the URL and by GET forms.
The difference is that URIEncoding decodes the data of every GET request with the same encoding, while useBodyEncodingForURI decodes according to request.setCharacterEncoding on the page handling the request, so different pages can use different encodings. Therefore, for data submitted in the URL or by GET forms, either set URIEncoding to the browser encoding, or set useBodyEncodingForURI to true and call request.setCharacterEncoding with the browser encoding in the JSP that receives the data.
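The sketch below puts the Tomcat 5.0 behavior described above into code. queryCharset is a hypothetical, simplified model of how the charset for URL/GET parameters is chosen; it is not Tomcat's implementation, and combinations the text does not describe (for example both attributes set) may behave differently in a real server. repair shows a widely used workaround that is not part of the text above: because ISO-8859-1 maps every byte to exactly one character, a value already decoded that way can be turned back into its original bytes and decoded again with the browser encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TomcatQueryDecoding {

    // Hypothetical, simplified model of the charset chosen for URL/GET parameters.
    static String queryCharset(String uriEncoding, boolean useBodyEncodingForURI,
                               String requestCharset) {
        if (useBodyEncodingForURI && requestCharset != null) {
            return requestCharset; // per-page value from request.setCharacterEncoding
        }
        if (uriEncoding != null) {
            return uriEncoding;    // one encoding for every GET request
        }
        return "ISO-8859-1";       // Tomcat 5.0 default
    }

    static String decode(String rawValue, String charset) {
        try {
            return URLDecoder.decode(rawValue, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Workaround: recover the original bytes from an ISO-8859-1 decoding,
    // then decode them with the browser encoding.
    static String repair(String decodedAsLatin1, String browserCharset) {
        byte[] raw = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(browserCharset));
    }

    public static void main(String[] args) {
        String raw = "%E6%B1%89%E5%AD%97"; // "汉字" sent from a UTF-8 page
        String garbled = decode(raw, queryCharset(null, false, "UTF-8")); // default ISO-8859-1
        System.out.println(garbled);
        System.out.println(repair(garbled, "UTF-8"));                        // 汉字
        System.out.println(decode(raw, queryCharset("UTF-8", false, null))); // 汉字
    }
}
```

Configuring the connector, as the text recommends, avoids the garbled value in the first place; the repair trick only helps when the configuration cannot be changed.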
The following summarizes how to avoid garbled Chinese characters when Tomcat is used as the web server.
1. Use a single encoding throughout an application; UTF-8 is recommended, though GBK also works.
2. Set the JSP pageEncoding parameter correctly.
3. Set contentType="text/html; charset=UTF-8" or call response.setCharacterEncoding("UTF-8") in every JSP/servlet, which indirectly sets the browser encoding.
4. For requests, use a filter or call request.setCharacterEncoding("UTF-8") in every JSP/servlet. In addition, change Tomcat's default configuration: setting useBodyEncodingForURI to true is recommended. You can also set URIEncoding to UTF-8, but this may affect other applications, so it is not recommended.