pageEncoding and charset=UTF-8 in JSP
In a JSP or servlet there are four places where an encoding can be set: pageEncoding="UTF-8", contentType="text/html;charset=UTF-8", request.setCharacterEncoding("UTF-8"), and response.setCharacterEncoding("UTF-8"). The first two are page directive attributes and can be used only in JSPs; the last two can be used in both JSPs and servlets.
1. pageEncoding="UTF-8" sets the encoding used when the JSP is compiled into a servlet.
A JSP is compiled into a servlet on the server before it runs. pageEncoding="UTF-8" tells the JSP compiler which encoding to use when it reads the JSP file for that compilation. When strings defined directly in the JSP (as opposed to data submitted by the browser) come out garbled, a wrong value for this attribute is usually the cause. For example, if the JSP file is saved in GBK but declares pageEncoding="UTF-8", the strings defined in the JSP will be garbled.
This attribute has a second role: if the JSP neither specifies contentType nor calls response.setCharacterEncoding, the pageEncoding value is also used to encode the server response.
2. contentType="text/html;charset=UTF-8" specifies the encoding of the server response.
When response.setCharacterEncoding is not called, the charset given in this attribute is the encoding the server uses to encode data before sending it to the browser.
3. request.setCharacterEncoding("UTF-8") sets the encoding used to decode the client request.
This method specifies the encoding the server uses to decode the data sent by the browser.
4. response.setCharacterEncoding("UTF-8") specifies the encoding of the server response.
The server uses this encoding to encode data before sending it to the browser.
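The mismatch described under item 1 can be reproduced outside a JSP container. The following is a minimal sketch, not what a JSP compiler actually runs: it assumes the JDK provides the GBK charset, and the class name PageEncodingMismatch and its readAs method are made up for illustration. It writes a string as GBK bytes (the file as stored) and reads those bytes back as UTF-8 (the declared pageEncoding).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and methods are illustrative only.
class PageEncodingMismatch {
    // The two characters U+6C49 U+5B57 ("Chinese characters"), written as
    // escapes so that this source file is itself encoding-neutral.
    static final String TEXT = "\u6c49\u5b57";

    // Simulates a file saved in `stored` being read with the `declared` encoding.
    static String readAs(String text, Charset stored, Charset declared) {
        byte[] fileBytes = text.getBytes(stored);
        return new String(fileBytes, declared);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String wrong = readAs(TEXT, gbk, StandardCharsets.UTF_8);
        String right = readAs(TEXT, gbk, gbk);
        System.out.println("stored GBK, declared UTF-8, intact: " + wrong.equals(TEXT));
        System.out.println("stored GBK, declared GBK, intact:   " + right.equals(TEXT));
    }
}
```

Only the run where the declared encoding matches the stored encoding returns the original string.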
Second, how the browser encodes the data it receives and sends
response.setCharacterEncoding("UTF-8") specifies the encoding of the server response, and the browser uses the same value to decode the data it receives. So whether the JSP sets response.setCharacterEncoding("UTF-8") or response.setCharacterEncoding("GBK"), the browser can display Chinese correctly (provided the data sent to the browser is itself encoded correctly, for example because pageEncoding is set correctly). Readers can try an experiment: set response.setCharacterEncoding("UTF-8") in a JSP, open the page in IE, and choose "View (V)" → "Encoding (D)" from the IE menu; the selected entry is "Unicode (UTF-8)". Set response.setCharacterEncoding("GBK") instead, and the same menu shows "Simplified Chinese (GB2312)".
When the browser sends data, the URL and its parameters are URL-encoded, and Chinese characters in the parameters are encoded using the response.setCharacterEncoding value of the page. Take Baidu and Google as examples. Searching for "汉字" ("Chinese characters") on Baidu sends it as "%ba%ba%d7%d6", while the same search on Google sends it as "%e6%b1%89%e5%ad%97". This is because Baidu's response.setCharacterEncoding value is GBK and Google's is UTF-8.
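The two encoded forms quoted above can be checked with the JDK's own URL encoder. This is a small sketch of the byte-level effect only; it does not model a browser, and the class name UrlEncodingDemo and its encode helper are made up for illustration. Note that java.net.URLEncoder emits uppercase hex digits, which are equivalent to the lowercase ones in the text.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; the name and methods are illustrative only.
class UrlEncodingDemo {
    // The search term: U+6C49 U+5B57 ("Chinese characters").
    static final String QUERY = "\u6c49\u5b57";

    // URL-encodes the text with the named charset, wrapping the checked exception.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("GBK:   " + encode(QUERY, "GBK"));   // %BA%BA%D7%D6
        System.out.println("UTF-8: " + encode(QUERY, "UTF-8")); // %E6%B1%89%E5%AD%97
    }
}
```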
By default, the browser uses the same encoding to receive data from the server and to send data to it: the response.setCharacterEncoding value (or, failing that, the contentType or pageEncoding value) of the JSP page. We call this the browser encoding. In IE the browser encoding can be changed by hand (choose "View (V)" → "Encoding (D)" from the menu), but doing so usually garbles a page that was displaying correctly. An interesting example: open Google's home page in IE and change the browser encoding to "Simplified Chinese (GB2312)". The Chinese on the page becomes garbled; ignore that, type "汉字" in the text box, and submit. The term is now sent to Google as "%ba%ba%d7%d6", which shows that the browser uses the browser encoding when it URL-encodes Chinese.
Now that we have seen how the browser encodes data when receiving and sending it, let's look at how the server does the same.
When sending data, the server chooses the encoding in this order of precedence: response.setCharacterEncoding, then contentType, then pageEncoding.
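The order of precedence can be written down as a small lookup. This is only a model of the rule as stated above, not container code: the class ResponseEncodingResolver, its resolve method, and the use of null for "not set" are all assumptions made for illustration, as is the ISO-8859-1 fallback for the case where nothing is set.

```java
// Hypothetical model of the precedence rule; not part of any servlet API.
class ResponseEncodingResolver {
    // Assumed fallback when none of the three settings is present.
    static final String DEFAULT = "ISO-8859-1";

    // Each argument is the charset taken from that setting, or null if it is not set.
    static String resolve(String setCharacterEncoding, String contentTypeCharset, String pageEncoding) {
        if (setCharacterEncoding != null) {
            return setCharacterEncoding;   // highest precedence
        }
        if (contentTypeCharset != null) {
            return contentTypeCharset;     // used when the method was not called
        }
        if (pageEncoding != null) {
            return pageEncoding;           // last resort among the three
        }
        return DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(resolve("GBK", "UTF-8", "UTF-8"));
        System.out.println(resolve(null, "UTF-8", "GBK"));
        System.out.println(resolve(null, null, "GBK"));
    }
}
```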
There are three cases for receiving data. One is data that the browser submits directly in the URL; the other two are data submitted by a form with the GET method and with the POST method.
Because web servers differ in how they handle these three cases, we take Tomcat 5.0 as the example.
Whichever of the three ways is used, if a parameter contains Chinese, the browser URL-encodes it using the current browser encoding.
For data submitted by a form with POST, it is enough to set request.setCharacterEncoding correctly in the JSP that receives the data, that is, to set the encoding used to decode the client request to the browser encoding; this guarantees that the parameters are decoded correctly. Some readers may ask how to find out the browser encoding. As mentioned above, by default the browser encoding is the response.setCharacterEncoding value of the JSP page that produced the response. So for POST form data, the JSP that receives the data should set request.setCharacterEncoding to the same value as the response.setCharacterEncoding of the JSP page that generated the form.
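The requirement that the two values match can be illustrated as a round trip. This is a sketch under simplifying assumptions: the browser side is reduced to java.net.URLEncoder, the server side to java.net.URLDecoder, and the class PostRoundTrip with its method names is made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; the name and methods are illustrative only.
class PostRoundTrip {
    // The submitted value: U+6C49 U+5B57 ("Chinese characters").
    static final String VALUE = "\u6c49\u5b57";

    // Browser side: encode with the encoding of the page that contained the form.
    static String browserEncode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Server side: decode with the value passed to request.setCharacterEncoding.
    static String serverDecode(String encoded, String requestCharset) {
        try {
            return URLDecoder.decode(encoded, requestCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True when the value arrives unchanged after encode and decode.
    static boolean survives(String value, String pageCharset, String requestCharset) {
        return serverDecode(browserEncode(value, pageCharset), requestCharset).equals(value);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 page, UTF-8 request: " + survives(VALUE, "UTF-8", "UTF-8"));
        System.out.println("UTF-8 page, GBK request:   " + survives(VALUE, "UTF-8", "GBK"));
    }
}
```

The value survives only when the request encoding equals the encoding of the page that produced the form.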
For data submitted in the URL and data submitted by a form with GET, setting request.setCharacterEncoding in the receiving JSP has no effect, because Tomcat 5.0 by default decodes both kinds of data with ISO-8859-1 and ignores this setting. To resolve this, set the useBodyEncodingForURI or URIEncoding attribute on the Connector element in the Tomcat configuration file.
useBodyEncodingForURI indicates whether data submitted in the URL and by GET forms is decoded with the request.setCharacterEncoding value; it is false by default (in Tomcat 4.0 it was true by default). URIEncoding specifies a single encoding used to decode all GET requests, covering both data submitted in the URL and data submitted by GET forms. The difference between the two is that URIEncoding applies one encoding to all GET data, whereas useBodyEncodingForURI decodes according to the request.setCharacterEncoding value of the requested page, so different pages can use different encodings.
So for data submitted in the URL and by GET forms, either set URIEncoding to the browser encoding, or set useBodyEncodingForURI to true and set request.setCharacterEncoding to the browser encoding in the JSP that receives the data.
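The effect of the two connector settings on GET data can be modeled in a few lines. This is a simplified sketch of the behavior described above, not Tomcat's implementation: the class GetDecodingDemo, its method names, and the use of null for an unset value are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical model of how the charset for GET data is chosen; not Tomcat code.
class GetDecodingDemo {
    // U+6C49 U+5B57 and its UTF-8 URL-encoded form, as sent by a UTF-8 page.
    static final String TEXT = "\u6c49\u5b57";
    static final String ENCODED = "%E6%B1%89%E5%AD%97";

    // uriEncoding and requestEncoding are null when not configured or not set.
    static String queryCharset(String uriEncoding, boolean useBodyEncodingForURI, String requestEncoding) {
        if (useBodyEncodingForURI && requestEncoding != null) {
            return requestEncoding;    // follow request.setCharacterEncoding
        }
        if (uriEncoding != null) {
            return uriEncoding;        // one encoding for every GET request
        }
        return "ISO-8859-1";           // the Tomcat 5.0 default described above
    }

    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String byDefault = decode(ENCODED, queryCharset(null, false, "UTF-8"));
        String withBody = decode(ENCODED, queryCharset(null, true, "UTF-8"));
        String withUri = decode(ENCODED, queryCharset("UTF-8", false, null));
        System.out.println("default settings intact:       " + byDefault.equals(TEXT));
        System.out.println("useBodyEncodingForURI intact:  " + withBody.equals(TEXT));
        System.out.println("URIEncoding=UTF-8 intact:      " + withUri.equals(TEXT));
    }
}
```

With the defaults, the request encoding is ignored and the parameter comes back garbled; either setting restores it.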
The following summarizes how to prevent garbled Chinese when Tomcat 5.0 is the web server.
1. Use a single encoding throughout an application. UTF-8 is recommended, though GBK also works.
2. Set the pageEncoding attribute of each JSP correctly.
3. Set contentType="text/html;charset=UTF-8" or call response.setCharacterEncoding("UTF-8") in every JSP and servlet; this indirectly sets the browser encoding.
4. For requests, use a filter or call request.setCharacterEncoding("UTF-8") in every JSP and servlet. Also change Tomcat's default configuration: setting useBodyEncodingForURI to true is recommended; alternatively, set URIEncoding to UTF-8 (this may affect other applications, so it is not recommended).