Functions and principles of several encodings in JSP and Servlet

Source: Internet
Author: User

First, let's talk about the functions of several encodings in JSP and Servlet.

In JSP and Servlet, encoding can be set in four main places: pageEncoding="UTF-8", contentType="text/html; charset=UTF-8", request.setCharacterEncoding("UTF-8"), and response.setCharacterEncoding("UTF-8"). The first two can only be used in JSP; the last two can be used in both JSP and Servlet.

1. pageEncoding="UTF-8" sets the encoding used when the JSP is compiled into a Servlet.

As we all know, a JSP is first compiled into a Servlet on the server. pageEncoding="UTF-8" tells the JSP compiler which encoding to use to read the JSP file when compiling it into a Servlet. (This concerns strings written directly in the JSP, not data submitted from a browser.) Many garbled-character problems are caused by setting this parameter incorrectly. For example, if your JSP file is saved in GBK but the JSP specifies pageEncoding="UTF-8", the strings defined inside the JSP will be garbled.

In addition, this parameter also specifies the encoding of the server response when the JSP neither specifies the contentType parameter nor calls response.setCharacterEncoding.

2. contentType="text/html; charset=UTF-8" specifies the encoding of the server response. It takes effect when response.setCharacterEncoding is not called.

3. request.setCharacterEncoding("UTF-8") sets the encoding of the client request. The server uses it to decode the data sent by the browser.

4. response.setCharacterEncoding("UTF-8") specifies the encoding of the server response. The server uses it to encode the data before sending it to the browser.
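
The mismatch described in item 1 can be reproduced outside a JSP container. The following is a minimal sketch in plain Java, not how a JSP compiler is actually implemented: the class and method names are hypothetical, and it assumes the JDK in use provides the GBK charset. It stores a string as GBK bytes, standing in for a JSP file saved in GBK, and then decodes those bytes with the declared encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: imitates reading a JSP file with a declared pageEncoding.
class PageEncodingMismatchDemo {

    // The two-character Chinese word for "Chinese characters", written with
    // Unicode escapes so this source file does not depend on its own encoding.
    static final String TEXT = "\u6c49\u5b57";

    // Bytes as they would be stored on disk if the file were saved in savedAs.
    static byte[] saveAs(String text, Charset savedAs) {
        return text.getBytes(savedAs);
    }

    // Text as a compiler would read it if told the file uses the declared charset.
    static String readAs(byte[] fileBytes, Charset declared) {
        return new String(fileBytes, declared);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        byte[] onDisk = saveAs(TEXT, gbk);

        // Declared encoding matches the saved encoding: the text survives.
        System.out.println("declared GBK intact:   " + TEXT.equals(readAs(onDisk, gbk)));
        // Declared encoding is UTF-8 but the file is GBK: the text is garbled.
        System.out.println("declared UTF-8 intact: "
                + TEXT.equals(readAs(onDisk, StandardCharsets.UTF_8)));
    }
}
```

The bytes on disk never change; only the declared encoding decides whether they are read back as the intended characters.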

Next, let's look at how the browser decodes the data it receives and encodes the data it sends.

response.setCharacterEncoding("UTF-8") specifies the encoding of the server response, and the browser also uses this value to decode the data it receives. Therefore, whether you set response.setCharacterEncoding("UTF-8") or response.setCharacterEncoding("GBK") in a JSP, the browser displays Chinese correctly as long as the data sent to the browser is itself correctly encoded (for example, the pageEncoding parameter is set correctly). You can try an experiment: set response.setCharacterEncoding("UTF-8") in a JSP, display the page in IE, and choose "View (V)" -> "Encoding (D)" in the IE menu; you will see "Unicode (UTF-8)". Then set response.setCharacterEncoding("GBK") in the JSP, display the page in IE, and choose the same menu item; you will see "Simplified Chinese (GB2312)".

When the browser sends data, it URL-encodes the URL and its parameters. The browser uses the response.setCharacterEncoding value of the page to encode the Chinese characters in the parameters. Take Baidu and Google as examples. If you search for "汉字" ("Chinese characters") in Baidu, the term is encoded as "%BA%BA%D7%D6". If you search for the same term in Google, it is encoded as "%E6%B1%89%E5%AD%97". This is because Baidu's response.setCharacterEncoding parameter is GBK, while Google's is UTF-8.
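
The two encoded forms can be reproduced with the JDK's own URL encoder. This is a small sketch rather than a model of any real browser: the class name is hypothetical, it assumes Java 10 or later (for the URLEncoder.encode overload that takes a Charset), and it assumes the GBK charset is available.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: URL-encodes one string under two "browser encodings".
class UrlEncodingDemo {

    // The two-character Chinese word for "Chinese characters", as Unicode escapes.
    static final String TEXT = "\u6c49\u5b57";

    // Percent-encodes text after converting it to bytes in the given charset.
    static String encode(String text, Charset browserEncoding) {
        return URLEncoder.encode(text, browserEncoding);
    }

    public static void main(String[] args) {
        // GBK uses two bytes per character here, so four escapes are produced.
        System.out.println(encode(TEXT, Charset.forName("GBK")));   // %BA%BA%D7%D6
        // UTF-8 uses three bytes per character here, so six escapes are produced.
        System.out.println(encode(TEXT, StandardCharsets.UTF_8));   // %E6%B1%89%E5%AD%97
    }
}
```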

The browser uses the same encoding to receive data from the server and to send data to the server. By default this is the response.setCharacterEncoding parameter (or the contentType or pageEncoding parameter) of the JSP page. We call it the browser encoding. Of course, you can change the browser encoding in IE by choosing "View (V)" -> "Encoding (D)" in the IE menu, but changing it usually makes a correctly displayed page garbled. An interesting example: while browsing Google's home page in IE, change the browser encoding to "Simplified Chinese (GB2312)". The Chinese characters on the page become garbled. Ignore that, enter "汉字" in the text box, and submit it. The term is now encoded as "%BA%BA%D7%D6". This shows that the browser uses the browser encoding when it URL-encodes Chinese characters.

Now that we understand how the browser encodes data when receiving and sending, let's look at how the server does it.

When sending data, the server encodes it according to the following priority: response.setCharacterEncoding, then contentType, then pageEncoding.
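
The priority order can be written down as a simple lookup. The helper below is purely illustrative and is not part of the Servlet or JSP API: the class name, the method name, and the use of null to mean "not set" are all assumptions, as is the ISO-8859-1 fallback, which reflects the usual container default when nothing is specified.

```java
// Hypothetical helper: picks the response encoding in the order described above.
class ResponseEncodingPriority {

    // Assumed fallback when none of the three settings is present.
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Each argument is the charset from one setting, or null if that setting is absent.
    static String resolve(String responseCharset, String contentTypeCharset, String pageEncoding) {
        if (responseCharset != null) {
            return responseCharset;      // response.setCharacterEncoding wins
        }
        if (contentTypeCharset != null) {
            return contentTypeCharset;   // then the charset in contentType
        }
        if (pageEncoding != null) {
            return pageEncoding;         // then pageEncoding
        }
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        System.out.println(resolve("GBK", "UTF-8", "UTF-8"));
        System.out.println(resolve(null, "UTF-8", "GBK"));
        System.out.println(resolve(null, null, "GBK"));
    }
}
```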

There are three cases when receiving data: data submitted by the browser directly in the URL, data submitted by a form using GET, and data submitted by a form using POST.

Because different web servers handle these three cases differently, we take Tomcat 5.0 as an example.

Regardless of the submission method, if the parameters contain Chinese characters, the browser URL-encodes them using the current browser encoding.

For data submitted by a form using POST, it is enough to set request.setCharacterEncoding correctly in the JSP that receives the data, that is, to set the encoding used to decode the client request to the browser encoding. This ensures that the parameters are decoded correctly. Some readers may ask how to find out the browser encoding. As mentioned above, by default the browser encoding is the value set by response.setCharacterEncoding in the JSP page that produced the page shown in the browser. Therefore, for data submitted by a POST form, request.setCharacterEncoding in the JSP that receives the data must be set to the same value as response.setCharacterEncoding in the JSP page that generated the form.
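
The need for matching encodings can be seen by decoding a URL-encoded form value by hand. The sketch below only imitates what a container does with a POST body and is not Tomcat's implementation: the class and method names are hypothetical, it assumes Java 10 or later (for the URLDecoder.decode overload that takes a Charset), and it assumes the GBK charset is available.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes one form value under a chosen request encoding.
class PostBodyDecodingDemo {

    // The value a browser would send for the Chinese word "\u6c49\u5b57"
    // when the form page was served as UTF-8.
    static final String RAW_VALUE = "%E6%B1%89%E5%AD%97";

    // Stands in for the effect of request.setCharacterEncoding(requestEncoding).
    static String decodeParameter(String rawValue, Charset requestEncoding) {
        return URLDecoder.decode(rawValue, requestEncoding);
    }

    public static void main(String[] args) {
        String expected = "\u6c49\u5b57";

        // Request encoding equals the browser encoding: the value is recovered.
        System.out.println(expected.equals(decodeParameter(RAW_VALUE, StandardCharsets.UTF_8)));
        // Request encoding differs from the browser encoding: the value is garbled.
        System.out.println(expected.equals(decodeParameter(RAW_VALUE, Charset.forName("GBK"))));
    }
}
```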

For data submitted in the URL and data submitted by a form using GET, setting request.setCharacterEncoding in the JSP that receives the data has no effect. In Tomcat 5.0, ISO-8859-1 is used by default to decode such data, and request.setCharacterEncoding is not consulted. To solve this problem, set the useBodyEncodingForURI or URIEncoding attribute on the Connector tag in the Tomcat configuration file.

The useBodyEncodingForURI attribute indicates whether the request.setCharacterEncoding value is used to decode data submitted in the URL and by GET forms. It defaults to false (in Tomcat 4.0 it defaulted to true). The URIEncoding attribute specifies a single encoding for decoding all GET requests, including data submitted in the URL and by GET forms. The difference between the two is that URIEncoding applies one encoding to all GET requests, whereas useBodyEncodingForURI decodes according to the request.setCharacterEncoding value on the page that handles the request, so different pages can use different encodings.

Therefore, for URL-submitted data and GET-submitted form data, either set URIEncoding to the browser encoding, or set useBodyEncodingForURI to true and set request.setCharacterEncoding to the browser encoding in the JSP that receives the data.
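
What the ISO-8859-1 default does to a GET parameter can be imitated in plain Java. This is an illustration under stated assumptions, not Tomcat code: the class and method names are hypothetical, and the browser encoding is assumed to be UTF-8. It also shows that the original bytes survive the wrong decoding, which is why correcting the decoder's encoding is enough to fix the problem.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: imitates a GET parameter decoded with the wrong charset.
class Latin1DecodingDemo {

    // What the application sees when bytes sent in browserEncoding
    // are decoded as ISO-8859-1 by the server.
    static String decodeAsLatin1(String sentText, Charset browserEncoding) {
        byte[] wireBytes = sentText.getBytes(browserEncoding);
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte to exactly one character, so the original
    // bytes can be recovered and decoded again with the right charset.
    static String redecode(String garbled, Charset browserEncoding) {
        byte[] wireBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, browserEncoding);
    }

    public static void main(String[] args) {
        String sent = "\u6c49\u5b57"; // the Chinese word for "Chinese characters"
        String garbled = decodeAsLatin1(sent, StandardCharsets.UTF_8);

        System.out.println(sent.equals(garbled));                                     // garbled text differs
        System.out.println(sent.equals(redecode(garbled, StandardCharsets.UTF_8)));   // bytes were preserved
    }
}
```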

The following is a summary of how to prevent garbled Chinese characters when Tomcat is used as the web server.

1. Within one application, it is best to use a single encoding throughout. UTF-8 is recommended; GBK also works.

2. Set the pageEncoding parameter of each JSP correctly.

3. Set contentType="text/html; charset=UTF-8" or call response.setCharacterEncoding("UTF-8") in all JSPs and Servlets to set the browser encoding indirectly.

4. For requests, use a filter or call request.setCharacterEncoding("UTF-8") in every JSP and Servlet. Also change Tomcat's default configuration: setting useBodyEncodingForURI to true is recommended. Setting URIEncoding to UTF-8 also works, but it may affect other applications, so it is not recommended.
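
The Tomcat setting mentioned in item 4 goes on the Connector element of the server configuration. The fragment below is only a sketch: the port and protocol values are placeholders, and a real configuration will carry additional attributes.

```xml
<!-- conf/server.xml (fragment); port and protocol are illustrative values -->
<Connector port="8080" protocol="HTTP/1.1"
           useBodyEncodingForURI="true" />

<!-- Alternative: one fixed encoding for every GET request on this connector:
     <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" /> -->
```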
