Solving garbled Chinese text in Java (6): encoding and decoding in Java Web


In the previous post, I covered Java encoding and decoding in two scenarios, I/O and memory. In both scenarios, as long as we set the correct charset when encoding and decoding, garbled characters generally do not appear. For Java developers, the most common source of garbled text is the web layer. First, let's look at where encoding conversion takes place in Java Web.

Encoding & Decoding

Here is where encoding conversion takes place in a Java web application:

The user sends an HTTP request to the server, and the URL, cookies, and parameters must be encoded before they are sent. When the server receives and parses the request, it decodes the URL, cookies, and parameters. While running its business logic, the server may read from databases, local files, or other resources on the network, and each of these steps also involves encoding and decoding. When processing is complete, the server encodes the response data and sends it to the client, and the browser decodes it and displays it to the user. Many encoding and decoding steps are involved, and garbled characters most often arise in the interaction between the server and the client.

In short: the page encodes data and sends it to the server; the server decodes it, runs its business logic, encodes the result, and sends it back; the client decodes the result and displays it to the user. The rest of this post walks through encoding and decoding in Java Web.

Request

A client can send a request to the server in essentially four ways:

1. Direct URL access.

2. Page Link.

3. Form GET submission.

4. Form POST submission.

URL

For a URL, if all the URLs are in English, there is no problem. If there is a Chinese character, encoding is required. How to encode it? According to what rules? How to decode it? The following LZ will answer each question! First, let's look at the URL components:

In this URL, the browser will encode path and parameter. To better interpret the encoding process, use the following URL:

Http: // 127.0.0.1: 8080/perbank/I am cm? Name = My name is cm

Enter this address in the browser's address bar and inspect the HTTP request headers to see how the browser encodes it. The encoding of "我是" ("I am") in IE, Firefox, and Chrome is as follows:

Browser    Path section         Query string
Firefox    E6 88 91 E6 98 AF    E6 88 91 E6 98 AF
Chrome     E6 88 91 E6 98 AF    E6 88 91 E6 98 AF
IE         E6 88 91 E6 98 AF    CE D2 CA C7

Comparing these bytes with the encodings covered in the previous post shows that Firefox, Chrome, and IE all encode the path in UTF-8. For the query string, Firefox and Chrome use UTF-8, while IE uses GBK. As for the "%" signs that appear in the actual request: the URL encoding specification requires the browser to convert non-ASCII characters to bytes in some character encoding and then write each byte as two hexadecimal digits preceded by "%".

Of course, the result can differ between browsers, between versions of the same browser, and between operating systems. The table above shows only one case, so it is too early to draw any general conclusion about URL encoding rules. Because browsers and operating systems may encode the URI and the query string differently, decoding on the server side is inevitably troublesome. Next, taking Tomcat as an example, let's see how Tomcat decodes URLs.
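The byte values in the table can be reproduced with a minimal sketch using the JDK's java.net.URLEncoder. The class and method names below are illustrative and not from the original post. It assumes the GBK charset is available in the running JDK, and note that URLEncoder implements form-style encoding (a space becomes "+"), which is close to but not identical to what a browser does for the path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodingDemo {

    // Percent-encodes text: the characters are first converted to bytes in the
    // given charset, then each non-ASCII byte is written as "%" plus two hex digits.
    static String percentEncode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The two characters from the example URL, written as Unicode escapes
        // so the source file's own encoding does not matter.
        String text = "\u6211\u662f";
        System.out.println(percentEncode(text, "UTF-8")); // %E6%88%91%E6%98%AF
        System.out.println(percentEncode(text, "GBK"));   // %CE%D2%CA%C7
    }
}
```

The same two characters produce six bytes in UTF-8 and four bytes in GBK, which is why the server has to know which charset the browser used.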

The request URL is parsed in the parseRequestLine method of org.apache.coyote.http11.InternalInputBuffer, which stores the byte[] of the incoming URL in org.apache.coyote.Request. At this point the URL is still in byte form. The conversion to char is done in the convertURI method of org.apache.catalina.connector.CoyoteAdapter:

    protected void convertURI(MessageBytes uri, Request request) throws Exception {
        ByteChunk bc = uri.getByteChunk();
        int length = bc.getLength();
        CharChunk cc = uri.getCharChunk();
        cc.allocate(length, -1);

        String enc = connector.getURIEncoding(); // obtain the URI decoding charset
        if (enc != null) {
            B2CConverter conv = request.getURIConverter();
            try {
                if (conv == null) {
                    conv = new B2CConverter(enc);
                    request.setURIConverter(conv);
                }
            } catch (IOException e) {...}
            if (conv != null) {
                try {
                    conv.convert(bc, cc, cc.getBuffer().length - cc.getEnd());
                    uri.setChars(cc.getBuffer(), cc.getStart(), cc.getLength());
                    return;
                } catch (IOException e) {...}
            }
        }

        // Default encoding: fast conversion
        byte[] bbuf = bc.getBuffer();
        char[] cbuf = cc.getBuffer();
        int start = bc.getStart();
        for (int i = 0; i < length; i++) {
            cbuf[i] = (char) (bbuf[i + start] & 0xff);
        }
        uri.setChars(cbuf, 0, length);
    }

The code above shows that URI decoding first obtains the Connector's decoding charset, which is configured in server.xml:

<Connector URIEncoding="utf-8"  />

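If URIEncoding is not defined, the URI is decoded with the default encoding, ISO-8859-1. (This is the default for the Tomcat version shown here; Tomcat 8 and later default to UTF-8.)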
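The fallback loop at the end of convertURI is equivalent to ISO-8859-1 decoding, which is why UTF-8 bytes come out garbled when no URIEncoding is set. The following standalone sketch illustrates this. The class and method names are hypothetical, and only the loop body is taken from the Tomcat code above.

```java
import java.nio.charset.StandardCharsets;

class Latin1FallbackDemo {

    // Mirrors the "fast conversion" loop: every byte becomes one char
    // with the same unsigned value (0 to 255).
    static String fastConvert(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xff);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "\u6211\u662f";                       // two Chinese characters
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // six bytes on the wire

        String fast = fastConvert(utf8);
        String latin1 = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(fast.equals(latin1));   // true: same result as ISO-8859-1
        System.out.println(fast.length());         // 6: one char per byte
        System.out.println(fast.equals(original)); // false: the text is garbled
    }
}
```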

For the query string: whether a request is submitted by GET or POST, all of its parameters are stored in Parameters and read through request.getParameter. Decoding happens the first time getParameter is called. Internally, getParameter calls the parseParameters method of org.apache.catalina.connector.Request, which decodes the submitted parameters. The following is only part of parseParameters:

    // Obtain the encoding: the charset defined in Content-Type
    String enc = getCharacterEncoding();
    boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
    if (enc != null) {
        // If the encoding is not null, use it to decode the parameters
        parameters.setEncoding(enc);
        if (useBodyEncodingForURI) {
            // If useBodyEncodingForURI is set, decode the query string with the same charset
            parameters.setQueryStringEncoding(enc);
        }
    } else {
        // Otherwise fall back to the default decoding
        parameters.setEncoding(org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
        if (useBodyEncodingForURI) {
            parameters.setQueryStringEncoding(org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
        }
    }

From the code above, we can see that the query string is decoded either with the charset or with the default, ISO-8859-1. Note that this charset is the one declared in the Content-Type header of the HTTP request. For it to be applied to the query string, the Connector must also be configured as follows:

<Connector URIEncoding="UTF-8" useBodyEncodingForURI="true"/>
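The decision made in the parseParameters excerpt can be restated as a small standalone sketch. This is not Tomcat code: charsetOf and choose are hypothetical helpers, the Content-Type parsing is deliberately simplistic, and a null query-string encoding here only means "not set by this excerpt".

```java
class QueryCharsetDemo {

    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Extracts the charset parameter from a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return p.substring(8).trim().replace("\"", "");
            }
        }
        return null;
    }

    // Returns { bodyEncoding, queryStringEncoding }, following the excerpt:
    // the body uses the declared charset or the default, and the query string
    // follows the body only when useBodyEncodingForURI is true.
    static String[] choose(String contentType, boolean useBodyEncodingForURI) {
        String enc = charsetOf(contentType);
        String body = (enc != null) ? enc : DEFAULT_ENCODING;
        String query = useBodyEncodingForURI ? body : null;
        return new String[] { body, query };
    }

    public static void main(String[] args) {
        String[] r = choose("application/x-www-form-urlencoded; charset=UTF-8", true);
        System.out.println(r[0] + " / " + r[1]); // UTF-8 / UTF-8
        r = choose("application/x-www-form-urlencoded", true);
        System.out.println(r[0] + " / " + r[1]); // ISO-8859-1 / ISO-8859-1
    }
}
```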

The sections above describe in detail how URL-based requests are encoded and decoded. In practice, however, we more often submit data through forms.

Form GET

We know that submitting data directly through a URL is prone to garbled characters, so we prefer to use forms. When the user clicks "submit", the browser encodes the form data and sends it to the server. Data submitted with GET is appended to the URL (that is, it becomes the query string), so Tomcat decodes it according to the configured URIEncoding, falling back to the default ISO-8859-1 if URIEncoding is not set. If the page encoding is UTF-8 but URIEncoding is not UTF-8 or is not set, the server produces garbled text when decoding. In that case, we can recover the correct data with new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8").
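The recovery trick works because ISO-8859-1 maps every byte to exactly one char, so the wrong decoding loses no information. A minimal sketch without a servlet container follows. The class and method names are illustrative, and decodeAsLatin1 merely stands in for what the server does when URIEncoding is unset.

```java
import java.nio.charset.StandardCharsets;

class GetParameterRecoveryDemo {

    // Stand-in for the server decoding the raw parameter bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Turns the garbled string back into the original bytes,
    // then decodes those bytes with the charset the page actually used.
    static String recover(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6211\u662fcm";                       // value typed into the form
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);  // what the browser sends

        String garbled = decodeAsLatin1(sent);
        String recovered = recover(garbled);

        System.out.println(garbled.equals(original));   // false
        System.out.println(recovered.equals(original)); // true
    }
}
```

This only works when the wrong charset was ISO-8859-1 (or another single-byte charset that maps every byte value); configuring URIEncoding correctly avoids the need for it.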

Form POST

For the POST method, the encoding is determined by the page, that is, by its contentType. When the user clicks the submit button, the browser first encodes the POST form parameters with the charset given in contentType and then submits them to the server, and the server also uses the charset in contentType to decode them (this is where POST differs from GET). This means that parameters submitted through a POST form generally do not come out garbled. Of course, we can also set this charset ourselves with request.setCharacterEncoding(charset), which must be called before the first call to getParameter because that is when decoding happens.
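The effect of the charset on a POST body can be sketched with the JDK's java.net.URLDecoder, without a servlet container. The class and method names are illustrative, and the encoded value is simply the UTF-8 form of the example text used earlier.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormBodyDecodeDemo {

    // Decodes one percent-encoded form value with the given charset.
    static String decodeValue(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Value as a browser would send it from a UTF-8 page.
        String encoded = "%E6%88%91%E6%98%AFcm";

        String right = decodeValue(encoded, "UTF-8");      // charset matches the page
        String wrong = decodeValue(encoded, "ISO-8859-1"); // charset does not match

        System.out.println(right.equals("\u6211\u662fcm")); // true
        System.out.println(right.length());                 // 4
        System.out.println(wrong.length());                 // 8: one char per byte, garbled
    }
}
```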

----- Original from: http://cmsblogs.com/?p=1510 Please respect the author's hard work and cite the source when reposting.

----- Personal site: http://cmsblogs.com
