1. Java Web Notes: Garbled Chinese Characters

Source: Internet
Author: User

Why garbled text occurs in Java Web applications

In Java Web development we often run into garbled text. Whatever the symptom, the cause comes down to one thing: the encoding used to turn the characters into bytes does not match the encoding used to decode them.

If a mismatch between encoding and decoding is the cause, why encode characters at all? The basic unit of storage in a computer is 1 byte, or 8 bits, so a single byte can distinguish at most 2^8 = 256 values. The characters used in the real world (Chinese characters, English letters, other scripts, and so on) far exceed that number. To resolve this mismatch between characters and bytes, characters must be encoded as byte sequences before they can be stored on a computer.

Encoding and decoding

  Common character encodings include ASCII, ISO-8859-1, GB2312, GBK, UTF-16, and UTF-8.

ASCII uses the low 7 bits of a byte, so it can express 2^7 = 128 characters in total. ISO-8859-1 is an ISO standard that extends ASCII and remains compatible with it, covering most Western European characters. It uses a single byte per character, so it can express up to 256 characters. GB2312 uses a double-byte encoding whose lead bytes range from A1 to F7: A1-A9 is the symbol area and B0-F7 is the Chinese character area, which contains 6,763 Chinese characters. GBK extends GB2312 with more Chinese characters and can express 21,003 of them in total. UTF-16 represents most characters, including the commonly used Chinese characters, with 2 bytes, and characters outside the Basic Multilingual Plane with 4 bytes (a surrogate pair); it is the format Java uses for characters in memory. UTF-8, by contrast, is a variable-length encoding in which a character takes 1 to 4 bytes depending on its code point (the original design allowed up to 6 bytes, but the current standard stops at 4).

The following takes a Chinese string as an example and looks at how the different encodings store it in the computer.
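The size differences described above can be checked directly. The following is a minimal sketch, not part of the original note: the class name `EncodedLengths` and the three sample characters are chosen here purely for illustration. It counts how many bytes one character occupies in UTF-8 and UTF-16, and shows what happens when ISO-8859-1 is asked to encode a character it cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes a string occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String latin = "A";             // U+0041, part of ASCII
        String han = "\u4e2d";          // U+4E2D, a common Chinese character
        String emoji = "\ud83d\ude00";  // U+1F600, outside the Basic Multilingual Plane

        for (String s : new String[] {latin, han, emoji}) {
            // UTF_16BE is used instead of UTF_16 so that no byte order mark is counted.
            System.out.printf("U+%04X  UTF-8: %d bytes, UTF-16: %d bytes%n",
                    s.codePointAt(0),
                    byteLength(s, StandardCharsets.UTF_8),
                    byteLength(s, StandardCharsets.UTF_16BE));
        }

        // ISO-8859-1 has no code for the Chinese character, so getBytes
        // substitutes a single '?' byte and the original character is lost.
        System.out.println("ISO-8859-1 bytes for U+4E2D: "
                + byteLength(han, StandardCharsets.ISO_8859_1));
    }
}
```

The Unicode escapes keep the example independent of the encoding of the source file itself, which would otherwise be one more place for garbling to creep in.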
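A comparison of this kind can be reproduced with a few lines of Java. This is a sketch under assumptions: the sample string (the two characters U+4E2D U+6587) is hypothetical and is not the string used in the original figure, and `EncodingHexDump` and `toHex` are names invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingHexDump {
    // Encode text with the given charset and render the bytes as uppercase hex pairs.
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u6587"; // hypothetical sample, not the original figure's string

        System.out.println("UTF-8      : " + toHex(sample, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE   : " + toHex(sample, StandardCharsets.UTF_16BE));
        System.out.println("ISO-8859-1 : " + toHex(sample, StandardCharsets.ISO_8859_1));

        // GBK is not one of the charsets every JVM is required to provide,
        // so check for it before using it.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK        : " + toHex(sample, Charset.forName("GBK")));
        }
    }
}
```

For this sample, UTF-8 yields `E4 B8 AD E6 96 87` (three bytes per character), UTF-16BE yields `4E 2D 65 87` (two bytes per character), and ISO-8859-1 yields `3F 3F`, that is, two question marks, because it cannot represent either character.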

Analyzing and resolving garbled text

  For Java Web applications, we distinguish garbling caused by the request from garbling caused by the response. For each kind we need to analyze the cause, that is, which encoding was used to encode the characters and which was used to decode them.

To analyze garbling caused by the request, we need to look at the HTTP request and see how it is encoded. HTTP requests are divided into GET requests and POST requests, which we discuss separately.

GET is the browser's default request method, and it is also how a form is submitted when its method is set to "get". Let us look at the details in Firefox:

The Address bar is:

The requested content is:

  

From the request above, we can see that in a GET request the query string is placed in the request line and sent to the Web server. From the percent-encoded form of the Chinese string, we can tell that the browser encoded it as UTF-8.

On the server side, however, the value appears garbled. By default the server decodes the received data as ISO-8859-1, so the encoding and the decoding do not match.
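To see what such a percent-encoded query string looks like, the browser's step can be imitated with `java.net.URLEncoder`. This is only a sketch: the parameter value is a hypothetical two-character Chinese string, and `QueryStringEncoding` and `percentEncode` are names made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryStringEncoding {
    // Percent-encode a parameter value for use in a query string.
    static String percentEncode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d\u6587"; // hypothetical parameter value
        // Each Chinese character becomes three %XX escapes under UTF-8.
        System.out.println("/test?user=" + percentEncode(value, "UTF-8"));
    }
}
```

With UTF-8 the value becomes `%E4%B8%AD%E6%96%87`. Three escaped bytes per Chinese character is the pattern to look for in the request line when deciding that the browser used UTF-8.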
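The mismatch can be reproduced without a server. In the sketch below (class and method names are illustrative only), a string is encoded as UTF-8, as the browser does, and then decoded as ISO-8859-1, as the server does by default according to the text above.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode with UTF-8 but decode with ISO-8859-1, which is what happens
    // when the browser and the server disagree about the encoding.
    static String transcodeWrongly(String original) {
        byte[] sentByBrowser = original.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // hypothetical two-character Chinese string
        String garbled = transcodeWrongly(original);
        // Every UTF-8 byte is turned into a separate ISO-8859-1 character.
        System.out.println(original.length() + " characters became " + garbled.length());
    }
}
```

The two original characters turn into six unrelated ones, one per UTF-8 byte, which is the kind of garbled output seen in the servlet.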

  

The solution is as follows:

First recover the bytes the string had before the server decoded it, by re-encoding the garbled string with the encoding the server used, and then decode those bytes with the correct encoding:
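A minimal sketch of this re-decoding step follows, assuming the server decoded as ISO-8859-1 and the browser sent UTF-8, as described above. The helper name `fixIso88591Mojibake` is invented for the example; in a servlet the same two calls would be applied to the value returned by `request.getParameter("user")`.

```java
import java.nio.charset.StandardCharsets;

class RedecodeFix {
    // Undo a wrong ISO-8859-1 decode: recover the raw bytes, then decode them as UTF-8.
    static String fixIso88591Mojibake(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate what the servlet receives: UTF-8 bytes decoded as ISO-8859-1.
        String original = "\u4e2d\u6587"; // hypothetical parameter value
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        String repaired = fixIso88591Mojibake(garbled);
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

This works because ISO-8859-1 maps every byte value to exactly one character, so the wrong decode loses no information. The same trick does not work in general for encodings that replace undecodable bytes.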

Parameters in hyperlinks need the same care:

  In Java Web development we often pass parameters in hyperlinks, and those parameters may contain Chinese. In that case the Chinese must be URL-encoded, for example as UTF-8. Decoding on the server is handled as described above.

  

<a href="${pageContext.request.contextPath}/test?user=<%=java.net.URLEncoder.encode("Hita", "UTF-8")%>">click</a>

  A POST request is how a form is submitted when its method is set to "post". Let us look at the details in Firefox:
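The encoding step in the link above can also be tried outside a JSP page. The sketch below is illustrative only: the context path `/app`, the parameter value, and the names `LinkParameterRoundTrip`, `buildLink`, and `decodeParameter` are all assumptions made for the example. It builds the link target and then decodes the parameter again to confirm that the round trip preserves the text.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class LinkParameterRoundTrip {
    // Build the link target, URL-encoding the parameter value as UTF-8.
    static String buildLink(String contextPath, String user) {
        try {
            return contextPath + "/test?user=" + URLEncoder.encode(user, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Decode a percent-encoded parameter value as UTF-8.
    static String decodeParameter(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String user = "\u4e2d\u6587";           // hypothetical parameter value
        String link = buildLink("/app", user);   // "/app" is a made-up context path
        System.out.println(link);

        String encodedValue = link.substring(link.indexOf('=') + 1);
        System.out.println(decodeParameter(encodedValue).equals(user)); // prints true
    }
}
```

Note that `URLEncoder` produces form-style encoding, so a space in the value is written as `+` rather than `%20`.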

The Address bar and its pages are:

  

The POST request content is:

  

As we can see, in a POST request the content is sent to the Web server directly in the request body, encoded as UTF-8.

In the servlet that handles this request, the doPost method is as follows:

  

public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    String user = request.getParameter("user");
    System.out.println(user); // the output is garbled, e.g. Æ?¥å?? E?? Ç?°
}

  The garbling again arises at getParameter("user"): the Web server decodes the request body with its default scheme, ISO-8859-1, so the encoding and decoding schemes disagree. The fix used for GET requests works here too, but there is a simpler one: directly specify that the request body is to be decoded as UTF-8, before any parameter is read. The approach is as follows.

  

public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    // Decode the request body as UTF-8; this must be called before the first getParameter.
    request.setCharacterEncoding("UTF-8");
    String user = request.getParameter("user");
    System.out.println(user); // the output is now the correctly decoded string
}
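Why the chosen encoding changes the value returned by getParameter can be illustrated without a servlet container. The sketch below is a simplified stand-in for what a container does with a form body, not the actual implementation of any server; `FormBodyDecoding`, `parseFormBody`, and the sample body are assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodyDecoding {
    // Parse an application/x-www-form-urlencoded body, decoding the
    // percent-escaped bytes with the given charset.
    static Map<String, String> parseFormBody(String body, String charsetName) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                params.put(URLDecoder.decode(name, charsetName),
                        URLDecoder.decode(value, charsetName));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String body = "user=%E4%B8%AD%E6%96%87"; // hypothetical POST body sent as UTF-8

        // Decoding with the wrong charset yields six garbled characters.
        System.out.println(parseFormBody(body, "ISO-8859-1").get("user").length());
        // Decoding with the charset the browser used yields the two original characters.
        System.out.println(parseFormBody(body, "UTF-8").get("user").length());
    }
}
```

The bytes in the body are the same in both calls; only the charset used to interpret them differs, which is the choice that `request.setCharacterEncoding` makes for the servlet.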

  
