Multi-language display in Servlet and JSP

Source: Internet
Author: User

I have never believed that Java has a bug that prevents text in several languages from being displayed together, so this weekend I studied multi-language display in Servlets and JSP, that is, how Servlets handle multiple character sets. I am not entirely clear on the concept of a character set, so what I write here may not be accurate. This is how I understand character sets in Java: at run time, every String object stores Unicode internal codes. Strings are always represented by some internal code inside the computer; in most programming languages that code depends on the platform, whereas Java uses platform-independent Unicode.

When Java reads a string from a byte stream, it converts the platform-dependent bytes into a platform-independent Unicode string. On output, Java converts the Unicode string back into a platform-dependent byte stream; a Unicode character that does not exist in the target encoding is output as '?'. For example, on Chinese Windows, when Java reads a GB2312-encoded file (or any other stream) into memory to construct a String object, the GB2312 text is converted to a Unicode string. When that string is output, it is converted back to a GB2312 byte stream or array: "Chinese test" (中文测试) -----> "\u4e2d\u6587\u6d4b\u8bd5" -----> "Chinese test" (中文测试).

Example:

byte[] bytes = new byte[] {(byte) 0xd6, (byte) 0xd0, (byte) 0xce, (byte) 0xc4, (byte) 0xb2, (byte) 0xe2, (byte) 0xca, (byte) 0xd4}; // GBK-encoded "Chinese test"
java.io.ByteArrayInputStream bin = new java.io.ByteArrayInputStream(bytes);
java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.InputStreamReader(bin, "GBK"));
String msg = reader.readLine();
System.out.println(msg);

Run on a system that has these four characters available (such as a Chinese-language system), this program prints them correctly. The msg string holds the correct Unicode code for "Chinese test": "\u4e2d\u6587\u6d4b\u8bd5". When it is printed, it is converted to the default character set of the operating system. Whether it displays correctly therefore depends on the operating system's character set: the text is output correctly only on systems that support the corresponding character set; otherwise it comes out as garbage.
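The conversion described above can be checked without a console. The following is a minimal sketch, not part of the original article: the class name CharsetRoundTrip and the helper roundTrip are made up for illustration, and it assumes a JDK that ships the GBK charset (standard full JDK builds do). It encodes the sample text to bytes and decodes it again, first with charsets that contain the characters and then with ISO-8859-1, which does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // The sample text "Chinese test", written with Unicode escapes so that
    // the encoding of this source file does not matter.
    static final String MSG = "\u4e2d\u6587\u6d4b\u8bd5";

    /** Encodes text to bytes in the given charset, then decodes those bytes again. */
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset); // unmappable characters become the charset's replacement byte
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // GBK and UTF-8 can both represent the text, so it survives unchanged.
        System.out.println(roundTrip(MSG, Charset.forName("GBK")).equals(MSG));
        System.out.println(roundTrip(MSG, StandardCharsets.UTF_8).equals(MSG));
        // ISO-8859-1 has no Chinese characters; each one is replaced by '?'.
        System.out.println(roundTrip(MSG, StandardCharsets.ISO_8859_1));
    }
}
```

The last line shows the lossy case: once a character has been replaced by '?', the original text cannot be recovered from the bytes.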

Now let's look at the multi-language problem in Servlet/JSP. Our goal is that a client in any country can send information to the server through a form, the server saves it to the database, and the client still sees exactly what it sent when it retrieves the information later. In practice this means three things. The SQL statement the server finally stores must contain the correct Unicode encoding of the text the client sent. The encoding used between JDBC and the database must be able to represent that text; in fact, it is best to have JDBC communicate with the database directly in Unicode/UTF-8, so that no information is lost. And when the server sends the information back to the client, it must also use a lossless encoding, again Unicode/UTF-8.

If the enctype attribute of a form is not specified, the form submits its input URL-encoded according to the character set of the current page, and the server receives that URL-encoded string. The encoded string therefore depends on the page encoding. For example, if a GB2312-encoded page submits "Chinese test", the result is "%D6%D0%CE%C4%B2%E2%CA%D4", where each "%" is followed by the two hexadecimal digits of one byte; from a UTF-8 page the result is "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95". The two differ in length because a Chinese character takes 16 bits in GB2312 but 24 bits in UTF-8. The Chinese, Japanese, and Korean versions of Internet Explorer 4 and later all support UTF-8, and this encoding certainly covers these three languages. So if we use UTF-8 for our HTML pages, we can support at least these three languages.
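The two encoded strings quoted above can be reproduced with the standard java.net.URLEncoder. This is a small sketch rather than anything from the original article: the class name FormEncodingDemo and the helper urlEncode are illustrative, and it assumes Java 10 or later (for the Charset overload of URLEncoder.encode) and a JDK that includes the GB2312 charset.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormEncodingDemo {
    // The sample text "Chinese test" as Unicode escapes.
    static final String MSG = "\u4e2d\u6587\u6d4b\u8bd5";

    /** URL-encodes text the way a form on a page in the given charset would. */
    static String urlEncode(String text, Charset pageCharset) {
        return URLEncoder.encode(text, pageCharset);
    }

    public static void main(String[] args) {
        // Two bytes per character in GB2312: %D6%D0%CE%C4%B2%E2%CA%D4
        System.out.println(urlEncode(MSG, Charset.forName("GB2312")));
        // Three bytes per character in UTF-8: %E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95
        System.out.println(urlEncode(MSG, StandardCharsets.UTF_8));
    }
}
```

The same four characters produce eight escaped bytes in one case and twelve in the other, which is why the server has to know the page encoding before it can decode the submission.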

However, even if the HTML/JSP page is UTF-8 encoded, the application server may not know it. If the request sent by the browser carries no charset information, the most the server can do is read the Accept-Language request header, and we know that this header alone does not tell us which encoding the browser used. So the application server cannot reliably parse the submitted content. Why? Because all strings in Java are 16-bit Unicode, HttpServletRequest.getParameter(String) has to convert the URL-encoded data submitted by the client into a Unicode String. Some servers simply assume that the client's encoding is the same as that of the server platform and decode directly with URLDecoder.decode(String). If the client's encoding happens to match the server's, the correct String is obtained. Otherwise, if the submitted string contains non-ASCII characters, the result is garbage.
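The failure described above is easy to reproduce. In the sketch below (illustrative only; the class WrongCharsetDemo and the helper decodeAs are not from the original article, and Java 10 or later is assumed for the Charset overload of URLDecoder.decode), the same UTF-8 form submission is decoded once with the right charset and once with ISO-8859-1 standing in for a mismatched server default.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // What a UTF-8 page submits for the sample text "Chinese test".
    static final String SUBMITTED = "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95";

    /** Decodes a URL-encoded value, interpreting the escaped bytes in the given charset. */
    static String decodeAs(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String right = decodeAs(SUBMITTED, StandardCharsets.UTF_8);
        String wrong = decodeAs(SUBMITTED, StandardCharsets.ISO_8859_1);
        System.out.println(right.length());      // 4: the four intended characters
        System.out.println(wrong.length());      // 12: one junk character per submitted byte
        System.out.println(right.equals(wrong)); // false
    }
}
```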

In the solution I propose, we have already fixed the encoding as UTF-8. To avoid this problem, we can write our own decode method:

public static String decode(String s, String encoding) throws Exception {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        switch (c) {
            case '+':
                // '+' stands for a space in form data
                sb.append(' ');
                break;
            case '%':
                // "%XX" is one byte; keep it as a char in the range 0..255 for now
                try {
                    sb.append((char) Integer.parseInt(
                        s.substring(i + 1, i + 3), 16));
                }
                catch (NumberFormatException e) {
                    throw new IllegalArgumentException();
                }
                i += 2;
                break;
            default:
                sb.append(c);
                break;
        }
    }
    // Undo conversion to external encoding
    String result = sb.toString();
    byte[] inputBytes = result.getBytes("8859_1");
    return new String(inputBytes, encoding);
}

This method lets you specify the encoding. If you specify UTF-8, it meets our needs. For example, using it to parse "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95" yields the correct Unicode string for "Chinese test".
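The last three lines of the decode method are the part that does the real work, so it may help to look at them in isolation. The sketch below is an illustration under stated assumptions, not the author's code: the class ReinterpretDemo and the helper reinterpret are hypothetical names. It relies on the fact that ISO-8859-1 maps the chars 0..255 one-to-one onto byte values, so a string holding one raw byte per char can be turned back into bytes without loss and then decoded in the real charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReinterpretDemo {
    /**
     * Treats each char (assumed to be in the range 0..255) as one raw byte
     * and decodes the resulting byte sequence in the target charset.
     */
    static String reinterpret(String oneCharPerByte, Charset target) {
        byte[] raw = oneCharPerByte.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, target);
    }

    public static void main(String[] args) {
        // The twelve UTF-8 bytes of the sample text, each held in its own char,
        // which is the state the decode loop leaves them in.
        String perByte = "\u00e4\u00b8\u00ad\u00e6\u0096\u0087\u00e6\u00b5\u008b\u00e8\u00af\u0095";
        String text = reinterpret(perByte, StandardCharsets.UTF_8);
        System.out.println(text.length());                            // 4
        System.out.println(text.equals("\u4e2d\u6587\u6d4b\u8bd5")); // true
    }
}
```

Since Java 1.4 the standard library also offers URLDecoder.decode(String, String), which takes the encoding name directly and performs both steps at once.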

The remaining problem is that we must get hold of the raw URL-encoded string submitted by the client. For a form whose method is GET, it can be read with HttpServletRequest.getQueryString(); for a form whose method is POST, it can only be read from the ServletInputStream. Moreover, the first call to the standard getParameter method consumes the submitted form data, and the ServletInputStream cannot be read a second time. So we must read and parse the submitted form data ourselves before getParameter is called for the first time.

This is what I did: I created a Servlet base class that overrides the service method and reads and parses the content submitted by the form before calling the service method of the parent class. Please refer to the following source code:
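Once the raw string is in hand, whether it came from getQueryString() or from the request body, parsing it is a matter of splitting on "&" and "=" and decoding each part in the known charset. The following is a minimal sketch of that step with no servlet dependency, so it can be run on its own. The class QueryStringParser and its parse method are hypothetical and not the author's implementation; Java 10 or later is assumed for URLDecoder.decode(String, Charset).

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringParser {
    /**
     * Splits a raw "name=value&name=value" string and decodes every name and
     * value in the given charset. A repeated name keeps only its last value.
     */
    static Map<String, String> parse(String raw, Charset charset) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return pairs; // e.g. getQueryString() returns null when there is no query
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            pairs.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String raw = "msg=%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95&lang=zh+CN";
        Map<String, String> pairs = parse(raw, StandardCharsets.UTF_8);
        System.out.println(pairs.get("msg").equals("\u4e2d\u6587\u6d4b\u8bd5")); // true
        System.out.println(pairs.get("lang"));                                    // zh CN
    }
}
```

A real reader would also need to handle parameters that occur several times, for example by mapping each name to a list of values.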

package com.hto.servlet;

import javax.servlet.http.HttpServletRequest;
import java.util.*;

/**
 * Insert the type's description here.
 * Creation date: (15:43:46)
 * @author: Qian weichun
 */
public class UTF8ParameterReader {
    Hashtable pairs = new Hashtable();

    /**
     * UTF8ParameterReader constructor comment.
     */
    public UTF8ParameterReader(HttpServletRequest request) throws java.io.IOException {
        super();
        parse(request.getQueryString());
        parse(request.getReader().readLine());
    }

    /**
     * UTF8ParameterReader constructor comment.
     */
    public UTF8ParameterReader(HttpServletRequest request, String encoding) throws java.io.IOException {
        super();
        parse(request.getQueryString(), encoding);
        parse(request.getReader().readLine(), encoding);
    }

    public static String decode(String s) throws Exception {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
