Displaying multiple languages in servlets and JSP (from an expert)

/**
 * A friend did this painstaking work. Thank you; I trust he will not blame me for sharing it. His email is vividq@china.com
 */

Displaying multiple languages in servlets and JSP

I never believed that Java had a bug preventing it from displaying several languages together, so I spent this weekend studying multilingual display in servlets and JSP, that is, how servlets handle multiple character sets. I am not entirely clear on the concept of a character set, so what follows is not necessarily accurate. This is how I understand character sets in Java: at run time, every String object is stored as Unicode. (Every language stores strings in some encoding, since inside the computer a string is always represented by codes.) The difference is that in most programming languages the string encoding is platform-dependent, while Java uses platform-independent Unicode.
When Java reads a string from a byte stream, it converts the platform-dependent bytes into a platform-independent Unicode string. On output, Java converts the Unicode string back into a platform-dependent byte stream; if a Unicode character does not exist in the target encoding, a '?' is output in its place. For example, on Chinese Windows, when Java reads a GB2312-encoded file (or any other stream) into memory to construct a String object, it converts the GB2312 text into a Unicode string. Writing that string out converts the Unicode string back into a GB2312 byte stream or byte array: "中文测试" -----> "\u4e2d\u6587\u6d4b\u8bd5" -----> "中文测试".
The following routine:
// GBK-encoded bytes of "中文测试" ("Chinese test")
byte[] bytes = new byte[]{(byte) 0xd6, (byte) 0xd0, (byte) 0xce, (byte) 0xc4, (byte) 0xb2, (byte) 0xe2, (byte) 0xca, (byte) 0xd4};
java.io.ByteArrayInputStream bin = new java.io.ByteArrayInputStream(bytes);
// Decode the bytes as GBK while reading them
java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.InputStreamReader(bin, "GBK"));
String msg = reader.readLine();
System.out.println(msg);
Run on a system whose character set contains the four characters 中文测试 (a Chinese system, for example), this program prints them correctly. The msg string holds the correct Unicode encoding of 中文测试, "\u4e2d\u6587\u6d4b\u8bd5", which is converted to the operating system's default character set when printed. Whether it displays correctly therefore depends on the operating system's character set: only on a system that supports the corresponding character set is the text output correctly; otherwise the result is garbage.
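The '?' substitution on output is easy to reproduce. The following is a minimal sketch; the class name LossyEncodingDemo and the helper roundTrip are illustrative and not part of the original article. It encodes the same Unicode string into a charset that can represent it (UTF-8) and into one that cannot (ISO-8859-1), then decodes the bytes again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: the class and method names are assumptions, not from the article.
final class LossyEncodingDemo {

    // Encodes the text in the given charset and decodes the bytes with the same charset.
    // String.getBytes(Charset) replaces characters the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // The four Chinese characters used throughout the article, written as escapes
        String msg = "\u4e2d\u6587\u6d4b\u8bd5";

        // UTF-8 can represent every character, so nothing is lost
        System.out.println(roundTrip(msg, StandardCharsets.UTF_8).equals(msg)); // prints true

        // ISO-8859-1 has no Chinese characters, so each one becomes '?'
        System.out.println(roundTrip(msg, StandardCharsets.ISO_8859_1)); // prints ????
    }
}
```

Once a character has been replaced by '?', the original cannot be recovered from the bytes, which is why the choice of output encoding matters.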
Now let's look at the multilingual problem in servlets and JSP. Our goal is that a client in any country can send information to the server through a form, the server stores it in the database, and the client still sees exactly what it sent when retrieving it. In practice, this means the SQL statements finally built on the server must contain the correct Unicode encoding of the text the client sent, and the encoding used to communicate with the database must be able to represent that text. It is best to have JDBC communicate with the database directly in Unicode/UTF-8, which ensures that no information is lost. Likewise, when the server sends information back to the client, it should use an encoding that loses no information, that is, Unicode/UTF-8.
If the form's enctype attribute is not specified, the browser URL-encodes the input using the character set of the current page, and the server receives the URL-encoded string. The resulting string therefore depends on the page's encoding. For example, a GB2312-encoded page that submits 中文测试 sends "%d6%d0%ce%c4%b2%e2%ca%d4", where each "%" is followed by a two-digit hexadecimal number, while a UTF-8 page sends "%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95". The difference arises because each of these characters occupies 16 bits in GB2312 but 24 bits in UTF-8. The Chinese, Japanese, and Korean versions of IE 4 and later support UTF-8, and UTF-8 certainly covers these three languages, so if our HTML pages use UTF-8 we can support at least these three.
However, if our HTML/JSP pages use UTF-8, the application server may not know it. If the browser's request carries no charset information, the most the server can do is read the Accept-Language request header, and that header alone does not reveal the encoding the browser used, so the application server cannot correctly parse the submitted content. Why? All strings in Java are 16-bit Unicode, and the job of HttpServletRequest.getParameter(String) is to convert the URL-encoded information submitted by the client into a Unicode string. Some servers simply assume that the client's encoding is the same as the server platform's and decode directly with URLDecoder.decode(String). If the client's encoding happens to match the server's, the result is correct; otherwise, if the submitted string contains non-ASCII characters, the result is garbage.
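The two URL-encoded forms quoted above can be reproduced with the standard java.net.URLEncoder. This is a small sketch, not the article's code; the class name FormEncodingDemo is made up for illustration, and it assumes the JRE ships the GB2312 charset (full JDK distributions normally do). Note that URLEncoder emits uppercase hexadecimal digits, whereas the strings above are written in lowercase.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: the class and method names are assumptions, not from the article.
final class FormEncodingDemo {

    // URL-encodes the text the way a browser would for a page in the given charset.
    static String encode(String text, String charsetName) throws UnsupportedEncodingException {
        return URLEncoder.encode(text, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The four Chinese characters used throughout the article, written as escapes
        String msg = "\u4e2d\u6587\u6d4b\u8bd5";

        // Two bytes per character: eight %XX escapes
        System.out.println(encode(msg, "GB2312")); // %D6%D0%CE%C4%B2%E2%CA%D4

        // Three bytes per character: twelve %XX escapes
        System.out.println(encode(msg, "UTF-8"));  // %E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95
    }
}
```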
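To see why a wrong guess about the client's encoding produces garbage, the sketch below decodes the same UTF-8 form data twice with the standard java.net.URLDecoder, once with the right charset and once with ISO-8859-1. The class name WrongCharsetDemo and the helper decodeAs are illustrative assumptions, not part of the original article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative only: the class and method names are assumptions, not from the article.
final class WrongCharsetDemo {

    // Decodes URL-encoded form data, interpreting the %XX bytes in the given charset.
    static String decodeAs(String urlEncoded, String charsetName) throws UnsupportedEncodingException {
        return URLDecoder.decode(urlEncoded, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What a UTF-8 page submits for the article's four Chinese characters
        String submitted = "%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95";

        String right = decodeAs(submitted, "UTF-8");      // twelve bytes become four characters
        String wrong = decodeAs(submitted, "ISO-8859-1"); // each byte becomes its own character

        System.out.println(right.length());      // prints 4
        System.out.println(wrong.length());      // prints 12
        System.out.println(right.equals(wrong)); // prints false
    }
}
```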
In my solution the use of UTF-8 is already fixed, so this problem can be avoided by writing our own decode method:
public static String decode(String s, String encoding) throws Exception {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        switch (c) {
            case '+':
                // '+' stands for a space in URL-encoded form data
                sb.append(' ');
                break;
            case '%':
                try {
                    // Each %XX escape is one raw byte; keep it as a char in the range 0-255
                    sb.append((char) Integer.parseInt(
                            s.substring(i + 1, i + 3), 16));
                }
                catch (NumberFormatException e) {
                    throw new IllegalArgumentException();
                }
                // Skip the two hex digits just consumed
                i += 2;
                break;
            default:
                sb.append(c);
                break;
        }
    }
    // Undo conversion to external encoding
    String result = sb.toString();
    byte[] inputBytes = result.getBytes("8859_1");
    return new String(inputBytes, encoding);
}
This method accepts an encoding; specifying UTF8 satisfies our needs. For example, using it to parse "%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95" yields the correct Unicode string for the Chinese characters 中文测试.
The remaining problem is that we must obtain the URL-encoded string submitted by the client. For a form submitted with the GET method, it can be read with HttpServletRequest.getQueryString(); for a form submitted with the POST method, it can only be read from the ServletInputStream. In fact, the first time the standard getParameter method is called, the information submitted by the form is read, and the ServletInputStream cannot be read a second time. So we must read and parse the submitted form data before getParameter is called for the first time.
This is what I did: I set up a servlet base class that overrides the service method and reads and parses the form submission before calling the parent's service method. See the following source code:
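The final step of the decode method relies on the fact that ISO-8859-1 maps every char in the range 0-255 to exactly one byte with the same value, so a string that holds one raw byte per char can be turned back into the original bytes without loss and then reinterpreted in the intended encoding. A minimal sketch of just that step follows; the class name Latin1RoundTripDemo and the helper reinterpret are illustrative and not from the article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: the class and method names are assumptions, not from the article.
final class Latin1RoundTripDemo {

    // Takes a string holding one raw byte per char (values 0-255) and
    // reinterprets those bytes as text in the given charset.
    static String reinterpret(String oneCharPerByte, Charset charset) {
        byte[] raw = oneCharPerByte.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The three UTF-8 bytes e4 b8 ad, stored one byte per char
        String perByte = "\u00e4\u00b8\u00ad";

        // Reinterpreted as UTF-8 they form the single character U+4E2D
        String fixed = reinterpret(perByte, StandardCharsets.UTF_8);
        System.out.println(fixed.length());          // prints 1
        System.out.println(fixed.charAt(0) == 0x4e2d); // prints true
    }
}
```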
package com.hto.servlet;

import javax.servlet.http.HttpServletRequest;
import java.util.*;
/**
 * Insert the type's description here.
 * Creation date: (2001-2-4-15:43:46)
 * @author: Chan Weichun
 */
public class UTF8ParameterReader {
    // Maps each parameter name to the list of values submitted for it
    Hashtable pairs = new Hashtable();
    /**
     * UTF8ParameterReader constructor comment.
     */
    public UTF8ParameterReader(HttpServletRequest request) throws java.io.IOException {
        super();
        // GET parameters come from the query string, POST parameters from the request body
        parse(request.getQueryString());
        parse(request.getReader().readLine());
    }
    /**
     * UTF8ParameterReader constructor comment.
     */
    public UTF8ParameterReader(HttpServletRequest request, String encoding) throws java.io.IOException {
        super();
        parse(request.getQueryString(), encoding);
        parse(request.getReader().readLine(), encoding);
    }
    public static String decode(String s) throws Exception {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
                    sb.append(' ');
                    break;
                case '%':
                    try {
                        sb.append((char) Integer.parseInt(
                                s.substring(i + 1, i + 3), 16));
                    }
                    catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    }
                    i += 2;
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        // Undo conversion to external encoding
        String result = sb.toString();
        byte[] inputBytes = result.getBytes("8859_1");
        return new String(inputBytes, "UTF8");
    }
    public static String decode(String s, String encoding) throws Exception {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
                    sb.append(' ');
                    break;
                case '%':
                    try {
                        sb.append((char) Integer.parseInt(
                                s.substring(i + 1, i + 3), 16));
                    }
                    catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    }
                    i += 2;
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        // Undo conversion to external encoding
        String result = sb.toString();
        byte[] inputBytes = result.getBytes("8859_1");
        return new String(inputBytes, encoding);
    }
    /**
     * Insert the method's description here.
     * Creation date: (2001-2-4-17:30:59)
     * @return java.lang.String
     * @param name java.lang.String
     */
    public String getParameter(String name) {
        if (pairs == null || !pairs.containsKey(name)) return null;
        // Return the first value submitted under this name
        return (String) (((ArrayList) pairs.get(name)).get(0));
    }
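The constructors in the listing call parse methods that are not shown in this excerpt. As a rough idea of what such a method might do, here is a self-contained sketch that splits a raw URL-encoded string into name/value pairs and collects the values per name, matching the way getParameter reads the first entry of a list. Everything here is an assumption for illustration: the class name FormParserSketch, the method signature, the use of generics, and the use of the standard java.net.URLDecoder in place of the article's own decode method.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the author's parse method, which the excerpt does not include.
final class FormParserSketch {

    // Splits "a=1&b=2&b=3" style data into a map from name to all values for that name.
    static Map<String, List<String>> parse(String raw, String encoding)
            throws UnsupportedEncodingException {
        Map<String, List<String>> pairs = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return pairs; // no query string or empty body
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray '&' separators
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Decode name and value with the encoding the page was served in
            String decodedName = URLDecoder.decode(name, encoding);
            String decodedValue = URLDecoder.decode(value, encoding);
            pairs.computeIfAbsent(decodedName, k -> new ArrayList<>()).add(decodedValue);
        }
        return pairs;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, List<String>> pairs =
                parse("name=%e4%b8%ad%e6%96%87&tag=a+b&tag=c", "UTF-8");
        System.out.println(pairs.get("tag"));                 // prints [a b, c]
        System.out.println(pairs.get("name").get(0).length()); // prints 2
    }
}
```

Keeping a list of values per name preserves repeated parameters such as multi-select fields, while a getParameter-style accessor can still return only the first one.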
