Java Servlet/JSP Multi-Language Solution (I)

Source: Internet
Author: User

I never believed that an inability to display several languages together could be a bug in Java itself, so this weekend I studied the multi-language display problem in servlets and JSP, that is, the problem of handling multiple character sets in a servlet. My grasp of character sets is not very firm, so what I write here is not necessarily accurate.

This is how I understand character sets in Java: at run time, every String object is stored as Unicode. (I think every language has some internal encoding, because inside the computer a string is always represented by codes. The difference is that in most languages the string encoding is platform-dependent, while Java uses platform-independent Unicode.)

When Java reads a string from a byte stream, it converts the platform-dependent bytes into a platform-independent Unicode string. On output, Java converts the Unicode string back into a platform-dependent byte stream; if a Unicode character does not exist in the target character set, a '?' is output in its place. For example, on Chinese Windows, when Java reads a GB2312-encoded file (or any other stream) into memory to construct a String, it converts the GB2312-encoded text into a Unicode string. When that string is output, the Unicode string is converted back into a GB2312 byte stream or byte array: "Chinese test" (中文测试) -----> "\u4e2d\u6587\u6d4b\u8bd5" -----> "Chinese test".
The following routine:
    // GBK-encoded bytes of "Chinese test" (中文测试)
    byte[] bytes = new byte[]{(byte) 0xd6, (byte) 0xd0, (byte) 0xce, (byte) 0xc4,
            (byte) 0xb2, (byte) 0xe2, (byte) 0xca, (byte) 0xd4};
    java.io.ByteArrayInputStream bin = new java.io.ByteArrayInputStream(bytes);
    // Decode the bytes as GBK while reading them into a Unicode String
    java.io.BufferedReader reader = new java.io.BufferedReader(
            new java.io.InputStreamReader(bin, "GBK"));
    String msg = reader.readLine();
    System.out.println(msg);
Run on a system whose character set contains the four characters of "Chinese test" (for example, a Chinese system), this program prints them correctly. The msg string holds the correct Unicode encoding of "Chinese test": "\u4e2d\u6587\u6d4b\u8bd5". When it is printed, it is converted to the operating system's default character set, so whether it displays correctly depends on that character set. Only on a system that supports the corresponding characters is the text output correctly; otherwise the result is garbage.
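
The '?' substitution described above can be reproduced without changing the operating system's settings. The following is a minimal sketch, assuming ISO-8859-1 stands in for a platform character set that has no Chinese characters; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class UnmappableDemo {
    // Encodes the text in ISO-8859-1 and decodes it again.
    // Characters that ISO-8859-1 cannot represent are replaced by '?' on encoding.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // UTF-8 can represent every Unicode character, so the round trip is lossless.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u6587\u6d4b\u8bd5"; // "Chinese test"
        System.out.println(roundTripLatin1(msg));             // prints ????
        System.out.println(roundTripUtf8(msg).equals(msg));   // prints true
    }
}
```

Once a character has been replaced by '?', the original text cannot be recovered from the bytes, which is why the choice of output encoding matters.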
Now let's look at the multi-language problem in servlet/JSP. Our goal is that a client in any country can send information to the server through a form, the server stores the information in the database, and the client can still see exactly what was sent. In practice, we must make sure that the SQL statement finally built on the server contains the correct Unicode encoding of the text the client sent, and that the encoding JDBC uses to communicate with the database can represent that text. It is best to let JDBC talk to the database directly in Unicode/UTF-8, which ensures that no information is lost. The server should likewise send information to the client in an encoding that loses nothing, which can also be Unicode/UTF-8.
If you do not specify the enctype attribute of a form, the form URL-encodes its input according to the character set of the current page before submitting it, and the server receives the URL-encoded string. The encoded string therefore depends on the page's encoding. A GB2312-encoded page that submits "Chinese test" produces "%d6%d0%ce%c4%b2%e2%ca%d4", where each "%" is followed by the two hexadecimal digits of one byte; a UTF-8 page produces "%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95". The difference arises because each of these characters takes 16 bits (two bytes) in GB2312 but 24 bits (three bytes) in UTF-8. The Chinese, Japanese, and Korean versions of IE 4 and later all support UTF-8, and UTF-8 necessarily covers these three languages, so if our HTML pages use UTF-8 we can support at least these three languages.
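
The two encoded forms can be checked with the standard java.net.URLEncoder. This is a small sketch rather than what a browser literally does; the class name is hypothetical, and it assumes the runtime ships the GBK charset (the code skips that line otherwise). Note that URLEncoder emits uppercase hexadecimal digits, which are equivalent to the lowercase ones quoted above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class FormEncodingDemo {
    // URL-encodes the text using the bytes of the named character set.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u6587\u6d4b\u8bd5"; // "Chinese test"
        // Three bytes per character in UTF-8:
        System.out.println(encode(msg, "UTF-8")); // %E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95
        // Two bytes per character in GBK (assumption: GBK is available on this JVM):
        if (Charset.isSupported("GBK")) {
            System.out.println(encode(msg, "GBK")); // %D6%D0%CE%C4%B2%E2%CA%D4
        }
    }
}
```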
However, if our HTML/JSP pages use UTF-8, the application server may not know it. If the browser's request carries no charset information, the most the server can do is read the Accept-Language request header, and that header alone does not reveal which encoding the browser used. As a result, the application server may not parse the submitted content correctly. Why? Because all strings in Java are 16-bit Unicode, and the job of HttpServletRequest.getParameter(String) is to convert the URL-encoded data submitted by the client into a Unicode string. Some servers can only assume that the client's encoding is the same as the server platform's and simply decode with URLDecoder.decode(String). If the client's encoding happens to match the server's, you get the correct string; otherwise, if the submitted string contains local (non-ASCII) characters, the result is garbage.
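
The effect of a wrong guess can be shown with the two-argument java.net.URLDecoder.decode. In this sketch ISO-8859-1 plays the role of a server platform encoding that differs from the page's UTF-8; that choice, and the class and method names, are assumptions for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class WrongCharsetDemo {
    // Decodes a URL-encoded string, interpreting the bytes in the named charset.
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Recovers the text when the bytes were wrongly decoded as ISO-8859-1.
    // This works because ISO-8859-1 maps every byte to exactly one character.
    static String repairFromLatin1(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String submitted = "%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95";
        String wrong = decodeAs(submitted, "ISO-8859-1");
        String right = decodeAs(submitted, "UTF-8");
        System.out.println(wrong.length());                        // prints 12
        System.out.println(right.length());                        // prints 4
        System.out.println(repairFromLatin1(wrong).equals(right)); // prints true
    }
}
```

The repair step only works for a byte-preserving charset such as ISO-8859-1; if the server had guessed a charset that drops or replaces bytes, the original text would be lost.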
In my solution I have already decided to use UTF-8 throughout, so this problem can be avoided. We can write our own decode method:
    public static String decode(String s, String encoding) throws Exception {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
                    // '+' stands for a space in URL-encoded form data
                    sb.append(' ');
                    break;
                case '%':
                    // "%XY" is one byte written as two hexadecimal digits
                    try {
                        sb.append((char) Integer.parseInt(
                                s.substring(i + 1, i + 3), 16));
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    }
                    i += 2;
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        // Undo conversion to external encoding: each char holds one raw byte,
        // so recover the bytes via ISO-8859-1 and decode them in the real encoding
        String result = sb.toString();
        byte[] inputBytes = result.getBytes("8859_1");
        return new String(inputBytes, encoding);
    }
This method lets you specify the encoding; specifying UTF-8 meets our needs. For example, using it to parse "%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95" yields the correct Unicode string for the Chinese text "Chinese test".
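
As an alternative sketch of the same idea (not the method given in this article), the decoded bytes can be collected directly in a byte buffer instead of passing through an intermediate ISO-8859-1 string. The class name is hypothetical, and the code assumes that unescaped characters in the input are plain ASCII, as they are in a well-formed form submission.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PercentDecoder {
    // Decodes a URL-encoded string, interpreting the escaped bytes in the given charset.
    static String decode(String s, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                out.write(' ');
            } else if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                // Assumption: unescaped characters are ASCII, so one char is one byte.
                out.write(c);
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String text = decode("%e4%b8%ad%e6%96%87%e6%b5%8b%e8%af%95", StandardCharsets.UTF_8);
        System.out.println(text.equals("\u4e2d\u6587\u6d4b\u8bd5")); // prints true
    }
}
```

Unlike a bare substring call, this version reports a truncated escape such as a trailing "%e" with an IllegalArgumentException rather than an index error.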
The remaining problem is that we have to get hold of the URL-encoded string the client submitted. Form data submitted with the GET method can be read with HttpServletRequest.getQueryString(), while form data submitted with the POST method can only be read from the ServletInputStream. In fact, the first time the standard getParameter method is called, it reads out all of the submitted form data, and a ServletInputStream cannot be read a second time. So we should read and parse the submitted form data ourselves before getParameter is called for the first time.
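
The read-then-parse step can be sketched with the standard library alone. In the sketch below a plain InputStream stands in for the ServletInputStream, and the class and method names are hypothetical; a real servlet would pass the request's input stream (for POST) or the query string (for GET) instead. For simplicity it keeps only the last value when a parameter name is repeated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBodyParser {
    // Reads the whole stream once; the URL-encoded body is ASCII, so ISO-8859-1 is safe here.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Splits "name=value&name=value" and decodes both parts in the given charset.
    static Map<String, String> parse(String raw, String charsetName) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                params.put(URLDecoder.decode(name, charsetName),
                           URLDecoder.decode(value, charsetName));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
            }
        }
        return params;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "title=%e4%b8%ad%e6%96%87&note=a+b".getBytes(StandardCharsets.ISO_8859_1);
        Map<String, String> params = parse(readAll(new ByteArrayInputStream(body)), "UTF-8");
        System.out.println(params.get("title").equals("\u4e2d\u6587")); // prints true
        System.out.println(params.get("note"));                          // prints a b
    }
}
```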



