When faced with a stream of bytes, if you do not know its encoding, you cannot know what it actually means.
Keep this sentence in mind whenever you face a "characters to bytes, bytes to characters" problem. Otherwise garbled text may follow.
In fact, garbled text arises because encoding and decoding did not use the same charset. Once you understand this, garbled-text problems are easy to solve.
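The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not taken from the original text: the class name MojibakeDemo and the helper transcode are hypothetical, and ISO-8859-1 is used as the "wrong" charset only because every Java runtime is required to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode text with one charset, then decode the resulting bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, written as an escape so the
        // encoding of this source file does not matter.
        String original = "caf\u00e9";

        // Same charset on both sides: the text survives the round trip.
        System.out.println(transcode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Different charsets: the two UTF-8 bytes of U+00E9 are read back
        // as two separate ISO-8859-1 characters, so the text is garbled.
        System.out.println(transcode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The second call returns a five-character string instead of the original four, which is exactly the kind of garbling the text describes.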
The common cases in Java are as follows:
1. The String class has a constructor that takes a byte[], String(byte[] bytes), and also provides two overloads for specifying the encoding:
(1) String(byte[] bytes, Charset charset)
(2) String(byte[] bytes, String charsetName)
2. The getBytes method of the String class, byte[] getBytes(), also has the following two overloads:
(1) byte[] getBytes(Charset charset)
(2) byte[] getBytes(String charsetName)
Every variant that does not take an encoding uses the platform's default charset, which you can obtain with System.getProperty("file.encoding") or Charset.defaultCharset().
3. PrintStream's print(String s) also involves this problem. For this reason PrintStream's constructors include, in addition to PrintStream(File file), PrintStream(File file, String csn), which takes the name of a charset.
Otherwise the string's characters are converted into bytes according to the platform's default character encoding.
DataOutputStream's constructors have no way to specify an encoding, but the class provides writeUTF(String str), which writes the string in modified UTF-8.
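The three cases above can be sketched together with the charset named explicitly each time. This is an illustration under assumptions: the class and method names are hypothetical, and the PrintStream(OutputStream, boolean, String) constructor is used in place of the File-based one mentioned above so the example needs no files on disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // String -> bytes -> String, naming the charset on both sides.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Bytes produced by PrintStream.print when a charset name is given to the constructor.
    static byte[] printWith(String text, String csn) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, csn)) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    // Bytes produced by DataOutputStream.writeUTF: a 2-byte length, then modified UTF-8.
    static byte[] writeUtf(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // four characters, the last one is non-ASCII
        System.out.println(roundTrip(text).equals(text));
        System.out.println(printWith(text, "UTF-8").length);
        System.out.println(printWith(text, "ISO-8859-1").length);
        System.out.println(writeUtf(text).length);
    }
}
```

The same four characters become 5 bytes in UTF-8, 4 bytes in ISO-8859-1, and 7 bytes through writeUTF (5 bytes of data plus the 2-byte length), so a reader must know which of these was used before turning the bytes back into text.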
The first example illustrates the need to specify the encoding:
If a page specifies UTF-8 as its encoding, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, and there is a form on the page that is submitted to a servlet,
Then the user entered the character of the word stream is according to the specified encoding encoding, for example, you entered the "Hello hi", if it is utf-8, then passed over is the following:
, we see the back of the Chinese characters each used 3 bytes, this can refer to Utf-8 related knowledge.
But if your page specifies GBK, it will be different:
[104, 101, 108, 108, 111,-60,-29,-70,-61]
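The byte arrays above can be checked directly. The following is a small sketch that assumes the submitted text was "hello" followed by the two Chinese characters U+4F60 and U+597D (an inference from the GBK bytes shown); the class and method names are hypothetical. GBK is not among the charsets every Java runtime must provide, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FormBytesDemo {
    // "hello" followed by U+4F60 U+597D, written as escapes so the
    // encoding of this source file does not matter.
    static final String INPUT = "hello\u4f60\u597d";

    // Render the signed byte values of the text in the given charset.
    static String bytesOf(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8: " + bytesOf(INPUT, StandardCharsets.UTF_8));
        // GBK is an optional charset, so only use it when the runtime has it.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:   " + bytesOf(INPUT, Charset.forName("GBK")));
        }
    }
}
```

The first five bytes are identical in both encodings because ASCII characters map to the same single byte; only the Chinese characters differ, taking three bytes each in UTF-8 and two each in GBK.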
So on the servlet side, when request.getParameter is used, what should effectively happen is
String s = new String(bytes, request.getCharacterEncoding()); if no encoding has been set, getCharacterEncoding() returns null and a default is used instead, for example the platform's GBK, and the Chinese text becomes garbled.
So to avoid garbled text, JSP sites generally install a filter that sets one uniform encoding for all pages and servlets, using request.setCharacterEncoding and response.setCharacterEncoding.
Internally a Java String is a char[], and a char is a 16-bit UTF-16 code unit. Therefore, whenever you convert characters or strings into bytes for output to a file or a network, or turn a stream of bytes read from a file or network into meaningful characters, you need to know which encoding is in use.
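A short sketch of what "char is a UTF-16 code unit" means in practice. The class and helper names are hypothetical; the point is only that the number of char values, the number of Unicode characters, and the number of encoded bytes are three different quantities.

```java
import java.nio.charset.StandardCharsets;

class CodeUnitDemo {
    // Number of 16-bit char units stored in the String.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (actual characters).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes once the String is encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String bmp = "\u4f60";                  // U+4F60, fits in one char
        String supplementary = "\uD83D\uDE00";  // U+1F600, needs a surrogate pair

        for (String s : new String[] {bmp, supplementary}) {
            System.out.println(charUnits(s) + " char(s), "
                    + codePoints(s) + " code point(s), "
                    + utf8Length(s) + " UTF-8 byte(s)");
        }
    }
}
```

U+4F60 occupies one char and three UTF-8 bytes, while U+1F600 occupies two chars and four UTF-8 bytes, even though each is a single character to the user.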
Some lessons learned
1. The String class always stores its text in Unicode (UTF-16).
2. Be careful with String.getBytes():
Without a charset parameter it depends on the JVM's default charset, which is generally UTF-8 on Linux and GBK on Chinese-locale Windows. (To change the JVM default charset, pass the option -Dfile.encoding=UTF-8 when starting the JVM.)
To be safe, it is recommended to always call it with a parameter, for example: String s; s.getBytes("UTF-8").
3. The Charset class is very handy:
(1) Charset.encode performs encoding: it turns a string into bytes (a ByteBuffer) in the charset you specify.
(2) Charset.decode performs decoding: it interprets bytes in the charset you specify and returns the characters (a CharBuffer), which can be converted to a String.
Examples are as follows:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class CharsetExample {
    public static void main(String[] args) throws Exception {
        // Name of the platform's default charset
        String s = Charset.defaultCharset().displayName();
        String s1 = "I like you, My Love";
        ByteBuffer bb1 = ByteBuffer.wrap(s1.getBytes("UTF-8"));
        for (byte bt : bb1.array()) {
            System.out.printf("%x ", bt);
        }
        // char[] usage
        char[] chArray = {'I', 'L', 'o', 'v', 'e', '你'};
        // CharBuffer usage
        CharBuffer cb = CharBuffer.wrap(chArray);
        // Reposition the pointer to the start; flip() here would set the limit to 0 and empty the buffer
        cb.rewind();
        String s2 = new String(chArray);
        // ByteBuffer usage
        ByteBuffer bb2 = Charset.forName("utf-8").encode(cb);
        // Encode a String with the Charset for the specified character set
        ByteBuffer bb3 = Charset.forName("utf-8").encode(s1);
        // Copy only the encoded bytes; array() may include unused capacity
        byte[] b = new byte[bb3.remaining()];
        bb3.get(b);
        // Use Charset to decode bytes in the specified character set into a String
        ByteBuffer bb4 = ByteBuffer.wrap(b);
        String s3 = Charset.forName("utf-8").decode(bb4).toString();
    }
}
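The example above repositions a CharBuffer before encoding it and reads bytes out of a ByteBuffer afterwards. Both steps have a pitfall worth seeing in isolation. The following is a minimal sketch with hypothetical class and method names: on a buffer that has just been created with wrap, the position is already 0, so flip() sets the limit to 0 and leaves nothing to encode, while rewind() keeps the limit. Likewise, array() returns the whole backing array, which may be longer than the encoded content.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class BufferPositionDemo {
    // flip() on a freshly wrapped buffer: limit becomes the current position (0).
    static int bytesAfterFlip(char[] chars) {
        CharBuffer cb = CharBuffer.wrap(chars);
        cb.flip();
        return StandardCharsets.UTF_8.encode(cb).remaining();
    }

    // rewind() only resets the position to 0 and leaves the limit alone.
    static int bytesAfterRewind(char[] chars) {
        CharBuffer cb = CharBuffer.wrap(chars);
        cb.rewind();
        return StandardCharsets.UTF_8.encode(cb).remaining();
    }

    // Copy exactly the bytes between position and limit.
    static byte[] toBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        System.out.println("after flip:   " + bytesAfterFlip(chars) + " byte(s)");
        System.out.println("after rewind: " + bytesAfterRewind(chars) + " byte(s)");

        byte[] encoded = toBytes(StandardCharsets.UTF_8.encode("caf\u00e9"));
        System.out.println(encoded.length + " encoded byte(s)");
    }
}
```

flip() is meant for switching a buffer from writing to reading after data has been put into it; for a wrapped array there is nothing to switch, so rewind(), or no call at all, is the safer choice.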