Using Java's String.getBytes(String charsetName) and String(byte[] bytes, String charsetName) to fix garbled text
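How does Java store String data? The source code shows that a String keeps its characters in a member variable, char[] value, and a char in Java is 2 bytes. (Since JDK 9 the field is a byte[] with a compact layout, but the API still exposes the content as 16-bit char values, so the description here still applies.)
Those char values are Unicode. Strictly speaking, Java uses UTF-16 rather than UCS-2: each char is one 16-bit code unit, and a character outside the Basic Multilingual Plane takes two chars (a surrogate pair). For the characters discussed here the two are identical, so each char in a String simply holds the character's Unicode code point as a number.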
For example, the Unicode code point of 你 is 4f60, as the following test code shows.
char c = '你';
System.out.println(Integer.toHexString(c));
System.out.println(Integer.valueOf(c));
System.out.println(c);
The result is:
4f60
20320
你
So a String holds its characters' Unicode values directly in memory; it carries no separate charset such as UTF-8 or GBK. With that in mind, let's look at the two methods:
getBytes(charsetName)
This returns the content of the String as a byte array in the given encoding.
What does this mean?
It converts the Unicode values held in memory into the byte sequence that represents them in the charsetName encoding.
For example, encoding 你 as UTF-8 yields a byte array of three bytes:
[e4 bd a0]
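A minimal sketch of that encoding step, assuming a small hex-formatting helper (EncodeDemo and toHex are illustrative names, not part of the JDK). It encodes 你 with two different charsets to show that the same in-memory character produces different byte arrays:

```java
import java.nio.charset.StandardCharsets;

final class EncodeDemo {
    // Illustrative helper: formats bytes as lowercase hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u4f60"; // the character 你, written as an escape to avoid source-encoding issues
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));    // e4 bd a0
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_16BE))); // 4f 60
    }
}
```

The StandardCharsets constants are used here instead of charset name strings only to avoid handling UnsupportedEncodingException; getBytes("UTF-8") produces the same bytes.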
What about String(bytes, charsetName)?
It interprets the byte array bytes as text in the charsetName encoding and builds a String from the decoded characters.
For example, if the byte array [e4 bd a0] is interpreted as UTF-8, the result is the String "你". Interpreted as any other encoding, it will not come out as "你".
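The decoding direction can be sketched the same way. The example below, with the illustrative class name DecodeDemo, decodes the same three bytes once as UTF-8 and once as ISO-8859-1 and compares the results:

```java
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // The UTF-8 bytes of 你 from the example above.
    static final byte[] BYTES = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0};

    public static void main(String[] args) {
        String asUtf8 = new String(BYTES, StandardCharsets.UTF_8);
        String asLatin1 = new String(BYTES, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals("\u4f60")); // true: three bytes decode to the single char 你
        System.out.println(asUtf8.length());         // 1
        System.out.println(asLatin1.length());       // 3: each byte became its own char
        System.out.println(Integer.toHexString(asLatin1.charAt(0))); // e4
    }
}
```

The bytes are identical in both calls; only the charset passed to the constructor decides which characters end up in the String.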
Now a related question: why does servlet code that handles request parameters so often need a line like this to fix the encoding:
String str = new String(param.getBytes("ISO-8859-1"), "UTF-8");
This is easy to understand. The browser sends the parameter as UTF-8 encoded bytes, but the web container assumes by default that the bytes are ISO-8859-1, so it decodes them with ISO-8859-1 to build the String. That is equivalent to the following operation:
String s = new String(UTF8Bytes, "ISO-8859-1");
Note that ISO-8859-1 is a single-byte encoding, so every byte is turned into exactly one char and no byte is lost. Fortunately, that is what lets us turn the String back into the identical byte array with getBytes("ISO-8859-1") and then decode those bytes correctly as UTF-8. That is where the widely used line above comes from.
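The round trip can be reproduced without a servlet container. The sketch below simulates the container's wrong decode and then applies the repair line; RepairDemo and repair are illustrative names, and the simulation is an assumption about what the container does rather than actual container code:

```java
import java.nio.charset.StandardCharsets;

final class RepairDemo {
    // Recovers the original bytes with ISO-8859-1, then decodes them as UTF-8.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u4f60\u597d"; // 你好, what the user typed
        byte[] onTheWire = sent.getBytes(StandardCharsets.UTF_8);

        // Simulated container behaviour: bytes decoded with the wrong charset.
        String param = new String(onTheWire, StandardCharsets.ISO_8859_1);
        System.out.println(param.length());              // 6: one char per UTF-8 byte
        System.out.println(repair(param).equals(sent));  // true
    }
}
```

The repair works because ISO-8859-1 maps each of the 256 byte values to a distinct char, so getBytes reproduces exactly the bytes that arrived.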
Finally, let's address a common misunderstanding about encoding. The key fact is:
Java stores strings internally as Unicode.
We often hear people say, "I need to convert this String from ISO-8859-1 to GBK." What is wrong with that? A String cannot be "encoded in ISO-8859-1" or "encoded in GBK". To repeat, a String in Java is always Unicode, so there is no such thing as an "ISO-8859-1 encoded String" or a "GBK encoded String". The only reason to convert is that the String was built by decoding bytes with the wrong charset. The case we meet most often is bytes that are really GBK or UTF-8 but were decoded as ISO-8859-1. The so-called conversion process is then: String -> byte[] -> String
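The String -> byte[] -> String process can be written as one small helper. This is a sketch with a hypothetical method name, reinterpret, that takes the charset that was wrongly used and the charset the bytes were really in. It also shows a limit of the technique: the repair only works when the wrong decode lost no information, which holds for ISO-8859-1 but not, for example, for US-ASCII:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Reinterpret {
    // Hypothetical helper: undo a wrong decode, then decode with the real charset.
    static String reinterpret(String text, Charset wronglyUsed, Charset actual) {
        byte[] raw = text.getBytes(wronglyUsed); // String -> byte[]
        return new String(raw, actual);          // byte[] -> String
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d"; // 你好
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Wrongly decoded as ISO-8859-1: every byte survives, so the repair succeeds.
        String viaLatin1 = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(
                reinterpret(viaLatin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8))); // true

        // Wrongly decoded as US-ASCII: bytes above 0x7f were replaced, so the repair fails.
        String viaAscii = new String(utf8, StandardCharsets.US_ASCII);
        System.out.println(original.equals(
                reinterpret(viaAscii, StandardCharsets.US_ASCII, StandardCharsets.UTF_8))); // false
    }
}
```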