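How does Java store the data of a String? The source code shows that the characters are held in a member variable such as char[] value, and a char in Java is 2 bytes. (Since JDK 9 the field is a byte[] plus a coder flag, but a String still behaves as a sequence of 16-bit char values.)
Each char holds one 16-bit Unicode code unit. This is often described as UCS-2, which represents every character in 2 bytes. Strictly speaking, Java has used UTF-16 since Java 5, so a character outside the Basic Multilingual Plane takes two chars while every other character takes exactly one. Either way, the value in a String stores Unicode numbers, not bytes in some external encoding.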
For example, the Unicode code point of '你' is 4F60; see the test code below:
char c = '你';
System.out.println(Integer.toHexString(c));
System.out.println(Integer.valueOf(c));
System.out.println(c);
The result is:
4f60
20320
你
So now we know that a String internally stores plain Unicode code units with no byte encoding applied; each char is simply the number assigned to the corresponding character. With that in mind, look at our two methods:
getBytes(charsetName)
It returns a byte array produced by encoding the string with the named charset.
What does that mean?
It converts the in-memory Unicode representation into the bytes that the charsetName encoding uses for those characters.
Take '你': in UTF-8 it is encoded as three bytes, so the resulting byte array has three bytes,
i.e. [E4 BD A0]
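As a minimal sketch of the encoding step described above, the following prints the bytes that getBytes produces for '你' under a few standard charsets. The class name GetBytesDemo and the helper toHex are made up for this illustration; the character is written as the escape \u4F60 so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {
    // Hypothetical helper: formats a byte array as space-separated
    // uppercase hex, e.g. "E4 BD A0".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u4F60"; // the character 你

        // Same in-memory character, different byte sequences per charset.
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));      // E4 BD A0
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_16BE)));   // 4F 60
        // ISO-8859-1 cannot represent 你, so getBytes substitutes '?' (0x3F).
        System.out.println(toHex(s.getBytes(StandardCharsets.ISO_8859_1))); // 3F
    }
}
```

The last line is worth noticing: encoding into a charset that lacks the character silently loses it.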
And new String(bytes, charsetName)?
It interprets the byte array bytes according to charsetName and assembles the decoded characters into a String.
For example, the byte array above, [E4 BD A0], interpreted as UTF-8 is stored as the string "你"; interpreted under a different encoding, it will not come out as "你".
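The decoding direction can be sketched the same way. The example below decodes the bytes [E4 BD A0] once as UTF-8 and once as ISO-8859-1 and prints the resulting char values in hex, so the difference is visible even on a console that cannot display Chinese. DecodeDemo and decodeToCodeUnits are names invented for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decodes the bytes with the given charset and
    // lists the resulting UTF-16 code units in lowercase hex.
    static String decodeToCodeUnits(byte[] bytes, Charset charset) {
        String s = new String(bytes, charset);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xE4, (byte) 0xBD, (byte) 0xA0};

        // Correct charset: three bytes become the single character U+4F60.
        System.out.println(decodeToCodeUnits(bytes, StandardCharsets.UTF_8));      // 4f60
        // Wrong charset: each byte becomes its own character (garbled text).
        System.out.println(decodeToCodeUnits(bytes, StandardCharsets.ISO_8859_1)); // e4 bd a0
    }
}
```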
One more thing: why is it usually necessary to re-encode request parameters like this in a servlet?
String str = new String(param.getBytes("ISO-8859-1"), "UTF-8");
This is easy to understand. The browser sends the data as UTF-8 encoded bytes, but the web container assumes by default that the bytes are ISO-8859-1 encoded, so it decodes them as ISO-8859-1 when building the String. That is equivalent to doing the following:
String s = new String(utf8Bytes, "ISO-8859-1");
Note that ISO-8859-1 is a single-byte encoding: each byte is converted to exactly one Unicode character, and every byte value has a mapping. Fortunately, this means nothing is lost, so we can turn the garbled string back into the same original byte array with getBytes("ISO-8859-1") and then decode those bytes correctly as UTF-8. This is the re-encoding idiom we use most often.
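The round trip can be simulated without a servlet container. In the sketch below, misdecode stands in for what the container does by default and repair is the idiom from the text; both method names and the class name ServletEncodingFix are assumptions made for the example, not part of any servlet API.

```java
import java.nio.charset.StandardCharsets;

public class ServletEncodingFix {
    // Stand-in for the container: decodes UTF-8 bytes as ISO-8859-1.
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The repair idiom: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u4F60\u597D"; // 你好, as typed in the browser
        byte[] onTheWire = sent.getBytes(StandardCharsets.UTF_8);

        String param = misdecode(onTheWire);
        System.out.println(param.length());             // 6 (one char per byte)
        System.out.println(sent.equals(repair(param))); // true
    }
}
```

Using the StandardCharsets constants instead of charset name strings avoids the checked UnsupportedEncodingException.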
Finally, to repeat: most confusion about encodings comes from a wrong mental model. We must remember:
Java stores strings internally as Unicode.
We often hear someone say, "I need to convert a string from ISO-8859-1 to GBK". What is really going on? We are not "converting an ISO-8859-1 encoded string into a GBK encoded string". As stated repeatedly, a String in Java is always Unicode, so there is no such thing as an "ISO-8859-1 encoded string" or a "GBK encoded string". The only reason to convert is that the String was built by decoding bytes with the wrong charset. That is the situation behind the frequent need to "convert from ISO-8859-1" to GBK, UTF-8 and so on. The so-called conversion process is: String -> byte[] -> String
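The String -> byte[] -> String process can be written as one small helper. This is only a sketch: the class name Recode, the method name recode and its parameter names are invented here. It also shows the limit of the technique: the repair works only if the wrong charset preserved every byte, as ISO-8859-1 does. A charset such as GBK could be passed with Charset.forName("GBK") where the runtime provides it; the demo sticks to charsets every JVM is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Recode {
    // String -> byte[] -> String: undo the wrong decoding, then decode
    // the recovered bytes with the charset they were really written in.
    static String recode(String garbled, Charset wrongCharset, Charset actualCharset) {
        byte[] originalBytes = garbled.getBytes(wrongCharset);
        return new String(originalBytes, actualCharset);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // 你好
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Wrongly decoded as ISO-8859-1: every byte survives, so repair works.
        String viaLatin1 = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(text.equals(
                recode(viaLatin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8))); // true

        // Wrongly decoded as US-ASCII: bytes above 0x7F were replaced during
        // decoding, so the original bytes are gone and repair fails.
        String viaAscii = new String(utf8, StandardCharsets.US_ASCII);
        System.out.println(text.equals(
                recode(viaAscii, StandardCharsets.US_ASCII, StandardCharsets.UTF_8)));    // false
    }
}
```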
Solving garbled-text problems with the Java String class: getBytes(String charsetName) and String(byte[] bytes, String charsetName)