The getBytes() method of String returns a byte array encoded in the system's default encoding.
getBytes("UTF-8") returns a byte array encoded in UTF-8.
Converting a string to bytes gives different results under different encodings: UTF-8 uses 3 bytes for most Chinese characters, while GBK uses 2. So state the encoding explicitly; otherwise the default encoding is used.
In Java, String.getBytes() returns a byte array in the platform's default encoding, which has traditionally followed the operating system. This means the same call can return different bytes on different operating systems!
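The difference between the default and an explicit encoding can be checked with a small sketch like the one below. The class and method names (DefaultCharsetDemo, encodeWithDefault, encodeAsUtf8) are made up for illustration, and the character is written as the escape \u4e2d so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares default-charset and explicit UTF-8 encoding.
public class DefaultCharsetDemo {
    // Encodes with the JVM default charset; the bytes may differ between machines.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    // Encodes with an explicitly chosen charset; the bytes are the same everywhere.
    static byte[] encodeAsUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default length:  " + encodeWithDefault(s).length);
        System.out.println("UTF-8 length:    " + encodeAsUtf8(s).length);
    }
}
```

The first two printed values depend on the machine; only the UTF-8 length is fixed, which is the argument for always naming the encoding.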
The String.getBytes(String decode) method returns the byte array representation of the string in the encoding named by decode, for example:
byte[] B_GBK = "中".getBytes("GBK");
byte[] B_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");
These calls return the byte arrays for the Chinese character "中" in the GBK, UTF-8, and ISO8859-1 encodings, respectively. At this point:
The length of B_GBK is 2,
The length of B_utf8 is 3,
The length of b_iso88591 is 1.
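The three lengths can be verified with a short sketch. The class and helper names (ByteLengthDemo, encodedLength) are hypothetical, and the sketch assumes the running JDK ships the GBK charset, which standard desktop and server distributions do but a stripped-down runtime may not.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: measures how many bytes one string needs per charset.
public class ByteLengthDemo {
    // Charset.forName throws an unchecked exception if the name is unknown.
    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中
        System.out.println("GBK:        " + encodedLength(s, "GBK"));
        System.out.println("UTF-8:      " + encodedLength(s, "UTF-8"));
        System.out.println("ISO-8859-1: " + encodedLength(s, "ISO-8859-1"));
    }
}
```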
As the inverse of getBytes, the character "中" can be restored with new String(byte[], decode).
new String(byte[], decode) parses the byte[] into a string using the encoding named by decode.
String S_GBK = new String(B_GBK, "GBK");
String S_utf8 = new String(B_utf8, "UTF-8");
String s_iso88591 = new String(b_iso88591, "ISO8859-1");
Printing S_GBK, S_utf8, and s_iso88591 shows that S_GBK and S_utf8 are both "中", while s_iso88591 is an unrecognized character (in effect, garbled text). Why can't "中" be restored after encoding with ISO8859-1? The reason is simple: the ISO8859-1 code table contains no Chinese characters at all, so "中".getBytes("ISO8859-1") cannot produce a correct encoded value for "中" in ISO8859-1, and restoring it afterwards with new String() is out of the question.
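The round trip described above can be sketched as follows. RoundTripDemo and roundTrip are illustrative names, not part of any library, and the GBK case again assumes the charset is available in the running JDK.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: encodes a string and decodes it with the same charset.
public class RoundTripDemo {
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] bytes = s.getBytes(cs);   // string -> bytes
        return new String(bytes, cs);    // bytes -> string
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中
        System.out.println(roundTrip(s, "GBK").equals(s));        // survives
        System.out.println(roundTrip(s, "UTF-8").equals(s));      // survives
        System.out.println(roundTrip(s, "ISO-8859-1").equals(s)); // lost
    }
}
```

In the ISO-8859-1 case getBytes silently substitutes the charset's replacement byte for the character it cannot map, so the damage happens at encoding time, before new String is ever called.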
Therefore, when using String.getBytes(String decode) to obtain a byte[], make sure that every character in the string actually exists in the code table of the decode encoding, so that the resulting byte[] can be restored correctly.
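One way to make that check before encoding is CharsetEncoder.canEncode from java.nio.charset. The sketch below is only one possible approach; the class and method names (EncodableCheck, canRoundTrip) are invented for this example.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports whether a charset can represent every character of a string.
public class EncodableCheck {
    static boolean canRoundTrip(String s, String charsetName) {
        // A fresh encoder is created per call; CharsetEncoder is not thread-safe.
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中
        System.out.println("UTF-8:      " + canRoundTrip(s, "UTF-8"));
        System.out.println("ISO-8859-1: " + canRoundTrip(s, "ISO-8859-1"));
    }
}
```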
Note:
Sometimes Chinese characters have to meet a special requirement (for example, HTTP headers require their content to be ISO8859-1 encoded). In that case the Chinese characters can be re-encoded byte by byte, for example:
String s_iso88591 = new String("中".getBytes("UTF-8"), "ISO8859-1");
The resulting s_iso88591 string is actually three ISO8859-1 characters. After these characters are passed to the destination, the destination program reverses the process:
String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"), "UTF-8");
This yields the correct Chinese character "中", which both complies with the protocol and supports Chinese.
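The wrap and unwrap steps can be sketched as a pair of helpers. The names Latin1Transport, wrap, and unwrap are hypothetical; the sketch relies only on the fact that ISO-8859-1 maps each of the 256 byte values to exactly one character, so no byte is lost on the way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers: carry UTF-8 bytes inside an ISO-8859-1 string and recover them.
public class Latin1Transport {
    // Sender side: each UTF-8 byte becomes one ISO-8859-1 character.
    static String wrap(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: turn the characters back into bytes and decode them as UTF-8.
    static String unwrap(String wrapped) {
        return new String(wrapped.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the Chinese character 中
        String wrapped = wrap(original);
        System.out.println("wrapped length: " + wrapped.length());
        System.out.println("restored: " + unwrap(wrapped).equals(original));
    }
}
```

Both sides have to agree on UTF-8 as the inner encoding; if the receiver decodes with a different one, the text comes out garbled.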
Chinese encoding problems in the String.getBytes() method (repost)