Reprinted from: http://zhuhuide2004.iteye.com/blog/562739. If you reprint this post, please cite the original author's address.
In Java, the String.getBytes(String decode) method returns the byte array that represents a string in the encoding named by decode. For example:
byte[] b_gbk = "中".getBytes("GBK");
byte[] b_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");
These calls return the byte arrays for the character "中" in the GBK, UTF-8, and ISO8859-1 encodings respectively. The length of b_gbk is 2, the length of b_utf8 is 3, and the length of b_iso88591 is 1.
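The byte counts above can be checked with a short program. This is a minimal sketch: the class and method names are made up for illustration, the character is written as the escape \u4e2d (U+4E2D, "中") so the result does not depend on the source file's encoding, and it assumes the JRE ships the GBK charset, which standard desktop and server JDKs normally do.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original post.
class EncodingLengths {
    // Number of bytes the text occupies when encoded in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        System.out.println("GBK:        " + byteLength(zhong, "GBK"));        // 2
        System.out.println("UTF-8:      " + byteLength(zhong, "UTF-8"));      // 3
        System.out.println("ISO-8859-1: " + byteLength(zhong, "ISO-8859-1")); // 1
    }
}
```

The sketch passes a Charset object rather than a charset name to getBytes, which avoids the checked UnsupportedEncodingException that the String overload declares.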
The reverse of getBytes is new String(byte[], decode), which can restore the character "中". It parses the byte[] into a string using the encoding named by decode.
String s_gbk = new String(b_gbk, "GBK");
String s_utf8 = new String(b_utf8, "UTF-8");
String s_iso88591 = new String(b_iso88591, "ISO8859-1");
If you print s_gbk, s_utf8, and s_iso88591, you will find that s_gbk and s_utf8 are both "中", while s_iso88591 is an unrecognizable character. Why can "中" not be restored after it has been encoded and then decoded with ISO8859-1? The reason is simple: the ISO8859-1 encoding table contains no Chinese characters, so "中".getBytes("ISO8859-1") cannot produce a correct encoded value for "中" in the first place. The information is already lost at that point, and new String() has nothing to restore it from.
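The round trip described above can be sketched as follows. The class and method names are hypothetical, and the character is again written as \u4e2d (U+4E2D). On a standard JDK, getBytes replaces a character the charset cannot represent with that charset's replacement byte, which for ISO-8859-1 is the question mark (0x3F), so the decoded result is "?" rather than the original character.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original post.
class RoundTrip {
    // Encode the text in the named charset, then decode the bytes with the same charset.
    static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        System.out.println(zhong.equals(roundTrip(zhong, "GBK")));        // true
        System.out.println(zhong.equals(roundTrip(zhong, "UTF-8")));      // true
        System.out.println(zhong.equals(roundTrip(zhong, "ISO-8859-1"))); // false
    }
}
```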
Therefore, when you use String.getBytes(String decode) to obtain a byte[], make sure that every character in the string has a code value in the encoding table named by decode. Only then can the resulting byte[] be restored correctly.
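One way to make that check before encoding is to ask the charset's encoder whether it can represent the string. The sketch below uses CharsetEncoder.canEncode from java.nio.charset; the wrapper class and method name are assumptions introduced here, not something the original post uses.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original post.
class EncodableCheck {
    // True if every character of the text has a representation in the named charset.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        System.out.println(canEncode(zhong, "UTF-8"));      // true
        System.out.println(canEncode(zhong, "ISO-8859-1")); // false
    }
}
```

If the check returns false, the bytes produced by getBytes for that charset will not decode back to the original string.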
Sometimes Chinese characters have to fit a special requirement (for example, HTTP headers that require their content to be ISO8859-1 encoded). In that case the characters can be encoded into bytes first, and those bytes reinterpreted as ISO8859-1 text:
String s_iso88591 = new String("中".getBytes("UTF-8"), "ISO8859-1");
The resulting s_iso88591 string is really three ISO8859-1 characters, one for each UTF-8 byte. After these characters are passed to the destination, the destination program reverses the process with
String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"), "UTF-8");
to get back the correct Chinese character "中". This both complies with the protocol and supports Chinese.
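The wrap-and-unwrap trick above can be sketched as a pair of helper methods. The names wrap and unwrap and the class are hypothetical. The trick works because ISO-8859-1 maps every byte value 0x00 to 0xFF to exactly one character, so reinterpreting arbitrary bytes as ISO-8859-1 text loses nothing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
class Latin1Transit {
    // Sender side: carry the UTF-8 bytes of the text as ISO-8859-1 characters.
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the UTF-8 bytes and decode them as UTF-8 again.
    static String unwrap(String carried) {
        return new String(carried.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        String carried = wrap(zhong);
        System.out.println(carried.length());              // 3
        System.out.println(zhong.equals(unwrap(carried))); // true
    }
}
```

Both sides have to agree on the inner encoding (UTF-8 here); unwrapping with a different charset would produce garbled text.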