In Java, the String.getBytes(String decode) method returns the byte array representation of a string in the encoding named by decode. For example:
byte[] b_gbk = "中".getBytes("GBK");
byte[] b_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");
These calls return the byte arrays of the Chinese character "中" in the GBK, UTF-8 and ISO8859-1 encodings respectively. The length of b_gbk is 2, the length of b_utf8 is 3, and the length of b_iso88591 is 1.
See the following test class:
package com.test.j2se;

import java.io.UnsupportedEncodingException;

public class TestGetBytes {

    public static void main(String[] args) {
        // byte[] b_gbk = "中".getBytes("GBK");
        // byte[] b_utf8 = "中".getBytes("UTF-8");
        // byte[] b_iso88591 = "中".getBytes("ISO8859-1");
        TestGetBytes t = new TestGetBytes();
        try {
            t.printByte("中", "GBK");
            t.printByte("中", "UTF-8");
            t.printByte("中", "ISO8859-1");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    /**
     * String.getBytes(String decode) returns the byte array of a string
     * in the encoding named by decode. This method prints that array.
     */
    public void printByte(String content, String decode) throws UnsupportedEncodingException {
        byte[] b = content.getBytes(decode);
        System.out.print(decode + ":");
        // Java bytes are signed, so values above 127 print as negative numbers.
        for (int i = 0; i < b.length; i++) {
            System.out.print(b[i] + " ");
        }
        System.out.print("the array length is " + b.length);
        System.out.println();
    }
}
Console output:
GBK:-42 -48 the array length is 2
UTF-8:-28 -72 -83 the array length is 3
ISO8859-1:63 the array length is 1
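The negative values in this output appear because Java's byte type is signed. As a minimal sketch (the class name HexBytes and the helper toHex are illustrative, not part of the original test class), the same bytes can be printed as unsigned hexadecimal, which is how encoding tables usually list them. The character is written as the escape \u4e2d so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;

public class HexBytes {

    // Formats each byte as two unsigned hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF converts the signed byte to its unsigned value (0..255).
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        System.out.println(toHex(zhong.getBytes(StandardCharsets.UTF_8)));      // E4 B8 AD
        System.out.println(toHex(zhong.getBytes(StandardCharsets.ISO_8859_1))); // 3F
    }
}
```

Here -28, -72 and -83 correspond to E4, B8 and AD, and 63 corresponds to 3F.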
Conversely, new String(byte[], decode) can restore the character "中". It parses the byte[] into a string using the encoding named by decode:
String s_gbk = new String(b_gbk, "GBK");
String s_utf8 = new String(b_utf8, "UTF-8");
String s_iso88591 = new String(b_iso88591, "ISO8859-1");
Printing s_gbk, s_utf8 and s_iso88591 shows that s_gbk and s_utf8 are both "中", while s_iso88591 is not the original character. Why can "中" not be restored after going through ISO8859-1? The reason is simple: the ISO8859-1 code table contains no Chinese characters, so "中".getBytes("ISO8859-1") cannot produce a correct code value for "中". The single byte it returns, 63, is the code for "?", a replacement for the character that could not be encoded. The original character is already lost at that point, so new String() cannot restore it.
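The difference can be checked directly with a small round trip. The sketch below is an illustration only: the class name RoundTrip and the helper roundTrip are hypothetical, and it uses the StandardCharsets constants instead of charset names so that no checked exception has to be handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {

    // Encodes text with the given charset, then decodes the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中

        // UTF-8 can represent the character, so the round trip is lossless.
        System.out.println(roundTrip(zhong, StandardCharsets.UTF_8).equals(zhong)); // true

        // ISO-8859-1 cannot, so the character comes back as "?".
        System.out.println(roundTrip(zhong, StandardCharsets.ISO_8859_1).equals("?")); // true
    }
}
```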
Therefore, when using getBytes(String decode) to obtain a byte[], make sure that the code table of decode actually contains the characters in the string. Only then can the resulting byte[] be restored correctly.
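One way to make this check before encoding, sketched here as a suggestion and not as something the original test class does, is to ask the charset's encoder whether it can represent the text. CharsetEncoder.canEncode is a standard JDK method; the class name EncodableCheck and the wrapper method are hypothetical.

```java
import java.nio.charset.Charset;

public class EncodableCheck {

    // Returns true if every character of text can be encoded in the named charset.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        System.out.println(canEncode(zhong, "UTF-8"));      // true
        System.out.println(canEncode(zhong, "ISO-8859-1")); // false
    }
}
```

If the check returns false, getBytes would silently substitute a replacement byte, so the caller can choose another encoding or report an error instead.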
Sometimes, to fit Chinese characters to special requirements (for example, HTTP headers require their content to be ISO8859-1 encoded), the bytes of the Chinese characters can be carried as ISO8859-1 characters, for example:
String s_iso88591 = new String("中".getBytes("UTF-8"), "ISO8859-1");
The resulting s_iso88591 string is actually three ISO8859-1 characters, one for each UTF-8 byte. After these characters are passed to the destination, the destination program recovers the correct Chinese character "中" by the reverse operation:
String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"), "UTF-8");
This way the protocol is respected and Chinese characters are still supported.
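The two conversions can be wrapped in a pair of helpers, shown below as a minimal sketch. The class name Iso88591Transport and the method names wrap and unwrap are hypothetical. The sketch assumes that both sides agree on UTF-8 as the real encoding, and it works because ISO-8859-1 maps every possible byte value to exactly one character, so no byte is lost in either direction.

```java
import java.nio.charset.StandardCharsets;

public class Iso88591Transport {

    // Sender side: reinterpret the UTF-8 bytes of text as ISO-8859-1 characters.
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: turn the ISO-8859-1 characters back into bytes and decode them as UTF-8.
    static String unwrap(String carried) {
        return new String(carried.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the character 中
        String carried = wrap(zhong);
        System.out.println(carried.length());              // 3
        System.out.println(unwrap(carried).equals(zhong)); // true
    }
}
```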