1. Character encoding
Let's take a look at the different character encodings.
String s = " Java ABC ";
1.1 UTF-8 encoding
UTF-8 encoding: a Chinese character takes 3 bytes and an English letter takes 1 byte.
byte[] bytes = s.getBytes("UTF-8"); // may throw UnsupportedEncodingException
for (byte b : bytes) {
    // convert each byte to an int and display it in hexadecimal
    System.out.print(Integer.toHexString(b & 0xff) + " ");
}
System.out.println(); // line break
output: e6 85 95 e8 af be 41 42 43
1.2 GBK encoding
GBK encoding: a Chinese character takes 2 bytes and an English letter takes 1 byte.
byte[] bytes2 = s.getBytes("GBK");
for (byte b : bytes2) {
    System.out.print(Integer.toHexString(b & 0xff) + " ");
}
System.out.println();
output: c4 bd bf ce 41 42 43
1.3 UTF-16BE encoding
UTF-16BE is the encoding Java uses for its own characters (a Java char is a 16-bit UTF-16 code unit): a Chinese character takes 2 bytes, and an English letter also takes 2 bytes.
byte[] bytes3 = s.getBytes("UTF-16BE");
for (byte b : bytes3) {
    System.out.print(Integer.toHexString(b & 0xff) + " ");
}
System.out.println();
output: 61 55 8b fe 0 41 0 42 0 43
1.4 Garbled text problem
The cause of garbled text is simple: the bytes are decoded with a different character encoding than the one used to encode them. For example, bytes3 above is a UTF-16BE byte sequence; if it is decoded as UTF-8, the result is garbled.
String str1 = new String(bytes3); // no charset given, so the platform default (here, not UTF-16BE) is used
System.out.println(str1);
output:aU??
How can garbled text be avoided? The fix is just as simple: decode with the same character encoding that was used to encode.
String str2 = new String(bytes3, "UTF-16BE");
System.out.println(str2);
1.5 Text files
* A text file is just a sequence of bytes, and that sequence can be in any encoding
* If a text file is created directly on a Chinese-language machine, that file only recognizes ANSI encoding (on Chinese Windows, ANSI normally means GBK)
Java IO Stream series, part one: character encoding