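Java is similar to Python in this respect: inside a Java program, a String is always held in Java's own internal Unicode encoding (UTF-16 code units). So whenever you see a string in Java, remind yourself that it is in that internal Unicode encoding, whatever encoding the bytes had when they were read.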
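To make the point above concrete, here is a minimal sketch showing that a String has one internal form, and that a byte encoding only comes into play when you ask for bytes. The class name StringInternalsSketch and the helper byteLength are made up for illustration, and the sketch assumes the JDK in use ships the GBK charset (standard desktop and server JDKs do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringInternalsSketch {
    // The two-character Chinese word for "China", written with Unicode escapes
    // so that the encoding of this source file does not matter.
    static final String TEXT = "\u4e2d\u56fd";

    // Hypothetical helper: how many bytes the string occupies in a given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Inside Java the string is just two UTF-16 code units.
        System.out.println("chars: " + TEXT.length());
        // The byte count depends entirely on the charset chosen when encoding.
        System.out.println("UTF-8 bytes: " + byteLength(TEXT, StandardCharsets.UTF_8));
        System.out.println("GBK bytes:   " + byteLength(TEXT, Charset.forName("GBK")));
    }
}
```

The same String yields byte arrays of different lengths, which is why "the encoding of a String" is only meaningful at the moment of calling getBytes or new String.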
Packagestring;ImportJava.nio.charset.Charset; Public classUTF82GBK { Public Static voidMain (string[] args)throwsException {//the default encoding for the system is GBKSystem.out.println ("Default charset=" +Charset.defaultcharset ()); String THFJKDS China China China China China China China China China China China China China hfsdkj fjldsajflkdsjaflkdsjalf sfdsfadas '; //idea: First to Unicode, then to GBKString UTF8 =NewString (T.getbytes ("UTF-8")); //equivalent to://String UTF8 = new String (T.getbytes ("UTF-8"), Charset.defaultcharset ());System.out.println (UTF8); String Unicode=NewString (Utf8.getbytes (), "UTF-8"); //equivalent to://String unicode = new String (Utf8.getbytes (Charset.defaultcharset ()), "UTF-8"); System.out.println (Unicode); String GBK=NewString (Unicode.getbytes ("GBK")); //equivalent to://string GBK = new String (Unicode.getbytes ("GBK"), Charset.defaultcharset ()); System.out.println (GBK); }}
package com.mkyong;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class UTF8ToGBK {
    public static void main(String[] args) throws Exception {
        File fileDir = new File("/home/user/desktop/unsaved Document 1");
        // Decode the file's bytes as UTF-8 while reading
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileDir), "UTF-8"));
        String str;
        while ((str = in.readLine()) != null) {
            System.out.println(str);
            // Java holds only Unicode internally, so str is Unicode encoded
            String str2 = new String(str.getBytes("GBK"), "GBK");
            // str.getBytes("GBK") is GBK encoded, but str2 is Unicode encoded again
            System.out.println(str2);
        }
        in.close();
    }
}
The key is understanding what new String(xxx.getBytes("GBK"), "GBK") means. xxx.getBytes("GBK") returns a byte array encoded in GBK, so you must tell Java: "the array I am passing you is GBK encoded, so decode it accordingly when you convert it to your internal encoding." In new String(xxx.getBytes("GBK"), "GBK"), the second "GBK" is what tells Java that the bytes it is being handed are GBK encoded.
new String(str.getBytes("UTF-8"), "UTF-8"); // normal
new String(str.getBytes("UTF-8"), "GBK");   // garbled: the internal encoding is converted to UTF-8 bytes, but those bytes are then decoded back into the internal encoding as if they were GBK
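The two cases above can be checked with a small sketch. The class name MismatchSketch and the helper reDecode are hypothetical names introduced only for this illustration, and the sketch assumes the GBK charset is available in the running JDK.

```java
import java.io.UnsupportedEncodingException;

public class MismatchSketch {
    // Hypothetical helper: encode s with one charset, then decode the bytes with another.
    static String reDecode(String s, String encodeAs, String decodeAs)
            throws UnsupportedEncodingException {
        byte[] bytes = s.getBytes(encodeAs);   // internal encoding -> bytes
        return new String(bytes, decodeAs);    // bytes -> internal encoding
    }

    public static void main(String[] args) throws Exception {
        // "China" in Chinese, written with Unicode escapes to stay source-encoding neutral.
        String text = "\u4e2d\u56fd";

        // Matching charsets: the round trip gives back the original string.
        System.out.println(text.equals(reDecode(text, "UTF-8", "UTF-8")));

        // Mismatched charsets: UTF-8 bytes read as GBK produce different characters.
        System.out.println(text.equals(reDecode(text, "UTF-8", "GBK")));
    }
}
```

The first comparison holds and the second does not, which is the "normal" versus "garbled" distinction in the snippet above.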
Take a look at what the JDK documentation says.
public String(byte[] bytes, Charset charset)
Constructs a new String by decoding the specified array of bytes using the specified charset.
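Putting that constructor to work, converting UTF-8 bytes to GBK bytes is a decode followed by an encode. The following is a minimal sketch rather than a definitive implementation: the class name TranscodeSketch and the method utf8ToGbk are invented for illustration, and it assumes the JDK provides the GBK charset. Using Charset objects instead of charset names also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {
    // Assumption: GBK is available in this JDK; Charset.forName throws otherwise.
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: UTF-8 bytes in, GBK bytes out.
    static byte[] utf8ToGbk(byte[] utf8Bytes) {
        // Step 1: decode, telling Java the incoming bytes are UTF-8.
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        // Step 2: encode the internal string as GBK.
        return decoded.getBytes(GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u56fd"; // "China" in Chinese, as Unicode escapes
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] gbk = utf8ToGbk(utf8);

        System.out.println(utf8.length + " UTF-8 bytes -> " + gbk.length + " GBK bytes");
        // Decoding the GBK bytes as GBK recovers the original text.
        System.out.println(new String(gbk, GBK).equals(original));
    }
}
```

Note that the String in the middle is never "UTF-8" or "GBK" itself; only the byte arrays on either side have an encoding.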
So the question now is: how do I hold GBK-encoded bytes in a String?
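String str3 = new String(str.getBytes("GBK"), "ISO-8859-1"); // each GBK byte becomes one char, so nothing is lost
System.out.println(new String(str3.getBytes("ISO-8859-1"), "GBK")); // recover the bytes, then decode them as GBK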
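The ISO-8859-1 trick works because that charset maps every byte value to exactly one character and back, so the String acts as a plain byte carrier. A minimal sketch, with the class name Latin1CarrierSketch and the helpers wrap and unwrap made up for illustration, and again assuming GBK is available in the JDK:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1CarrierSketch {
    // Hypothetical helper: store raw bytes in a String, one char per byte.
    static String wrap(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: get the exact same bytes back out.
    static byte[] unwrap(String carrier) {
        return carrier.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        String str = "\u4e2d\u56fd";          // "China" in Chinese, as Unicode escapes
        byte[] gbkBytes = str.getBytes(gbk);

        String str3 = wrap(gbkBytes);
        // The carrier has one char per GBK byte and is not readable text.
        System.out.println(str3.length() == gbkBytes.length);
        // The bytes survive the trip unchanged.
        System.out.println(Arrays.equals(gbkBytes, unwrap(str3)));
        // Decoding them as GBK gives back the original string.
        System.out.println(new String(unwrap(str3), gbk).equals(str));
    }
}
```

The carrier string should not be printed or manipulated as text; it is only meaningful once its bytes are extracted and decoded as GBK again.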
Java character transcoding: UTF-8 to GBK/GB2312