First, the preface
While analyzing Comparable and Comparator, I read through the compareTo method of the String class. String stores its elements in a char[] array, and compareTo compares the two strings character by character, with each character held in a char. At that point it suddenly occurred to me: can a Java char hold a Chinese character? It turns out that it can, which in turn raises the question of Java's character encoding formats.
Second, Java storage format
In Java, the following code prints the bytes of the character '张' (Zhang) in various encoding formats.
import java.io.UnsupportedEncodingException;

public class Test {
    // Returns the bytes of content in the given charset as space-separated uppercase hex.
    public static String getCode(String content, String format) throws UnsupportedEncodingException {
        byte[] bytes = content.getBytes(format);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            // Mask with 0xff so negative bytes are not sign-extended.
            sb.append(Integer.toHexString(bytes[i] & 0xff).toUpperCase() + " ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("GBK: " + getCode("张", "GBK"));
        System.out.println("GB2312: " + getCode("张", "GB2312"));
        System.out.println("ISO-8859-1: " + getCode("张", "ISO-8859-1"));
        System.out.println("Unicode: " + getCode("张", "Unicode"));
        System.out.println("UTF-16: " + getCode("张", "UTF-16"));
        System.out.println("UTF-8: " + getCode("张", "UTF-8"));
    }
}
Run result:
GBK: D5 C5
GB2312: D5 C5
ISO-8859-1: 3F
Unicode: FE FF 5F 20
UTF-16: FE FF 5F 20
UTF-8: E5 BC A0
Note: the results show that the GBK and GB2312 encodings of the character '张' (Zhang) are identical, as are the Unicode and UTF-16 encodings (the leading FE FF is the byte order mark), while the ISO-8859-1, UTF-16, and UTF-8 results all differ from one another. So which encoding does the JVM use to store the character '张'? Let's start the analysis below.
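The 3F in the ISO-8859-1 result is not a real encoding of the character: it is the byte for '?', which getBytes substitutes when the charset cannot represent a character. A minimal sketch of the difference, assuming a hypothetical class RoundTripDemo and helper roundTrip (neither is in the original code), and using the StandardCharsets constants so that no checked exception is involved:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class RoundTripDemo {
    // Encodes text in the given charset, then decodes the bytes again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String zhang = "\u5F20"; // the character '张', written as an escape

        // UTF-8 and UTF-16 can represent the character, so it survives.
        System.out.println(roundTrip(zhang, StandardCharsets.UTF_8).equals(zhang));  // true
        System.out.println(roundTrip(zhang, StandardCharsets.UTF_16).equals(zhang)); // true

        // ISO-8859-1 cannot, so the character is replaced by '?' (0x3F).
        System.out.println(roundTrip(zhang, StandardCharsets.ISO_8859_1));           // ?
    }
}
```

In other words, only the encodings that can actually represent the character are candidates for how the JVM stores it.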
Third, the idea of exploration
1. View the storage format in the .class file constant pool
The test code is as follows:
public class Test {
    public static void main(String[] args) {
        // The string literal ends up in the class file's constant pool.
        String str = "张";
    }
}
Disassembling it with javap -verbose Test.class shows the constant pool as follows:
Then, opening the class file with WinHex shows how the character '张' (Zhang) is stored in the constant pool:
Note: the two views above show that the character is stored in UTF-8 format in the class file (strictly speaking, the JVM's modified UTF-8, which for this character is byte-for-byte the same as standard UTF-8: E5 BC A0).
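The constant pool layout can be approximated without a hex editor. A string constant is held in a CONSTANT_Utf8_info entry, which consists of a tag byte, a two-byte length, and the modified UTF-8 bytes. DataOutputStream.writeUTF writes the same length-plus-bytes layout (without the tag), so it gives a rough picture of what the hex editor shows. The sketch below assumes a hypothetical class ModifiedUtf8Demo and helper writeUtfHex; it does not read a real class file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original post.
class ModifiedUtf8Demo {
    // Writes text with writeUTF and returns the resulting bytes as uppercase hex.
    static String writeUtfHex(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text); // 2-byte length prefix, then modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // '张' is U+5F20: length 00 03, followed by the three bytes E5 BC A0.
        System.out.println(writeUtfHex("\u5F20")); // 00 03 E5 BC A0
    }
}
```

The last three bytes match the UTF-8 result printed in the earlier encoding test.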
But is UTF-8 also the format used at run time? Let's continue exploring.
2. Probing at run time
Use the following code:
public class Test {
    public static void main(String[] args) {
        String str = "张";
        // Print the code point of the first character as uppercase hex.
        System.out.println(Integer.toHexString(str.codePointAt(0)).toUpperCase());
    }
}
Run result:
5F20
Note: the result shows that the JVM uses the UTF-16 format at run time. UTF-16 normally stores a character in 2 bytes and uses 4 bytes for a character that cannot be represented in two. That topic will get its own write-up later. The source of the Character class also shows that UTF-16 is the encoding in use, so both angles lead to the answer we were looking for.
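The 2-byte versus 4-byte behavior can be observed directly. The sketch below assumes a hypothetical class SurrogateDemo and helper utf16Hex, and uses U+20BB7 (a rare CJK character outside the Basic Multilingual Plane) as an illustrative 4-byte case that does not appear in the original post.

```java
// Hypothetical demo class, not part of the original post.
class SurrogateDemo {
    // Returns the UTF-16 code units of one code point as space-separated hex.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(unit).toUpperCase());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex(0x5F20));  // 5F20       (one code unit, 2 bytes)
        System.out.println(utf16Hex(0x20BB7)); // D842 DFB7  (surrogate pair, 4 bytes)

        String rare = new String(Character.toChars(0x20BB7));
        System.out.println(rare.length());                          // 2 chars
        System.out.println(rare.codePointCount(0, rare.length()));  // 1 character
    }
}
```

Note that String.length() counts UTF-16 code units, not characters, which is why the rare character reports a length of 2.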
3. Can the char type store Chinese characters?
Based on the exploration above, we know that characters in a Java class file are encoded in UTF-8 and are encoded in UTF-16 while the JVM is running. The character '张' (Zhang) can be represented in two bytes in UTF-16, and a char is two bytes in Java, so it can be stored.
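A short check of this conclusion, as a sketch: the class CharDemo and helper fitsInChar are hypothetical names. It also shows the limit of the claim, since a Chinese character outside the Basic Multilingual Plane (U+20BB7 is used here as an assumed example) does not fit in a single char.

```java
// Hypothetical demo class, not part of the original post.
class CharDemo {
    // True if the code point can be held in a single Java char.
    static boolean fitsInChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        char zhang = '\u5F20'; // the character '张'

        System.out.println(Character.BYTES);                          // 2
        System.out.println(Integer.toHexString(zhang).toUpperCase()); // 5F20
        System.out.println(Character.UnicodeScript.of(zhang));        // HAN

        System.out.println(fitsInChar(0x5F20));  // true
        System.out.println(fitsInChar(0x20BB7)); // false: needs two chars
    }
}
```

So the answer is yes for the commonly used Chinese characters, all of which fall within the two-byte range.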
Fourth, summary
From the analysis above, we know:
1. Characters are encoded in UTF-8 format in the class file and in UTF-16 format while the JVM is running.
2. The char type is two bytes and can store Chinese characters.
While investigating this I also read a lot of material on character encoding, learned a great deal, and found it particularly interesting, so I will share it. The next post will go further into encoding and decoding problems in Java. Stay tuned.