Java character encoding formats in detail

Source: Internet
Author: User
Tags string format stringbuffer

I. Preface

While analyzing Comparable and Comparator, I looked at the compareTo method of the String class. String stores its elements in a char[] array, and compareTo compares the two strings character by character. Since each character is stored as a char, it suddenly occurred to me: can a Java char hold a Chinese character? It turns out that it can, and that leads to the question of which character encoding formats Java uses.

II. Java storage formats

In Java, the following code prints the bytes of the character '张' (Zhang) in several encoding formats.

import java.io.UnsupportedEncodingException;

public class Test {
    // Encodes content with the named charset and returns the bytes as uppercase hex.
    public static String getCode(String content, String format) throws UnsupportedEncodingException {
        byte[] bytes = content.getBytes(format);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            // Mask with 0xff so that negative byte values print as two hex digits.
            sb.append(Integer.toHexString(bytes[i] & 0xff).toUpperCase() + " ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("gbk:" + getCode("张", "gbk"));
        System.out.println("gb2312:" + getCode("张", "gb2312"));
        System.out.println("iso-8859-1:" + getCode("张", "iso-8859-1"));
        System.out.println("unicode:" + getCode("张", "unicode"));
        System.out.println("utf-16:" + getCode("张", "utf-16"));
        System.out.println("utf-8:" + getCode("张", "utf-8"));
    }
}


Run result:

gbk:D5 C5 
gb2312:D5 C5 
iso-8859-1:3F 
unicode:FE FF 5F 20 
utf-16:FE FF 5F 20 
utf-8:E5 BC A0 

Description: From the result we can see that the character '张' has the same encoding in GBK and GB2312, and the same encoding in Unicode and UTF-16, but its ISO-8859-1, Unicode, and UTF-8 encodings all differ from one another. So which encoding format does the JVM use to store the character '张'? Let's start the analysis.
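Two details of that output are worth isolating: the single 3F byte for ISO-8859-1 and the leading FE FF for UTF-16. The sketch below is one way to check them using only the JDK. The class name EncodingNotes and its hex helper are hypothetical names introduced for illustration. It assumes the usual JDK behavior that getBytes substitutes '?' (0x3F) for a character the charset cannot represent, and that the UTF-16 charset writes a big-endian byte-order mark when encoding while UTF-16BE does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingNotes {
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+5F20 is the article's character, written as an escape
        // so the example does not depend on the source file encoding.
        String s = "\u5F20";

        // ISO-8859-1 has no mapping for this character, so it becomes '?' (0x3F).
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(latin1.newEncoder().canEncode('\u5F20')); // false
        System.out.println(hex(s.getBytes(latin1)));                 // 3F

        // UTF-16 writes a byte-order mark first; UTF-16BE writes only the code unit.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 5F 20
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 5F 20
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // E5 BC A0
    }
}
```

So the 3F is a substitution rather than a real encoding of the character, and the FE FF prefix is a marker rather than part of the character itself.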

III. Exploration

1. View the storage format in the .class file constant pool

The test code is as follows:

public class Test {
    public static void main(String[] args) {
        // The string literal ends up in the constant pool of Test.class.
        String str = "张";
    }
}

Disassembling the class with javap -verbose Test.class lists the constant pool, which contains the string constant.

Opening the class file in WinHex then shows how the character '张' is stored in the constant pool.

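Description: Both views show that the character is stored in UTF-8 format in the class file (more precisely, the modified UTF-8 defined by the JVM specification, which is identical to standard UTF-8 for this character).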
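The byte layout can be approximated without a hex editor. The sketch below relies on DataOutputStream.writeUTF, which writes a two-byte length followed by modified UTF-8 bytes. That is the same length-plus-bytes layout the JVM specification gives for a CONSTANT_Utf8 entry, apart from the entry's leading tag byte. The class name ConstantPoolSketch and its helper are hypothetical, and the output is an illustration, not a dump of the actual Test.class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ConstantPoolSketch {
    // Hypothetical helper: encodes a string as writeUTF does
    // (two-byte length, then modified UTF-8) and returns the bytes as hex.
    static String modifiedUtf8Hex(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 00 03 is the byte length; E5 BC A0 is the UTF-8 form of U+5F20.
        System.out.println(modifiedUtf8Hex("\u5F20")); // 00 03 E5 BC A0
    }
}
```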

But is UTF-8 also the format used at run time? Let's continue exploring.

2. Probing at run time

Use the following code:

public class Test {
    public static void main(String[] args) {
        String str = "张";
        // Print the code point of the first character as uppercase hex.
        System.out.println(Integer.toHexString(str.codePointAt(0)).toUpperCase());
    }
}


Run result:

5F20

Note: From the result we know that the JVM uses the UTF-16 format at run time. UTF-16 normally stores a character in 2 bytes and uses 4 bytes (a surrogate pair) for a character that cannot be represented in two bytes; a later article will cover this in more detail. The source of the Character class also states that it uses UTF-16 encoding, so both approaches lead to the answer we want.
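The 2-byte versus 4-byte distinction can be seen directly by printing the UTF-16 code units of a string. This is a minimal sketch: the class name SurrogateSketch and its codeUnits helper are hypothetical, and U+20000 is chosen here only as an example of a code point above U+FFFF.

```java
public class SurrogateSketch {
    // Hypothetical helper: returns the UTF-16 code units of a string as uppercase hex.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+5F20 (the article's character) fits in one 16-bit code unit.
        String bmp = "\u5F20";
        // U+20000 lies above U+FFFF and needs a surrogate pair.
        String supplementary = new String(Character.toChars(0x20000));

        System.out.println(bmp.length() + " -> " + codeUnits(bmp));                     // 1 -> 5F20
        System.out.println(supplementary.length() + " -> " + codeUnits(supplementary)); // 2 -> D840 DC00
        System.out.println(supplementary.codePointCount(0, supplementary.length()));    // 1
    }
}
```

Note that length() counts code units, not characters, which is why the second string reports 2 even though it holds a single code point.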

3. Can the char type store Chinese characters?

From the exploration above, we know that characters in a Java class file are encoded in UTF-8 and are encoded in UTF-16 while the JVM is running. The character '张' can be represented in two bytes, and a char in Java is two bytes, so a char can store it.
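A quick check of this conclusion, as a sketch only: the class name CharStorageSketch and the fitsInOneChar helper are hypothetical names, and the helper simply wraps the JDK's Character.charCount.

```java
public class CharStorageSketch {
    // Hypothetical helper: true when the code point is encoded as a single UTF-16 code unit.
    static boolean fitsInOneChar(int codePoint) {
        return Character.charCount(codePoint) == 1;
    }

    public static void main(String[] args) {
        char zhang = '\u5F20'; // U+5F20, the article's character

        System.out.println(Character.BYTES);                          // 2
        System.out.println(Integer.toHexString(zhang).toUpperCase()); // 5F20
        System.out.println(fitsInOneChar(zhang));                     // true
        System.out.println(fitsInOneChar(0x20000));                   // false
    }
}
```

The last line is a reminder that the conclusion holds for characters up to U+FFFF; a Chinese character above that range would need two chars.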

IV. Summary

Through the above analysis, we know:

1. Characters are encoded in the UTF-8 format in the class file, and are encoded in the UTF-16 format when the JVM is running.

2. The char type is two bytes and can store Chinese characters.

While looking into this I also read a lot of material on characters, learned a great deal, and found it particularly interesting, so I will share it later. The next article will go further into encoding and decoding problems in Java. Please look forward to it.
