Java character encoding formats explained


Last Update: 2017-01-19 Source: Internet

Author: User



I. Preface

While analyzing Comparable and Comparator, I looked at the compareTo method of the String class. String stores its characters in a char[] array, and compareTo compares two strings character by character, each character being held in a char. At that point it suddenly occurred to me: can a Java char store a Chinese character? It turns out that it can, and this question leads straight to Java's character encoding formats.
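The char-by-char comparison mentioned above can be checked directly. The String.compareTo documentation defines the result, when the strings differ at some index k, as the difference of the two char values at k. The sketch below is illustrative only: the class name CompareToDemo and the helper charDifference are hypothetical, and the two characters 张 (U+5F20) and 李 (U+674E) are arbitrary picks. Note that newer JDKs (9 and later, "compact strings") back String with a byte[] rather than the char[] seen in the JDK 8 source, but compareTo is still specified in terms of char values.

```java
public class CompareToDemo {
    // Hypothetical helper: the raw difference of two char (UTF-16 code unit) values
    public static int charDifference(char a, char b) {
        return a - b;
    }

    public static void main(String[] args) {
        String a = "\u5F20"; // 张 (Zhang)
        String b = "\u674E"; // 李 (Li)
        // compareTo finds the first index where the strings differ and
        // returns the difference of the chars at that index
        System.out.println(a.compareTo(b));
        System.out.println(charDifference(a.charAt(0), b.charAt(0)));
    }
}
```

Both lines print the same negative number, because 张 has the smaller char value.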

II. Encoding formats in Java

The following Java code prints the bytes of the Chinese character 张 (Zhang) in various encodings.

import java.io.UnsupportedEncodingException;

public class Test {
    // Returns the bytes of content in the given encoding as uppercase hex, separated by spaces
    public static String getCode(String content, String format) throws UnsupportedEncodingException {
        byte[] bytes = content.getBytes(format);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            // & 0xff turns the signed byte into an unsigned value before converting to hex
            sb.append(Integer.toHexString(bytes[i] & 0xff).toUpperCase() + " ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("GBK: " + getCode("张", "GBK"));
        System.out.println("GB2312: " + getCode("张", "GB2312"));
        System.out.println("ISO-8859-1: " + getCode("张", "ISO-8859-1"));
        System.out.println("Unicode: " + getCode("张", "Unicode"));
        System.out.println("UTF-16: " + getCode("张", "UTF-16"));
        System.out.println("UTF-8: " + getCode("张", "UTF-8"));
    }
}

Run Result:

GBK: D5 C5
GB2312: D5 C5
ISO-8859-1: 3F
Unicode: FE FF 5F 20
UTF-16: FE FF 5F 20
UTF-8: E5 BC A0

Description: The results show that 张 has the same encoding in GBK and GB2312 (D5 C5), and the same encoding in Unicode and UTF-16 (Java treats "Unicode" as an alias of UTF-16, which writes the byte-order mark FE FF before 5F 20), while its ISO-8859-1 and UTF-8 encodings differ from these. ISO-8859-1 cannot represent the character at all, so getBytes substitutes a '?' (0x3F). So in which encoding does the JVM store the character 张? Let's analyze it below.
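The ISO-8859-1 result can be explained by asking the charset whether it can encode the character at all. Below is a minimal sketch using CharsetEncoder.canEncode and the StandardCharsets constants; the class name CanEncodeDemo and its helpers are hypothetical. GBK is left out because, unlike the charsets in StandardCharsets, it is not guaranteed to be present on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CanEncodeDemo {
    // True if the charset can represent every character of s
    public static boolean canEncode(Charset cs, String s) {
        return cs.newEncoder().canEncode(s);
    }

    // Formats bytes as two-digit uppercase hex, separated by spaces
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u5F20"; // 张
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            // getBytes(Charset) silently replaces unmappable characters with the
            // charset's replacement bytes ('?' for ISO-8859-1)
            System.out.println(cs.name() + " canEncode=" + canEncode(cs, s)
                    + " bytes=" + hex(s.getBytes(cs)));
        }
    }
}
```

If silent replacement is not acceptable, checking canEncode first (or using a CharsetEncoder configured to report errors) makes the loss visible instead of producing a '?'.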

III. Exploration

1. Viewing the storage format in the .class file constant pool

The test code is as follows

public class Test {
    public static void main(String[] args) {
        // The string literal ends up as an entry in the class file's constant pool
        String str = "张";
    }
}

Decompiling the class with javap -verbose Test.class lists the string in the constant pool.

Opening the class file in WinHex then shows the raw bytes with which the character 张 is stored in the constant pool.

Description: Both views show that the string is stored in the class file in UTF-8 format: 张 takes the three bytes E5 BC A0.
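Strictly speaking, a CONSTANT_Utf8_info entry in a class file uses the JVM's "modified UTF-8", the same format that DataOutputStream.writeUTF produces (a 2-byte length followed by the encoded bytes). For 张 it is identical to standard UTF-8; the difference shows up for '\u0000' (written as C0 80) and for characters outside the Basic Multilingual Plane. The sketch below reproduces the bytes without a hex editor; the class name ModifiedUtf8Demo and its helpers are hypothetical. In the class file itself the entry is additionally preceded by the tag byte 01.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ModifiedUtf8Demo {
    // Encodes s like DataOutputStream.writeUTF: a 2-byte big-endian length,
    // then the string in modified UTF-8 (the encoding of CONSTANT_Utf8_info)
    public static byte[] writeUtf(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        }
        return bos.toByteArray();
    }

    // Formats bytes as two-digit uppercase hex, separated by spaces
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(writeUtf("\u5F20"))); // 00 03 E5 BC A0 (length 3, then 张)
        System.out.println(hex(writeUtf("\u0000"))); // 00 02 C0 80 (modified UTF-8 NUL)
    }
}
```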

But is the string still in UTF-8 at run time? Let's keep exploring.

2. Exploring the runtime format

Use the following code

public class Test {
    public static void main(String[] args) {
        String str = "张";
        // Print the code point of the first character in uppercase hex
        System.out.println(Integer.toHexString(str.codePointAt(0)).toUpperCase());
    }
}
Run Result:

5F20

Note: codePointAt(0) returns 5F20 (U+5F20). For a character in the Basic Multilingual Plane this is also its single UTF-16 code unit, which shows that at run time the JVM holds the string in UTF-16. UTF-16 normally uses 2 bytes per character and 4 bytes (a surrogate pair) for characters that cannot be represented in 2 bytes; I will cover that in a separate article. The Character class source and documentation also state that Java uses UTF-16, so both lines of evidence lead to the answer we were looking for.
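The 2-byte versus 4-byte behavior of UTF-16 is easy to observe through String.length() and charAt(), which work in UTF-16 code units. A minimal sketch follows; the class name Utf16Demo and the helper codeUnits are hypothetical, and U+1D11E (MUSICAL SYMBOL G CLEF) is just an arbitrary character outside the Basic Multilingual Plane.

```java
public class Utf16Demo {
    // Formats each UTF-16 code unit (char) of s as four-digit uppercase hex
    public static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%04X ", (int) s.charAt(i)));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String zhang = "\u5F20";                              // 张, inside the BMP
        String clef = new String(Character.toChars(0x1D11E)); // outside the BMP
        // One code unit (2 bytes) for 张
        System.out.println(zhang.length() + " " + codeUnits(zhang)); // 1 5F20
        // Two code units (4 bytes, a surrogate pair) for U+1D11E
        System.out.println(clef.length() + " " + codeUnits(clef)
                + " codePoint=" + Integer.toHexString(clef.codePointAt(0)).toUpperCase()); // 2 D834 DD1E codePoint=1D11E
    }
}
```

This is also why codePointAt exists alongside charAt: for a surrogate pair, charAt(0) returns only the high surrogate, while codePointAt(0) combines both halves into the real code point.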

3. Can a char store a Chinese character?

From the exploration above, we know that characters are encoded in UTF-8 in a Java class file and in UTF-16 while the JVM is running. The character 张 can be represented in two bytes (one UTF-16 code unit), and a Java char is two bytes, so a char can store it.
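The argument can be checked in a few lines: Character.SIZE and Character.BYTES give the width of a char, and Character.charCount tells whether a code point needs one char or two. A minimal sketch; the class name CharDemo and the helper fitsInOneChar are hypothetical.

```java
public class CharDemo {
    // True if the code point fits in a single 16-bit char (i.e. it is in the BMP)
    public static boolean fitsInOneChar(int codePoint) {
        return Character.charCount(codePoint) == 1;
    }

    public static void main(String[] args) {
        char c = '\u5F20'; // 张 stored directly in a char
        System.out.println(Character.SIZE + " bits, " + Character.BYTES + " bytes"); // 16 bits, 2 bytes
        System.out.println(Integer.toHexString(c).toUpperCase());                    // 5F20
        System.out.println(fitsInOneChar(c));       // true
        System.out.println(fitsInOneChar(0x1D11E)); // false: needs a surrogate pair
    }
}
```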

IV. Summary

Through the above analysis, we know:

1. Characters are encoded in UTF-8 in the class file and in UTF-16 while the JVM is running.

2. The char type is two bytes wide and can store a Chinese character such as 张 (more precisely, any character that UTF-16 represents in a single 2-byte code unit).

While working on this, I also read a lot of material about characters, learned a great deal, and found the topic particularly interesting, so I will share more. The next article will go further into encoding and decoding in Java. Stay tuned.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
