char is one of Java's primitive (built-in) types; it is a character type. Characters in Java are Unicode-encoded, so a Java char occupies 2 bytes, and what is stored is the character's Unicode code value (a binary number). The question is: how does a program turn a Unicode code value into the character data we want? For example, the Chinese character '汉' ("Han") has the Unicode code value 0x6C49. We want the program data to be '汉', while the computer stores only the code value. Displaying the stored value 0x6C49 as the character '汉' requires a conversion process, and that conversion needs a rule. These rules are the UTF (UCS Transformation Format) encodings; today there are UTF-8, UTF-16, and UTF-32.
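As a minimal sketch of this point (the class name CharDemo is just for illustration), the following snippet shows that a Java char holds a 16-bit Unicode code value:

public class CharDemo {
    public static void main(String[] args) {
        char han = '\u6c49';                       // the character 汉 ("Han")
        System.out.println(han);                   // prints: 汉
        System.out.printf("0x%04x%n", (int) han);  // prints: 0x6c49, the stored code value
        System.out.println(Character.BYTES);       // prints: 2, i.e. a char occupies 2 bytes
    }
}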
The conversion rule most commonly used today is UTF-8, and its rules are as follows:
Unicode encoding (hexadecimal)    UTF-8 byte stream (binary)
000000-00007F                     0xxxxxxx
000080-0007FF                     110xxxxx 10xxxxxx
000800-00FFFF                     1110xxxx 10xxxxxx 10xxxxxx
010000-10FFFF                     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
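As a sketch of how these templates are applied (the helper encodeUtf8 below is illustrative, not a standard library API), this method picks a template by range and fills the x positions with the code point's own bits, six bits per continuation byte:

public class Utf8Encoder {
    static byte[] encodeUtf8(int cp) {
        if (cp <= 0x7F) {                  // 000000-00007F: 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp <= 0x7FF) {          // 000080-0007FF: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp <= 0xFFFF) {         // 000800-00FFFF: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                           // 010000-10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        // 0x6C49 falls in 000800-00FFFF, so the 3-byte template applies.
        for (byte b : encodeUtf8(0x6C49)) System.out.printf("%02X ", b & 0xFF); // E6 B1 89
        System.out.println();
    }
}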
The Unicode code value of '汉' is 0x6C49. Since 0x6C49 lies between 0x0800 and 0xFFFF, the 3-byte template applies: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 0x6C49 is 0110 1100 0100 1001. Substituting this bit stream, in order, for the x's in the template gives 11100110 10110001 10001001, that is, E6 B1 89.
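This hand calculation can be checked against Java's own converter; the snippet below (the class name HanUtf8 is illustrative) prints the same three bytes:

import java.nio.charset.StandardCharsets;

public class HanUtf8 {
    public static void main(String[] args) {
        byte[] utf8 = "\u6c49".getBytes(StandardCharsets.UTF_8);   // the string "汉"
        for (byte b : utf8) System.out.printf("%02X ", b & 0xFF);  // prints: E6 B1 89
        System.out.println();
    }
}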
Note that the JVM's default charset for converting between chars and bytes is platform-dependent; UTF-8 became the standard default only in Java 18 (JEP 400). Inside the JVM itself, char and String values are stored as UTF-16 code units, so it is safest to name the charset explicitly when converting to bytes.
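A short sketch of checking the default charset and of naming the charset explicitly (the class name DefaultCharsetDemo is illustrative):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // Platform- and version-dependent; UTF-8 is guaranteed only from Java 18 on.
        System.out.println(Charset.defaultCharset());

        // Relying on the default is fragile; name the charset explicitly instead.
        byte[] bytes = "\u6c49".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints: 汉
    }
}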