In an interview today I was asked, "Can a char in Java store a Chinese character?" I answered that some characters can be stored and some cannot, and I was laughed at. I had also forgotten the relevant details of character encoding, so I could not explain myself. I looked it up that evening and wrote down what I found.
Search online for this question and the answers all say yes. After all, a single line of code seems to prove it:
char c = '我';
But the matter is not that simple. Java's internal encoding is UTF-16; see "String Encoding (b)" for a demonstration that Java's char encoding is UTF-16.
A Java char is stored in two bytes and ranges from '\u0000' to '\uffff', that is, from 0 to 65535. However, a char cannot represent 65,536 distinct characters, because only the values U+0000 to U+D7FF and U+E000 to U+FFFF represent a complete character on their own; these are the characters of the BMP (Basic Multilingual Plane). The values U+D800 to U+DFFF are reserved as surrogates: every character outside the BMP is encoded as a high surrogate followed by a low surrogate, which takes 4 bytes in total.
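The surrogate arithmetic described above can be sketched in a few lines. This is only an illustration: the class name SurrogateSketch and the helper toSurrogates are made up for this post, while Character.toChars, Character.isHighSurrogate and Character.isLowSurrogate are standard JDK methods used to cross-check the manual result.

```java
import java.util.Arrays;

class SurrogateSketch {
    // Hypothetical helper: splits a supplementary code point
    // (U+10000 to U+10FFFF) into its UTF-16 surrogate pair by hand.
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x20000; // a code point outside the BMP
        char[] manual = toSurrogates(cp);
        char[] library = Character.toChars(cp);

        System.out.printf("U+%X -> %04X %04X%n", cp, (int) manual[0], (int) manual[1]);
        // The hand-computed pair should match what the JDK produces.
        System.out.println(Arrays.equals(manual, library));
        System.out.println(Character.isHighSurrogate(manual[0]));
        System.out.println(Character.isLowSurrogate(manual[1]));
    }
}
```

Neither half of the pair is a character by itself, which is why one char is not enough for such a code point.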
Therefore, a single Java char can only represent a character in the BMP part of UTF-16. It cannot represent the CJK Unified Ideographs extension blocks that lie outside the BMP.
For example, apart from the Ext-A block, which is inside the BMP, the characters of the CJK extension blocks (Ext-B and later) cannot be stored in a single char.
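A small check makes the difference concrete. This is a sketch, not a complete survey of the CJK blocks: the class name CjkCharSketch and the helper charsNeeded are invented for illustration, and the three sample code points (U+6211, U+3400 as the first Ext-A code point, U+20000 as the first Ext-B code point) are picked by me as examples.

```java
class CjkCharSketch {
    // Hypothetical helper: how many Java chars (UTF-16 code units)
    // are needed to store one code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        char common = '\u6211'; // a common Chinese character in the BMP
        char extA = '\u3400';   // first code point of Ext-A, still in the BMP
        // An Ext-B code point does not fit in a char literal,
        // so it has to be held in a String (or a char[] of length 2).
        String extB = new String(Character.toChars(0x20000));

        System.out.println(charsNeeded(common));                   // 1
        System.out.println(charsNeeded(extA));                     // 1
        System.out.println(charsNeeded(0x20000));                  // 2
        System.out.println(extB.length());                         // 2
        System.out.println(extB.codePointCount(0, extB.length())); // 1
    }
}
```

So the honest interview answer is: a char can store the Chinese characters that lie in the BMP, which covers the commonly used ones, but not those in the supplementary planes.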