The Basic Multilingual Plane (BMP) is a coding region in Unicode that covers the code points U+0000 to U+FFFF.
Commonly used Chinese characters fall within this range.
The main ranges are as follows:
3400-4DBF: CJK Unified Ideographs Extension A
4DC0-4DFF: Yijing Hexagram Symbols
4E00-9FFF: CJK Unified Ideographs
E000-F8FF: Private Use Area (6,400 code points in total)
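As a rough illustration, the ranges listed above can be turned into a small lookup table. This is only a sketch: the names `bmp_block` and `bmp_block_name` are made up for this example, and the upper bound of the CJK Unified Ideographs block is taken as U+9FFF, the block boundary in the current Unicode charts.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the range table (hypothetical type, for illustration only). */
struct bmp_block {
    uint32_t first;
    uint32_t last;
    const char *name;
};

static const struct bmp_block bmp_blocks[] = {
    {0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"},
    {0x4DC0, 0x4DFF, "Yijing Hexagram Symbols"},
    {0x4E00, 0x9FFF, "CJK Unified Ideographs"},  /* assumed block boundary */
    {0xE000, 0xF8FF, "Private Use Area"},
};

/* Returns the block name for cp, or NULL if cp is in none of the listed ranges. */
const char *bmp_block_name(uint32_t cp)
{
    size_t n = sizeof bmp_blocks / sizeof bmp_blocks[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= bmp_blocks[i].first && cp <= bmp_blocks[i].last)
            return bmp_blocks[i].name;
    }
    return NULL;
}
```

For example, `bmp_block_name(0x4E2D)` would report the CJK Unified Ideographs block, while a Latin letter such as U+0041 is in none of the listed ranges and yields NULL.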
In UTF-16, common characters lie in the Basic Multilingual Plane and occupy 2 bytes.
Uncommon characters lie in the other planes and occupy 4 bytes.
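The 2-byte versus 4-byte rule can be sketched as a function of the code point. This is a minimal sketch, assuming UTF-16 encoding; the function name `utf16_byte_length` is hypothetical and not taken from any library.

```c
#include <stdint.h>

/* Returns the number of bytes a code point occupies in UTF-16:
 * 2 inside the BMP, 4 in the other planes, 0 if it cannot be encoded. */
int utf16_byte_length(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;   /* reserved range, never a character on its own */
    if (cp <= 0xFFFF)
        return 2;   /* BMP: one 16-bit code unit */
    if (cp <= 0x10FFFF)
        return 4;   /* planes 1 to 16: two 16-bit code units */
    return 0;       /* beyond the Unicode code space */
}
```

With this sketch, U+4E2D (a common Chinese character) gives 2 and U+20000 (the first code point of plane 2) gives 4.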
How can you tell whether a character is common or uncommon?
Unicode reserves a range for this purpose. No character in the Basic Multilingual Plane is mapped into this range, but characters in the other planes are encoded using values from it.
Within the BMP, the code points U+D800 to U+DFFF are permanently reserved and are not mapped to any character.
A character outside the BMP is encoded in four bytes: the first two bytes form the high surrogate and the last two bytes form the low surrogate.
The first two bytes range from 0xD800 to 0xDBFF.
The last two bytes range from 0xDC00 to 0xDFFF.
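To make the two ranges concrete, here is a sketch of how a code point outside the BMP is split into a high and a low surrogate, and how the pair is recombined. It follows the standard UTF-16 arithmetic (subtract 0x10000, then split the 20-bit result into two 10-bit halves); the function names `utf16_encode_pair` and `utf16_decode_pair` are hypothetical.

```c
#include <stdint.h>

/* Splits a code point in U+10000..U+10FFFF into a surrogate pair.
 * Returns 0 on success, -1 if cp is outside that range. */
int utf16_encode_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    uint32_t v = cp - 0x10000;                  /* 20-bit value */
    *high = (uint16_t)(0xD800 + (v >> 10));     /* top 10 bits: 0xD800..0xDBFF */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));   /* low 10 bits: 0xDC00..0xDFFF */
    return 0;
}

/* Recombines a surrogate pair into its code point.
 * Returns 0 if the pair is not valid (0 is never the result of a valid pair). */
uint32_t utf16_decode_pair(uint16_t high, uint16_t low)
{
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF)
        return 0;
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
}
```

For instance, U+20000 splits into the pair 0xD840, 0xDC00, and decoding that pair gives U+20000 back.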
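Therefore, you can use the following check to determine whether a character is uncommon:

    wchar_t wc[2];  /* assumes wchar_t holds UTF-16 code units, as on Windows */
    if (wc[0] >= 0xD800 && wc[0] <= 0xDBFF) {
        /* high surrogate: uncommon character (outside the BMP) */
    } else {
        /* common character (inside the BMP) */
    }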
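The same check can be extended to walk a whole UTF-16 buffer. The following is a sketch only: it uses `uint16_t` instead of `wchar_t` because `wchar_t` is not 16 bits wide on every platform, and the names `char_counts` and `utf16_classify` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct char_counts {
    size_t common;    /* BMP characters, one code unit each */
    size_t uncommon;  /* characters outside the BMP, one surrogate pair each */
    size_t invalid;   /* surrogates that are not part of a valid pair */
};

/* Classifies every character in a buffer of len UTF-16 code units. */
struct char_counts utf16_classify(const uint16_t *s, size_t len)
{
    struct char_counts c = {0, 0, 0};
    size_t i = 0;
    while (i < len) {
        uint16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 < len && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                c.uncommon++;
                i += 2;
            } else {
                c.invalid++;
                i += 1;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            c.invalid++;   /* low surrogate without a preceding high one */
            i += 1;
        } else {
            c.common++;
            i += 1;
        }
    }
    return c;
}
```

Unlike the two-element check, this version also looks at the second code unit, so a stray high surrogate at the end of the buffer is reported as invalid rather than counted as an uncommon character.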
References:
http://dict.youdao.com/wiki/%E5%9F%BA%E6%9C%AC%E5%A4%9A%E6%96%87%E7%A7%8D%E5%B9%B3%E9%9D%A2/#