I wrote an earlier post about the code range of Chinese characters in Unicode, but it was not very detailed. Today I looked at Unicode again and put together a more detailed list of ranges.
This study is based on Unicode 5.2.0. The latest version is 6.0.
Unicode divides its code space into the following planes (a plane can be thought of as a separate region of the code space).
Unicode can be logically divided into 17 planes. Each plane has 65,536 (= 2^16) code points, although only a few planes are currently used.
Plane 0 (0000-FFFF): Basic Multilingual Plane (BMP).
Plane 1 (10000-1FFFF): Supplementary Multilingual Plane (SMP).
Plane 2 (20000-2FFFF): Supplementary Ideographic Plane (SIP).
Plane 3 (30000-3FFFF): Tertiary Ideographic Plane (TIP).
Planes 4 to 13 (40000-DFFFF): not yet used.
Plane 14 (E0000-EFFFF): Supplementary Special-purpose Plane (SSP).
Plane 15 (F0000-FFFFF): reserved as a Private Use Area (PUA).
Plane 16 (100000-10FFFF): reserved as a Private Use Area (PUA).
The most useful, of course, is plane 0, the BMP, which runs from U+0000 to U+FFFF. It contains almost all commonly used characters.
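Since every plane holds 2^16 code points, the plane number is simply the code point shifted right by 16 bits. A minimal sketch in Python; the function name plane_of is made up for illustration:

```python
def plane_of(code_point):
    """Return the plane number (0-16) that a Unicode code point belongs to.

    plane_of is a hypothetical helper name, not part of any library.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %#x" % code_point)
    # Each plane spans 0x10000 (65,536) code points.
    return code_point >> 16


print(plane_of(ord("\u4e2d")))  # U+4E2D, a common Chinese character -> 0 (BMP)
print(plane_of(0x20000))        # first code point of the SIP -> 2
print(plane_of(0x10FFFF))       # last code point of Unicode -> 16
```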
Meaning of the code ranges in the Basic Multilingual Plane
Because the original 16-bit Unicode space proved insufficient, 16 additional code spaces, called supplementary planes, have been in use since Unicode 3.1.
They raise the space available to Unicode from just over 60 thousand code points to about 1.1 million. A supplementary-plane character takes four bytes in UTF-8 and in UTF-16 (where it is stored as a surrogate pair).
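The four-byte claim is easy to check by encoding one BMP character and one supplementary-plane character. This is only a rough sketch; encoded_sizes is a name invented here, and the little-endian codecs are used so that no byte order mark is counted:

```python
def encoded_sizes(ch):
    """Return the number of bytes one character takes in each encoding.

    encoded_sizes is a hypothetical helper for illustration only.
    """
    # The "-le" variants do not prepend a byte order mark.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# U+4E2D lies in the BMP (plane 0).
print(encoded_sizes("\u4e2d"))      # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}

# U+20000 is the first code point of plane 2 (SIP).
print(encoded_sizes("\U00020000"))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```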
Several major ranges in Unicode
Summary:
1. Most Chinese characters on the Internet today fall in the range U+4E00..U+9FA5. That range is only part of the "CJK Unified Ideographs", not all of them. To include everything, you also need the extension sets, radicals, symbols and punctuation, and phonetic (Bopomofo) letters:
2E80-A4CF plus F900-FAFF plus FE30-FE4F
where
2E80-A4CF
includes CJK Radicals Supplement, Kangxi Radicals, Ideographic Description Characters, CJK Symbols and Punctuation, Hiragana, Katakana, Bopomofo, Hangul Compatibility Jamo, Kanbun, Bopomofo Extended, CJK Strokes, Katakana Phonetic Extensions, Enclosed CJK Letters and Months, CJK Compatibility, CJK Unified Ideographs, Yi Syllables, and Yi Radicals
F900-FAFF
CJK Compatibility Ideographs
FE30-FE4F
CJK Compatibility Forms
So 4E00-9FA5 is enough for general use. If you want wider coverage, use 2E80-A4CF | F900-FAFF | FE30-FE4F
2. Full-width ASCII, full-width Chinese and English punctuation, half-width Katakana, and half-width Hangul letters: FF00-FFEF
3. Do not worry too much about the difference between Simplified and Traditional Chinese. If you need to single out Simplified Chinese, refer to "Simplified Chinese encoding in Unicode".
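The ranges from points 1 and 2 can be turned into regular expressions directly. The following is a minimal sketch using Python's re module; the pattern names and the contains_cjk function are invented for illustration. Keep in mind that the wide pattern also matches Japanese kana, Hangul compatibility letters, and Yi script, because those blocks sit inside 2E80-A4CF:

```python
import re

# Narrow range from point 1: 4E00-9FA5 only.
CJK_BASIC = re.compile(r"[\u4e00-\u9fa5]")

# Wide range from point 1: 2E80-A4CF | F900-FAFF | FE30-FE4F.
CJK_WIDE = re.compile(r"[\u2e80-\ua4cf\uf900-\ufaff\ufe30-\ufe4f]")

# Range from point 2: half-width and full-width forms, FF00-FFEF.
FULL_HALF_WIDTH = re.compile(r"[\uff00-\uffef]")


def contains_cjk(text, wide=False):
    """Return True if text has at least one character in the chosen range.

    contains_cjk is a hypothetical helper; wide=True selects the wider range.
    """
    pattern = CJK_WIDE if wide else CJK_BASIC
    return pattern.search(text) is not None


print(contains_cjk("Unicode \u4e2d\u6587"))        # Chinese characters -> True
print(contains_cjk("\u3053\u3093"))                # hiragana only -> False
print(contains_cjk("\u3053\u3093", wide=True))     # hiragana, wide range -> True
print(bool(FULL_HALF_WIDTH.search("\uff21\uff22")))  # full-width "AB" -> True
```

For plain detection of Chinese text the narrow pattern is usually the safer default; the wide one is better suited to questions like "does this string contain any East Asian script or punctuation".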
Reprinted from:
Http://www.iteye.com/topic/977671