I previously wrote a post on the Unicode range for Chinese characters, but it was not very detailed. Today I studied Unicode again and put together a more detailed list of ranges.
This study is based on Unicode 5.2.0. The latest version at the time of writing is 6.0.
Unicode divides its code space into planes, which can be thought of as separate regions of the code space.
Logically there are 17 planes, each with 65,536 (= 2^16) code points, although only a few planes are currently in use:
Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP).
Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP).
Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP).
Plane 3 (30000–3FFFF): Tertiary Ideographic Plane (TIP).
Planes 4 to 13 (40000–DFFFF): not yet in use.
Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP).
Plane 15 (F0000–FFFFF): reserved as a Private Use Area (PUA).
Plane 16 (100000–10FFFF): reserved as a Private Use Area (PUA).
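Since every plane holds exactly 2^16 code points, the plane number is simply the code point shifted right by 16 bits. The following is a minimal sketch of that idea; the names plane_of, plane_label, and PLANE_LABELS are made up for illustration and are not part of any library.

```python
# Short labels for the named planes listed above (illustrative only).
PLANE_LABELS = {
    0: "BMP",
    1: "SMP",
    2: "SIP",
    3: "TIP",
    14: "SSP",
    15: "PUA",
    16: "PUA",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 0x10000 code points, so the plane is the high bits.
    return code_point >> 16


def plane_label(code_point: int) -> str:
    """Return a short label for the plane, or 'unassigned' for planes 4 to 13."""
    return PLANE_LABELS.get(plane_of(code_point), "unassigned")


if __name__ == "__main__":
    for ch in ("A", "\u4e2d", "\U00020000"):
        cp = ord(ch)
        print(f"U+{cp:04X} -> plane {plane_of(cp)} ({plane_label(cp)})")
```

For example, plane_of(0x4E2D) gives 0 and plane_of(0x20000) gives 2.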
The most useful, of course, is plane 0, the BMP, which runs from U+0000 to U+FFFF. It contains almost all characters in common use.
Code ranges of the Unicode Basic Multilingual Plane and their meanings
Because the original 16-bit code space was not large enough, Unicode defines 16 supplementary planes (also called auxiliary planes), and Unicode 3.1 was the first version to assign characters in them.
This raises the space available to Unicode from about 60,000 code points to more than 1 million (17 × 65,536 = 1,114,112). Characters in the supplementary planes take 4 bytes in UTF-8 and in UTF-16 (where they are stored as a surrogate pair).
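To make the 4-byte claim concrete, here is a small sketch that measures the encoded size of a BMP character and a supplementary-plane character, and derives the UTF-16 surrogate pair by hand. The helper names encoded_sizes and surrogate_pair are assumptions chosen for this example; only the standard str.encode method is used.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character takes in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-be" variants avoid counting a byte order mark.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


def surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary-plane code point into its UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    print(encoded_sizes("\u4e2d"))        # U+4E2D, plane 0
    print(encoded_sizes("\U00020000"))    # U+20000, plane 2
    print([hex(unit) for unit in surrogate_pair(0x20000)])
```

U+4E2D takes 3 bytes in UTF-8 and 2 in UTF-16, while U+20000 takes 4 bytes in both.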
Several large ranges in Unicode
To summarize:
1, Most code found online for detecting Chinese characters tests the range U+4E00 to U+9FA5. That range belongs to the "CJK Unified Ideographs" block only, which is not everything. To cover everything, you also need the extension sets, radicals, pictographic characters, Bopomofo letters, and so on:
2E80-A4CF plus F900-FAFF plus FE30-FE4F
where:
2E80-A4CF
Includes CJK Radicals Supplement, Kangxi Radicals, Ideographic Description Characters, CJK Symbols and Punctuation, Hiragana, Katakana, Bopomofo, Hangul Compatibility Jamo, Kanbun, Bopomofo Extended, CJK Strokes, Katakana Phonetic Extensions, Enclosed CJK Letters and Months, CJK Compatibility, CJK Unified Ideographs Extension A, Yijing Hexagram Symbols, CJK Unified Ideographs, Yi Syllables, and Yi Radicals
F900-FAFF
CJK Compatibility Ideographs
FE30-FE4F
CJK Compatibility Forms
So 4E00-9FA5 is enough for general use; for broader coverage, use 2E80-A4CF || F900-FAFF || FE30-FE4F.
2, Fullwidth ASCII letters and punctuation, halfwidth Katakana, and halfwidth Hangul letters: FF00-FFEF
3, Do not worry too much about the difference between Simplified and Traditional Chinese; if you want to sort it out, please refer to the post on Simplified Chinese codes in Unicode.
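The ranges in the summary can be turned into simple membership tests. The sketch below is one possible way to do it, not a definitive implementation: the function names are invented for this example, the ranges are copied from the summary, and the fullwidth conversion relies on the fact that fullwidth ASCII (FF01-FF5E) sits at a fixed offset of 0xFEE0 from ASCII (0021-007E).

```python
# Narrow range most code uses to detect Chinese characters.
BASIC_CJK = (0x4E00, 0x9FA5)

# Broader ranges from the summary above.
EXTENDED_CJK = (
    (0x2E80, 0xA4CF),  # radicals through Yi Radicals
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0xFE30, 0xFE4F),  # CJK Compatibility Forms
)

# Halfwidth and fullwidth forms.
WIDTH_FORMS = (0xFF00, 0xFFEF)


def is_basic_cjk(ch: str) -> bool:
    """True if ch falls in 4E00-9FA5."""
    return BASIC_CJK[0] <= ord(ch) <= BASIC_CJK[1]


def is_extended_cjk(ch: str) -> bool:
    """True if ch falls in any of the broader ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTENDED_CJK)


def is_width_form(ch: str) -> bool:
    """True if ch falls in FF00-FFEF."""
    return WIDTH_FORMS[0] <= ord(ch) <= WIDTH_FORMS[1]


def to_halfwidth_ascii(text: str) -> str:
    """Map fullwidth ASCII (FF01-FF5E) to plain ASCII; leave the rest as-is."""
    out = []
    for ch in text:
        cp = ord(ch)
        out.append(chr(cp - 0xFEE0) if 0xFF01 <= cp <= 0xFF5E else ch)
    return "".join(out)


if __name__ == "__main__":
    print(is_basic_cjk("\u4e2d"), is_extended_cjk("\u3002"))
    print(to_halfwidth_ascii("\uff21\uff22\uff23\uff11\uff12\uff13"))
```

Note that the broader test also accepts kana, Hangul compatibility letters, and Yi, so it answers "is this CJK-related?" rather than "is this a Chinese character?". It also only looks at plane 0; ideographs in plane 2 would need additional ranges.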
See the original: http://www.iteye.com/topic/977671
Reposted: Encoding of Chinese characters in Unicode