Encoding range of Chinese and other special characters in Unicode

Source: Internet
Author: User

In programming, you sometimes need a regular expression that matches Chinese characters, which can be done with [\u4e00-\u9fa5]+. However, this expression does not handle "Martian script" (the internet writing style that mixes in unusual characters), and it does not even include full-width punctuation marks. Take player names in a game as an example. Ordinary players generally use plain Chinese characters, more artistic players add a few special characters, and some players write their names entirely in Martian script. At this point you need a more powerful regular expression.
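The limitation of the basic pattern can be seen in a minimal Python sketch. The helper name `is_han_only` and the sample names are assumptions made for illustration; only the range 4e00-9fa5 comes from the text.

```python
import re

# Basic pattern from the text: code points U+4E00 through U+9FA5.
HAN_ONLY = re.compile(r"[\u4e00-\u9fa5]+")


def is_han_only(name: str) -> bool:
    """Return True if name is non-empty and every character is in U+4E00..U+9FA5."""
    return HAN_ONLY.fullmatch(name) is not None


if __name__ == "__main__":
    print(is_han_only("张三"))    # plain Chinese characters
    print(is_han_only("张三!"))  # ends with the full-width exclamation mark U+FF01
```

The second name is rejected because U+FF01 lies outside 4e00-9fa5, which is the gap the wider ranges are meant to close.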

In fact, most player names in games are drawn from the CJK Unified Ideographs plus some special characters, and they are basically covered by [\u2e80-\ufe4f]+. According to Unicode 5.0, the relevant ranges are:

1) Standard CJK characters

Http://www.unicode.org/Public/UNIDATA/Unihan.html

2) Full-width ASCII, full-width Chinese and English punctuation, half-width Katakana, half-width Korean letters: FF00-FFEF

Http://www.unicode.org/charts/PDF/UFF00.pdf

3) CJK radical supplement: 2e80-2eff

Http://www.unicode.org/charts/PDF/U2E80.pdf

4) CJK punctuation: 3000-303f

Http://www.unicode.org/charts/PDF/U3000.pdf

5) CJK strokes: 31c0-31ef

Http://www.unicode.org/charts/PDF/U31C0.pdf

6) Kangxi radicals: 2f00-2fdf

Http://www.unicode.org/charts/PDF/U2F00.pdf

7) Ideographic description characters (describing the structure of Chinese characters): 2ff0-2fff

Http://www.unicode.org/charts/PDF/U2FF0.pdf

8) Bopomofo (Zhuyin phonetic symbols): 3100-312f

Http://www.unicode.org/charts/PDF/U3100.pdf

9) Bopomofo extended (extensions for Southern Min and Hakka): 31a0-31bf

Http://www.unicode.org/charts/PDF/U31A0.pdf

10) Japanese Hiragana: 3040-309f

Http://www.unicode.org/charts/PDF/U3040.pdf

11) Japanese Katakana: 30a0-30ff

Http://www.unicode.org/charts/PDF/U30A0.pdf

12) Japanese Katakana phonetic extensions: 31f0-31ff

Http://www.unicode.org/charts/PDF/U31F0.pdf

13) Korean Hangul syllables: AC00-D7AF

Http://www.unicode.org/charts/PDF/UAC00.pdf

14) Korean letters (Hangul Jamo): 1100-11ff

Http://www.unicode.org/charts/PDF/U1100.pdf

15) Korean compatibility letters (Hangul Compatibility Jamo): 3130-318f

Http://www.unicode.org/charts/PDF/U3130.pdf

16) Tai Xuan Jing symbols: 1d300-1d35f

Http://www.unicode.org/charts/PDF/U1D300.pdf

17) Yijing (I Ching) sixty-four hexagram symbols: 4dc0-4dff

Http://www.unicode.org/charts/PDF/U4DC0.pdf

18) Yi syllables: A000-A48F

Http://www.unicode.org/charts/PDF/UA000.pdf

19) Yi radicals: A490-A4CF

Http://www.unicode.org/charts/PDF/UA490.pdf

20) Braille: 2800-28ff

Http://www.unicode.org/charts/PDF/U2800.pdf

21) Enclosed CJK letters and months: 3200-32ff

Http://www.unicode.org/charts/PDF/U3200.pdf

22) CJK special symbols (including combined date characters): 3300-33ff

Http://www.unicode.org/charts/PDF/U3300.pdf

23) Decorative symbols (Dingbats, not CJK-specific): 2700-27bf

Http://www.unicode.org/charts/PDF/U2700.pdf

24) Miscellaneous symbols (not CJK-specific): 2600-26ff

Http://www.unicode.org/charts/PDF/U2600.pdf

25) Vertical punctuation (for vertical Chinese text): FE10-FE1F

Http://www.unicode.org/charts/PDF/UFE10.pdf

26) CJK compatibility forms (vertical variants, underlines, commas): FE30-FE4F

Http://www.unicode.org/charts/PDF/UFE30.pdf
