Regular expressions for Chinese characters: a question for anyone familiar with character sets
Which cases need to be considered when matching Chinese characters with regular expressions?
How should the regular expression be written in each case?
For example, how should ASCII and Unicode text be handled?
How do gb2312, gbk, and big5 match? Does it depend on which character set the server uses?
For Unicode text, the range usually given online is:
[\u4e00-\u9fa5]
However, I checked the Unicode code tables and found that:
There are Chinese characters from 3220 ..
In addition, does \x80-\xff match ASCII codes?
Thank you very much.
------ Solution --------------------
2E80~33FFh: CJK symbols area. It holds the Kangxi Dictionary radicals, the CJK radicals supplement, Bopomofo phonetic symbols, Japanese kana, Korean jamo, CJK symbols and punctuation, circled or parenthesized numbers and months, and Japanese kana combinations, units, years, months, dates, and times.
3400~4DFFh: CJK Unified Ideographs Extension A area, 6,582 CJK characters in total.
4E00~9FFFh: CJK Unified Ideographs area, 20,902 CJK characters in total.
A000~A4FFh: Yi script area, holding the syllables and radicals of the Yi script used in southern China.
AC00~D7FFh: Hangul syllables area, holding Korean text composed from phonetic letters (jamo).
F900~FAFFh: CJK Compatibility Ideographs area, 302 CJK characters in total.
FB00~FFFDh: presentation forms area, holding Latin ligatures, Hebrew and Arabic presentation forms, CJK vertical punctuation, small form variants, and halfwidth and fullwidth characters.
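To make the table above easier to check against real characters, here is a minimal Python sketch that looks up which of the listed areas a character falls in. The boundaries are copied from the list above; the `AREAS` table, the short labels, and the `area_of` helper are illustrative names, not part of any library.

```python
# Area boundaries copied from the list above; the labels are informal.
AREAS = [
    (0x2E80, 0x33FF, "CJK symbols"),
    (0x3400, 0x4DFF, "CJK ideographs extension A"),
    (0x4E00, 0x9FFF, "CJK unified ideographs"),
    (0xA000, 0xA4FF, "Yi"),
    (0xAC00, 0xD7FF, "Hangul syllables"),
    (0xF900, 0xFAFF, "CJK compatibility ideographs"),
    (0xFB00, 0xFFFD, "presentation forms"),
]


def area_of(ch):
    """Return the label of the listed area containing ch, or None if there is none."""
    code = ord(ch)
    for start, end, label in AREAS:
        if start <= code <= end:
            return label
    return None


# One Chinese character, one hiragana, one Hangul syllable, one ASCII letter.
for ch in "中お한A":
    print(f"U+{ord(ch):04X}", area_of(ch))
```

A lookup like this is a quick way to find out why a given character fails a range-based regular expression.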
For example, to match all CJK ideographs (excluding symbols), the regular expression should be ^[\u3400-\u9FFF]+$
In theory that is correct, but when I copied some Korean text from msn.co.ko to test it, the result was not right.
I then copied a Japanese character from msn.co.jp ..
After expanding the range to ^[\u2E80-\u9FFF]+$, all of them passed. This should be the regular expression for matching CJK text, including the traditional Chinese characters that are still in use.
The regular expression for Chinese characters alone should be ^[\u4E00-\u9FFF]+$, which is very close to the ^[\u4E00-\u9FA5]+$ that is usually quoted online.
Note that ^[\u4E00-\u9FA5]+$ is often described as a regular expression for simplified Chinese characters. In fact, traditional Chinese characters fall inside this range as well: I ran a string of traditional characters through a regex tester and it passed. Of course, ^[\u4E00-\u9FFF]+$ gives the same result.
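The three ranges discussed in this answer can be compared side by side. The sketch below uses Python's `re` module, which accepts `\uXXXX` escapes in patterns; the sample strings and the `matches` helper are my own choices for illustration, not the inputs used in the original test.

```python
import re

# The three ranges discussed above.
HAN_COMMON = re.compile(r"^[\u4E00-\u9FA5]+$")
HAN_FULL = re.compile(r"^[\u4E00-\u9FFF]+$")
CJK_WIDE = re.compile(r"^[\u2E80-\u9FFF]+$")

# Illustrative samples (assumed, not taken from the original post).
samples = {
    "simplified": "中华人民共和国",
    "traditional": "中華人民共和國",
    "hiragana": "お",
    "hangul syllable": "한",
    "latin": "abc",
}


def matches(pattern, text):
    """True if the whole string consists of characters in the pattern's range."""
    return pattern.match(text) is not None


for label, text in samples.items():
    print(
        label,
        matches(HAN_COMMON, text),
        matches(HAN_FULL, text),
        matches(CJK_WIDE, text),
    )
```

Simplified and traditional strings pass all three patterns, and kana passes only the widest one. Hangul syllables sit in the AC00~D7FF area from the table, outside all three ranges, so Korean text written in syllable blocks would need that range added to the character class.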
------ Solution --------------------
mb_ereg_match
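mb_ereg_match is PHP's multibyte-aware regular expression match. The same idea can be sketched in Python (a different language, used here only for illustration): decode the gb2312, gbk, or big5 bytes into Unicode text first, then apply a Unicode range, so one pattern serves every encoding. On the \x80-\xff question: ASCII covers only 0x00 to 0x7F, so a byte class of \x80-\xff matches non-ASCII bytes, not ASCII. The helper names `is_han_bytes` and `has_non_ascii` are hypothetical.

```python
import re

HAN = re.compile(r"^[\u4E00-\u9FFF]+$")
# Byte pattern: any byte above the ASCII range (0x00-0x7F).
HIGH_BYTE = re.compile(rb"[\x80-\xff]")


def is_han_bytes(raw, encoding):
    """Decode raw bytes with the given codec, then test the text against the Han range."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return HAN.match(text) is not None


def has_non_ascii(raw):
    """True if the byte string contains at least one byte outside ASCII."""
    return HIGH_BYTE.search(raw) is not None


for codec in ("gb2312", "gbk", "big5"):
    raw = "中文".encode(codec)
    print(codec, raw, is_han_bytes(raw, codec), has_non_ascii(raw))
```

Matching on raw bytes with \x80-\xff only shows that non-ASCII bytes are present. In gbk and big5 the second byte of a character can fall below 0x80, so a byte-level class does not reliably delimit whole characters, which is why decoding first is the safer route.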
------ Solution --------------------
U0000 ASCII.pdf
U0A00.pdf
U0A80.pdf
U0B00.pdf
U0B80.pdf
U0C00.pdf
U0C80.pdf
U0D00.pdf
U0D80.pdf
U0E00.pdf
U0E80.pdf
U0F00.pdf
U1A00.pdf
U1B00.pdf
U1D000.pdf
U1D00.pdf
U1D80.pdf
U1D100.pdf
U1D200.pdf
U1D300.pdf
U1D360.pdf
U1D400.pdf
U1DC0.pdf
U1E00.pdf
U1F00.pdf
U1FF80.pdf
U2A00 extended mathematical symbols
U02B0.pdf
U2B00.pdf
U2C00.pdf
U2C60.pdf
U2C80.pdf
U2D00.pdf
U2D30.pdf
U2D80.pdf
U2E00.pdf
U2E80.pdf
U2F00.pdf
U2F800.pdf
U2FF0.pdf
U2FF80.pdf
U3FF80.pdf
U4DC0.pdf
U4E00 Chinese characters
U4FF80.pdf
U5FF80.pdf
U6FF80.pdf
U07C0.pdf
U7FF80.pdf
U8FF80.pdf
U9FF80.pdf
U10A00.pdf
U10A0.pdf
U10FF80.pdf
U13A0.pdf
U16A0.pdf
U19E0.pdf
U20A0.pdf
U20D0.pdf
U25A0.pdf
U27C0.pdf
U27F0.pdf
U30A0
U31A0.pdf
U31C0.pdf
U31F0.pdf
U0080 Latin symbols
U0100.pdf
U103A0.pdf
U0180.pdf
U0250.pdf
U0300.pdf
U0370.pdf
U0400.pdf
U0500.pdf
U0530.pdf
U0590.pdf
U0600.pdf
U0700.pdf
U0750.pdf
U0780.pdf
U0900.pdf
U0980.pdf
U1000.pdf
U1100.pdf
U1200.pdf
U1380.pdf
U1400.pdf
U1680.pdf
U1700.pdf
U1720.pdf
U1740.pdf
U1760.pdf
U1780.pdf
U1800.pdf
U1900.pdf
U1950.pdf
U1980.pdf
U2000.pdf
U2070.pdf
U2100.pdf
U2150.pdf
U2190 arrows
U2200 mathematical symbols
U2300.pdf
U2400.pdf
U2440.pdf
U2460 enclosed numbers
U2500 box drawing
U2580 block elements
U2600.pdf
U2700.pdf
U2800.pdf
U2900.pdf
U2980.pdf
U3000 Chinese punctuation
U3040 Japanese hiragana
U3100 Bopomofo (older Chinese phonetic symbols)
U3130 Korean jamo
U3190.pdf
U3200 enclosed numbers and symbols
U3300 units and time periods
U3400.pdf
U10000.pdf
U10080.pdf
U10100.pdf
U10140.pdf
U10300.pdf
U10330.pdf
U10380.pdf
U10400.pdf
U101_success
U10480.pdf
U10800.pdf
U10900.pdf
U12000.pdf
U12400.pdf
U20000.pdf
U00000.pdf
UA000.pdf
UA490.pdf