This article is reproduced from: http://joedanny.iteye.com/blog/156903
Here are the main non-English character ranges (found via Google):
2E80 ~ 33FFh: CJK (Chinese, Japanese, Korean) symbol area. It holds the Kangxi Dictionary radicals, supplementary CJK radicals, phonetic symbols (Bopomofo), Japanese kana, Korean jamo, CJK symbols and punctuation, circled or parenthesized letters and numbers, months, and Japanese kana combinations, units, era names, months, dates, and times.

3400 ~ 4DFFh: CJK Unified Ideographs Extension A area, 6,582 CJK ideographs in total.

4E00 ~ 9FFFh: CJK Unified Ideographs area, 20,902 CJK ideographs in total.

A000 ~ A4FFh: Yi area, holding the characters and radicals of the Yi script used in southern China.

AC00 ~ D7FFh: Hangul syllables area, holding the syllables composed from Korean jamo.

F900 ~ FAFFh: CJK Compatibility Ideographs area, 302 CJK ideographs in total.

FB00 ~ FFFDh: Presentation forms area, holding Latin ligatures, Hebrew, Arabic, CJK vertical punctuation, small forms, halfwidth forms, and fullwidth forms.
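As a rough illustration, the ranges above can be put into a small lookup table that reports which area a character falls in. This is only a sketch in Python: the `RANGES` table, the short labels, and the `classify` helper are hypothetical names introduced here, and the boundaries are copied from the list above rather than from the official Unicode block definitions.

```python
# Hypothetical lookup table built from the ranges listed above.
# The labels are illustrative shorthand, not official Unicode block names.
RANGES = [
    (0x2E80, 0x33FF, "CJK symbols"),
    (0x3400, 0x4DFF, "CJK ideographs extension A"),
    (0x4E00, 0x9FFF, "CJK ideographs"),
    (0xA000, 0xA4FF, "Yi"),
    (0xAC00, 0xD7FF, "Hangul syllables"),
    (0xF900, 0xFAFF, "CJK compatibility ideographs"),
    (0xFB00, 0xFFFD, "presentation forms"),
]


def classify(ch: str) -> str:
    """Return the label of the listed area containing ch, or 'other'."""
    cp = ord(ch)  # code point of the single character
    for start, end, label in RANGES:
        if start <= cp <= end:
            return label
    return "other"
```

For instance, `classify("中")` falls in the main ideograph area, while a kana such as `"あ"` lands in the symbol area and a Hangul syllable such as `"한"` lands in its own area.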
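- For example, to match all CJK non-symbol characters, the regular expression ^[\u3400-\u9fff]+$ should be correct in theory. But when I copied some Korean text from msn.co.ko, it did not match at all, which was strange; a character copied from msn.co.jp did not match either. The ranges listed above would explain this if the copied text was Hangul and kana: Hangul syllables sit in AC00 ~ D7FF and Japanese kana in the 2E80 ~ 33FF symbol area, both outside 3400 ~ 9FFF.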
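- Expanding the range to ^[\u2e80-\u9fff]+$ made those tests pass. This should be the regular expression that matches CJK text, including the Traditional Chinese characters still in use. Note, however, that the Hangul syllables area (AC00 ~ D7FF) is still outside this range, so text written in Hangul syllables would need that range added.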
- The regular expression for Chinese characters alone should be ^[\u4e00-\u9fff]+$, which is very close to the ^[\u4e00-\u9fa5]+$ often quoted in forums.
- Note that the ^[\u4e00-\u9fa5]+$ quoted in forums is usually described as matching Simplified Chinese only. In fact, Traditional Chinese characters are covered too: I ran the Traditional Chinese string '中華人民共和國' through a regex tester and it matched. Of course, ^[\u4e00-\u9fff]+$ gives the same result.
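The patterns discussed above can be tried out with a short Python sketch. The variable names and the `is_match` helper are hypothetical, and the sample strings are illustrative choices rather than the exact text copied in the original tests. Python's `re` module accepts the same `\uXXXX` escapes inside a character class.

```python
import re

# Patterns taken from the discussion above.
NARROW = re.compile(r"^[\u3400-\u9fff]+$")      # ideographs only
WIDE = re.compile(r"^[\u2e80-\u9fff]+$")        # symbol area plus ideographs
HAN_FORUM = re.compile(r"^[\u4e00-\u9fa5]+$")   # range often quoted in forums
HAN_BLOCK = re.compile(r"^[\u4e00-\u9fff]+$")   # full 4E00 ~ 9FFF range


def is_match(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole string consists of characters in the class."""
    return pattern.match(text) is not None


simplified = "中华人民共和国"   # Simplified Chinese sample
traditional = "中華人民共和國"  # Traditional Chinese sample
kana = "あ"                     # hiragana, U+3042
hangul = "한"                   # Hangul syllable, U+D55C

for name, pattern in [("NARROW", NARROW), ("WIDE", WIDE),
                      ("HAN_FORUM", HAN_FORUM), ("HAN_BLOCK", HAN_BLOCK)]:
    results = {s: is_match(pattern, s)
               for s in (simplified, traditional, kana, hangul)}
    print(name, results)
```

Both Han patterns accept the Simplified and Traditional samples. The kana sample is rejected by the narrow pattern and accepted by the wide one, and the Hangul syllable is rejected by all four because it lies in AC00 ~ D7FF.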