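Matching Chinese characters in a UTF-8 environment
\w matches Chinese characters as well as digits, letters and the underscore. Chinese users, however, often need to match Chinese characters only, as shown below.
Regular expression for matching Chinese characters: [\u4e00-\u9fa5] (in PHP's PCRE functions this is written [\x{4e00}-\x{9fa5}] together with the u modifier)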
You may also need to match double-byte characters, which include Chinese characters.
Matching double-byte characters (including Chinese characters): [^\x00-\xff]
Note: this can also be used to compute the length of a string in which each double-byte character counts as 2 and each ASCII character counts as 1.
Matching Chinese characters in an ANSI (GB2312) environment
Match all characters of the GB2312 code table: /[".chr(0xb0)."-".chr(0xf7)."]+/
Match only Chinese characters, not full-width punctuation: /([".chr(0xb0)."-".chr(0xf7)."][".chr(0xa1)."-".chr(0xfe)."])/
The expression matches one Chinese character.
Match only full-width punctuation, not Chinese characters: /([".chr(0xa1)."-".chr(0xa3)."][".chr(0xa1)."-".chr(0xff)."])/
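The GB2312 patterns above work on raw bytes, so they are used without the u modifier. The sketch below builds a GB2312 string from byte escapes instead of relying on the source file's encoding; it assumes "\xD6\xD0" and "\xCE\xC4" are the GB2312 encodings of 中 and 文, and "\xA3\xAC" is the full-width comma ，. Matches are printed as hex so the output does not depend on the terminal encoding.

```php
<?php
// GB2312 bytes (assumed): D6D0 = 中, CEC4 = 文, A3AC = full-width comma
$gb = "abc\xD6\xD0\xCE\xC4\xA3\xAC";

// Chinese characters only: lead byte 0xB0-0xF7, trail byte 0xA1-0xFE
$hanzi = "/([" . chr(0xb0) . "-" . chr(0xf7) . "][" . chr(0xa1) . "-" . chr(0xfe) . "])/";

// Full-width punctuation only: lead byte 0xA1-0xA3, trail byte 0xA1-0xFF
$punct = "/([" . chr(0xa1) . "-" . chr(0xa3) . "][" . chr(0xa1) . "-" . chr(0xff) . "])/";

preg_match_all($hanzi, $gb, $m1); // $m1[0] holds the full matches
preg_match_all($punct, $gb, $m2);

echo implode(" ", array_map('bin2hex', $m1[0])), "\n"; // d6d0 cec4
echo implode(" ", array_map('bin2hex', $m2[0])), "\n"; // a3ac
```

Because these patterns look at bytes rather than characters, a match can start on the second byte of one character and run into the next on some inputs; converting the text to UTF-8 and using the \x{4e00}-\x{9fa5} form avoids that.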
Example
1. Using the preg_match function to match a Chinese character
<?php
$str = ' asd 我们 CD ';
$key = '#[\x{4e00}-\x{9fa5}]#u';
preg_match($key, $str, $res);
print_r($res);
?>
Result: Array ( [0] => 我 )

2. Using the preg_match function to match consecutive Chinese characters (one or more)
<?php
$str = ' 34353434 我们 CD ';
$key = '#[\x{4e00}-\x{9fa5}]{1,}#u';
preg_match($key, $str, $res);
print_r($res);
?>
Result: Array ( [0] => 我们 )

3. Improving on 1: matching with the preg_match_all function
<?php
$str = ' 34353434 我们 CD ';
$key = '#[\x{4e00}-\x{9fa5}]#u';
preg_match_all($key, $str, $res);
print_r($res);
?>
Result: Array ( [0] => Array ( [0] => 我 [1] => 们 ) )

4. Improving on 2: using the preg_match_all function to match consecutive Chinese characters (one or more)
<?php
$str = ' 34353434 我们 CD ';
$key = '#[\x{4e00}-\x{9fa5}]{1,}#u';
preg_match_all($key, $str, $res);
print_r($res);
?>
Result: Array ( [0] => Array ( [0] => 我们 ) )
As the results show, the regular expression [\x{4e00}-\x{9fa5}] (with the u modifier) matches Chinese characters.
The difference between preg_match and preg_match_all is that preg_match stops after the first match (or returns when nothing matches), whereas preg_match_all searches the subject string from beginning to end and collects every match.
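A small PHP sketch of that difference, using the single-character pattern #[\x{4e00}-\x{9fa5}]#u on a subject that contains 我们. preg_match returns 1 or 0 and fills only the first match, while preg_match_all returns the number of matches and collects all of them.

```php
<?php
$str = ' 34353434 我们 CD ';
$key = '#[\x{4e00}-\x{9fa5}]#u';

// preg_match stops at the first match and returns 1 (0 if nothing matched)
$one = preg_match($key, $str, $first);

// preg_match_all scans to the end of the subject and returns the match count
$all = preg_match_all($key, $str, $every);

echo $one, ' ', $first[0], "\n";               // 1 我
echo $all, ' ', implode(',', $every[0]), "\n"; // 2 我,们
```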