PHP Regular Expressions: Analysis of a Problem Matching Chinese Characters
$str = '中华人民共和国123456789abcdefg';
echo preg_match("/^[\u4e00-\u9fa5_a-zA-Z0-9]{3,15}$/", $str);
Run the code above and look at the message it produces:
Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:http://www.hzh Uti.com/nokia/5800/on line 2
As the message says, the PCRE build behind PHP's regular expression functions here does not support the following Perl escape sequences: \L, \l, \N, \P, \p, \U, \u, or \X.
In UTF-8 mode (the /u modifier), "\x{...}" is allowed, and the content of the curly braces is a string of hexadecimal digits.
In that mode, the original hexadecimal escape sequence \xhh matches a two-byte UTF-8 character if its value is greater than 127.
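The effect of UTF-8 mode can be checked with a small sketch. It assumes the script file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8.
$char = "中"; // U+4E2D, stored in UTF-8 as the three bytes E4 B8 AD

// With /u, \x{4e2d} denotes the code point U+4E2D.
// Single quotes pass the backslash sequence through to PCRE unchanged.
$matchesCodePoint = preg_match('/^\x{4e2d}$/u', $char);

// Without /u the subject is treated as bytes, so a single "."
// cannot cover the whole three-byte character.
$matchesAsOneByte = preg_match('/^.$/', $char);

// With /u the same "." covers one complete character.
$matchesAsOneChar = preg_match('/^.$/u', $char);

var_dump($matchesCodePoint, $matchesAsOneByte, $matchesAsOneChar);
// int(1), int(0), int(1)
```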
So either of the following can solve the problem:
preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str);
preg_match('/[\x{2460}-\x{2468}]/u', $str);
Matching Chinese characters in code
I tested the approach given there with the following code:
$STR = "PHP programming";
if (Preg_match ("/^[x{2460}-x{2468}]+$/u", $str)) {
Print ("The string is all Chinese");
} else {
Print ("The string is not all Chinese");
}
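This still fails to judge correctly whether a string is Chinese, which is not surprising, since U+2460 to U+2468 are the circled digits ① to ⑨ rather than Chinese characters. However, since \x denotes a hexadecimal value, why does this range differ from the 4e00-9fa5 range used in JavaScript? So I changed the code as follows: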
$STR = "PHP programming";
if (Preg_match ("/^[x4e00-x9fa5]+$/u", $str)) {
Print ("The string is all Chinese");
} else {
Print ("The string is not all Chinese");
}
I expected this to succeed, but it produced another warning:
Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3
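Apparently the notation was wrong. In a double-quoted PHP string, \x takes at most two hexadecimal digits, so "\x9fa5" becomes the single byte 0x9f followed by the text "a5", and a lone 0x9f byte is not valid UTF-8. Following the notation used in the article, I wrapped "4e00" and "9fa5" in "{" and "}", ran the code again, and found that it now works correctly: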
$STR = "PHP programming";
if (Preg_match ("/^[x{4e00}-x{9fa5}]+$/u", $str)) {
Print ("The string is all Chinese");
} else {
Print ("The string is not all Chinese");
}
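To see why the braces matter, it may help to dump what each double-quoted pattern actually contains before PCRE receives it. The sketch below is only an illustration; the variable names are made up, and `preg_match('//u', ...)` is used as a common idiom for checking whether a string is valid UTF-8.

```php
<?php
// Double-quoted: PHP itself consumes \x4e and \x9f (two hex digits each)
// before PCRE ever sees the pattern.
$unbraced = "/^[\x4e00-\x9fa5]+$/u";

// \x{...} is not a PHP string escape, so it reaches PCRE untouched.
$braced = "/^[\x{4e00}-\x{9fa5}]+$/u";

// The contents of the character class in each pattern:
echo bin2hex(substr($unbraced, 3, 7)), "\n"; // 4e30302d9f6135 = "N00-", byte 0x9f, "a5"
echo substr($braced, 3, 17), "\n";           // \x{4e00}-\x{9fa5}

// The lone 0x9f byte makes the first pattern invalid UTF-8.
var_dump(preg_match('//u', $unbraced) === 1); // bool(false)
var_dump(preg_match('//u', $braced) === 1);   // bool(true)
```

The 0x9f byte sits at offset 6 of the pattern once the leading delimiter is dropped, which agrees with the offset reported in the warning.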
So the final correct regular expression for matching Chinese characters under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u.
To sum up:
// Works only with GB2312-encoded input:
// if (preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "]+$/", $str)) {
// Compatible with both GB2312 and UTF-8:
if (preg_match("/^[\x7f-\xff]+$/", $str)) {
    echo "Correct input";
} else {
    echo "Error input";
}
Double-byte character encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)
2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\x3130-\x318F (Korean)
\xAC00-\xD7A3 (Korean)
\u0800-\u4e00 (Japanese)
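In PHP these Unicode ranges have to be written in the \x{...} form with the /u modifier. A minimal sketch follows, assuming a UTF-8 source file; `classifyChar` is a hypothetical helper, and the Japanese range is left out because, as listed, it overlaps the first Korean range.

```php
<?php
// Assumption: this file is saved as UTF-8. classifyChar is hypothetical.

// Classifies a single character using the Chinese and Korean ranges above.
function classifyChar(string $char): string
{
    if (preg_match('/^[\x{4e00}-\x{9fa5}]$/u', $char) === 1) {
        return 'Chinese';
    }
    if (preg_match('/^[\x{3130}-\x{318f}\x{ac00}-\x{d7a3}]$/u', $char) === 1) {
        return 'Korean';
    }
    return 'other';
}

echo classifyChar("中"), "\n"; // Chinese
echo classifyChar("한"), "\n"; // Korean (U+D55C)
echo classifyChar("a"), "\n";  // other
```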