Analysis of the problem of matching Chinese characters with PHP regular expressions
$str = '中华人民共和国123456789abcdefg';
echo preg_match("/^[\u4e00-\u9fa5_a-zA-Z0-9]{3,15}$/", $str);
Running the code above produces the following warning:
Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:\... on line 2
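It turns out that PCRE, the regex engine behind PHP's preg_* functions, does not support the Perl escape sequences \L, \l, \N, \P, \p, \U, \u, or \X (at least in the PCRE version used here). In particular, the JavaScript-style \uXXXX escape cannot be used to write a code point.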
In UTF-8 mode (the /u pattern modifier), \x{...} is allowed: the content in braces is a string of hexadecimal digits giving the character's code point.
Also in UTF-8 mode, the original hexadecimal escape sequence \xhh matches a two-byte UTF-8 character if its value is greater than 127.
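A minimal check of the two points above, assuming PHP 7 or later with its bundled PCRE and a source file saved as UTF-8: the \u form fails to compile, while \x{...} with /u matches a code point. The character 中 is U+4E2D.

```php
<?php
// Assumes this file is saved as UTF-8.
// \u4e00 is not a PCRE escape: compilation fails and preg_match returns false
// (the @ only suppresses the "Compilation failed" warning for this demo).
$bad = @preg_match('/^[\u4e00-\u9fa5]+$/', '中');

// With the /u modifier, \x{4e2d} is the code point U+4E2D, i.e. "中".
$good = preg_match('/^\x{4e2d}$/u', '中');

var_dump($bad);  // bool(false)
var_dump($good); // int(1)
```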
So the problem can be solved in either of these ways:
preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str); // byte mode: any byte >= 0x80 is accepted
preg_match('/[\x{2460}-\x{2468}]/u', $str);          // UTF-8 mode: \x{...} is a code point
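One difference between the two forms above is easy to miss: without /u, the quantifier {3,15} counts bytes, and each of these Chinese characters takes three bytes in UTF-8. A sketch, assuming a UTF-8 source file; the test string and variable names are illustrative only.

```php
<?php
// Assumes this file is saved as UTF-8.
$str = '中华人民共和国'; // 7 characters, 21 bytes in UTF-8

// Byte mode: [\x80-\xff] matches single bytes, so {3,15} limits the byte count.
$byteMode = preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str);

// UTF-8 mode: \x{...} ranges match whole characters, so {3,15} limits characters.
$charMode = preg_match('/^[\x{4e00}-\x{9fa5}_a-zA-Z0-9]{3,15}$/u', $str);

echo strlen($str), PHP_EOL; // 21
echo $byteMode, PHP_EOL;    // 0
echo $charMode, PHP_EOL;    // 1
```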
Matching Chinese characters by code point ("internal code")
The code is as follows:
$ Str = "php programming ";
If (preg_match ("/^ [x {2460}-x {2468}] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
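The range used in that code explains the result: U+2460 to U+2468 are the circled digits ① to ⑨, not Chinese characters, so this class can never accept Chinese text. A quick check, assuming a UTF-8 source file:

```php
<?php
// Assumes this file is saved as UTF-8.
$pattern = '/^[\x{2460}-\x{2468}]+$/u'; // U+2460..U+2468 are ① .. ⑨

$circled = preg_match($pattern, '①②③'); // circled digits: inside the range
$chinese = preg_match($pattern, '编程');  // U+7F16, U+7A0B: outside the range

var_dump($circled); // int(1)
var_dump($chinese); // int(0)
```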
I found that this still did not correctly detect Chinese characters. Since \x denotes a hexadecimal value, why is the range here different from the range \u4e00-\u9fa5 used in JavaScript? So I changed the code to the following:
$ Str = "php programming ";
If (preg_match ("/^ [x4e00-x9fa5] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
I thought that would work, but to my surprise another warning appeared:
Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3
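The offset in this warning comes from PHP's own string escapes: in a double-quoted PHP string, PHP decodes "\x4e00" as the letter N followed by "00", and "\x9fa5" as the lone byte 0x9F followed by "a5", all before preg_match sees the pattern. A lone 0x9F byte is not valid UTF-8, so the /u pattern is rejected. A sketch demonstrating this:

```php
<?php
// The pattern exactly as written in the double-quoted string above.
$pattern = "/^[\x4e00-\x9fa5]+$/u";

// PHP has already turned \x4e into "N" and \x9f into the single byte 0x9F.
$decoded = "/^[N00-" . chr(0x9f) . "a5]+$/u";
var_dump($pattern === $decoded); // bool(true)

// Without the leading "/" the pattern is ^[N00-<0x9F>a5]+$, so the invalid
// byte sits at offset 6, matching the warning.
var_dump(strpos($pattern, chr(0x9f)) - 1); // int(6)

// Compilation fails under /u; @ suppresses the warning for this demo.
var_dump(@preg_match($pattern, 'abc')); // bool(false)
```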
Apparently the expression was still wrong. Comparing it with the one in the article, I wrapped "4e00" and "9fa5" in braces, giving \x{4e00} and \x{9fa5} (without braces, \x reads at most two hex digits). I ran it again, and this time it worked correctly:
$ Str = "php programming ";
If (preg_match ("/^ [x {4e00}-x {9fa5}] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
So the correct regular expression for matching Chinese characters in UTF-8-encoded text in PHP is: /^[\x{4e00}-\x{9fa5}]+$/u
Summary:
// if (preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "]+$/", $str)) { // only works for GB2312
if (preg_match("/^[\x7f-\xff]+$/", $str)) { // works for GB2312 and UTF-8 (only checks that every byte is >= 0x7f)
    echo "correct input";
} else {
    echo "incorrect input";
}
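The "compatible" check in the summary only tests that every byte is 0x7F or higher, so it accepts any non-ASCII text, not just Chinese. A comparison with the UTF-8 pattern from earlier, assuming a UTF-8 source file (the sample strings are illustrative):

```php
<?php
// Assumes this file is saved as UTF-8.
$byteCheck = "/^[\x7f-\xff]+$/";          // byte-level check from the summary
$utf8Check = '/^[\x{4e00}-\x{9fa5}]+$/u'; // code-point check, UTF-8 only

foreach (['中文', 'é', 'abc'] as $s) {
    // é is U+00E9, encoded as the bytes 0xC3 0xA9: non-ASCII but not Chinese.
    printf("%s byte:%d utf8:%d\n", $s, preg_match($byteCheck, $s), preg_match($utf8Check, $s));
}
```

The byte check suits legacy GB2312/UTF-8 mixes where only "non-ASCII" matters; the /u pattern is the stricter choice when the input is known to be UTF-8.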
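Double-byte character encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff        GBK double-byte encoding range
\x20-\x7f        ASCII
\xa1-\xff        Chinese (GB2312)
\x80-\xff        Chinese (GBK)
2. UTF-8 (Unicode)
U+4E00-U+9FA5    Chinese
U+3130-U+318F    Korean
U+AC00-U+D7A3    Korean
U+0800-U+4E00    Japanese (very rough: this span also covers many other scripts; Hiragana and Katakana themselves are U+3040-U+30FF)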
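The UTF-8 ranges in this table can be applied character by character. A sketch, assuming UTF-8 input: classifyChars is a hypothetical helper, the ranges are the table's rough ones, and because the Japanese span overlaps the others, Chinese and Korean are checked first.

```php
<?php
// Hypothetical helper: label each character of a UTF-8 string using the
// code-point ranges from the table. Checked in order; the first match wins.
function classifyChars(string $str): array
{
    $ranges = [
        'Chinese'  => '/^[\x{4e00}-\x{9fa5}]$/u',
        'Korean'   => '/^[\x{3130}-\x{318f}\x{ac00}-\x{d7a3}]$/u',
        'Japanese' => '/^[\x{0800}-\x{4e00}]$/u', // rough range from the table
        'ASCII'    => '/^[\x{20}-\x{7f}]$/u',
    ];
    $result = [];
    // Split into whole UTF-8 characters rather than bytes.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $label = 'other';
        foreach ($ranges as $name => $re) {
            if (preg_match($re, $ch) === 1) {
                $label = $name;
                break;
            }
        }
        $result[] = [$ch, $label];
    }
    return $result;
}

foreach (classifyChars('中한あa') as [$ch, $label]) {
    echo $ch, ' => ', $label, PHP_EOL;
}
```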
Response, $ str = '123456789abcdefg'; echo preg_match ("/^ [u4e00-u9fa5_a-zA-Z0-9] {} $", $ strName); run the above code...