- <?php $str = '中华人民共和国123456789abcdefg';
- echo preg_match("/^[\u4e00-\u9fa5_a-zA-Z0-9]{3,15}$/", $str); // \u is not a PCRE escape
- ?>
Running the code above produces this warning: Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:\wwwroot\php\test.php on line 2

The reason is that PHP's regular expressions (PCRE) do not support these Perl escape sequences: \L, \l, \N, \P, \p, \U, \u, and \X. (That list comes from the PCRE build used here. Newer builds do support \p, \P, and \X, but \u is still rejected.) In UTF-8 mode (the u modifier), "\x{...}" is allowed, where the braces contain a string of hexadecimal digits giving the character's code point. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if its value is greater than 127. Solution:
- preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str); // byte range; without u, {3,15} counts bytes
- preg_match('/[\x{2460}-\x{2468}]/u', $str);          // \x{...} requires the u modifier
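A minimal sketch of the difference, assuming PHP 7 or later and a UTF-8 encoded source file: the unsupported \u escape makes preg_match() fail, \x{...} with the u modifier matches by code point, and a plain \xhh above 127 matches the corresponding two-byte character. The variable names are illustrative only.

```php
<?php
// Sketch: \u is not a PCRE escape, but \x{...} works in UTF-8 (u) mode.
// Single-quoted patterns keep PHP from interpreting the backslashes itself.
$str = '中华人民共和国';

// PCRE rejects \u: preg_match() raises a compilation warning and returns
// false. The @ operator hides that warning for this demonstration only.
$withU = @preg_match('/^[\u4e00-\u9fa5]+$/', $str);

// \x{...} names a code point when the u modifier is present.
$withBraces = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str);

// In u mode, \xe9 means code point U+00E9, which is two bytes in UTF-8.
$eAcute = "\u{e9}"; // PHP 7+ string escape producing "é" as UTF-8
$twoByte = preg_match('/^\xe9$/u', $eAcute);

var_dump($withU);          // bool(false)
var_dump($withBraces);     // int(1)
var_dump($twoByte);        // int(1)
var_dump(strlen($eAcute)); // int(2)
```

Single quotes are used on purpose: they pass every backslash to PCRE unchanged, so what you read is exactly what the regex engine sees.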
Following the method he suggested, I tried to match Chinese characters by their internal code:
- $ Str = "php programming ";
- If (preg_match ("/^ [x {2460}-x {2468}] + $/u", $ str )){
- Print ("all strings are Chinese ");
- } Else {
- Print ("Not all strings are Chinese ");
- }
- ?>
-
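To see why that code could never recognise Chinese text, here is a small sketch (PHP 7+ assumed). The range \x{2460}-\x{2468} covers the circled digits ① to ⑨, so it matches those symbols and nothing else. The sample strings are illustrative.

```php
<?php
// Sketch: U+2460..U+2468 are the circled digits one to nine, not Chinese.
// "\u{...}" is PHP 7+ string syntax that produces the UTF-8 bytes directly.
$circled = "\u{2460}\u{2461}\u{2462}"; // circled digits one, two, three
$chinese = '编程';                      // "programming"

$pattern = '/^[\x{2460}-\x{2468}]+$/u';

$circledMatches = preg_match($pattern, $circled);
$chineseMatches = preg_match($pattern, $chinese);

var_dump($circledMatches); // int(1)
var_dump($chineseMatches); // int(0)
```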
With this code, the system still reports that the string is not Chinese. But if \x introduces hexadecimal data, why does this range differ from the range \u4e00-\u9fa5 used in JavaScript? (In fact, U+2460 to U+2468 are the circled digits ① to ⑨, not Chinese characters.) So I changed the code to the following:
- $ Str = "php programming ";
- If (preg_match ("/^ [x4e00-x9fa5] + $/u", $ str )){
- Print ("all strings are Chinese ");
- } Else {
- Print ("Not all strings are Chinese ");
- }
- ?>
-
This time a different warning appears: Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3. I then wrapped "4e00" and "9fa5" in braces ("{" and "}"), ran the code again, and this time it worked correctly:
- $ Str = "php programming ";
- If (preg_match ("/^ [x {4e00}-x {9fa5}] + $/u", $ str )){
- Print ("all strings are Chinese ");
- } Else {
- Print ("Not all strings are Chinese ");
- }
- ?>
-
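One likely explanation for the "invalid UTF-8 string at offset 6" warning, sketched below under the assumption of PHP 7+: inside a double-quoted PHP string, \x4e and \x9f are PHP's own escape sequences, so PHP turns them into raw bytes before PCRE ever sees the pattern. The byte 0x9F on its own is not valid UTF-8, and it lands at offset 6 of the pattern. With braces, \x is not followed by a hex digit, so PHP leaves \x{4e00} untouched for PCRE. The variable names are illustrative.

```php
<?php
// Sketch: PHP rewrites \xhh escapes in double-quoted strings before PCRE runs.
$double = "/^[\x4e00-\x9fa5]+$/u";     // \x4e becomes "N", \x9f a raw 0x9F byte
$single = '/^[\x4e00-\x9fa5]+$/u';     // backslashes reach PCRE unchanged
$braced = "/^[\x{4e00}-\x{9fa5}]+$/u"; // "\x{" is not a PHP escape, left as-is

// The double-quoted pattern really contains the byte 0x9F at offset 6
// (offset 0 is the first character after the opening delimiter "/").
$body = substr($double, 1);
var_dump(ord($body[6]) === 0x9f); // bool(true)

// preg_match('//u', $s) returns false when $s is not valid UTF-8.
var_dump(preg_match('//u', $double)); // bool(false)

// The single-quoted version compiles, but PCRE reads \x4e00 as \x4e
// followed by "00", so the class is not the CJK range; braces are needed.
$text = '编程';
var_dump(preg_match($single, $text)); // int(0)
var_dump(preg_match($braced, $text)); // int(1)
```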
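So the correct regular expression for matching Chinese characters in UTF-8 encoded text in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. The final version of the implementation code below takes a looser approach: the byte range [\x7f-\xff], used without the u modifier, accepts any non-ASCII byte, so it works with both GB2312 and UTF-8 input but also accepts non-Chinese characters such as accented letters.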
- <?php
- $str = "编程"; // sample input (assumed)
- // if (preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "]+$/", $str)) { // works only for GB2312
- if (preg_match('/^[\x7f-\xff]+$/', $str)) { // compatible with GB2312 and UTF-8
-     echo "Correct input";
- } else {
-     echo "Incorrect input";
- }
- ?>
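A small sketch comparing the byte-range check with the code-point range, assuming PHP 7+ and UTF-8 input. Both helper names are hypothetical. The accented-letter sample shows where the two checks disagree.

```php
<?php
// Sketch: loose byte-range check versus strict code-point check.
// Both helper names are hypothetical, chosen for this illustration.

// True if the string consists only of bytes 0x7F-0xFF (no u modifier).
function isNonAsciiBytes(string $s): bool
{
    return preg_match('/^[\x7f-\xff]+$/', $s) === 1;
}

// True if the string is valid UTF-8 made only of code points U+4E00-U+9FA5.
function isCjkUtf8(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

$accented = "\u{e9}\u{e8}"; // "éè", four bytes in UTF-8

var_dump(isNonAsciiBytes('编程'));    // bool(true)
var_dump(isCjkUtf8('编程'));          // bool(true)
var_dump(isNonAsciiBytes($accented)); // bool(true): accepted by the loose check
var_dump(isCjkUtf8($accented));       // bool(false)
var_dump(isNonAsciiBytes('php'));     // bool(false)
```

The byte-range version is useful when the input encoding is unknown; when the input is known to be UTF-8, the code-point version is the stricter choice.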
Example 2:
- <?php
- $action = trim($_GET['action'] ?? ''); // ?? avoids an undefined-index notice (PHP 7+)
- if ($action === "sub") // comparison, not assignment
- {
-     $str = $_POST['dir'] ?? '';
-     // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str)) // GB2312: Chinese characters, letters, digits, underscores
-     if (!preg_match('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u', $str)) // UTF-8: Chinese characters, letters, digits, underscores
-     {
-         // htmlspecialchars() keeps user input from being injected into the page
-         echo "The [" . htmlspecialchars($str) . "] you entered contains illegal characters";
-     }
-     else
-     {
-         echo "The [" . htmlspecialchars($str) . "] you entered is completely legal. Pass!";
-     }
- }
- ?>
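The Example 2 check could be wrapped in a reusable function. The sketch below is an assumption, not the original code: the name validDirName and the 3 to 15 length limit are hypothetical (the limit echoes the {3,15} pattern at the start of this article). With the u modifier, the {min,max} quantifier counts characters rather than bytes, so one Chinese character counts once.

```php
<?php
// Sketch: Chinese characters, letters, digits and underscores, 3-15 characters.
// validDirName() and its default limits are hypothetical.
function validDirName(string $s, int $min = 3, int $max = 15): bool
{
    $pattern = sprintf('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]{%d,%d}$/u', $min, $max);
    return preg_match($pattern, $s) === 1;
}

var_dump(validDirName('编程_01'));      // bool(true): 5 characters
var_dump(validDirName('编程'));         // bool(false): only 2 characters
var_dump(validDirName('my dir'));      // bool(false): contains a space
var_dump(validDirName('编程编程编程')); // bool(true): 6 characters, 18 bytes
```

The last case would fail a byte-counting check (18 bytes is more than 15), which is why the u modifier matters for length limits.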
Appendix: double-byte character encoding ranges in PHP

1. GBK (GB2312/GB18030)
   \x00-\xff    GBK double-byte encoding range
   \x20-\x7f    ASCII
   \xa1-\xff    Chinese (GB2312)
   \x80-\xff    Chinese (GBK)

2. UTF-8 (Unicode code points, written in PHP's \x{...} syntax for use with the u modifier)
   \x{4e00}-\x{9fa5}    Chinese
   \x{3130}-\x{318f}    Korean
   \x{ac00}-\x{d7a3}    Korean
   \x{0800}-\x{4e00}    Japanese (approximate; this range also covers many non-Japanese scripts)

I hope this introduction helps you understand how to match Chinese characters with PHP regular expressions.
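The UTF-8 ranges in the appendix can be used directly in a pattern with the \x{...} syntax and the u modifier. A minimal sketch, assuming PHP 7+, taking the Chinese range (4e00-9fa5) and the two Korean ranges (3130-318f, ac00-d7a3) from the appendix. The Japanese range there is only approximate, so it is left out. The function name scriptOf is hypothetical.

```php
<?php
// Sketch: classify a string using the appendix's Chinese and Korean ranges.
// scriptOf() is a hypothetical helper, not part of PHP.
function scriptOf(string $s): string
{
    if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s)) {
        return 'chinese';
    }
    if (preg_match('/^[\x{3130}-\x{318f}\x{ac00}-\x{d7a3}]+$/u', $s)) {
        return 'korean';
    }
    return 'other'; // includes mixed text and invalid UTF-8
}

var_dump(scriptOf('中文'));   // string(7) "chinese"
var_dump(scriptOf('한국어')); // string(6) "korean"
var_dump(scriptOf('abc'));    // string(5) "other"
```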