[Repost] Regular expression for matching Chinese characters in UTF-8
Link: http://blog.csdn.net/wide288/article/details/30066639
$str = "编程"; // "programming" in Chinese
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8 Chinese characters, letters, digits, and underscores
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) // UTF-8 Chinese characters only
{
    echo "The input [" . $str . "] contains illegal characters";
}
else
{
    echo "The input [" . $str . "] is completely legal. Pass!";
}
-----------------------
UTF-8 match:
In JavaScript, it is easy to check whether a string consists entirely of Chinese characters. For example:
var str = "php编程";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is entirely Chinese");
} else {
    alert("The string is not entirely Chinese");
}
In PHP, \x is used to write hexadecimal values, so the check above is adapted into the following code:
$str = "php编程";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("The string is entirely Chinese");
} else {
    print("The string is not entirely Chinese");
}
No error is reported and the result looks correct. However, after changing $str to "编程", the result still says the string is not entirely Chinese, so this check is not accurate.
Important: after reading 精通正则表达式 (Mastering Regular Expressions), I found an explanation of why [\x4e00-\x9fa5] fails.
In a PHP regular expression, [\x4e00-\x9fa5] is a character class, and \x introduces a hexadecimal value. Without braces, \x takes at most two hex digits, so \x4e00 is read as \x4e followed by the literal characters 00. A value with more than two digits, such as a four-digit one, must be wrapped in braces: \x{4e00}.
In addition, any value greater than \x{FF} must be used together with the u modifier. Otherwise an error occurs.
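The parsing rules above can be checked with a small sketch. The variable names are hypothetical, and the file is assumed to be saved as UTF-8. It compares the brace-less pattern, the braced pattern with the u modifier, and the braced pattern without it.

```php
<?php
// Hypothetical demonstration; the variable names are not from the original post.

// Without braces, \x takes two hex digits: \x4e is 'N', then "00" are literals.
// The class therefore covers the single-byte range 0x30-0x9F, not Chinese characters.
$broken = '/^[\x4e00-\x9fa5]+$/';
$brokenAscii   = preg_match($broken, 'php');   // ASCII letters fall inside 0x30-0x9F
$brokenChinese = preg_match($broken, '编程');  // the UTF-8 lead byte 0xE7 does not

// With braces and the u modifier, the range covers code points U+4E00 to U+9FA5.
$fixed = '/^[\x{4e00}-\x{9fa5}]+$/u';
$fixedAscii   = preg_match($fixed, 'php');
$fixedChinese = preg_match($fixed, '编程');

// Braces without the u modifier: the pattern fails to compile, PHP raises
// a warning (suppressed here with @), and preg_match() returns false.
$withoutU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', '编程');

var_dump($brokenAscii, $brokenChinese, $fixedAscii, $fixedChinese, $withoutU);
```

The brace-less pattern accepts plain ASCII and rejects Chinese text, which is the opposite of what was intended. That explains why the earlier check seemed correct for "php编程" but failed for "编程".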
Online I could only find a regular expression that matches full-width characters: ^[\x80-\xff]*^/. The class [\u4e00-\u9fa5] could be used here to match Chinese characters, but PHP does not support it, and the range \x4e00-\x9fa5 is not read as the hexadecimal values it appears to contain. So I switched to the following code and found that it is accurate:
$str = "php编程";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("The string is entirely Chinese");
} else {
    print("The string is not entirely Chinese");
}
So the final correct expression for matching Chinese characters in UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. Based on the article above, I wrote test code for it (copy the code and save it as a .php file).
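As a minimal sketch, the final expression can be wrapped in a reusable function. The name is_all_chinese is hypothetical and not part of the original post. The source file and the input are assumed to be UTF-8.

```php
<?php
// Hypothetical helper; the function name is an illustration, not from the original post.
function is_all_chinese(string $str): bool
{
    // One or more code points in U+4E00..U+9FA5; u makes PCRE treat the input as UTF-8.
    // preg_match() returns false for invalid UTF-8, which "=== 1" also treats as a failure.
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str) === 1;
}

foreach (['编程', 'php编程', ''] as $sample) {
    $verdict = is_all_chinese($sample) ? 'all Chinese' : 'not all Chinese';
    echo '[' . $sample . '] ' . $verdict . PHP_EOL;
}
```

Comparing against 1 rather than relying on truthiness keeps the "no match" case (0) and the "error" case (false) on the same side of the check.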
GBK:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // GB2312 Chinese characters, letters, digits, and underscores
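The GBK pattern above works on raw bytes rather than code points, so it takes no u modifier. The sketch below uses a hypothetical helper name, is_gbk_word, and a hard-coded byte string that is assumed to be 编程 encoded in GBK.

```php
<?php
// Hypothetical helper; the name is an illustration, not from the original post.
function is_gbk_word(string $bytes): bool
{
    // Byte range 0xA1-0xFF plus ASCII letters, digits, and underscore. No u modifier.
    $pattern = '/^[' . chr(0xa1) . '-' . chr(0xff) . 'A-Za-z0-9_]+$/';
    return preg_match($pattern, $bytes) === 1;
}

$gbk = "\xB1\xE0\xB3\xCC"; // assumed to be 编程 encoded in GBK

var_dump(is_gbk_word($gbk));          // every byte is in 0xA1-0xFF
var_dump(is_gbk_word('php_' . $gbk)); // ASCII word characters are also allowed
var_dump(is_gbk_word('php ' . $gbk)); // the space is outside the class
```

This is only a byte-level check. It accepts any byte from 0xA1 to 0xFF and does not verify that the bytes form valid two-byte characters.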