[Repost] UTF-8 Chinese character regular expressions in PHP
Original link: http://blog.csdn.net/wide288/article/details/30066639
$str = "编程";
// UTF-8: Chinese characters, letters, digits, and underscore
// if (!preg_match('/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/u', $str))
// UTF-8: Chinese characters only
if (!preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str))
{
    echo "You entered [" . $str . "], which contains illegal characters";
}
else
{
    echo "You entered [" . $str . "], which is valid";
}
-----------------------
UTF-8 Matching:
In JavaScript, it is easy to check whether a string consists entirely of Chinese characters. For example:
var str = "php编程";
if (/^[\u4e00-\u9fa5]+$/.test(str)) { alert("The string is all Chinese"); } else { alert("The string is not all Chinese"); }
In PHP, hexadecimal values are written with \x, so it is tempting to translate the code like this:
$str = "php编程";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) { print("The string is all Chinese"); } else { print("The string is not all Chinese"); }
This runs without error and the result looks correct. But after changing $str to "编程", the result is still "The string is not all Chinese", so the check is not accurate.
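One way to see why this pattern misbehaves, as a minimal sketch: \x followed by bare digits consumes at most two hex digits, so "\x4e00" is the letter N followed by two zeros, and the regex never receives the code point 0x4E00. The variable names below are illustrative, and the file is assumed to be saved as UTF-8.

```php
<?php
// In a double-quoted string, PHP expands \x4e to the byte 0x4E ("N")
// and leaves the following "00" as two literal digits.
$fragment = "\x4e00";
echo bin2hex($fragment), "\n"; // 4e3030, i.e. "N00"

// So the class below is really: N, 0, the byte range 0x30-0x9F, a, 5.
$pattern = "/^[\x4e00-\x9fa5]+$/";

var_dump(preg_match($pattern, "N05a")); // int(1): plain ASCII passes
var_dump(preg_match($pattern, "编程"));  // int(0): the UTF-8 lead byte 0xE7 is outside the class
```

This is why a mixed string happens to give the "right" answer while a purely Chinese string is rejected.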
Important: after consulting the book Mastering Regular Expressions (精通正则表达式), I found that [\x4e00-\x9fa5] needs a more detailed explanation.
In PHP regular expressions, [\x4e00-\x9fa5] involves the concepts of a character and a character class. \x{hex} denotes a hexadecimal value. Note that hex can be 1-2 digits or 4 digits, but a 4-digit value must be wrapped in braces, as in \x{4e00}.
In addition, if the value is greater than \x{ff}, the pattern must carry the u modifier; otherwise the pattern is rejected as invalid.
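The effect of the u modifier can be checked directly. The sketch below assumes the file is saved as UTF-8 and silences the compile warning with @ so that only the return values are shown.

```php
<?php
$text = "编程"; // this file is assumed to be saved as UTF-8

// Without the u modifier PCRE works byte by byte, so a value above \x{ff}
// cannot be compiled and preg_match() returns false.
$withoutU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $text);
var_dump($withoutU); // bool(false)

// With the u modifier, pattern and subject are treated as UTF-8 code points.
$withU = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $text);
var_dump($withU); // int(1)
```

Single quotes are used so that PHP passes the \x{...} escapes to the regex engine unchanged.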
A pattern for matching full-width characters can be found online: /^[\x80-\xff]*$/. No braces are needed there, because each value is only two hex digits. [\u4e00-\u9fa5] matches Chinese characters in JavaScript, but PHP does not support the \u escape. Still, since \x denotes a hexadecimal value, why does \x4e00-\x9fa5 not behave like the range used in JavaScript? So I changed the code to the following and found that it really is accurate:
$str = "php编程";
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str)) { print("The string is all Chinese"); } else { print("The string is not all Chinese"); }
So the correct pattern for matching Chinese characters in UTF-8 encoded PHP is /^[\x{4e00}-\x{9fa5}]+$/u. Based on the article above, I wrote the following test code (copy it and save it as a .php file).
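A test file along these lines might look like the sketch below. It is not the author's original script; the helper names is_all_chinese and is_chinese_alnum are hypothetical, PHP 7 or later is assumed for the type declarations, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true when $s contains only characters in U+4E00..U+9FA5.
function is_all_chinese(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

// Hypothetical helper: also allows ASCII letters, digits, and underscore.
function is_chinese_alnum(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/u', $s) === 1;
}

foreach (["编程", "php编程", "php 编程"] as $sample) {
    printf(
        "%s: all Chinese = %s, Chinese/alphanumeric = %s\n",
        $sample,
        var_export(is_all_chinese($sample), true),
        var_export(is_chinese_alnum($sample), true)
    );
}
```

Comparing against 1 rather than testing truthiness also treats a false return (for example, a subject that is not valid UTF-8) as a failed match.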
GBK:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, and underscore
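The byte-range pattern can be tried without any conversion extension by writing the subject as raw bytes. In this sketch the four bytes are believed to be "编程" in GBK, which is an assumption; any bytes in 0xA1-0xFF would behave the same way.

```php
<?php
// Bytes B1 E0 B3 CC: believed to be "编程" encoded as GBK (assumption).
$gbk = "\xB1\xE0\xB3\xCC";

// No u modifier: the subject is GBK, not UTF-8, so matching is byte-wise.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/";

var_dump(preg_match($pattern, $gbk));          // int(1)
var_dump(preg_match($pattern, "php_" . $gbk)); // int(1)
var_dump(preg_match($pattern, "php " . $gbk)); // int(0): a space is not allowed
```

Because the check is byte-wise, it accepts any high byte in 0xA1-0xFF and does not verify that the bytes form well-formed double-byte pairs; GBK extension characters that use bytes below 0xA1 would be rejected.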