Original link: http://blog.csdn.net/wide288/article/details/30066639
$str = "编程";
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, and underscore
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) // UTF-8: Chinese characters only
{
    echo "<font color=red>You entered [" . $str . "], which contains illegal characters</font>";
}
else
{
    echo "<font color=green>You entered [" . $str . "], which is perfectly legal. Passed!</font>";
}
-----------------------
UTF-8 Matching:
In JavaScript, it's easy to tell if a string is Chinese. For example: var str = "PHP programming"; if (/^[\u4e00-\u9fa5]+$/.test (str)) {alert ("The string is all Chinese");} else{alert ("The string is not all Chinese");}
In PHP, hexadecimal data is represented by \x. Then, change to the following code: $STR = "PHP programming"; if (Preg_match ("/^[\x4e00-\x9fa5]+$/", $str)) {print ("The string is all Chinese"),} else {print ("The string is not all Chinese"); it seems to be not an error, the result is correct, but the $ STR changed to "Programming", but the result is "the string is not all Chinese", it seems that the judgment is not accurate.
Important: after consulting the book "Mastering Regular Expressions", here is a fuller explanation of [\x4e00-\x9fa5].
In PHP regular expressions, [\x4e00-\x9fa5] is a character class, and \x{hex} denotes a character by its hexadecimal value. Note that hex can be 1-2 digits or 4 digits, but a 4-digit value must be wrapped in braces. Without braces, \x4e00 is read as \x4e followed by the literal digits 00, so the class no longer describes the intended range.
In addition, if the value is greater than \x{ff}, the pattern must carry the u modifier; otherwise the pattern is invalid.
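The rules above can be checked with a small sketch. It assumes the file is saved as UTF-8 and uses single-quoted patterns so that PHP hands the backslash escapes to the regex engine unchanged; the variable names are only for illustration.

```php
<?php
$str = "编程"; // assumes this file is saved as UTF-8

// No braces: \x4e00 is read as \x4e ("N") followed by the literal digits "00",
// so the class is built from single bytes, not from the range U+4E00..U+9FA5.
$withoutBraces = preg_match('/^[\x4e00-\x9fa5]+$/', $str);

// Braces plus the u modifier: the range is interpreted as Unicode code points.
$withBraces = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str);

// Braces without the u modifier: the pattern does not compile.
// The @ hides the warning so that only the return value is inspected.
$noModifier = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $str);

var_dump($withoutBraces, $withBraces, $noModifier);
```

Here the first call should report no match, the second a match, and the third should return false because compilation fails.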
Only matches full-width characters can be found on the net: ^[\x80-\xff]*^/, here is no more brackets [\U4E00-\U9FA5] can match Chinese, but PHP does not support However, since \x represents the hexadecimal data, why and JS inside the scope of the provided \x4e00-\ X9fa5 not the same? So I replaced the code below and found that it was really accurate: $STR = "PHP programming"; if (Preg_match ("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {print ("The string is all Chinese"),} else {print ("The string is not all Chinese");}
So the correct regular expression for matching Chinese characters in UTF-8 encoded text in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. Based on the article above, I wrote the following test code (copy it and save it as a .php file):
<?php
$action = isset($_GET['action']) ? trim($_GET['action']) : '';
if ($action == "sub") {
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str)) // GB2312: Chinese characters, letters, digits, and underscore
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, and underscore
    {
        // Escape the rejected input before echoing it back into the page.
        echo "<font color=red>You entered [" . htmlspecialchars($str) . "], which contains illegal characters</font>";
    } else {
        echo "<font color=green>You entered [" . $str . "], which is perfectly legal. Passed!</font>";
    }
}
?>
<form method="POST" action="?action=sub">
Enter characters (digits, letters, Chinese characters, underscore):
<input type="text" name="dir" value="">
<input type="submit" value="Submit">
</form>
GBK:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, and underscore
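Because this pattern works on raw bytes rather than code points, the result depends on how the string is encoded. The sketch below spells the same text out as explicit bytes in both encodings; the byte values are my own assumption about how "编程" is encoded, not something taken from the original article.

```php
<?php
// Byte-oriented pattern from above: no u modifier, the class is built from raw bytes.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

$gb   = "\xB1\xE0\xB3\xCC";         // "编程" as GB2312 bytes (assumed values)
$utf8 = "\xE7\xBC\x96\xE7\xA8\x8B"; // "编程" as UTF-8 bytes

// Every GB2312 byte falls in 0xA1..0xFF, so the GB2312 string passes.
$gbResult = preg_match($pattern, $gb . "_php1");

// UTF-8 continuation bytes such as 0x96 fall below 0xA1, so the same text fails.
$utf8Result = preg_match($pattern, $utf8);

var_dump($gbResult, $utf8Result);
```

Note that the byte class accepts any run of high bytes, so it does not verify that the bytes form complete two-byte characters.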
[Repost] UTF-8 Chinese character regular expression