To determine whether the input contains illegal characters, see the code below:
$str = "编程";
// if (!preg_match("/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, underscore
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) // UTF-8: Chinese characters only
{
    echo "You entered [" . $str . "], which contains illegal characters";
} else {
    echo "You entered [" . $str . "], which is perfectly legal, pass!";
}
-----------------------
UTF-8 Matching:
In JavaScript, it is easy to test whether a string consists entirely of Chinese characters. For example:
var str = "PHP编程";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all Chinese");
} else {
    alert("The string is not all Chinese");
}
In PHP, hexadecimal values are written with \x, so a direct port of the JavaScript code looks like this:
$str = "PHP编程";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
At first glance there is no error and the result is correct. But when $str is changed to the two characters "编程", the output is still "The string is not all Chinese", so this test is clearly not accurate.
Important:
After consulting Mastering Regular Expressions (精通正则表达式), here is a more careful explanation of [\x4e00-\x9fa5].
In PHP regular expressions, [\x4e00-\x9fa5] is a character class built from character escapes. \x introduces a hexadecimal character code: it may be followed directly by one or two hex digits, but a longer value, such as a four-digit one, must be wrapped in braces as \x{hex}.
In addition, any value greater than \x{ff} must be used together with the u modifier; otherwise the pattern is rejected as invalid.
This answers the obvious question: if \x denotes hexadecimal data, why does the range \x4e00-\x9fa5 not behave like \u4e00-\u9fa5 in JavaScript? Without braces, \x4e00 is read as \x4e (the letter N) followed by the literal digits 00, not as the code point U+4E00. The pattern commonly found online for matching full-width characters uses the byte range [\x80-\xff], which needs no braces because each value has only two hex digits. [\u4e00-\u9fa5] matches Chinese in JavaScript, but PHP's regular expressions do not support the \u escape.
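The difference between the braced and unbraced forms can be checked directly. The following is a minimal sketch, assuming the file is saved as UTF-8; the variable names $unbraced, $braced and $samples are only illustrative. The patterns are single-quoted so that the regex engine, not PHP's string parser, interprets the \x escapes.

```php
<?php
// Assumption: this file is saved as UTF-8.

// Without braces: \x4e is "N", then a literal "0", the range 0-\x9f, "a" and "5".
$unbraced = '/^[\x4e00-\x9fa5]+$/';

// With braces and the u modifier: the range U+4E00 to U+9FA5.
$braced = '/^[\x{4e00}-\x{9fa5}]+$/u';

$samples = ['PHP', '编程', 'PHP编程'];
foreach ($samples as $s) {
    printf(
        "%s: unbraced=%d braced=%d\n",
        $s,
        preg_match($unbraced, $s),
        preg_match($braced, $s)
    );
}
```

With these inputs the unbraced pattern accepts only "PHP", because its class covers ASCII bytes rather than Chinese characters, while the braced pattern accepts only "编程".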
So I switched to the code below and found that it really is accurate:
$str = "PHP编程";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
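One detail worth noting when reusing this check: preg_match() returns 1 on a match, 0 on no match, and false on error, and with the u modifier a subject that is not valid UTF-8 counts as an error. A minimal sketch of a wrapper follows; the function name isAllChinese is hypothetical and not part of the original code.

```php
<?php
/**
 * Hypothetical helper: true only if $s is valid UTF-8 and every
 * character lies in the range U+4E00..U+9FA5.
 */
function isAllChinese(string $s): bool
{
    // Compare strictly with 1 so that both 0 (no match) and
    // false (error, e.g. invalid UTF-8) are treated as "not all Chinese".
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

var_dump(isAllChinese('编程'));      // every character is inside the range
var_dump(isAllChinese('PHP编程'));   // contains ASCII letters
var_dump(isAllChinese("\xB1\xE0"));  // raw bytes that are not valid UTF-8
```

Only the first call yields true. The third case is what happens when GB2312-encoded input reaches a check written for UTF-8.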
So the correct regular expression for matching Chinese characters in PHP under UTF-8 encoding is /^[\x{4e00}-\x{9fa5}]+$/u. Based on the above, I wrote the following test code (copy it and save it as a .php file):
<?php
// Assumption: a form POSTs a field named "dir" to this script with ?action=sub
$action = trim($_GET['action'] ?? '');
if ($action == "sub") {
    $str = $_POST['dir'] ?? '';
    // GB2312 page: Chinese characters, letters, digits, underscore
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/", $str))
    // UTF-8 page: Chinese characters, letters, digits, underscore
    if (!preg_match("/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/u", $str)) {
        echo "You entered [" . $str . "], which contains illegal characters";
    } else {
        echo "You entered [" . $str . "], which is perfectly legal, pass!";
    }
}
?>
GBK:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, underscore
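The byte-range version can be tried without any encoding conversion by writing the subject as raw bytes. This is only a sketch: $pattern and $gbBytes are illustrative names, and the four bytes are assumed to be the GB2312 encoding of "编程"; the outcome depends only on each byte falling in 0xA1-0xFF.

```php
<?php
// Bytes 0xA1-0xFF plus letters, digits and underscore.
// No u modifier: the subject is matched byte by byte, not as UTF-8.
$pattern = '/^[' . chr(0xa1) . '-' . chr(0xff) . 'a-zA-Z0-9_]+$/';

// Assumed GB2312 bytes for "编程"; every byte is in the 0xA1-0xFF range.
$gbBytes = "\xB1\xE0\xB3\xCC";

var_dump(preg_match($pattern, $gbBytes . '_01'));  // high bytes, underscore, digits
var_dump(preg_match($pattern, 'user name'));       // contains a space
```

The first call matches and the second does not, since a space is outside the class.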
That is everything on how to match Chinese characters with regular expressions in PHP under UTF-8. I hope you find it useful.