UTF-8 matching:
In JavaScript, it is simple to check whether a string consists entirely of Chinese characters. For example:
var str = "PHP programming";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("This string is all Chinese");
} else {
    alert("This string is not all Chinese");
}
In PHP, hexadecimal data is written with \x, so I changed the code to the following:
$str = "PHP programming";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("This string is all Chinese");
} else {
    print("This string is not all Chinese");
}
At first glance nothing is wrong and the result is correct. But when $str is replaced with just the two Chinese characters "编程" ("programming"), the output is still "This string is not all Chinese", so this check is clearly not accurate.
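One way to see why this check misfires is to look at what PHP does to the pattern before the regex engine receives it. The following is a minimal sketch, assuming the file is saved as UTF-8; the test strings are illustrative choices, not the author's original test data.

```php
<?php
// In a double-quoted PHP string, \x consumes at most two hex digits.
// So "\x4e00" is the byte 0x4E ("N") followed by the literal text "00",
// and "\x9fa5" is the byte 0x9F followed by the literal text "a5".
echo bin2hex("\x4e00-\x9fa5"), "\n"; // 4e30302d9f6135

// The character class therefore contains "N", "0", the byte range
// "0" (0x30) to 0x9F, "a" and "5". It is not the range U+4E00..U+9FA5.
$pattern = "/^[\x4e00-\x9fa5]+$/";

var_dump(preg_match($pattern, "编程")); // int(0): UTF-8 lead bytes are above 0x9F
var_dump(preg_match($pattern, "PHP"));  // int(1): plain ASCII letters fall inside 0x30..0x9F
```

In other words, the pattern rejects real Chinese text and accepts plain ASCII letters, which is the opposite of what was intended.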
The only regular expression I could find online was one for matching full-width characters: ^[\x80-\xff]*^/
[\u4e00-\u9fa5] matches Chinese characters, but PHP does not support that syntax.
The problem might also be related to encoding, so it is worth looking at pattern modifiers.
An article on pattern modifiers says:
u (PCRE_UTF8)
This modifier turns on an additional PCRE feature that is incompatible with Perl: pattern strings are treated as UTF-8. It is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on Win32.
Example:
preg_match('/[\x{2460}-\x{2468}]/u', $str); // match Chinese characters by internal code
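To make the effect of the u modifier concrete, here is a small sketch that matches a single Chinese character with "." both with and without the modifier. It assumes the file is saved as UTF-8; the variable name is only for illustration.

```php
<?php
// Assumes this file is saved as UTF-8.
$char = "中"; // one character, three bytes in UTF-8

var_dump(strlen($char));               // int(3)
var_dump(preg_match('/^.$/', $char));  // int(0): without u, "." matches a single byte
var_dump(preg_match('/^.$/u', $char)); // int(1): with u, "." matches a whole code point
```

Without u the engine works byte by byte, so code point ranges such as \x{4e00}-\x{9fa5} only make sense when the modifier is present.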
Following the approach given there, the code becomes:
$str = "PHP programming";
if (preg_match("/^[\x{2460}-\x{2468}]+$/u", $str)) {
    print("This string is all Chinese");
} else {
    print("This string is not all Chinese");
}
It turned out that this still does not detect Chinese correctly. (The range U+2460 to U+2468 covers the circled digits ① to ⑨, not Chinese characters.)
However, since \x denotes hexadecimal data, why is this range different from the \x4e00-\x9fa5 range used in JS? So I switched to the code below and found that it really is accurate:
$str = "PHP programming";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("This string is all Chinese");
} else {
    print("This string is not all Chinese");
}
So the accurate regular expression for matching Chinese characters in PHP under UTF-8 encoding is: /^[\x{4e00}-\x{9fa5}]+$/u
Based on the article above, I wrote the following test code (copy the code below and save it as a .php file):
Input characters (letters, digits, underscore):
GBK:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/", $str); // GB2312 Chinese characters, letters, digits, underscore
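As a quick check of the GBK expression, the sketch below feeds it a GB2312-encoded string. The bytes D6 D0 CE C4 are "中文" in GB2312/GBK, written as escapes so that the encoding of the .php file itself does not matter; the variable names are illustrative.

```php
<?php
// "中文" as GB2312/GBK bytes, written as escapes so the file encoding is irrelevant.
$gbk = "\xD6\xD0\xCE\xC4";

// No u modifier here: the pattern works on raw bytes in the range 0xA1..0xFF.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/";

var_dump(preg_match($pattern, $gbk . "_php5")); // int(1)
var_dump(preg_match($pattern, $gbk . " php"));  // int(0): a space is not allowed
```

Because this pattern checks individual bytes rather than characters, it only confirms that every byte lies in an allowed range. It does not verify that the bytes form valid two-byte GB2312 sequences.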