UTF-8 matching: In JavaScript, it is easy to test whether a string consists entirely of Chinese characters. For example:
Var str = "php programming ";
If (/^ [\ u4e00-\ u9fa5] + $/. test (str )){
Alert ("all strings are Chinese ");
} Else {
Alert ("Not all strings are Chinese ");
}
// In php, \ x is used to represent hexadecimal data. Therefore, it is transformed into the following code:
$ Str = "php programming ";
If (preg_match ("/^ [\ x4e00-\ x9fa5] + $/", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
No error is reported, and the result looks correct. However, if you replace $str with "编程" (the Chinese word for "programming", so the string is entirely Chinese), the result still says the string is not all Chinese, so this test is not accurate.
Important: I checked "Mastering Regular Expressions" and found the explanation. In PHP's regular expressions, [\x4e00-\x9fa5] is just a character class, and \x takes a hexadecimal number of one or two digits. A longer value, such as a four-digit one, must be enclosed in braces, as in \x{4e00}, and any value larger than \x{FF} must be used with the u modifier; otherwise an error occurs. Without braces, [\x4e00-\x9fa5] is therefore read as \x4e, a literal 0, the byte range 0 to \x9f, and the literals a and 5, which has nothing to do with Chinese characters.
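To make the parsing described above concrete, here is a minimal sketch. The variable name $broken and the test strings are illustrative assumptions, not part of the original post. The patterns are written in single quotes so that PCRE, rather than PHP's string parser, interprets the \x escapes; the class ends up the same either way.

```php
<?php
// The unbraced class from the text: PCRE reads it as \x4e ("N"), a literal "0",
// the byte range "0".."\x9f", and the literals "a" and "5".
$broken = '/^[\x4e00-\x9fa5]+$/';

// Plain ASCII text passes the supposed "Chinese" test.
var_dump(preg_match($broken, 'N0a5'));  // int(1)

// The UTF-8 bytes of "编程" start with 0xE7, which lies above 0x9F, so nothing matches.
var_dump(preg_match($broken, '编程'));  // int(0)

// The braced form without the u modifier fails to compile: preg_match()
// raises a warning (suppressed here with @) and returns false.
var_dump(@preg_match('/^[\x{4e00}-\x{9fa5}]+$/', '编程'));  // bool(false)
```

Comparing results with === matters here, because 0 (no match) and false (error) mean different things.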
The only patterns I could find online match full-width (multibyte) characters: /^[\x80-\xff]*$/. No braces are needed there, because each value has only two hex digits.
[\u4e00-\u9fa5] matches Chinese characters in JavaScript, but PHP's regular expressions do not support the \u escape.
However, since \x does represent hexadecimal values, why should the range \x4e00-\x9fa5 behave differently from the one used in JavaScript? So I switched to the following code and found that it gives accurate results:
$ Str = "php programming ";
If (preg_match ("/^ [\ x {4e00}-\ x {9fa5}] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
So the correct regular expression for matching Chinese characters in UTF-8 encoded text in PHP is /^[\x{4e00}-\x{9fa5}]+$/u.
Based on the above, I wrote the following test code (copy it and save it as a .php file):
<?php
// Default to an empty string when the parameter is absent, to avoid an undefined-index notice.
$action = trim($_GET['action'] ?? '');
if ($action == "sub")
{
    $str = $_POST['dir'] ?? '';
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str)) // GB2312: Chinese characters, letters, digits, underscores
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) // UTF-8: Chinese characters, letters, digits, underscores
    {
        // Escape the rejected input before echoing it back into the page.
        echo "<font color=red>The text [" . htmlspecialchars($str) . "] you entered contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>The text [" . $str . "] you entered is completely legal. Pass!</font>";
    }
}
?>
<form method="POST" action="?action=sub">
Input characters (digits, letters, Chinese characters, and underscores):
<input type="text" name="dir" value="">
<input type="submit" value="submit">
</form>
GBK: preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // GB2312: Chinese characters, letters, digits, underscores
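The GBK pattern works on raw bytes, so the result depends on the encoding of the input, not just on the characters it represents. The sketch below illustrates this under stated assumptions: the variable names are hypothetical, and the byte strings are written as hex escapes so the example does not depend on how the file is saved.

```php
<?php
// Same byte-range class as the GBK line above; with no u modifier, PCRE compares single bytes.
$gbk = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

// Assumed to be GBK-encoded Chinese text; any bytes in 0xA1..0xFF behave the same way.
$gbkBytes = "\xB1\xE0\xB3\xCC";
// The UTF-8 encoding of "编程": it contains 0x96 and 0x8B, which fall below 0xA1.
$utf8Bytes = "\xE7\xBC\x96\xE7\xA8\x8B";

var_dump(preg_match($gbk, $gbkBytes));   // int(1)
var_dump(preg_match($gbk, 'abc_123'));   // int(1)
var_dump(preg_match($gbk, $utf8Bytes));  // int(0)
```

Note that this class checks each byte on its own and does not verify that the bytes form valid double-byte GBK characters, so it is a loose filter rather than a strict validator.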