Analysis of problems when matching Chinese with PHP regular expressions

Source: Internet
Author: User
Tags php regular expression
$str = '中华人民共和国123456789abcdefg';
echo preg_match("/^[\u4e00-\u9fa5_a-zA-Z0-9]{3,15}$/", $str);



Run the code above and look at the message it produces:

Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:http://www.hzhuti.com/nokia/5800/ on line 2
It turns out that PHP regular expressions do not accept every Perl escape sequence. The message lists \L, \l, \N, \P, \p, \U, \u, and \X as unsupported by this PCRE build, and the \u at offset 3 is what triggers the failure here.

In UTF-8 mode, "\x{...}" is allowed, and the content of the curly braces is a string of hexadecimal digits.

The original hexadecimal escape sequence \xhh matches a two-byte UTF-8 character if its value is greater than 127.
So either of the following can solve this problem:

preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str);

preg_match('/[\x{2460}-\x{2468}]/u', $str);
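
As a rough illustration of what these two patterns actually match, the sketch below runs them against a few sample strings. The sample strings and variable names are assumptions chosen for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Byte-oriented pattern (no /u): PHP turns "\x80" and "\xff" into raw bytes,
// so the class matches any single byte with the high bit set.
$bytePattern = "/[\x80-\xff]/";

// Code-point pattern (/u): \x{2460}-\x{2468} covers U+2460..U+2468,
// which are the circled digits ① to ⑨.
$circledPattern = '/[\x{2460}-\x{2468}]/u';

$highByteInChinese = preg_match($bytePattern, '中文');     // multi-byte UTF-8 text
$highByteInAscii   = preg_match($bytePattern, 'abc123');   // pure ASCII
$circledInDigit    = preg_match($circledPattern, '③');
$circledInChinese  = preg_match($circledPattern, '中文');

var_dump($highByteInChinese, $highByteInAscii, $circledInDigit, $circledInChinese);
```

The byte pattern reacts to any non-ASCII text, while the /u pattern only reacts to the specific code points in its range.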


Matching Chinese characters in code
I tested the approach given above with the following code:


$str = "PHP programming";
if (preg_match("/^[\x{2460}-\x{2468}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}


The check for Chinese still did not work as expected: U+2460 to U+2468 are the circled digits ① to ⑨, not Chinese characters. But since \x denotes a hexadecimal value, why does this range differ from the \u4e00-\u9fa5 range used in JavaScript? So I changed the code to the following:

$str = "PHP programming";
if (preg_match("/^[\x4e00-\x9fa5]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}


What was supposed to succeed unexpectedly produced another warning:
Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3

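The pattern was evidently written the wrong way. Comparing it with the expression in the article, I wrapped "4e00" and "9fa5" in "{" and "}", ran it again, and found that it now works correctly. The reason is that inside a double-quoted PHP string "\x4e" is converted to the letter N and "\x9f" to a single raw byte, which is not valid UTF-8 on its own, whereas "\x{4e00}" is left untouched for PCRE to interpret: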
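
To see why the unbraced form fails, it can help to inspect the bytes PHP actually hands to PCRE. The following is only a small sketch; the variable names are hypothetical.

```php
<?php
// In a double-quoted string PHP consumes "\x" plus at most two hex digits,
// so "\x4e00" becomes "N00" and "\x9fa5" becomes the byte 0x9F followed by "a5".
$unbraced = "\x4e00-\x9fa5";
$unbracedHex = bin2hex($unbraced);

// "\x{" is not a PHP string escape, so the braced form reaches PCRE untouched.
$braced = "\x{4e00}-\x{9fa5}";

// An empty pattern with /u doubles as a UTF-8 validity check:
// preg_match() returns false when the subject is not valid UTF-8.
$unbracedIsUtf8 = preg_match('//u', $unbraced) !== false;
$bracedIsUtf8   = preg_match('//u', $braced) !== false;

echo $unbracedHex, "\n"; // 4e30302d9f6135
echo $braced, "\n";      // \x{4e00}-\x{9fa5}
var_dump($unbracedIsUtf8, $bracedIsUtf8);
```

The lone 0x9F byte is the character at offset 6 of the pattern "/^[N00-...", which matches the offset reported in the warning.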

$str = "PHP programming";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}


So the final correct regular expression for matching Chinese characters under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u.
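
One way to package this expression is sketched below. The helper names is_all_chinese and is_valid_name are hypothetical, and the second helper simply applies the same \x{...} range to the 3-to-15 character check from the beginning of the article; the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: true only if $s consists entirely of characters
// in U+4E00..U+9FA5. The /u modifier makes PCRE treat pattern and subject as UTF-8.
function is_all_chinese(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

// Hypothetical helper: the 3-to-15 character name check, written with
// \x{...} in place of the unsupported \u escapes.
function is_valid_name(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}_a-zA-Z0-9]{3,15}$/u', $s) === 1;
}

var_dump(is_all_chinese('中文'));             // bool(true)
var_dump(is_all_chinese('PHP programming'));  // bool(false)
var_dump(is_valid_name('中华人民共和国123')); // bool(true)
var_dump(is_valid_name('ab'));                // bool(false)
```

With /u the {3,15} quantifier counts characters rather than bytes, so a Chinese character counts once, the same as an ASCII letter.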

Final summary:

// Alternative that can only be used with GB2312:
// if (preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "]+$/", $str)) {
// Compatible with both GB2312 and UTF-8:
if (preg_match("/^[\x7f-\xff]+$/", $str)) {
    echo "Correct input";
} else {
    echo "Error input";
}


Double-byte character encoding ranges

1. GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)

2. UTF-8 (Unicode)

\u4e00-\u9fa5 (Chinese)
\x3130-\x318f (Korean)
\xac00-\xd7a3 (Korean)
\u0800-\u4e00 (Japanese)
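
In PHP the Unicode ranges in the list above have to be written in the \x{...} form with the /u modifier. The following sketch shows one possible way to use them; the $ranges table and the detect_scripts helper are hypothetical, and the Japanese range is left out because, as listed, it overlaps the Korean one.

```php
<?php
// Chinese and Korean ranges from the list above, rewritten in PCRE's \x{...} form.
$ranges = [
    'chinese' => '/[\x{4e00}-\x{9fa5}]/u',
    'korean'  => '/[\x{3130}-\x{318f}\x{ac00}-\x{d7a3}]/u',
];

// Hypothetical helper: returns the names of all ranges that occur in $s.
function detect_scripts(string $s, array $ranges): array
{
    $found = [];
    foreach ($ranges as $name => $pattern) {
        if (preg_match($pattern, $s) === 1) {
            $found[] = $name;
        }
    }
    return $found;
}

print_r(detect_scripts('한국어 中文', $ranges)); // chinese, korean
print_r(detect_scripts('plain ascii', $ranges)); // empty array
```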
