How to Match UTF-8 Chinese Characters with Regular Expressions in PHP

Source: Internet
Author: User
In JavaScript, it's easy to check whether a string consists entirely of Chinese characters. For example:

var str = "PHP programming";
// \u4e00-\u9fa5 is the range of common Chinese (CJK) ideographs
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all Chinese");
} else {
    alert("This string is not all Chinese");
}

Naturally, to check whether a string is Chinese in PHP, you would follow the same idea:

<?php
$str = "PHP programming";
// Same pattern as in JavaScript; PCRE rejects the \u escape
if (preg_match("/^[\u4e00-\u9fa5]+$/", $str)) {
    print("The string is all Chinese");
} else {
    print("This string is not all Chinese");
}
?>

However, you soon find that PHP does not support this syntax and reports an error:
Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 3 in test.php on line 3
At first I searched Google many times, hoping to find a breakthrough in how PHP regular expressions write hexadecimal values. I found that PHP uses \x to represent hexadecimal data, so I changed the code to the following:
$str = "PHP programming";
// \x instead of \u: no warning this time
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("The string is all Chinese");
} else {
    print("This string is not all Chinese");
}
There was no error, and the result looked correct. But when I replaced $str with the two Chinese characters "编程" ("programming"), the output was still "This string is not all Chinese". The check is clearly not accurate.
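A likely explanation, which the original does not spell out, is that PHP processes the pattern before PCRE does. Inside a double-quoted PHP string, PHP handles `\x` escapes itself, and `\x` consumes at most two hex digits. As a result, `\x4e00` becomes the letter "N" followed by "00", and `\x9fa5` becomes the single byte 0x9F followed by "a5". The character class PCRE receives therefore covers the bytes 0x30 to 0x9F, not the CJK range.

The sketch below inspects that effect. It assumes PHP 7 or later, because it uses the `\u{...}` string escape to build the test strings; the variable name is illustrative.

```php
<?php
// PHP's double-quoted "\x" escape takes one or two hex digits only,
// so the four-digit code points are split before PCRE ever sees them.
$pattern = "/^[\x4e00-\x9fa5]+$/";

// "\x4e00" is the letter "N" plus the literal text "00".
var_dump("\x4e00" === "N00");          // bool(true)

// "\x9fa5" is the single byte 0x9F plus the literal text "a5".
var_dump(strlen("\x9fa5"));            // int(3)
var_dump(ord("\x9fa5"[0]) === 0x9F);   // bool(true)

// PCRE receives the class [N0 0-<0x9F> a 5], i.e. the bytes 0x30..0x9F.
// UTF-8 Chinese characters start with bytes such as 0xE7, outside that range:
var_dump(preg_match($pattern, "\u{7f16}\u{7a0b}"));  // int(0) for "编程"

// Plain ASCII letters such as "P" (0x50) and "H" (0x48) fall inside it:
var_dump(preg_match($pattern, "PHP"));               // int(1), a false positive
```

This also explains why "PHP programming" was reported as "not all Chinese": its space (0x20) falls outside the byte range, not because the check recognized anything Chinese.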
Later I went back to Baidu and searched for "PHP matching Chinese characters UTF 8". The results were far more relevant than Google's, so the claim that Baidu "understands Chinese better" seems true to some extent. In the second article, "★★★ Seeking a way to match Chinese characters under UTF8, waiting online...", I found the following:
Original poster: Zhiin (┈jcan┈), 2006-11-15 15:59:30, in Web Development/PHP Questions
Looking for a way to match Chinese characters under UTF8, excluding full-width characters and special symbols!
All I can find online is a regex that matches full-width characters: ^[\x80-\xff]*^/
[\u4e00-\u9fa5] can match Chinese, but PHP does not support it.
Frustrated...
1/F pleasedotellmewhy (Allah bless you!), replied 2006-11-15 16:04:55, score 11
chr(0xa1) . '-' . chr(0xff) can match all Chinese, but I don't know what to do under UTF-8!
2/F Zhiin (┈jcan┈), replied 2006-11-15 16:11:34, score 0
Even under GB2312, chr(0xa1) . '-' . chr(0xff) is not right: it also matches full-width symbols. Bump.
3/F xuzuning (NAG), replied 2006-11-15 16:19:56, score 90
Pattern modifier: u
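The posters' complaint can be shown concretely. Used without the `u` modifier, a byte-range class such as `[\x80-\xff]` matches every byte of any non-ASCII UTF-8 character, so full-width punctuation passes just as Chinese does.

The sketch below compares that class with a code-point range under `u`. It assumes PHP 7 or later for the `\u{...}` string escape; U+FF0C is the full-width comma.

```php
<?php
$chinese   = "\u{7f16}\u{7a0b}";  // "编程", two CJK ideographs
$fullComma = "\u{ff0c}";          // "，", full-width comma (bytes EF BC 8C)

// Byte-oriented class: every byte of every non-ASCII character is >= 0x80.
$byteClass = '/^[\x80-\xff]+$/';
var_dump(preg_match($byteClass, $chinese));    // int(1)
var_dump(preg_match($byteClass, $fullComma));  // int(1), a false positive

// Code-point class with the u modifier: only U+4E00..U+9FA5 qualify.
$cjkClass = '/^[\x{4e00}-\x{9fa5}]+$/u';
var_dump(preg_match($cjkClass, $chinese));     // int(1)
var_dump(preg_match($cjkClass, $fullComma));   // int(0)
```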
After trying each of these clues, I concluded that, as they said, the problem was probably related to the encoding and that I needed to understand pattern modifiers. So I kept searching on Baidu.
In an article on pattern modifiers, I read:
u (PCRE_UTF8)
This modifier turns on additional PCRE functionality that is incompatible with Perl: the pattern string is treated as UTF-8. It is available on Unix from PHP 4.1.0 and on Win32 from PHP 4.2.3.
Example:
preg_match('/[\x{2460}-\x{2468}]/u', $str); // match internal-code Chinese characters
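Here is a small way to see what "treated as UTF-8" means in practice. Without `u`, PCRE works on bytes, so `.` matches each of the three bytes of a UTF-8 Chinese character. With `u`, `.` matches whole code points. This sketch assumes PHP 7 or later and a UTF-8 subject string.

```php
<?php
$s = "\u{7f16}\u{7a0b}";  // "编程": 2 characters, 6 bytes in UTF-8

// Byte mode: "." consumes one byte at a time.
var_dump(preg_match_all('/./', $s));   // int(6)

// UTF-8 mode: "." consumes one code point at a time.
var_dump(preg_match_all('/./u', $s));  // int(2)
```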
Following the approach he gave, the code was as follows:

$str = "PHP programming";
// Range copied from the manual example above
if (preg_match("/^[\x{2460}-\x{2468}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("This string is not all Chinese");
}
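This check fails because U+2460 through U+2468 are not Chinese characters at all. They are the circled digits ① through ⑨, so the comment in the quoted example is misleading: the example only illustrates the `\x{...}` syntax. A quick check, assuming PHP 7 or later for the `\u{...}` string escape:

```php
<?php
$circled = '/^[\x{2460}-\x{2468}]+$/u';

var_dump(preg_match($circled, "\u{2460}\u{2468}"));  // int(1) for "①⑨"
var_dump(preg_match($circled, "\u{7f16}\u{7a0b}"));  // int(0) for "编程"
```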

The check still did not recognize Chinese. But since \x denotes hexadecimal data, why not use the same range as in the JavaScript version, \x4e00-\x9fa5? So I switched to the code below:

$str = "PHP programming";
// JavaScript's range, written with \x but without braces
if (preg_match("/^[\x4e00-\x9fa5]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("This string is not all Chinese");
}

I was sure this would work, but unexpectedly another warning appeared:
Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3
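The offset in this warning can be traced back to PHP's own string processing. Because the pattern is in double quotes, PHP turns `\x9f` into a raw 0x9F byte before PCRE sees it. A lone 0x9F byte is a UTF-8 continuation byte and is invalid on its own. Counting from the character after the opening delimiter, that byte sits at offset 6.

The sketch below only inspects the string; the variable name is illustrative. Note that single quotes alone would not fix the pattern. PCRE reads `\x` without braces as at most two hex digits, so code points above 0xFF need the `\x{...}` form.

```php
<?php
$pattern = "/^[\x4e00-\x9fa5]+$/u";

// After PHP's escape processing the pattern is "/^[N00-" + byte 0x9F + "a5]+$/u".
// Index 7 here is PCRE's offset 6, because PCRE starts counting after the "/".
var_dump(substr($pattern, 0, 7));      // string(7) "/^[N00-"
var_dump(ord($pattern[7]) === 0x9F);   // bool(true)

// The string is therefore not valid UTF-8: an empty /u pattern
// run against it makes preg_match() return false.
var_dump(preg_match('//u', $pattern)); // bool(false)
```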
The syntax still seemed wrong. Comparing my code with the expression in the article, I wrapped "4e00" and "9fa5" in "{" and "}" and ran it again. This time the check was accurate:

$str = "PHP programming";
// Braces make PCRE read each value as a full code point
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("This string is not all Chinese");
}

So the correct expression for matching Chinese characters with a regular expression under UTF-8 in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. When I searched Baidu for this expression, I found that someone else had indeed reached the same conclusion. However, it was hard to find through ordinary searches, and I found only one article, "Deleting Chinese characters with a regex". It seems that the accuracy of information online still badly needs improvement.
PS: Not giving up on Google, I searched a while longer and found an article, "Commonly used PHP classes", which was also on Baidu Space. Interesting!
