How to match Chinese characters with regular expressions in UTF-8 encoding in PHP

Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u

// array('username', 'match', 'pattern' => '/^[A-Za-z0-9_]+$/u', 'message' => Yii::t('user', '{attribute} is invalid!')),
// array('username', 'match', 'pattern' => '/^(?!_)(?!.*?_$)[a-zA-Z0-9_\u4e00-\u9fa5]+$/u', 'message' => Yii::t('user', '{attribute} is invalid!')),
array('username', 'match', 'pattern' => '/^(?!_)(?!.*?_$)[a-zA-Z0-9_\x{4e00}-\x{9fa5}]+$/u', 'message' => Yii::t('user', '{attribute} is invalid!')),
 
To match Chinese characters in a PHP regular expression, use \x{4e00}-\x{9fa5} instead of \u4e00-\u9fa5.
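
The rule above can be tried outside Yii with plain preg_match(). The following is a minimal sketch, assuming PHP 7 or later and a source file saved as UTF-8; the constant USERNAME_PATTERN and the helper isValidUsername() are hypothetical names used only for illustration and are not part of Yii.

```php
<?php
// Pattern taken from the validation rule above. The /u modifier makes PCRE
// treat both the pattern and the subject string as UTF-8.
const USERNAME_PATTERN = '/^(?!_)(?!.*?_$)[a-zA-Z0-9_\x{4e00}-\x{9fa5}]+$/u';

// Hypothetical helper (not a Yii API): true when $name passes the rule.
function isValidUsername(string $name): bool
{
    return preg_match(USERNAME_PATTERN, $name) === 1;
}

var_dump(isValidUsername('张三_01'));  // bool(true)
var_dump(isValidUsername('_张三'));    // bool(false): leading underscore
var_dump(isValidUsername('张三_'));    // bool(false): trailing underscore
var_dump(isValidUsername('张 三'));    // bool(false): space is not allowed
```

The two lookaheads reject a leading or trailing underscore, and the character class accepts ASCII letters, digits, underscores, and code points U+4E00 to U+9FA5.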
 
========================================================== ========================================================== ========================
PHP regular expressions
http://space.itpub.net/16555225/viewspace-497780
http://www.haogongju.net/art/994560
Regular expression consisting of digits, the 26 English letters, underscores, or Chinese characters
http://hi.baidu.com/liu19871112/blog/item/eae9bd3245c88a5cad4b5fa8.html

How to match Chinese characters with regular expressions in UTF-8 encoding in PHP
http://hi.baidu.com/comdeng/blog/item/f272362e47ce29564ec226c5.html
In javascript, it is very easy to judge that the string is Chinese. For example:
Var str = "php programming ";
If (/^ [\ u4e00-\ u9fa5] + $/. test (str )){
Alert ("all strings are Chinese ");
} Else {
Alert ("Not all strings are Chinese ");
}
 
If you use php to determine whether the character string is Chinese, you will follow this idea:
$ Str = "php programming ";
If (preg_match ("/^ [\ u4e00-\ u9fa5] + $/", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
?>
 
However, you will quickly discover that PHP does not support this expression, and an error is reported:
Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 3 in test.php on line 3
 
At the beginning, I checked many times from google and tried to break through the expression of php regular expressions for hexadecimal data. I found that in php, it uses \ x to represent hexadecimal data. Therefore, it is transformed into the following code:
$ Str = "php programming ";
If (preg_match ("/^ [\ x4e00-\ x9fa5] + $/", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
This raises no error, and the result looks correct. However, if you replace $str with "编程", the result is still "The string is not entirely Chinese", so this check is not accurate. The reason is that \x without braces reads at most two hexadecimal digits, so \x4e00 means the character "N" followed by "00", not the code point U+4E00.
 
Then I went back to Baidu and searched for "php matching utf 8", and found that the results were more relevant than Google's; it seems Baidu's slogan "Baidu understands Chinese better" is true to some extent. In the second result, "★★★Looking for a regular expression matching Chinese characters in UTF8, waiting online...", I saw the following:
 
Author: zhiin (mongojcan program), asked at 15:59:30 in Web development/PHP

Looking for a regular expression that matches Chinese characters in UTF8, excluding full-width symbols and special characters!

Only a regular expression matching full-width characters can be found on the Internet: ^[\x80-\xff]*^/
[\u4e00-\u9fa5] can match Chinese characters, but PHP does not support it.

Depressed.......

1st floor: PleaseDoTellMeWhy (Allah bless you!), replied at 16:04:55, score 11

chr(0xa1) . '-' . chr(0xff) can match all Chinese characters, but I don't know whether it works in UTF-8!

2nd floor: zhiin (mongojcan regression), replied at 16:11:34, score 0

Even under gb2312, chr(0xa1) . '-' . chr(0xff) is incorrect.
It also matches full-width symbols.

3rd floor: xuzuning (nagging), replied at 16:19:56, score 90

Pattern modifier: u
 
I tried these clues one by one and found that, as they suggested, the problem might be related to encoding. I therefore needed to learn about pattern modifiers, so I went on searching Baidu.
 
In an article titled "Pattern Modifiers" I learned:
u (PCRE_UTF8)
This modifier enables additional features of PCRE that are not compatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on win32.
Example:
preg_match('/[\x{2460}-\x{2468}]/u', $str); // matches characters by their internal code
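
What the u modifier changes can be seen by counting matches of "." with and without it. This is a small sketch, assuming PHP 7 or later and a source file saved as UTF-8; the test string is chosen only for illustration.

```php
<?php
$str = '编程';  // two characters, six bytes in UTF-8

// Without /u, "." matches one byte at a time.
var_dump(preg_match_all('/./', $str));   // int(6)

// With /u, "." matches one whole UTF-8 character.
var_dump(preg_match_all('/./u', $str));  // int(2)

// Under /u, \x{...} denotes a Unicode code point, so a range of code
// points can be written directly inside a character class.
var_dump(preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str));  // int(1)
```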
 
The code is as follows:
$ Str = "php programming ";
If (preg_match ("/^ [\ x {2460}-\ x {2468}] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
 
I found that this time I was still judging whether it was a Chinese character. However, since \ x represents the hexadecimal data, why is the range \ x4e00-\ x9fa5 different from that provided in js? So I changed to the following code:
$ Str = "php programming ";
If (preg_match ("/^ [\ x4e00-\ x9fa5] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
I thought it was a success. I did not expect that warning once again produced:
Warning: preg_match () [function. preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test. php on line 3
There seems to be another wrong expression, so I compared the expression in the article and wrapped it in "{" and "}" for "4e00" and "9fa5" respectively, I ran it again and found it was really accurate:
$ Str = "php programming ";
If (preg_match ("/^ [\ x {4e00}-\ x {9fa5}] + $/u", $ str )){
Print ("all strings are Chinese ");
} Else {
Print ("Not all strings are Chinese ");
}
 
So the final correct expression for matching Chinese characters with a regular expression under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. I searched Baidu again using this expression and found that someone else had already reached the same conclusion, but it is hard to find through ordinary searching; I found only one article, "Delete Chinese characters with regular expressions". It seems that the screening of correct information on the Internet still needs to improve.
 
PS: To be fair to Google, I searched there as well and found one more article, "Common PHP classes", which is also hosted on Baidu Space.
