Matching Chinese characters with PHP regular expressions (2011-09-26 10:10:46)
Reprint: http://hi.baidu.com/?_d/blog/item/063b77d5432f8f1aa18bb7fd.html
In JavaScript, it is easy to test whether a string consists entirely of Chinese characters. For example:
var str = "PHP programming";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all Chinese");
} else {
    alert("The string is not all Chinese");
}
Naturally, one would expect to test for Chinese in PHP by following the same idea:
<?php
$str = "PHP programming";
if (preg_match("/^[\u4e00-\u9fa5]+$/", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
?>
However, it soon turns out that PHP does not support this notation, and an error is reported:
Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 3 in test.php on line 3
At first I searched Google many times, hoping to find a way forward in how PHP regular expressions
write hexadecimal data, and found that PHP uses \x to represent hexadecimal data. So I
changed the code to the following:
$str = "PHP programming";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
There seemed to be no error and the result was correct, but when $str was replaced with the two-character Chinese word for "programming" (编程), the result was
still "The string is not all Chinese", so this test is evidently not accurate.
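One likely explanation for this result: in a double-quoted PHP string, \x consumes at most two hex digits, so "\x4e00" is the byte 0x4E (the letter N) followed by two literal zeros, not the code point U+4E00. The sketch below is only an illustration (the variable names are made up, and it assumes the file is saved as UTF-8). It shows the resulting byte-level class accepting plain ASCII and rejecting real Chinese text.

```php
<?php
// PHP itself decodes "\x4e" as the single byte 0x4E ("N");
// the "00" that follows are two ordinary characters.
$fragment = "\x4e00";
echo bin2hex($fragment), "\n"; // 4e3030

// After PHP's escape processing the class is effectively [N00-\x9fa5]:
// "N", "0", the byte range 0x30-0x9F, "a" and "5".
$pattern = "/^[\x4e00-\x9fa5]+$/";

$asciiResult   = preg_match($pattern, "Hello"); // every byte lies in 0x30-0x9F
$chineseResult = preg_match($pattern, "编程");   // UTF-8 bytes start at 0xE7, outside the class

var_dump($asciiResult);   // int(1)
var_dump($chineseResult); // int(0)
```

In other words, the pattern never described a range of Chinese characters at all; it only happened to give the expected answer for the first test string.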
Later I went back to Baidu and searched for "php matching Chinese character utf 8", and found that the results were far more relevant than Google's.
It seems Baidu's slogan, "Baidu understands Chinese better", is true to some extent. In the second result, "Ask: a regular expression
that matches Chinese characters under UTF8, online ...", I found the following thread:
Original poster Zhiin (┈jcan┈), 2006-11-15 15:59:30, asked in Web Development / PHP:
Looking for a regular expression that matches Chinese characters under UTF-8, not including full-width characters and special symbols!
Online I can only find a regular expression that matches full-width characters: ^[\x80-\xff]*^/
[\u4e00-\u9fa5] can match Chinese, but PHP does not support it.
Frustrating .....
Reply 1, pleasedotellmewhy (Allah bless you!), replied at 2006-11-15 16:04:55, score 11
chr(0xa1) . '-' . chr(0xff) can match all Chinese characters, but I don't know how to do it under UTF-8! Bump
Reply 2, Zhiin (┈jcan┈), replied at 2006-11-15 16:11:34, score 0
Even under GB2312, chr(0xa1) . '-' . chr(0xff) is wrong too.
It also matches full-width symbols. Bump
Reply 3, xuzuning (nagging), replied at 2006-11-15 16:19:56, score 90
Pattern modifier: u
Following up on these clues one by one, it did seem, as they said, that the problem might also be related to encoding,
so I needed to learn something about pattern modifiers and went on searching Baidu.
In an article on pattern modifiers, I learned that:
u (PCRE_UTF8)
This modifier enables additional features of PCRE that are incompatible with Perl. The pattern string is treated as UTF-8.
This modifier is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on Win32.
Example:
preg_match('/[\x{2460}-\x{2468}]/u', $str); // described there as matching Chinese characters by their internal code
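To make the effect of the u modifier concrete, here is a minimal sketch. The sample string and variable names are illustrative only, and it assumes the file is saved as UTF-8. It counts what "." matches with and without the modifier, then matches a character by code point using the \x{...} notation from the example.

```php
<?php
// Assumes this file is saved as UTF-8, so "中文" occupies six bytes.
$text = "中文";

// Without /u, "." matches one byte at a time.
$byteCount = preg_match_all('/./s', $text, $unused);

// With /u, "." matches one whole UTF-8 character at a time.
$charCount = preg_match_all('/./su', $text, $unused);

echo $byteCount, "\n"; // 6
echo $charCount, "\n"; // 2

// With /u, \x{...} names a Unicode code point; U+4E2D is the character 中.
$matched = preg_match('/^\x{4e2d}/u', $text);
echo $matched, "\n"; // 1
```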
I tested it the way that article describes, with the following code:
$str = "PHP programming";
if (preg_match("/^[\x{2460}-\x{2468}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
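A note on the range used in this test: U+2460 through U+2468 are the circled digits ① to ⑨, not Chinese characters, so this class cannot be expected to accept Chinese text. The following sketch (sample strings are illustrative; the file is assumed to be UTF-8) checks both cases.

```php
<?php
// Single quotes: PCRE, not PHP, interprets the \x{...} escapes.
$range = '/^[\x{2460}-\x{2468}]+$/u';

$circled = preg_match($range, "①②⑨"); // U+2460, U+2461, U+2468
$chinese = preg_match($range, "中文");  // U+4E2D, U+6587

var_dump($circled); // int(1)
var_dump($chinese); // int(0)
```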
This still did not judge Chinese strings correctly. But since \x denotes hexadecimal data,
why was the range different from the 4e00-9fa5 range used in the JS version? So I changed the code to the following:
$str = "PHP programming";
if (preg_match("/^[\x4e00-\x9fa5]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
What should have succeeded unexpectedly produced a warning again:
Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3
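The offset in this warning has a plausible explanation. PHP decodes "\x9f" in the double-quoted pattern into a single raw byte, and that byte sits at offset 6 of the pattern PCRE receives. A lone 0x9F byte is not valid UTF-8, which the u modifier now requires. The sketch below (variable names are illustrative) locates the byte and shows PCRE rejecting it in a subject string.

```php
<?php
// What PCRE receives once PHP has decoded the \x escapes in the
// double-quoted pattern (delimiters and modifier stripped).
$decoded = "^[\x4e00-\x9fa5]+$";

echo bin2hex($decoded[6]), "\n"; // 9f: the byte at offset 6

// 0x9F on its own is a UTF-8 continuation byte with no lead byte.
// For an invalid UTF-8 subject, preg_match() with /u returns false
// and sets PREG_BAD_UTF8_ERROR.
$result = preg_match('/./u', "\x9f");
var_dump($result);                                   // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```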
Evidently the notation was wrong. So, following the notation used in that article,
I wrapped "4e00" and "9fa5" in "{" and "}", ran it again, and found that it really was accurate:
$str = "PHP programming";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
Having arrived at the final correct expression for matching Chinese characters under UTF-8 encoding in PHP, /^[\x{4e00}-\x{9fa5}]+$/u,
I searched Baidu with this expression and found that others had indeed reached the same correct conclusion. It is hard to find
by ordinary searching, though, and only one article turned up: "Using regular expressions to delete Chinese characters". It seems the internet's
filtering for correct information still urgently needs to improve.
PS: Not giving up on Google, I searched there as well and found an article, "PHP Common Class",
which was also hosted on Baidu Space. Heh, interesting!
----------------------------------------------------------------------------------------------------------------------------------
Based on the article above, I wrote the following test code (copy the code below and save it as a .php file):
<?php
$action = isset($_GET['action']) ? trim($_GET['action']) : '';
if ($action == "sub")
{
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';
    // GB2312 variant (Chinese characters, letters, digits, underscore); use it in place of the UTF-8 test on a GB2312 page:
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/", $str))
    // UTF-8 variant (Chinese characters, letters, digits, underscore):
    if (!preg_match("/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/u", $str))
    {
        // Escape the input before echoing it back into the page.
        echo "<font color=red>You entered [" . htmlspecialchars($str) . "], which contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>You entered [" . htmlspecialchars($str) . "], which is perfectly legal. Passed!</font>";
    }
}
?>
<form method="post" action="?action=sub">
Input characters (digits, letters, Chinese characters, underscores):
<input type="text" name="dir" value="">
<input type="submit" value="Submit">
</form>
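The validation rule in the test page can also be tried without a browser. The sketch below pulls it into a function; the name isValidName and the extra D modifier are additions for illustration, not part of the original code, and the file is assumed to be saved as UTF-8. D (PCRE_DOLLAR_ENDONLY) makes "$" match only at the very end of the input, so a value with a trailing newline is rejected.

```php
<?php
/**
 * Hypothetical validator (the name and the D modifier are additions).
 * Accepts Chinese characters in U+4E00..U+9FA5, ASCII letters,
 * digits and the underscore.
 */
function isValidName($input)
{
    // u: treat the pattern and the subject as UTF-8.
    // D: "$" matches only at the very end of the subject.
    return preg_match('/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/uD', $input) === 1;
}

var_dump(isValidName("user_01"));   // bool(true)
var_dump(isValidName("中文_name")); // bool(true)
var_dump(isValidName("bad name"));  // bool(false), contains a space
var_dump(isValidName("abc\n"));     // bool(false), trailing newline
var_dump(isValidName("<b>"));       // bool(false), markup characters
```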