Matching Chinese with PHP regular expressions (reprint)

Source: Internet
Author: User
Tags php programming php regular expression preg utf 8

Matching Chinese with PHP regular expressions (2011-09-26 10:10:46)

Reprint: http://hi.baidu.com/?_d/blog/item/063b77d5432f8f1aa18bb7fd.html

In JavaScript, it's easy to tell if a string is Chinese. Like what:

var str = "PHP programming";

if (/^[\u4e00-\u9fa5]+$/.test (str)) {

Alert ("The string is all Chinese");

} else {

Alert ("The string is not all Chinese");

}


Take it for granted, in PHP to determine whether the string is Chinese, will follow this idea:

<?php

$STR = "PHP programming";

if (Preg_match ("/^[\u4e00-\u9fa5]+$/", $str)) {

Print ("The string is all Chinese");

} else {

Print ("The string is not all Chinese");

}

?>


However, it quickly turns out that PHP does not support this expression, and reports an error:

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \U, or \u at offset 3 in test.php on line 3
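As an aside, the warning comes from PCRE, not from PHP's string parser. A minimal sketch, assuming PHP 7.0 or later (much newer than the PHP used in this post) and the u modifier that the post arrives at further down: there a double-quoted string accepts the braced form "\u{...}", and PHP itself expands it to UTF-8 bytes before PCRE ever sees the pattern.

```php
<?php
// Assumption: PHP 7.0+, where double-quoted strings understand "\u{...}".
// PHP expands the escapes to UTF-8 bytes, so PCRE never sees a \u sequence.
$pattern = "/^[\u{4e00}-\u{9fa5}]+$/u";

// Show the UTF-8 bytes PHP produced for U+4E00.
echo bin2hex("\u{4e00}"), "\n";

// "\u{7f16}\u{7a0b}" is the two-character string "编程".
var_dump(preg_match($pattern, "\u{7f16}\u{7a0b}"));
var_dump(preg_match($pattern, "PHP"));
```

The unbraced form "\u4e00" is left untouched by PHP and reaches PCRE as written, which is what triggers the warning above.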


Just started to look at Google a lot of times, want to from the PHP regular expression for the hexadecimal data

Breakthrough in expression, found in PHP, is to use \x to represent hexadecimal data. So

Transform it into the following code:

$STR = "PHP programming";

if (Preg_match ("/^[\x4e00-\x9fa5]+$/", $str)) {

Print ("The string is all Chinese");

} else {

Print ("The string is not all Chinese");

}

There was no error this time and the result looked correct. But after changing $str to just the two characters "编程", the result was still "The string is not all Chinese", so the check is clearly not accurate.
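The reason for the wrong result can be seen by dumping the pattern. The sketch below assumes the file is saved as UTF-8; it shows that \x consumes at most two hex digits, so the character class is not the range 4e00 to 9fa5 at all.

```php
<?php
// Assumption: this file is saved as UTF-8.
// In a double-quoted string PHP reads at most two hex digits after \x,
// so "\x4e00" is the byte 0x4E ("N") followed by the text "00",
// and "\x9fa5" is the byte 0x9F followed by the text "a5".
$pattern = "/^[\x4e00-\x9fa5]+$/";

// Dump the pattern as hex to see what PCRE actually receives.
echo bin2hex($pattern), "\n";

// The class holds "N", "0", the byte range 0x30-0x9F, "a" and "5".
var_dump(preg_match($pattern, "PHP"));   // ASCII letters fall inside 0x30-0x9F
var_dump(preg_match($pattern, "编程"));  // its UTF-8 bytes start at 0xE7, outside the range
```

So the pattern accepts plain ASCII text such as "PHP" and rejects real Chinese characters, the opposite of what was intended.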


Later I went back to Baidu and searched for "php matching Chinese character utf 8", and found the results far more relevant than Google's. It seems Baidu's slogan, "Baidu understands Chinese better", is true to some extent. The second result, "★ Asking for a UTF8 regex that matches Chinese characters, online ...", contained the following thread:


Original poster Zhiin (┈jcan┈), 2006-11-15 15:59:30, in Web Development / PHP Questions:

Looking for a regex that matches Chinese characters under UTF-8, not including full-width characters and special symbols!

Online I can only find a regex that matches full-width characters: ^[\x80-\xff]*^/

[\u4e00-\u9fa5] can match Chinese, but PHP does not support it.

So frustrating .....

Reply 1, pleasedotellmewhy (Allah bless you!), 2006-11-15 16:04:55, score 11:

chr(0xa1) . '-' . chr(0xff) can match all Chinese characters, but I don't know how to do it under UTF-8! Bump.

Reply 2, Zhiin (┈jcan┈), 2006-11-15 16:11:34, score 0:

Even under GB2312, chr(0xa1) . '-' . chr(0xff) is wrong too. It also matches full-width symbols. Bump.

Reply 3, xuzuning (nagging), 2006-11-15 16:19:56, score 90:

Pattern modifier: u


Following these clues one by one, it did seem, as they said, that the problem might also be related to encoding. So I needed to learn about pattern modifiers, and went on searching Baidu.

From an article on pattern modifiers, I learned the following:


u (PCRE_UTF8)

This modifier enables additional PCRE functionality that is incompatible with Perl. The pattern string is treated as UTF-8. The modifier is available on Unix from PHP 4.1.0 and on Win32 from PHP 4.2.3.

Example:

preg_match('/[\x{2460}-\x{2468}]/u', $str); // match Chinese characters by internal code
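To illustrate what the u modifier changes, here is a small sketch, assuming the file is saved as UTF-8. It counts how many times "." matches with and without the modifier, and checks which characters the quoted example range actually covers (U+2460 to U+2468 are the circled digits ① to ⑨).

```php
<?php
// Assumption: this file is saved as UTF-8, so "编程" is 6 bytes (3 per character).
$str = "编程";

// Without u, "." matches one byte at a time.
$bytes = preg_match_all('/./', $str);

// With u, "." matches one whole UTF-8 character at a time.
$chars = preg_match_all('/./u', $str);

echo "without u: $bytes, with u: $chars\n";

// The range from the quoted example matches circled digits, not Chinese characters.
var_dump(preg_match('/^[\x{2460}-\x{2468}]+$/u', "①⑨"));
var_dump(preg_match('/^[\x{2460}-\x{2468}]+$/u', $str));
```

The braced \x{...} form is what lets PCRE read a code point longer than two hex digits once UTF-8 mode is on.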

Test it in the way he provides it, with the following code:


$STR = "PHP programming";

if (Preg_match ("/^[\x{2460}-\x{2468}]+$/u", $str)) {

Print ("The string is all Chinese");

} else {

Print ("The string is not all Chinese");

}


Found that this is still a judgment on whether the Chinese is abnormal. However, since \x represents the hexadecimal data,

Why and JS inside provide scope \x4e00-\x9fa5 not the same? So I replaced the code below:

$STR = "PHP programming";

if (Preg_match ("/^[\x4e00-\x9fa5]+$/u", $str)) {

Print ("The string is all Chinese");

} else {

Print ("The string is not all Chinese");

}

I expected this to succeed, but unexpectedly another warning appeared:

Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3
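The offset in this warning can be traced by hand. The sketch below rebuilds the pattern body the same way PHP does from the double-quoted string and locates the raw 0x9F byte, which cannot appear on its own in valid UTF-8.

```php
<?php
// The pattern body between the "/" delimiters, exactly as PHP builds it
// from the double-quoted string: "\x9f" becomes the single raw byte 0x9F.
$body = "^[\x4e00-\x9fa5]+$";

// A lone 0x9F byte is a UTF-8 continuation byte with no lead byte,
// so the pattern itself is not valid UTF-8 once the u modifier is on.
$offset = strpos($body, "\x9f");
echo "raw 0x9F byte at offset ", $offset, "\n";

// With u, compilation fails and preg_match() returns false (warning silenced here).
$result = @preg_match("/" . $body . "/u", "编程");
var_dump($result);
```

The byte sits at offset 6 of the pattern body, the same offset the warning reports.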

It seems that there is a wrong way of expression, so against the expression of the article,

To "4e00" and "9fa5" on both sides with "{" and "}" wrapped up, ran again, found that really accurate:

$STR = "PHP programming";

if (Preg_match ("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {

Print ("The string is all Chinese");

} else {

Print ("The string is not all Chinese");

}


Now that I knew the final correct regular expression for matching Chinese characters under UTF-8 encoding in PHP, /^[\x{4e00}-\x{9fa5}]+$/u, I searched Baidu for this expression and found that others had indeed reached the same correct conclusion. But it is hard to find by ordinary searching, and the search turned up only one article, "Deleting Chinese characters with regular expressions". It seems the internet's ability to surface correct information still badly needs improvement.


PS: Not giving up on Google, I searched there as well and found one article, "PHP Common Class", which also turned out to be hosted on Baidu Space. Heh, interesting!

--------------------------------------------------------------------------------

Based on the article above, I wrote the following test code (copy it and save it as a .php file):


<?php
// Read the action from the query string (empty when absent).
$action = isset($_GET['action']) ? trim($_GET['action']) : '';

if ($action == "sub")
{
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';

    // GB2312: Chinese characters, letters, digits, underscore (use this line instead on a GB2312 page)
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "a-zA-Z0-9_]+$/", $str))
    // UTF-8: Chinese characters, letters, digits, underscore
    if (!preg_match("/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+$/u", $str))
    {
        // Escape the rejected input before echoing it back into the page.
        echo "<font color=red>You entered [" . htmlspecialchars($str) . "], which contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>You entered [" . $str . "], which is perfectly legal. Passed!</font>";
    }
}
?>
<form method="POST" action="?action=sub">
Input characters (digits, letters, Chinese characters, underscores):
<input type="text" name="dir" value="">
<input type="submit" value="Submit">
</form>
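The validation rule from the form can also be tried without a browser. This is a rough sketch, assuming UTF-8 input; is_valid_name() is a hypothetical helper, and it uses \z in place of $ so that a trailing newline is rejected as well.

```php
<?php
// Hypothetical helper (not from the original post): true when $s consists
// only of Chinese characters, ASCII letters, digits and underscores.
// Assumption: $s is UTF-8 encoded.
function is_valid_name(string $s): bool
{
    // \z anchors at the very end, so a trailing newline is rejected too.
    return preg_match('/^[\x{4e00}-\x{9fa5}a-zA-Z0-9_]+\z/u', $s) === 1;
}

// Try a few sample inputs: mixed valid text, a space, a full-width "！", empty.
foreach (["编程_php5", "hello world", "编程！", ""] as $input) {
    echo var_export($input, true), " => ",
         is_valid_name($input) ? "ok" : "rejected", "\n";
}
```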
