PHP regular expressions that perfectly match Chinese

Source: Internet
Author: User
1. General matching with metacharacters. The pattern /.*?/s can match a run of Chinese text, and it works in program code saved as either ANSI (GB2312) or UTF-8. Be aware, however, that \w does not match Chinese, at least when the pattern has no u modifier. A book titled "Proficient in Regular Expressions" (Posts & Telecom Press, Jin Sha) says that \w can be used to match Chinese; this is a correction: in PHP it cannot. Patterns such as "/./", "/[^\d]/" and "/[^a]/" do match Chinese characters.
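
A minimal sketch of how "." and a negated class behave on a Chinese character, assuming PHP 7 or later. The sample character is written as explicit UTF-8 bytes so the result does not depend on how the source file is saved, and the variable names are illustrative only.

```php
<?php
// The character U+4E2D spelled out as its three UTF-8 bytes.
$han = "\xE4\xB8\xAD";

// Without the u modifier, "." consumes a single byte of the character.
preg_match('/./', $han, $byteMatch);

// With the u modifier, "." consumes the whole character.
preg_match('/./u', $han, $charMatch);

// A negated class such as [^\d] accepts the character as well.
preg_match('/[^\d]/u', $han, $negatedMatch);

echo strlen($byteMatch[0]), "\n";                             // 1
echo strlen($charMatch[0]), "\n";                             // 3
echo $negatedMatch[0] === $han ? "same" : "different", "\n";  // same
```

In byte mode these patterns succeed because they accept almost any byte, not because they recognize Chinese as such.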

2. To match Chinese precisely, that is, to match only Chinese characters, or Chinese characters plus full-width punctuation, the method depends on the encoding environment. The two most commonly used encodings, GB2312 and UTF-8, are described below.

In an ANSI (GB2312) environment you can match with a byte range of the form [chr(0xNN)-chr(0xMM)]. One article on the web offers "/[" . chr(0xb0) . "-" . chr(0xf7) . "]+/". It works, but it is too general: it tests single bytes only, so it also picks up bytes that belong to other entries of the GB2312 code table, including punctuation, Japanese hiragana and assorted obscure symbols. The code table shows that Chinese characters occupy the range 0xb0a1-0xf7fe, and that GB2312 encodes each character in two bytes, each with its highest bit set to 1. A regular expression that matches exactly one Chinese character can therefore be written as:

"/([" . chr(0xb0) . "-" . chr(0xf7) . "][" . chr(0xa1) . "-" . chr(0xfe) . "])/"

This expression matches one Chinese character, and it is easy to extend to longer runs by adding a quantifier.

By the same reasoning, to match full-width punctuation without matching Chinese characters, you can write:

"/([" . chr(0xa1) . "-" . chr(0xa3) . "][" . chr(0xa1) . "-" . chr(0xff) . "])/"

This matches symbols in the encoding range 0xa1a1-0xa3ff. Other ranges can be handled the same way.

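The two GB2312 patterns above can be tried out as follows. This is only a sketch: the constant and function names are hypothetical, the subject is assumed to be GB2312-encoded, and its bytes are written as hex escapes so the script behaves the same however the file is saved.

```php
<?php
// Hypothetical names. Both patterns work on raw bytes (no u modifier).
const GB2312_HANZI = '/(?:[\xb0-\xf7][\xa1-\xfe])+/'; // lead byte, then trail byte
const GB2312_PUNCT = '/(?:[\xa1-\xa3][\xa1-\xff])+/'; // range 0xa1a1-0xa3ff

// Returns the first match of $pattern in $subject, or null if there is none.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// "yyg" + two Chinese characters (D6 D0, CE C4) + a full-width comma (A3 AC) + "yyg".
$subject = "yyg\xD6\xD0\xCE\xC4\xA3\xACyyg";

$hanzi = firstMatch(GB2312_HANZI, $subject);
$punct = firstMatch(GB2312_PUNCT, $subject);

echo bin2hex($hanzi), "\n"; // d6d0cec4
echo bin2hex($punct), "\n"; // a3ac
```

Because these patterns look at bytes rather than characters, a match can in principle start on the trail byte of one character and continue into the next, so results on mixed text are worth checking.
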
3. Matching Chinese in a UTF-8 environment. As above, the Unicode code table tells us which range to match. According to the table, Chinese characters occupy the range 0x4e00-0x9fa5, so the regular expression can be written like this:

"/[\x{4e00}-\x{9fa5}]/u". Here \x{nnnn} denotes a character by its hexadecimal code; see the PHP manual for details. Pay special attention to the pattern modifier u, which the PHP manual describes as follows: u (PCRE_UTF8): this modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 on Unix and from PHP 4.2.3 on Win32. UTF-8 validity of the pattern is checked since PHP 4.3.5. This is exactly what a correct match requires. As a rule of thumb, based only on experience, whenever you match strings with metacharacters in a UTF-8 environment it is best to add the u modifier.

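A short sketch of the UTF-8 pattern in use, assuming PHP 7 or later (for the "\u{...}" string escapes). The helper name extractChineseRuns is hypothetical. The last lines illustrate the validity check mentioned above: with the u modifier, a subject that is not valid UTF-8 makes the call fail rather than match.

```php
<?php
// Hypothetical helper: returns every run of characters in U+4E00..U+9FA5.
function extractChineseRuns(string $utf8): array
{
    // preg_match_all() returns false on failure, e.g. for invalid UTF-8 input.
    if (preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $utf8, $m) === false) {
        return [];
    }
    return $m[0];
}

// "\u{...}" escapes produce UTF-8 regardless of the source file encoding.
$runs = extractChineseRuns("yyg\u{4E2D}\u{6587}yyg\u{5B57}");
echo count($runs), "\n"; // 2

// GB2312 bytes are not valid UTF-8, so matching with the u modifier fails.
$result = preg_match('/./u', "\xB0\xA1");
$error  = preg_last_error();

var_dump($result);                        // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking preg_last_error() after a false result is a cheap way to notice that GB2312 data has been passed to a UTF-8 pattern.
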
Here are two examples:

(1) in the ANSI programming environment:

$strtest = "Yyg Chinese characters yyg";

$pregstr = "/([". Chr (0xb0). " -". Chr (0xf7)." [". chr (0XA1)." -". Chr (0xFE).") +/i ";

if (Preg_match ($pregstr, $strtest, $matchArray)) {

echo $matchArray [0];

}

Output: Chinese characters

(2) under the UTF-8 programming environment:

$strtest = "Yyg Chinese characters yyg";

$pregstr = "/[\x{4e00}-\x{9fa5}]+/u";

if (Preg_match ($pregstr, $strtest, $matchArray)) {

echo $matchArray [0];

}

Output: Chinese characters
