Using regular expressions in PHP to extract Chinese text: implementation notes

Source: Internet
Author: User
Tags: php, regular expression


Recently, my boss assigned a small data-checking exercise: extract the Chinese text from a file containing Chinese text segments and store it, using PHP. This raises the question of how to match Chinese characters with PHP regular expressions. There is plenty of material online, but it is disorganized and inconsistent, so after revising and testing my own code, I am writing down the extraction function here.

The first thing to note is the double-byte character encoding issue. Korean, Japanese, and other multi-byte text raise the same encoding issues, and they are handled in the same way as Chinese.

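1. GBK (gb2312/gb18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese, GB2312
\x80-\xff Chinese, GBK

These are byte ranges for use in character classes. Strictly speaking, GB2312 bytes end at \xfe, and GBK lead bytes run from \x81 to \xfe.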
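
As an illustration of the GBK byte ranges listed above, the following minimal sketch matches GB2312-encoded double-byte characters byte by byte. The sample bytes (\xD6\xD0 and \xCE\xC4, the GB2312 encoding of 中文) and the variable names are assumptions chosen for this example. The pattern simply pairs two bytes from the \xa1-\xff range.

```php
<?php
// Hypothetical sample: ASCII text surrounding two GB2312-encoded characters.
// "\xD6\xD0" is 中 and "\xCE\xC4" is 文 in GB2312.
$gbkText = "abc\xD6\xD0\xCE\xC4def";

// No u modifier: pattern and subject are treated as raw bytes.
// Each Chinese character is two consecutive bytes in \xa1-\xff.
$pattern = '/(?:[\xa1-\xff][\xa1-\xff])+/';

$matches = array();
$count = preg_match_all($pattern, $gbkText, $matches);

// Print the matched bytes as hex, since the terminal may not use GBK.
foreach ($matches[0] as $run) {
    echo bin2hex($run), "\n"; // d6d0cec4
}
```

Printing hex keeps the example independent of the terminal encoding. In a real script the matched bytes would be stored or converted instead.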

2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\x3130-\x318f (Korean)
\xAC00-\xD7A3 (Korean)
\u0800-\u4e00 (Japanese)
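
The ranges in this second list are Unicode code points, not bytes. As a small sketch of how they relate to UTF-8 data, the following prints the UTF-8 bytes at both ends of the Chinese range. It assumes PHP 7.0 or later for the "\u{...}" string escape, and the variable names are illustrative.

```php
<?php
// "\u{...}" in a double-quoted string (PHP 7.0+) yields the UTF-8 bytes
// of the given code point.
$first = "\u{4e00}"; // start of the Chinese range
$last  = "\u{9fa5}"; // end of the Chinese range

echo bin2hex($first), "\n"; // e4b880
echo bin2hex($last), "\n";  // e9bea5

// Each character occupies three bytes, and every byte is 0x80 or higher,
// so none of them collides with ASCII (\x20-\x7f).
echo strlen($first), "\n"; // 3
```

Because every byte of such a character is non-ASCII, a byte-level character class covering the high bytes can pick these characters out without decoding them.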

In Notepad++, we can first test whether the regular expression is written correctly. The first expression I used was [\u4e00-\u9fa5]+, where + means one or more of the matched character. The result was as expected, so can the same expression be used in the script?

We tested it by calling preg_match_all('/[\u4e00-\u9fa5]+/', $subject, $matches), and got this result: Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 2 .... Quite a headache. What is the reason for this?

After reading a lot of material, it turns out that the PCRE named in the error is the Perl-Compatible Regular Expressions library that PHP's preg functions are built on, and that it has a u (PCRE_UTF8) modifier. This modifier enables additional PCRE features that are incompatible with Perl: the pattern string is treated as UTF-8. It is available on Unix from PHP 4.1.0 and on Win32 from PHP 4.2.3. PHP regular expressions also express hexadecimal values differently: \u is not supported, and hexadecimal data is written with \x. With that in mind, the code is revised and the extraction function becomes:

    class StoreDataAdapter extends Store {
        private $dsData;

        /**
         * Data conversion function: calls preg_match_all() to match $subject against $pattern
         * and stores the results as an array in $matches.
         * $matches[0] contains the text that matched the entire pattern, $matches[1] contains
         * the text that matched the first captured parenthesized sub-pattern, and so on.
         * @see Store::data_convert()
         */
        public function data_convert($pattern, $subject) {
            $matches = array();
            if (preg_match_all($pattern, $subject, $matches)) {
                return $matches[0];
            } else {
                return null;
            }
        }
    }

The data_convert method is then called like this:

    $store = new StoreDataAdapter($txtContent);
    $match = array();
    $dsName = $store->data_convert('/[\x7f-\xff]+/', $txtContent);
    // data_convert returns null when nothing matches, so check before looping.
    if ($dsName !== null) {
        foreach ($dsName as $val) {
            echo $val . "\n";
        }
    }
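
The call above depends on the Store base class, which the article does not show. As a minimal standalone sketch of the same technique, the function below wraps preg_match_all with the same byte-range pattern and returns an empty array instead of null, so the result can always be iterated. The name extract_non_ascii is hypothetical and the sample string is invented for illustration. It assumes PHP 7.0 or later for the "\u{...}" escapes.

```php
<?php
/**
 * Returns every run of bytes in \x7f-\xff found in $subject.
 * Hypothetical helper for illustration; not part of the original class.
 */
function extract_non_ascii($subject)
{
    $matches = array();
    // preg_match_all returns the number of matches (0 if none) or false on error.
    if (preg_match_all('/[\x7f-\xff]+/', $subject, $matches)) {
        return $matches[0];
    }
    return array();
}

// Sample UTF-8 input mixing ASCII and Chinese ("\u{...}" needs PHP 7.0+).
$txtContent = "name: \u{5f20}\u{4e09}, city: \u{5317}\u{4eac}";

$runs = extract_non_ascii($txtContent);
foreach ($runs as $val) {
    echo $val . "\n";
}
```

Note that \x7f-\xff matches any non-ASCII byte, so Japanese, Korean, or accented Latin text in the input would be extracted as well.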

The input file is:

[Figure: input file]

The following is the output file content after extracting the Chinese:

[Figure: output file]

This meets the expected requirements.
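
For UTF-8 input, the u modifier described earlier offers an alternative to the byte range. The sketch below assumes the input is valid UTF-8 and uses PCRE's \x{...} syntax for code points, since \u is rejected. The sample string and variable names are invented for illustration, and the "\u{...}" string escapes assume PHP 7.0 or later.

```php
<?php
// Sample UTF-8 input: ASCII, Chinese (U+4E2D U+6587), and Korean (U+D55C U+AE00).
$subject = "PHP \u{4e2d}\u{6587} and \u{d55c}\u{ae00}";

// \u inside the pattern fails to compile, so preg_match_all returns false.
// The @ operator only hides the warning for this demonstration.
$failed = @preg_match_all('/[\u4e00-\u9fa5]+/', $subject, $m1);
var_dump($failed); // bool(false)

// With the u modifier, \x{4e00}-\x{9fa5} is a range of code points.
$count = preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $subject, $m2);
echo $count, "\n";             // 1
echo bin2hex($m2[0][0]), "\n"; // e4b8ade69687
```

Unlike the byte range, this pattern extracts only characters in the Chinese range and leaves the Korean text alone.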
