Chinese character replacement and pattern matching in PHP

Last Update:2018-04-06 Source: Internet

Author: User



Original post address: http://www.anbbs.com/anbbs/index.php?F_id=3&page=1
For the past two days I have been working on a keyword-highlighting program. It tested fine locally, but as soon as it was uploaded to the server the page filled with garbled text, never mind any highlighting. It was unbearable to look at!

So I went hunting for the bug. It turned out that English text caused no problems; only Chinese characters did, and only some of the time.

Summary:

The functions affected include preg_match_all($pat, ...) and preg_replace($pat, ...), among others.

The problems are as follows:
preg_match_all("/(汉字)+/ism", $subject, $m_a);
Here $subject is a Chinese sentence containing the word 汉字 ("Chinese characters"), roughly "I am a Chinese character, see what you make of me!", and the pattern is meant to match that word. It does match, but don't celebrate too early: the result is not reliable, for reasons explained further down.

The following is bound to go wrong:
preg_match_all("/[汉字]+/ism", $subject, $m_a);
The intent was to match 汉, 字, or 汉字 ($subject is again the Chinese sentence containing 汉字). Instead it matches a pile of garbled fragments, and it may even end in an endless loop. Why? PHP strings are plain byte sequences with no built-in notion of Unicode or multi-byte characters, so the two-character word 汉字 is treated as four separate single-byte characters inside the character class. It would be strange if nothing went wrong!
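The byte-level behaviour of a character class can be reproduced in a few lines. This is only a sketch under stated assumptions: the file is saved as UTF-8 (the original page evidently used a two-byte encoding), and both the comparison character 汗 and the /u modifier are additions for illustration that do not appear in the original post.

```php
<?php
// Assumption: this file is UTF-8, so 汉 is E6 B1 89 and 字 is E5 AD 97.
$byteClass = '/[汉字]+/';   // no /u: the class holds six separate bytes
$utf8Class = '/[汉字]+/u';  // /u: the class holds two whole characters

// 汗 (E6 B1 97) is a different character, but each of its three bytes
// also occurs somewhere in 汉 or 字.
$other = '汗';

$byteHit = preg_match($byteClass, $other, $byteMatch);
$utf8Hit = preg_match($utf8Class, $other);

echo "byte mode matched:  ", $byteHit, " (", bin2hex($byteMatch[0] ?? ''), ")\n";
echo "utf-8 mode matched: ", $utf8Hit, "\n";
```

In byte mode the class accepts a character that was never listed; with /u the pattern is compiled as UTF-8 characters and the false match disappears. The /u modifier only helps when the text really is UTF-8.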

After reworking the pattern, I found a form that appeared to work (why only "appeared" is explained later):
preg_match_all("/(汉|字)+/ism", $subject, $m_a);

This matches 汉, 字, or 汉字 in the Chinese sentence held in $subject, and the result in $m_a is:

Array
(
    [0] => Array
        (
            [0] => 汉字
        )

    [1] => Array
        (
            [0] => 字
        )

)
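The alternation pattern can be tried with a short script. This is a minimal sketch: the subject string here is a shortened, hypothetical stand-in for the sentence used in the original post, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical subject ("I am a Chinese character"); the original post
// used a longer Chinese sentence.
$subject = '我是汉字';

// Alternation compares whole byte sequences, so each branch can only
// match the complete character 汉 or the complete character 字.
preg_match_all('/(汉|字)+/', $subject, $m_a);

// $m_a[0] holds the full match, $m_a[1] the last repetition of the group.
print_r($m_a);
```

The full match is the two-character word, while the capture group keeps only its final repetition, which is why the two entries differ.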

Look, a fully matched string! But I was happy too soon: real-world use still produced errors. I dug further and finally found the root cause. PHP does not treat multi-byte text as characters, so pattern matching and string operations work on the internal byte codes of the text rather than on whole characters (I am not sure this wording is exactly right). For example:

eregi_replace("性", "不", "有责任感"); is meant to replace 性 ("nature") with 不 ("no") in the string 有责任感 ("a sense of responsibility"). What is the result? Since 有责任感 does not contain 性, nothing should be replaced and the string should come back unchanged. Instead, it comes back corrupted!

I did not expect that! Why? Look at the byte values and it becomes clear. (The original post calls these ASCII codes; they are the two bytes that encode each Chinese character.) The bytes of 有责任感 are, in order: 211,208 (有), 212,240 (责), 200,206 (任), 184,208 (感).

The encoding of 性 is 208,212, which is exactly the 2nd byte of 有 followed by the 1st byte of 责! PHP therefore finds a match there, cuts both characters in half, and splices the replacement string in between, which produces the garbage.

At the time I assumed the far more commonly used str_replace() would be safe, but in fact str_replace() fails in exactly the same way on the same input! Looking back, my earlier Chinese replacements only worked by luck, probably because the strings being replaced were long runs of Chinese characters, which makes this kind of accidental byte overlap unlikely. Even when nothing goes wrong, you should know it is not safe!
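The overlap described above can be checked directly with the byte values quoted in the article. This is a sketch, not the author's code: the strings are written as byte escapes so that the file encoding does not matter, and "[X]" is a placeholder replacement chosen only to make the damage visible.

```php
<?php
// Byte values taken from the article (two bytes per character).
$haystack = "\xD3\xD0\xD4\xF0\xC8\xCE\xB8\xD0"; // 有责任感
$needle   = "\xD0\xD4";                          // 性

// strpos() and str_replace() compare raw bytes, so the needle is
// "found" at offset 1: the 2nd byte of 有 plus the 1st byte of 责.
$offset = strpos($haystack, $needle);

// "[X]" is a placeholder replacement, not part of the original example.
$result = str_replace($needle, '[X]', $haystack);

echo "offset: ", var_export($offset, true), "\n";
echo "before: ", bin2hex($haystack), "\n";
echo "after:  ", bin2hex($result), "\n";
```

After the replacement, the first character has lost its second byte and the second character has lost its first, so neither decodes to the original text any more.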

The cause is found, but the work has to continue and the difficulty still has to be overcome.

I then remembered a PHP extension module, Multibyte String Functions (mbstring), which adds many functions for working with multi-byte text, such as mb_ereg_replace() as the counterpart of ereg_replace(). For descriptions of the individual functions, see the related documentation.
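A rough sketch of how the multibyte functions avoid the false match, using the article's byte values. Several things here are assumptions rather than the author's method: that the bytes are GBK, that the mbstring extension (with its regex support) is installed, that this file is saved as UTF-8, and that converting to UTF-8 before calling mb_ereg_replace() is acceptable. The second replacement (感 to 心) is a made-up positive case. Note also that the ereg family, including eregi_replace(), was removed in PHP 7.0, while the mb_ereg_* functions remain.

```php
<?php
// Bytes from the article, assumed here to be GBK.
$gbkText   = "\xD3\xD0\xD4\xF0\xC8\xCE\xB8\xD0"; // 有责任感
$gbkNeedle = "\xD0\xD4";                          // 性

// Character-aware search: 性 is not one of the four characters,
// so nothing is found even though the raw bytes overlap.
$mbPos = mb_strpos($gbkText, $gbkNeedle, 0, 'GBK');

// One option: convert to UTF-8 and use the mb_ereg_* functions there.
$text = mb_convert_encoding($gbkText, 'UTF-8', 'GBK');
mb_regex_encoding('UTF-8');
$unchanged = mb_ereg_replace('性', '不', $text); // no 性 present
$changed   = mb_ereg_replace('感', '心', $text); // hypothetical real match

var_dump($mbPos);
echo $unchanged, "\n", $changed, "\n";
```

The same input that broke str_replace() now comes back untouched, and a replacement happens only when a whole character matches.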

Conclusion: for Chinese characters, it is best to use Multibyte String Functions.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
