PHP Chinese character replacement and pattern-matching problems: a must-read guide!

Author: bluedoor
Original post address: http://www.anbbs.com/anbbs/index.php?F_id=3&page=1
For the past two days I have been working on a program that highlights keywords. It tested fine locally, but as soon as it was uploaded to the server it produced a pile of garbled text, never mind any highlighting. I had no idea why!
I went hunting for the error and found that English text caused no problem. The trouble came from Chinese characters, and even then only some of the time.
Summary:
The affected functions include preg_match_all($pat, ...), preg_replace($pat, ...), and similar.
The problems are as follows:
preg_match_all("/(Chinese characters)+/ism", "I am a Chinese character, see what you think of me!", $m_a);
In the original post the pattern and the subject are Chinese text; "Chinese characters" stands for the two-character word 汉字. This pattern is meant to match that whole word. It can match successfully, but do not be too pleased with the result, because the outcome is not reliable. The reason is explained further on.
The following pattern is bound to cause problems:
preg_match_all("/[Chinese characters]+/ism", "I am a Chinese character. What do you think of me!", $m_a);
I wanted to match 汉, 字, or 汉字 (the two characters of the word "Chinese characters", alone or together). This is where it goes wrong: a mass of garbled text is matched, and an endless loop may even occur. Why does this happen? PHP does not use Unicode internally and these functions work byte by byte by default, so during pattern matching the word 汉字 is treated as four separate single-byte characters (two bytes per character, which implies a double-byte encoding such as GB2312 or GBK). It would be strange if nothing went wrong!
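The byte-by-byte behaviour of a character class can be reproduced with a small sketch. It is not the author's code: it assumes the file is saved as UTF-8 (three bytes per Chinese character, not the two described above), the subject string and the helper `find_all` are made up for illustration, and the `u` modifier shown for comparison is a PCRE feature the original post does not mention.

```php
<?php
// Assumption: this file is saved as UTF-8, so each Chinese character is 3 bytes.

/**
 * Hypothetical helper: return every full match of $pattern in $subject.
 */
function find_all(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches);
    return $matches[0];
}

$subject = '我是汉字'; // illustrative subject, not the one from the post

// Without the u modifier the class [汉字] is a set of six individual bytes,
// so stray bytes of other characters can match as well.
$byteMode = find_all('/[汉字]+/', $subject);

// With the u modifier PCRE reads pattern and subject as UTF-8,
// so the class contains exactly two characters.
$utf8Mode = find_all('/[汉字]+/u', $subject);

foreach ($byteMode as $match) {
    echo 'byte mode match (hex): ', bin2hex($match), "\n";
}
foreach ($utf8Mode as $match) {
    echo 'utf-8 mode match: ', $match, "\n";
}
```

In byte mode the first byte of 我 and of 是 is also the first byte of 汉, which is why single stray bytes show up among the matches. The `u` modifier only helps when both the pattern and the subject are valid UTF-8; text in GBK would need to be converted first.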
Later I rewrote the pattern and found a form that works (the reason comes later):
preg_match_all("/(Chinese|word)+/ism", "I am a Chinese character. What do you think of me!", $m_a);
This matches 汉, 字, or 汉字; the two alternatives are the single characters, shown here in translation as "Chinese" and "word". The result in $m_a is:
Array
(
    [0] => Array
        (
            [0] => Chinese Characters
        )

    [1] => Array
        (
            [0] => word
        )

)
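A minimal sketch of why the alternation behaves better than the character class: each alternative is a complete byte sequence that must match as a whole, so the bytes of one character are never matched individually. The subject string is an assumption made for illustration, the file is assumed to be UTF-8, and the `ism` modifiers are left out because they do not affect this example.

```php
<?php
// Assumption: this file is saved as UTF-8.
$subject = '我是汉字'; // illustrative subject, not the one from the post

// Each alternative (汉 or 字) is matched as one unbroken byte sequence.
preg_match_all('/(汉|字)+/', $subject, $m_a);

// $m_a[0] holds the full match, $m_a[1] the last repetition of the group.
print_r($m_a);
```

The shape of `$m_a` mirrors the array printed above: the full match is the two-character word and the captured group is its last character.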
There is the fully matched string! But I was happy too soon: problems still showed up in real use. I went looking again and finally found the root cause. PHP does not support multi-byte text, so pattern matching and string operations are carried out on the underlying byte codes of the characters (I am not sure this is exactly right). For example:
eregi_replace("sex", "no", "Sense of Responsibility");
The three arguments were Chinese strings in the original post. The call is meant to replace one character (rendered here as "sex") in the string "Sense of Responsibility" with the character for "no". The string does not contain that character, so no replacement should take place and the result should be the unchanged "Sense of Responsibility". Instead, the call returns a different string, rendered here as "a sense of responsibility"!
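One way such a false replacement can arise in a double-byte encoding is that the second byte of one character plus the first byte of the next happen to equal the bytes of the search character. The sketch below illustrates that mechanism only; it is not a reconstruction of the author's strings. The byte values are arbitrary, `str_replace` stands in for `eregi_replace` (the ereg functions were removed in PHP 7.0), and `replace_double_byte_char` is a hypothetical helper that assumes every character in the subject is exactly two bytes.

```php
<?php
// Two hypothetical 2-byte characters, written as raw bytes so the example
// does not depend on the encoding of this file.
$charA   = "\xB8\xD0";
$charB   = "\xD4\xF0";
$subject = $charA . $charB;

// A third character whose bytes equal the tail of $charA plus the head of $charB.
$needle      = "\xD0\xD4";
$replacement = "\xB2\xBB";

// Byte-wise search: finds $needle across the character boundary and corrupts the text.
$result = str_replace($needle, $replacement, $subject);

/**
 * Hypothetical character-aware replacement.
 * Assumes $subject consists only of 2-byte characters (no ASCII mixed in).
 */
function replace_double_byte_char(string $needle, string $replacement, string $subject): string
{
    $chars = str_split($subject, 2);
    foreach ($chars as $i => $char) {
        if ($char === $needle) {
            $chars[$i] = $replacement;
        }
    }
    return implode('', $chars);
}

$safe = replace_double_byte_char($needle, $replacement, $subject);

echo 'subject:   ', bin2hex($subject), "\n";
echo 'byte-wise: ', bin2hex($result), "\n";
echo 'per-char:  ', bin2hex($safe), "\n";
```

The byte-wise call changes the middle two bytes even though neither character in the subject is the search character, while the per-character version leaves the subject untouched. Real text usually mixes single-byte and double-byte characters, so the fixed two-byte split is only good enough for this illustration.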