Author: bluedoor
Original post: http://www.anbbs.com/anbbs/index.php?F_id=3&page=1
For the past two days I have been working on a keyword-highlighting program. It tested fine locally, but as soon as it went up to the server it produced a pile of garbled text, never mind any highlighting, and I had no idea why!
I went looking for the bug and found that English text was fine. The trouble was with Chinese characters, and even then only some of the time.
Summary:
The functions involved are the PCRE ones, for example preg_match_all($pat, ......) and preg_replace($pat, ......).
The problems are as follows:
preg_match_all("/(汉字)+/ism", "I am 汉字, see what you can do to me!", $m_a);
This pattern, the literal two-character word "汉字" ("Chinese characters"), usually matches successfully. Don't celebrate too early, though: the result is not reliable, for the reason explained further down.
The following pattern is guaranteed to cause trouble:
preg_match_all("/[汉字]+/ism", "I am 汉字, see what you can do to me!", $m_a);
The intent was to match "汉", "字", or "汉字". Instead it matches a mass of garbled characters and can even end up in an endless loop. Why? Because PHP strings are byte strings, not Unicode, and by default these functions have no notion of multi-byte text. In a double-byte encoding such as GBK, the two characters "汉字" are therefore treated as four separate one-byte characters, and the character class matches any one of those bytes. Oddly enough, no error is reported!
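The byte-level behaviour described above can be reproduced with a small sketch. It assumes UTF-8 strings (three bytes per character, written with PHP 7+ "\u{...}" escapes) rather than the double-byte encoding of the post, and it uses a shortened subject, 我是汉字, chosen for illustration. The variable names are mine. The /u pattern modifier shown for contrast tells PCRE to treat pattern and subject as UTF-8.

```php
<?php
// Assumption: UTF-8 strings, so each Chinese character is three bytes.
$han = "\u{6C49}";                            // 汉
$zi  = "\u{5B57}";                            // 字
$subject = "\u{6211}\u{662F}" . $han . $zi;   // 我是汉字

// Byte mode: the class contains six separate bytes, not two characters.
$byteCount = preg_match_all("/[{$han}{$zi}]+/", $subject, $byteMode);

// UTF-8 mode (/u): the class contains the two characters 汉 and 字.
$charCount = preg_match_all("/[{$han}{$zi}]+/u", $subject, $charMode);

// Hex-dump the byte-mode hits, since lone bytes are not printable text.
foreach ($byteMode[0] as $hit) {
    echo bin2hex($hit), "\n";
}
echo $charMode[0][0], "\n";
```

In byte mode this finds three matches: the lone lead byte e6 of 我, the lone lead byte e6 of 是, and only then the six bytes of 汉字. The two stray bytes are the kind of garbage that shows up as garbled output. With /u there is a single match, 汉字.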
Later I tried rewriting the pattern and found one that works (why it works comes later):
preg_match_all("/(汉|字)+/ism", "I am 汉字, see what you can do to me!", $m_a);
This matches "汉", "字", or "汉字", and the result in $m_a is:
Array
(
    [0] => Array
        (
            [0] => 汉字
        )

    [1] => Array
        (
            [0] => 字
        )

)
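A likely reason the alternation behaves better than the character class is that each alternative is compared as a whole byte sequence, so a partial character can never match on its own. The sketch below checks this on a shortened subject, 我是汉字, assuming UTF-8 strings and PHP 7+ escapes. The subject and the absence of the /ism flags are my simplifications.

```php
<?php
// Assumption: UTF-8 strings. No /u modifier is used: each alternative is
// still a complete multi-byte sequence, so byte mode matches whole characters.
$subject = "\u{6211}\u{662F}\u{6C49}\u{5B57}";             // 我是汉字
preg_match_all("/(\u{6C49}|\u{5B57})+/", $subject, $m_a);  // /(汉|字)+/

// $m_a[0] holds the full matches, $m_a[1] the last repetition of the group.
print_r($m_a);
```

The full match is 汉字, and the captured group holds 字 because a repeated group keeps only its last iteration.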
There it is, the fully matched string! But I was happy too soon: in real use there were still problems. I went looking again and finally found the root cause. PHP does not support multi-byte text here, so pattern matching and string operations work on the bytes of the internal encoding rather than on characters (I am not sure this is exactly right). For example:
eregi_replace("sex", "no", "Sense of Responsibility"); This call is supposed to replace the Chinese character for "sex" (nature) with the one for "no" in the Chinese string for "sense of responsibility". What is the final result? Since that string does not contain the character for "sex", no replacement should take place and the string should come back unchanged. Yet what comes back is a different string!
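One plausible mechanism for a result like this, sketched under the assumption that the text was stored in a double-byte encoding such as GBK: a byte-oriented search can match the trail byte of one character together with the lead byte of the next. The byte values and the helper dbcs_strpos below are made up for illustration. They are not the strings from the post.

```php
<?php
// Hypothetical helper: search for $needle only at character boundaries of a
// double-byte encoding (lead byte 0x81-0xFE starts a two-byte character).
function dbcs_strpos(string $haystack, string $needle)
{
    $len = strlen($haystack);
    $i = 0;
    while ($i < $len) {
        if (substr($haystack, $i, strlen($needle)) === $needle) {
            return $i;
        }
        $byte = ord($haystack[$i]);
        $i += ($byte >= 0x81 && $byte <= 0xFE) ? 2 : 1;  // skip a whole character
    }
    return false;
}

// Illustrative bytes: the haystack is two double-byte characters, B0 D0 and
// D4 A1. The needle D0 D4 is a third character that the haystack does not contain.
$haystack = "\xB0\xD0\xD4\xA1";
$needle   = "\xD0\xD4";

var_dump(strpos($haystack, $needle));       // byte search: hit at offset 1
var_dump(dbcs_strpos($haystack, $needle));  // boundary-aware search: no hit
echo bin2hex(str_replace($needle, 'X', $haystack)), "\n";  // both characters destroyed
```

The byte search reports a hit at offset 1, in the middle of the first character, and the replacement leaves b0 58 a1, which is no longer valid double-byte text. With UTF-8 data this particular false match cannot happen, because lead bytes and continuation bytes use disjoint ranges.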