<?php
// No square brackets: the replacement works as expected.
$a = '天地不仁，以万物为刍狗';
$b = preg_replace('/万/', 'blood', $a);
echo $b;

echo '<br>Added square brackets and the replacement result was garbled.<br>';
$c = '天地不仁，以万物为刍狗';
$d = preg_replace('/[万]/', 'hire', $c);
echo $d;
?>
The result of running the program above can be viewed at http://nyaii.com/s/test.php. For some reason, the output is garbled once square brackets are added around the Chinese character being matched. The same case works correctly in JavaScript:
'天地不仁'.replace(/[天]/, '') // outputs "地不仁"
Reply content:
Add the UTF-8 modifier, u (lowercase; uppercase U is the unrelated "ungreedy" modifier).
$d = preg_replace('/[万]/u', 'hour', $c);
For other modifiers, see
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
The following is a supplement in response to the asker's question in the comments:
As for why the u modifier is needed only when [] is used: strictly speaking, you should add the u modifier in both cases.
But why does [] lead to garbled characters? This has to be explained at the byte level rather than the character level.
First, PHP strings are not stored as Unicode characters; a string is simply a sequence of bytes.
If we dump the bytes of the character "万", we get its UTF-8 encoding in hexadecimal, which is e4b887.
Therefore, when the UTF-8 modifier is not enabled, the regular expression engine does not treat "万" as a single character, but as three consecutive bytes.
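The byte-level view described above can be checked directly. This is a minimal sketch using PHP's built-in bin2hex and strlen; it assumes the script file itself is saved as UTF-8, and the variable name $char is only illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$char = '万';

// bin2hex shows the raw bytes of the string in hexadecimal.
echo bin2hex($char), "\n";   // e4b887

// strlen counts bytes, not characters, so one Chinese character gives 3.
echo strlen($char), "\n";    // 3
```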
Conclusion:
Without [], the engine looks for three consecutive bytes with the hexadecimal values e4 b8 87. In other words, your pattern is actually \xe4\xb8\x87
Since this byte sequence occurs in your string only where "万" itself appears, only that character matches, and the replacement produces no garbled output. Byte-mode matching can still cause problems in other patterns, though, for example when a quantifier or the dot is applied to a multi-byte character.
When you wrap "万" in [], the regular expression engine actually sees [\xe4\xb8\x87]
The engine then matches any one of those three bytes on its own, so this time every Chinese character whose encoding contains one of those bytes is affected.
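The effect of the byte-wise character class can be observed with a sketch like the following. It assumes a UTF-8 source file and relies on the fact that preg_match with the u modifier returns false for a subject that is not valid UTF-8; the variable names are illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$s = '天地不仁，以万物为刍狗';

// Without /u, [万] is a class of three single bytes: 0xe4, 0xb8, 0x87.
// Every occurrence of any of those bytes counts as a separate match.
$byteMatches = preg_match_all('/[万]/', $s);
echo $byteMatches, "\n";   // greater than 1, although 万 occurs only once

// Replacing those bytes one by one tears other characters apart.
$garbled = preg_replace('/[万]/', 'hire', $s);

// The explicit byte class behaves identically.
$sameAsByteClass = ($garbled === preg_replace('/[\xe4\xb8\x87]/', 'hire', $s));
var_dump($sameAsByteClass);             // bool(true)

// The result is no longer valid UTF-8, which is what shows up as garbage.
var_dump(preg_match('//u', $garbled));  // bool(false)
```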
After you add the UTF-8 modifier, "万" is treated as a single character by the regular expression engine, so this problem no longer occurs.
As for JavaScript, its strings are natively Unicode (sequences of UTF-16 code units), so a character such as "万" is handled as a single unit rather than being split into bytes, and this problem does not occur.