Garbled output when matching Chinese characters in PHP strings with regular expressions

    <?php
    $a = '天地不仁，以万物为刍狗'; // "Heaven and Earth are ruthless; they treat the myriad things as straw dogs"
    $b = preg_replace('/万/', 'blood', $a);
    echo $b;
    echo '<br>With square brackets added, the replacement result is garbled:<br>';
    $c = '天地不仁，以万物为刍狗';
    $d = preg_replace('/[万]/', 'hire', $c);
    echo $d;
    ?>

The output of the program above can be seen at http://nyaii.com/s/test.php. For some reason, once the Chinese character in the pattern is wrapped in square brackets, the output becomes garbled. The equivalent code works fine in JavaScript:

    '天地不仁'.replace(/[天]/, '') // outputs "地不仁" (only the leading 天 is removed)

Reply content:

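Add the u (UTF-8) modifier. Note that it must be a lowercase u; an uppercase U is a different PCRE modifier that makes quantifiers ungreedy.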

    $d = preg_replace('/[万]/u', 'hour', $c);

For the other pattern modifiers, see http://php.net/manual/en/reference.pcre.pattern.modifiers.php

Follow-up, answering the question raised in the comments:

As for why the u modifier only seems necessary once [] is added: strictly speaking, you should add the u modifier in both cases.

So why does [] produce garbled characters? This has to be explained at the byte level rather than the character level.

First, recall that PHP strings are not stored as Unicode; they are plain sequences of bytes. Now take a look at this code.


  

From it we get the UTF-8 encoding of the character "万" in hexadecimal: e4b887.
Therefore, when the UTF-8 modifier is not enabled, the regular expression engine does not treat "万" as a single character but as three consecutive bytes.
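
Since the original snippet is missing here, the following is a small stand-in sketch (not the answerer's code) that shows the same byte-level view, again assuming the file is saved as UTF-8. bin2hex() and strlen() work on bytes, and "." only matches a whole character when u is set.

```php
<?php
// Assumes this file is saved as UTF-8.
$wan = '万';

// PHP strings are byte strings: strlen() counts bytes, not characters.
echo bin2hex($wan), "\n";   // e4b887
echo strlen($wan), "\n";    // 3

// The same three bytes written as escapes form an identical string.
var_dump("\xe4\xb8\x87" === $wan);  // bool(true)

// Without u, "." matches a single byte; with u, a single character.
echo preg_match_all('/./', $wan), "\n";   // 3 (one match per byte)
echo preg_match_all('/./u', $wan), "\n";  // 1 (one match per character)
```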

Conclusion:

  1. Without [], the engine looks for the three consecutive bytes e4 b8 87; in other words, your pattern is effectively \xe4\xb8\x87. In your string that byte sequence occurs only as the character "万", so the replacement does not produce garbled output. This works because UTF-8 is self-synchronizing: in valid UTF-8 text, e4 b8 87 can only ever be the encoding of 万. Other constructs are still byte-based without u, however; in 万+, for example, the + applies only to the last byte \x87.

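  2. When you wrap it in [], the engine actually sees [\xe4\xb8\x87], a character class that matches any single one of those three bytes. That hits far more than 万: e4 is the lead byte of every character from U+4000 to U+4FFF, which includes many common Chinese characters, and each matched byte is replaced on its own, leaving broken byte sequences behind.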

  3. Once you add the u (UTF-8) modifier, the regular expression engine treats "万" as a single character, so the problem no longer occurs.
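
The three points can be checked with a short sketch, assuming a UTF-8 source file and using the string from the question. It counts how many bytes [万] matches with and without u, confirms that the byte-wise replacement leaves invalid UTF-8 behind, and shows the quantifier pitfall from point 1.

```php
<?php
// Assumes this file is saved as UTF-8.
$c = '天地不仁，以万物为刍狗';

// Without u, [万] is the byte class [\xe4\xb8\x87]: it matches any ONE of
// those three bytes, wherever it appears.
$byteMatches = preg_match_all('/[万]/', $c);
$charMatches = preg_match_all('/[万]/u', $c);
echo $byteMatches, ' vs ', $charMatches, "\n"; // 9 vs 1

// Cross-check: count the bytes e4, b8 and 87 in the subject by hand.
$expected = 0;
foreach (str_split($c) as $byte) {
    if (in_array($byte, ["\xe4", "\xb8", "\x87"], true)) {
        $expected++;
    }
}
echo $expected, "\n"; // same as $byteMatches

// Replacing those bytes one by one breaks other characters too, so the result
// is no longer valid UTF-8. preg_match() with u returns false for an invalid
// UTF-8 subject, which makes a convenient check.
$d = preg_replace('/[万]/', 'hire', $c);
var_dump(preg_match('//u', $d));  // bool(false)

// Point 1: without u, a quantifier binds only to the last byte of 万.
var_dump(preg_match('/^万+$/', '万万'));   // int(0): pattern is \xe4\xb8\x87+
var_dump(preg_match('/^万+$/u', '万万'));  // int(1)
```

For this particular string, the nine matched bytes fall in 不, 仁, 以, 万 and 为; the remaining characters contain none of e4, b8 or 87 and come through the replacement intact.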

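As for JavaScript: its strings are sequences of UTF-16 code units, so a character such as 万 (which lies in the Basic Multilingual Plane) is a single unit to the regex engine and is never split into bytes. That is why the same pattern works there. Characters outside the Basic Multilingual Plane, such as most emoji, are stored as two surrogate code units, and JavaScript needs its own u flag to treat them as one character.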
