Garbled characters when matching Chinese characters in PHP strings with regular expressions

Last Update: 2018-05-18 Source: Internet

Author: User

Tags: expression engine


<?php
$a = '天地不仁，以万物为刍狗';
$b = preg_replace('/万/', 'blood', $a);
echo $b;
echo "\nWith square brackets added, the replacement result is garbled:\n";
$c = '天地不仁，以万物为刍狗';
$d = preg_replace('/[万]/', 'hire', $c);
echo $d;
?>

The result of running the program above can be viewed at http://nyaii.com/s/test.php. For some reason, garbled characters appear once square brackets are put around the Chinese character being matched. The same case works correctly in JavaScript.

'天地不仁'.replace(/[天]/, '') // outputs "地不仁"

Reply content:


Add the UTF-8 modifier, u (lowercase).

$d = preg_replace('/[万]/u', 'hour', $c);

For the other modifiers, see
http://php.net/manual/en/reference.pcre.pattern.modifiers.php

The following supplements the answer in response to the asker's comments:

As for why the u modifier is needed with []: strictly speaking, you had better add the u modifier in both cases.
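One way to see why the u modifier matters even without square brackets is to look at how "." behaves. The following is a minimal sketch, assuming the source file is saved as UTF-8; the variable names are illustrative and not from the original answer.

```php
<?php
// Assumption: this file is saved as UTF-8, so '万' is the three bytes e4 b8 87.
$char = '万';

// Without u, "." matches a single byte, so one Chinese character
// does not count as one "character" for the engine.
$oneUnitBytes = preg_match('/^.$/', $char);     // 0: the subject is three bytes long
$threeUnits   = preg_match('/^.{3}$/', $char);  // 1: three dots match three bytes

// With u, pattern and subject are treated as UTF-8 and "." matches a whole character.
$oneUnitUtf8  = preg_match('/^.$/u', $char);    // 1

var_dump($oneUnitBytes, $threeUnits, $oneUnitUtf8);
```

So even a pattern without a character class can behave unexpectedly on multi-byte text when the u modifier is missing.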

But why do the [] lead to garbled characters? This has to be explained at the byte level rather than the character level.

First, PHP strings are not stored as Unicode: a PHP string is simply a sequence of bytes. Printing the bytes of the character in hexadecimal makes this visible.

The UTF-8 encoding of "万" in hexadecimal is e4b887. Therefore, when the u modifier is not enabled, the regular expression engine does not regard "万" as a single character but as three consecutive bytes.
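The byte values can be checked with a few lines like the following. This is a small sketch, assuming the source file is saved as UTF-8; it is not necessarily the code the original answer used.

```php
<?php
// Assumption: this file is saved as UTF-8.
$char = '万';

$byteLength = strlen($char);  // strlen() counts bytes, not characters
$hex        = bin2hex($char); // hexadecimal dump of the raw bytes

echo $byteLength, "\n"; // 3
echo $hex, "\n";        // e4b887
```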

Conclusion:

Without [], the engine looks for three consecutive bytes with the hexadecimal values e4 b8 87. In other words, your pattern is actually \xe4\xb8\x87. In your string this byte sequence occurs only where the character "万" is, so only "万" matches and the replacement produces no garbled characters. However, if your string may contain four-byte UTF-8 characters, such as emoji, this may cause problems.
When you wrap "万" in [], the engine actually sees [\xe4\xb8\x87], a character class that matches any one of those three bytes. The replacement therefore hits individual bytes inside every Chinese character whose encoding contains e4, b8, or 87, which breaks those characters.
After you add the u modifier, "万" is treated as a single character by the regular expression engine, so the problem no longer occurs.
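The three cases in the conclusion can be compared side by side. The sketch below assumes the source file is saved as UTF-8, uses "#" as an arbitrary ASCII replacement, and introduces a hypothetical helper, isValidUtf8(), that relies on preg_match() with an empty u-mode pattern succeeding only on valid UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8. '#' is an arbitrary replacement.
$subject = '天地不仁，以万物为刍狗';

// Hypothetical helper: preg_match('//u', ...) returns 1 only when the
// subject is valid UTF-8; on invalid input it returns false.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Case 1: no brackets, no u. Matches the byte sequence e4 b8 87 as a whole.
$plain = preg_replace('/万/', '#', $subject, -1, $plainCount);

// Case 2: brackets, no u. The class matches any single byte e4, b8, or 87.
$byteClass = preg_replace('/[万]/', '#', $subject, -1, $byteClassCount);

// Case 3: brackets with u. The class contains one character, U+4E07.
$utf8Class = preg_replace('/[万]/u', '#', $subject, -1, $utf8ClassCount);

foreach ([
    'no brackets'      => [$plain, $plainCount],
    'brackets'         => [$byteClass, $byteClassCount],
    'brackets with /u' => [$utf8Class, $utf8ClassCount],
] as $label => [$result, $count]) {
    $state = isValidUtf8($result) ? 'valid UTF-8' : 'garbled';
    echo $label, ': ', $count, ' replacement(s), ', $state, "\n";
}
```

Only the second case reports more than one replacement and an invalid result, because it is the only one that replaces bytes in the middle of other characters.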

As for JavaScript, its strings are natively Unicode, so each character is treated as a character rather than being split into bytes, and the problem does not occur.
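PHP can be asked for the same character-level view by using the u modifier. As a rough illustration (assuming a UTF-8 source file; the variable names are made up for this example), splitting a string by bytes and by characters gives different counts:

```php
<?php
// Assumption: this file is saved as UTF-8.
$text = '万物';

$bytes = str_split($text);                                  // one-byte chunks
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY); // whole characters

echo count($bytes), "\n"; // 6
echo count($chars), "\n"; // 2
```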

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
