Netizen AINIAA asked the following question.
The PHP code is as follows:
$words = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSRUVWXYZ!@#$%^&*()_+-=[]\\,./{}|<>?'\"你好,我们";
$otherStr = preg_replace("/[chr(128)-chr(256)]+/is", "", $words);
echo 'otherStr:', $otherStr;
Why is the printed result:
otherStr:!#$%&{}|'"你好,我们
What does the regular expression /[chr(128)-chr(256)]+/is actually mean?
If /[chr(128)-chr(256)]+/is refers to the characters with ASCII codes 128 to 256, why are characters such as a-z and A-Z replaced as well? Their ASCII codes are below 127.
The most puzzling part is why "#", "$", "%", "&", "!", "{", "}", "|", "'" and the double quote, whose ASCII codes also lie in the 0-127 interval, are not replaced.
Even stranger, if the regular expression is changed to "/[chr(128)-chr(256)]+/s", the output becomes: otherStr:defgijklmnopqstuvwxyz!#$%&{}|'"你好,我们
Only the modifier "i" was removed from the regular expression, yet the result is so different. I cannot make sense of it at all.
What are your views?
An ASCII code comparison table was attached to the question.
(I will not reproduce the ASCII chart here.)
Among the replies, one netizen said that chr(128) and the like are not evaluated inside the pattern, and offered a new solution. That answer is correct. Setting aside whether he knew not only the fact but also the reason behind it, he did not explain why the original code goes wrong.
cfc4n's answer to this netizen:
PHP's preg functions (preg_replace, preg_match and so on) use the PCRE regular expression engine. In the netizen's code, the regular expression handed to the PCRE engine is "/[chr(128)-chr(256)]+/is". What is the "is" at the end?
In PHP regular expressions, whatever follows the closing delimiter is called a pattern modifier. It tells the engine how to parse and process the pattern. The i modifier means case-insensitive matching. The s modifier means "dot-all" mode: it lets the metacharacter "." match a newline as well, and it affects nothing except ".". The netizen's pattern contains no ".", so the s modifier has no effect here.
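The effect of the two modifiers can be sketched as follows. The patterns and test strings are made up for illustration and do not come from the original question.

```php
<?php
// Hypothetical inputs, chosen only to show what each modifier changes.

// "i": case-insensitive matching.
$caseSensitive   = preg_match('/abc/', 'ABC');   // no match
$caseInsensitive = preg_match('/abc/i', 'ABC');  // match

// "s": allows "." to match a newline; it changes nothing else.
$dotDefault = preg_match('/a.b/', "a\nb");   // no match
$dotAll     = preg_match('/a.b/s', "a\nb");  // match

// A pattern without "." behaves identically with or without "s".
$withoutS = preg_replace('/[a-c]+/', '', "abc\nxyz");
$withS    = preg_replace('/[a-c]+/s', '', "abc\nxyz");

var_dump($caseSensitive, $caseInsensitive, $dotDefault, $dotAll);
var_dump($withoutS === $withS);
```

Since the pattern in the question has no ".", dropping or keeping "s" cannot explain the surprising output; only "i" matters.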
Find out why:
Let us analyze the character class "[chr(128)-chr(256)]+" written by this netizen: how does the PCRE engine interpret it? First, in a regular expression the square brackets "[]" denote a character class. Inside a character class almost nothing is a metacharacter: apart from a few special cases (a leading "^", the backslash, and the closing "]"), only the hyphen "-" has a special meaning, and only when it stands between two characters to mark a range. A hyphen at the very start of the class, or anywhere it does not join two characters, is just an ordinary "-".
PHP never calls chr(128) here, because it sits inside a string literal that is passed to the engine as-is. The netizen meant the character with code 128 (strictly speaking, ASCII only covers 0 to 127, so 128 and above should not be called ASCII codes), but in the pattern it is simply the eight characters "c, h, r, (, 1, 2, 8, )" (the commas are only there for readability).
So which range does the hyphen in this pattern define? It is ")-c", the characters immediately to its left and right. The ASCII code of ")" is 0x29, decimal 41, and the ASCII code of "c" is 0x63, decimal 99, so the range covers chr(41) through chr(99). The digits 1, 2, 8, 5 and 6 already fall inside that range, so the netizen's class is equivalent to "[hr)-c(]": chr(41) to chr(99), plus the two letters h and r and the "(" (code 40) that precedes the range.
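As a rough check of this reading, the sketch below runs every printable ASCII character through both the netizen's class and a reduced class. The reduced pattern "[)-chr(]" and the variable names are my own illustration, not part of the original answer.

```php
<?php
// The class exactly as the PCRE engine receives it, and a reduced
// form that should be equivalent if the analysis is right.
$original = '/[chr(128)-chr(256)]/';
$reduced  = '/[)-chr(]/';

$matched = '';   // every printable ASCII character the original class accepts
$same    = true; // stays true while both classes agree

for ($code = 32; $code <= 126; $code++) {
    $ch = chr($code);
    $a  = preg_match($original, $ch);
    $b  = preg_match($reduced, $ch);
    if ($a !== $b) {
        $same = false;
    }
    if ($a === 1) {
        $matched .= $ch;
    }
}

echo $matched, "\n";
var_dump($same);
```

The matched characters are "(" (code 40), the 59 consecutive codes 41 to 99, and the two letters "h" and "r".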
In the first test the i modifier is present, so matching is case-insensitive: every character from chr(41) to chr(99) matches, and for the letters in that range both cases match. The range contains all of A-Z, so all of a-z are matched too and replaced with the empty string. In the second test the i modifier is removed and matching becomes case-sensitive. The range only reaches lowercase "c", and the only other lowercase letters in the class are "h" and "r", so the lowercase letters "defgijklmnopqstuvwxyz" survive and appear in the output. That is where the difference between the two results comes from.
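The two results can be reproduced with a shortened, ASCII-only stand-in for the netizen's string. The string below is an assumption made for illustration; the original also contains digits, more symbols and non-ASCII text.

```php
<?php
// Shortened, ASCII-only stand-in for the string in the question.
$words = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*(){}|';

// First test: with "i", letters in the range match in both cases.
$withI = preg_replace('/[chr(128)-chr(256)]+/is', '', $words);

// Second test: without "i", only a, b, c, h and r match among the
// lowercase letters.
$withoutI = preg_replace('/[chr(128)-chr(256)]+/s', '', $words);

echo 'with i:    ', $withI, "\n";    // !#$%&{}|
echo 'without i: ', $withoutI, "\n"; // defgijklmnopqstuvwxyz!#$%&{}|
```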
The netizen's expression is equivalent to the one shown in the figure below.
Solution:
Now that the cause of the error is known, what is the solution?
Look at what this netizen needs: to match the characters from chr(128) to chr(255) and replace them with nothing. ASCII only covers 0 to 127, so values from 128 upward are better described as Unicode code points. Regular expression flavors offer two notations for writing a character by its hexadecimal code, "\u" and "\x{}". "\u" can only be followed by exactly four hexadecimal digits, while "\x{}" accepts any number of hexadecimal digits inside the braces. Note that PCRE, the engine behind PHP's preg functions, does not accept "\u"; in PHP only the "\x{}" form works.
So, how should this regular expression be written?
The netizen's target is chr(128) to chr(255), so the class is "[\x{0080}-\x{00ff}]" (in flavors that support "\u", "[\u0080-\u00FF]").
The goal is to match the characters in the red box in the figure below.
A reminder: for PHP to match Unicode characters, the pattern needs the u modifier (lowercase; uppercase U is a different modifier that makes quantifiers ungreedy).
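A minimal sketch of what the u modifier changes, using a made-up sample string (not from the original question). It assumes PHP 7.0 or later for the "\u{...}" string escapes, which are used here so the example does not depend on the file's encoding.

```php
<?php
// Hypothetical sample text: "café 你好" encoded as UTF-8.
$text = "caf\u{00E9} \u{4F60}\u{597D}";

// With "u": pattern and subject are treated as UTF-8, so the class
// covers the code points U+0080 to U+00FF.
$codePoints = preg_replace('/[\x{0080}-\x{00ff}]+/u', '', $text);

// Without "u": the same class covers the single bytes 0x80 to 0xFF,
// so every byte of every multi-byte character is stripped.
$bytes = preg_replace('/[\x{0080}-\x{00ff}]+/', '', $text);

echo $codePoints, "\n"; // é (U+00E9) removed, 你好 kept
echo $bytes, "\n";      // only the ASCII part "caf " remains
```

With u, only characters whose code points really lie between 128 and 255 are removed. Without it, the class works on bytes, which happens to strip all non-ASCII text from a UTF-8 string.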
Following the netizen's requirement, the corrected PHP code is as follows:
$words = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSRUVWXYZ!@#$%^&*()_+-=[]\\,./{}|<>?'\"你好,我们";
// One leading delimiter only; single quotes keep PHP from touching the backslashes.
$otherStr = preg_replace('/[\x{0080}-\x{00ff}]+/iu', "", $words);
echo 'otherStr:', $otherStr;
Running it still prints the whole original string. Why? Because no character in the string has a code point between 128 and 255.
(When testing, make sure the file is saved as UTF-8.)
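The encoding matters because, with the u modifier, PCRE rejects a subject that is not valid UTF-8. The sketch below illustrates this under my own assumptions: the GBK byte sequence stands in for a file saved in a non-UTF-8 encoding, and the variable names are hypothetical.

```php
<?php
$pattern = '/[\x{0080}-\x{00ff}]+/u';

// 你好 as UTF-8 (PHP 7.0+ escape syntax).
$utf8 = "\u{4F60}\u{597D}";
// The same two characters as GBK bytes, which are not valid UTF-8.
$gbk = "\xC4\xE3\xBA\xC3";

// Valid UTF-8: the call succeeds and nothing is removed.
$ok = preg_replace($pattern, '', $utf8);

// Invalid UTF-8: preg_replace() returns null and records an error.
$bad      = preg_replace($pattern, '', $gbk);
$badIsUtf = preg_last_error() === PREG_BAD_UTF8_ERROR;

var_dump($ok === $utf8, $bad, $badIsUtf);
```

Checking the return value for null is a cheap way to notice that the input was not in the expected encoding.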
This is only my humble opinion; criticism is welcome.