PHP Regular Expression Efficiency: Greedy and Non-Greedy Matching and Backtracking Analysis (Recommended)

Source: Internet
Author: User
Tags php regular expression
First, the basics. What is greedy matching in a regular expression, and what is non-greedy matching? Put another way, what is a match-first quantifier, and what is an ignore-first quantifier?

If the concepts are unfamiliar, an example will help.

A classmate wants to filter out everything between <script> and </script>, so he writes the following regular expression and program.

$str = preg_replace('%<script>.+?</script>%i', '', $str); // non-greedy

It looks fine, but it is not. Suppose

$str = '<script<script>alert(document.cookie)</script>>alert(document.cookie)</script>';

Then after the program above runs, the result is

$str = '<script<script>alert(document.cookie)</script>>alert(document.cookie)</script>';
$str = preg_replace('%<script>.+?</script>%i', '', $str); // non-greedy
print_r($str); // $str output is <script>alert(document.cookie)</script>

That still does not achieve what he wanted. The pattern above is non-greedy, which some people call lazy. Its marker is a ? added after a quantifier metacharacter, as in +?, *? and ?? (the last is a special case that I will cover in a future post). The ? is what marks a quantifier as non-greedy; without it, the quantifier is greedy. For example:

$str = '<script<script>alert(document.cookie)</script>>alert(document.cookie)</script>';
$str = preg_replace('%<script>.+</script>%i', '', $str); // greedy
print_r($str); // $str output is just <script

That does not look right either. Do you know how to rewrite the regular expression?
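
To see both behaviours side by side, the following sketch runs the non-greedy and the greedy pattern on the same input. The last part shows one possible way to answer the question above: repeat the non-greedy replacement until the string stops changing. That loop and the variable names are assumptions of this sketch, not the author's answer, and it is not meant as a complete XSS filter.

```php
<?php
$input = '<script<script>alert(document.cookie)</script>>alert(document.cookie)</script>';

// Non-greedy: .+? stops at the first </script>, so only the inner block is removed.
$lazy = preg_replace('%<script>.+?</script>%i', '', $input);

// Greedy: .+ runs to the last </script>, leaving the dangling "<script" prefix.
$greedy = preg_replace('%<script>.+</script>%i', '', $input);

// Assumed approach (not from the article): repeat until nothing changes,
// so that tags reassembled by an earlier pass are removed by a later one.
$clean = $input;
do {
    $before = $clean;
    $clean  = preg_replace('%<script>.+?</script>%i', '', $clean);
} while ($clean !== null && $clean !== $before);

var_dump($lazy);   // string: <script>alert(document.cookie)</script>
var_dump($greedy); // string: <script
var_dump($clean);  // empty string
```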

That covers the difference between greedy and non-greedy. Next, let's talk about the backtracking that greedy and non-greedy quantifiers cause. Here is a small example.

The regular expression is \w*(\d+) and the string is cfc456n. What does the group (\d+) capture?

If your answer is 456, then congratulations, you got it wrong. The result is not 456 but 6. Do you know why?

Let CFC4N explain. When the regex engine matches \w*(\d+) against the string cfc456n, it starts with \w*. Because \w* is greedy, it first matches every character of cfc456n and only then hands the rest of the string to \d+. Nothing is left, so \d+ fails, and \w* reluctantly gives back one character for \d+ to try. Each position where \w* could have stopped is recorded as a backtracking point, and giving a character back means returning to one of those points. \d+ then tries n and cannot match it, so the engine asks \w* to give back another character. At this point \w* matches only cfc45, having given back 6n, and \d+ tries 6. This succeeds. \d+ cannot continue onto n, and since nothing follows \d+ in the pattern, the engine reports a successful match immediately. So (\d+) captures 6, not 456.
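
The result can be checked with preg_match. A minimal sketch, with an illustrative variable name $m for the match array:

```php
<?php
// Greedy \w* first swallows the whole string, then gives characters back
// one at a time until \d+ can match a single digit.
preg_match('/\w*(\d+)/', 'cfc456n', $m);

echo $m[0], "\n"; // whole match: cfc456
echo $m[1], "\n"; // group 1:     6
```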

Now change the regular expression to \w*?(\d+) (note that this is non-greedy) and keep the string cfc456n. What does (\d+) capture this time?

A classmate replied: the result is 456.

Yes, that's right, it is 456. CFC4N timidly asks: why is it 456?

Let me explain why it is 456.

Regular expressions have a rule that the quantifier is tried first, so \w*? gets the first attempt at the string cfc456n. Because \w*? is non-greedy, the engine lets it match as little as possible, taking at most one more character at a time, and then hands control to the following \d+ to match the next character. It also records a point to return to if that attempt fails; that point is, again, a backtracking point.

Since the quantifier on \w is *, which means zero or more times, \w*? first matches zero times, that is, the empty string. The engine records a backtracking point and passes control to \d+, which tries the first character c and fails. Control returns to \w*?, which matches the c of cfc456n. Being non-greedy, it takes only that one character, records a backtracking point, and passes control to \d+, which tries f and fails again. Control returns to \w*?, which matches f (it now holds cf); \d+ then tries the second c and fails. \w*? matches that c as well (it now holds cfc) and passes control to \d+, which tries 4 and succeeds. Because its quantifier is +, meaning one or more times, \d+ keeps going: it matches 5, then 6, and then tries n, which fails, so \d+ gives up control. Nothing follows \d+ in the pattern, so the whole expression is declared matched. The overall match is cfc456, and the first group is 456. Dear classmates, do you now see why the answer to the question above is 456?
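
The non-greedy case can be checked the same way. A minimal sketch, with an illustrative variable name $m for the match array:

```php
<?php
// Non-greedy \w*? takes as few characters as possible, so \d+ gets the
// first chance at every position and captures all three digits.
preg_match('/\w*?(\d+)/', 'cfc456n', $m);

echo $m[0], "\n"; // whole match: cfc456
echo $m[1], "\n"; // group 1:     456
```

The overall match is the same as with the greedy \w*; only the split between \w and the capture group changes.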

So, did the examples above teach you how greedy and non-greedy matching work? Do you now understand when to use a greedy or a non-greedy quantifier to process your strings?

Brother Bird's article gives the following regular expression and program:

$reg = "/<script>.*?<\/script>/is";
$str = "<script>********</script>"; // length greater than 100014
$ret = preg_replace($reg, "", $str); // returns NULL

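The reason is that the match backtracks too many times: the engine exceeds its backtracking limit (pcre.backtrack_limit in PHP) or runs out of stack space, so preg_replace gives up and returns NULL.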
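
In PHP, a failure of this kind can be detected instead of being passed on silently: preg_replace returns NULL on error, and preg_last_error() reports the reason. A minimal sketch follows. The helper name strip_scripts and the test inputs are hypothetical, and whether the long input actually hits a limit depends on the PHP version and on the pcre.backtrack_limit and pcre.jit settings, so no output is claimed for that part.

```php
<?php
/**
 * Hypothetical helper: remove <script> blocks, failing loudly if PCRE gives up.
 */
function strip_scripts(string $html): string
{
    $result = preg_replace('/<script>.*?<\/script>/is', '', $html);
    if ($result === null) {
        // preg_last_error() returns a PREG_*_ERROR constant, for example
        // PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is exceeded.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo strip_scripts('a<script>alert(1)</script>b'), "\n"; // ab

// A long body like the one in the article; it may or may not fail on a given setup.
$long = '<script>' . str_repeat('*', 200000) . '</script>';
try {
    echo strlen(strip_scripts($long)), "\n";
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```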

Let's look at another example.

The string is

$str = '<script>123456</script>';

The regular expressions are

$strRegex1 = '%<script>.+<\/script>%';
$strRegex2 = '%<script>.+?<\/script>%';
$strRegex3 = '%<script>(?:(?!<\/script>).)+<\/script>%';
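
All three patterns match the same text in this string; what differs is how the engine gets there. The greedy .+ runs to the end of the string and backtracks until </script> can match, the non-greedy .+? advances one character at a time, and the third pattern checks a negative lookahead before consuming each character. A minimal sketch that confirms the matches agree (the array keys are labels chosen for this sketch):

```php
<?php
$str = '<script>123456</script>';

$patterns = [
    'greedy'     => '%<script>.+<\/script>%',
    'non-greedy' => '%<script>.+?<\/script>%',
    'lookahead'  => '%<script>(?:(?!<\/script>).)+<\/script>%',
];

$matches = [];
foreach ($patterns as $name => $regex) {
    preg_match($regex, $str, $m);
    $matches[$name] = $m[0];
    echo $name, ': ', $m[0], "\n"; // each line ends with <script>123456</script>
}
```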

That concludes this introduction to the efficiency of PHP regular expressions: greedy matching, non-greedy matching, and backtracking. I hope it helps. If you have any questions, please leave me a message and I will reply promptly.
