When are recursive regular expressions used? Naturally, when a pattern recurs inside the string being matched (which may sound like stating the obvious). The most typical case is using a recursive regular expression to handle nested parentheses. An example follows.
Assume your text contains properly balanced nested parentheses, nested to an arbitrary depth, and you want to capture one such parenthesized group. The code is as follows:
<?php
$string = "some text (a(b(c)d)e) more text";
if (preg_match("/\(([^()]+|(?R))*\)/", $string, $matches)) {
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>
The result is:
Array
(
    [0] => (a(b(c)d)e)
    [1] => e
)
We can see that the text we want has been captured in $matches[0].
How it works
The key to the regular expression above is (?R). (?R) recursively stands for the entire regular expression that contains it. Conceptually, at each level of recursion the regex engine replaces (?R) with "\(([^()]+|(?R))*\)".
Therefore, for the example above, the regular expression is equivalent to:
"/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/"
However, the expanded expression above only handles parentheses nested up to three levels deep. For nested parentheses of unknown depth, you have to use this regular expression:
"/\(([^()]+|(?R))*\)/"
It not only matches nesting of arbitrary depth but is also much shorter to write: powerful and concise at the same time.
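As a rough check of the difference between the hand-expanded pattern and the recursive one, the sketch below runs both against the article's three-level sample and against a four-level string. The four-level string and the variable names are assumptions made for illustration only.

```php
<?php
// Hand-expanded pattern: handles at most three levels of nesting.
$fixed = '/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/';
// Recursive pattern: (?R) re-enters the whole expression at every level.
$recursive = '/\(([^()]+|(?R))*\)/';

$three = "some text (a(b(c)d)e) more text";
$four  = "some text (a(b(c(z)c)d)e) more text"; // hypothetical, one level deeper

preg_match($fixed, $three, $m1);
preg_match($recursive, $three, $m2);
preg_match($fixed, $four, $m3);
preg_match($recursive, $four, $m4);

echo $m1[0], "\n"; // (a(b(c)d)e)
echo $m2[0], "\n"; // (a(b(c)d)e)
echo $m3[0], "\n"; // (b(c(z)c)d)   only an inner group fits in three levels
echo $m4[0], "\n"; // (a(b(c(z)c)d)e)
```

On the deeper string the expanded pattern still finds a match, but only for an inner group, because it cannot describe the fourth level starting from the outermost parenthesis.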
Now let's take a closer look at how "/\(([^()]+|(?R))*\)/" matches "(a(b(c)d)e)":
The "(c)" part is matched by "\(([^()]+)*\)", that is, the expression with the (?R) alternative left unused. Note that (c) is a miniature of the whole recursion: small as it is, it is matched by the entire regular expression.
In other words, one step further out, (c) is exactly what (?R) matches.
The matching process for "(b(c)d)" is:
"\(" matches "(";
"[^()]+" matches "b";
(?R) matches "(c)";
"[^()]+" matches "d";
"\)" matches ")".
With this matching process in mind, it is easy to see why the second element of the array, $matches[1], is 'e': the substring 'e' is captured in the last iteration of the repeated group. When a capturing group repeats during a match, only its last capture is saved to the array.
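The walk-through of the matching steps and the last-capture rule can be mimicked by hand. The sketch below is not how PCRE is implemented; it is a simplified recursive function without backtracking, and `matchGroup` and its `$last` parameter are hypothetical names introduced here. The function calls itself where the pattern uses (?R), and overwrites `$last` on every pass of the loop, much as the repeated group overwrites $matches[1].

```php
<?php
// Hypothetical helper mirroring \(([^()]+|(?R))*\) without backtracking.
// Returns the offset just past the balanced group starting at $pos,
// or -1 if no balanced group starts there. $last receives the final piece
// consumed by the repeated group at this level.
function matchGroup(string $s, int $pos, ?string &$last = null): int
{
    $len = strlen($s);
    if ($pos >= $len || $s[$pos] !== '(') {
        return -1;                                 // "\(" must match "("
    }
    $pos++;
    while ($pos < $len && $s[$pos] !== ')') {
        $start = $pos;
        if ($s[$pos] === '(') {
            $pos = matchGroup($s, $pos);           // (?R): a nested group
            if ($pos === -1) {
                return -1;
            }
        } else {
            $pos += strcspn($s, '()', $pos);       // "[^()]+": run of non-parentheses
        }
        $last = substr($s, $start, $pos - $start); // each pass overwrites $last
    }
    return $pos < $len ? $pos + 1 : -1;            // "\)" must match ")"
}

$s = "(a(b(c)d)e)";
$end = matchGroup($s, 0, $last);
echo substr($s, 0, $end), "\n"; // (a(b(c)d)e)
echo $last, "\n";               // e
```

At the outermost level the loop consumes "a", then "(b(c)d)", then "e", so "e" is what remains in `$last`.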
You can try this last-capture behaviour yourself: match the string abc123xyz890 with the regular expression ([a-z]+[0-9]+)+ and see what $1 captures. Note that the result does not conflict with the leftmost-longest principle.
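For readers who want to check their answer, the exercise can be run as a short script. The variable name `$m` is chosen here for illustration.

```php
<?php
// The group ([a-z]+[0-9]+) repeats twice: first "abc123", then "xyz890".
preg_match('/([a-z]+[0-9]+)+/', 'abc123xyz890', $m);

echo $m[0], "\n"; // abc123xyz890  (the whole match is still the longest one)
echo $m[1], "\n"; // xyz890        (only the last iteration of the group is kept)
```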
If we only need to capture $matches[0], we can do this:
<?php
$string = "some text (a(b(c)d)e) more text";
if (preg_match("/\((?:[^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>
The overall match is the same, and the array no longer has a second element:
Array
(
    [0] => (a(b(c)d)e)
)
The only change is that the capturing parentheses () have become non-capturing parentheses (?:).
It can be improved further:
<?php
$string = "some text (a(b(c)d)e) more text";
if (preg_match("/\((?>[^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>
Here we use the so-called once-only subpattern (rex's note: in Yu Sheng's translation of "Mastering Regular Expressions", 3rd edition, it is called an "atomic group"; refer to that book). The PHP manual also recommends using it whenever conditions permit, to speed up regular expressions.
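The sketch below compares the non-capturing and once-only versions on a string that matches and on one that does not. The unbalanced sample string and the variable names are assumptions for illustration. The script only checks that both patterns return the same results; the once-only group is expected to help mainly when the match fails, by cutting down backtracking, and that speed difference is not measured here.

```php
<?php
$plain  = '/\((?:[^()]+|(?R))*\)/'; // non-capturing group
$atomic = '/\((?>[^()]+|(?R))*\)/'; // once-only (atomic) group

$balanced   = "some text (a(b(c)d)e) more text";
$unbalanced = "some text (a(b(c d e";  // hypothetical input: no group is ever closed

$plainHit  = preg_match($plain, $balanced, $mp);
$atomicHit = preg_match($atomic, $balanced, $ma);
echo $mp[0], "\n"; // (a(b(c)d)e)
echo $ma[0], "\n"; // (a(b(c)d)e)

$plainMiss  = preg_match($plain, $unbalanced);
$atomicMiss = preg_match($atomic, $unbalanced);
var_dump($plainMiss, $atomicMiss); // int(0) int(0)
```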