When are recursive regular expressions used? Naturally, when a pattern recurs inside the string being matched (which may sound like stating the obvious). The most typical example is using a recursive regular expression to handle nested parentheses.
Example.
Assume that your text contains properly matched, nested parentheses, and that the nesting can be arbitrarily deep. You want to capture such a parenthesized group.
The standard answer is as follows:
    <?php
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\(([^()]+|(?R))*\)/", $string, $matches)) {
        echo "<pre>";
        print_r($matches);
        echo "</pre>";
    }
    ?>

The output is:

    Array
    (
        [0] => (a(b(c)d)e)
        [1] => e
    )
We can see that the text we wanted has been captured in $matches[0].
How it works.
The key to the regular expression above is (?R). (?R) recursively stands for the entire regular expression in which it appears. Conceptually, each time the regex engine reaches (?R), it behaves as if (?R) had been replaced with "\(([^()]+|(?R))*\)".
Therefore, for the example above, the regular expression is equivalent to:
    "/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/"
However, the expression above only handles parentheses nested up to three levels deep. For nested parentheses of unknown depth, you have to use this regular expression:
    "/\(([^()]+|(?R))*\)/"
It not only matches nesting of arbitrary depth but also simplifies the regular expression itself: powerful functionality with concise syntax.
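The difference between the hand-expanded pattern and the recursive one can be checked with a small sketch. The two test strings and the variable names below are illustrative assumptions, not part of the original example; the second string nests one level deeper than the expanded pattern allows.

```php
<?php
// Hand-expanded pattern from the text: covers at most three nesting levels.
$fixedDepth = '/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/';
// Recursive pattern: (?R) re-enters the whole expression at any depth.
$recursive  = '/\(([^()]+|(?R))*\)/';

// Hypothetical inputs chosen for illustration.
$threeDeep = 'x (a(b(c)d)e) y';
$fourDeep  = 'x (a(b(c(z)c)d)e) y';

preg_match($fixedDepth, $threeDeep, $m1); // $m1[0] is "(a(b(c)d)e)"
preg_match($recursive,  $threeDeep, $m2); // $m2[0] is "(a(b(c)d)e)"
preg_match($fixedDepth, $fourDeep,  $m3); // $m3[0] is "(b(c(z)c)d)": the outer group is missed
preg_match($recursive,  $fourDeep,  $m4); // $m4[0] is "(a(b(c(z)c)d)e)"

echo $m3[0], "\n", $m4[0], "\n";
```

On the deeper input the expanded pattern cannot match from the outermost parenthesis, so it falls back to the first inner group that fits within three levels, while the recursive pattern still captures the whole group.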
Now let's take a closer look at how "/\(([^()]+|(?R))*\)/" matches "(a(b(c)d)e)":
1. "(c)" is matched directly, without further recursion: "\(" matches "(", "[^()]+" matches "c", and "\)" matches ")". Note that (c) is a microcosm of the whole recursion: small as it is, it exercises the entire regular expression. In other words, one level up, "(c)" is exactly what (?R) matches.
2. The matching process for "(b(c)d)" is:
   1. "\(" matches "(";
   2. "[^()]+" matches "b";
   3. (?R) matches "(c)";
   4. "[^()]+" matches "d";
   5. "\)" matches ")".
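The steps above can be mirrored by a hand-written recursive function. This is only a sketch to make the role of (?R) visible, not how the regex engine is implemented; the function name `matchGroup` and its return convention are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the offset just past the balanced group
// that starts at $pos, or -1 if no balanced group starts there.
function matchGroup(string $s, int $pos): int
{
    $len = strlen($s);
    if ($pos >= $len || $s[$pos] !== '(') {
        return -1;                                // "\(" must match "("
    }
    $pos++;
    while ($pos < $len) {
        if ($s[$pos] === ')') {
            return $pos + 1;                      // "\)" matches ")"
        }
        if ($s[$pos] === '(') {
            $pos = matchGroup($s, $pos);          // plays the role of (?R)
            if ($pos === -1) {
                return -1;
            }
        } else {
            $pos += strcspn($s, '()', $pos);      // plays the role of [^()]+
        }
    }
    return -1;                                    // ran out of input: unbalanced
}

$text = '(b(c)d)';
$end  = matchGroup($text, 0);
echo substr($text, 0, $end), "\n";                // prints "(b(c)d)"
```

The recursive call is made exactly where the pattern contains (?R), which is why "(c)" is handled by the same logic as the group that encloses it.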
Given the matching process above, it is not hard to see why the second element of the array, $matches[1], is 'e'. The substring 'e' is captured in the last iteration of the match; when a capturing group is repeated, only its last capture is stored in the array.
Rex's note: to see this behavior for yourself, try matching the string abc123xyz890 with the regular expression ([a-z]+[0-9]+)+ and check what $1 captures. Note that the result does not conflict with the leftmost-longest rule.
If we only need $matches[0], we can do this:
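The experiment suggested in the note can be run as a short sketch. The variable name `$m` is arbitrary.

```php
<?php
// The group ([a-z]+[0-9]+) is repeated twice over "abc123xyz890":
// first it captures "abc123", then "xyz890". Only the last capture is kept.
preg_match('/([a-z]+[0-9]+)+/', 'abc123xyz890', $m);

echo $m[0], "\n"; // whole match: abc123xyz890
echo $m[1], "\n"; // last capture of group 1: xyz890
```

The overall match still starts at the leftmost possible position and extends as far as it can; only the content of the repeated group is overwritten on each iteration.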
    <?php
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\((?:[^()]+|(?R))*\)/", $string, $matches)) {
        echo "<pre>";
        print_r($matches);
        echo "</pre>";
    }
    ?>

The result is the same:

    Array
    (
        [0] => (a(b(c)d)e)
    )

The change is to replace the capturing parentheses () with non-capturing parentheses (?:).
It can also be further improved:
    <?php
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\((?>[^()]+|(?R))*\)/", $string, $matches)) {
        echo "<pre>";
        print_r($matches);
        echo "</pre>";
    }
    ?>
Here we use the so-called once-only subpattern (Rex's note: in Yu Sheng's translation of "Mastering Regular Expressions, 3rd Edition" it is called an "atomic group"; see that book for details). The PHP manual also recommends using it whenever conditions permit, to speed up regular expressions.
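As a quick side-by-side check, the three variants discussed in this article can be run against the same input. This is a minimal sketch; the array keys and the unbalanced test string are assumptions added for illustration.

```php
<?php
$patterns = [
    'capturing'     => '/\(([^()]+|(?R))*\)/',
    'non-capturing' => '/\((?:[^()]+|(?R))*\)/',
    'atomic'        => '/\((?>[^()]+|(?R))*\)/',
];

$string  = 'some text (a(b(c)d)e) more text';
$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $string, $m);
    $results[$name] = $m;   // all three agree on $m[0]
}
print_r($results);

// Hypothetical unbalanced input: the atomic variant reports no match.
$unbalanced = preg_match($patterns['atomic'], '(a(b', $none);
```

All three patterns produce the same $matches[0]; only the capturing variant adds a second element. The atomic group does not change what is matched here. Its benefit is that the engine never retries shorter splits of a run of non-parenthesis characters, which matters mostly when the input fails to match.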