Matching nested parentheses in PHP with recursive regular expressions
The content of this article is collated from the web article "Finer points of PHP regular expressions". Its analysis is methodical and precise, and it is well worth reading. That article systematically lists the common features of regular expressions in PHP; I have translated and organized only the part on recursion.
A previous article covered recursive regular expressions in Perl. In fact, the regular expression engines of many languages support recursion, including PHP's, which this article introduces. Admittedly, the regular expressions used most often at work are the "ordinary" ones: the most basic syntax alone solves more than 85% of problems, and using ordinary regular expressions sensibly and efficiently to solve complex problems is a skill in its own right. But the more advanced syntax does have its value, and sometimes you cannot do without it. Besides, the fun of learning lies in trying every possibility to satisfy your endless curiosity.
Body
Example
When would you use a recursive regular expression? When the string to be matched contains a pattern that itself recurs (which sounds like stating the obvious). The classic example is using a recursive regular expression to handle nested parentheses, as follows.
Suppose your text contains nested parentheses that are correctly paired. The nesting can be arbitrarily deep. You want to capture such a parenthesized group.
Forgive me for jumping straight to the answer:
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\(([^()]+|(?R))*\)/", $string, $matches))
    {
        echo "<pre>"; print_r($matches); echo "</pre>";
    }
The output is:
    Array
    (
        [0] => (a(b(c)d)e)
        [1] => e
    )
As you can see, the text we need has been captured in $matches[0].
Principle
Now consider how it works.
The key to the regular expression above is (?R). (?R) recursively stands for the entire regular expression in which it appears. At each level of recursion, the regex engine behaves as if (?R) had been replaced by "\(([^()]+|(?R))*\)".
So, for the example above, whose parentheses are nested three levels deep, the regular expression is equivalent to:
"/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/"
But the pattern above only works for parentheses nested at most 3 levels deep. For parentheses nested to an unknown depth, you have to use this regular expression:
"/\(([^()]+|(?R))*\)/"
It not only matches nesting of any depth, it also makes the regular expression simpler. Powerful, and simple in syntax.
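The difference between the hand-expanded pattern and the recursive one can be checked with a short sketch. The four-level test string below is my own assumption for illustration and is not from the original article; the two patterns are the ones quoted above.

```php
<?php
// Hypothetical subject with four levels of nesting,
// one more than the hand-expanded pattern supports.
$string = "(a(b(c(d)e)f)g)";

// Hand-expanded pattern from the text: handles at most three levels.
$fixed = '/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/';
// Recursive pattern from the text: handles any depth.
$recursive = '/\(([^()]+|(?R))*\)/';

preg_match($fixed, $string, $fixedMatches);
preg_match($recursive, $string, $recursiveMatches);

// The expanded pattern cannot match from the outermost "(",
// so it settles for the three-level group starting at "(b".
echo $fixedMatches[0], "\n";      // (b(c(d)e)f)
// The recursive pattern matches the whole four-level group.
echo $recursiveMatches[0], "\n";  // (a(b(c(d)e)f)g)
```

The expanded pattern does not fail outright here; it silently returns a shorter inner group, which is arguably harder to notice than no match at all.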
Now take a closer look at how "/\(([^()]+|(?R))*\)/" matches "(a(b(c)d)e)":
"(c)" is matched by the part "\(([^()]+)*\)". Note that (c) is in effect a miniature of the whole recursive structure, complete in itself, so the entire regular expression can match it.
In other words, one level up, (c) can be matched by (?R).
(b(c)d) is matched as follows:
"\(" matches "(";
"[^()]+" matches "b";
(?R) matches "(c)";
"[^()]+" matches "d";
"\)" matches ")".
Given this matching process, it is easy to see why $matches[1], the 2nd element of the array, is 'e'. The substring 'e' is captured in the last iteration of the match. During matching, only the last captured result is kept in the array.
Rex's note: to see this behavior for yourself, match the string abc123xyz890 against the regular expression ([a-z]+[0-9]+)+ and look at what is captured. Note that the result does not conflict with the leftmost-longest principle.
If we only need $matches[0], we can do this:
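To watch this happen at every level, the sketch below restarts the search just past the start of each match, so each nested group is matched by the whole pattern in turn. The restart loop and the variable names are my own illustration, not part of the original article; it relies only on the offset argument and the PREG_OFFSET_CAPTURE flag of preg_match.

```php
<?php
$string  = "some text (a(b(c)d)e) more text";
$pattern = '/\(([^()]+|(?R))*\)/';

$offset = 0;
$found  = [];
// Restart the search one character past the start of the previous match,
// so groups nested inside it are found as matches in their own right.
while (preg_match($pattern, $string, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    // With PREG_OFFSET_CAPTURE each entry is [text, byte offset].
    $found[] = ['match' => $m[0][0], 'group1' => $m[1][0]];
    $offset  = $m[0][1] + 1;
}

foreach ($found as $row) {
    echo $row['match'], " => ", $row['group1'], "\n";
}
// (a(b(c)d)e) => e
// (b(c)d) => d
// (c) => c
```

In each row, group 1 holds only what the last iteration of the group matched, which is why the outermost match reports 'e'.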
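Rex's experiment can be run as a minimal sketch like this; the variable names are mine.

```php
<?php
$subject = "abc123xyz890";

// The group ([a-z]+[0-9]+) is repeated twice: first "abc123", then "xyz890".
preg_match('/([a-z]+[0-9]+)+/', $subject, $matches);

echo $matches[0], "\n";  // abc123xyz890  (the overall match is still the longest one)
echo $matches[1], "\n";  // xyz890        (only the last repetition of the group is kept)
```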
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\((?:[^()]+|(?R))*\)/", $string, $matches))
    {
        echo "<pre>"; print_r($matches); echo "</pre>";
    }
This captures the same text in $matches[0]:
    Array
    (
        [0] => (a(b(c)d)e)
    )
The change is that the capturing parentheses () have been replaced by the non-capturing parentheses (?:).
It can be improved further to:
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\((?>[^()]+|(?R))*\)/", $string, $matches))
    {
        echo "<pre>"; print_r($matches); echo "</pre>";
    }
Here we have used a so-called once-only subpattern (Rex's note: Mr. Yu's Chinese edition of "Mastering Regular Expressions", 3rd edition, calls it an "atomic group"; see that book). The PHP manual also recommends using this construct wherever possible, since it can speed up regular expressions.
Once-only subpatterns are simple and are not covered in detail here. If you are interested, see the official PHP manual. To learn more about Perl-compatible regular expressions, see the links at the end of this article.
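As a rough way to see where the once-only form may help, the sketch below times the non-capturing and once-only patterns on a subject whose first "(" is never closed. The subject string, the labels and the timing loop are my own assumptions, and the numbers depend on the PHP version, the PCRE build and settings such as pcre.jit and pcre.backtrack_limit, so treat it as an experiment to run rather than a benchmark.

```php
<?php
// Hypothetical subject: the first "(" is never closed, and a small
// balanced "()" follows, so the first match attempt has to fail.
$subject = "(" . str_repeat("a", 20) . "()";

$patterns = [
    'plain'  => '/\((?:[^()]+|(?R))*\)/',  // non-capturing group
    'atomic' => '/\((?>[^()]+|(?R))*\)/',  // once-only (atomic) group
];

$results = [];
$matched = [];
foreach ($patterns as $label => $pattern) {
    $m = [];
    $start = microtime(true);
    // preg_match returns 1 on a match, 0 on no match, and false on an
    // error such as exceeding pcre.backtrack_limit.
    $results[$label] = preg_match($pattern, $subject, $m);
    $elapsedMs = (microtime(true) - $start) * 1000;
    $matched[$label] = $m[0] ?? null;
    printf("%-6s result=%s time=%.3f ms\n",
        $label, var_export($results[$label], true), $elapsedMs);
}
```

With the once-only group, the run of "a" characters is given up as a single unit when the attempt at the first "(" fails, and the pattern then matches the trailing "()". The non-capturing version may first retry many ways of splitting that run, and on some configurations it may return false instead of a match.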