PHP: matching nested text inside parentheses

Source: Internet
Author: User

The content of this article is collated from the web article "Finer points of PHP regular expressions". Its analysis is meticulous and tightly reasoned, and well worth reading. That article systematically lists the common features of regular expressions in PHP; I have translated and organized only the part on recursion.

A previous article translated a piece on recursive regular expressions in Perl. In fact, the regex engines of many languages support recursion, including PHP's, which this article introduces. In day-to-day work, the regular expressions used most often are "regular" in the ordinary sense: the most basic syntax alone solves more than 85% of problems, and using ordinary regexes sensibly and efficiently to solve complex problems is itself a skill worth learning. But more advanced syntax does have its value, and sometimes nothing else will do. Besides, the fun of learning lies in trying all kinds of possibilities to satisfy your endless curiosity.



Body
Example
When would you use a recursive regular expression? When the string to be matched contains a pattern that recurs inside itself (which may sound like stating the obvious). The classic example is using a recursive regex to handle nested parentheses, as follows.

Suppose your text contains nested parentheses that are correctly paired. The nesting can be arbitrarily deep. You want to capture such a parenthesized group.

To get straight to the point, the answer is this:

    <?php
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\(([^()]+|(?R))*\)/", $string, $matches))
    {
        // $matches[0] is the whole match, $matches[1] the captured group.
        print_r($matches);
    }
    ?>

The output is:

    Array
    (
        [0] => (a(b(c)d)e)
        [1] => e
    )

As you can see, the text we need has been captured in $matches[0].

Principle
Now let us look at how it works.

The key element in the regular expression above is (?R). (?R) recursively stands for the entire regular expression in which it appears. Conceptually, at each level of recursion the regex engine replaces (?R) with "\(([^()]+|(?R))*\)".

So for the example above, which nests three levels deep, the regular expression is equivalent to:

"/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/"

But that expression only works for parentheses nested at most 3 levels deep. For parentheses nested to an unknown depth, you have to use this regular expression:

"/\(([^()]+|(?R))*\)/"

It not only matches nesting of arbitrary depth but also simplifies the expression: powerful, with simple syntax.
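
The difference between the fixed expansion and the recursive form can be checked with a minimal sketch like the one below. The four-level input string and the variable names are assumptions made for illustration; only the two patterns come from the text above.

```php
<?php
// Fixed expansion from the article: handles at most three levels of nesting.
$fixed = '/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/';
// Recursive form: (?R) re-enters the whole pattern at any depth.
$recursive = '/\(([^()]+|(?R))*\)/';

// Hypothetical test input with four levels of nesting.
$deep = 'x (a(b(c(d)e)f)g) y';

preg_match($fixed, $deep, $fixedMatch);
preg_match($recursive, $deep, $recursiveMatch);

echo $fixedMatch[0], "\n";      // (b(c(d)e)f)
echo $recursiveMatch[0], "\n";  // (a(b(c(d)e)f)g)
```

Because the pattern is not anchored, the fixed version still finds a match, but only the inner three levels; the recursive version returns the complete outer group.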

Now take a closer look at how "/\(([^()]+|(?R))*\)/" matches "(a(b(c)d)e)":

"(c)" is matched by the part "\(([^()]+)*\)" of the regex. Note that "(c)" is in effect a miniature of the whole recursive structure, complete in itself, so it can be matched by the entire regular expression.
In other words, in the next step "(c)" can be matched by (?R).
"(b(c)d)" is matched as follows:
"\(" matches "(";
"[^()]+" matches "b";
(?R) matches "(c)";
"[^()]+" matches "d";
"\)" matches ")".
Given this matching process, it is easy to see why $matches[1], the second element of the array, is 'e'. The substring 'e' is captured in the last iteration of the capturing group. During matching, only the last value captured by a group is kept in the array.
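
The walk-through above can be reproduced with a short sketch that runs the same pattern against each nesting level on its own. The loop and the $lastCaptures array are illustrative assumptions, not part of the original article.

```php
<?php
$pattern = '/\(([^()]+|(?R))*\)/';

// Collects what group 1 holds after matching each sample subject.
$lastCaptures = [];

foreach (['(c)', '(b(c)d)', '(a(b(c)d)e)'] as $subject) {
    preg_match($pattern, $subject, $m);
    // $m[0] is the whole balanced group; $m[1] is only the last value group 1 captured.
    $lastCaptures[$subject] = $m[1];
    printf("%-12s => [0] %s, [1] %s\n", $subject, $m[0], $m[1]);
}
// Group 1 ends up as "c", "d" and "e" respectively.
```

In each case group 1 keeps only the text from its final iteration, which is why the full example reports 'e'.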

Rex's note: to see this behaviour for yourself, match the string abc123xyz890 against the regex ([a-z]+[0-9]+)+ and look at what the group captures. Note that the result does not conflict with the leftmost-longest principle.
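
The experiment suggested in the note takes only a few lines; this is a minimal sketch using the regex and string given there.

```php
<?php
// The repeated group matches "abc123" first, then "xyz890".
preg_match('/([a-z]+[0-9]+)+/', 'abc123xyz890', $m);

echo $m[0], "\n"; // abc123xyz890 (the overall match is still the longest one)
echo $m[1], "\n"; // xyz890       (the group keeps only its last iteration)
```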

If we only need $matches[0], we can write:

    <?php
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\((?:[^()]+|(?R))*\)/", $string, $matches))
    {
        // (?: ... ) groups without capturing, so only $matches[0] is set.
        print_r($matches);
    }
    ?>

This produces the same overall match, without the extra captured element:

    Array
    (
        [0] => (a(b(c)d)e)
    )

The change is to replace the capturing parentheses () with the non-capturing parentheses (?:).

This can be improved further:

    <?php
    $string = "some text (a(b(c)d)e) more text";
    if (preg_match("/\((?>[^()]+|(?R))*\)/", $string, $matches))
    {
        // (?> ... ) is a once-only (atomic) group: no backtracking into it.
        print_r($matches);
    }
    ?>

Here we have used the so-called once-only subpattern. (Rex's note: Yu Sheng's Chinese translation of "Mastering Regular Expressions", 3rd edition, calls this an "atomic group"; refer to that book.) The PHP manual also recommends using this construct where possible, because it can speed up regular expressions.

Once-only subpatterns are simple and are not described in detail here. If you are interested, refer to the official PHP manual. If you want to learn more about Perl-compatible regular expressions, please refer to the links at the end of this article.
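
The effect of the once-only group shows up mainly on input that does not match. The sketch below is an illustration under stated assumptions: the timeMatch helper and the unbalanced test string are hypothetical, and the timings depend on the PHP version, the PCRE JIT setting, and pcre.backtrack_limit, so no specific numbers are claimed.

```php
<?php
// Hypothetical helper: runs a pattern and reports the result and the elapsed time.
function timeMatch(string $pattern, string $subject): array
{
    $start = microtime(true);
    // preg_match returns 1 on a match, 0 on no match, false on error
    // (for example when the backtrack limit is exceeded).
    $result = preg_match($pattern, $subject);
    return [$result, microtime(true) - $start];
}

// Unbalanced input: an opening parenthesis that is never closed.
$subject = '(' . str_repeat('a', 18);

[$plainResult, $plainTime]   = timeMatch('/\((?:[^()]+|(?R))*\)/', $subject);
[$atomicResult, $atomicTime] = timeMatch('/\((?>[^()]+|(?R))*\)/', $subject);

printf("non-capturing: %s in %.6f s\n", var_export($plainResult, true), $plainTime);
printf("once-only:     %s in %.6f s\n", var_export($atomicResult, true), $atomicTime);
```

With the once-only group the engine cannot re-split the run of "a" characters in different ways, so it gives up after a single pass. The plain group may try many splits before reporting failure, or stop with an error if a limit is reached.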
