Using recursive regular expressions in PHP

The previous article translated a piece on recursive regular expressions in Perl. In fact, the regex engines of many languages support recursion, including PHP, which this article covers. Most regular expressions in everyday use are quite "regular": the basic syntax alone solves more than 85% of problems, and using ordinary regular expressions sensibly and effectively to solve complex problems is a skill in its own right. Still, the more advanced syntax has its value, and sometimes nothing else will do. Besides, part of the pleasure of learning is trying out different possibilities to satisfy one's endless curiosity.

This article is adapted from the web article "Finer points of PHP regular expressions". Its analysis is careful and methodical, and worth reading. The original systematically lists the common features of PHP regular expressions; I have translated only the part on recursion.


Body
Example
When is a recursive regular expression needed? When a pattern occurs recursively in the string being matched (which may sound like stating the obvious). The classic example is handling nested parentheses, as follows.

Suppose your text contains correctly paired nested parentheses, and the nesting can be arbitrarily deep. You want to capture such a parenthesized group.

Forgive the spoiler; the standard answer is this:

<?php
$string = "Some text (a (b (c) d) e) more text";
// (?R) recurses into the whole pattern, so nested groups match at any depth.
if (preg_match("/\(([^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>

Its output is:

Array
(
    [0] => (a (b (c) d) e)
    [1] =>  e
)

As we can see, the text we need has been captured in $matches[0].

Principle
Now let's consider how it works.

The key point in the regular expression above is (?R). What (?R) does is recursively substitute the entire regular expression in which it appears. Conceptually, at each level of recursion the engine behaves as if (?R) were replaced with "\(([^()]+|(?R))*\)".

Thus, the regular expression of the above example is equivalent to:

"/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/"

However, the pattern above only handles parentheses nested up to 3 levels deep. For nesting of unknown depth, you have to use this regular expression:

"/\(([^()]+|(?R))*\)/"

It not only matches nesting of unlimited depth but also simplifies the regular expression. The syntax is both powerful and concise.
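To see the difference in practice, here is a minimal sketch that runs the fixed-depth pattern and the recursive pattern against inputs nested three and four levels deep. The variable names and the four-level test string are illustrative assumptions, not part of the original article.

```php
<?php
// Fixed-depth pattern from above: handles at most 3 levels of nesting.
$fixed = "/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/";
// Recursive pattern: (?R) re-enters the whole pattern at each nested "(".
$recursive = "/\(([^()]+|(?R))*\)/";

$depth3 = "x (a (b (c) d) e) y";
$depth4 = "x (a (b (c (z) c) d) e) y";

preg_match($fixed, $depth3, $m);
$fixedOn3 = $m[0];
preg_match($recursive, $depth3, $m);
$recursiveOn3 = $m[0];

// With 4 levels, the fixed pattern cannot match from the outermost "(",
// so the search moves on and matches an inner group instead.
preg_match($fixed, $depth4, $m);
$fixedOn4 = $m[0];
preg_match($recursive, $depth4, $m);
$recursiveOn4 = $m[0];

echo "fixed, depth 3:     $fixedOn3\n";
echo "recursive, depth 3: $recursiveOn3\n";
echo "fixed, depth 4:     $fixedOn4\n";
echo "recursive, depth 4: $recursiveOn4\n";
```

Note that the fixed-depth pattern does not report an error on the deeper input; it quietly returns a shorter, inner match, which is easy to overlook.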

Now let's take a closer look at how "/\(([^()]+|(?R))*\)/" matches "(a (b (c) d) e)":

The "(c)" part is matched by the pattern "\(([^()]+)*\)". Note that (c) is in effect a miniature of the entire recursion, small but complete, so it is matched using the entire regular expression.
In other words, one level up, (c) can be matched by (?R).
(b (c) d) is matched as follows:
"\(" matches "(";
"[^()]+" matches "b ";
(?R) matches "(c)";
"[^()]+" matches " d";
"\)" matches ")".
Given this matching process, it is not hard to see why the second element of the array, $matches[1], is ' e' (with a leading space). The substring ' e' is captured in the last iteration of the repeated group, and only the last captured result is saved to the array during the match.
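The walk-through above can be checked directly. The following is a small sketch, with hypothetical variable names, that applies the same pattern to each nested piece on its own and prints the captured element with var_export so that the leading space is visible.

```php
<?php
$pattern = "/\(([^()]+|(?R))*\)/";

// Each nested piece is itself a complete match for the whole pattern,
// which is why (?R) can stand in for it one level up.
preg_match($pattern, "(c)", $inner);
preg_match($pattern, "(b (c) d)", $middle);
preg_match($pattern, "(a (b (c) d) e)", $outer);

// Element 1 holds only the last iteration of the repeated group.
foreach ([$inner, $middle, $outer] as $m) {
    echo $m[0], " => ";
    var_export($m[1]);
    echo "\n";
}
```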

Rex's note: you can try this behavior yourself. Match the string abc123xyz890 with the pattern ([a-z]+[0-9]+)+ and see what is captured. Note that the result does not conflict with the leftmost-longest principle.
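For readers who would rather see the experiment than run it, here is a minimal sketch of it. The variable names are illustrative, and the preg_match_all call at the end is an added suggestion rather than something from the original note.

```php
<?php
// The group ([a-z]+[0-9]+) repeats twice: first "abc123", then "xyz890".
// The whole match covers both, but the group keeps only its last iteration.
preg_match("/([a-z]+[0-9]+)+/", "abc123xyz890", $m);
print_r($m);

// To collect every repetition, match the unrepeated group globally instead.
preg_match_all("/[a-z]+[0-9]+/", "abc123xyz890", $all);
print_r($all[0]);
```

The overall match is still the longest one starting at the leftmost position; only the contents of the capture group reflect the final iteration.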

If we only need $matches[0], we can do this:

<?php
$string = "Some text (a (b (c) d) e) more text";
// (?:...) groups without capturing, so only $matches[0] is filled in.
if (preg_match("/\((?:[^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>

This produces the same overall match:

Array
(
    [0] => (a (b (c) d) e)
)

The change is to replace the capturing parentheses () with the non-capturing parentheses (?:).

It can be improved further:

<?php
$string = "Some text (a (b (c) d) e) more text";
// The group is now atomic: once it has matched, it is not backtracked into.
if (preg_match("/\((?>[^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>

Here we use a so-called once-only subpattern, written (?>...). (Rex's note: Yu Sheng's translation of "Mastering Regular Expressions", 3rd edition, calls this an "atomic group"; see that book.) The PHP manual also recommends using this construct wherever possible, because it can speed up regular expressions.
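As a quick check, the sketch below runs the three variants from this article side by side and then tries the atomic form on an unbalanced input. The array keys, variable names and the unbalanced test string are assumptions made for illustration; no timing is measured here.

```php
<?php
$variants = [
    "capturing"     => "/\(([^()]+|(?R))*\)/",
    "non-capturing" => "/\((?:[^()]+|(?R))*\)/",
    "atomic"        => "/\((?>[^()]+|(?R))*\)/",
];
$string = "Some text (a (b (c) d) e) more text";

// All three variants find the same overall match in $m[0].
$results = [];
foreach ($variants as $name => $pattern) {
    preg_match($pattern, $string, $m);
    $results[$name] = $m[0];
    echo $name, ": ", $m[0], "\n";
}

// Unbalanced input: an opening parenthesis that is never closed.
// The atomic group never re-splits the run of "a" characters it consumed,
// so the attempt fails without exploring those alternatives.
$unbalanced = "(" . str_repeat("a", 30);
$atomicResult = preg_match($variants["atomic"], $unbalanced);
echo "atomic on unbalanced input: ", $atomicResult, "\n";
```

The non-atomic variants nest one quantifier inside another, which is the shape that tends to backtrack heavily on inputs like this one, so the atomic form is the safer default when the captured sub-group is not needed.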

Once-only subpatterns are simple and are not covered in detail here. If you are interested, refer to the official PHP manual. To learn more about Perl-compatible regular expressions, see the links at the end of this article.

  • Original: Finer points of PHP regular expressions
  • Official Perl Compatible Regular Expressions (PCRE) documentation
  • PCRE regular expression documentation on the official PHP website