Use of recursive Regular Expressions in PHP

Source: Internet
Author: User

In the previous article, we translated a piece on recursive regular expressions in Perl. In fact, regular expressions in many languages support recursion, including PHP, which is the subject of this article. Most regular expressions used in everyday work are quite "regular": the most basic syntax alone solves more than 85% of problems, and using regular expressions sensibly and effectively to solve complex problems is a skill in its own right. Still, the more advanced syntax has its value, and sometimes you cannot do without it. Besides, part of the pleasure of learning regular expressions lies in trying out every possibility to satisfy your endless curiosity.

The content of this article is taken from "Finer points of PHP regular expressions". Its analysis is worth reading, and it systematically lists common features of regular expressions in PHP. I have extracted only the part on recursive regular expressions and translated it.


Body
Example
When are recursive regular expressions used? When a pattern occurs recursively in the string to be matched, of course (which sounds like a tautology). The most typical example is handling nested parentheses, as follows.

Assume that your text contains properly balanced nested parentheses, nested to an arbitrary depth, and you want to capture such a parenthesized group.

The standard answer is as follows:

<?php
$string = "some text (a(b(c)d)e) more text";
// (?R) recurses into the whole pattern, so any nesting depth is matched
if (preg_match("/\(([^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>

The output result is:

Array
(
    [0] => (a(b(c)d)e)
    [1] => e
)

We can see that the text we need has been captured in $matches[0].
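As a minimal sketch of how this pattern might be wrapped for reuse, the helper below returns the first balanced group, or null when there is none. The name first_balanced_group is made up for this illustration and is not part of PHP, and the test strings are likewise assumptions.

```php
<?php
// Hypothetical helper for illustration: returns the first balanced
// parenthesized group in $text, or null when there is none.
function first_balanced_group(string $text): ?string
{
    // Same pattern as above; (?R) recurses into the whole pattern.
    if (preg_match('/\(([^()]+|(?R))*\)/', $text, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

echo first_balanced_group("some text (a(b(c)d)e) more text"), "\n"; // (a(b(c)d)e)
var_dump(first_balanced_group("no parentheses here"));              // NULL
// With an unclosed outer parenthesis, the search moves on and finds
// the first group that is balanced.
echo first_balanced_group("(a(b)"), "\n";                            // (b)
```

The last call shows that the pattern is not anchored: when the outermost parenthesis is never closed, preg_match simply retries from later positions.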

Principle
How it works.

The key point in the above regular expression is (?R). (?R) recursively stands in for the entire regular expression in which it appears. Conceptually, at each level of recursion the regex engine replaces (?R) with "\(([^()]+|(?R))*\)".

Therefore, for the example above, the regular expression is equivalent to:

"/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/"

However, the expression above only works for parentheses nested up to three levels deep. For nested parentheses of unknown depth, you have to use this regular expression:

"/\(([^()]+|(?R))*\)/"

It not only matches nesting of arbitrary depth (subject to the engine's resource limits) but also simplifies the expression: powerful and concise at once.
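To make the difference concrete, the following sketch runs both patterns against a string nested four levels deep (the test string is made up for this illustration). The hand-expanded pattern cannot match the outermost group and settles for an inner group that is only three levels deep, while the recursive pattern matches the whole group.

```php
<?php
// Hand-expanded pattern: handles at most three nesting levels.
$fixed = '/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/';
// Recursive pattern: (?R) re-enters the whole pattern at each level.
$recursive = '/\(([^()]+|(?R))*\)/';

// Illustrative input with four levels of nesting.
$deep = 'x (a(b(c(d)e)f)g) y';

preg_match($fixed, $deep, $m1);
preg_match($recursive, $deep, $m2);

echo $m1[0], "\n"; // (b(c(d)e)f)      only an inner, three-level group
echo $m2[0], "\n"; // (a(b(c(d)e)f)g)  the full outermost group
```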

Now let's take a closer look at how "/\(([^()]+|(?R))*\)/" matches "(a(b(c)d)e)":

The "(c)" part is matched by "\(([^()]+)*\)", that is, without taking the (?R) branch. Note that (c) is a microcosm of the whole recursion: small as it is, it exercises the entire regular expression.
In other words, one level up, (c) can be matched by (?R).
The matching process for "(b(c)d)" is:
"\(" matches "(";
"[^()]+" matches "b";
(?R) matches "(c)";
"[^()]+" matches "d";
"\)" matches ")".
Given this matching process, it is not hard to see why the second element of the array, $matches[1], is 'e': the substring 'e' is captured in the last iteration of the group. When a capturing group repeats, only its last capture is saved to the array.

Rex's note: to see this behavior for yourself, match the string abc123xyz890 against the regular expression ([a-z]+[0-9]+)+ and check what ends up in $1. Note that the result does not conflict with the leftmost-longest principle.
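The experiment suggested in the note can be run as below. This is only a sketch: the variable names and the extra test string (a(b)) are additions for illustration, chosen to show that the saved capture is whatever the group matched in its last iteration, even when that is a nested group.

```php
<?php
// A repeated capturing group keeps only the text of its last iteration.
preg_match('/([a-z]+[0-9]+)+/', 'abc123xyz890', $m);
echo $m[0], "\n"; // abc123xyz890 (the overall match is still the longest)
echo $m[1], "\n"; // xyz890 (the first iteration, abc123, was overwritten)

// The same rule applies to the recursive pattern from this article.
preg_match('/\(([^()]+|(?R))*\)/', '(a(b(c)d)e)', $r);
echo $r[1], "\n"; // e

// Here the last iteration of the group is the nested group itself.
preg_match('/\(([^()]+|(?R))*\)/', '(a(b))', $s);
echo $s[1], "\n"; // (b)
```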

If we only need $matches[0], we can do this:

<?php
$string = "some text (a(b(c)d)e) more text";
// (?: ... ) groups without capturing, so only $matches[0] is filled
if (preg_match("/\((?:[^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>

The overall match is the same, but $matches[1] is no longer present:
Array
(
    [0] => (a(b(c)d)e)
)

The change is to replace the capturing parentheses () with non-capturing parentheses (?:).

It can also be further improved:

<?php
$string = "some text (a(b(c)d)e) more text";
// (?> ... ) is a once-only (atomic) group: no backtracking into it
if (preg_match("/\((?>[^()]+|(?R))*\)/", $string, $matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>";
}
?>

Here we use a so-called once-only subpattern (Rex's note: in Yu Sheng's translation of "Mastering Regular Expressions", 3rd edition, this is called an "atomic group"; see that book for details). The PHP manual also recommends using this construct wherever conditions permit, to speed up regular expressions.

Once-only subpatterns are simple and are not covered in detail here. If you are interested, see the official PHP manual. To learn more about Perl-compatible regular expressions, see the links at the end of this article.
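As a quick sanity check, the sketch below runs the three variants from this article against the same string and prints the overall match together with the number of elements in $matches. The labels and the $patterns array are illustrative names only, and the script checks that the variants agree; it does not measure speed.

```php
<?php
$string = "some text (a(b(c)d)e) more text";

// The three variants discussed in this article.
$patterns = [
    'capturing'     => '/\(([^()]+|(?R))*\)/',
    'non-capturing' => '/\((?:[^()]+|(?R))*\)/',
    'once-only'     => '/\((?>[^()]+|(?R))*\)/',
];

foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $string, $matches);
    // count($matches) is 2 for the capturing variant and 1 for the others.
    printf("%-13s %s (%d element(s))\n", $label, $matches[0], count($matches));
}
```

All three variants report (a(b(c)d)e) as the overall match; only the capturing variant adds a second element to the array.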

  • Original article: Finer points of PHP regular expressions
  • Perl-Compatible Regular Expressions official documentation
  • PCRE regular expression documentation on the official PHP website