PHP PCRE regular expressions: comments and recursive patterns

Source: Internet
Author: User

Comments

The sequence (?# marks the start of a comment that continues up to the next closing parenthesis. Nested parentheses are not permitted. The characters that make up a comment play no part in the pattern matching.

If the PCRE_EXTENDED option is set, an unescaped # character outside a character class introduces a comment that runs to the end of the line, that is, up to the next newline character in the pattern.
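Both comment styles can be sketched in PHP as follows. The patterns and subject strings are made up for illustration; the x modifier is PHP's spelling of the PCRE_EXTENDED option.

```php
<?php
// Inline comments: everything from (?# to the next ")" is ignored.
$inline = '/\d{4}(?#year)-\d{2}(?#month)/';

// With the x modifier, whitespace is ignored and an unescaped # starts
// a comment that runs to the end of the line.
$extended = '/
    \d{4}   # year
    -       # literal hyphen
    \d{2}   # month
/x';

// An escaped \# is matched literally even in extended mode.
$issue = '/ issue \s \# \d+   # trailing comment, ignored /x';

var_dump(preg_match($inline, '2024-05'));    // int(1)
var_dump(preg_match($extended, '2024-05'));  // int(1)
var_dump(preg_match($issue, 'issue #42'));   // int(1)
```

Note that a comment in a PHP pattern string must not contain the pattern delimiter (here /), because PHP looks for the closing delimiter before PCRE sees the pattern.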

Recursive patterns

Consider the problem of matching a string in parentheses, allowing for unlimited nesting of parentheses. Without recursion, the best that can be done is to use a pattern that matches up to some fixed depth of nesting; it cannot handle an arbitrary nesting depth. Perl 5.6 provided an experimental facility that allows regular expressions to recurse. The special item (?R) is provided for this specific case of recursion. The following PCRE pattern solves the parentheses problem (assuming the PCRE_EXTENDED option is set so that whitespace is ignored): \( ( (?>[^()]+) | (?R) )* \)

First, it matches an opening parenthesis. It then matches any number of substrings, each of which is either a sequence of non-parenthesis characters or a recursive match of the pattern itself (that is, a correctly parenthesized substring). Finally, it matches a closing parenthesis.
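A minimal PHP sketch of this pattern, contrasted with a fixed-depth pattern. The helper name isBalancedGroup and the depth-2 pattern are illustrative assumptions, not part of the original text. The helper compares the match with the whole subject instead of adding ^ and $, because (?R) re-runs the entire pattern, anchors included.

```php
<?php
// Hypothetical helper: true when $s is exactly one balanced
// parenthesized group, nested to any depth.
function isBalancedGroup(string $s): bool
{
    // Pattern from the text; x (PCRE_EXTENDED) lets us space it out.
    $pattern = '/\( ( (?>[^()]+) | (?R) )* \)/x';

    return preg_match($pattern, $s, $m) === 1 && $m[0] === $s;
}

// Without recursion: this pattern only copes with two levels of nesting.
$depth2 = '/^\((?:[^()]|\([^()]*\))*\)$/';

var_dump(isBalancedGroup('(ab(cd)ef)'));        // bool(true)
var_dump(isBalancedGroup('(a(b(c(d))))'));      // bool(true)
var_dump(isBalancedGroup('(ab(cd'));            // bool(false)

var_dump(preg_match($depth2, '(a(b)c)'));       // int(1)
var_dump(preg_match($depth2, '(a(b(c)))'));     // int(0), three levels deep
```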

This example pattern contains nested unlimited repeats, so using a once-only subpattern (an atomic group) to match the runs of non-parenthesis characters is important when the pattern is applied to a string that does not match. For example, when it is applied to (aaaaaaaaaaaaaaaaaaaaaaaaaaaaa() it reports "no match" quickly. Without the once-only subpattern, the match runs for a very long time, because there are many different ways in which the + and * repeats can divide up the subject string, and all of them have to be tested before failure can be reported.
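The failing case can be tried out with the sketch below. The A modifier (PCRE_ANCHORED) is an assumption added here so that only a match starting at the first character counts; without it, preg_match() would move on and find the trailing "()". The variant without the once-only subpattern is deliberately run only on a short subject.

```php
<?php
// Subject from the text: the first parenthesis is never closed.
$subject = '(' . str_repeat('a', 29) . '()';

// Pattern from the text, anchored at offset 0 by the A modifier.
$atomic = '/\( ( (?>[^()]+) | (?R) )* \)/xA';

var_dump(preg_match($atomic, $subject)); // int(0), reported quickly

// Same pattern without the once-only subpattern. The number of ways to
// split the run of "a"s roughly doubles with each extra character, so
// this sketch only applies it to a short subject.
$plain = '/\( ( [^()]+ | (?R) )* \)/xA';
$short = '(' . str_repeat('a', 8) . '()';

var_dump(preg_match($plain, $short));    // int(0)
```

On the full-length subject the second pattern may take far longer or make preg_match() return false with an error, depending on settings such as pcre.backtrack_limit and pcre.jit.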

The values set for capturing subpatterns are those from the outermost level of the recursion at which the subpattern value is set. If the pattern above is matched against (ab(cd)ef), the value of the capturing parentheses is "ef", which is the last value taken on at the top level. If additional parentheses are added, giving \( ( ( (?>[^()]+) | (?R) )* ) \), the string they capture is "ab(cd)ef", the contents of the top-level parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by calling pcre_malloc and releasing it afterwards with pcre_free. If no memory can be obtained, it saves data for the first 15 capturing parentheses only, because there is no way to report an out-of-memory error from within a recursion.
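The two capture results described above can be checked with a short PHP sketch; the variable names are arbitrary.

```php
<?php
$subject = '(ab(cd)ef)';

// One capturing group: it keeps the last value set at the top level.
preg_match('/\( ( (?>[^()]+) | (?R) )* \)/x', $subject, $m);
var_dump($m[0]); // string(10) "(ab(cd)ef)"
var_dump($m[1]); // string(2) "ef"

// Extra parentheses around the repeated group capture everything
// between the outermost parentheses.
preg_match('/\( ( ( (?>[^()]+) | (?R) )* ) \)/x', $subject, $n);
var_dump($n[1]); // string(8) "ab(cd)ef"
var_dump($n[2]); // string(2) "ef"
```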

As of PHP 4.3.3, (?1), (?2), and so on can also be used to recurse into a numbered subpattern. Named subpatterns can be used in the same way: (?P>name) or (?&name).
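As a sketch, the parentheses pattern can be rewritten with a numbered or a named recursion. The group name paren is an arbitrary choice for this example. Because only the group is re-entered, the ^ and $ anchors can sit outside it.

```php
<?php
// (?1) recurses into capturing group 1 rather than the whole pattern.
$numbered = '/^ ( \( (?: (?>[^()]+) | (?1) )* \) ) $/x';

// The same idea with a named group, using both spellings of the call.
$named  = '/^ (?<paren>  \( (?: (?>[^()]+) | (?&paren)  )* \) ) $/x';
$namedP = '/^ (?P<paren> \( (?: (?>[^()]+) | (?P>paren) )* \) ) $/x';

foreach ([$numbered, $named, $namedP] as $pattern) {
    var_dump(preg_match($pattern, '(ab(cd)ef)')); // int(1)
    var_dump(preg_match($pattern, '(ab(cd'));     // int(0)
}
```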

If the syntax for a recursive subpattern reference (by number or by name) is used outside the parentheses to which it refers, it operates like a subroutine in a programming language. An earlier example pointed out that the pattern (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If the pattern (sens|respons)e and (?1)ibility is used instead, it matches "sense and responsibility" as well as the other two strings. Such references must, however, follow the subpattern to which they refer. (Note: a back reference matches only the text that the referenced subpattern previously matched, whereas the recursive syntax applies the referenced subpattern again.)
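The difference between the back reference and the subroutine-style reference can be seen by running both patterns over the three strings from the text. This is only a small demonstration; the variable names are arbitrary.

```php
<?php
$backref    = '/(sens|respons)e and \1ibility/';
$subroutine = '/(sens|respons)e and (?1)ibility/';

$subjects = [
    'sense and sensibility',
    'response and responsibility',
    'sense and responsibility',
];

$results = [];
foreach ($subjects as $s) {
    // \1 must repeat the captured text; (?1) re-runs the subpattern.
    $results[$s] = [preg_match($backref, $s), preg_match($subroutine, $s)];
    printf("%-28s backref=%d subroutine=%d\n", $s, ...$results[$s]);
}
```

Only the third string distinguishes the two: the back reference requires "sens" to appear again, while (?1) accepts either alternative.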

The maximum length of a subject string is the largest positive number that an int variable can hold. However, PCRE uses recursion to handle subpatterns and indefinite repetition. This means that, for certain patterns, the available stack space may limit the size of the subject string that can be processed.
