.NET C# Regular Expressions: Balancing Groups / Recursive Matching


Balancing groups / recursive matching

The balancing-group syntax described here is supported by the .NET Framework. Other languages and libraries may not support this feature at all, or may support it with different syntax.

Sometimes we need to match nested, hierarchical structures such as (100 * (50 + 15)). Simply using \(.+\) only matches the text between the leftmost opening parenthesis and the rightmost closing parenthesis. (This describes greedy mode, but lazy mode has the problem described next as well.) If the original string does not contain equal numbers of opening and closing parentheses, as in (5 / (3 + 2))), then the match will not contain equal numbers of them either. Is there a way to match the contents of the longest properly paired parentheses in such a string?
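The problem is easy to reproduce. The following is a minimal sketch that runs the greedy pattern \(.+\) and its lazy variant \(.+?\) over (5 / (3 + 2))) and counts the parentheses in each match. The class name NaiveParens and its helper methods are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveParens
{
    // Greedy: .+ runs to the end of the input, then backs off to the last ')'.
    public static string Greedy(string input) => Regex.Match(input, @"\(.+\)").Value;

    // Lazy: .+? grows one character at a time and stops at the first ')'.
    public static string Lazy(string input) => Regex.Match(input, @"\(.+?\)").Value;

    // Counts how many times c occurs in s.
    public static int Count(string s, char c)
    {
        int n = 0;
        foreach (char ch in s)
            if (ch == c) n++;
        return n;
    }

    public static void Main()
    {
        string input = "(5 / (3 + 2)))";
        foreach (string m in new[] { Greedy(input), Lazy(input) })
            Console.WriteLine($"{m}   '(' x{Count(m, '(')}, ')' x{Count(m, ')')}");
    }
}
```

The greedy match spans the whole string, with two '(' against three ')'. The lazy match stops at the first ')', with two '(' against one ')'. Neither match is balanced.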

To keep ( and \( from confusing you completely, we will use angle brackets instead of parentheses. The question now becomes: how do we capture the contents of the longest properly paired angle brackets in a string such as xx <aa <bbb> <bbb> aa> yy?

The following syntax constructs are required:

(?'group'exp) captures the text matched by exp, names it group, and pushes it onto a stack
(?'-group'exp) pops the most recently pushed capture named group off the stack; if the stack is already empty, this group fails to match
(?(group)yes|no) if the stack holds a capture named group, continue by matching the yes expression; otherwise continue with the no expression
(?!) a zero-width negative lookahead assertion; the empty expression inside it always matches, so the assertion always fails
If you are not a programmer (or you call yourself one but don't know what a stack is), you can picture the first three constructs like this. The first writes "group" on a blackboard. The second erases one "group" from the blackboard. The third checks whether any "group" is still written on the blackboard, then continues with the yes part if there is one and with the no part if there is not.
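The constructs can be tried in isolation with a pattern that accepts only strings made entirely of balanced angle brackets. The following is a minimal sketch. The names StackDemo, IsBalanced, and Depth are hypothetical, and the ^...$ anchors are added here so that the whole string must balance. Depth reads Group.Captures, which holds the "Open" captures still left on the stack after the match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StackDemo
{
    // (?'Open'<)     push an Open capture for every '<'
    // (?'-Open'>)    pop the latest Open for every '>'; fails if the stack is empty
    // (?(Open)(?!))  at the end: if any Open is still on the stack, force a failure
    private static readonly Regex Balanced =
        new Regex(@"^(?:(?'Open'<)|(?'-Open'>))*(?(Open)(?!))$");

    // Same push/pop without the final check, so leftover Opens are allowed.
    private static readonly Regex PushPop =
        new Regex(@"^(?:(?'Open'<)|(?'-Open'>))*$");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);

    // Number of Open captures left on the stack after a full match.
    // Only meaningful when no '>' ever arrives while the stack is empty.
    public static int Depth(string s) => PushPop.Match(s).Groups["Open"].Captures.Count;

    public static void Main()
    {
        foreach (string s in new[] { "<<>>", "<><>", "<<>", "<>>", "><" })
            Console.WriteLine($"{s,-5} balanced={IsBalanced(s)}");

        int depth = Depth("<<>");
        Console.WriteLine($"Opens left after <<> : {depth}");
    }
}
```

In "<>>", the last '>' has nothing left to pop, so it cannot be matched. In "<<>", one "Open" is still on the blackboard when (?(Open)(?!)) runs.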

What we need to do is push an "Open" every time we meet an opening bracket and pop one every time we meet a closing bracket. At the end, we check whether the stack is empty. If it is not, there were more opening brackets than closing ones, and the match should fail. A closing bracket that arrives when there is nothing left to pop fails immediately, which covers the opposite case. In either case the regular expression engine backtracks, giving up some characters at the beginning or end, and tries to make the whole expression match.
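The same bookkeeping can be written out by hand to show what the engine is doing. The sketch below is plain C# rather than a regular expression. An integer depth counter stands in for the stack of "Open" captures, which is enough here because only the number of entries matters. Trying each '<' in turn mimics the engine retrying at the next starting position. FirstBalanced is a hypothetical helper name.

```csharp
using System;

public static class DepthScanner
{
    // Returns the first <...> span whose brackets pair up, or "" if there is none.
    public static string FirstBalanced(string s)
    {
        for (int start = 0; start < s.Length; start++)
        {
            if (s[start] != '<') continue;       // a match must begin at an opening bracket

            int depth = 0;                       // number of Opens currently on the stack
            for (int i = start; i < s.Length; i++)
            {
                if (s[i] == '<') depth++;        // push an Open
                else if (s[i] == '>') depth--;   // pop an Open

                if (depth == 0)                  // the outermost bracket has just closed
                    return s.Substring(start, i - start + 1);
            }
            // Input ran out with Opens still on the stack: give up on this
            // start and try the next '<', as the regex engine would.
        }
        return "";
    }

    public static void Main()
    {
        Console.WriteLine(FirstBalanced("xx <aa <bbb> <bbb> aa> yy"));
    }
}
```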



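The complete expression is laid out below in free-spacing mode. Compile it with RegexOptions.IgnorePatternWhitespace so that layout whitespace is ignored and # starts a comment:

```
<                        # outermost opening bracket
    [^<>]*               # non-bracket text after the outermost opening bracket
    (
        (
            (?'Open'<)   # met an opening bracket: write an "Open" on the blackboard
            [^<>]*       # non-bracket text after the opening bracket
        )+
        (
            (?'-Open'>)  # met a closing bracket: erase an "Open"
            [^<>]*       # non-bracket text after the closing bracket
        )+
    )*
    (?(Open)(?!))        # before the outermost closing bracket, check whether any "Open" is still on the blackboard; if so, the match fails

>                        # outermost closing bracket
```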
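To run the expression from .NET code, one option is to paste it into a verbatim string and compile it with RegexOptions.IgnorePatternWhitespace. The following is a minimal sketch, assuming the hypothetical names BalancedAngles and FirstMatch. The comments are shortened and contain no double quotes, because an unescaped quote would end the verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedAngles
{
    // The balancing-group pattern in free-spacing layout.
    private static readonly Regex Pattern = new Regex(@"
        <                       # outermost opening bracket
        [^<>]*                  # non-bracket text after it
        (
            (
                (?'Open'<)      # opening bracket: push an Open
                [^<>]*
            )+
            (
                (?'-Open'>)     # closing bracket: pop an Open
                [^<>]*
            )+
        )*
        (?(Open)(?!))           # fail if any Open is left on the stack
        >                       # outermost closing bracket
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the first balanced <...> span, or "" when there is no match.
    public static string FirstMatch(string input) => Pattern.Match(input).Value;

    public static void Main()
    {
        Console.WriteLine(FirstMatch("xx <aa <bbb> <bbb> aa> yy"));
    }
}
```

For "<a <b> c", the attempt at the first '<' fails because its bracket never closes. The engine then moves on and returns "<b>".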


One of the most common applications of balancing groups is matching HTML. The following example matches nested <div> tags:



<div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>
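The following is a minimal sketch of applying the <div> pattern from C#, assuming the hypothetical names NestedDivs and FirstDiv. The [^<>]* parts allow only plain text between the div tags. As a result, a div that directly contains any other element, such as a <p>, will not be matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedDivs
{
    // Same balancing-group structure as the angle-bracket pattern:
    // <div ...> pushes an Open, </div> pops one, and (?(Open)(?!)) rejects leftovers.
    private static readonly Regex Div = new Regex(
        @"<div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>");

    // Returns the first complete, properly nested div, or "" if there is none.
    public static string FirstDiv(string html) => Div.Match(html).Value;

    public static void Main()
    {
        Console.WriteLine(FirstDiv("<div id=\"a\">x<div>y</div>z</div>"));
        Console.WriteLine(FirstDiv("<div>a<div>b</div>"));   // outer div never closes
    }
}
```

In the second call the outer <div> is never closed, so the match attempt starting there fails. The engine then returns the inner "<div>b</div>" instead.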