Http://www.cnblogs.com/qiantuwuliang/archive/2011/06/11/2078482.html
Balancing groups / recursive matching
The balancing group syntax described here is supported by the .NET Framework. Other languages and libraries do not necessarily support this feature, or they support it with a different syntax.
Sometimes we need to match a nested, hierarchical structure such as (100*(50 + 15)). Simply using \(.+\) will match everything between the leftmost left parenthesis and the rightmost right parenthesis (we are discussing greedy mode here; lazy mode has a similar problem). If the left and right parentheses in the original string do not occur the same number of times, as in (5/(3 + 2))), then their counts in our match will not be equal either. Is there a way to match the content between the longest properly paired parentheses in such a string?
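To make the problem concrete, here is a small sketch in Python. Python's standard re module is used only because the greedy and lazy behaviour is the same as described above; the helper name paren_counts is made up for this illustration.

```python
import re

text = "(5/(3 + 2)))"

# Greedy: runs from the leftmost "(" to the rightmost ")".
greedy = re.search(r"\(.+\)", text).group()

# Lazy: stops at the first ")" it can reach, which is too early.
lazy = re.search(r"\(.+?\)", text).group()


def paren_counts(s):
    """Return (number of left parentheses, number of right parentheses)."""
    return s.count("("), s.count(")")


print(greedy, paren_counts(greedy))
print(lazy, paren_counts(lazy))
```

In both cases the two counts differ, so neither match is a properly paired expression.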
To keep ( and \( from completely confusing your brain, let's use angle brackets instead of parentheses. Our question now becomes: how can we capture the content inside the longest properly paired angle brackets in a string like XX <AA <BBB> AA> YY?
The following syntax constructs are needed:
- (?'group') Names the captured content "group" and pushes it onto the stack.
- (?'-group') Pops the most recently pushed capture named "group" off the stack. If the stack is already empty, this group fails to match.
- (?(group)yes|no) If a capture named "group" exists on the stack, continues matching with the yes part; otherwise continues with the no part.
- (?!) A zero-width negative lookahead assertion. Because it contains no subexpression, any attempt to match it always fails.
If you are not a programmer (or you call yourself one but do not know what a stack is), you can think of the first three constructs this way: the first writes a "group" on the blackboard, the second erases a "group" from the blackboard, and the third checks whether any "group" is still written on the blackboard; if so, matching continues with the yes part, otherwise with the no part.
What we need to do is push an "Open" every time we encounter a left bracket and pop one every time we encounter a right bracket. At the end, we check whether the stack is empty. If it is not empty, there are more left brackets than right brackets, and the match should fail. The regular expression engine will then backtrack (giving up some characters at the beginning or the end) and try to make the entire expression match.
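The push, pop, and final check can be sketched outside of any regex engine. The following Python function is only an emulation of the idea (Python's standard re module has no balancing groups), and the name brackets_balanced is hypothetical:

```python
def brackets_balanced(text):
    """Check angle brackets the way the push/pop description does."""
    stack = []                      # the "blackboard"
    for ch in text:
        if ch == "<":
            stack.append("Open")    # like (?'Open'<): write one "Open"
        elif ch == ">":
            if not stack:
                return False        # like (?'-Open'>) failing on an empty stack
            stack.pop()             # erase one "Open"
    return not stack                # like (?(Open)(?!)): leftovers mean failure


print(brackets_balanced("<AA <BBB> AA>"))
print(brackets_balanced("<AA <BBB> AA"))
```

Unlike a regex engine, this sketch does not backtrack; it only says whether one given string is balanced.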
<                         # the outermost left bracket
    [^<>]*                # non-bracket content after the outermost left bracket
    (
        (
            (?'Open'<)    # on a left bracket, write an "Open" on the blackboard
            [^<>]*        # non-bracket content after the left bracket
        )+
        (
            (?'-Open'>)   # on a right bracket, erase one "Open"
            [^<>]*        # non-bracket content after the right bracket
        )+
    )*
    (?(Open)(?!))         # before the outermost right bracket, check whether any "Open" is still on the blackboard; if so, the match fails
>                         # the outermost right bracket
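Note that a pattern laid out like this, with whitespace and # comments, has to be compiled with RegexOptions.IgnorePatternWhitespace in .NET. To see what the expression finds in a longer string, here is a rough Python emulation of the search. It is a sketch, not the .NET engine: the function name first_balanced is made up, and retrying from the next "<" stands in for the engine moving on to a later start position.

```python
def first_balanced(text):
    """Return the first '<...>' span whose inner brackets pair up, or None."""
    for start, ch in enumerate(text):
        if ch != "<":
            continue                          # a match can only begin at "<"
        depth = 0
        for end in range(start, len(text)):
            if text[end] == "<":
                depth += 1                    # one more "Open"
            elif text[end] == ">":
                depth -= 1                    # one "Open" erased
                if depth == 0:                # the outermost bracket is closed
                    return text[start:end + 1]
        # Ran out of text with brackets still open: try the next "<".
    return None


print(first_balanced("XX <AA <BBB> AA> YY"))
```

For the sample string this prints <AA <BBB> AA>, the span the question above asks for.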
The most common application of balancing groups is matching HTML. The following example matches nested <div> tags:
<div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>
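The same counting idea can be sketched in Python for <div> tags. This is an emulation under stated assumptions, not the expression above: the name outer_div is hypothetical, and the sketch is looser than the regex because it simply skips over any other tags inside the div, which [^<>]* would not allow.

```python
import re

# Opening or closing div tag; the opening form is taken from the pattern above.
TAG = re.compile(r"<div[^>]*>|</div>")


def outer_div(html):
    """Return the first <div>...</div> span with balanced nested divs, or None."""
    openers = [m for m in TAG.finditer(html) if m.group() != "</div>"]
    for first in openers:
        depth = 0
        # Scan tags from this opening tag onward, counting nesting depth.
        for m in TAG.finditer(html, first.start()):
            depth += -1 if m.group() == "</div>" else 1
            if depth == 0:
                return html[first.start():m.end()]
        # This opening tag was never closed: try the next one.
    return None


html = '<p>x</p><div id="a">one<div>two</div>three</div><div>tail</div>'
print(outer_div(html))
```

For real-world HTML a proper parser is usually the safer choice; the sketch only mirrors the push and pop bookkeeping.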