Matching HTML with regular expressions


Q: I want to extract a div element that has a specific id. How can I match the non-div tags in the HTML and delete them so that only this div remains? How can this be implemented?

A:

You can do this with a balancing group, as explained below.

Balancing groups / recursive matching
The balancing group syntax described here is supported by the .NET Framework. Other languages and libraries may not support this feature, or may need a different syntax for it.

Sometimes we need to match a nested, hierarchical structure such as (100 * (50 + 15)). Simply using \(.+\) will only match everything between the leftmost opening parenthesis and the rightmost closing parenthesis (we are discussing greedy matching here, but lazy matching has the same problem). If the numbers of opening and closing parentheses in the original string are not equal, for example (5 / (3 + 2))), the counts in our match will not be equal either. Is there a way to match the content between the longest balanced pair of parentheses in such a string?
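To see the problem concretely, here is a small Python sketch. Python's standard re module is used only because its greedy and lazy quantifiers behave like .NET's for this simple pattern; the sketch applies both forms to the two example strings and counts the parentheses in each match:

```python
import re

# Greedy: \(.+\) runs from the first "(" to the LAST ")" in the string,
# whether or not the parentheses in between are balanced.
greedy = re.search(r"\(.+\)", "(5 / (3 + 2)))").group()

# Lazy: \(.+?\) stops at the FIRST ")" after the first "(",
# which cuts a nested expression off in the middle.
lazy = re.search(r"\(.+?\)", "(100 * (50 + 15))").group()

for label, text in (("greedy", greedy), ("lazy", lazy)):
    print(label, repr(text), "opens:", text.count("("), "closes:", text.count(")"))
# greedy '(5 / (3 + 2)))' opens: 2 closes: 3
# lazy '(100 * (50 + 15)' opens: 2 closes: 1
```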

To keep ( and \( from confusing us completely, let's use angle brackets instead of parentheses. Our question now becomes: how can we capture the content between the longest balanced pair of angle brackets in a string like xx <aa <bbb> aa> yy?

This requires the following syntax constructs:

(?'group'...) Names the captured content "group" and pushes it onto the stack.
(?'-group'...) Pops the most recently pushed capture named "group" off the stack; if the stack is already empty, this group fails to match.
(?(group)yes|no) If a capture named "group" is on the stack, continue matching the yes part of the expression; otherwise, match the no part.
(?!) A zero-width negative lookahead assertion; since the empty expression inside it always matches, the assertion always fails.
If you are not a programmer (or you call yourself one but don't know what a stack is), you can think of the first three constructs like this: the first writes "group" on a blackboard, the second erases one "group" from the blackboard, and the third checks whether "group" is still written on the blackboard; if it is, the yes part is matched, otherwise the no part.
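Python's standard re module has no balancing groups, since it keeps no stack of captures to pop, but it does support the conditional (?(name)yes|no) and the always-failing (?!). The sketch below is only a one-capture analogy, not a stack: it shows how the (?(Open)(?!)) idiom forces a match to fail whenever the group named Open has captured something. The tiny pattern is made up for illustration:

```python
import re

# (?P<Open><)? optionally captures a "<" into the group named Open.
# (?(Open)(?!)) means: if Open captured something, try (?!), which can
# never match, so this attempt fails; otherwise (no "no" branch given)
# match nothing and continue.
pattern = re.compile(r"(?P<Open><)?x(?(Open)(?!))")

print(bool(pattern.fullmatch("x")))   # True: Open never captured
print(bool(pattern.fullmatch("<x")))  # False: Open captured, so (?!) fails
```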

What we need to do is push an "Open" onto the stack every time we encounter an opening bracket and pop one every time we encounter a closing bracket. At the end, we check whether the stack is empty: if it is not, there were more opening brackets than closing ones, and the match should fail. The regular expression engine will then backtrack (giving up some characters at the beginning or the end) and try to make the whole expression match.
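The procedure can also be emulated by hand. In the Python sketch below, first_balanced is a hypothetical name, not part of any library; it uses a counter in place of the stack, since every entry on the stack is the same "Open", and it is meant to return the same leftmost balanced span that the .NET pattern finds for these inputs:

```python
from typing import Optional


def first_balanced(text: str, open_ch: str = "<", close_ch: str = ">") -> Optional[str]:
    """Return the leftmost balanced open_ch...close_ch span, or None.

    Hand-written emulation of the balancing-group idea: count up for every
    opening bracket, count down for every closing one, and accept once the
    outermost closing bracket brings the count back to zero.
    """
    for start, ch in enumerate(text):
        if ch != open_ch:
            continue
        depth = 0  # brackets still open, including the outermost one
        for i in range(start, len(text)):
            if text[i] == open_ch:
                depth += 1  # write an "Open" on the blackboard
            elif text[i] == close_ch:
                depth -= 1  # erase one "Open"
                if depth == 0:  # the outermost pair is closed
                    return text[start:i + 1]
        # Ran out of text with brackets still open: this start position
        # fails, so try the next one, much like the engine backtracking.
    return None


print(first_balanced("xx <aa <bbb> aa> yy"))        # <aa <bbb> aa>
print(first_balanced("<<a>"))                       # <a>
print(first_balanced("(5 / (3 + 2)))", "(", ")"))  # (5 / (3 + 2))
```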

The resulting pattern is shown below. Because it contains whitespace and # comments, in .NET it must be compiled with RegexOptions.IgnorePatternWhitespace:

    <                         # the outermost opening bracket
        [^<>]*                # content after it that is not a bracket
        (
            (
                (?'Open'<)    # met an opening bracket: write an "Open" on the blackboard
                [^<>]*        # content after it that is not a bracket
            )+
            (
                (?'-Open'>)   # met a closing bracket: erase one "Open"
                [^<>]*        # content after it that is not a bracket
            )+
        )*
        (?(Open)(?!))         # before the outermost closing bracket, check for any "Open" left on the blackboard; if there is one, fail
    >                         # the outermost closing bracket

One of the most common applications of balancing groups is matching HTML. The following example matches nested <div> tags:

    <div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>

Because every gap between tags is matched with [^<>]*, this pattern only succeeds when the region contains no tags other than nested <div> and </div> tags. To pick out the div with a particular id, as the question asks, require that id in the first <div[^>]*> (for a hypothetical id "main": <div[^>]*id="main"[^>]*>).
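Applied to the original question, the same counting idea can extract the div with a given id. In this Python sketch, extract_div_by_id and DIV_TOKEN are hypothetical names; it assumes lowercase tag names, an id attribute written as id="..." in double quotes, no > inside attribute values, and no div tags hidden inside comments or scripts:

```python
import re
from typing import Optional

# The only tokens the count cares about: an opening <div ...> tag or </div>.
DIV_TOKEN = re.compile(r"<div\b[^>]*>|</div>")


def extract_div_by_id(html: str, div_id: str) -> Optional[str]:
    """Hypothetical helper: return the outer HTML of the div whose id is div_id."""
    # Find the opening tag of the wanted div (double-quoted id assumed).
    start_tag = re.search(r'<div\b[^>]*\sid="' + re.escape(div_id) + r'"[^>]*>', html)
    if start_tag is None:
        return None
    depth = 0  # div tags currently open, including the target itself
    for token in DIV_TOKEN.finditer(html, start_tag.start()):
        if token.group().startswith("</"):
            depth -= 1  # plays the role of (?'-Open'</div>)
            if depth == 0:
                return html[start_tag.start():token.end()]
        else:
            depth += 1  # plays the role of (?'Open'<div[^>]*>)
    return None  # unbalanced: more <div> than </div> after the target


page = '<body><div id="a">x<div id="main">one<div>two</div>three</div>y</div></body>'
print(extract_div_by_id(page, "main"))  # <div id="main">one<div>two</div>three</div>
```

Because only <div> and </div> tokens are counted, other tags such as <p> inside the div are simply skipped, which the [^<>]* parts of the .NET pattern do not allow.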
