Regular expression grouping () and assertions (?<=) explained in detail

Source: Internet
Author: User

Assertions count as an advanced feature of regular expressions, not because they are difficult, but because the concept is abstract and not easy to grasp. Today I will explain them in plain terms.

Without assertions, the expressions we have used so far can only capture text that follows a fixed pattern; they cannot isolate text that has no pattern of its own.

For example, suppose the HTML source contains a <title>xxx</title> tag. With what we know so far, the only fixed parts of the source are <title> and </title>. So if you want the page title (xxx), the best you can do is write an expression like <title>.*</title>, and what that matches is the complete <title>xxx</title> tag, not just the page title xxx.
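The limitation described above can be checked with a minimal Python sketch. The sample HTML string and the variable names are made up for illustration; only the pattern <title>.*</title> comes from the text.

```python
import re

# Hypothetical sample page, used only for illustration.
html = "<html><head><title>My Page</title></head></html>"

# Without assertions, the match includes the surrounding tags.
m = re.search(r"<title>.*</title>", html)
whole_match = m.group(0)
print(whole_match)  # <title>My Page</title>
```

The result is the whole tag, so the title text still has to be cut out of it by other means.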

To solve this problem, we need assertions.

Before turning to assertions, the reader should understand grouping, which makes assertions easier to follow.

A group in a regular expression is written with parentheses (). As I understand it, grouping serves two purposes:

1. Part of a pattern can be treated as a group, and the whole group can then be repeated, which is often surprisingly useful.

2. Once a group exists, the expression can be simplified by using a back-reference to it.

First, repetition. A simple pattern for matching an IP address can be written as follows (the dot is escaped as \. so that it matches a literal dot):

\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

Look closely, though, and a pattern emerges: \.\d{1,3} can be treated as a unit, that is, as a group, which is then repeated 3 times. The expression becomes:

\d{1,3}(\.\d{1,3}){3}

Written this way, it is more concise.
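As a rough check, the long form and the grouped form can be compared in Python. This is only a sketch: the sample strings are invented, the dots are escaped as \. so they match literal dots, and neither pattern verifies that each number is in the range 0 to 255.

```python
import re

# The long form and the grouped form of the IP address pattern.
ip_flat = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
ip_grouped = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

# Hypothetical sample inputs.
samples = ["192.168.0.1", "10.0.0", "1.2.3.4"]

# fullmatch requires the entire string to match the pattern.
flat_results = [bool(ip_flat.fullmatch(s)) for s in samples]
grouped_results = [bool(ip_grouped.fullmatch(s)) for s in samples]

print(flat_results)     # [True, False, True]
print(grouped_results)  # [True, False, True]
```

Both patterns accept and reject the same inputs here; the group simply removes the repetition from the expression.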

Now for the second use. To match the <title>xxx</title> tag again, a simple regex is:

<title>.*</title>

The word title appears twice in this expression, exactly the same both times, so it can be shortened with a group:

<(title)>.*</\1>

This is a practical application of a back-reference. The entire expression always counts as group 0; here group 0 is <(title)>.*</\1>. The remaining groups are numbered from left to right, so (title) is group 1.

The \1 syntax refers to the text content of a group, in this case group 1. That lets you simplify the regular expression: write title only once, inside a group, and refer to it later.
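The group numbering can be inspected in Python, where group(0) is the whole match and group(1) is the first parenthesised group. The sample strings are invented, and the second pattern, <(\w+)>.*</\1>, is a generalisation added here for illustration; it is not taken from the text.

```python
import re

# Hypothetical sample input.
m = re.search(r"<(title)>.*</\1>", "<head><title>My Page</title></head>")
group0 = m.group(0)  # the whole match
group1 = m.group(1)  # the text captured by (title)
print(group0)  # <title>My Page</title>
print(group1)  # title

# Assumed generalisation: capture any tag name and require the same
# name in the closing tag via the back-reference \1.
any_tag = re.compile(r"<(\w+)>.*</\1>")
matched = any_tag.search("<b>bold</b>")
mismatched = any_tag.search("<b>bold</i>")
print(matched.group(1))  # b
print(mismatched)        # None
```

The mismatched case fails because \1 must repeat the exact text that group 1 captured.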

With this in mind, can we simplify the IP address expression further? The expression was \d{1,3}(\.\d{1,3}){3}, in which \d{1,3} appears twice. Using a back-reference to shorten it gives:

(\d{1,3})(\.\1){3}

To explain briefly: \d{1,3} is placed in a group, written (\d{1,3}), which is group 1; (\.\1) is group 2, and inside group 2 the \1 syntax refers to the text content of group 1.

Test it, however, and you will find that this is wrong. Why?

As I keep emphasizing, a back-reference refers only to text content, not to the regular expression!

That is, once a group has matched successfully, a reference to it stands for the text that was matched: the result, not the expression.

Therefore (\d{1,3})(\.\1){3} actually matches only IP addresses whose four numbers are identical, for example 123.123.123.123.
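The difference between referring to the matched text and referring to the expression is easy to confirm in Python. A small sketch with invented sample addresses, with the dot escaped as \.:

```python
import re

# \1 repeats the text matched by group 1, not the pattern \d{1,3}.
pattern = re.compile(r"(\d{1,3})(\.\1){3}")

same = pattern.fullmatch("123.123.123.123")
different = pattern.fullmatch("192.168.0.1")

print(same is not None)  # True: all four numbers are identical
print(different)         # None: ".168" is not ".192"
```

To reuse the expression rather than the matched text, the sub-pattern has to be written out again, or repeated with a quantifier as in \d{1,3}(\.\d{1,3}){3}.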

At this point, the reader has mastered the legendary back-reference; it really is that simple.

Next, let's look at what an assertion is.

An assertion states that the text before or after a given position must satisfy a certain pattern.

Take the example from the beginning of the article. We want xxx, which has no pattern of its own, but it is certain to be preceded by <title> and followed by </title>, and that is enough.

To require that <title> appears immediately before xxx, use a positive lookbehind assertion: (?<=<title>).*

To require that </title> appears immediately after xxx, use a positive lookahead assertion: .*(?=</title>)

Putting the two together gives (?<=<title>).*(?=</title>)

This matches just xxx.
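A minimal Python sketch of the combined expression, using an invented sample page. Note that Python's re module only accepts fixed-width lookbehind patterns; (?<=<title>) qualifies because it is a literal string.

```python
import re

# Hypothetical sample page, used only for illustration.
html = "<html><head><title>My Page</title></head></html>"

# Lookbehind requires <title> before the match,
# lookahead requires </title> after it.
m = re.search(r"(?<=<title>).*(?=</title>)", html)
title = m.group(0)
print(title)  # My Page
```

Because .* is greedy, a page containing several </title> strings on one line would match up to the last of them; the non-greedy .*? is a common alternative in that situation.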

Readers who have got this far may be confused by now. Don't worry; let me go through it slowly.

Once you grasp the rule it is quite simple: lookahead and lookbehind are both defined relative to xxx, that is, relative to the target string.

If the condition comes after the target string, the target precedes the condition: use a lookahead assertion, placed after the target string.

If the condition comes before the target string, the target follows the condition: use a lookbehind assertion, placed before the target string.

If the assertion requires that the condition is satisfied, it is positive.

If the assertion requires that the condition is not satisfied, it is negative.

Assertions are only conditions that help you locate the string you actually need; the text they check is not part of the match!
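One way to see that the asserted text stays outside the match is to run a substitution: only the matched part is replaced, and the text checked by the assertions is left untouched. A small Python sketch with invented strings:

```python
import re

# Hypothetical input and replacement text.
source = "<title>Old</title>"

# Only the text between the tags is matched, so only it is replaced.
replaced = re.sub(r"(?<=<title>).*(?=</title>)", "New Title", source)
print(replaced)  # <title>New Title</title>
```

If the tags were part of the match, the replacement would have removed them as well.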

(?=X)

Zero-width positive lookahead assertion. The match continues only if the subexpression X matches to the right of this position. For example, \w+(?=\d) matches a word followed by a digit, without matching the digit. This construct does not backtrack.

(?!X)

Zero-width negative lookahead assertion. The match continues only if the subexpression X does not match to the right of this position. For example, \w+(?!\d) matches a word that is not followed by a digit, without matching the digit.

(?<=X)

Zero-width positive lookbehind assertion. The match continues only if the subexpression X matches to the left of this position. For example, (?<=19)99 matches an instance of 99 that is preceded by 19. This construct does not backtrack.

(?<!X)

Zero-width negative lookbehind assertion. The match continues only if the subexpression X does not match to the left of this position. For example, (?<!19)99 matches an instance of 99 that is not preceded by 19.
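The four forms can be tried side by side in Python. The sample strings are invented for illustration, and two of the results are easy to get wrong, so they are commented.

```python
import re

words = "abc1 xyz"
years = "1999 2099"

# Positive lookahead: word characters followed by a digit.
ahead_pos = re.findall(r"\w+(?=\d)", words)
print(ahead_pos)  # ['abc']

# Negative lookahead: \w also matches digits, so "abc1" is matched
# as a whole because the character after it is a space.
ahead_neg = re.findall(r"\w+(?!\d)", words)
print(ahead_neg)  # ['abc1', 'xyz']

# Positive lookbehind: a 99 immediately preceded by 19.
behind_pos = [m.start() for m in re.finditer(r"(?<=19)99", years)]
print(behind_pos)  # [2]

# Negative lookbehind: a 99 not immediately preceded by 19.
# In "1999" the 99 at index 1 is preceded only by "1", so it qualifies.
behind_neg = [m.start() for m in re.finditer(r"(?<!19)99", years)]
print(behind_neg)  # [1, 7]
```

The negative cases show that the assertion is tested at one position at a time, which can admit matches that a casual reading would not expect.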

As the syntax shows, an assertion is written like a group, with a question mark added after the opening parenthesis. The question mark marks the group as non-capturing: it is not numbered and cannot be back-referenced; it serves only as an assertion.

That concludes the tutorial. I hope you enjoyed reading it!
