Regular Expression Basics: Lookaround

Source: Internet
Author: User

1 Lookaround Basics

A lookaround matches only a subexpression and consumes no characters: what the subexpression matches is not saved in the final match result, so a lookaround is zero-width. The end result of a lookaround match is a position.

A lookaround in effect attaches an additional condition to a position: the lookaround matches at that position only if its subexpression satisfies the condition.
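The zero-width behavior can be observed directly. The following is a minimal sketch using Python's standard `re` module (an assumption; the article itself does not name an engine at this point), with a made-up sample string:

```python
import re

text = "abc"

# A bare lookahead consumes nothing: the match is the position just before "b".
position_only = re.search(r"(?=b)", text)

# "a" followed by "b": the lookahead checks "b" but leaves it out of the result.
a_before_b = re.search(r"a(?=b)", text)

print(position_only.span(), repr(position_only.group()))  # (1, 1) ''
print(a_before_b.span(), repr(a_before_b.group()))        # (0, 1) 'a'
```

The first match has equal start and end offsets, which is what "the result is a position" means in practice.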

Lookarounds are divided by direction into sequential (lookahead) and reverse (lookbehind), and by whether the subexpression must or must not match into positive and negative; combined, this gives four kinds of lookaround. A lookahead attaches a condition to the right of the current position, and a lookbehind attaches a condition to the left of the current position.

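Expression          Description
(?<=Expression)     Positive lookbehind: the text to the left of the position must match Expression
(?<!Expression)     Negative lookbehind: the text to the left of the position must not match Expression
(?=Expression)      Positive lookahead: the text to the right of the position must match Expression
(?!Expression)      Negative lookahead: the text to the right of the position must not match Expression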
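As a quick illustration of the four kinds side by side, here is a small sketch in Python's `re` module. The sample string and the pattern labels are made up for the illustration:

```python
import re

text = "ab cb"  # the "b" at index 1 follows "a"; the "b" at index 4 follows "c"

patterns = {
    "positive lookbehind": r"(?<=a)b",  # "b" with "a" on its left
    "negative lookbehind": r"(?<!a)b",  # "b" without "a" on its left
    "positive lookahead": r"b(?= )",    # "b" with a space on its right
    "negative lookahead": r"b(?! )",    # "b" without a space on its right
}

# Record where each pattern first matches.
spans = {name: re.search(p, text).span() for name, p in patterns.items()}
for name, span in spans.items():
    print(f"{name:20} {span}")
```

The two positive forms pick the first "b" and the two negative forms pick the second, and in every case only "b" itself is part of the match.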

As for the name, some documents call lookaround "pre-search" and others call it some kind of "assertion". This article uses "lookaround", the name used in "Mastering Regular Expressions", which more readers find easy to accept. The name does not really matter as long as you know what the construct does, and with only a few syntax rules it is easy to remember.

2 Lookaround Matching Principle

Lookaround is one of the harder parts of regular expressions. It can be understood from two angles, application and principle; for a clearer and deeper understanding, the principle is the better starting point. The underlying principle is the matching principle of the NFA engine.

As mentioned above, a lookaround attaches a condition to a "position". The difficulty lies in finding that position; once that is settled, lookaround holds no secrets.

Lookahead Matching Process

For a positive lookahead (?=Expression), when the subexpression Expression matches, (?=Expression) matches and reports a successful match at the current position.

For a negative lookahead (?!Expression), when the subexpression Expression matches, (?!Expression) fails; when the subexpression Expression fails, (?!Expression) matches and reports a successful match at the current position.
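The two rules above mean that, for the same subexpression, the positive and the negative lookahead succeed at complementary positions. A small sketch in Python's `re` module (the sample string is made up) checks this position by position:

```python
import re

text = "ab"
positive = re.compile(r"(?=b)")
negative = re.compile(r"(?!b)")

# Pattern.match(text, pos) attempts a match only at position pos.
results = [
    (pos, positive.match(text, pos) is not None, negative.match(text, pos) is not None)
    for pos in range(len(text) + 1)
]
for pos, positive_ok, negative_ok in results:
    print(pos, positive_ok, negative_ok)
```

At every position exactly one of the two succeeds: the positive form only at position 1 (just before "b"), the negative form at positions 0 and 2.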

An example of positive lookahead was already explained in the article on the NFA engine matching principle, so the example here covers negative lookahead.

Source string: aa<p>one</p>bb<div>two</div>cc

Regular expression: <(?!/?p\b)[^>]+>

This regex matches every tag except <p...> and </p>.

Matching process:

First the character "<" takes control and matching starts at position 0. Because "<" fails to match "a", the whole expression fails at position 0 and the first attempt fails. The regex engine's transmission advances, and the second attempt starts at position 1.

This repeats until position 2, where "<" matches "<" and control passes to "(?!/?p\b)". When the lookaround takes control, its inner subexpression is matched. First "/?" takes control and tries to match "p"; this fails, so the engine backtracks to the state in which "/?" matches nothing, and control passes to "p". "p" tries to match "p" and succeeds, and control passes to "\b", which tries to match at position 4 and succeeds. The subexpression is now complete: "/?p\b" has matched, so the lookaround "(?!/?p\b)" fails. The whole expression fails at position 2, this attempt fails, the transmission advances, and the next attempt starts at position 3.

At position 8 a similar round occurs: this time "/?" matches "/" and "/?p\b" matches as a whole, so the lookaround "(?!/?p\b)" fails and the whole expression fails at that position as well.

This repeats until position 14, where "<" matches "<" and control passes to "(?!/?p\b)". "/?" tries to match "d" and fails, so the engine backtracks to the state in which "/?" matches nothing, and control passes to "p". "p" tries to match "d" and fails; no alternative state is left for backtracking, so the subexpression fails. The subexpression is now complete: "/?p\b" has failed, so the lookaround "(?!/?p\b)" matches. Its result is position 15. Control then passes to "[^>]+", which starts at position 15 and matches "div", and control passes to ">", which matches ">".

At this point the regular expression has matched completely and reports success. The match result is "<div>", with start position 14 and end position 19: "<" matches "<", "(?!/?p\b)" matches a position, "[^>]+" matches the string "div", and ">" matches ">".
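The walkthrough can be reproduced by attempting the match at each start position in turn. The following is a sketch with Python's `re` module; the per-position loop only imitates the engine's attempts from the outside and does not show the internal backtracking steps.

```python
import re

text = "aa<p>one</p>bb<div>two</div>cc"
tag = re.compile(r"<(?!/?p\b)[^>]+>")

# Try each start position in turn, like the engine's position-by-position attempts.
attempts = [(pos, tag.match(text, pos)) for pos in range(len(text) + 1)]
first_success = next(pos for pos, m in attempts if m)
first_match = tag.match(text, first_success)

# Start positions where "<" matches but the negative lookahead rejects the tag.
rejected = [pos for pos, m in attempts if m is None and text[pos:pos + 1] == "<"]

print(first_success, first_match.group(), first_match.span())  # 14 <div> (14, 19)
print(rejected)                                                # [2, 8]
```

The walkthrough stops at the first match; scanning the rest of the string with `tag.findall(text)` also finds "</div>", which is likewise not a `p` tag.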

Lookbehind Basics

For a positive lookbehind (?<=Expression), when the subexpression Expression matches, (?<=Expression) matches and reports a successful match at the current position.

For a negative lookbehind (?<!Expression), when the subexpression Expression matches, (?<!Expression) fails; when the subexpression Expression fails, (?<!Expression) matches and reports a successful match at the current position.

A lookahead attaches a condition to the right of the current position, so its match attempt starts at the current position and proceeds to the right until the match succeeds or fails at some position. A lookbehind is special in that it attaches a condition to the left of the current position: it does not start its attempt at the current position but at some position to the left of it, matches up to the current position, and then reports success or failure.

For a lookahead, the starting point of the attempt is fixed (the current position) and the end point is undetermined. For a lookbehind, the starting point is undetermined (some position to the left of the current position) and the end point is fixed (the current position).

So lookahead is relatively simple and lookbehind is relatively complex. This is why most languages and tools support lookahead, while fewer support lookbehind.

JavaScript originally supported only lookahead; lookbehind was added in ECMAScript 2018, so older engines do not support it.

Java supports both lookahead and lookbehind, but a lookbehind subexpression must have a determinable length: inside a lookbehind, a bounded quantifier such as "?" is accepted, while quantifiers of indeterminate length such as "*" and "+" are not. When the length is determined, the engine can step back that fixed length and start the attempt there; when the length is indeterminate, it has to try from position 0, and the added processing complexity is obvious.

.NET supports lookbehind of indeterminate length, which few other engines did at the time of writing.
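Python's standard `re` module is another engine with this kind of restriction (it is not mentioned in the article and is used here only as an illustration): it accepts a lookbehind of fixed length and rejects one of indeterminate length when the pattern is compiled. A minimal sketch:

```python
import re

# Fixed-length lookbehind: accepted by Python's re module.
fixed = re.compile(r"(?<=\.\d)\d")

# Indeterminate-length lookbehind: rejected at compile time.
try:
    re.compile(r"(?<=\.\d*)\d")
    variable_accepted = True
except re.error as exc:
    variable_accepted = False
    print("rejected:", exc)

# The only digit preceded by "." plus one digit is the final "4".
print(fixed.search("3.14").group())  # 4
```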

Lookbehind Matching Process

Source string: <div>a test</div>

Regular expression: (?<=<div>)[^<]+(?=</div>)

This regex matches the content between the <div> and </div> tags, without the <div> and </div> tags themselves.

Matching process:

First "(?<=<div>)" takes control and matching starts at position 0. Position 0 is the start of the string and has nothing to its left, so "<div>" is bound to fail; the lookaround "(?<=<div>)" therefore fails and the whole expression fails at position 0. The first attempt fails, the transmission advances, and the second attempt starts at position 1.

This continues until the transmission reaches position 5. "(?<=<div>)" takes control, steps 5 positions to the left, and starts matching from position 0; "<div>" matches "<div>", so "(?<=<div>)" succeeds with position 5 as its result. Control passes to "[^<]+", which starts at position 5 and matches "a test". Control then passes to "(?=</div>)"; "</div>" matches "</div>", so "(?=</div>)" succeeds with position 11 as its result.

At this point the regular expression has matched completely and reports success. The match result is "a test", with start position 5 and end position 11: "(?<=<div>)" matches position 5, "[^<]+" matches "a test", and "(?=</div>)" matches position 11.
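The result can be checked with a short sketch in Python's `re` module, which accepts this pattern because the lookbehind "<div>" has a fixed length:

```python
import re

text = "<div>a test</div>"
content = re.compile(r"(?<=<div>)[^<]+(?=</div>)")

# At position 0 there is nothing to the left, so the lookbehind cannot succeed.
at_zero = content.match(text, 0)

match = content.search(text)
print(at_zero)                      # None
print(match.group(), match.span())  # a test (5, 11)
```

Neither tag appears in the match: both lookarounds contributed only positions.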

The matching process of a negative lookbehind is similar to the above, except that the negative lookbehind (?<!Expression) matches when Expression fails.

That largely covers the matching principle of lookaround. There is nothing mysterious about it; what remains is simply more practice.
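A small negative lookbehind example, sketched with Python's `re` module; the sample sentence and the variable name are made up for the illustration:

```python
import re

text = "cost $15 for 20 items"

# \d+ only where the character to the left is neither "$" nor another digit.
# Excluding digits as well keeps the "5" of "$15" from matching on its own.
unpriced = re.findall(r"(?<![$\d])\d+", text)
print(unpriced)  # ['20']
```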

3 Lookaround Applications

I am tired from writing today, so for now here is one comprehensive lookaround example; the application scenarios and techniques of lookaround will be organized later.

Requirement: format a number in currency style, with "," as the thousands separator.

Regular expression: (?<=\d)(?<!\.\d*)(?=(?:\d{3})+(?:\.\d+|$))

Test code:

// Requires: using System.Text.RegularExpressions;
double[] data = new double[] { 0, 12, 123, 1234, 12345, 123456, 1234567, 123456789, 1234567890, 12.345, 123.456, 1234.56, 12345.6789, 123456.789, 1234567.89, 12345678.9 };
foreach (double d in data)
{
    // Replace every position that satisfies all three lookarounds with ",".
    richTextBox2.Text += "Source string: " + d.ToString().PadRight(15) + "Format: " + Regex.Replace(d.ToString(), @"(?<=\d)(?<!\.\d*)(?=(?:\d{3})+(?:\.\d+|$))", ",") + "\n";
}

Output results:

Source string: 0              Format: 0
Source string: 12             Format: 12
Source string: 123            Format: 123
Source string: 1234           Format: 1,234
Source string: 12345          Format: 12,345
Source string: 123456         Format: 123,456
Source string: 1234567        Format: 1,234,567
Source string: 123456789      Format: 123,456,789
Source string: 1234567890     Format: 1,234,567,890
Source string: 12.345         Format: 12.345
Source string: 123.456        Format: 123.456
Source string: 1234.56        Format: 1,234.56
Source string: 12345.6789     Format: 12,345.6789
Source string: 123456.789     Format: 123,456.789
Source string: 1234567.89     Format: 1,234,567.89
Source string: 12345678.9     Format: 12,345,678.9

Implementation analysis:

First, according to the requirement, certain specific positions are to be replaced with ",". The next step is to analyze these positions, find the rules they follow, and abstract those rules into a regular expression.

1. The character to the left of this position must be a digit.

2. From this position rightward, up to a "." or the end, there must be only digits, and the number of digits must be a multiple of 3.

3. To the left of this position, separated from it by any number of digits, there must be no ".".

These three rules determine the positions completely; it only remains to express each rule and combine them into one regular expression.

According to the analysis, the final match result is a position, so every subexpression must be zero-width.

1. This is a condition attached to the left of the current position, so a lookbehind is used; the digit is required to appear, so it is positive. The subexpression for this condition is "(?<=\d)".

2. This is a condition attached to the right of the current position, so a lookahead is used; the digits are again required to appear, so it is positive. The digits come in one or more groups of three, that is "(?:\d{3})+", running up to a "." followed by the fractional digits or to the end, namely "(?=(?:\d{3})+(?:\.\d+|$))". At least one group is required; otherwise the end of the number itself would qualify.

3. This is a condition attached to the left of the current position, so a lookbehind is used; the "." must not appear, so it is negative, that is "(?<!\.\d*)".

Because zero-width subexpressions are not mutually exclusive and all match at the same position, their order does not affect the final result and they can be combined in any order. By habit, lookbehinds are written on the left and lookaheads on the right.
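The same idea can be sketched in Python, with one substitution that is not the article's method: Python's `re` module rejects the indeterminate-length lookbehind "(?<!\.\d*)", so this sketch splits off the fractional part with `str.partition` and applies only the other two conditions to the integer part. The names `BEHIND_FIRST`, `AHEAD_FIRST`, and `add_separators` are hypothetical. It also checks that swapping the order of the two lookarounds gives the same result.

```python
import re

# Lookbehind first, then lookahead (the habitual order)...
BEHIND_FIRST = re.compile(r"(?<=\d)(?=(?:\d{3})+$)")
# ...and the same two conditions in the opposite order.
AHEAD_FIRST = re.compile(r"(?=(?:\d{3})+$)(?<=\d)")

def add_separators(number_text, boundary=BEHIND_FIRST):
    """Insert "," into the integer part only; the fraction is left untouched."""
    integer_part, dot, fraction = number_text.partition(".")
    return boundary.sub(",", integer_part) + dot + fraction

samples = ["0", "123", "1234", "1234567890", "12.345", "1234.56", "12345.6789"]
for s in samples:
    # Both orderings select the same positions.
    assert add_separators(s) == add_separators(s, AHEAD_FIRST)
    print(s.ljust(15), add_separators(s))
```

Splitting on "." plays the role of rule 3: positions inside the fraction are never offered to the pattern, so no lookbehind for "." is needed.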
