Zero-width assertions in regular expressions, explained

Source: Internet
Author: User

Zero-width assertions in regular expressions:

Zero-width assertions are one of the harder parts of regular expressions, so this chapter concentrates on how they are matched. They go by other names as well, such as "lookaround" or "pre-search", but the terminology is not the focus here.

I. Basic concepts:

A zero-width assertion, as the name suggests, is a zero-width match: the text it examines is not saved in the match result, and what it finally matches is only a position.
Its purpose is to attach a condition to a given position, requiring that the characters before or after that position satisfy the condition for the subexpression in the regular expression to match successfully.
Note: "subexpression" here does not mean only an expression enclosed in parentheses; it means any matching element of a regular expression.
JavaScript originally supported only zero-width lookahead assertions (lookbehind was added in ES2018). Lookahead assertions are divided into positive zero-width lookahead assertions and negative zero-width lookahead assertions.

The code example is as follows:

Example one:

var str = "abZW863";
// "ab" only where an uppercase letter follows
var reg = /ab(?=[A-Z])/;
console.log(str.match(reg));

In the code above, the regular expression means: match the string "ab" when it is followed by any uppercase letter. The final match result is "ab", because the zero-width assertion "(?=[A-Z])" does not match any characters itself; it only requires that the current position be followed by an uppercase letter.

Example two:

var str = "abZW863";
// "ab" only where no uppercase letter follows
var reg = /ab(?![A-Z])/;
console.log(str.match(reg));

The regular expression in the code above means: match the string "ab" when it is not followed by an uppercase letter. The regular expression matches nothing, because in this string "ab" is followed by an uppercase letter.
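The two examples can be checked side by side. The following is only a minimal sketch; the helper name describeMatch is made up for illustration and is not part of any library.

```javascript
// Hypothetical helper: report what a regex matched and where.
function describeMatch(str, reg) {
  const m = str.match(reg);
  return m === null
    ? null
    : { text: m[0], index: m.index, length: m[0].length };
}

// Positive lookahead: the uppercase "Z" is checked but not consumed.
const positive = describeMatch("abZW863", /ab(?=[A-Z])/);
// Negative lookahead: fails because "ab" is followed by an uppercase letter.
const negative = describeMatch("abZW863", /ab(?![A-Z])/);

console.log(positive); // { text: 'ab', index: 0, length: 2 }
console.log(negative); // null
```

The match length of 2 shows that the letter inspected by the assertion is not part of the result.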

II. Matching principle:

The code above only describes zero-width assertions conceptually.
The following explains, in terms of the matching process, how positive and negative zero-width assertions are matched.
1. Positive zero-width assertion:
The code example is as follows:

var str = "<div>antzone";
var reg = /^(?=<)<[^>]+>\w+/;
console.log(str.match(reg));

The matching process is as follows:
Control first goes to "^" in the regular expression. Matching starts at position 0; "^" matches the start position 0 and succeeds, and control passes to "(?=<)". Because "^" is zero-width, "(?=<)" also starts matching at position 0. It requires that the character to the right of this position be "<"; the character to the right of position 0 is indeed "<", so the match succeeds and control passes to "<". Because "(?=<)" is also zero-width, "<" starts matching at position 0 as well, and it succeeds. The rest of the matching process is not described here.
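The claim that "^" and "(?=<)" both succeed at position 0 without consuming anything can be observed directly. This is a small sketch, assuming the same input string as above; the variable names are illustrative.

```javascript
const input = "<div>antzone";

// "^" and "(?=<)" alone succeed at position 0 but consume nothing.
const zeroWidth = input.match(/^(?=<)/);
console.log(zeroWidth[0].length, zeroWidth.index); // 0 0

// The literal "<" that follows therefore also starts at position 0.
const full = input.match(/^(?=<)<[^>]+>\w+/);
console.log(full[0], full.index); // <div>antzone 0

// If the text does not start with "<", the assertion fails right away.
const noMatch = "div>antzone".match(/^(?=<)<[^>]+>\w+/);
console.log(noMatch); // null
```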

2. Negative zero-width assertion:

The code example is as follows:

var str = "abZW863ab88";
var reg = /ab(?![A-Z])/g;
console.log(str.match(reg));

The matching process is as follows:
Control first goes to the character "a" of the regular expression. Matching starts at position 0 and "a" matches; control passes to "b", which matches starting at position 1; control then passes to "(?![A-Z])", which starts at position 2. It requires that the character to the right of this position not be an uppercase letter, but the character to the right of position 2 is the uppercase letter "Z", so the match fails. Control returns to "a", which now tries at position 1 and fails, then tries at position 2 and fails again, and so on until it succeeds at position 7. Control passes to "b", which matches at position 8, and then to "(?![A-Z])", which starts at position 9. It requires that the character to the right of this position not be an uppercase letter, and the match succeeds; since the assertion does not actually consume any characters, the final match result is "ab".
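The position-by-position walkthrough can be imitated with a sticky regular expression, which only matches at the index stored in lastIndex. This is a rough sketch, not how an engine is actually implemented; the function tryEachPosition is a hypothetical helper, and it assumes the input is "abZW863ab88" with an uppercase Z and W, as the walkthrough implies.

```javascript
const subject = "abZW863ab88";

// Try the pattern at every start position in turn.
// The sticky flag "y" makes exec() match only at lastIndex.
function tryEachPosition(source, text) {
  const sticky = new RegExp(source, "y");
  const attempts = [];
  for (let pos = 0; pos <= text.length; pos++) {
    sticky.lastIndex = pos;
    const m = sticky.exec(text);
    attempts.push({ pos, matched: m === null ? null : m[0] });
  }
  return attempts;
}

const attempts = tryEachPosition("ab(?![A-Z])", subject);
const successes = attempts.filter((a) => a.matched !== null);

console.log(successes); // [ { pos: 7, matched: 'ab' } ]
console.log(subject.match(/ab(?![A-Z])/g)); // [ 'ab' ]
```

Every start position before 7 fails, including position 0, where "ab" is found but the assertion rejects the following "Z".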

Supplementary notes follow.


Definition

A zero-width assertion is a construct in regular expressions.
In computer science, a regular expression is a single string that describes or matches a set of strings conforming to certain syntactic rules. Many text editors and other tools use regular expressions to search for and/or replace text that fits a pattern. Many programming languages support regular expressions for string manipulation; Perl, for example, has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. "Regular expression" is usually abbreviated "regex"; the singular forms are regexp and regex, and the plural forms are regexps, regexes, and regexen.

Zero-width assertions

Zero-width assertions are used to find what comes before or after certain content without including that content. Like \b, ^, and $, they specify a position that must satisfy certain conditions (the assertions), which is why they are called zero-width assertions. An assertion declares a fact that should be true; the regular expression continues matching only if the assertion holds. This is best explained with examples.

(?=exp), also called a zero-width positive lookahead assertion, asserts that the position where it appears is followed by something matching the expression exp. For example, \b\w+(?=ing\b) matches the front part of a word ending in ing (everything except the ing); when searching I am reading, it matches read.

// Requires: using System.Text.RegularExpressions;
var reg = new Regex(@"\w+(?=ing)");
var str = "muing";
Console.WriteLine(reg.Match(str).Value); // returns mu

(?<=exp), also called a zero-width positive lookbehind assertion, asserts that the position where it appears is preceded by something matching the expression exp. For example, (?<=\bre)\w+\b matches the latter part of a word beginning with re (everything except the re); when searching reading a book, it matches ading.

If you want to add a comma after every three digits of a very long number (counting from the right, of course), you can find the parts that need commas before and inside them like this: ((?<=\d)\d{3})+\b. Searching 1234567890 with it yields 234567890.
The following example uses both kinds of assertion: (?<=\s)\d+(?=\s) matches digits delimited by whitespace (again, not including the whitespace).
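Both patterns can be tried in JavaScript, provided the engine supports lookbehind (ES2018 or later). The sketch below also includes a commonly used lookahead-only pattern that actually inserts the commas; that replacement pattern is not from this article and is shown only for comparison. The sample strings are made up.

```javascript
const digits = "1234567890";

// The article's pattern: groups of three digits, each preceded by a digit.
const tail = digits.match(/((?<=\d)\d{3})+\b/)[0];
console.log(tail); // 234567890

// A different, widely used pattern that inserts the commas:
// match every non-boundary position followed by a multiple of three digits.
const withCommas = digits.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
console.log(withCommas); // 1,234,567,890

// Digits with whitespace on both sides; the whitespace is not captured.
const spaced = "a 12 b 345 c6".match(/(?<=\s)\d+(?=\s)/g);
console.log(spaced); // [ '12', '345' ]
```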

Negative zero-width assertions

Earlier we covered how to match anything that is not a particular character, or not in a particular character class (negation). But what if we only want to make sure a character does not appear, without matching anything in its place? For example, to find words that contain the letter q not followed by the letter u, we might try this:

\b\w*q[^u]\w*\b matches words containing a q that is followed by something other than u. But with more testing (or a sharp enough eye) you will find that when q is the last letter of a word, as in Iraq or Benq, the expression goes wrong. This is because [^u] always consumes one character: if q is the last character of the word, [^u] matches the word separator after it (a space, a period, or whatever), and the following \w*\b then matches the next word, so \b\w*q[^u]\w*\b matches the whole of Iraq fighting. A negative zero-width assertion solves this problem, because it matches only a position and consumes no characters. We can now write: \b\w*q(?!u)\w*\b.
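The difference between the character class and the negative lookahead is easy to reproduce. A minimal sketch, using the Iraq fighting text from the paragraph above plus one made-up sentence:

```javascript
const phrase = "Iraq fighting";

// [^u] consumes the space, so the match runs on into the next word.
const withClass = phrase.match(/\b\w*q[^u]\w*\b/)[0];
console.log(withClass); // Iraq fighting

// (?!u) only inspects the next character, so the match stops at the word.
const withLookahead = phrase.match(/\b\w*q(?!u)\w*\b/)[0];
console.log(withLookahead); // Iraq

// Words where q is followed by u are skipped.
const words = "quit Iraq queen".match(/\b\w*q(?!u)\w*\b/g);
console.log(words); // [ 'Iraq' ]
```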

The zero-width negative lookahead assertion (?!exp) asserts that this position is not followed by something matching the expression exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit; \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.
Similarly, (?<!exp), the zero-width negative lookbehind assertion, asserts that this position is not preceded by something matching the expression exp: (?<![a-z])\d{7} matches seven digits not preceded by a lowercase letter.
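The three patterns above can be exercised as follows. The input strings are invented for illustration, and the last pattern needs an engine with lookbehind support (ES2018 or later).

```javascript
// Three digits not followed by another digit.
// "123" is rejected because "4" follows; "234" and "567" qualify.
const threeDigits = "1234 567".match(/\d{3}(?!\d)/g);
console.log(threeDigits); // [ '234', '567' ]

// Whole words in which "abc" never starts at any letter.
const noAbc = "xabcx hello abc".match(/\b((?!abc)\w)+\b/g);
console.log(noAbc); // [ 'hello' ]

// Seven digits not preceded by a lowercase letter.
const sevenDigits = "a1234567 B7654321".match(/(?<![a-z])\d{7}/g);
console.log(sevenDigits); // [ '7654321' ]
```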

A more complex example: (?<=<(\w+)>).*(?=<\/\1>) matches the content inside a simple HTML tag that has no attributes. (?<=<(\w+)>) specifies the prefix: a word enclosed in angle brackets (for example <b>). Then comes .* (any string), and finally the suffix (?=<\/\1>). Note the \/ in the suffix, which is a character escape for the slash; \1 is a backreference to the first capture group, that is, whatever (\w+) matched, so if the prefix is actually <b>, the suffix is </b>. The whole expression matches the content between <b> and </b> (once again, excluding the prefix and suffix themselves).
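A short sketch of this pattern in JavaScript, assuming an engine with lookbehind support (ES2018 or later); the sample markup is made up.

```javascript
// Prefix <tag> is checked by lookbehind, suffix </tag> by lookahead;
// \1 ties the closing tag name to the captured opening tag name.
const tagContent = /(?<=<(\w+)>).*(?=<\/\1>)/;

const boldMatch = "<b>bold</b>".match(tagContent);
console.log(boldMatch[0], boldMatch[1]); // bold b

// A mismatched closing tag gives no match.
const mismatch = "<b>bold</i>".match(tagContent);
console.log(mismatch); // null
```

Only the text between the tags is in the match; the tag name is still available through the capture group.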

That was a lot to take in, so here is a short recap.


(?=exp), also called a zero-width positive lookahead assertion, asserts that the position where it appears is followed by something matching exp. For example, \b\w+(?=ing\b) matches the front part of a word ending in ing (everything except the ing); when searching I'm singing while you're dancing. it matches sing and danc.
(?<=exp), also called a zero-width positive lookbehind assertion, asserts that the position where it appears is preceded by something matching exp. For example, (?<=\bre)\w+\b matches the latter part of a word beginning with re (everything except the re); when searching reading a book, it matches ading.


Addendum two:

I recently had to process the source of some HTML files, which called for regular expression search and replace. I took the opportunity to study regular expressions systematically; I had used them before, but each time I learned just enough to get by. I ran into many problems along the way, especially with zero-width assertions (and, to vent a little, the web is full of copied-and-pasted content, so looking up a problem turns up the same thing over and over), so I am writing down my own understanding here for later reference.

What is a zero-width positive lookahead assertion? See the official definition on MSDN:

(?=subexpression)

(Zero-width positive lookahead assertion.) Continues matching only if the subexpression matches to the right of this position. For example, \w+(?=\d) matches a word followed by a digit, without matching the digit.

Classic example: for a word ending in ing, get the part before the ing.

// Requires: using System.Text.RegularExpressions;
var reg = new Regex(@"\w+(?=ing)");
var str = "muing";
Console.WriteLine(reg.Match(str).Value); // returns mu

The above is an example found all over the Internet. It shows that what is returned is the content before the exp expression.

Now look at the code below.

// Requires: using System.Text.RegularExpressions;
var reg = new Regex(@"a(?=b)c");
var str = "abc";
Console.WriteLine(reg.IsMatch(str)); // returns False

Why does it return false?

The official MSDN definition already says why, just in very official language. The key point to notice is "this position". It is a position, not a character. Combining the official definition with the first example helps in understanding the second:

Since a is followed by b, the a(?=b) part succeeds, and (as the first example showed) it matches only a, not what exp matched. With the a(?=b) part of a(?=b)c settled, what remains is to match c. Where in the string abc does matching c start? According to the official definition, matching continues from the position of the subexpression, that is, from the position of b. But b does not match the remaining c in a(?=b)c, so abc does not match a(?=b)c.

So how should the pattern be written to match abc?

The answer is: a(?=b)bc

Of course, some will say that matching abc directly would do, so why go to this trouble? There is indeed no need; the point is only to show how a zero-width positive lookahead assertion works. The other zero-width assertions follow the same principle.
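The same behaviour can be reproduced in JavaScript, which treats lookahead the same way for these patterns. In this sketch the sticky flag is used only to expose where matching would resume after a(?=b); the variable names are illustrative.

```javascript
// The lookahead leaves the position just after "a", so "c" meets "b" and fails.
const withoutB = /a(?=b)c/.test("abc");
console.log(withoutB); // false

// Consuming "b" explicitly after the assertion lets "c" match.
const withB = /a(?=b)bc/.test("abc");
console.log(withB); // true

// After a(?=b) succeeds, the next element starts at position 1, where "b" sits.
const probe = /a(?=b)/y;
probe.exec("abc");
const resumeAt = probe.lastIndex;
console.log(resumeAt); // 1
```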

Addendum three

(?=exp): zero-width positive lookahead assertion. Asserts that the position where it appears is followed by something matching the expression exp.

# Match product followed by _path; the result is product
'product_path'.scan(/(product)(?=_path)/) # => [["product"]]

(?<=exp): zero-width positive lookbehind assertion. Asserts that the position where it appears is preceded by something matching the expression exp.

# Match wangfei preceded by name:; the result is wangfei
'name:wangfei'.scan(/(?<=name:)(wangfei)/) # => [["wangfei"]]

(?!exp): zero-width negative lookahead assertion. Asserts that this position is not followed by something matching the expression exp.

# Match product not followed by _path
'product_path'.scan(/(product)(?!_path)/) # => []
# Match product not followed by _url
'product_path'.scan(/(product)(?!_url)/) # => [["product"]]

(?<!exp): zero-width negative lookbehind assertion. Asserts that this position is not preceded by something matching the expression exp.

# Match angelica not preceded by name:
'name:angelica'.scan(/(?<!name:)(angelica)/) # => []
# Match angelica not preceded by nick_name:
'name:angelica'.scan(/(?<!nick_name:)(angelica)/) # => [["angelica"]]
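For readers following along in JavaScript, the same four checks might look like the sketch below. It assumes an engine with lookbehind support (ES2018 or later); firstMatch is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: return the matched text, or null when nothing matches.
function firstMatch(text, reg) {
  const m = text.match(reg);
  return m === null ? null : m[0];
}

const results = {
  lookahead: firstMatch("product_path", /product(?=_path)/),
  lookbehind: firstMatch("name:wangfei", /(?<=name:)wangfei/),
  negLookaheadPath: firstMatch("product_path", /product(?!_path)/),
  negLookaheadUrl: firstMatch("product_path", /product(?!_url)/),
  negLookbehindName: firstMatch("name:angelica", /(?<!name:)angelica/),
  negLookbehindNick: firstMatch("name:angelica", /(?<!nick_name:)angelica/),
};

console.log(results.lookahead);         // product
console.log(results.lookbehind);        // wangfei
console.log(results.negLookaheadPath);  // null
console.log(results.negLookaheadUrl);   // product
console.log(results.negLookbehindName); // null
console.log(results.negLookbehindNick); // angelica
```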
