A Detailed Explanation of Zero-Width Assertions in Regular Expressions

Source: Internet
Author: User
Tags expression engine


Definition

A zero-width assertion is a construct in regular expressions.
In computer science, a regular expression is a single string that describes or matches a set of strings conforming to certain syntax rules. Many text editors and other tools use regular expressions to search for and/or replace text that fits a pattern. Many programming languages support string operations using regular expressions; Perl, for example, has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. "Regular expression" is usually abbreviated as "regex" or "regexp"; the plural forms are regexps, regexes, and regexen.

Zero-Width Assertions

These constructs are used to find things before or after some content, without including that content. Like \b, ^, and $, they specify a position, and that position must satisfy a certain condition (the assertion), which is why they are called zero-width assertions. An assertion declares a fact that should be true; in a regular expression, matching continues only when the assertion holds. Examples illustrate this best.

(?=exp) is the zero-width positive lookahead assertion. It asserts that the text following its position matches the expression exp. The assertion itself consumes no characters. For example, \b(?=re)\w+\b matches whole words that start with re, and the re is part of the match because the lookahead only checks it; when searching reading a book it matches reading. To match only the part after re (ading), the check has to come from a lookbehind instead: (?<=\bre)\w+\b.

using System;
using System.Text.RegularExpressions;
var reg = new Regex(@"\w+(?=ing)");      // one or more word characters followed by "ing"
var str = "muing";
Console.WriteLine(reg.Match(str).Value); // prints "mu"

(?<=exp) is the zero-width positive lookbehind assertion. It asserts that the text before its position matches the expression exp. For example, \b\w+(?<=ing\b) matches whole words that end with ing, and the ing is part of the match because \w+ has already consumed it; when searching I am reading. it matches reading. To match only the first part of the word (read), use the lookahead form \b\w+(?=ing\b).
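The placement of an assertion relative to \w+ decides whether the checked text ends up inside the match. The following is a minimal sketch using Python's re module, chosen here only for illustration (the article's own snippets are in C# and Ruby); the variable names are made up for this example.

```python
import re

# Lookahead at the start: "re" is checked but not consumed,
# so \w+ still matches the whole word.
whole_word = re.findall(r"\b(?=re)\w+\b", "reading a book")

# Lookbehind before \w+: the match starts right after a word-initial "re".
after_re = re.findall(r"(?<=\bre)\w+\b", "reading a book")

# Lookahead after \w+: the trailing "ing" is required but not included.
before_ing = re.findall(r"\b\w+(?=ing\b)", "I am reading.")

# Lookbehind after \w+: the word must end in "ing", which \w+ already consumed.
with_ing = re.findall(r"\b\w+(?<=ing\b)", "I am reading.")

print(whole_word, after_re, before_ing, with_ing)
# ['reading'] ['ading'] ['read'] ['reading']
```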

If you want to add a comma after every three digits of a long number (counting from the right, of course), you can search for the part that needs commas like this: ((?<=\d)\d{3})+\b. When you use it to search 1234567890, the result is 234567890.
The following example uses both kinds of assertion: (?<=\s)\d+(?=\s) matches numbers delimited by whitespace (again, the whitespace itself is not included).
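Both patterns can be checked quickly. This is a small sketch in Python's re module (an illustration only; the sample strings other than 1234567890 are made up), using a lookbehind (?<=\d) for the digit-group search so that at least one digit must precede the groups.

```python
import re

# Groups of three digits counted from the right, with a digit in front of them.
m = re.search(r"((?<=\d)\d{3})+\b", "1234567890")
grouped = m.group(0)

# Numbers with whitespace on both sides; the whitespace is not part of the match.
# The final "6" is skipped because nothing follows it.
nums = re.findall(r"(?<=\s)\d+(?=\s)", "a 12 b 345 6")

print(grouped, nums)  # 234567890 ['12', '345']
```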

Negative Zero-Width Assertions

A negated character class matches a character that is not a given character or not in a given class. But what if we only want to make sure that a character does not appear, without matching anything in its place? For example, suppose we want to find words that contain the letter q where the q is not followed by the letter u. We might try this:

\b\w*q[^u]\w*\b matches a word that contains the letter q followed by a character other than u. But if you test it more (or look at it closely), you will find that it goes wrong when q is the last letter of a word, as in Iraq or Benq. This is because [^u] always consumes one character. If q is the last character of the word, [^u] matches the separator after it (a space, a period, or something else), and the following \w*\b then matches the next word. As a result, \b\w*q[^u]\w*\b matches the whole of Iraq fighting. A negative zero-width assertion solves this problem because it only tests a position and consumes no characters. The corrected pattern is \b\w*q(?!u)\w*\b.
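The difference between the consuming class [^u] and the non-consuming assertion (?!u) is easy to observe. This is a minimal sketch in Python's re module, used here only for illustration; the second sample string is made up.

```python
import re

text = "Iraq fighting"

# [^u] consumes the space after "q", so the match runs on into the next word.
consuming = re.search(r"\b\w*q[^u]\w*\b", text).group(0)

# (?!u) only tests the position after "q"; nothing extra is consumed.
lookahead = re.search(r"\b\w*q(?!u)\w*\b", text).group(0)

# Words whose "q" is followed by "u" are skipped.
words = re.findall(r"\b\w*q(?!u)\w*\b", "Iraq quit Benq queue")

print(repr(consuming), repr(lookahead), words)
# 'Iraq fighting' 'Iraq' ['Iraq', 'Benq']
```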

(?!exp), the zero-width negative lookahead assertion, asserts that the text following its position does not match the expression exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit, and \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.
Similarly, (?<!exp), the zero-width negative lookbehind assertion, asserts that the text before its position does not match the expression exp. For example, (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter.
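The three negative patterns above can be tried out as follows. This is a sketch in Python's re module (for illustration only), and the sample strings are made up for this example.

```python
import re

# Three digits not followed by a digit: in a longer run only the last three qualify.
three = re.findall(r"\d{3}(?!\d)", "12345 678")

# Words with no "abc" inside. The pattern has a capturing group, so group(0)
# is used to get the whole match rather than the last captured character.
no_abc = [m.group(0) for m in re.finditer(r"\b((?!abc)\w)+\b", "xabcx hello abc cab")]

# Seven digits not preceded by a lowercase letter.
seven = re.findall(r"(?<![a-z])\d{7}", "a1234567 b 7654321")

print(three, no_abc, seven)
# ['345', '678'] ['hello', 'cab'] ['7654321']
```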

A more complex example: (?<=<(\w+)>).*(?=<\/\1>) matches the content inside a simple HTML tag that has no attributes. (?<=<(\w+)>) specifies the prefix: a word enclosed in angle brackets (such as <b>). Then comes .* (any string), followed by the suffix (?=<\/\1>). Note the \/ in the suffix, which is an escaped slash. \1 is a backreference to the first captured group, the earlier (\w+), so if the prefix is <b>, the suffix is </b>. The whole expression matches the content between <b> and </b> (again, excluding the prefix and suffix themselves). Because the lookbehind contains \w+, it has variable width, so this pattern needs an engine that allows variable-width lookbehind, such as .NET.
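Not every engine accepts a lookbehind like this one. Python's re module, for instance, requires lookbehind patterns to have a fixed width and rejects it. The sketch below shows that rejection and then a substitute technique that is not the one described above: ordinary capturing groups, with the content read from the second group. The sample string is made up.

```python
import re

html = "say <b>bold text</b> here"

# The variable-width lookbehind (\w+ inside it) does not compile in Python's re.
try:
    re.compile(r"(?<=<(\w+)>).*(?=</\1>)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Substitute approach: match the tags too, and pull the content out of group 2.
m = re.search(r"<(\w+)>(.*)</\1>", html)
tag, content = m.group(1), m.group(2)

print(lookbehind_ok, tag, repr(content))  # False b 'bold text'
```

The trade-off is that the overall match now includes the tags, so the caller has to pick the right group instead of using the match as a whole.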

The explanation above is a bit of a headache, so here is a supplementary one.

An assertion declares a fact that should be true. In a regular expression, matching continues only when the assertion holds.
The following constructs are used to find things before or after some content, without including that content. Like \b, ^, and $, they specify a position, and that position must satisfy a certain condition (the assertion), which is why they are called zero-width assertions. Examples illustrate this best:

(?=exp) is the zero-width positive lookahead assertion. It asserts that the text following its position matches the expression exp. For example, \b\w+(?=ing\b) matches the front part of a word ending in ing (everything except the ing); when searching I'm singing while you're dancing. it matches sing and danc.
(?<=exp) is the zero-width positive lookbehind assertion. It asserts that the text before its position matches the expression exp. For example, (?<=\bre)\w+\b matches the back part of a word starting with re (everything except the re); when searching reading a book it matches ading.

If you want to add a comma after every three digits of a long number (counting from the right, of course), you can search for the part that needs commas like this: ((?<=\d)\d{3})+\b. When it is used to search 1234567890, the result is 234567890.
The following example uses both kinds of assertion: (?<=\s)\d+(?=\s) matches numbers delimited by whitespace (again, the whitespace itself is not included).
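The search pattern above only locates the digits that need commas. To actually insert them, one possible approach (an assumption of this sketch, not something the text above describes) is to combine a lookbehind and a lookahead into a pattern that matches an empty position and substitute a comma there. The helper name add_commas is hypothetical, and Python's re module is used only for illustration.

```python
import re

def add_commas(text: str) -> str:
    """Insert commas at positions that have a digit before them and a
    whole number of three-digit groups after them (hypothetical helper)."""
    # (?<=\d)           a digit must come before the position
    # (?=(?:\d{3})+\b)  one or more groups of three digits follow, up to a word boundary
    return re.sub(r"(?<=\d)(?=(?:\d{3})+\b)", ",", text)

print(add_commas("1234567890"))           # 1,234,567,890
print(add_commas("total 1234567 units"))  # total 1,234,567 units
```

Because both parts are zero-width, the match itself is empty, so the substitution adds a comma without replacing any digit.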

Supplement 2:

Recently I needed to search and replace with regular expressions while processing the source of HTML files, so I took the opportunity to study regular expressions systematically. I had used them before, but only by picking up whatever I needed at the moment. While studying I still ran into many problems, especially with zero-width assertions. (A complaint: the same copied-and-pasted content is everywhere on the Internet, so when I hit a problem I kept finding the same material repeated.) So I am writing down my understanding here for future reference.

First, what is a zero-width positive lookahead assertion? Refer to the official explanation on MSDN:

(?=subexpression)

(Zero-width positive lookahead assertion.) Matching continues only if the subexpression matches to the right of this position. For example, \w+(?=\d) matches a word that is followed by a digit, without matching the digit.

A classic example: a word ends with ing, and you need to get the content before ing.

using System;
using System.Text.RegularExpressions;
var reg = new Regex(@"\w+(?=ing)");      // one or more word characters followed by "ing"
var str = "muing";
Console.WriteLine(reg.Match(str).Value); // prints "mu"

This example can be found all over the Internet. From it you might conclude that the result is simply the content before the expression exp.

Let's look at the following code:

using System;
using System.Text.RegularExpressions;
var reg = new Regex(@"(?=b)c");          // lookahead for "b", then a literal "c"
var str = "abc";
Console.WriteLine(reg.IsMatch(str));     // prints "False"

Why is false returned?

The official MSDN definition already covers this, but in rather formal terms. The key point is the phrase "this position": an assertion tests a position, not a character. With that in mind, the second example works as follows:

(?=b) succeeds only at the position between a and b, because b is the next character there. The assertion consumes nothing, so the engine is still at that same position when it moves on to c. The character at that position is b, not c, so c fails to match. At every other position (?=b) itself fails. Therefore abc does not match (?=b)c.

So how should we write the regular expression if we want it to match?

The answer is: a(?=b)bc

Of course, some will say that the plain pattern abc would match directly, so why go to this trouble? In practice you would not write it this way. The point is only to illustrate how the zero-width positive lookahead assertion works. The same principle applies to the other zero-width assertions.
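The same behaviour can be reproduced outside .NET. The following is a minimal sketch in Python's re module (used here only for illustration) that shows the failed match, the working pattern, and the fact that a lookahead on its own matches an empty span.

```python
import re

# (?=b) leaves the engine in front of "b", where "c" cannot match.
no_match = re.search(r"(?=b)c", "abc")

# Consuming "b" after the assertion lets the rest of the pattern proceed.
full = re.search(r"a(?=b)bc", "abc")

# A lookahead alone matches a position: start and end of the span are equal.
position = re.search(r"(?=b)", "abc").span()

print(no_match, full.group(0), position)  # None abc (1, 1)
```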

Supplement 3

(?=exp): zero-width positive lookahead assertion. It asserts that the text following its position matches the expression exp.

# Matches "product" only where it is followed by "_path"
'product_path'.scan(/(product)(?=_path)/) # => [["product"]]

(?<=exp): zero-width positive lookbehind assertion. It asserts that the text before its position matches the expression exp.

# Matches "wangfei" only where it is preceded by "name:"
'name:wangfei'.scan(/(?<=name:)(wangfei)/) # => [["wangfei"]]

(?!exp): zero-width negative lookahead assertion. It asserts that the text following its position does not match the expression exp.

# Matches "product" only where it is not followed by "_path"
'product_path'.scan(/(product)(?!_path)/) # => []
# Matches "product" only where it is not followed by "_url"
'product_path'.scan(/(product)(?!_url)/) # => [["product"]]

(?<!exp): zero-width negative lookbehind assertion. It asserts that the text before its position does not match the expression exp.

# Matches "angelica" only where it is not preceded by "name:"
'name:angelica'.scan(/(?<!name:)(angelica)/) # => []
# Matches "angelica" only where it is not preceded by "nick_name:"
'name:angelica'.scan(/(?<!nick_name:)(angelica)/) # => [["angelica"]]
