Regular expressions have four forms of lookaround assertion:
(?=pattern) zero-width positive lookahead assertion
(?!pattern) zero-width negative lookahead assertion
(?<=pattern) zero-width positive lookbehind assertion
(?<!pattern) zero-width negative lookbehind assertion
The pattern here is a regular expression.
Just as ^ matches the beginning, $ the end, and \b a word boundary, lookahead and lookbehind assertions match only a position during the matching process and consume no characters, which is why they are called "zero-width". A position here means the point to the left of the first character of a string (or of each line), the point to the right of the last character, or the point between two adjacent characters (assuming text runs from left to right).
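The zero-width property can be observed directly. The sketch below uses Python's standard `re` module, chosen only for illustration: a lookahead on its own produces a match whose start and end are the same position, and substituting at `\b` inserts text without removing any characters.

```python
import re

text = "a regular expression"

# A lookahead alone matches an empty string: start and end are the same position.
m = re.search(r"(?=gular)", text)
print(m.span(), repr(m.group()))  # (4, 4) ''

# \b is zero-width too: re.sub inserts "|" at each word boundary and removes nothing.
print(re.sub(r"\b", "|", "a regular"))  # |a| |regular|
```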
The following examples illustrate the meanings of the four assertions.
(?=pattern) positive lookahead assertion
Matches a position in the string such that the character sequence after this position matches pattern.
For example, in the string "a regular expression", to match the "re" in "regular" but not the "re" in "expression", you can use "re(?=gular)". This expression constrains the position to the right of "re": it must be followed by "gular", but the characters "gular" are not consumed. If you change the expression to "re(?=gular).", it matches "reg": the metacharacter "." matches the g, and the parenthesized assertion matches the position between the e and the g.
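The "re(?=gular)" example can be checked with a short sketch in Python's `re` module (the language is an illustrative choice). The span shows that only the two characters "re" are consumed; adding "." extends the match by exactly the "g".

```python
import re

text = "a regular expression"

# re(?=gular): "re" only where it is immediately followed by "gular"; "gular" is not consumed.
print(re.findall(r"re(?=gular)", text))        # ['re']
print(re.search(r"re(?=gular)", text).span())  # (2, 4)

# Adding "." after the assertion consumes one more character, which must be the "g".
print(re.findall(r"re(?=gular).", text))       # ['reg']
```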
(?!pattern) negative lookahead assertion
Matches a position in the string such that the character sequence after this position cannot match pattern.
For example, in the string "regex represents regular expression", to match every "re" except those in "regex" and "regular", you can use "re(?!g)". This expression constrains the position to the right of "re": it must not be followed by the character g. Negative and positive assertions differ only in whether the characters after the position must match the expression in parentheses or must not.
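A minimal Python sketch of "re(?!g)" on the same sentence, printing the start index of each match. The positive form "re(?=g)" is added only for contrast; it selects exactly the two occurrences that the negative form skips.

```python
import re

text = "regex represents regular expression"

# re(?!g): "re" whose next character is not "g" (the end of the string would also count).
starts = [m.start() for m in re.finditer(r"re(?!g)", text)]
print(starts)  # [6, 9, 28]: both "re" in "represents" and the one in "expression"

# The positive form picks out exactly the two "re" that were excluded above.
print([m.start() for m in re.finditer(r"re(?=g)", text)])  # [0, 17]: "regex", "regular"
```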
(?<=pattern) positive lookbehind assertion
Matches a position in the string such that the character sequence before this position matches pattern.
For example, the string "regex represents regular expression" contains four words. To match a "re" inside a word but not one at the start of a word, you can use "(?<=\w)re": a "re" inside a word must be preceded by a word character. The assertion is called lookbehind because the regular expression engine scans the characters of the string one by one from front to back and checks whether they match the expression; when it meets this assertion, it has to inspect characters at the front of the string that it has already scanned, which is backward relative to the scanning direction.
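The "(?<=\w)re" example as a Python sketch (an illustrative choice of language). Only the two occurrences preceded by a letter, the second "re" of "represents" and the "re" of "expression", are reported.

```python
import re

text = "regex represents regular expression"

# (?<=\w)re: "re" preceded by a word character, i.e. not at the start of a word.
inner = [m.start() for m in re.finditer(r"(?<=\w)re", text)]
print(inner)  # [9, 28]
```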
(?<!pattern) negative lookbehind assertion
Matches a position in the string such that the character sequence before this position cannot match pattern.
For example, in the string "regex represents regular expression", to match the "re" at the start of a word, you can use "(?<!\w)re". In this example, a "re" at the start of a word is one that is not inside a word, that is, one not preceded by a word character. Of course, "\bre" also works here.
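A Python sketch comparing "(?<!\w)re" with "\bre" on the same sentence. They agree here because "r" is itself a word character, so "no word character before it" and "word boundary before it" describe the same position.

```python
import re

text = "regex represents regular expression"

# (?<!\w)re: "re" not preceded by a word character, i.e. at the start of a word.
lookbehind = [m.start() for m in re.finditer(r"(?<!\w)re", text)]
# \bre gives the same positions in this sentence.
boundary = [m.start() for m in re.finditer(r"\bre", text)]
print(lookbehind, boundary)  # [0, 6, 17] [0, 6, 17]
```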
These four assertions can be understood along two dimensions:
1. Lookahead versus lookbehind: when the regular expression engine matches a string against an expression, it scans the characters of the string continuously from start to end. Imagine a scan pointer that sits on a character boundary and moves as matching proceeds. With a lookahead assertion, when the pointer is at some position, the engine tries to match characters the pointer has not yet scanned, reaching them ahead of the pointer; hence "lookahead". With a lookbehind assertion, the engine tries to match characters the pointer has already scanned, reaching them after the pointer; hence "lookbehind".
2. Positive versus negative: positive means the expression in parentheses must match; negative means it must not match.
How to remember the four assertions:
1. Lookahead versus lookbehind: the lookbehind assertions (?<=pattern) and (?<!pattern) contain a less-than sign, which also looks like an arrow. For left-to-right text this arrow points backward, which fits our intuition. Remove the less-than sign and you have the corresponding lookahead assertion.
2. Positive versus negative: "not equal" (!=) and logical NOT (!) both use the ! sign, so the forms containing ! mean "does not match", i.e. negative; replace the ! with = and the form means "matches", i.e. positive.
We often use regular expressions to check whether a string contains a substring. Expressing that a string does not contain a particular character, or any of a set of characters, is also easy: use the [^...] form. But how do you express that a string does not contain a substring (a sequence of characters)?
[^...] does not work here. Instead you need a negative lookahead or lookbehind assertion, or both.
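A small Python sketch of why [^...] is the wrong tool: "[^that]" is a character class that excludes the individual letters t, h and a, so it rejects "cat" even though "cat" does not contain "that". Repeating a negative lookahead before each character excludes the substring itself. The non-capturing group "(?:...)" is a stylistic choice here; a plain group behaves the same for this test.

```python
import re

# [^that] excludes the single characters t, h and a, not the substring "that".
print(re.fullmatch(r"[^that]*", "cat"))  # None, although "cat" does not contain "that"

# A negative lookahead before every character excludes the substring itself.
no_that = re.compile(r"(?:(?!that).)*")
print(bool(no_that.fullmatch("cat")))       # True
print(bool(no_that.fullmatch("that cat")))  # False
```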
For example, suppose you want to test whether a sentence contains "this" but not "that".
Requiring "this" is easy. For "does not contain that", you can require that no character in the sentence is preceded by "that", or that no character is followed by "that". The regular expressions are:
^((?<!that).)*this((?<!that).)*$  or  ^(.(?!that))*this(.(?!that))*$
For the sentence "this is the case", both expressions match; for "note that this is the case", both fail.
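The two expressions can be tried on these sentences with a Python sketch; the names A and B are illustrative labels for the first and second expression.

```python
import re

# The two patterns from the text, written without spaces.
A = re.compile(r"^((?<!that).)*this((?<!that).)*$")
B = re.compile(r"^(.(?!that))*this(.(?!that))*$")

for sentence in ["this is the case", "note that this is the case"]:
    print(sentence, bool(A.search(sentence)), bool(B.search(sentence)))
# this is the case True True
# note that this is the case False False
```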
In general, these two expressions meet the requirement. In extreme situations, however, such as a sentence that starts with "that", one that ends with "that", or one where "that" and "this" are joined together, they may not be adequate.
For example: "note thatthis is the case" or "this is the case, not that".
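A Python sketch that runs the two expressions (labelled A and B here for illustration) on the edge cases. Every sentence contains "that", so ideally neither should match; each expression accepts some of them.

```python
import re

A = re.compile(r"^((?<!that).)*this((?<!that).)*$")
B = re.compile(r"^(.(?!that))*this(.(?!that))*$")

edge_cases = [
    "that this is the case",       # starts with "that"
    "note thatthis is the case",   # "that" directly before "this"
    "this is the case, not that",  # ends with "that"
]
for sentence in edge_cases:
    print(f"{sentence!r}: A={bool(A.search(sentence))} B={bool(B.search(sentence))}")
```

A rejects the first sentence but accepts the other two; B rejects the last two but accepts the first. A tests the position before each repeated character, so it never tests the position where "this" begins or the end of the string. B tests the position after each repeated character, so it never tests the start of the string or the position where "this" ends.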
Used flexibly, these assertions solve this easily:
^(.(?<!that))*this(.(?<!that))*$
^(.(?<!that))*this((?!that).)*$
^((?!that).)*this(.(?<!that))*$
^((?!that).)*this((?!that).)*$
Each of these four regular expressions handles the cases above correctly.
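A Python sketch that checks the four expressions against the plain sentence and the three edge cases. Each list should read True for the first sentence and False for the others.

```python
import re

fixed = [
    r"^(.(?<!that))*this(.(?<!that))*$",
    r"^(.(?<!that))*this((?!that).)*$",
    r"^((?!that).)*this(.(?<!that))*$",
    r"^((?!that).)*this((?!that).)*$",
]
sentences = [
    "this is the case",            # should match
    "that this is the case",       # should not match
    "note thatthis is the case",   # should not match
    "this is the case, not that",  # should not match
]
for pattern in fixed:
    print([bool(re.search(pattern, s)) for s in sentences])
# Each line prints [True, False, False, False]
```

One further case, not in the text above, is "thathis", where the two words share a "t". The first two expressions still accept it: their prefix applies (?<!that) only after consuming each character, and here "that" ends inside "this". The last two expressions test (?!that) before each prefix character and reject it.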
The pattern in parentheses is a regular expression, but lookbehind assertions have a limitation. In Python, and traditionally in Perl, the pattern must be fixed-length: it cannot use metacharacters such as *, + or ?. For example, (?<=abc) is fine, but (?<=a*bc) is not supported. In particular, when the pattern contains alternatives joined by |, every alternative must have the same length. Variable-length patterns are not supported because, when checking the assertion, the engine cannot determine how many steps to look back. Java supports ?, {m}, {n,m} and similar bounded quantifiers, but also does not support * and +. JavaScript did not support lookbehind at all until ECMAScript 2018, which added it without the fixed-length restriction. In general, this limitation is not a big problem.
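The fixed-length rule can be seen in Python's standard `re` module, which raises `re.error` when it compiles a lookbehind of variable width. `compiles` is a hypothetical helper written for this sketch.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=abc)x"))      # True: fixed length 3
print(compiles(r"(?<=abc|def)x"))  # True: both alternatives have length 3
print(compiles(r"(?<=a*bc)x"))     # False: variable length
print(compiles(r"(?<=a|bc)x"))     # False: alternatives of different lengths
```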
Test examples
Test string: sole sorry chilly high tight laughter
Match every h: h
Match h followed by t: h(?=t)
Match h not followed by t: h(?!t)
Match h preceded by g: (?<=g)h
Match h not preceded by g: (?<!g)h
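These five tests can be run on the sample sentence with a Python sketch. The patterns h, h(?=t), h(?!t), (?<=g)h and (?<!g)h are a reading of the five descriptions; the output lists the start index of each matched h.

```python
import re

text = "sole sorry chilly high tight laughter"

patterns = {
    "every h": r"h",
    "h followed by t": r"h(?=t)",
    "h not followed by t": r"h(?!t)",
    "h preceded by g": r"(?<=g)h",
    "h not preceded by g": r"(?<!g)h",
}
for label, pattern in patterns.items():
    print(label, pattern, [m.start() for m in re.finditer(pattern, text)])
```

h(?=t) finds only the h in "tight" and "laughter"; h(?!t) finds the h in "chilly" and both h's in "high"; (?<=g)h finds the second h of "high" plus the h in "tight" and "laughter"; (?<!g)h finds the h in "chilly" and the first h of "high".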
To some extent, lookahead and lookbehind assertions are like using an if statement to check the characters before and after a match.
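The if-statement analogy can be made literal for single-character assertions. This Python sketch checks the neighbouring index by hand and confirms it gives the same positions as h(?!t) and (?<=g)h; it only covers one-character lookarounds and is not how a regex engine is implemented.

```python
import re

text = "sole sorry chilly high tight laughter"

# Hand-written equivalent of h(?!t): an h whose next character is not t (or that ends the string).
not_followed_by_t = [
    i for i, ch in enumerate(text)
    if ch == "h" and (i + 1 == len(text) or text[i + 1] != "t")
]
# Hand-written equivalent of (?<=g)h: an h whose previous character is g.
preceded_by_g = [
    i for i, ch in enumerate(text)
    if ch == "h" and i > 0 and text[i - 1] == "g"
]

assert not_followed_by_t == [m.start() for m in re.finditer(r"h(?!t)", text)]
assert preceded_by_g == [m.start() for m in re.finditer(r"(?<=g)h", text)]
print(not_followed_by_t, preceded_by_g)  # [12, 18, 21] [21, 26, 33]
```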