Python Regular Expressions: Understanding the Four Assertion Extensions

Source: Internet
Author: User
Tags expression engine



We often use regular expressions to check whether a string contains a given substring. Expressing that a string does not contain a single character, or any of a set of characters, is also easy with the [^...] form. However, [^...] cannot express that a string does not contain a substring (a sequence of characters). For that we need the four assertion extensions of regular expressions: "positive lookahead" (?=...), "negative lookahead" (?!...), "positive lookbehind" (?<=...), and "negative lookbehind" (?<!...), where ... can be any valid regular expression. Like the word-boundary special sequence \b, these four assertions consume none of the characters in the string being matched; they only match a position.
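As a minimal sketch of the difference between the two forms, the snippet below compares a negated character class with a negative lookahead. The sample words are made up for illustration and do not come from the original text.

```python
import re

# [^def] excludes the single characters d, e and f, not the substring "def".
no_d_e_f = re.compile(r"^[^def]*$")
# A negative lookahead at the start rejects any string that contains "def".
no_def_substring = re.compile(r"^(?!.*def).*$")

words = ["feed", "undefined", "sky"]  # illustrative sample words

char_class_hits = [w for w in words if no_d_e_f.match(w)]
lookahead_hits = [w for w in words if no_def_substring.match(w)]

print(char_class_hits)  # ['sky']
print(lookahead_hits)   # ['feed', 'sky']
```

"feed" does not contain the substring "def", yet the character class rejects it because it contains the letters f, e and d.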

These four assertions are easier to understand and remember by looking at the two parts of their names:

    1. Lookahead and lookbehind mean just what the names say: looking ahead and looking behind. When the regular expression engine matches a string against a pattern, it scans the characters of the string from start to end (left to right). Picture a scan pointer that sits on a boundary between characters and moves along as matching proceeds. With a lookahead assertion, when the pointer is at some position the engine tries to match the characters the pointer has not yet passed, i.e. those to the right of the pointer, hence "lookahead". With a lookbehind assertion, the engine tries to match the characters the pointer has already passed, i.e. those to the left of the pointer, hence "lookbehind".

      Mnemonic: the lookbehind assertions (?<=pattern) and (?<!pattern) contain a less-than sign that looks like an arrow. In left-to-right text this arrow points backward, toward the text already scanned. Remove the less-than sign and you have the lookahead assertions.

    2. Positive and negative: positive means the expression in the parentheses must match at that position; negative means it must not match.

      Mnemonic: not-equal (!=) and logical NOT (!) both use the ! sign, so the forms with ! mean "does not match", i.e. negative. Replace the ! with = and you get the "matches", i.e. positive, forms.
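The two distinctions above give four combinations. The following sketch runs each of them on one made-up sample string (the string and variable names are illustrative, not from the original text) and also shows that an assertion matches a position rather than characters.

```python
import re

text = "price: 100 USD, cost: 200 EUR"  # illustrative sample string

# Positive lookahead: digits that are followed by " USD".
ahead_pos = re.findall(r"\d+(?= USD)", text)
# Negative lookahead: whole numbers that are not followed by " USD".
ahead_neg = re.findall(r"\d+\b(?! USD)", text)
# Positive lookbehind: digits that are preceded by "cost: ".
behind_pos = re.findall(r"(?<=cost: )\d+", text)
# Negative lookbehind: whole numbers that are not preceded by "cost: ".
behind_neg = re.findall(r"(?<!cost: )\b\d+", text)

print(ahead_pos, ahead_neg, behind_pos, behind_neg)
# ['100'] ['200'] ['200'] ['100']

# An assertion on its own matches a position, so the match is empty.
m = re.search(r"(?= USD)", text)
print(m.span())   # (10, 10)
print(m.group())  # empty string
```

The \b in the two negative forms keeps the engine from backtracking into a shorter run of digits such as "10", which would otherwise satisfy the negative assertion.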

One point needs particular attention: for the two lookbehind assertions (?<=...) and (?<!...), the content matched by ... must have a fixed length. In Python's re module this means the lookbehind expression cannot contain variable-length quantifiers such as *, ? and +, which makes lookbehind awkward to use. The following concrete example shows these assertions in use. The spaces in the test strings matter: line0 and line1 begin with one space and have three spaces between def and func, line2 begins with four spaces, and line3 begins with two.

line0 = ' # def   func(FuncName, Funcparam, functime=)'
line1 = ' def   func(FuncName, Funcparam, functime=)'
line2 = "    obj1(param).func('func1', 'param1', functime=150) # test"
line3 = "  obj2().funcTest(1) # obj1(param).func('func1', 'param1')"

We want to find lines that contain a call to the function func(), i.e. the string "func(" appears in the line being tested, but that do not contain the definition of func, i.e. the string "def func(", where def and func may be separated by more than one space. The most direct idea is to match "func(" while requiring that "def\s+" does not come before it. So we first try looking backward, that is, a negative lookbehind applied to line1: re.findall(r"(?<!def )\s*func\(", line1) (with one space after def). We expect no match and therefore an empty list, but what we actually get is:


>>> re.findall(r"(?<!def )\s*func\(", line1)
['   func(']



with three spaces before "func". The reason is that the re engine looks for a string matching "\s*func\(" that is not immediately preceded by "def " (def followed by one space). The string "   func(" with its three leading spaces satisfies exactly this condition: it matches "\s*func\(", and the four characters immediately before it are " def" (a space followed by def), not the "def " (def followed by a space) named in the negative lookbehind (?<!def ).



Next, remove the space after def in the negative lookbehind, i.e. change the call to re.findall(r"(?<!def)\s*func\(", line1). What is the result? The actual output is:



>>> re.findall(r"(?<!def)\s*func\(", line1)
['  func(']



with two spaces before "func". Careful analysis shows why: the re engine looks for a string matching "\s*func\(" that is not immediately preceded by "def" (with no trailing space). The string "  func(" with its two leading spaces satisfies exactly this condition: it matches "\s*func\(", and it is preceded by "def " (def followed by one space), so the three characters immediately before it are "ef ", not the "def" named in the negative lookbehind (?<!def).
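Both attempts can be inspected by looking at where the match starts and which characters sit just before it. The sketch below assumes line1 has one leading space and three spaces between def and func; the helper name show is hypothetical.

```python
import re

# line1 as described above: one leading space, three spaces between def and func.
line1 = " def   func(FuncName, Funcparam, functime=)"

def show(pattern, width):
    """Return the matched text and the `width` characters just before it."""
    m = re.search(pattern, line1)
    return m.group(), line1[max(0, m.start() - width):m.start()]

first = show(r"(?<!def )\s*func\(", 4)   # lookbehind is 4 characters wide
second = show(r"(?<!def)\s*func\(", 3)   # lookbehind is 3 characters wide

print(first)   # ('   func(', ' def')
print(second)  # ('  func(', 'ef ')
```

In both cases the \s* part absorbs exactly as many spaces as needed to move the start of the match to a position where the lookbehind text no longer lines up.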




Next, try using \s+ after def in the negative lookbehind, i.e. change the call to re.findall(r"(?<!def\s+)func\(", line1). Logically this covers an indeterminate number of spaces between def and func, but because the two lookbehind assertions require the expression to match a fixed-length string, running it makes the Python interpreter raise "error: look-behind requires fixed-width pattern".
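A short sketch of this failure: the pattern is rejected when it is compiled, so no test string is needed. The variable name message is only for illustration.

```python
import re

try:
    re.compile(r"(?<!def\s+)func\(")
    message = None
except re.error as exc:
    # The pattern is rejected at compile time, before any string is searched.
    message = str(exc)

print(message)  # expected to mention "look-behind requires fixed-width pattern"
```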




So for line1, which has three spaces between def and func, a negative lookbehind gives the intended result only in one of two forms: def followed by three spaces inside the assertion and no spaces before func, re.findall(r"(?<!def   )func\(", line1), or def without a space inside the assertion and exactly three spaces before func, re.findall(r"(?<!def)\s{3}func\(", line1). Both forms require knowing exactly how many spaces there are between def and func. Because Python syntax does not fix the number of spaces between def and the function name, a negative lookbehind cannot reliably handle a string with an unknown number of spaces between def and func.
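The fragility of the two fixed-width forms can be sketched as follows. The strings two_spaces and call are hypothetical inputs added for illustration; only line1 comes from the example above.

```python
import re

line1 = " def   func(FuncName, Funcparam, functime=)"      # three spaces
two_spaces = " def  func(FuncName, Funcparam, functime=)"  # hypothetical: two spaces
call = "obj.func(1)"                                       # hypothetical call line

exact = r"(?<!def   )func\("      # def plus three spaces inside the lookbehind
shifted = r"(?<!def)\s{3}func\("  # exactly three spaces required before func

r_exact_line1 = re.findall(exact, line1)
r_shifted_line1 = re.findall(shifted, line1)
r_exact_two = re.findall(exact, two_spaces)
r_shifted_call = re.findall(shifted, call)

print(r_exact_line1)    # []
print(r_shifted_line1)  # []
print(r_exact_two)      # ['func(']  the definition slips through
print(r_shifted_call)   # []  a real call is missed
```

Both forms give the intended empty result on line1, but the first lets a definition with two spaces through and the second misses any call that is not preceded by exactly three whitespace characters.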



So we turn to a negative lookahead to get an exact match: re.findall(r"^(?!.*def\s+func\().*func\(", line1), which returns the empty list []. To verify that the pattern inside the assertion is written correctly, we run the same expression as a positive lookahead, re.findall(r"^(?=.*def\s+func\().*func\(", line1), which returns [' def   func(']:



>>> re.findall(r"^(?!.*def\s+func\().*func\(", line1)
[]
>>> re.findall(r"^(?=.*def\s+func\().*func\(", line1)
[' def   func(']



This shows that our negative lookahead correctly handles an indeterminate number of spaces between def and func.
To explain the negative lookahead once more: "^(?!.*def\s+func\().*func\(" means that, starting from the beginning of the line and searching toward the end, no string matching ".*def\s+func\(" may appear; given that, the engine looks for a string matching ".*func\(". This is exactly the filter we want. The (?!.*def\s+func\() part consumes none of the string.
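As a sketch, the same pattern can be applied to all four test lines. The strings are written with the spacing assumed throughout (one leading space and three spaces after def in line0 and line1); the names NOT_A_DEFINITION and results are hypothetical.

```python
import re

lines = [
    " # def   func(FuncName, Funcparam, functime=)",                # line0
    " def   func(FuncName, Funcparam, functime=)",                  # line1
    "    obj1(param).func('func1', 'param1', functime=150) # test", # line2
    "  obj2().funcTest(1) # obj1(param).func('func1', 'param1')",   # line3
]

# Reject any line containing "def", whitespace, then "func(".
NOT_A_DEFINITION = re.compile(r"^(?!.*def\s+func\().*func\(")

results = [NOT_A_DEFINITION.findall(line) for line in lines]
for r in results:
    print(r)
# []
# []
# ['    obj1(param).func(']
# ['  obj2().funcTest(1) # obj1(param).func(']
```

Note that line3 still matches here: this pattern only filters out definitions, so a call that appears inside a comment is not excluded.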



Note that two other patterns look very close to re.findall(r"^(?!.*def\s+func\().*func\(", line1) but behave differently:



1. With re.findall(r"^(?!def\s+func\().*func\(", line1), the result is not the expected empty list but [' def   func(']. Written this way, the re engine looks for a line whose start does not match "def\s+func\(" but does match ".*func\(". line1 has a space in front of def, so what the engine sees at the starting position is " def   func(" with a leading space, which does not match the "def\s+func\(" in the negative lookahead (that pattern has no leading space), so the match succeeds.



2. With re.findall(r"(?!.*def\s+func\().*func\(", line1), the result is not the expected empty list but ['ef   func(']. Without the ^ character the pattern is not required to match from the beginning of line1; the match may start anywhere in line1, so the substring "ef   func(" in line1 satisfies the pattern.
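The three patterns can be compared side by side. This is a sketch that assumes line1 has one leading space and three spaces between def and func; the variable names are hypothetical.

```python
import re

line1 = " def   func(FuncName, Funcparam, functime=)"

anchored_full = r"^(?!.*def\s+func\().*func\("  # anchored, with .* in the assertion
no_dot_star = r"^(?!def\s+func\().*func\("      # variant 1: no .* inside the assertion
no_anchor = r"(?!.*def\s+func\().*func\("       # variant 2: no ^ anchor

r_full = re.findall(anchored_full, line1)
r_variant1 = re.findall(no_dot_star, line1)
r_variant2 = re.findall(no_anchor, line1)

print(r_full)      # []
print(r_variant1)  # [' def   func(']
print(r_variant2)  # ['ef   func(']
```

In variant 2 the engine fails at offsets 0 and 1, where "def" is still visible to the lookahead, and succeeds at offset 2, where only "ef" remains.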

In summary: to match "xxx not preceded by yyy" with a regular expression, prefer a negative lookahead when xxx and yyy may be separated by other strings of indeterminate length; when the length of what lies between xxx and yyy is known, a negative lookbehind can be used.




For example, consider matching the "func(" string in line3 with the requirement that no # sign appears before it, i.e. the call to func must not be commented out. Because the number of characters between # and func( is completely unknown, you should use the negative lookahead re.findall(r"^(?!.*#.*func\().*func\(", line3), not re.findall(r"(?<!#).*func\(", line3).
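A sketch comparing the two patterns on line2 (a live call followed by a comment) and line3 (a call inside a comment), using the test strings with the leading spaces assumed earlier. The names ahead and behind are hypothetical.

```python
import re

line2 = "    obj1(param).func('func1', 'param1', functime=150) # test"
line3 = "  obj2().funcTest(1) # obj1(param).func('func1', 'param1')"

ahead = r"^(?!.*#.*func\().*func\("  # negative lookahead over the whole line
behind = r"(?<!#).*func\("           # negative lookbehind, one character wide

ahead_line2 = re.findall(ahead, line2)
ahead_line3 = re.findall(ahead, line3)
behind_line3 = re.findall(behind, line3)

print(ahead_line2)   # ['    obj1(param).func(']
print(ahead_line3)   # []
print(behind_line3)  # ['  obj2().funcTest(1) # obj1(param).func(']
```

The lookbehind only checks the single character before the start of the match, so the greedy .* simply runs across the # and the commented-out call is still reported.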


