This article introduces the greedy and non-greedy behavior of Python regular expressions, with detailed example code. Earlier articles briefly covered the basics and capturing in Python regular expressions; this one summarizes the greedy and non-greedy features.
Greedy
By default, quantifiers in regular expressions are greedy: when several match lengths are possible, the engine picks the longest one. For example, the following regular expression is meant to extract what each person says, but because of greedy matching it captures too much:
>>> import re
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*)"', sentence)
['why?" and I say "I don\'t know']
The following examples further illustrate greedy matching:
>>> re.findall('hi*', 'hiiiii')
['hiiiii']
>>> re.findall('hi{2,}', 'hiiiii')
['hiiiii']
>>> re.findall('hi{1,3}', 'hiiiii')
['hiii']
Non-greedy
When we want a quantifier to match in a non-greedy way, we have to say so explicitly in the syntax:
{2,5}?
Matches 2 to 5 repetitions, but prefers as few as possible
The question mark here may be confusing, because it already has a meaning of its own: the preceding item appears 0 or 1 times. In fact, you only need to remember that when a question mark follows a quantifier (the part of the expression that specifies a variable number of repetitions), it indicates non-greedy matching.
In the above examples, the results of non-greedy matching are as follows:
>>> re.findall('hi*?', 'hiiiii')
['h']
>>> re.findall('hi{2,}?', 'hiiiii')
['hii']
>>> re.findall('hi{1,3}?', 'hiiiii')
['hi']
Returning to the quotation example, a non-greedy match gives the intended result:
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*?)"', sentence)
['why?', "I don't know"]
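The `?` suffix is not limited to the `{m,n}` form. Python's `re` module also accepts `*?`, `+?` and `??`. The following is a small sketch that runs each greedy quantifier next to its non-greedy counterpart on the same `'hiiiii'` input used above; the `pairs` and `results` names are only for illustration.

```python
import re

text = 'hiiiii'

# Each pair is (greedy pattern, non-greedy pattern).
pairs = [
    ('hi*', 'hi*?'),          # zero or more
    ('hi+', 'hi+?'),          # one or more
    ('hi?', 'hi??'),          # zero or one
    ('hi{2,5}', 'hi{2,5}?'),  # between 2 and 5
]

results = {}
for greedy, lazy in pairs:
    results[greedy] = re.findall(greedy, text)
    results[lazy] = re.findall(lazy, text)
    print(f'{greedy!r:12} -> {results[greedy]}    {lazy!r:12} -> {results[lazy]}')
```

In every pair the greedy form takes as many `i` characters as the quantifier allows, while the non-greedy form stops at the minimum the quantifier requires.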
Lookahead and non-greedy
Strictly speaking, this part is not about greedy or non-greedy matching. However, because the behavior feels similar to non-greedy matching, it is covered here so that it is easier to remember.
(?=abc)
Positive lookahead: succeeds only if abc comes next, but does not consume any characters
(?!abc)
Negative lookahead: succeeds only if abc does not come next, and does not consume any characters
During matching, the regular expression engine "consumes" characters: once a character has become part of a match, later matches in the same scan cannot use it again.
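To make "consuming" concrete, here is a minimal sketch on a made-up input, `'aaaa'`, which is not from the examples in this article. It compares an ordinary pattern with a lookahead version and records where each match starts.

```python
import re

text = 'aaaa'

# Each match consumes two characters, so the scan resumes at index 2
# and the overlapping occurrence starting at index 1 is never reported.
consuming = [m.span() for m in re.finditer(r'aa', text)]

# A lookahead is zero-width: nothing is consumed, so every start
# position is tested. The inner group records what the lookahead saw.
overlapping = [m.start() for m in re.finditer(r'(?=(aa))', text)]

print(consuming)    # [(0, 2), (2, 4)]
print(overlapping)  # [0, 1, 2]
```

The ordinary pattern finds two occurrences, while the lookahead finds all three starting positions because no characters are used up between attempts.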
How can this be put to use? Again, an example helps. Suppose we want to find the words that appear more than once in a string:
>>> sentence = "Oh what a day, what a lovely day!"
>>> re.findall(r'\b(\w+)\b.*\b\1\b', sentence)
['what']
This regular expression clearly does not do the job. Why? When (\w+) matches the first "what" and \1 matches the second "what", the whole span "what a day, what" is consumed by the match, so the search resumes right after the second "what". Naturally, only one repeated word is found.
The solution relies on the (?=abc) syntax, which does not consume any characters when it matches. The correct pattern is therefore:
>>> re.findall(r'\b(\w+)\b(?=.*\b\1\b)', sentence)
['what', 'a', 'day']
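The lookahead pattern can be wrapped in a small helper. This is only a sketch: the function name `repeated_words` is hypothetical, and the de-duplication step is an addition, since a word that occurs three or more times would otherwise be reported more than once.

```python
import re

# Same pattern as above: a whole word, followed (without consuming
# anything) by some text that contains the same whole word again.
_REPEATED = re.compile(r'\b(\w+)\b(?=.*\b\1\b)')

def repeated_words(text):
    """Return words that occur again later in text, in order of first appearance."""
    found = [m.group(1) for m in _REPEATED.finditer(text)]
    # dict.fromkeys keeps insertion order and drops duplicates.
    return list(dict.fromkeys(found))

sentence = "Oh what a day, what a lovely day!"
print(repeated_words(sentence))  # ['what', 'a', 'day']
```

Because `.` does not match a newline by default, this sketch only finds repeats within a single line; passing `re.DOTALL` to `re.compile` would lift that restriction.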
If we need to match a word that contains at least two different letters, we can use the (?!abc) syntax. The first call below finds no match and returns None, so the interpreter prints nothing:
>>> re.search(r'([a-z]).*(?!\1)[a-z]', 'aa', re.IGNORECASE)
>>> re.search(r'([a-z]).*(?!\1)[a-z]', 'ab', re.IGNORECASE)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
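In practice the match object is often reduced to a yes/no answer. Below is a minimal sketch using the same pattern; the helper name `has_two_different_letters` is made up for illustration. On Python 3.7 and later the match object prints as `<re.Match object; ...>` rather than the `_sre.SRE_Match` form shown above.

```python
import re

# ([a-z]) captures one letter; (?!\1) then requires that the next
# character is NOT that same letter before [a-z] consumes it.
_TWO_LETTERS = re.compile(r'([a-z]).*(?!\1)[a-z]', re.IGNORECASE)

def has_two_different_letters(word):
    """Return True if word contains at least two different letters."""
    return _TWO_LETTERS.search(word) is not None

for word in ('aa', 'ab', 'aaa', 'aab'):
    print(word, has_two_different_letters(word))
```

`re.search` returns `None` when nothing matches, so comparing against `None` gives a plain boolean.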