Python regular expression: Greedy/non-greedy

Source: Internet
Author: User
This article introduces the greedy and non-greedy features of Python regular expressions, with detailed example code. Earlier articles briefly covered the basics and capturing in Python regular expressions; this one summarizes greedy and non-greedy matching.

Greedy

By default, regular expression quantifiers are greedy: when several matches are possible from the same starting position, the longest one is chosen. For example, the following regular expression is intended to extract what each person says, but because of greediness it matches too much:

>>> import re
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*)"', sentence)
['why?" and I say "I don\'t know']

The following examples further illustrate greedy matching:

>>> re.findall('hi*', 'hiiiii')
['hiiiii']
>>> re.findall('hi{2,}', 'hiiiii')
['hiiiii']
>>> re.findall('hi{1,3}', 'hiiiii')
['hiii']
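As a minimal sketch of the same idea, the snippet below runs several greedy quantifiers against the same string and collects what each one matches. The names `patterns` and `greedy_matches` are illustrative only; each quantifier takes as many `i` characters as its upper bound allows.

```python
import re

text = "hiiiii"

# Greedy quantifiers: each one repeats 'i' as many times as it is allowed to.
patterns = ["hi*", "hi+", "hi?", "hi{2,}", "hi{1,3}"]
greedy_matches = {p: re.search(p, text).group() for p in patterns}

for pattern, matched in greedy_matches.items():
    print(f"{pattern!r:12} -> {matched!r}")
```

Only `hi?` and `hi{1,3}` stop early, because their upper bounds (1 and 3) are smaller than the number of `i` characters available.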

Non-greedy

When we want a regular expression to match non-greedily, we have to say so explicitly in the syntax:

{2,5}? Matches 2 to 5 repetitions, preferring as few as possible

The question mark here may be confusing, because it already has a meaning of its own: the preceding item appears 0 or 1 times. In fact, you only need to remember that when a question mark follows a quantifier (*, +, ? or {m,n}), that is, a part of the expression that specifies a variable number of repetitions, it indicates non-greedy matching.
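The two roles of the question mark can be seen side by side in a short sketch. The variable names are illustrative; the patterns use only standard `re` syntax.

```python
import re

text = "hiiiii"

# '?' directly after an item is itself a quantifier: 0 or 1 of that item (greedy).
optional = re.search(r"hi?", text).group()

# '?' after another quantifier makes that quantifier non-greedy.
lazy_optional = re.search(r"hi??", text).group()  # 0 or 1 'i', prefers 0
lazy_plus = re.search(r"hi+?", text).group()      # 1 or more 'i', prefers 1

print(optional, lazy_optional, lazy_plus)
```

Here `hi?` matches `hi`, `hi??` matches just `h`, and `hi+?` matches `hi`, since `+` still requires at least one repetition.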

For the examples above, non-greedy matching gives the following results:

>>> re.findall('hi*?', 'hiiiii')
['h']
>>> re.findall('hi{2,}?', 'hiiiii')
['hii']
>>> re.findall('hi{1,3}?', 'hiiiii')
['hi']

With non-greedy matching, the quotation example gives the intended result:

>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*?)"', sentence)
['why?', "I don't know"]
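One caveat worth a small sketch: non-greedy does not mean "shortest possible match overall". A lazy quantifier still expands until the rest of the pattern can match. The example below is an illustration added here, not part of the original article; it requires a period after the closing quote and compares `.*?` with a negated character class, a common alternative.

```python
import re

sentence = """You said "why?" and I say "I don't know"."""

# Non-greedy .*? still grows until the rest of the pattern ('"' then '.') can match.
lazy = re.findall(r'"(.*?)"\.', sentence)

# A negated character class can never cross a quote, whatever follows it.
bounded = re.findall(r'"([^"]*)"\.', sentence)

print(lazy)     # ['why?" and I say "I don\'t know']
print(bounded)  # ["I don't know"]
```

When the delimiter is a single character, the negated class states the intent more directly than relying on laziness.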

Lookahead and non-greedy

Strictly speaking, the syntax in this section is not about greediness. However, because its behavior is similar to non-greedy matching, it is covered here to make it easier to remember.

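(?=abc) Positive lookahead: succeeds only if abc follows at the current position, without consuming any characters

(?!abc) Negative lookahead: succeeds only if abc does not follow at the current position, without consuming any characters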

During matching, the regular expression engine "consumes" characters: once a character has become part of a match, later matches in the same scan cannot use it again.
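Consumption is easy to observe with `re.findall`, which returns non-overlapping matches. In this sketch (variable names are illustrative), a plain pattern consumes what it matches, while a pattern made only of a lookahead consumes nothing and can therefore report overlapping occurrences.

```python
import re

text = "aaaa"

# Each match consumes two characters, so the scan resumes after them.
consuming = re.findall(r"aa", text)

# A lookahead consumes nothing; the group inside it records the text,
# and the scan advances one position at a time.
overlapping = re.findall(r"(?=(aa))", text)

print(consuming)    # ['aa', 'aa']
print(overlapping)  # ['aa', 'aa', 'aa']
```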

How can this be put to use? As an example, suppose we want to find the words that appear more than once in a string:

>>> sentence = "Oh what a day, what a lovely day!"
>>> re.findall(r'\b(\w+)\b.*\b\1\b', sentence)
['what']

This regular expression clearly does not complete the task. Why? When the first (\w+) matches "what" and the subsequent \1 matches the second "what", everything from the first "what" through the second one ("what a day, what") is consumed by the match, so the next search starts right after the second "what". As a result, only one of the repeated words is found.

The solution relies on the (?=abc) syntax, which checks the text ahead without consuming it. The correct pattern is therefore:

>>> re.findall(r'\b(\w+)\b(?=.*\b\1\b)', sentence)
['what', 'a', 'day']
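The lookahead pattern can be wrapped in a small helper. This is a sketch, and `repeated_words` is a hypothetical name, not something from the original article. It also removes duplicates, because a word that occurs three times is reported twice by the pattern (once for each occurrence that has a later copy).

```python
import re

# Same pattern as above: a whole word followed, somewhere later, by the same word.
_REPEATED = re.compile(r"\b(\w+)\b(?=.*\b\1\b)")

def repeated_words(text):
    """Return each word that occurs again later in text, in order of first appearance."""
    seen = []
    for word in _REPEATED.findall(text):
        if word not in seen:  # a word seen three times would otherwise appear twice
            seen.append(word)
    return seen

print(repeated_words("Oh what a day, what a lovely day!"))  # ['what', 'a', 'day']
print(repeated_words("go go go"))                           # ['go']
```

Because `.` does not match a newline by default, this version only finds repeats within a single line; passing `re.DOTALL` would be one way to lift that restriction.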

If we need to match a word that contains at least two different letters, we can use the (?!abc) syntax:

>>> re.search(r'([a-z]).*(?!\1)[a-z]', 'aa', re.IGNORECASE)
>>> re.search(r'([a-z]).*(?!\1)[a-z]', 'ab', re.IGNORECASE)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
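Since `re.search` returns `None` when nothing matches, the check above can be turned into a boolean helper. The function name `has_two_different_letters` is hypothetical and used only for illustration.

```python
import re

# Capture one letter, then require a later letter that is not the captured one.
_TWO_LETTERS = re.compile(r"([a-z]).*(?!\1)[a-z]", re.IGNORECASE)

def has_two_different_letters(word):
    """Return True if word contains at least two different letters."""
    return _TWO_LETTERS.search(word) is not None

for word in ("aa", "ab", "aaab"):
    print(word, has_two_different_letters(word))
```

With `re.IGNORECASE`, the backreference `\1` is also compared case-insensitively, so a word such as `aA` counts as having only one distinct letter.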
