Python regular expression: Greedy/non-greedy

Last Update:2017-05-14 Source: Internet

Author: User


This article introduces greedy and non-greedy matching in Python regular expressions, with detailed example code. Earlier articles briefly covered the basics of Python regular expressions and capturing; this one summarizes their greedy/non-greedy behavior.

Greedy

By default, regular expression quantifiers are greedy: when several matches of different lengths are possible, the longest one is chosen. For example, the following regular expression is meant to extract what each speaker says, but because of greedy matching it matches too much:

>>> import re
>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*)"', sentence)
['why?" and I say "I don\'t know']

The following examples further illustrate the greedy behavior of quantifiers:

>>> re.findall('hi*', 'hiiiii')
['hiiiii']
>>> re.findall('hi{2,}', 'hiiiii')
['hiiiii']
>>> re.findall('hi{1,3}', 'hiiiii')
['hiii']
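As a minimal sketch of the same behavior, the snippet below reports where each greedy match starts and ends. The helper name first_match is made up for this illustration and is not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return (matched_text, start, end) for the first match, or None.

    Hypothetical helper for illustration only.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.start(), m.end()

text = 'hiiiii'
for pattern in ('hi*', 'hi{2,}', 'hi{1,3}'):
    # Each greedy quantifier takes as many 'i' characters as it is allowed to.
    print(pattern, first_match(pattern, text))
```

The first two patterns should run to the end of the string, while the third stops after three 'i' characters because of its upper bound.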

Non-greedy

When we want a quantifier to match non-greedily, we have to say so explicitly in the syntax:

{2,5}? Match 2 to 5 repetitions, preferring as few as possible

The question mark here may be confusing, because it already has a meaning of its own: the preceding item appears 0 or 1 times. You only need to remember that when a question mark follows a quantifier (the part of a regular expression that specifies a variable number of repetitions), it indicates a non-greedy match.
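The two roles of the question mark can be compared side by side. The following is a small sketch; the sample text and variable names are invented for this illustration.

```python
import re

text = 'color colour colouur'

# '?' after a literal: the preceding 'u' may appear 0 or 1 times.
optional = re.findall(r'colou?r', text)

# '?' after a quantifier: 'u+?' matches as few 'u' characters as possible.
lazy = re.findall(r'colou+?', text)

# Both roles combined: '??' is the non-greedy form of '?', so it prefers 0.
lazy_optional = re.findall(r'colou??', text)

print(optional)       # ['color', 'colour']
print(lazy)           # ['colou', 'colou']
print(lazy_optional)  # ['colo', 'colo', 'colo']
```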

Applied to the examples above, non-greedy matching gives the following results:

>>> re.findall('hi*?', 'hiiiii')
['h']
>>> re.findall('hi{2,}?', 'hiiiii')
['hii']
>>> re.findall('hi{1,3}?', 'hiiiii')
['hi']

Returning to the quotation example, non-greedy matching gives the intended result:

>>> sentence = """You said "why?" and I say "I don't know"."""
>>> re.findall(r'"(.*?)"', sentence)
['why?', "I don't know"]
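The greedy and non-greedy variants of the quotation pattern can be wrapped in one small function to compare them directly. This is only a sketch: the function name quoted_parts and its greedy parameter are assumptions made for this illustration.

```python
import re

def quoted_parts(text, greedy=False):
    """Return the text found between double quotes.

    Hypothetical helper: greedy=True uses '.*', greedy=False uses '.*?'.
    """
    pattern = r'"(.*)"' if greedy else r'"(.*?)"'
    return re.findall(pattern, text)

sentence = 'You said "why?" and I say "I don\'t know".'

# Greedy: one match running from the first quote to the last quote.
print(quoted_parts(sentence, greedy=True))

# Non-greedy: one match per quoted phrase.
print(quoted_parts(sentence))
```

Note that '.' does not match a newline by default, so this sketch only handles quotations that stay on one line.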

Lookahead and non-greedy

Strictly speaking, this part is not about greediness. However, because lookahead behaves somewhat like non-greedy matching, it is covered here to make it easier to remember.

(?=abc) Lookahead: succeeds if abc matches at the current position, without consuming any characters

(?!abc) Negative lookahead: succeeds if abc does not match at the current position, without consuming any characters

During regular expression matching, characters are "consumed": once a character has been matched (consumed), later matches in the same scan will not use that character again.
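The effect of consuming characters can be seen with a very small input. The sketch below uses an invented sample string and compares an ordinary pattern with the same pattern wrapped in a lookahead.

```python
import re

text = 'aaaa'

# An ordinary pattern consumes what it matches, so matches cannot overlap.
consumed = re.findall(r'aa', text)

# Inside a lookahead nothing is consumed; the capture group within the
# lookahead still records the text that was seen at each position.
overlapping = re.findall(r'(?=(aa))', text)

print(consumed)     # ['aa', 'aa']
print(overlapping)  # ['aa', 'aa', 'aa']
```

The ordinary pattern finds two matches because each one uses up two characters, while the lookahead version is tried at every position and finds three.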

How can this be put to use? Again, an example helps. Suppose we want to find the words that appear more than once in a string:

>>> sentence = "Oh what a day, what a lovely day!"
>>> re.findall(r'\b(\w+)\b.*\b\1\b', sentence)
['what']

This regular expression clearly does not complete the task. Why? Once the first (\w+) matches "what" and the subsequent \1 matches the second "what", the text "what a day, what" has been consumed by the regular expression, so the next match attempt starts right after the second "what". Naturally, only one repeated word is found.

The solution relies on the (?=abc) syntax, which does not consume any characters when it matches. The correct way to write the pattern is therefore:

>>> re.findall(r'\b(\w+)\b(?=.*\b\1\b)', sentence)
['what', 'a', 'day']
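The lookahead pattern can be wrapped in a reusable function. The following is a sketch under stated assumptions: the name repeated_words is made up, and the de-duplication step is an addition not found in the original example (a word that occurs three times would otherwise be reported twice).

```python
import re

def repeated_words(text):
    """Return each word that occurs again later in text, without duplicates.

    Hypothetical helper built on the lookahead pattern shown above.
    """
    hits = re.findall(r'\b(\w+)\b(?=.*\b\1\b)', text)
    # dict.fromkeys keeps the first occurrence of each word, in order.
    return list(dict.fromkeys(hits))

print(repeated_words("Oh what a day, what a lovely day!"))
print(repeated_words("a b a c a"))
```

Because '.' does not match a newline by default, this sketch only detects repetitions within a single line; matching is also case-sensitive.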

If we need to match a word that contains at least two different letters, we can use the (?!abc) syntax:

>>> re.search(r'([a-z]).*(?!\1)[a-z]', 'aa', re.IGNORECASE)
>>> re.search(r'([a-z]).*(?!\1)[a-z]', 'ab', re.IGNORECASE)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
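Since re.search returns None when nothing matches, the same pattern can be turned into a simple yes/no check. This is a minimal sketch; the names TWO_LETTERS and has_two_distinct_letters are invented for this illustration.

```python
import re

# Same pattern as above, compiled once for reuse.
TWO_LETTERS = re.compile(r'([a-z]).*(?!\1)[a-z]', re.IGNORECASE)

def has_two_distinct_letters(word):
    """Return True if word contains at least two different letters."""
    return TWO_LETTERS.search(word) is not None

for word in ('aa', 'ab', 'aaab'):
    print(word, has_two_distinct_letters(word))
```

The negative lookahead (?!\1) rejects any position where the next letter equals the captured one, so a match is only possible when a later letter differs from the first.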


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
