Greedy and non-greedy matching in Python regular expressions: usage and differences

Source: Internet
Author: User
Regular expressions are not unique to Python; they are an independent grammar supported by many programming languages. The syntax differs slightly from one language to another but is broadly the same. This article focuses on how greedy and non-greedy matching are used and how they differ. By default, regular expressions match in greedy mode, that is, each quantifier matches as much content as it can. For example:
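A minimal sketch of such a greedy match. The sample sentence and the case-insensitive flag re.I are assumptions chosen here for illustration; the pattern follows the description in the text (\b, the letter b, a dot, a plus sign, and a closing \b).

```python
import re

# Hypothetical sample text, chosen for illustration.
text = 'Beautiful is better than ugly.'

# \b  word boundary, then the letter b (re.I makes it match 'B' too),
# .+  one or more of any character except newline (greedy by default),
# \b  a closing word boundary.
greedy = re.findall(r'\bb.+\b', text, re.I)

print(greedy)  # ['Beautiful is better than ugly']
```

The single match runs from the first word that starts with B to the last word boundary in the sentence, stopping only before the final period.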

In the preceding code, the first \b in the regular expression matches the start of a word, and the letter B that follows means the match must begin with a word starting with B. The dot . matches any character except a newline (so it does match spaces), the plus sign + means the preceding element appears one or more times, and the final \b matches the end of a word. So what counts as the end of a word? A word can end at whitespace or at punctuation, so there are several candidate positions. Because regular expressions are greedy by default and match as much content as possible, the above code matches everything up to the end of the last word in the text.

How do you match only a word that starts with the letter B, rather than the long span above? Use non-greedy mode, which is written with the question mark "?". In a regular expression, a question mark that follows an ordinary character or sub-pattern makes that character or sub-pattern optional: it may or may not appear. However, a question mark that follows +, * or {m,n} makes that quantifier non-greedy, so it matches as little content as possible. Taking the problem above and changing it to non-greedy mode, for example:
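A minimal sketch of the non-greedy version, again assuming an illustrative sample sentence and the re.I flag (neither is taken from the original listing):

```python
import re

# Hypothetical sample text, chosen for illustration.
text = 'Beautiful is better than ugly.'

# .+? is the non-greedy form of .+ : it stops at the first position
# where the closing \b can match, i.e. the end of the current word.
lazy = re.findall(r'\bb.+?\b', text, re.I)

print(lazy)  # ['Beautiful', 'better']
```

Each match now ends at the nearest word boundary, so every word starting with B is returned separately.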

The following code further demonstrates the difference between greedy and non-greedy modes:
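A short sketch contrasting the two modes for * and {m,n}. The input strings and variable names are made up for illustration.

```python
import re

# Hypothetical inputs, chosen for illustration.
html = '<p>one</p><p>two</p>'
digits = '1234567'

# Greedy .* runs to the last '</p>'; non-greedy .*? stops at the first one.
star_greedy = re.findall(r'<p>.*</p>', html)
star_lazy = re.findall(r'<p>.*?</p>', html)

# Greedy {2,4} takes up to four digits; non-greedy {2,4}? takes only two.
range_greedy = re.findall(r'\d{2,4}', digits)
range_lazy = re.findall(r'\d{2,4}?', digits)

print(star_greedy)   # ['<p>one</p><p>two</p>']
print(star_lazy)     # ['<p>one</p>', '<p>two</p>']
print(range_greedy)  # ['1234', '567']
print(range_lazy)    # ['12', '34', '56']
```

In the last case the trailing 7 is left unmatched, because a single digit does not satisfy the minimum of two.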

Of course, returning to the original question: if the goal is only to match words that start with the letter B, none of this is necessary. Use \w directly, because \w matches only letters, digits, and underscores and never matches a space. For example:
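A minimal sketch of the \w approach, assuming an illustrative sample sentence and the re.I flag:

```python
import re

# Hypothetical sample text, chosen for illustration.
text = 'Beautiful is better than ugly.'

# \w* cannot cross a space or punctuation mark, so even in greedy mode
# each match is confined to a single word.
words = re.findall(r'\bb\w*\b', text, re.I)

print(words)  # ['Beautiful', 'better']
```

Because the character class itself excludes spaces, there is no need to switch the quantifier to non-greedy mode here.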
