Regular expressions in Python

Source: Internet
Author: User

Python's regular expressions are explained in more detail in http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html.

This article adds the following supplementary notes:

1. Greedy and non-greedy in Python regular expressions:

Python regular expressions are greedy by default: a quantifier matches as many characters as possible. To make a quantifier non-greedy, append the qualifier ? to it.

*?, +?, ??, and {m,n}? are the non-greedy forms; each matches as few characters as possible.
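A minimal sketch contrasting greedy and non-greedy quantifiers with Python's `re` module. The HTML-like sample string is an arbitrary illustration, not something from the original article.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* consumes as much as it can, so the match runs
# from the first '<' all the way to the last '>'.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: .*? consumes as little as it can, so the match
# stops at the first '>' that lets the whole pattern succeed.
lazy = re.search(r"<.*?>", text).group()
tags = re.findall(r"<.*?>", text)

# The same idea applies to bounded repetition: {2,4} versus {2,4}?.
bounded_greedy = re.match(r"a{2,4}", "aaaaa").group()
bounded_lazy = re.match(r"a{2,4}?", "aaaaa").group()

print(greedy)          # <b>bold</b> and <i>italic</i>
print(lazy)            # <b>
print(tags)            # ['<b>', '</b>', '<i>', '</i>']
print(bounded_greedy)  # aaaa
print(bounded_lazy)    # aa
```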

2. Lookaround constructs in regular expressions:

A lookaround does not consume any characters; it only tests whether a condition holds at a specific position in the text.
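To see that a lookaround matches a position rather than characters, here is a small sketch using the lookahead (?=...) and lookbehind (?<=...) syntax. The thousands-separator pattern is an illustrative use chosen for this example, not part of the original text.

```python
import re

# A lookahead that succeeds right before the first digit: the match is
# empty, so its start and end are the same position.
span = re.search(r"(?=\d)", "ab12").span()

# Because lookarounds consume nothing, they can mark insertion points.
# Here: positions with a digit on the left and a multiple of three
# digits on the right (where a thousands separator belongs).
formatted = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", "1234567")

print(span)       # (2, 2)
print(formatted)  # 1,234,567
```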

Type                  Syntax      Succeeds when
Positive lookbehind   (?<=...)    the subexpression can match the text to the left
Negative lookbehind   (?<!...)    the subexpression cannot match the text to the left
Positive lookahead    (?=...)     the subexpression can match the text to the right
Negative lookahead    (?!...)     the subexpression cannot match the text to the right
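One sketch per row of the table, run against an arbitrary sample string chosen for illustration:

```python
import re

s = "price: $100, cost: 200"

# Positive lookbehind: digits with '$' immediately to their left.
after_dollar = re.findall(r"(?<=\$)\d+", s)

# Negative lookbehind: digit runs with neither '$' nor a digit to their left
# (the digit check stops a match from starting in the middle of "100").
no_dollar = re.findall(r"(?<![$\d])\d+", s)

# Positive lookahead: words with ':' immediately to their right.
labels = re.findall(r"\w+(?=:)", s)

# Negative lookahead: whole numbers that are not followed by ','.
not_before_comma = re.findall(r"\b\d+\b(?!,)", s)

print(after_dollar)      # ['100']
print(no_dollar)         # ['200']
print(labels)            # ['price', 'cost']
print(not_before_comma)  # ['200']
```

Note that Python's `re` module requires a lookbehind subexpression to have a fixed width, so a variable-width pattern such as (?<=\$+) raises `re.error`.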

3. Matching a word that does not start with any of the letters in "ABC":

Use a lookaround anchored at the start of the word, where \b matches the position at which the word begins. "Does not start with any letter in ABC" means that the character to the right of that position is not 'A', 'B', or 'C', so the final regular expression is: \b(?![ABC])\w+
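A quick check of that expression with `re.findall`; the word list is a made-up sample. The second call shows that the character class is case-sensitive unless `re.IGNORECASE` is passed.

```python
import re

text = "Apple banana Cherry date Berry egg"

# \b marks the start of a word; (?![ABC]) rejects the position if the
# next character is 'A', 'B' or 'C'; \w+ then consumes the word itself.
pattern = re.compile(r"\b(?![ABC])\w+")
words = pattern.findall(text)

# [ABC] is case-sensitive; with re.IGNORECASE, lowercase initials
# a, b and c are excluded as well.
words_ci = re.findall(r"\b(?![ABC])\w+", text, re.IGNORECASE)

print(words)     # ['banana', 'date', 'egg']
print(words_ci)  # ['date', 'egg']
```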

4. Regular expression pattern syntax:

Pattern Description

^ matches the beginning of the string

$ matches the end of the string.

. matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.

[...] represents a set of characters listed individually: [amk] matches 'a', 'm', or 'k'.

[^...] matches any character not in the set: [^abc] matches any character other than a, b, or c.

re* matches 0 or more occurrences of the preceding expression.

re+ matches 1 or more occurrences of the preceding expression.

re? matches 0 or 1 occurrence of the fragment defined by the preceding expression (greedy).

re{n} matches exactly n occurrences of the preceding expression.

re{n,} matches n or more occurrences of the preceding expression.

re{n,m} matches n to m occurrences of the fragment defined by the preceding expression (greedy).

a| b matches a or b

(re) matches the expression in parentheses and also captures it as a group.

(?imx) sets the optional flags i, m, or x. In Python these flags apply to the entire regular expression, not only to the parentheses, and the group should be placed at the start of the pattern (recent Python versions reject it elsewhere).

(?-imx) turns off the i, m, or x flags. Python's re does not support this standalone form; use the scoped form (?-imx:re) instead.

(?:re) is like (re), but does not capture a group.

(?imx:re) applies the i, m, or x flags only inside the parentheses.

(?-imx:re) turns off the i, m, or x flags inside the parentheses.

(?#...) is a comment; its contents are ignored.

(?=re) positive lookahead. Succeeds if re matches at the current position, but consumes no characters: once the contained expression has been tried, the matching engine does not advance, and the rest of the pattern is tried from the same position.

(?!re) negative lookahead. The opposite of the positive lookahead: succeeds only if the contained expression cannot match at the current position of the string.

(?>re) atomic group: matches re independently and discards its backtracking positions. Python's re supports it from version 3.11.

\w matches a word character: a letter, a digit, or an underscore.

\W matches any non-word character.

\s matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text.

\S matches any non-whitespace character.

\d matches any digit, equivalent to [0-9] for ASCII text.

\D matches any non-digit character.

\A matches the start of the string.

\Z matches the end of the string. In Perl, \Z can also match just before a trailing newline; in Python's re it matches only at the very end of the string.

\z matches only at the very end of the string (Perl syntax; in Python, \Z has this meaning).

\G matches the position where the previous match ended. Python's built-in re does not support it; the third-party regex module does.

\b matches a word boundary, that is, the position between a word character and a non-word character (or the edge of the string). For example, 'er\b' matches the 'er' in 'never', but not the 'er' in 'verb'.

\B matches a non-word boundary. 'er\B' matches the 'er' in 'verb', but not the 'er' in 'never'.

\n, \t, and so on match a newline, a tab character, and so on.

\1...\9 match the same text that was matched by the nth group.

\10 refers to group 10. In Python, a numeric escape is read as an octal character code only if it begins with 0 or is three octal digits long (for example, \101 is 'A').
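The sketch below exercises several entries of the table with Python's `re` module. The sample strings are arbitrary illustrations; the atomic group is left out because it needs Python 3.11 or later.

```python
import re

text = "first line\nsecond line"

# ^ matches only at the start of the string unless re.MULTILINE is set;
# \A always means the start of the string.
starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
abs_start = re.findall(r"\A\w+", text, re.MULTILINE)

# '.' does not match '\n' unless re.DOTALL is set.
no_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, re.DOTALL)

# Brace quantifiers: exactly n, and n or more.
exact = re.findall(r"o{2}", "Bob food")
at_least = re.fullmatch(r"\d{2,}", "2024")

# Shorthand classes: \w includes the underscore; \S is any non-whitespace.
word_runs = re.findall(r"\w+", "snake_case, 42!")
tokens = re.findall(r"\S+", " a\tb\nc ")

# Word boundary (\b) versus non-boundary (\B).
b_never = re.search(r"er\b", "never")
b_verb = re.search(r"er\b", "verb")
nb_verb = re.search(r"er\B", "verb")

# (?:...) groups without capturing, so only (c) is reported.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()

# Scoped flag group: only 'hello' is case-insensitive.
scoped_ok = re.match(r"(?i:hello) World", "HELLO World")
scoped_fail = re.match(r"(?i:hello) World", "HELLO world")

# (?#...) is a comment and is ignored by the engine.
commented = re.match(r"ab(?#a comment)c", "abc")

# \1 repeats whatever group 1 matched: find doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "it is is a test test")

print(starts, line_starts, abs_start)      # ['first'] ['first', 'second'] ['first']
print(no_dotall, with_dotall is not None)  # None True
print(exact, at_least is not None)         # ['oo'] True
print(word_runs, tokens)                   # ['snake_case', '42'] ['a', 'b', 'c']
print(b_never is not None, b_verb is not None, nb_verb is not None)  # True False True
print(groups)                                           # ('c',)
print(scoped_ok is not None, scoped_fail is not None)  # True False
print(commented is not None)                            # True
print(doubled)                                          # ['is', 'test']
```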

