Python Regular Expressions

A regular expression is a special sequence of characters that lets you easily check whether a string matches a pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.

The re module gives the Python language full regular expression functionality.

The re.compile function builds a regular expression object from a pattern string and an optional flags argument. The object has a series of methods for regular expression matching and substitution.

The re module also provides module-level functions that mirror these methods and take a pattern string as their first argument.
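To make the relationship between a compiled object and the module-level functions concrete, here is a minimal sketch. The pattern and the variable names (pattern, m_compiled, m_module) are illustrative choices, not something the original text prescribes.

```python
import re

line = "Cats is smarter than dogs"

# Compile once; the returned pattern object carries the flags with it.
pattern = re.compile(r"(\w+) is (\w+)", re.I)

# Method on the compiled object: no pattern argument is needed.
m_compiled = pattern.search(line)

# Equivalent module-level function: the pattern string is the first argument.
m_module = re.search(r"(\w+) is (\w+)", line, re.I)

print(m_compiled.group())                        # Cats is smarter
print(m_compiled.groups())                       # ('Cats', 'smarter')
print(m_compiled.groups() == m_module.groups())  # True

# The compiled object also offers substitution.
print(pattern.sub("X", line))                    # X than dogs
```

Compiling is mostly a convenience when the same pattern is reused many times; for a one-off match the module-level function reads just as well.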

This section focuses on the regular expression functions most commonly used in Python.

re.match function

re.match tries to match a pattern at the beginning of the string.

Function syntax:

re.match(pattern, string, flags=0)

Function parameters:

Parameter   Description
pattern     The regular expression to match.
string      The string to match against.
flags       Flags that control how the regular expression is matched, such as case sensitivity and multiline matching.

If the match succeeds, re.match returns a match object; otherwise it returns None.

We can use the match object's group(num) or groups() method to get the matched text.

Match object method   Description
group(num=0)          Returns the string matched by the entire expression (num=0) or by group num. group() also accepts several group numbers at once, in which case it returns a tuple containing the corresponding values for those groups.
groups()              Returns a tuple containing the strings of all groups, from group 1 up to however many groups the pattern contains.

Example:

#!/usr/bin/python3
import re

line = "Cats is smarter than dogs"

# Group 1 is the text before " is "; group 2 is the first word after it.
matchObj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
else:
    print("No match!!")

Running the example above produces the following output:

matchObj.group(): Cats is smarter than dogs
matchObj.group(1): Cats
matchObj.group(2): smarter
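The description of group() and groups() above also mentions passing several group numbers at once. A small sketch of that behavior follows; the three-group pattern and the variable names are assumptions made for illustration.

```python
import re

line = "Cats is smarter than dogs"

# Three capturing groups, numbered 1 to 3 from left to right.
m = re.match(r"(\w+) is (\w+) than (\w+)", line)

whole = m.group()          # same as m.group(0): the entire match
pair = m.group(1, 3)       # several group numbers at once give a tuple
everything = m.groups()    # all groups, starting from group 1

print(whole)        # Cats is smarter than dogs
print(pair)         # ('Cats', 'dogs')
print(everything)   # ('Cats', 'smarter', 'dogs')

# When the pattern does not match at the start, re.match returns None.
no_match = re.match(r"dogs", line)
print(no_match)     # None
```

Because a failed match returns None, calling group() on the result without checking it first raises an AttributeError, which is why the examples test the match object before using it.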

re.search method

re.search scans the entire string and returns the first successful match.

Function syntax:

re.search(pattern, string, flags=0)

Function parameters:

Parameter   Description
pattern     The regular expression to match.
string      The string to search.
flags       Flags that control how the regular expression is matched, such as case sensitivity and multiline matching.

If the match succeeds, re.search returns a match object; otherwise it returns None.

We can use the match object's group(num) or groups() method to get the matched text.

Match object method   Description
group(num=0)          Returns the string matched by the entire expression (num=0) or by group num. group() also accepts several group numbers at once, in which case it returns a tuple containing the corresponding values for those groups.
groups()              Returns a tuple containing the strings of all groups, from group 1 up to however many groups the pattern contains.

Example:

#!/usr/bin/python3
import re

line = "Cats is smarter than dogs"

# re.search scans the whole string for the first place the pattern matches.
matchObj = re.search(r'(.*) is (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
else:
    print("No match!!")

Running the example above produces the following output:

matchObj.group(): Cats is smarter than dogs
matchObj.group(1): Cats
matchObj.group(2): smarter

The difference between re.match and re.search

re.match matches only at the beginning of the string: if the start of the string does not conform to the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.

Example:

#!/usr/bin/python3
import re

line = "Cats is smarter than dogs"

# "dogs" is not at the start of the string, so re.match fails.
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group():", matchObj.group())
else:
    print("No match!!")

# re.search finds "dogs" anywhere in the string.
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group():", matchObj.group())
else:
    print("No match!!")

Running the example above produces the following output:

No match!!
search --> matchObj.group(): dogs

Search and replace

The Python re module provides re.sub to replace matches in a string.

Syntax:

re.sub(pattern, repl, string, count=0)

Returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with repl. If the pattern is not found, the string is returned unchanged.

The optional argument count is the maximum number of pattern occurrences to replace; count must be a non-negative integer. The default value 0 means that all occurrences are replaced.

Example:

#!/usr/bin/python3
import re

phone = "2004-959-559 # This is Phone number"

# Delete the Python-style comment
num = re.sub(r'#.*$', "", phone)
print("Phone num:", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone num:", num)

Running the example above produces the following output:

Phone num: 2004-959-559
Phone num: 2004959559
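The count argument described above is not exercised by the example, so here is a short sketch of it. The phone string is reused from the example; the variable names are illustrative.

```python
import re

phone = "2004-959-559"

# count=1 replaces only the leftmost occurrence.
first_only = re.sub(r"-", "", phone, count=1)

# The default count=0 replaces every occurrence.
all_replaced = re.sub(r"-", "", phone)

# If the pattern never matches, the string comes back unchanged.
unchanged = re.sub(r"x", "", phone)

print(first_only)    # 2004959-559
print(all_replaced)  # 2004959559
print(unchanged)     # 2004-959-559
```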

Regular expression modifiers: optional flags

A regular expression can take optional flag modifiers that control how the pattern is matched. Multiple flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I and M flags:

Modifier   Description
re.I       Makes the match case-insensitive.
re.L       Makes matching locale-aware.
re.M       Multiline matching; affects ^ and $.
re.S       Makes . match any character, including newlines.
re.U       Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, and \B.
re.X       Allows a more flexible, verbose pattern format (whitespace is ignored and comments are allowed) so that regular expressions are easier to write and read.
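The effect of the most common flags can be seen in a small sketch like the one below. The sample text and variable names are made up for illustration; re.findall is used only because it makes the results easy to print.

```python
import re

text = "first line\nSecond LINE"

# re.I: ignore case.
case_insensitive = re.search(r"second", text, re.I).group()

# re.M: ^ matches at the start of every line, not just the start of the string.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

# re.S: the dot also matches a newline.
dot_default = re.search(r"line.Second", text)
dot_all = re.search(r"line.Second", text, re.S).group()

# Flags are combined with bitwise OR.
combined = re.findall(r"^second", text, re.I | re.M)

# re.X: whitespace in the pattern is ignored and # starts a comment.
verbose = re.compile(r"""
    (\d{4})   # first block: four digits
    -         # literal hyphen
    (\d{3})   # second block: three digits
""", re.X)
blocks = verbose.match("2004-959").groups()

print(case_insensitive)   # Second
print(without_m, with_m)  # ['first'] ['first', 'Second']
print(dot_default)        # None
print(repr(dot_all))      # 'line\nSecond'
print(combined)           # ['Second']
print(blocks)             # ('2004', '959')
```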

Regular expression pattern

A pattern string uses a special syntax to represent a regular expression:

Letters and digits stand for themselves: a letter or digit in a pattern matches that same character.

Most letters and digits take on a different meaning when preceded by a backslash.

Many punctuation characters have special meanings and match themselves only when escaped.

A backslash itself must be escaped with another backslash.

Because regular expressions often contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character.
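The following sketch illustrates why raw strings are recommended for patterns. The sample strings are illustrative; the \b case is an extra assumption-free detail of Python string literals, where '\b' in an ordinary string is a backspace character rather than a regex word boundary.

```python
import re

# r'\t' is two characters (backslash, t), the same as '\\t'.
same_text = (r"\t" == "\\t")
raw_length = len(r"\t")

# The regex engine interprets those two characters as "match a tab".
tab_found = re.search(r"\t", "a\tb") is not None

# A literal backslash needs an escaped backslash in the pattern.
backslash = re.search(r"\\", "C:\\temp").group()

# Without a raw string, "\b" is a backspace character, so this never matches.
plain = re.search("\bcat\b", "a cat")
# With a raw string, \b reaches the regex engine as a word boundary.
raw = re.search(r"\bcat\b", "a cat").group()

print(same_text, raw_length)  # True 2
print(tab_found)              # True
print(backslash)              # \
print(plain)                  # None
print(raw)                    # cat
```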

The following table lists the special elements of the regular expression pattern syntax. If you supply optional flags along with a pattern, the meaning of some pattern elements changes.

Pattern       Description
^             Matches the beginning of the string.
$             Matches the end of the string.
.             Matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.
[...]         Represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...]        Matches any character not in the brackets: [^abc] matches any character other than a, b, or c.
re*           Matches 0 or more occurrences of the preceding expression.
re+           Matches 1 or more occurrences of the preceding expression.
re?           Matches 0 or 1 occurrences of the fragment defined by the preceding regular expression, greedily.
re{n}         Matches exactly n occurrences of the preceding expression.
re{n,}        Matches n or more occurrences of the preceding expression.
re{n,m}       Matches n to m occurrences of the fragment defined by the preceding regular expression, greedily.
a|b           Matches a or b.
(re)          Matches the expression in parentheses and also defines a group.
(?imx)        Turns on the optional i, m, or x flags inside the regular expression. In Python these inline flags apply to the whole pattern and should be placed at its start.
(?-imx)       Turns off the optional i, m, or x flags. Python's re module supports turning flags off only in the scoped form (?-imx:re).
(?:re)        Like (...), but does not define a group.
(?imx:re)     Uses the optional i, m, or x flags inside the parentheses only.
(?-imx:re)    Does not use the optional i, m, or x flags inside the parentheses.
(?#...)       A comment.
(?=re)        Positive lookahead assertion. Succeeds if re matches at the current position, without consuming any of the string: once the contained expression has been tried, the matching engine does not advance, and the rest of the pattern is attempted from the position where the assertion started.
(?!re)        Negative lookahead assertion. The opposite of a positive lookahead: succeeds when the contained expression does not match at the current position of the string.
(?>re)        Matches an independent (atomic) pattern, eliminating backtracking. Python's re module supports this from version 3.11.

\w            Matches an alphanumeric character (letters, digits, and underscore).
\W            Matches a non-alphanumeric character.
\s            Matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S            Matches any non-whitespace character.
\d            Matches any digit; equivalent to [0-9].
\D            Matches any non-digit.
\A            Matches the start of the string.
\Z            Matches the end of the string. In Perl-style engines, if the string ends with a newline, it matches just before that newline; Python's re module matches only at the very end of the string.
\z            Matches the end of the string. This is the Perl-style spelling; Python's re module accepts it only in recent releases, so \Z is the safer choice there.
\G            Matches the position where the previous match ended. This is a Perl-style element that Python's re module does not support.
\b            Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
\B            Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.
\n, \t, etc.  Match a newline, a tab character, and so on.
\1...\9       Matches the contents of the nth group.
\10           Matches the contents of the nth group if that group has been matched. Otherwise, it refers to the character with that octal code.
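Several rows of the table are easier to grasp with concrete inputs. The sketch below covers greedy versus non-greedy matching, the {n} family of quantifiers, non-capturing groups, lookahead, backreferences, and word boundaries. All sample strings and variable names are illustrative assumptions.

```python
import re

# Greedy .* takes as much as possible; adding ? makes it take as little as possible.
html = "<b>bold</b>"
greedy = re.search(r"<.*>", html).group()
lazy = re.search(r"<.*?>", html).group()

# {n} is exact, {n,} is "n or more", {n,m} is a range.
exactly_three = re.fullmatch(r"\d{3}", "959") is not None
three_or_more = re.fullmatch(r"\d{3,}", "2004") is not None
two_to_three = re.fullmatch(r"\d{2,3}", "2004") is not None

# (?:...) groups without capturing, so only one group is reported.
non_capturing = re.match(r"(?:Cats|Dogs) is (\w+)", "Cats is smarter").groups()

# (?=...) checks what follows without consuming it.
before_is = re.findall(r"\w+(?= is)", "Cats is smarter")
# (?!...) succeeds only when what follows does not match.
negative_fail = re.search(r"Cats(?! is)", "Cats is smarter")
negative_ok = re.search(r"Cats(?! are)", "Cats is smarter").group()

# \1 refers back to whatever group 1 matched: here, a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()

# \b is a word boundary, \B is a position that is not a boundary.
at_boundary = re.findall(r"er\b", "never verb")
inside_word = re.search(r"er\B", "never verb").start()

print(greedy, lazy)                               # <b>bold</b> <b>
print(exactly_three, three_or_more, two_to_three) # True True False
print(non_capturing)                              # ('smarter',)
print(before_is)                                  # ['Cats']
print(negative_fail, negative_ok)                 # None Cats
print(doubled)                                    # is is
print(at_boundary, inside_word)                   # ['er'] 7
```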

Regular expression examples

Character matching

Example   Description
python    Matches "python".

Character classes

Example        Description
[Pp]ython      Matches "Python" or "python".
rub[ye]        Matches "ruby" or "rube".
[aeiou]        Matches any one of the letters within the brackets.
[0-9]          Matches any digit; the same as [0123456789].
[a-z]          Matches any lowercase letter.
[A-Z]          Matches any uppercase letter.
[a-zA-Z0-9]    Matches any letter or digit.
[^aeiou]       Matches any character other than the letters a, e, i, o, u.
[^0-9]         Matches any character other than a digit.
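The character classes listed above can be tried out directly. This is a minimal sketch using re.findall on made-up sample strings; the variable names are illustrative.

```python
import re

either_case = re.findall(r"[Pp]ython", "Python and python")
ruby_or_rube = re.findall(r"rub[ye]", "ruby rube rubx")
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
digits = re.findall(r"[0-9]", "a1b22")
uppercase = re.findall(r"[A-Z]", "Hello World")
# The hyphen and underscore are not in the class, so they split the runs.
alnum_runs = re.findall(r"[a-zA-Z0-9]+", "abc-123_X")

print(either_case)   # ['Python', 'python']
print(ruby_or_rube)  # ['ruby', 'rube']
print(vowels)        # ['e', 'e']
print(non_vowels)    # ['r', 'g', 'x']
print(digits)        # ['1', '2', '2']
print(uppercase)     # ['H', 'W']
print(alnum_runs)    # ['abc', '123', 'X']
```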

Special character classes

Example   Description
.         Matches any single character except "\n". To match any character including '\n', use the re.S flag or a pattern like '[\s\S]'.
\d        Matches a digit character; equivalent to [0-9].
\D        Matches a non-digit character; equivalent to [^0-9].
\s        Matches any whitespace character, including spaces, tabs, form feeds, and so on; equivalent to [ \f\n\r\t\v].
\S        Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v].
\w        Matches any word character, including the underscore; equivalent to '[A-Za-z0-9_]'.
\W        Matches any non-word character; equivalent to '[^A-Za-z0-9_]'.
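As a quick check of the special classes, and of how the dot treats newlines, consider the sketch below. The sample strings and names are illustrative; note that a dot inside square brackets is a literal dot, which is why '[.\n]' is not a way to match any character.

```python
import re

text = "a1 b_2\n"

digits = re.findall(r"\d", text)
word_chars = re.findall(r"\w", text)
spaces = re.findall(r"\s", text)
non_word = re.findall(r"\W", text)
non_digits = re.findall(r"\D", "a1")

# By default the dot skips the newline.
dot_default = re.findall(r".", "a\nb")
# Two ways to match any character including the newline.
dot_with_flag = re.findall(r".", "a\nb", re.S)
any_char_class = re.findall(r"[\s\S]", "a\nb")
# Inside brackets the dot is literal, so only the newline matches here.
literal_dot_class = re.findall(r"[.\n]", "a\nb")

print(digits)             # ['1', '2']
print(word_chars)         # ['a', '1', 'b', '_', '2']
print(spaces)             # [' ', '\n']
print(non_word)           # [' ', '\n']
print(non_digits)         # ['a']
print(dot_default)        # ['a', 'b']
print(dot_with_flag)      # ['a', '\n', 'b']
print(any_char_class)     # ['a', '\n', 'b']
print(literal_dot_class)  # ['\n']
```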
