Regular Expressions in Python 3


Preface

Regular expressions are probably the best way to find what you want in a large amount of text. They are also an indispensable skill for writing crawlers. So don't put it off!

This tutorial follows http://www.runoob.com/python3/python3-reg-expressions.html; thanks to Runoob (the "rookie tutorial").

One. The re module in Python 3

    import re

Two. The re.match function

re.match –> tries to match a pattern at the starting position of the string; if the match does not succeed, match() returns None. Syntax:

    re.match(pattern, string, flags=0)

pattern –> the regular expression to match.

string –> the string to match against.

flags –> flag bits that control how the match is done, such as case sensitivity, multi-line matching, and so on.

On success, re.match returns a match object; otherwise it returns None.

You can use the match object methods group(num) or groups() to get the matched text.

group(num=0) –> the string matched by the entire expression. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups.

groups() –> returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.

Example:

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-
    import re

    print(re.match('www', 'www.runoob.com').span())  # matches at the starting position –> (0, 3)
    print(re.match('com', 'www.runoob.com'))         # does not match at the starting position –> None

Example 2:

    import re

    line = 'Cats are smarter than dogs'

    # (.*) is greedy, (.*?) is non-greedy; re.M | re.I combines two flags
    matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

    if matchObj:
        print('matchObj.group():', matchObj.group())
        print('matchObj.group(1):', matchObj.group(1))
        print('matchObj.group(2):', matchObj.group(2))
    else:
        print('No match!!')
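The difference between group() with one number, group() with several numbers, and groups() is easiest to see side by side. The following is a minimal sketch; the sample sentence, the pattern, and the variable names are illustrative and not part of the original tutorial.

```python
import re

# Illustrative input: three words are captured, the fourth is left unmatched.
m = re.match(r'(\w+) (\w+) (\w+)', 'one two three four')

whole = m.group()         # same as m.group(0): the entire matched text
first = m.group(1)        # text captured by the first pair of parentheses
pair = m.group(1, 3)      # several group numbers at once -> a tuple
all_groups = m.groups()   # every group, starting from group 1

print(whole)        # one two three
print(first)        # one
print(pair)         # ('one', 'three')
print(all_groups)   # ('one', 'two', 'three')
```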

Three. The re.search method

re.search scans the entire string and returns the first successful match. Syntax:

    re.search(pattern, string, flags=0)

The parameters are the same as for re.match.

On success it returns a match object; otherwise it returns None.

group(num=0) and groups() are used the same way as above.

Example:

    import re

    print(re.search('www', 'www.baidu.com').span())  # matches at the starting position –> (0, 3)
    print(re.search('com', 'www.baidu.com').span())  # matches away from the starting position –> (10, 13)

Example 2:

    import re

    line = 'Cats are smarter than dogs'

    searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

    if searchObj:
        print('searchObj.group():', searchObj.group())
        print('searchObj.group(1):', searchObj.group(1))
        print('searchObj.group(2):', searchObj.group(2))

Four. The difference between re.match and re.search

re.match matches only at the beginning of the string: if the string does not match the regular expression from its first character, the match fails and None is returned. re.search scans the whole string until it finds a match.

Example:

    import re

    line = 'Cats are smarter than dogs'

    # 'dogs' is not at the start of the string, so re.match fails
    matchObj = re.match(r'dogs', line, re.M | re.I)
    if matchObj:
        print('match –> matchObj.group():', matchObj.group())
    else:
        print('No match')

    # re.search scans the whole string and finds 'dogs'
    matchObj = re.search(r'dogs', line, re.M | re.I)
    if matchObj:
        print('search –> matchObj.group():', matchObj.group())
    else:
        print('No match')
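One detail that is easy to miss: the re.M flag does not change where re.match looks. The sketch below (the two-line string and the variable names are illustrative) shows that re.match still only tries the very start of the string, while re.search with ^ and re.M can match at the start of a later line.

```python
import re

text = 'first line\nsecond line'   # illustrative two-line string

# re.match only ever tries position 0, even with re.M.
at_start = re.match(r'second', text, re.M)

# re.search with ^ and re.M can match at the start of any line.
any_line = re.search(r'^second', text, re.M)

print(at_start)          # None
print(any_line.span())   # (11, 17)
```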

Five. Search and replace

    re.sub(pattern, repl, string, count=0)

Parameters:

pattern –> same as above.

repl –> the replacement string, or a function.

string –> the original string to search and replace in.

count –> the maximum number of replacements to make; the default of 0 means every match is replaced.

Example:

    import re

    phone = '2004-959-559 # This is a phone number'

    # Delete the comment
    num = re.sub(r'#.*$', '', phone)
    print('Phone number:', num)

    # Remove everything that is not a digit
    num = re.sub(r'\D', '', phone)
    print('Phone number:', num)
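The count parameter is not used in the example above, so here is a small sketch of its effect. The date string and variable names are made up for illustration.

```python
import re

date = '2024-01-15'   # illustrative value

replace_all = re.sub(r'-', '/', date)              # count=0 (default): replace every match
replace_first = re.sub(r'-', '/', date, count=1)   # stop after one replacement

print(replace_all)     # 2024/01/15
print(replace_first)   # 2024/01-15
```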

Six. The repl parameter as a function

    import re

    # Double each matched number
    def double(matched):
        value = int(matched.group('value'))
        return str(value * 2)

    s = 'a23g4hfd567'
    print(re.sub(r'(?P<value>\d+)', double, s))

Seven. Regular expression modifiers – optional flags

A regular expression can take optional flag modifiers that control how matching is performed. Modifiers are passed as the optional flags argument. Multiple flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:

re.I –> makes matching case-insensitive.

re.L –> locale-aware matching. In current Python 3 versions it can only be used with bytes patterns.

re.M –> multi-line matching; affects ^ and $.

re.S –> makes . match any character, including newlines.

re.U –> interprets characters according to the Unicode character set; this flag affects \w, \W, \b, \B. For str patterns in Python 3 this is already the default.

re.X –> lets you lay the pattern out more flexibly (whitespace is ignored and comments are allowed), so regular expressions are easier to read and write.
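To make the flags concrete, the sketch below runs the same small input with and without each flag. The input string and variable names are illustrative assumptions, not taken from the original tutorial.

```python
import re

text = 'Hello\nworld'   # illustrative two-line string

# re.I: ignore case.
ignore_case = re.search(r'hello', text, re.I)

# re.M: ^ also matches right after a newline, not only at position 0.
without_m = re.search(r'^world', text)
with_m = re.search(r'^world', text, re.M)

# re.S: . also matches the newline between the two words.
without_s = re.search(r'Hello.world', text)
with_s = re.search(r'Hello.world', text, re.S)

# re.I | re.X: flags combined with |; re.X ignores pattern whitespace
# and allows comments inside the pattern.
verbose = re.search(r'''
    hello   # first word
    \s      # the newline counts as whitespace
    world   # second word
''', text, re.I | re.X)

print(ignore_case.group())        # Hello
print(without_m, with_m.span())   # None (6, 11)
print(without_s, with_s.span())   # None (0, 11)
print(verbose.span())             # (0, 11)
```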

Eight. Regular expression patterns

A pattern string uses a special syntax to describe a regular expression:

Letters and digits stand for themselves. A letter or digit in a regular expression pattern matches the same character in the string.

Most letters and digits take on a different meaning when preceded by a backslash.

Punctuation characters match themselves only when escaped; otherwise they have a special meaning.

A backslash itself must be escaped with another backslash.

Since regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element written this way (for example r'\t', which is the same string as '\\t') matches the corresponding special character.

The following is a list of the special elements in regular expression pattern syntax. If you use a pattern together with the optional flags argument, the meaning of some pattern elements changes.
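The raw-string point can be checked directly. This is a small sketch with made-up sample strings; it only shows how the same pattern is spelled with and without the r prefix.

```python
import re

# r'\t' is two characters (backslash, t); '\\t' spells the same two characters.
same_pattern = (r'\t' == '\\t')

# The regex engine reads that backslash-t pair as "match a tab character".
tab_span = re.search(r'\t', 'a\tb').span()

# Matching one literal backslash takes r'\\' (or '\\\\' without the r prefix).
path = r'C:\temp'   # illustrative string containing a real backslash
backslash_span = re.search(r'\\', path).span()
same_escape = (r'\\' == '\\\\')

print(same_pattern)     # True
print(tab_span)         # (1, 2)
print(backslash_span)   # (2, 3)
print(same_escape)      # True
```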

^ –> matches the beginning of the string.

$ –> matches the end of the string.

. –> matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.

[...] –> represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.

[^...] –> characters that are not in []: [^abc] matches any character other than a, b, or c.

re* –> matches 0 or more occurrences of the preceding expression.

re+ –> matches 1 or more occurrences of the preceding expression.

re? –> matches 0 or 1 occurrence of the fragment defined by the preceding expression. Placed after another quantifier (as in *?, +?, or {n,m}?), ? makes that quantifier non-greedy.

re{n} –> matches exactly n occurrences of the preceding expression.

re{n,} –> matches n or more occurrences of the preceding expression.

re{n,m} –> matches n to m occurrences of the fragment defined by the preceding expression, greedily.

a|b –> matches a or b.

(re) –> matches the expression in parentheses and also captures it as a group.

(?imx) –> turns on the optional flags i, m, or x from inside the pattern. In Python's re module, flags written this way apply to the whole pattern and belong at its start.

(?-imx) –> turns off the i, m, or x flag. Python's re module accepts this only in the scoped form (?-imx:re).

(?:re) –> like (...), but does not create a group.

(?imx:re) –> uses the i, m, or x flag only within the parentheses.

(?-imx:re) –> turns off the i, m, or x flag only within the parentheses.

(?#...) –> a comment.

(?=re) –> positive lookahead assertion. It succeeds if the contained expression matches at the current position and fails otherwise. The assertion consumes no characters: once it has been tried, the matching engine does not advance, and the rest of the pattern is tried from the same position.

(?!re) –> negative lookahead assertion. The opposite of the positive assertion: it succeeds when the contained expression does not match at the current position.

(?>re) –> an independent (atomic) sub-pattern that eliminates backtracking. Python's re module supports it from version 3.11.

\w –> matches an alphanumeric character or underscore.

\W –> matches a non-alphanumeric character.

\s –> matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v].

\S –> matches any non-whitespace character.

\d –> matches any digit; for ASCII text, equivalent to [0-9].

\D –> matches any non-digit.

\A –> matches the start of the string.

\Z –> matches the end of the string. In Perl-style engines it also matches just before a final newline; in Python's re module it matches only at the very end of the string.

\z –> matches the end of the string. Older Python versions do not accept \z; use \Z there.

\G –> matches the position where the last match ended. Python's re module does not support it.

\b –> matches a word boundary, for example the position between a word and a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.

\B –> matches a non-word boundary; the opposite of \b. For example, 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.

\n, \t, etc. –> match a newline, a tab, and so on.

\1...\9 –> matches the content of the nth group.

\10 –> matches the content of the nth group if that group has been matched; otherwise it is read as an octal character code.
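A few of the elements in the list above are easier to understand by running them. The sketch below covers greedy and non-greedy quantifiers, a non-capturing group, lookahead, word boundaries, and a backreference. All sample strings and variable names are illustrative assumptions.

```python
import re

# Greedy vs. non-greedy: * takes as much as it can, *? as little as it can.
html = '<b>bold</b>'                         # illustrative input
greedy = re.search(r'<.*>', html).group()    # '<b>bold</b>'
lazy = re.search(r'<.*?>', html).group()     # '<b>'

# re{n}: exactly n repetitions.
year = re.search(r'\d{4}', 'built in 1999').group()   # '1999'

# (?:re) groups without capturing, so only the name is reported as a group.
title = re.match(r'(?:Mr|Ms)\. (\w+)', 'Ms. Lee')
captured = title.groups()                    # ('Lee',)

# (?=re) and (?!re) test what follows without consuming it.
user = re.search(r'\w+(?=@)', 'user@example.com').group()     # 'user'
not_bar = re.search(r'foo(?!bar)', 'foobar foobaz').span()    # (7, 10)

# \b is a word boundary, \B a non-boundary.
end_of_word = re.search(r'er\b', 'never').span()   # (3, 5)
mid_word = re.search(r'er\B', 'verb').span()       # (1, 3)
no_boundary = re.search(r'er\b', 'verb')           # None

# \1 repeats whatever group 1 matched: here, a doubled word.
doubled = re.search(r'(\w+) \1', 'it is is fine').group()     # 'is is'

print(greedy, lazy, year, captured, user, not_bar)
print(end_of_word, mid_word, no_boundary, doubled)
```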

Nine. Regular expression examples

python –> matches 'python'

Character classes

[Pp]ython –> matches 'Python' or 'python'

rub[ye] –> matches 'ruby' or 'rube'

[aeiou] –> matches any one of the letters within the brackets

[0-9] –> matches any digit; the same as [0123456789]

[a-z] –> matches any lowercase letter

[A-Z] –> matches any uppercase letter

[a-zA-Z0-9] –> matches any letter or digit

[^aeiou] –> matches any character except the letters a, e, i, o, u

[^0-9] –> matches any character except a digit

Special character classes

. –> matches any single character except \n. To match any character including \n, use a pattern such as [\s\S], or pass the re.S flag.

\d \D \s \S \w \W –> as described in section Eight.
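The character classes above can be tried out with a few lines of code. This is a minimal sketch; the word list, the sample strings, and the variable names are made up for illustration.

```python
import re

# [Pp]ython accepts either capitalisation of the first letter.
names = ['Python', 'python', 'Jython']       # illustrative inputs
py_names = [n for n in names if re.match(r'[Pp]ython', n)]

# [aeiou] matches one vowel; [^0-9] matches one character that is not a digit.
no_vowels = re.sub(r'[aeiou]', '', 'regular expression')
digits_only = re.sub(r'[^0-9]', '', 'tel: 555-0100')

# . skips \n unless re.S is given; [\s\S] matches any character in either case.
dot_plain = re.search(r'a.b', 'a\nb')
dot_all = re.search(r'a.b', 'a\nb', re.S)
any_char = re.search(r'a[\s\S]b', 'a\nb')

print(py_names)      # ['Python', 'python']
print(no_vowels)     # rglr xprssn
print(digits_only)   # 5550100
print(dot_plain, dot_all.span(), any_char.span())   # None (0, 3) (0, 3)
```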
