Python's re module

Last Update: 2017-10-10 Source: Internet

Author: User



Commonly used functions in the re module

First, load the module

import re

Second, compile the regular expression

re.compile(pat, flags=0) compiles the regular expression pat into a regular expression object and returns it, so that the object's match and search methods can be used.

pat: the regular expression string to compile.

flags: compile flags that modify how the regular expression matches, such as case sensitivity and multiline matching. The commonly used flags are:

re.I (re.IGNORECASE): makes the regular expression ignore case, so that [A-Z] also matches lowercase letters. This flag is not affected by the current locale.

re.L (re.LOCALE): makes \w, \W, \b, \B, \s, and \S depend on the current locale.

re.M (re.MULTILINE): changes the behavior of '^' and '$'. When it is specified, '^' also matches at the start of each line (that is, the position just after each newline), and '$' also matches at the end of each line (the position just before each newline).

re.S (re.DOTALL): changes the behavior of '.'. Normally '.' matches any character except a newline; with this flag it matches newlines as well.

re.U (re.UNICODE): makes \w, \W, \b, \B, \d, \D, \s, and \S depend on the Unicode character properties database.

re.X (re.VERBOSE): lets you write more readable regular expressions. Whitespace in the pattern is ignored, except inside square brackets or after a backslash escape, and on each line everything after an unescaped '#' is ignored, which makes it easy to write comments inside the regular expression.

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")

The patterns a and b are equivalent; a is simply the verbose form of b.
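The effect of the other flags can be sketched the same way. The sample text and variable names below are made up for illustration; only the re functions and flags themselves come from the module.

```python
import re

# Hypothetical sample text with two lines and mixed case.
text = "First line\nsecond LINE"

# re.I: [a-z]+ also matches uppercase letters.
words = re.findall(r"[a-z]+", text, re.I)

# re.M: '^' matches at the start of every line, not only at the start of the string.
line_starts = re.findall(r"^\w+", text, re.M)

# Without re.S, '.' cannot cross the newline, so this search fails.
no_dotall = re.search(r"First.*LINE", text)
# With re.S, '.' also matches the newline.
with_dotall = re.search(r"First.*LINE", text, re.S)

# Several flags are combined with the bitwise OR operator.
combined = re.findall(r"^[a-z]+", text, re.I | re.M)

print(words)                # ['First', 'line', 'second', 'LINE']
print(line_starts)          # ['First', 'second']
print(no_dotall)            # None
print(with_dotall.group() == text)  # True
print(combined)             # ['First', 'second']
```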

Third, searching and matching functions

1. Greedy and non-greedy search

'*', '+', '?' and '{m,n}' are greedy: they match as much of the string as possible. Adding a question mark after them ('*?', '+?', '??', '{m,n}?') makes them non-greedy, so they match as little of the string as possible.
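A small sketch of the difference, using a made-up HTML-like string; the variable names are illustrative only.

```python
import re

# Hypothetical input used only to show the two behaviors.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: '.+' runs to the last '>' in the string, giving one long match.
greedy = re.findall(r"<.+>", html)

# Non-greedy: '.+?' stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.+?>", html)

# The same applies to {m,n}: greedy takes up to n, non-greedy takes only m.
digits_greedy = re.match(r"\d{2,4}", "123456").group()
digits_lazy = re.match(r"\d{2,4}?", "123456").group()

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(digits_greedy)  # 1234
print(digits_lazy)    # 12
```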

2. Matching groups: (...)

Matches the RE inside the parentheses and marks the start and end of a group. The contents of a group can be retrieved after the match (0 for the whole match, N for the Nth group), or matched again later in the pattern with a special sequence such as \number. To match the literal characters '(' and ')', escape them with a backslash, \( and \), or enclose them in square brackets: [(], [)].
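As a rough illustration of groups, backreferences, and literal parentheses (the sample strings and names are invented for this sketch):

```python
import re

# Two groups: the part before '@' and the part between '@' and '.com'.
m = re.search(r"(\w+)@(\w+)\.com", "contact: alice@example.com")
whole = m.group(0)   # the entire match
user = m.group(1)    # first group
domain = m.group(2)  # second group

# \1 refers back to whatever group 1 matched, here a repeated word.
dup = re.search(r"\b(\w+) \1\b", "it is is fine")
repeated = dup.group(1)

# Literal parentheses: escaped with a backslash or placed in square brackets.
escaped = re.findall(r"\((\d+)\)", "f(1) g(22)")
bracketed = re.findall(r"[(](\d+)[)]", "f(1) g(22)")

print(whole, user, domain)  # alice@example.com alice example
print(repeated)             # is
print(escaped, bracketed)   # ['1', '22'] ['1', '22']
```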

3. Search Matches

3.1 re.match and re.search

re.match(pat, string, flags=0)

Returns the corresponding MatchObject instance if the beginning of string matches the regular expression pat; otherwise returns None.

re.search(pat, string, flags=0)

Scans through string for a match to the pattern (not necessarily at the beginning of the string) and returns a MatchObject instance for the first match found; otherwise returns None.
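The difference between the two can be seen in a short sketch; the strings and names here are illustrative. Since both functions may return None, the result is normally checked before calling methods on it.

```python
import re

pat = re.compile(r"\d+")

# match() only succeeds if the pattern matches at the very start of the string.
at_start = pat.match("abc 123")       # None: the string starts with letters
leading = pat.match("42 abc")         # matches '42'

# search() scans the whole string and returns the first match anywhere.
anywhere = pat.search("abc 123 456")  # matches '123', the first run of digits

# Guard against None before using the match object.
found = anywhere.group() if anywhere else None

print(at_start)         # None
print(leading.group())  # 42
print(found)            # 123
```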

Note: a MatchObject has the following methods:

1. m.group() with no arguments returns the entire string matched by the RE.

2. m.group([group1, ...]) returns one or more subgroups. With one argument it returns a single substring; with several arguments it returns a tuple of the corresponding substrings. With no arguments it returns the entire match, the same as passing 0. If a group did not take part in the match, None is returned in its place. If a group number is negative or greater than the number of groups in the pattern, an IndexError exception is raised.

3. m.start([group]) and m.end([group]) return the start and end positions, in the original string, of the substring matched by group. If group is omitted or is 0, the entire match is used. If group did not take part in the match, -1 is returned.

For a given m and g, m.group(g) is equivalent to m.string[m.start(g):m.end(g)].
If group matched an empty string, m.start(group) and m.end(group) are equal.

4. m.span([group]) returns the tuple (start, end) giving the position of the match.

5. m.groups() returns a tuple containing all the groups of the match, from 1 up to the number of groups in the pattern. It is usually called without arguments, and each element of the tuple is the string matched by one of the groups defined in the regular expression.
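The methods listed above can be tried on a small example. The pattern and input are made up for this sketch; the optional third and fourth groups are there only to show what an unmatched group returns.

```python
import re

# Groups 3 and 4 are optional and do not take part in this match.
m = re.match(r"(\w+) (\w+)(, (\w+))?", "Isaac Newton")

entire = m.group()         # same as m.group(0)
pair = m.group(1, 2)       # several arguments give a tuple
all_groups = m.groups()    # every group, unmatched ones as None
span2 = m.span(2)          # (m.start(2), m.end(2))
unmatched_start = m.start(4)  # -1 for a group that did not match

# m.group(g) is the same as slicing the original string with start/end.
same = m.group(1) == m.string[m.start(1):m.end(1)]

# A group number beyond the groups in the pattern raises IndexError.
try:
    m.group(5)
    raised = False
except IndexError:
    raised = True

print(entire)           # Isaac Newton
print(pair)             # ('Isaac', 'Newton')
print(all_groups)       # ('Isaac', 'Newton', None, None)
print(span2)            # (6, 12)
print(unmatched_start)  # -1
print(same, raised)     # True True
```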

3.2 re.findall(pat, string, flags=0)

Walks through the string and returns all the matching substrings as a list. If the pattern contains groups, each item in the list holds the group contents instead of the whole match.

import re
tt = "Tina is a good girl, she's cool, clever, and so on ..."
rr = re.compile(r'\w*oo\w*')
print(rr.findall(tt))
print(re.findall(r'(\w)*oo(\w)', tt))  # () marks a subexpression

The execution results are as follows:

['good', 'cool']
[('g', 'd'), ('c', 'l')]

3.3 re.finditer(pat, string, flags=0)

Finds all the substrings that the RE matches and returns an iterator that yields each result (a match object) in order.

iter = re.finditer(r'\d+', '12 drumm44ers drumming, 11 ... 10 ...')
for i in iter:
    print(i)
    print(i.group())
    print(i.span())

The execution results are as follows:

<_sre.SRE_Match object; span=(0, 2), match='12'>
12
(0, 2)
<_sre.SRE_Match object; span=(8, 10), match='44'>
44
(8, 10)
<_sre.SRE_Match object; span=(24, 26), match='11'>
11
(24, 26)
<_sre.SRE_Match object; span=(31, 33), match='10'>
10
(31, 33)

3.4 re.split(pat, string[, maxsplit])

Splits the string at every substring that matches the pattern and returns the pieces as a list. For example, re.split(r'\s+', text) splits the string on whitespace into a list of words. maxsplit specifies the maximum number of splits; if it is not given, the string is split at every match.
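A brief sketch of splitting with and without a limit; the sample text is invented, and maxsplit is passed by keyword here as a matter of style.

```python
import re

# Hypothetical text with uneven whitespace (two spaces, a tab, one space).
text = "one  two\tthree four"

# Split at every run of whitespace.
all_words = re.split(r"\s+", text)

# Split at most twice; the rest of the string stays in the last item.
limited = re.split(r"\s+", text, maxsplit=2)

print(all_words)  # ['one', 'two', 'three', 'four']
print(limited)    # ['one', 'two', 'three four']
```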

3.5 re.sub(pat, repl, string[, count])

Replaces the parts of string that match pat with repl, making at most count replacements (any remaining matches are left alone), and returns the resulting string. If nothing in string matches the pattern, string is returned unchanged. repl can be a string or a function.

If repl is a function, it is called once for each match with the corresponding MatchObject as its argument, and the string it returns is put in place of the match.

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'

re.subn(pat, repl, string[, count]) works the same way as the sub() function above, except that it returns a tuple (new string, number of replacements made).
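The count argument and the tuple returned by subn can be sketched as follows. The helper double and the sample strings are hypothetical, chosen only to illustrate the calls.

```python
import re

# Hypothetical replacement function: doubles each number it is given.
def double(matchobj):
    return str(int(matchobj.group(0)) * 2)

# repl as a function: called once per match.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# count limits how many matches are replaced; the third number is left alone.
first_two = re.sub(r"\d+", "N", "1 2 3", count=2)

# subn returns the new string together with the number of replacements.
result, n = re.subn(r"\d+", "N", "1 2 3")

# With no match, the string comes back unchanged.
untouched = re.sub(r"\d+", "N", "no digits")

print(doubled)    # 6 apples and 20 pears
print(first_two)  # N N 3
print(result, n)  # N N N 3
print(untouched)  # no digits
```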


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
