Python Regular Expression Learning summary and data

Last Update:2016-08-01 Source: Internet

Author: User


Source: Michael_ Xiang _

Summary

In a regular expression, a character given literally matches exactly that character.
{m,n}? repeats the preceding character m to n times, taking as few repetitions as possible. In the string 'aaaaaa', a{2,4} matches 4 a's, but a{2,4}? matches only 2.
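
The difference between the greedy and non-greedy quantifier can be checked with a short sketch. The variable names are only illustrative.

```python
import re

text = "aaaaaa"

# Greedy: a{2,4} takes as many 'a' characters as it can (at most 4).
greedy = re.match(r"a{2,4}", text).group()

# Non-greedy: a{2,4}? stops as soon as the minimum of 2 is reached.
lazy = re.match(r"a{2,4}?", text).group()

print(greedy)  # aaaa
print(lazy)    # aa
```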

^ represents the beginning of a line, so ^\d means the line must begin with a digit.

$ represents the end of a line, so \d$ means the line must end with a digit.

You may have noticed that py also matches part of 'python', giving 'py'.

But ^py$ must match the whole line: it matches only 'py', and against 'python' it finds nothing.
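
A minimal sketch of the anchoring behaviour described above. The sample strings '7 dwarfs' and 'route 66' are made up for illustration.

```python
import re

# Unanchored: 'py' is found inside 'python'.
inside = re.search(r"py", "python")

# Anchored at both ends: the whole string must be exactly 'py'.
whole_ok = re.search(r"^py$", "py")
whole_fail = re.search(r"^py$", "python")

print(inside.group())    # py
print(whole_ok.group())  # py
print(whole_fail)        # None

# ^\d checks the first character, \d$ checks the last one.
starts_with_digit = bool(re.search(r"^\d", "7 dwarfs"))
ends_with_digit = bool(re.search(r"\d$", "route 66"))
print(starts_with_digit, ends_with_digit)  # True True
```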

Reference table

Special sequence of regular expressions

^                 Matches the start of the line
$                 Matches the end of the line
.                 Matches any single character except a newline; the re.S option lets it match a newline as well
[...]             Matches any single character in the brackets (an "or")
[^...]            Matches any single character not in the brackets
*                 Matches 0 or more occurrences of the preceding expression
+                 Matches 1 or more occurrences of the preceding expression
?                 Matches 0 or 1 occurrences of the preceding expression
{n}               Matches exactly n occurrences of the preceding expression
{n,m}             Matches at least n and at most m occurrences of the preceding expression
a|b               Matches either a or b
*? +? ?? {m,n}?   Non-greedy versions of *, +, ?, {m,n}
(re)              Groups a regular expression and remembers the matched text
(?imx)            Turns on the i, m, or x option from inside the expression; in Python these inline flags apply to the whole pattern
(?:re)            Groups a regular expression without remembering the matched text
(?#...)           Comment
(?=re)            Specifies a position using a pattern; has no range
(?!re)            Specifies a position using the negation of a pattern; has no range
(?<n1>..)         Named group n1 (Python's re module writes this as (?P<n1>..))
\d                Matches a digit; same as [0-9]
\D                Non-digit; same as [^0-9] or [^\d]
\s                Whitespace character
\S                Non-whitespace character
\w                Word character: letter, digit, or underscore
\W                Non-word character
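
A few of the entries in the table can be tried out as follows. This is only a sketch; the sample strings are invented for illustration.

```python
import re

# Character classes: \d is a digit, \w a word character.
digits = re.findall(r"\d", "a1 b22")        # ['1', '2', '2']
words = re.findall(r"\w+", "foo_1, bar!")   # ['foo_1', 'bar']

# (re) remembers its text; (?:re) groups without remembering.
capturing = re.match(r"(ab)+(c)", "ababc").groups()        # ('ab', 'c')
non_capturing = re.match(r"(?:ab)+(c)", "ababc").groups()  # ('c',)

# (?=re) requires what follows to match, without consuming it.
followed_by_px = re.findall(r"\d+(?=px)", "10px 20em 30px")  # ['10', '30']

# (?!re) requires what follows NOT to match.
q_without_u = re.search(r"q(?!u)", "quit qatar").start()     # 5

print(digits, words, capturing, non_capturing, followed_by_px, q_without_u)
```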

re module

re.compile(pattern[, flags])

Compiles a regular expression pattern and its flags into a regular expression object for use by the match() and search() functions.

The flags defined by re include:

re.I ignores case

re.L makes the special sequences \w, \W, \b, \B, \s, \S depend on the current locale

re.M is multi-line mode

re.S makes '.' match any character, including a newline (by default '.' does not match a newline)

re.U makes the special sequences \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database

re.X improves readability: whitespace in the pattern and comments after '#' are ignored
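
The effect of the most commonly used flags can be seen in a small sketch. The sample strings and the name phone are assumptions made for the example.

```python
import re

# re.I: ignore case.
ignore_case = re.findall(r"python", "Python PYTHON python", re.I)

# re.M: ^ also matches at the start of every line.
line_starts = re.findall(r"^\w+", "one two\nthree four", re.M)

# re.S: '.' also matches a newline.
without_s = re.match(r"a.b", "a\nb")
with_s = re.match(r"a.b", "a\nb", re.S)

# re.X: whitespace and '#' comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})    # area code
    -
    (\d{3,8})  # local number
""", re.X)
phone_groups = phone.match("010-12345").groups()

print(ignore_case)   # ['Python', 'PYTHON', 'python']
print(line_starts)   # ['one', 'three']
print(without_s)     # None
print(phone_groups)  # ('010', '12345')
```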

The following two usage results are the same:

compiled_pattern = re.compile(pattern)

result = compiled_pattern.match(string)

result = re.match(pattern, string)
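
As a concrete instance of this equivalence, the sketch below fills in an assumed pattern and string and compares both results.

```python
import re

pattern = r"\d+"        # assumed example pattern
string = "42 apples"    # assumed example input

# Compile once, then match with the compiled object.
compiled_pattern = re.compile(pattern)
result_compiled = compiled_pattern.match(string)

# Or let the module-level function handle the pattern directly.
result_module = re.match(pattern, string)

print(result_compiled.group(), result_module.group())  # 42 42
```

Compiling first is mainly worthwhile when the same pattern is reused many times.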

Because Python string literals also use \ as the escape character, a backslash in a pattern has to be doubled:

s = 'abc\\-001' # Python string

# The corresponding regular expression string becomes:

# 'abc\-001'

Therefore, we strongly recommend using Python's r prefix, so that you do not have to think about escaping.

s = r'abc\-001' # Python string

# The corresponding regular expression string is unchanged:

# 'abc\-001'
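
That both spellings produce the same pattern can be verified directly. This is a small check, not part of the original article.

```python
import re

plain = 'abc\\-001'   # backslash doubled in an ordinary string literal
raw = r'abc\-001'     # raw string: the backslash is kept as written

print(plain == raw)   # True
print(len(raw))       # 8

# Either spelling gives the regular expression abc\-001,
# which matches the text 'abc-001'.
matched = re.match(raw, 'abc-001').group()
print(matched)        # abc-001
```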

re.search(pattern, string[, flags])

Scans the string for a position where the regular expression pattern matches and returns a MatchObject instance, or None if no position matches.

Compiled regular expression objects (re.RegexObject) have the following search method:

search(string[, pos[, endpos]])

If regex is a compiled regular expression object, regex.search(string, 0, 50) is equivalent to regex.search(string[:50], 0).

>>> pattern = re.compile("a")

>>> pattern.search("abcde") # Match at index 0

>>> pattern.search("abcde", 1) # No match
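
The pos and endpos arguments can be explored with the following sketch. The 61-character test string is an assumption chosen so that the only 'z' lies beyond index 50.

```python
import re

pattern = re.compile("a")
hit = pattern.search("abcde")       # found at index 0
miss = pattern.search("abcde", 1)   # scanning starts at index 1: no 'a' left

regex = re.compile("z")
string = "a" * 60 + "z"             # the only 'z' sits at index 60

limited = regex.search(string, 0, 50)   # endpos=50 hides the 'z'
sliced = regex.search(string[:50], 0)   # same effect via slicing
unlimited = regex.search(string)        # the whole string is scanned

print(hit.start())        # 0
print(miss)               # None
print(limited, sliced)    # None None
print(unlimited.start())  # 60
```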

Match

re.match(pattern, string[, flags])

Determines whether the pattern matches at the beginning of the string. For RegexObject, there is:

match(string[, pos[, endpos]])

The match() function tries to match the regular expression only at the beginning of the string, that is, it reports only a match that starts at position 0, whereas search() scans the whole string for a match. To find a match anywhere in the string, use search().

>>> pattern.match('bca', 2).group()

'a'

Although match() starts at the beginning by default, it can still succeed when a start position is given. Matching then begins exactly at that position and fails if the pattern does not match right there, which is where it differs from search().

The match() method returns a match object if the match succeeds; otherwise it returns None.
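
The contrast between match() and search() with and without a start position can be laid out side by side. This sketch reuses the pattern "a" from the example above.

```python
import re

pattern = re.compile("a")

at_start = pattern.match("bca")             # None: 'b' is at position 0
anywhere = pattern.search("bca")            # found at index 2

from_two = pattern.match("bca", 2)          # succeeds: 'a' is exactly at index 2
from_one = pattern.match("bca", 1)          # None: 'c' is at index 1
search_from_one = pattern.search("bca", 1)  # keeps scanning, finds index 2

print(at_start)                 # None
print(anywhere.start())         # 2
print(from_two.group())         # a
print(from_one)                 # None
print(search_from_one.start())  # 2
```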

import re

test = 'user-entered string'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')

Split

re.split(pattern, string[, maxsplit=0, flags=0])

Splits the string at every part that matches the regular expression and returns a list. For RegexObject, there is the method:

split(string[, maxsplit=0])

split leaves the string unsplit when no match is found.

>>> 'a b   c'.split(' ')

['a', 'b', '', '', 'c']

The split method that comes with strings is not flexible: consecutive spaces produce empty strings.

>>> re.split(r'\s+', 'a b   c')

['a', 'b', 'c']

Notice the difference: however many spaces there are, the split comes out clean.

One more, the ultimate version:

>>> re.split(r'[\s\,\;]+', 'a,b;; c  d')

['a', 'b', 'c', 'd']

The regular expression r'[\s\,\;]+' means: one or more occurrences of whitespace, a comma, or a semicolon act as the separator. So the final result is still very clean.
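
The maxsplit argument and the no-match case mentioned earlier can be sketched as follows. Inside a character class the comma and semicolon need no backslash, so the sketch writes the pattern as [\s,;]+.

```python
import re

# Same separator set as above, written without the optional backslashes.
parts = re.split(r"[\s,;]+", "a,b;; c  d")

# maxsplit=2: split at the first two separators only, keep the rest intact.
limited = re.split(r"[\s,;]+", "a,b;; c  d", maxsplit=2)

# No separator in the string: a one-element list is returned.
unsplit = re.split(r"[\s,;]+", "abc")

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b', 'c  d']
print(unsplit)  # ['abc']
```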

Findall

re.findall(pattern, string[, flags])

Finds all substrings of the string that match the regular expression and returns them as a list. RegexObject likewise has:

findall(string[, pos[, endpos]])

# get every occurrence of 'hh' and return them as a list

>>> pattern = re.compile(r'hh')

>>> pattern.findall('hhmichaelhh')

['hh', 'hh']
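
Two further behaviours of findall are worth a quick sketch: the pos argument of the compiled method, and the fact that a pattern containing one group returns the group's text rather than the whole match. The bracket example and its sample string are assumptions.

```python
import re

pattern = re.compile(r"hh")

# Start scanning at index 1, so the leading 'hh' is no longer complete.
from_one = pattern.findall("hhmichaelhh", 1)

# With one group in the pattern, findall returns the group contents:
# here, everything enclosed in square brackets.
bracketed = re.findall(r"\[(.*?)\]", "x[a] y[bc]")

print(from_one)   # ['hh']
print(bracketed)  # ['a', 'bc']
```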

Finditer

re.finditer(pattern, string[, flags])

Similar to findall: finds all substrings of the string that match the regular expression, but returns them through an iterator. RegexObject likewise has:

finditer(string[, pos[, endpos]])
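
Because finditer yields match objects rather than strings, positions are available as well. A minimal sketch using the same 'hh' pattern:

```python
import re

pattern = re.compile(r"hh")

# Each item produced by finditer is a match object.
spans = [m.span() for m in pattern.finditer("hhmichaelhh")]
texts = [m.group() for m in pattern.finditer("hhmichaelhh")]

print(spans)  # [(0, 2), (9, 11)]
print(texts)  # ['hh', 'hh']
```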

Sub

re.sub(pattern, repl, string[, count, flags])

Finds every substring of string that matches the regular expression pattern and replaces it with repl. If nothing matches, the string is returned unchanged. repl can be either a string or a function.

The return value is the new string after replacement.

For RegexObject there is:

sub(repl, string[, count=0])

>>> pattern = re.compile(r'\d')

>>> pattern.sub('no', '12hh34hh')

'nonohhnonohh'

>>> pattern.sub('no', '12hh34hh', 0)

'nonohhnonohh'

>>> pattern.sub('no', '12hh34hh', count=0)

'nonohhnonohh'

>>> pattern.sub('no', '12hh34hh', 1)

'no2hh34hh'

As the examples show, count is optional; its default value is 0, which means replace all matches.
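
Since repl may also be a function, here is a minimal sketch of that form. The helper name double is hypothetical; it receives each match object and returns the replacement text.

```python
import re

pattern = re.compile(r"\d")

def double(match):
    """Return the matched digit multiplied by two, as a string."""
    return str(int(match.group()) * 2)

all_doubled = pattern.sub(double, "12hh34hh")
first_only = pattern.sub(double, "12hh34hh", count=1)

print(all_doubled)  # 24hh68hh
print(first_only)   # 22hh34hh
```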

Subn

re.subn(pattern, repl, string[, count, flags])

Works like sub(), but returns a tuple of the new string and the number of substitutions made. RegexObject likewise has:

subn(repl, string[, count=0])

>>> pattern.subn('no', '12hh34hh', count=0)

('nonohhnonohh', 4)
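
The returned count reflects the replacements actually made, which the following sketch shows for a limited count and for a string with no match.

```python
import re

pattern = re.compile(r"\d")

# At most three replacements: the fourth digit is left alone.
new_text, replaced = pattern.subn("no", "12hh34hh", count=3)

# Nothing to replace: the string comes back unchanged with a count of 0.
untouched, zero = pattern.subn("no", "hh")

print(new_text, replaced)  # nonohhno4hh 3
print(untouched, zero)     # hh 0
```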

Group

Besides simply testing whether a string matches, regular expressions can also extract substrings. Each group to be extracted is marked with (). For example:

^(\d{3})-(\d{3,8})$ defines two groups, which extract the area code and the local number directly from the matched string:

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')

>>> m

<_sre.SRE_Match object; span=(0, 9), match='010-12345'>

>>> m.group(0)

'010-12345'

>>> m.group(1)

'010'

>>> m.group(2)

'12345'

>>> m.groups()

('010', '12345')

Experiment shows that if the pattern contains no parentheses, the resulting match object a still supports a.group(0) or a.group(), but a.group(1) raises an error.
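
The experiment with a pattern that has no parentheses can be reproduced like this. It is a sketch; the flag name raised is just for illustration.

```python
import re

# Same phone pattern as above, but without any groups.
a = re.match(r"^\d{3}-\d{3,8}$", "010-12345")

whole = a.group()    # the entire match
same = a.group(0)    # group 0 is also the entire match
groups = a.groups()  # empty tuple: there are no groups

try:
    a.group(1)
    raised = False
except IndexError:
    # Asking for a group that does not exist raises IndexError.
    raised = True

print(whole, same, groups, raised)  # 010-12345 010-12345 () True
```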

Greedy match

Regular expression matching is greedy by default, which means it matches as many characters as possible. For example, to match the zeros at the end of a number:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()

('102300', '')

Because \d+ is greedy, it swallows the trailing zeros as well, so 0* can only match the empty string.

For 0* to capture the trailing zeros, \d+ must match non-greedily (that is, as few characters as possible). Adding a ? after \d+ makes it non-greedy:

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()

('1023', '00')
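
The same effect appears with other quantifiers. As an additional illustration not taken from the original article, the sketch below applies .+ and .+? to an assumed tag-like string.

```python
import re

text = "<a><b>"

# Greedy: .+ runs to the last '>' it can reach.
greedy = re.findall(r"<.+>", text)

# Non-greedy: .+? stops at the first '>' that completes a match.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
```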



