Python Regular Expressions

Source: Internet
Author: User
Tags: expression engine

Regular expressions provide the basis for advanced text pattern matching, extraction, and text-based search and replace. In Python, regular expressions are supported through the re module in the standard library.

Common regular expression symbols and special characters

Notation     Description                                                        Example

Symbols
re1|re2      Match regular expression re1 or re2                                foo|bar
.            Match any character (except \n)                                    b.b
^            Match the start of a string                                        ^dear
$            Match the end of a string                                          /bin/*sh$
*            Match 0 or more occurrences of the preceding regular expression    [A-Za-z0-9]*
+            Match 1 or more occurrences of the preceding regular expression    [a-z]+\.com
?            Match 0 or 1 occurrence of the preceding regular expression        goo?
{N}          Match N occurrences of the preceding regular expression            [0-9]{3}
{M,N}        Match from M to N occurrences of the preceding regular expression  [0-9]{5,9}
[...]        Match any single character from the character set                  [aeiou]
[..x-y..]    Match any single character in the range from x to y                [0-9]
[^...]       Match any single character that is not in the character set,       [^aeiou]
             including any ranges it contains
(...)        Match the enclosed regular expression and save it as a subgroup    ([0-9]{3})?

Special characters
\d           Match any decimal digit, same as [0-9]                             data\d+.txt
             (\D is the inverse of \d: it matches any non-digit)
\w           Match any alphanumeric character, same as [A-Za-z0-9_]             [A-Za-z]\w
             (\W is the inverse of \w)
\s           Match any whitespace character, same as [ \n\t\r\v\f]              of\sthe
             (\S is the inverse of \s)
\b           Match any word boundary (\B is the inverse of \b)                  \bthe\b
\A (\Z)      Match the start (end) of a string                                  \Adear

If a question mark immediately follows a match that uses one of the repetition (closure) operators (*, +, ?, or {}), it tells the regular expression engine to match as few times as possible.

What does "as few times as possible" mean? When a pattern uses the repetition operators, the regular expression engine tries to "absorb" as many characters as possible that match the pattern. This is known as greedy matching. The question mark tells the engine to be "lazy" instead: it consumes as few characters as possible for the current part of the pattern and leaves the rest, if any, for whatever part of the pattern follows.
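The difference is easiest to see side by side. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text with two tag-like pieces.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ absorbs as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? stops at the first '>' that lets the overall pattern succeed.
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```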

When used in a regular expression, a pair of parentheses can serve either (or both) of the following purposes:

    • Grouping regular expressions;
    • Matching subgroups.
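Both uses of parentheses can be sketched in a few lines. The patterns and input strings here are illustrative assumptions, not taken from the original text.

```python
import re

# Grouping: the parentheses make + apply to the whole "ab" unit.
repeated = re.match(r"(ab)+", "ababab")
print(repeated.group())   # ababab
print(repeated.group(1))  # ab (a repeated group keeps only its last repetition)

# Subgroups: each pair of parentheses saves the text it matched.
parts = re.match(r"(\w+)-(\d+)", "abc-123")
print(parts.group(1))  # abc
print(parts.group(2))  # 123
print(parts.groups())  # ('abc', '123')
```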

Common regular expression functions and methods

Function/Method   Description

re module functions only
compile()         Compile a regular expression pattern with any optional flags and return a regular expression object
purge()           Purge the cache of implicitly compiled regular expression patterns

re module functions and regular expression object methods
match()           Attempt to match the pattern, with optional flags, at the start of a string. Return a match object on success, None on failure
search()          Search a string for the first occurrence of the pattern, with optional flags. Return a match object on success, None on failure
findall()         Find all non-overlapping occurrences of the pattern in a string and return a list of matches
finditer()        Same as findall(), but return an iterator instead of a list. For each match, the iterator returns a match object
split()           Split a string into a list using the pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default is to split at every match)
sub()             Replace all occurrences of the pattern in a string with repl; all occurrences are replaced unless count is given

Common match object methods
group()           Return the entire match, or a specific subgroup numbered num
groups()          Return a tuple containing all matched subgroups (an empty tuple if the pattern has no subgroups)
groupdict()       Return a dictionary containing all matched named subgroups, with the subgroup names as keys

Common module attributes
re.I              Case-insensitive matching
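Several of these entries can be combined in one short sketch: compiling with a flag, reading a match object three ways, and clearing the cache. The pattern, the group names word and name, and the input strings are assumptions chosen for illustration.

```python
import re

# Compile once with the re.I flag, then reuse the regex object.
greeting = re.compile(r"(?P<word>hello) (?P<name>\w+)", re.I)

m = greeting.match("HELLO World")
print(m.group())      # HELLO World
print(m.groups())     # ('HELLO', 'World')
print(m.groupdict())  # {'word': 'HELLO', 'name': 'World'}

# match() returns None on failure, so test the result before calling group().
failed = greeting.match("goodbye World")
print(failed)  # None

# purge() clears the cache of implicitly compiled patterns.
re.purge()
```

Checking the result against None first avoids the AttributeError that appears when group() is called on a failed match.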

Match objects and the group() and groups() methods

A match object is what a successful call to match() or search() returns.

group() returns either the entire match or a specific subgroup, if one is requested. groups() returns a tuple containing only the subgroups, however many there are. If the pattern has no subgroups, groups() returns an empty tuple while group() still returns the entire match.

Matching a string with the match() method

The match() function attempts to match the pattern at the beginning of the string.

>>> import re
>>> re.match('foo', 'foo').group()
'foo'
>>> re.match('foo', 'food on match').group()
'foo'
>>> re.match('foo', 'fodo on match').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

Using search() to find a pattern in a string (search versus match)

search() works exactly like match(), except that it looks for the first occurrence of the pattern at any position in the string rather than only at the beginning.

>>> re.match('foo', 'sea food').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('foo', 'sea food').group()
'foo'

  

Matching multiple strings

>>> bt = 'bat|bet|bit'
>>> re.match(bt, 'bat').group()
'bat'
>>> re.match(bt, 'bit').group()
'bit'
>>> re.match(bt, 'blt').group()        # no alternative matches
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(bt, 'he bit me').group()  # match() only looks at the start
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search(bt, 'he bit me').group() # search() finds 'bit' anywhere
'bit'

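Matching any single character

>>> anyend = '.end'
>>> re.match(anyend, 'bend').group()   # the dot matches 'b'
'bend'
>>> re.match(anyend, 'end').group()    # the dot needs a character to match
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(anyend, '\nend').group()  # the dot does not match \n
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'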

  

Creating a character set with []

>>> re.match('[cr][23][dp][o2]', 'c3po').group()
'c3po'
>>> re.match('[cr][23][dp][o2]', 'c2do').group()
'c2do'
>>> re.match('r2d2|c3po', 'c2do').group()   # alternation matches neither whole string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

  

Repetition, special characters, and grouping

>>> m = re.match('ab', 'ab')        # no subgroups
>>> m.group()                       # entire match
'ab'
>>> m.groups()                      # all subgroups
()
>>> m = re.match('(ab)', 'ab')
>>> m.group()
'ab'
>>> m.groups()
('ab',)
>>> m = re.match('(a)(b)', 'ab')
>>> m.group()
'ab'
>>> m.group(1)                      # subgroup 1
'a'
>>> m.group(2)                      # subgroup 2
'b'

Matching the start and end of a string and word boundaries

>>> m = re.search('^the', 'the end')
>>> m.group()
'the'
>>> m = re.search('^the', 'end. the')       # 'the' is not at the start
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\bthe', 'is the yes')
>>> m.group()
'the'
>>> m = re.search(r'\bthe', 'isthe yes')    # requires a word boundary
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\Bthe', 'isthe yes')    # requires no word boundary
>>> m.group()
'the'

  

Using findall() and finditer() to find every occurrence

findall() looks up all non-overlapping occurrences of a regular expression pattern in a string and always returns a list. finditer() performs the same search but returns an iterator that yields a match object for each occurrence.
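A minimal sketch of both functions follows; the sample sentence and patterns are assumptions chosen for illustration.

```python
import re

text = "This and that."

# findall() returns a list of the matched strings.
words = re.findall(r"th\w+", text, re.I)
print(words)  # ['This', 'that']

# With subgroups in the pattern, findall() returns tuples of the groups instead.
pairs = re.findall(r"(th)(\w+)", text, re.I)
print(pairs)  # [('Th', 'is'), ('th', 'at')]

# finditer() yields match objects, which also expose positions.
spans = [(m.group(), m.span()) for m in re.finditer(r"th\w+", text, re.I)]
print(spans)  # [('This', (0, 4)), ('that', (9, 13))]
```

finditer() tends to be the better fit when positions are needed or when the input is large, since it does not build the whole list up front.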

Using sub() and subn() to search and replace

The two functions are almost identical: both replace every part of a string that matches a regular expression with some replacement. The replacement is usually a string, but it can also be a function that returns the replacement string. subn() behaves exactly like sub(), except that it also reports how many substitutions were made: it returns the new string and the substitution count together as a two-element tuple.
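The sketch below illustrates the string form, the count argument, and the function form of the replacement. The input strings and the doubling function are hypothetical examples.

```python
import re

text = "attn: X, Dear X"

# sub() returns only the new string.
replaced = re.sub("X", "Mr. Smith", text)
print(replaced)  # attn: Mr. Smith, Dear Mr. Smith

# subn() returns a (new_string, count) tuple.
new_text, count = re.subn("X", "Mr. Smith", text)
print(count)  # 2

# count limits how many occurrences are replaced.
first_only = re.sub("X", "Mr. Smith", text, count=1)
print(first_only)  # attn: Mr. Smith, Dear X

# The replacement can be a function that takes a match object and returns a string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```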

Using split() to split a string on a delimiter pattern

If you do not want to split the string at every occurrence of the pattern, you can limit the number of splits by passing a nonzero value for the maxsplit parameter.

If the delimiter is a plain string rather than a regular expression that uses special symbols to match multiple patterns, re.split() works the same way as str.split(), as in the example below:

>>> re.split(':', 'str1:str2:str3')
['str1', 'str2', 'str3']
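The two cases that str.split() does not cover in the same way, a delimiter that is a real pattern and a capped number of splits, might look like the following sketch. The input strings are made up for illustration.

```python
import re

# A regex delimiter can match several separators at once:
# a comma or semicolon followed by optional whitespace.
fields = re.split(r"[,;]\s*", "a, b;c,  d")
print(fields)  # ['a', 'b', 'c', 'd']

# maxsplit limits the number of splits; the remainder stays in the last item.
limited = re.split(":", "str1:str2:str3", maxsplit=1)
print(limited)  # ['str1', 'str2:str3']
```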

  
