Regular expressions provide the basis for advanced text pattern matching, extraction, and search-and-replace functionality. In Python, regular expressions are supported through the re module in the standard library.
Common regular expression symbols and special characters
Notation | Description | Example regex
Symbols
re1|re2 | Match regular expression re1 or re2 | foo|bat
. | Match any single character (except \n) | b.b
^ | Match the start of a string | ^dear
$ | Match the end of a string | /bin/*sh$
* | Match 0 or more occurrences of the preceding regular expression | [A-Za-z0-9]*
+ | Match 1 or more occurrences of the preceding regular expression | [a-z]+\.com
? | Match 0 or 1 occurrence of the preceding regular expression | goo?
{N} | Match exactly N occurrences of the preceding regular expression | [0-9]{3}
{M,N} | Match from M to N occurrences of the preceding regular expression | [0-9]{5,9}
[...] | Match any single character from a character set | [aeiou]
[..x-y..] | Match any single character in the range from x to y | [0-9]
[^...] | Match any single character not in this character set, including any ranges present in the set | [^aeiou]
(...) | Match the enclosed regular expression and save it as a subgroup | ([0-9]{3})?
Special characters
\d | Match any decimal digit, same as [0-9] (\D is the opposite of \d: it matches any non-digit character) | data\d+.txt
\w | Match any alphanumeric character, same as [A-Za-z0-9_] for ASCII text (\W is the opposite) | [A-Za-z]\w
\s | Match any whitespace character, same as [ \n\t\r\v\f] (\S is the opposite) | of\sthe
\b | Match any word boundary (\B is the opposite) | \bthe\b
\A (\Z) | Match the start (end) of a string | \Adear
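As a quick check of the notation in the table above, the following sketch tests a few of the example patterns against sample strings. The sample strings are illustrative assumptions, not part of the original table, and re.fullmatch() is used only so that each pattern must cover the whole sample.

```python
import re

# (pattern from the table, sample string chosen for illustration)
checks = [
    (r"foo|bat", "bat"),              # alternation
    (r"b.b", "bob"),                  # . matches any single character
    (r"goo?", "go"),                  # ? makes the last "o" optional
    (r"[0-9]{3}", "555"),             # exactly three digits
    (r"[a-z]+\.com", "example.com"),  # one or more letters, then a literal ".com"
    (r"data\d+.txt", "data42.txt"),   # \d+ matches one or more digits
    (r"\bthe\b", "the"),              # word boundaries on both sides
]

# Map each pattern to True if it matches its whole sample string.
results = {pattern: bool(re.fullmatch(pattern, sample)) for pattern, sample in checks}

for pattern, ok in results.items():
    print(pattern, ok)
```

Note that the unescaped dot in data\d+.txt matches any character, so the pattern would also accept a name such as data42xtxt.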
If a question mark immediately follows a match that uses a closure (repetition) operator such as *, +, ?, or {}, it tells the regular expression engine to match as few times as possible.
What does "as few times as possible" mean? By default, when a pattern uses repetition operators, the regular expression engine tries to "absorb" as many characters as possible while still matching the pattern. This is known as greedy matching. The question mark makes the engine "lazy": it matches as few characters as possible for the current part of the pattern, leaving as many characters as possible for any subsequent pattern.
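The difference between greedy and lazy matching can be seen in a minimal sketch. The sample text and variable names here are assumptions chosen for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ absorbs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? absorbs as little as possible, so the match stops at the first ">".
lazy = re.search(r"<.+?>", text).group()

print(greedy)
print(lazy)
```

Here the greedy pattern returns the whole string, while the lazy pattern returns only the first tag.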
When using regular expressions, a pair of parentheses can serve either (or both) of the following purposes:
- Grouping regular expressions;
- Matching subgroups.
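Both purposes of parentheses can be sketched briefly. The patterns and sample strings below are illustrative assumptions.

```python
import re

# Grouping: the parentheses let + repeat the whole "ab" unit, not just "b".
repeated = re.match(r"(ab)+", "ababab")

# Matching subgroups: each pair of parentheses saves its part of the match.
phone = re.match(r"(\d{3})-(\d{4})", "555-1234")

print(repeated.group())   # the entire match
print(repeated.group(1))  # the last repetition captured by the group
print(phone.groups())     # all saved subgroups as a tuple
```

When a group is repeated, only its final repetition is kept as the subgroup value.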
Common regular expression functions, methods, and attributes
Function/Method | Description
re module function only
compile() | Compile a regular expression pattern with any optional flags and return a regular expression object
re module functions and regular expression object methods
match() | Attempt to match a regular expression pattern against a string, with optional flags. Return a match object on success, None on failure
search() | Search a string for the first occurrence of a regular expression pattern, with optional flags. Return a match object on success, None on failure
findall() | Find all (non-overlapping) occurrences of a regular expression pattern in a string and return a list of matches
finditer() | Same as findall(), but return an iterator instead of a list. For each match, the iterator returns a match object
split() | Split a string into a list using the regular expression pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (by default, split at every match)
sub() | Replace all occurrences of the regular expression pattern in a string with repl. All occurrences are replaced unless count is given
purge() | Clear the cache of implicitly compiled regular expression patterns (module-level function only)
Common match object methods
group() | Return the entire match, or the specific subgroup with the given number
groups() | Return a tuple containing all matching subgroups (an empty tuple if there are none)
groupdict() | Return a dictionary containing all matching named subgroups, with the subgroup names as the dictionary keys
Common module attributes
re.I (re.IGNORECASE) | Case-insensitive matching
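Several entries from the table can be combined in one short sketch: compile() with the re.I flag, search() on the compiled object, and the group(), groups(), and groupdict() methods of the resulting match object. The pattern, the group names user and host, and the sample text are assumptions made for illustration.

```python
import re

# Compile once with the case-insensitive flag, then reuse the pattern object.
pattern = re.compile(r"(?P<user>\w+)@(?P<host>\w+)\.com", re.I)

m = pattern.search("Contact: Alice@Example.COM")

whole = m.group()      # the entire match
parts = m.groups()     # all subgroups as a tuple
named = m.groupdict()  # named subgroups as a dictionary

print(whole, parts, named)
```

Because of re.I, the literal ".com" in the pattern also matches ".COM" in the text.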
Match objects and the group() and groups() methods
A match object is what match() and search() return when they are called successfully.
group() returns either the entire match or a specific subgroup, as requested. groups() returns a tuple containing only the subgroups, whether there is one or several. If the pattern has no subgroups, groups() returns an empty tuple while group() still returns the entire match.
Match a string using the match() method
The match() function attempts to match the pattern from the beginning of the string.
>>> import re
>>> re.match('foo', 'foo').group()
'foo'
>>> re.match('foo', 'food on match').group()
'foo'
>>> re.match('foo', 'fodo on match').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
Use search() to find a pattern in a string (search versus match)
search() works exactly like match(), except that it searches its string argument for the first occurrence of the given regular expression pattern at any position, not only at the beginning.
>>> re.match('foo', 'sea food').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('foo', 'sea food').group()
'foo'
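Since both match() and search() return None on failure, calling group() directly on the result raises AttributeError, as the sessions above show. A minimal sketch of guarding against that follows; first_match is a hypothetical helper introduced only for this illustration, not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the text matched by re.search(), or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m is not None else None

# match() only looks at the beginning of the string, so this is None.
anchored = re.match("foo", "sea food")

# search() scans the whole string, so this finds "foo" inside "food".
found = first_match("foo", "sea food")

# No occurrence anywhere: the helper returns None instead of raising.
missing = first_match("foo", "seafarer")

print(anchored, found, missing)
```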
Match multiple strings
>>> bt = 'bat|bet|bit'
>>> re.match(bt, 'bat').group()
'bat'
>>> re.match(bt, 'bit').group()
'bit'
>>> re.match(bt, 'blt').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(bt, 'he bit me').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search(bt, 'he bit me').group()
'bit'
Match any single character
>>> anyend = '.end'
>>> re.match(anyend, 'bend').group()
'bend'
>>> re.match(anyend, 'end').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(anyend, '\nend').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
Create a character set []
>>> re.match('[cr][23][dp][o2]', 'c3po').group()
'c3po'
>>> re.match('[cr][23][dp][o2]', 'c2do').group()
'c2do'
>>> re.match('r2d2|c3po', 'c2do').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
Repetition, special characters, and grouping
>>> m = re.match('ab', 'ab')  # no subgroups
>>> m.group()  # entire match
'ab'
>>> m.groups()  # all subgroups
()
>>> m = re.match('(ab)', 'ab')
>>> m.group()
'ab'
>>> m.groups()
('ab',)
>>> m = re.match('(a)(b)', 'ab')
>>> m.group()
'ab'
>>> m.group(1)  # subgroup 1
'a'
>>> m.group(2)  # subgroup 2
'b'
Match the beginning and end of a string and word boundaries
>>> m = re.search('^the', 'the end')
>>> m.group()
'the'
>>> m = re.search('^the', 'end.')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\bthe', 'is the yes')
>>> m.group()
'the'
>>> m = re.search(r'\bthe', 'isthe yes')  # requires a word boundary
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\Bthe', 'isthe yes')  # no word boundary
>>> m.group()
'the'
Use findall() and finditer() to find every occurrence
findall() looks up all non-overlapping occurrences of a regular expression pattern in a string. It always returns a list.
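A short sketch of both functions follows. The pattern and sample text are assumptions chosen for illustration.

```python
import re

text = "This and that"

# findall() returns a list of the matched strings.
words = re.findall(r"th\w+", text, re.I)

# finditer() yields match objects, which also carry position information.
spans = [m.span() for m in re.finditer(r"th\w+", text, re.I)]

# With no occurrences, findall() still returns a list (an empty one).
none_found = re.findall(r"xyz", text)

print(words, spans, none_found)
```

finditer() is the better fit when the positions of the matches are needed, or when the input is large and building a full list is wasteful.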
Use sub() and subn() to search and replace
The two functions are almost identical: both replace every part of a string that matches a regular expression. The replacement is usually a string, but it can also be a function that returns the replacement string. subn() works the same way as sub(), but it also reports the total number of substitutions: it returns a two-element tuple containing the new string and the number of substitutions made.
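The following sketch shows sub() with a string replacement, subn() with its tuple result, and sub() with a function as the replacement. The sample strings and the doubling function are illustrative assumptions.

```python
import re

# sub() returns only the new string.
replaced = re.sub(r"[ae]", "X", "abcdef")

# subn() returns (new_string, number_of_substitutions).
replaced_n = re.subn(r"[ae]", "X", "abcdef")

# The replacement can be a function that receives each match object
# and returns the replacement string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b22")

print(replaced, replaced_n, doubled)
```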
Use split() to split a string on a pattern
If you do not want to split the string at every occurrence of the pattern, you can limit the number of splits by passing a nonzero value for the maxsplit parameter.
If the delimiter is a plain string rather than a regular expression that uses special symbols to match multiple patterns, re.split() works the same way as str.split(), as in the following example:
>>> re.split(':', 'str1:str2:str3')
['str1', 'str2', 'str3']
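A brief sketch of the two cases that go beyond str.split(): limiting the number of splits with maxsplit, and using a pattern that matches more than one delimiter. The sample strings are assumptions for illustration.

```python
import re

# Limit the number of splits: only the first ":" is used as a delimiter.
limited = re.split(":", "str1:str2:str3", maxsplit=1)

# A real pattern as the delimiter: a comma or semicolon plus optional whitespace.
fields = re.split(r"[,;]\s*", "a, b;c,  d")

print(limited, fields)
```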