Python regular expressions
Regular expressions provide the basis for advanced text pattern matching and extraction, as well as text search and replace. Python supports regular expressions through the re module in the standard library.
Common regular expression symbols and special characters

| Notation | Description | Example |
|---|---|---|
| **Symbols** | | |
| `re1\|re2` | Match regular expression re1 or re2 | `foo\|bat` |
| `.` | Match any character (except `\n`) | `b.b` |
| `^` | Match the start of the string | `^Dear` |
| `$` | Match the end of the string | `/bin/*sh$` |
| `*` | Match the preceding regular expression zero or more times | `[A-Za-z0-9]*` |
| `+` | Match the preceding regular expression one or more times | `[a-z]+\.com` |
| `?` | Match the preceding regular expression zero or one time | `goo?` |
| `{N}` | Match the preceding regular expression exactly N times | `[0-9]{3}` |
| `{M,N}` | Match the preceding regular expression from M to N times | `[0-9]{5,9}` |
| `[...]` | Match any single character from the character set | `[aeiou]` |
| `[..x-y..]` | Match any single character in the range x to y | `[0-9]` |
| `[^...]` | Do not match any character from the set, including any character in a range if the set contains one | `[^aeiou]` |
| `(...)` | Match the enclosed regular expression and save it as a subgroup | `([0-9]{3})?` |
| **Special characters** | | |
| `\d` | Match any decimal digit, same as `[0-9]` for ASCII text (`\D` is the opposite: match any non-digit) | `data\d+.txt` |
| `\w` | Match any alphanumeric character or underscore, same as `[A-Za-z0-9_]` for ASCII text (`\W` is the opposite) | `[A-Za-z]\w` |
| `\s` | Match any whitespace character, same as `[ \n\t\r\v\f]` (`\S` is the opposite) | `of\sthe` |
| `\b` | Match a word boundary (`\B` is the opposite) | `\bThe\b` |
| `\A` (`\Z`) | Match the start (end) of the string | `\ADear` |

When a question mark follows any repetition operator (*, +, ?, or {}), it tells the regular expression engine to match as few times as possible.
What does "as few times as possible" mean? When a pattern uses repetition operators, the regular expression engine tries to "absorb" as many characters as possible. This is usually called greedy matching. The question mark asks the engine to be "lazy" (non-greedy): the current part of the pattern matches as few characters as possible, which leaves as many characters as possible for the rest of the pattern (if there is any).
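Here is a small illustration of greedy versus lazy matching. The HTML-like sample string is made up for this example. The greedy `.*` runs all the way to the last `>`, and the lazy `.*?` stops at the first `>` that lets the match succeed.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* absorbs as much as possible, so the match ends at the LAST '>'
greedy = re.search(r'<.*>', text).group()

# Lazy (non-greedy): .*? gives characters back, so the match ends at the FIRST '>'
lazy = re.search(r'<.*?>', text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

# With findall(), the lazy pattern picks out each tag separately
print(re.findall(r'<.*?>', text))  # ['<b>', '</b>', '<i>', '</i>']
```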
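In a regular expression, a pair of parentheses can serve either or both of the following purposes:
- Grouping part of a regular expression, so that an operator applies to the whole group;
- Matching a subgroup, that is, saving the text matched by the group so it can be retrieved later.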
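The following sketch shows the two roles of parentheses described above. The sample strings are arbitrary.

```python
import re

# Grouping: the + applies to the whole group "ab", not only to "b"
whole = re.match(r'(ab)+', 'ababab').group()
print(whole)                                   # ababab

# Without parentheses, + applies only to the preceding "b"
print(re.match(r'ab+', 'ababab').group())      # ab

# Subgroups: each pair of parentheses saves the text it matched
m = re.match(r'(\d{3})-(\d{4})', '555-1234')
print(m.group(1), m.group(2))                  # 555 1234
```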
Common regular expression functions, methods, and attributes

| Function/method | Description |
|---|---|
| **re module function only** | |
| `compile(pattern, flags=0)` | Compile a regular expression pattern, with any optional flags, and return a regex object |
| **re module functions and regex object methods** | |
| `match(pattern, string, flags=0)` | Try to match the pattern at the beginning of the string, with optional flags. Return a match object on success, or None on failure |
| `search(pattern, string, flags=0)` | Search the string for the first occurrence of the pattern, with optional flags. Return a match object on success, or None on failure |
| `findall(pattern, string, flags=0)` | Find all non-overlapping occurrences of the pattern in the string and return them as a list |
| `finditer(pattern, string, flags=0)` | Same as `findall()`, but return an iterator instead of a list. The iterator yields a match object for each match |
| `split(pattern, string, maxsplit=0)` | Split the string into a list, using the pattern as the delimiter, and return the list of pieces. At most `maxsplit` splits are made (by default, the string is split at every match) |
| `sub(pattern, repl, string, count=0)` | Replace every occurrence of the pattern in the string with `repl`, unless `count` is given to limit the number of replacements |
| **re module function only** | |
| `purge()` | Clear the cache of implicitly compiled regular expressions |
| **Match object methods** | |
| `group(num=0)` | Return the entire match, or subgroup number `num` |
| `groups(default=None)` | Return a tuple of all subgroups (an empty tuple if the pattern has no subgroups) |
| `groupdict(default=None)` | Return a dictionary of all named subgroups, with the subgroup names as keys |
| **Common module attributes** | |
| `re.I`, `re.IGNORECASE` | Case-insensitive matching |

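This is a minimal sketch of the compile-then-reuse workflow from the table, combined with the `re.I` flag. The pattern and sample strings are arbitrary.

```python
import re

# Compile once with a flag, then call the regex object's methods repeatedly
pattern = re.compile(r'foo', re.I)   # re.I: case-insensitive matching

print(pattern.match('FOObar').group())    # FOO
print(pattern.search('a Foo b').group())  # Foo
print(pattern.findall('foo FOO fOo'))     # ['foo', 'FOO', 'fOo']
```

Compiling once is mainly a readability and reuse choice. The module-level functions also cache the patterns they compile implicitly, which is the cache that `purge()` clears.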
Match objects and the group() and groups() methods
Match objects are returned by successful calls to match() and search().
group() returns either the entire match or a specific subgroup, as requested. groups() returns a tuple containing all the subgroups. If the pattern has no subgroups, group() still returns the entire match, while groups() returns an empty tuple.
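The method table also lists groupdict(), which returns something useful only when the pattern uses named subgroups, written `(?P<name>...)`. The phone-number-style pattern and the group names `area` and `number` in this short sketch are purely illustrative.

```python
import re

# Hypothetical pattern with two named subgroups
m = re.match(r'(?P<area>\d{3})-(?P<number>\d{4})', '555-1234')

print(m.group())      # 555-1234  (entire match)
print(m.groups())     # ('555', '1234')
print(m.groupdict())  # {'area': '555', 'number': '1234'}

# A pattern with no subgroups: group() still works, groups() is empty
m2 = re.match(r'\d+', '42')
print(m2.group(), m2.groups())  # 42 ()
```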
Matching strings with match()
match() tries to match the pattern starting at the beginning of the string. If the match fails, it returns None, so calling group() on the result raises an AttributeError:

    >>> import re
    >>> re.match('foo', 'foo').group()
    'foo'
    >>> re.match('foo', 'food on match').group()
    'foo'
    >>> re.match('foo', 'fodo on match').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'

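The session above ends in an AttributeError because match() returned None. In real code you usually test the result before calling group(). Here is a small sketch. The helper name `match_or_none` is hypothetical and not part of the re module.

```python
import re

def match_or_none(pattern, text):
    """Hypothetical helper: return the matched text, or None when match() fails."""
    m = re.match(pattern, text)
    # match() returns None on failure, so only call group() on a real match object
    return m.group() if m is not None else None

print(match_or_none('foo', 'food on match'))  # foo
print(match_or_none('foo', 'fodo on match'))  # None
```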
Searching with search() (search vs. match)
search() works the same way as match(), except that it looks for the first occurrence of the pattern at any position in the string, not only at the beginning.

    >>> re.match('foo', 'sea food').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.search('foo', 'sea food').group()
    'foo'

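Another way to see the difference: search() scans every position, while match() only tries position 0. The sketch below uses the match object's span() method to show where search() found the text. It also shows that anchoring the pattern with `\A` makes search() behave like match().

```python
import re

s = 'sea food'

# match() only tries position 0, so it fails here
print(re.match('foo', s))            # None

# search() scans the whole string; span() gives (start, end) of the match
print(re.search('foo', s).span())    # (4, 7)

# Anchoring with \A restricts search() to the start of the string, like match()
print(re.search(r'\Afoo', s))        # None
```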
Matching one of several strings

    >>> bt = 'bat|bet|bit'
    >>> re.match(bt, 'bat').group()
    'bat'
    >>> re.match(bt, 'bit').group()
    'bit'
    >>> re.match(bt, 'blt').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.match(bt, 'he bit me').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.search(bt, 'he bit me').group()
    'bit'

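Alternation with `|` has the lowest precedence, so `bat|bet|bit` means three complete alternatives. The sketch below restricts the alternation to the middle character with parentheses. It also shows what happens when the alternation is written without them. The sample string comes from the session above.

```python
import re

# Parentheses limit the alternation to the middle character
print(re.search(r'b(a|e|i)t', 'he bit me').group())  # bit

# Without parentheses, 'ba|e|it' means 'ba' OR 'e' OR 'it'
print(re.findall(r'ba|e|it', 'he bit me'))           # ['e', 'it', 'e']
```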
Matching any single character

    >>> anyend = '.end'
    >>> re.match(anyend, 'bend').group()
    'bend'
    >>> re.match(anyend, 'end').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.match(anyend, '\nend').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.search('.end', 'The end.').group()
    ' end'

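In the session above, `.end` failed against `'\nend'` because `.` does not match a newline by default. The sketch below shows the `re.DOTALL` flag, which lifts that restriction. It also shows how to match a literal dot by escaping it.

```python
import re

# By default '.' does not match a newline, so this fails
print(re.match('.end', '\nend'))                 # None

# With re.DOTALL (also spelled re.S), '.' matches '\n' too
m = re.match('.end', '\nend', re.DOTALL)
print(repr(m.group()))                           # '\nend'

# To match a literal dot, escape it with a backslash
print(re.search(r'\.', 'The end.').start())      # 7
```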
Creating character sets with []

    >>> re.match('[cr][23][dp][o2]', 'c3po').group()
    'c3po'
    >>> re.match('[cr][23][dp][o2]', 'c2do').group()
    'c2do'
    >>> re.match('r2d2|c3po', 'c2do').group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> re.match('r2d2|c3po', 'r2d2').group()
    'r2d2'

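The session above shows that the character-set pattern is much looser than the alternation. One way to make this concrete is to generate every string the character sets allow and count how many each pattern accepts. The use of itertools.product here is just a convenient way to enumerate the combinations.

```python
import re
from itertools import product

char_set = re.compile(r'[cr][23][dp][o2]')
alternation = re.compile(r'r2d2|c3po')

# Every 4-character string that picks one character from each set
candidates = [''.join(chars) for chars in product('cr', '23', 'dp', 'o2')]

set_hits = [s for s in candidates if char_set.match(s)]
alt_hits = [s for s in candidates if alternation.match(s)]

print(len(candidates), len(set_hits), len(alt_hits))  # 16 16 2
print(alt_hits)                                        # ['c3po', 'r2d2']
```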
Repetition, special characters, and groups

    >>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group()
    'abc-123'
    >>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group(1)
    'abc'
    >>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group(2)
    '123'
    >>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').groups()
    ('abc', '123')

The following session shows how group() and groups() behave with zero, one, two, and nested subgroups:

    >>> m = re.match('ab', 'ab')        # no subgroups
    >>> m.group()                       # entire match
    'ab'
    >>> m.groups()                      # all subgroups
    ()
    >>> m = re.match('(ab)', 'ab')      # one subgroup
    >>> m.group()
    'ab'
    >>> m.groups()
    ('ab',)
    >>> m = re.match('(a)(b)', 'ab')    # two subgroups
    >>> m.group()
    'ab'
    >>> m.group(1)                      # subgroup 1
    'a'
    >>> m.group(2)                      # subgroup 2
    'b'
    >>> m.groups()
    ('a', 'b')
    >>> m = re.match('(a(b))', 'ab')    # nested subgroups
    >>> m.group()
    'ab'
    >>> m.group(1)
    'ab'
    >>> m.group(2)
    'b'
    >>> m.groups()
    ('ab', 'b')

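The nested example above follows a simple rule: subgroups are numbered by the position of their opening parenthesis, from left to right. A related detail is that an optional subgroup that did not take part in the match shows up in groups() as None. The patterns in this sketch are illustrative.

```python
import re

# Numbering follows the opening parentheses: outer group is 1, then 2 and 3
nested = re.match(r'((\w+)-(\d+))', 'abc-123')
print(nested.groups())     # ('abc-123', 'abc', '123')

# The optional second group does not participate, so it is reported as None
optional = re.match(r'(\d{3})-?(\d{3})?', '555')
print(optional.groups())   # ('555', None)
```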
Matching the start and end of a string and word boundaries

    >>> m = re.search('^the', 'the end.')
    >>> m.group()
    'the'
    >>> m = re.search('^the', 'end. the')    # not at the beginning
    >>> m.group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> m = re.search(r'\bthe', 'is the yes')  # at a word boundary
    >>> m.group()
    'the'
    >>> m = re.search(r'\bthe', 'isthe yes')   # no word boundary
    >>> m.group()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
    >>> m = re.search(r'\Bthe', 'isthe yes')   # not at a word boundary
    >>> m.group()
    'the'

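The examples above treat `^` as "start of the string". That is the default behavior. With the `re.M` (`re.MULTILINE`) flag, `^` also matches right after each newline, while `\A` always means the start of the whole string. The two-line sample text in this sketch is made up.

```python
import re

text = 'first line\nsecond line'

# Without re.M, ^ matches only at the very start of the string
print(re.findall(r'^\w+', text))          # ['first']

# With re.M (MULTILINE), ^ also matches right after each newline
print(re.findall(r'^\w+', text, re.M))    # ['first', 'second']

# \A ignores re.M and always means "start of the string"
print(re.findall(r'\A\w+', text, re.M))   # ['first']
```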
Finding every occurrence with findall() and finditer()
findall() finds all non-overlapping occurrences of a pattern in a string and returns them as a list.

    >>> re.findall('car', 'car')
    ['car']
    >>> re.findall('car', 'scary')
    ['car']
    >>> re.findall('car', 'carry the barcardi to car')
    ['car', 'car', 'car']

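finditer() is mentioned in the heading but not demonstrated above. The sketch below reuses the same sample string and shows that finditer() yields match objects, so the position of each occurrence is also available. It also shows that findall() returns tuples of subgroups, not whole matches, when the pattern contains groups.

```python
import re

text = 'carry the barcardi to car'

# finditer() yields match objects, so start positions are available too
for m in re.finditer('car', text):
    print(m.start(), m.group())
# 0 car
# 13 car
# 22 car

# With subgroups, findall() returns the groups instead of the whole match
print(re.findall(r'(\w)(\d)', 'a1 b2 c3'))   # [('a', '1'), ('b', '2'), ('c', '3')]
```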
Searching and replacing with sub() and subn()
The two functions are almost identical: both replace every match of the pattern in a string. The replacement is usually a string, but it can also be a function that returns the replacement string. subn() works like sub(), but it also reports the total number of substitutions made. It returns a two-element tuple containing the new string and that count.

    >>> re.sub('X', 'Mr.Smith', 'atten:X\n\nDear X,\n')
    'atten:Mr.Smith\n\nDear Mr.Smith,\n'
    >>> re.subn('X', 'Mr.Smith', 'atten:X\n\nDear X,\n')
    ('atten:Mr.Smith\n\nDear Mr.Smith,\n', 2)
    >>> re.sub('[ae]', 'X', 'abcdef')
    'XbcdXf'
    >>> re.subn('[ae]', 'X', 'abcdef')
    ('XbcdXf', 2)

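The text notes that the replacement can be a function, and the table mentions the count argument. Neither appears in the session above, so here is a short sketch of both, plus a backreference to a subgroup in the replacement string. The helper name `double` and the sample strings are illustrative only.

```python
import re

# Hypothetical replacement function: receives each match object, returns new text
def double(m):
    return str(int(m.group()) * 2)

print(re.sub(r'\d+', double, 'a1 b22 c333'))         # a2 b44 c666

# count limits the number of replacements
print(re.sub('X', 'Mr.Smith', 'X and X', count=1))   # Mr.Smith and X

# \1 and \2 in the replacement refer to the captured subgroups
print(re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@example'))  # example at user
```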
Splitting strings with split()
By default, re.split() splits the string at every match of the pattern. To limit this, pass a nonzero value as the maxsplit argument, and at most that many splits are made.
If the delimiter is a plain string with no special characters (rather than a pattern that can match several different strings), re.split() works the same way as str.split() with the same separator. For example:

    >>> re.split(':', 'str1:str2:str3')
    ['str1', 'str2', 'str3']

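The two cases where re.split() goes beyond str.split() are a delimiter pattern that can match different strings and a limit on the number of splits. Here is a minimal sketch of both. The delimiter set in the first pattern is an arbitrary choice for illustration.

```python
import re

# A pattern delimiter: split on any run of commas, semicolons, or whitespace
print(re.split(r'[,;\s]+', 'a, b;c  d'))             # ['a', 'b', 'c', 'd']

# maxsplit limits the number of splits; the remainder stays in the last item
print(re.split(':', 'str1:str2:str3', maxsplit=1))   # ['str1', 'str2:str3']
```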