Python regular expressions

Source: Internet
Author: User

Regular expressions provide the basis for advanced text pattern matching, extraction, and search-and-replace. Python supports regular expressions through the re module in the standard library.

Common regular expression symbols and special characters

Notation    Description                                                       Example
Symbols
re1|re2     Match regular expression re1 or re2                               foo|bat
.           Match any single character (except \n)                            b.b
^           Match the start of the string                                     ^Dear
$           Match the end of the string                                       /bin/*sh$
*           Match 0 or more occurrences of the preceding expression           [A-Za-z0-9]*
+           Match 1 or more occurrences of the preceding expression           [a-z]+\.com
?           Match 0 or 1 occurrence of the preceding expression               goo?
{N}         Match exactly N occurrences of the preceding expression           [0-9]{3}
{M,N}       Match from M to N occurrences of the preceding expression         [0-9]{5,9}
[...]       Match any single character from the character set                 [aeiou]
[..x-y..]   Match any single character in the range x to y                    [0-9]
[^...]      Match any single character not in the set or its ranges           [^aeiou]
(...)       Match the enclosed expression and save it as a subgroup           ([0-9]{3})?
Special characters
\d          Match any decimal digit, same as [0-9] (\D is the opposite)       data\d+.txt
\w          Match any word character, [A-Za-z0-9_] (\W is the opposite)       [A-Za-z_]\w+
\s          Match any whitespace, same as [ \n\t\r\v\f] (\S is the opposite)  of\sthe
\b          Match a word boundary (\B is the opposite)                        \bThe\b
\A (\Z)     Match the start (end) of the string                               \ADear

The bracketed equivalents for \d, \w, and \s hold for ASCII text. With Python 3 str patterns these classes also match the corresponding Unicode characters unless the re.ASCII flag is set.
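
A few of the table entries can be tried directly with the re module. The following is a minimal sketch; the sample strings and variable names are invented for illustration and are not part of the table.

```python
import re

# Hypothetical sample inputs, chosen only to exercise a few table entries.
three_digits = re.compile(r'[0-9]{3}')   # {N}: exactly three digits
domain = re.compile(r'[a-z]+\.com')      # +: one or more lowercase letters, then ".com"
not_vowel = re.compile(r'[^aeiou]')      # [^...]: any character outside the set

digits_found = three_digits.search('ext. 555 or 1234').group()
domain_found = domain.search('visit example.com today').group()
first_consonant = not_vowel.search('aeiouxyz').group()

print(digits_found)      # 555
print(domain_found)      # example.com
print(first_consonant)   # x
```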
     

If a question mark follows a match that uses one of the closure operators (*, +, ?, or {}), it tells the regular expression engine to match as few times as possible.

What does "as few times as possible" mean? When a pattern uses closure operators, the regular expression engine tries to "absorb" as many characters as possible that match the pattern. This is usually called greedy matching. The question mark asks the engine to be "lazy" instead: match as few characters as possible for the current expression and leave as many as possible for the rest of the pattern (if any).
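
The difference between greedy and lazy matching is easiest to see side by side. This is a small sketch; the sample text is made up for the illustration.

```python
import re

# Hypothetical sample text containing several angle-bracket tags.
text = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ absorbs as much as it can, so the match runs to the last '>'.
greedy = re.search(r'<.+>', text).group()

# Lazy: .+? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r'<.+?>', text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```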

In a regular expression, a pair of parentheses can serve either (or both) of the following purposes:

  • Grouping regular expressions;
  • Matching subgroups
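
The two roles of parentheses can be sketched as follows. The patterns and input strings are hypothetical examples, not taken from the text above.

```python
import re

# Grouping: the + applies to the whole group "ab", not only to "b".
repeated = re.match(r'(ab)+', 'ababab')
print(repeated.group())    # ababab
print(repeated.group(1))   # ab  (a repeated group keeps its last repetition)

# Matching subgroups: each pair of parentheses saves the text it matched.
captured = re.match(r'(\w+)@(\w+)\.com', 'user@example.com')
print(captured.groups())   # ('user', 'example')
```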

 

Common regular expression functions, methods, and attributes

Function/method    Description
re module functions only
compile()          Compile a regular expression pattern with any optional flags and return a regular expression object.
purge()            Clear the cache of implicitly compiled regular expression patterns.
re module functions and regular expression object methods
match()            Try to match the pattern, with optional flags, at the start of the string. Return a match object on success and None on failure.
search()           Search for the first occurrence of the pattern anywhere in the string. Return a match object on success and None on failure.
findall()          Find all non-overlapping occurrences of the pattern in the string and return them as a list.
finditer()         Like findall(), but return an iterator instead of a list. The iterator yields a match object for each match.
split()            Split the string into a list using the pattern as the delimiter and return the list of pieces. At most maxsplit splits are made (by default the string is split at every match).
sub()              Replace every occurrence of the pattern in the string with repl. All occurrences are replaced unless count is given.
Common match object methods
group()            Return the entire match, or the specific subgroup numbered num.
groups()           Return a tuple of all matched subgroups (an empty tuple if the pattern has no subgroups).
groupdict()        Return a dictionary of all named subgroups, with the subgroup names as keys.
Common module attributes
re.I               Case-insensitive matching
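
As a rough illustration of compile(), the re.I flag, and groupdict() working together, consider the sketch below. The named-group syntax (?P<name>...) is what gives groupdict() its keys; the pattern, group names, and input string are assumptions made for this example.

```python
import re

# Compile once, with the case-insensitive flag, and reuse the pattern object.
# The group names "greeting" and "name" are hypothetical.
pattern = re.compile(r'(?P<greeting>hello) (?P<name>\w+)', re.I)

m = pattern.match('HELLO World')
print(m.group())       # HELLO World
print(m.groups())      # ('HELLO', 'World')
print(m.groupdict())   # {'greeting': 'HELLO', 'name': 'World'}
```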

 

Match objects and the group() and groups() methods

When match() or search() succeeds, the object it returns is called a match object.

group() returns either the entire match or a specific subgroup, as requested. groups() returns a tuple containing the only subgroup or all subgroups. If the pattern has no subgroups, group() still returns the entire match, while groups() returns an empty tuple.

 

Use match() to match a string

The match() function tries to match the pattern at the start of the string.

>>> re.match('foo', 'foo').group()
'foo'
>>> re.match('foo', 'food on match').group()
'foo'
>>> re.match('foo', 'fodo on match').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
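
The AttributeError in the last call arises because match() returned None and .group() was then called on it. One common way to avoid this is to test the result first. The helper name first_match below is hypothetical and used only for this sketch.

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    # match() returns None on failure, so check before calling group().
    return m.group() if m is not None else None

print(first_match('foo', 'food on match'))   # foo
print(first_match('foo', 'fodo on match'))   # None
```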

 

Use search() to find a pattern in a string (searching versus matching)

search() works exactly like match(), except that it looks for the first occurrence of the pattern anywhere in the string rather than only at the start.

>>> re.match('foo', 'sea food').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('foo', 'sea food').group()
'foo'

  

Match multiple strings

>>> bt = 'bat|bet|bit'
>>> re.match(bt, 'bat').group()
'bat'
>>> re.match(bt, 'bit').group()
'bit'
>>> re.match(bt, 'blt').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(bt, 'he bit me').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search(bt, 'he bit me').group()
'bit'

 

Match any single character

>>> anyend = '.end'
>>> re.match(anyend, 'bend').group()
'bend'
>>> re.match(anyend, 'end').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(anyend, '\nend').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('.end', 'The end.').group()
' end'

  

Create character sets with []

>>> re.match('[cr][23][dp][o2]', 'c3po').group()
'c3po'
>>> re.match('[cr][23][dp][o2]', 'c2do').group()
'c2do'
>>> re.match('r2d2|c3po', 'c2do').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match('r2d2|c3po', 'r2d2').group()
'r2d2'

  

Repetition, special characters, and grouping

>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group()
'abc-123'
>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group(1)
'abc'
>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group(2)
'123'
>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').groups()
('abc', '123')

 

>>> m = re.match('ab', 'ab')        # no subgroups
>>> m.group()                       # entire match
'ab'
>>> m.groups()                      # all subgroups
()
>>> m = re.match('(ab)', 'ab')      # one subgroup
>>> m.group()
'ab'
>>> m.groups()
('ab',)
>>> m = re.match('(a)(b)', 'ab')    # two subgroups
>>> m.group()
'ab'
>>> m.group(1)                      # subgroup 1
'a'
>>> m.group(2)                      # subgroup 2
'b'
>>> m.groups()
('a', 'b')
>>> m = re.match('(a(b))', 'ab')    # nested subgroups
>>> m.group()
'ab'
>>> m.group(1)
'ab'
>>> m.group(2)
'b'
>>> m.groups()
('ab', 'b')

 

Match the start and end of a string and word boundaries

>>> m = re.search('^The', 'The end.')
>>> m.group()
'The'
>>> m = re.search('^The', 'end. The')          # not at the start
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\bthe', 'is the yes')      # at a word boundary
>>> m.group()
'the'
>>> m = re.search(r'\bthe', 'isthe yes')       # boundary required
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\Bthe', 'isthe yes')       # no boundary
>>> m.group()
'the'

  

Use findall() and finditer() to locate each occurrence

findall() finds all non-overlapping occurrences of a regular expression in a string and returns them as a list.

>>> re.findall('car', 'car')
['car']
>>> re.findall('car', 'scary')
['car']
>>> re.findall('car', 'carry the barcardi to car')
['car', 'car', 'car']
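
finditer() finds the same occurrences but yields a match object for each one, which gives access to positions as well as text. A brief sketch, reusing the sample string from the findall() example; the variable names are chosen for this illustration only.

```python
import re

text = 'carry the barcardi to car'

# Each item produced by finditer() is a match object, so start() is available.
positions = [m.start() for m in re.finditer('car', text)]
print(positions)   # [0, 13, 22]

# "Non-overlapping" means a character is never reused by two matches.
pairs = re.findall('aa', 'aaaa')
print(pairs)       # ['aa', 'aa']
```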

 

Search and replace using sub() and subn()

The two functions are almost the same. Both replace every occurrence of the pattern in a string with a replacement. The replacement is usually a string, but it may also be a function that returns the replacement string. subn() does the same as sub(), except that it also reports how many substitutions were made: the new string and the substitution count are returned together as a two-element tuple.

>>> re.sub('X', 'Mr.Smith', 'atten:X\n\nDear X,\n')
'atten:Mr.Smith\n\nDear Mr.Smith,\n'
>>> re.subn('X', 'Mr.Smith', 'atten:X\n\nDear X,\n')
('atten:Mr.Smith\n\nDear Mr.Smith,\n', 2)
>>> re.sub('[ae]', 'X', 'abcdef')
'XbcdXf'
>>> re.subn('[ae]', 'X', 'abcdef')
('XbcdXf', 2)
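
The examples above use a string as the replacement. A function replacement, and the count argument that limits the number of substitutions, might look like the sketch below. The function name double and the sample sentence are hypothetical.

```python
import re

def double(match):
    """Replacement function: receives a match object, returns the new text."""
    return str(int(match.group()) * 2)

# The function is called once per match, and its return value is substituted in.
doubled = re.sub(r'\d+', double, 'buy 3 apples and 12 pears')
print(doubled)   # buy 6 apples and 24 pears

# count=1 replaces only the first occurrence.
limited = re.sub('[ae]', 'X', 'abcdef', count=1)
print(limited)   # Xbcdef
```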

 

Use split() to split a string on a delimiter pattern

If you do not want to split the string at every occurrence of the pattern, set the maxsplit argument to a non-zero value to limit the number of splits.

If the delimiter contains no special symbols, so that it can match only one literal string, re.split() and str.split() work in the same way. For example:

>>> re.split(':', 'str1:str2:str3')
['str1', 'str2', 'str3']
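
Where re.split() goes beyond str.split() is when the delimiter is a real pattern, or when maxsplit limits the number of splits. A minimal sketch with made-up input strings:

```python
import re

# The delimiter is either a comma with optional trailing whitespace,
# or a run of whitespace; str.split() cannot express this in one call.
fields = re.split(r',\s*|\s+', 'a, b  c,d')
print(fields)   # ['a', 'b', 'c', 'd']

# maxsplit=1 stops after the first split and leaves the rest intact.
head_tail = re.split(':', 'str1:str2:str3', maxsplit=1)
print(head_tail)   # ['str1', 'str2:str3']
```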

  
