Python regular expressions

Last Update:2017-09-19 Source: Internet

Author: User

Regular expressions provide the foundation for advanced text pattern matching and extraction, and for text search and replacement. Python supports regular expressions through the re module in its standard library.

Common regular expression symbols and special characters

Notation	Description	Example
Symbols
re1|re2	Match regular expression re1 or re2	foo|bat
.	Match any single character (except \n)	b.b
^	Match the start of the string	^Dear
$	Match the end of the string	/bin/*sh$
*	Match the preceding regular expression zero or more times	[A-Za-z0-9]*
+	Match the preceding regular expression one or more times	[a-z]+\.com
?	Match the preceding regular expression zero or one time	goo?
{N}	Match the preceding regular expression exactly N times	[0-9]{3}
{M,N}	Match the preceding regular expression from M to N times	[0-9]{5,9}
[...]	Match any single character from the character set	[aeiou]
[..x-y..]	Match any single character in the range x to y	[0-9]
[^...]	Do not match any character from the set, including any range it contains	[^aeiou]
(...)	Match the enclosed regular expression and save it as a subgroup	([0-9]{3})?
Special characters
\d	Match any decimal digit, same as [0-9] (\D is the opposite: it matches any non-digit)	data\d0000.txt
\w	Match any alphanumeric character, same as [A-Za-z0-9_] for ASCII text (\W is the opposite)	[A-Za-z]\w
\s	Match any whitespace character, same as [ \n\t\r\v\f] (\S is the opposite)	of\sthe
\b	Match a word boundary (\B is the opposite: it matches only where there is no word boundary)	\bThe\b
\A (\Z)	Match the start (end) of the string	\ADear
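A quick way to get a feel for the table is to run a few of its example patterns against sample strings. This is a minimal sketch: the sample strings are made up for illustration, and re.search() is used throughout so that it is the ^ anchor itself that rejects "Hi Dear".

```python
import re

# (pattern, sample text) pairs based on the table's example column.
examples = [
    (r'foo|bat', 'bat'),         # alternation
    (r'b.b', 'bob'),             # . matches any character except \n
    (r'^Dear', 'Hi Dear'),       # ^ anchors at the start, so no match here
    (r'[0-9]{3}', 'abc123'),     # exactly three digits
    (r'[0-9]{5,9}', '1234567'),  # five to nine digits
    (r'goo?', 'go'),             # the final 'o' is optional
    (r'of\sthe', 'of the'),      # \s matches a whitespace character
    (r'\bThe\b', 'The end'),     # \b marks word boundaries
]

results = {}
for pattern, text in examples:
    m = re.search(pattern, text)
    results[pattern] = m.group() if m else None
    print(f'{pattern:12} {text!r:10} -> {results[pattern]!r}')
```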

When a question mark immediately follows any of the repetition (closure) operators (*, +, ?, or {}), it tells the regular expression engine to match as few times as possible.

What does "as few times as possible" mean? When a pattern uses repetition operators, the regular expression engine tries to consume as many characters as possible while still matching the pattern. This is called greedy matching. The question mark makes the engine "lazy" (non-greedy): it matches as few characters as possible with the current part of the expression and leaves as many characters as possible for the rest of the pattern, if any.
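The difference between greedy and non-greedy repetition shows up clearly with a tag-like string. In this sketch the HTML fragment is only sample input; .* and .*? are standard Python re syntax.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.search(r'<.*>', html).group()

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r'<.*?>', html).group()

print(greedy)                        # <b>bold</b> and <i>italic</i>
print(lazy)                          # <b>
print(re.findall(r'<.*?>', html))    # ['<b>', '</b>', '<i>', '</i>']
```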

In a regular expression, a pair of parentheses can serve either (or both) of the following purposes:

Grouping regular expressions;
Matching subgroups.
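The two roles of parentheses can be separated in Python. This sketch uses the non-capturing form (?:...), which is standard Python re syntax but is not covered above, to show grouping without a subgroup, next to ordinary parentheses that both group and capture.

```python
import re

# Grouping only: (?:...) lets + repeat 'ab' without creating a subgroup.
m1 = re.match(r'(?:ab)+', 'ababab')
print(m1.group(), m1.groups())   # ababab ()

# Grouping plus subgroup: (...) also saves what the group matched.
# When a group repeats, only its last repetition is kept.
m2 = re.match(r'(ab)+', 'ababab')
print(m2.group(), m2.groups())   # ababab ('ab',)

# Subgroup only: no repetition, just extracting part of the match.
m3 = re.match(r'(\d+)px', '12px')
print(m3.group(1))               # 12
```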

Common regular expression functions and methods

Function/method	Description
re module function only
compile(pattern, flags=0)	Compile the regular expression pattern with any optional flags and return a regex object.
re module functions and regex object methods
match(pattern, string, flags=0)	Try to match the pattern at the beginning of string, with optional flags. Return a match object on success, or None on failure.
search(pattern, string, flags=0)	Search for the first occurrence of the pattern anywhere in string, with optional flags. Return a match object on success, or None on failure.
findall(pattern, string, flags=0)	Find all non-overlapping occurrences of the pattern in string and return them as a list.
finditer(pattern, string, flags=0)	Like findall(), but return an iterator instead of a list. For each match, the iterator yields a match object.
split(pattern, string, maxsplit=0, flags=0)	Split string into a list using the pattern as the delimiter and return the list. At most maxsplit splits are made (by default, the string is split at every match).
sub(pattern, repl, string, count=0, flags=0)	Replace every occurrence of the pattern in string with repl. If count is given, at most count occurrences are replaced.
subn(pattern, repl, string, count=0, flags=0)	Like sub(), but return a tuple of the new string and the number of substitutions made.
Common module function
purge()	Clear the cache of implicitly compiled regular expressions.
Common match object methods
group(num=0)	Return the entire match, or the specific subgroup num.
groups(default=None)	Return all subgroups in a tuple (an empty tuple if there are none).
groupdict(default=None)	Return a dictionary of all named subgroups, with the subgroup names as the keys (empty if there are none).
Common module attributes
re.I, re.IGNORECASE	Case-insensitive matching
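As the table notes, compile() returns a regex object whose methods mirror the module-level functions. A minimal sketch, compiling once with the re.I flag and reusing the object; the sample strings are illustrative only.

```python
import re

# Compile once (here with the re.I flag), then reuse the regex object's methods.
pattern = re.compile(r'python', re.I)

print(pattern.match('Python rocks').group())      # Python
print(pattern.search('I like PYTHON').group())    # PYTHON
print(pattern.findall('python, Python, PyThOn'))  # ['python', 'Python', 'PyThOn']
print(pattern.sub('Py', 'python and Python'))     # Py and Py
```

The regex object's methods take the string to work on but no flags argument; the flags are fixed when the pattern is compiled.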

Match objects and the group() and groups() methods

Match objects are what successful calls to match() and search() return.

group() returns either the entire match or, if requested, a specific subgroup. groups() returns a tuple containing all the subgroups (a one-element tuple if there is only one). If the pattern has no subgroups, group() still returns the entire match, while groups() returns an empty tuple.
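The sketch below compares group(), groups(), and groupdict() on one match. It uses named subgroups, written (?P<name>...), which are standard Python re syntax but are not introduced elsewhere in this article; the names word and num are arbitrary.

```python
import re

# Named subgroups (?P<name>...) are what groupdict() reports.
m = re.match(r'(?P<word>\w+)-(?P<num>\d+)', 'abc-123')

print(m.group())       # abc-123  (the entire match)
print(m.group(1))      # abc      (subgroup by number)
print(m.group('num'))  # 123      (subgroup by name)
print(m.groups())      # ('abc', '123')
print(m.groupdict())   # {'word': 'abc', 'num': '123'}

# No subgroups: group() still returns the match, groups() is empty.
m0 = re.match(r'ab', 'ab')
print(m0.group(), m0.groups())  # ab ()
```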

Use match() to match a string

The match() function tries to match the pattern starting at the beginning of the string.

>>> import re
>>> re.match('foo', 'foo').group()
'foo'
>>> re.match('foo', 'food on match').group()
'foo'
>>> re.match('foo', 'fodo on match').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

Use search() to search for a string (search vs. match)

search() works exactly like match(), except that it looks for the first occurrence of the pattern at any position in the string, not only at the beginning.

>>> re.match('foo', 'sea food').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('foo', 'sea food').group()
'foo'

Match multiple strings

>>> bt = 'bat|bet|bit'
>>> re.match(bt, 'bat').group()
'bat'
>>> re.match(bt, 'bit').group()
'bit'
>>> re.match(bt, 'blt').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(bt, 'he bit me').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search(bt, 'he bit me').group()
'bit'

Match any single character

>>> anyend = '.end'
>>> re.match(anyend, 'bend').group()
'bend'
>>> re.match(anyend, 'end').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(anyend, '\nend').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('.end', 'The end.').group()
' end'

Create a character set with []

>>> re.match('[cr][23][dp][o2]', 'c3po').group()
'c3po'
>>> re.match('[cr][23][dp][o2]', 'c2do').group()
'c2do'
>>> re.match('r2d2|c3po', 'c2do').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match('r2d2|c3po', 'r2d2').group()
'r2d2'

Repetition, special characters, and groups

>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group()
'abc-123'
>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group(1)
'abc'
>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').group(2)
'123'
>>> re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123').groups()
('abc', '123')

>>> m = re.match('ab', 'ab')      # no subgroups
>>> m.group()                     # entire match
'ab'
>>> m.groups()                    # all subgroups
()
>>> m = re.match('(ab)', 'ab')    # one subgroup
>>> m.group()
'ab'
>>> m.groups()
('ab',)
>>> m = re.match('(a)(b)', 'ab')  # two subgroups
>>> m.group()
'ab'
>>> m.group(1)                    # subgroup 1
'a'
>>> m.group(2)                    # subgroup 2
'b'
>>> m.groups()
('a', 'b')
>>> m = re.match('(a(b))', 'ab')  # nested subgroups
>>> m.group()
'ab'
>>> m.group(1)
'ab'
>>> m.group(2)
'b'
>>> m.groups()
('ab', 'b')

Match the start and end of a string and word boundaries

>>> m = re.search('^the', 'the end.')      # at the beginning
>>> m.group()
'the'
>>> m = re.search('^the', 'end. the')      # not at the beginning
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\bthe', 'is the yes')  # at a word boundary
>>> m.group()
'the'
>>> m = re.search(r'\bthe', 'isthe yes')   # no boundary
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\Bthe', 'isthe yes')   # not at a boundary
>>> m.group()
'the'

Use findall() and finditer() to find every occurrence

findall() finds all non-overlapping occurrences of the pattern in a string and returns them as a list (an empty list if there are none).

>>> re.findall('car', 'car')
['car']
>>> re.findall('car', 'scary')
['car']
>>> re.findall('car', 'carry the barcardi to car')
['car', 'car', 'car']
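finditer() is described above but not demonstrated. This sketch reuses the last findall() string and shows that finditer() yields match objects, so the position of each match is available as well. The e-mail addresses in the second part are made-up sample data illustrating that, when the pattern has one subgroup, findall() returns the subgroup text rather than the whole match.

```python
import re

text = 'carry the barcardi to car'

# finditer() yields match objects, so each match's position is available too.
spans = []
for m in re.finditer('car', text):
    spans.append(m.span())
    print(m.group(), m.start(), m.end())

print(spans)  # [(0, 3), (13, 16), (22, 25)]

# With one subgroup, findall() returns the subgroup text instead of the whole match.
names = re.findall(r'(\w+)@example\.com', 'ann@example.com, bob@example.com')
print(names)  # ['ann', 'bob']
```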

Search and replace with sub() and subn()

sub() and subn() are almost identical: both replace every match of the regular expression in a string. The replacement is usually a string, but it can also be a function that returns the replacement string. subn() works like sub(), but it also returns the total number of substitutions made; the result is a two-element tuple containing the new string and that count.

>>> re.sub('X', 'Mr.Smith', 'atten:X\n\nDear X,\n')
'atten:Mr.Smith\n\nDear Mr.Smith,\n'
>>> re.subn('X', 'Mr.Smith', 'atten:X\n\nDear X,\n')
('atten:Mr.Smith\n\nDear Mr.Smith,\n', 2)
>>> re.sub('[ae]', 'X', 'abcdef')
'XbcdXf'
>>> re.subn('[ae]', 'X', 'abcdef')
('XbcdXf', 2)
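The text mentions that the replacement can be a function, and the function table mentions count. A minimal sketch of both; double is a hypothetical helper defined only for this example, and re.sub() calls it once per match with the match object.

```python
import re

def double(match):
    # Hypothetical helper: called once per match; the returned string replaces the matched text.
    return str(int(match.group()) * 2)

print(re.sub(r'\d+', double, 'a1 b22 c333'))        # a2 b44 c666
print(re.subn(r'\d+', double, 'a1 b22 c333'))       # ('a2 b44 c666', 3)

# count limits how many occurrences are replaced.
print(re.sub('X', 'Mr.Smith', 'X and X', count=1))  # Mr.Smith and X
```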

Use split() to split a string on a delimiting pattern

If you do not want to split the string at every occurrence of the pattern, pass a nonzero value for the maxsplit argument to limit the number of splits.

If the delimiter is a fixed string with no special characters, rather than a regular expression that can match several different strings, re.split() works the same way as str.split(). For example:

>>> re.split(':','str1:str2:str3')['str1', 'str2', 'str3']
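Where re.split() goes beyond str.split() is a delimiter that is a real pattern, plus the maxsplit limit described above. A short sketch; the sample strings are illustrative only.

```python
import re

# A regex delimiter: split on ':', ';' or ',' plus any following whitespace.
print(re.split(r'[:;,]\s*', 'a:b; c, d'))           # ['a', 'b', 'c', 'd']

# maxsplit caps the number of splits; the remainder stays in the last item.
print(re.split(':', 'str1:str2:str3', maxsplit=1))  # ['str1', 'str2:str3']

# With a plain delimiter, str.split() gives the same result as re.split().
print('str1:str2:str3'.split(':'))                  # ['str1', 'str2', 'str3']
```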

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
