Python Regular Expressions

Last Update: 2017-09-19   Source: Internet

Author: User

Regular expressions provide the basis for advanced text pattern matching and extraction, and for text-based search-and-replace functionality. Python supports regular expressions through the re module in the standard library.

Common regular expression symbols and special characters

Notation	Description	Example
Symbols
re1|re2	Matches regular expression re1 or re2	foo|bat
.	Matches any character (except \n)	b.b
^	Matches the start of the string	^dear
$	Matches the end of the string	/bin/*sh$
*	Matches 0 or more occurrences of the preceding regular expression	[A-Za-z0-9]*
+	Matches 1 or more occurrences of the preceding regular expression	[a-z]+\.com
?	Matches 0 or 1 occurrence of the preceding regular expression	goo?
{N}	Matches exactly N occurrences of the preceding regular expression	[0-9]{3}
{M,N}	Matches from M to N occurrences of the preceding regular expression	[0-9]{5,9}
[...]	Matches any single character from the character set	[aeiou]
[..x-y..]	Matches any single character in the range x to y	[0-9]
[^...]	Matches any single character NOT in the character set, including any ranges it contains	[^aeiou]
(...)	Matches the enclosed regular expression and saves it as a subgroup	([0-9]{3})?
Special characters
\d	Matches any decimal digit, equivalent to [0-9] for ASCII text (\D is the opposite: matches any non-digit)	data\d+.txt
\w	Matches any alphanumeric character or underscore, equivalent to [A-Za-z0-9_] for ASCII text (\W is the opposite)	[A-Za-z]\w
\s	Matches any whitespace character, equivalent to [ \n\t\r\v\f] (\S is the opposite)	of\sthe
\b	Matches a word boundary (\B is the opposite)	\bthe\b
\A (\Z)	Matches the start (end) of the string	\Adear

If a question mark immediately follows any of the closure operators (*, +, ?, {}), it tells the regular expression engine to match as few times as possible.

What does "as few times as possible" mean? When a pattern uses closure operators, the regular expression engine tries to "absorb" as many characters as it can while still matching the pattern. This is usually called greedy matching. The question mark makes the engine "lazy" (non-greedy): the current regular expression matches as few characters as possible, leaving as many characters as possible to any pattern that follows it.
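A minimal sketch of greedy versus non-greedy matching. The HTML-like sample string is made up for illustration and is not from the original article:

```python
import re

# Sample input chosen for illustration
text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* absorbs as much as it can, so the match runs to the LAST '>'
greedy = re.search(r'<.*>', text).group()

# Non-greedy: .*? stops at the first '>' that lets the whole pattern succeed
lazy = re.search(r'<.*?>', text).group()

# Lazy matching also makes findall() return each tag separately
tags = re.findall(r'<.*?>', text)

# A lazy quantifier leaves as much as possible for the pattern after it
split_digits = re.match(r'(\d+?)(\d+)', '12345').groups()

print(greedy)        # <b>bold</b> and <i>italic</i>
print(lazy)          # <b>
print(tags)          # ['<b>', '</b>', '<i>', '</i>']
print(split_digits)  # ('1', '2345')
```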

In regular expressions, a pair of parentheses can serve either (or both) of the following purposes:

Grouping regular expressions;
Matching (capturing) subgroups.
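Both uses of parentheses can be seen in a short sketch; the sample strings, including the e-mail-like address, are illustrative assumptions:

```python
import re

# Grouping: the + applies to the whole group "ab", not only to "b"
m = re.match(r'(ab)+', 'ababab!')
print(m.group())   # ababab
print(m.group(1))  # ab  (a repeated group keeps only its last repetition)

# Without parentheses, + repeats just the preceding character
m2 = re.match(r'ab+', 'ababab!')
print(m2.group())  # ab

# Capturing subgroups: pull out parts of the match separately
# (sample address chosen for illustration)
m3 = re.match(r'(\w+)@(\w+)\.com', 'nobody@xxx.com')
print(m3.groups())  # ('nobody', 'xxx')
```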

Common regular expression functions, methods, and attributes

Function/Method	Description
re module functions only
compile(pattern, flags=0)	Compiles the regular expression pattern with any optional flags and returns a regular expression object
purge()	Clears the cache of implicitly compiled regular expression patterns
re module functions and regular expression object methods
match(pattern, string, flags=0)	Tries to match the pattern at the beginning of string, with optional flags. Returns a match object on success, None on failure
search(pattern, string, flags=0)	Searches string for the first occurrence of the pattern, with optional flags. Returns a match object on success, None on failure
findall(pattern, string, flags=0)	Finds all non-overlapping occurrences of the pattern in string and returns them as a list
finditer(pattern, string, flags=0)	Same as findall(), but returns an iterator instead of a list; for each match, the iterator yields a match object
split(pattern, string, maxsplit=0)	Splits string into a list using the pattern as the delimiter and returns that list; at most maxsplit splits are made (by default the string is split at every match)
sub(pattern, repl, string, count=0)	Replaces every occurrence of the pattern in string with repl, or only the first count occurrences if count is given
Common match object methods
group(num=0)	Returns the entire match, or the specific subgroup num
groups(default=None)	Returns a tuple containing all subgroups (an empty tuple if there are none)
groupdict(default=None)	Returns a dictionary containing all named subgroups, with the subgroup names as keys
Common module attributes
re.I, re.IGNORECASE	Case-insensitive matching
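A sketch combining several entries from the table: compile() with the re.I flag, a search, and groupdict(). groupdict() needs named subgroups, written with the (?P<name>...) syntax; the group names user and host and the sample text are assumptions chosen for illustration:

```python
import re

# Compile once, reuse many times; re.I makes matching case-insensitive.
# The names 'user' and 'host' are illustrative choices.
pattern = re.compile(r'(?P<user>\w+)@(?P<host>\w+)\.com', re.I)

m = pattern.search('Contact: Nobody@XXX.COM today')
print(m.group())      # Nobody@XXX.COM
print(m.groups())     # ('Nobody', 'XXX')
print(m.groupdict())  # {'user': 'Nobody', 'host': 'XXX'}

# The compiled object offers the same methods as the module functions
pairs = pattern.findall('a@b.com, C@D.COM')
print(pairs)          # [('a', 'b'), ('C', 'D')]
```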

Match objects and the group() and groups() methods

Match objects are the objects returned by successful calls to match() and search().

group() returns either the entire match or a specific subgroup, as requested. groups() returns a tuple containing all of the subgroups. If the pattern has no subgroups, groups() returns an empty tuple while group() still returns the entire match.
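A short sketch of these rules, using a phone-number-like pattern invented for illustration:

```python
import re

# Pattern with two subgroups (sample input chosen for illustration)
m = re.match(r'(\d{3})-(\d{4})', '555-1234 ext. 9')
print(m.group())      # 555-1234  (entire match, same as m.group(0))
print(m.group(1))     # 555
print(m.group(1, 2))  # ('555', '1234')  several subgroups at once
print(m.groups())     # ('555', '1234')

# Same pattern without parentheses: no subgroups
n = re.match(r'\d{3}-\d{4}', '555-1234')
print(n.group())      # 555-1234
print(n.groups())     # ()  empty tuple, but group() still works
```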

Match a string using the match() method

The match() function tries to match the pattern starting at the beginning of the string.

>>> re.match('foo', 'foo').group()
'foo'
>>> re.match('foo', 'food on match').group()
'foo'
>>> re.match('foo', 'fodo on match').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
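The AttributeError above comes from calling group() on None. A minimal sketch of the usual guard; the helper name match_foo is hypothetical:

```python
import re

def match_foo(text):
    """Hypothetical helper: return 'foo' if text starts with it, else None."""
    m = re.match('foo', text)
    # match() returns None on failure, so check before calling group()
    if m is not None:
        return m.group()
    return None

print(match_foo('food on match'))  # foo
print(match_foo('fodo on match'))  # None
```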

Use search() to find a pattern within a string (search vs. match)

search() works exactly like match(), except that it scans its string argument for the first occurrence of the pattern at any position, not just at the start.

>>> re.match('foo', 'sea food').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search('foo', 'sea food').group()
'foo'
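A sketch showing where search() found the match, using the start() and end() methods of the match object, and how a ^ anchor restricts search() to the beginning of the string:

```python
import re

text = 'sea food'

print(re.match('foo', text))   # None: 'foo' is not at index 0

m = re.search('foo', text)
print(m.group(), m.start(), m.end())  # foo 4 7

# Anchoring with ^ makes search() succeed only at the start (without re.M)
anchored = re.search('^foo', text)
print(anchored)                # None
```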

Match multiple strings

>>> bt = 'bat|bet|bit'
>>> re.match(bt, 'bat').group()
'bat'
>>> re.match(bt, 'bit').group()
'bit'
>>> re.match(bt, 'blt').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(bt, 'he bit me').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.search(bt, 'he bit me').group()
'bit'
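Alternatives separated by | are tried from left to right, so their order can change what match() returns. A small sketch; the sample strings are illustrative:

```python
import re

# The first alternative that succeeds wins, even if a later one is longer
short_first = re.match('b|bat', 'bat').group()
long_first = re.match('bat|b', 'bat').group()
print(short_first)  # b
print(long_first)   # bat

# findall() collects every occurrence of any alternative
found = re.findall('bat|bet|bit', 'bat bet bit blt')
print(found)        # ['bat', 'bet', 'bit']
```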

Match any single character

>>> anyend = '.end'
>>> re.match(anyend, 'bend').group()
'bend'
>>> re.match(anyend, 'end').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(anyend, '\nend').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
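Two follow-ups to the session above, sketched with illustrative inputs: the re.DOTALL flag lets . match a newline, and a backslash-escaped \. matches only a literal dot:

```python
import re

anyend = '.end'

# '.' does not match a newline by default...
print(re.match(anyend, '\nend'))                          # None
# ...but it does with the re.DOTALL (re.S) flag
print(repr(re.match(anyend, '\nend', re.DOTALL).group()))  # '\nend'

# search() finds '.end' anywhere; here '.' matches the space
print(repr(re.search(anyend, 'The end.').group()))        # ' end'

# Escape the dot to match only a literal '.'
print(re.search(r'3\.14', 'pi is 3.14').group())         # 3.14
print(re.search(r'3\.14', 'pi is 3014'))                  # None
```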

Create a character set []

>>> re.match('[cr][23][dp][o2]', 'c3po').group()
'c3po'
>>> re.match('[cr][23][dp][o2]', 'c2do').group()
'c2do'
>>> re.match('r2d2|c3po', 'c2do').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
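A sketch of case sensitivity, ranges, and negation in character sets; the sample strings are illustrative:

```python
import re

# Character sets are case-sensitive by default...
upper_plain = re.match('[cr][23][dp][o2]', 'C3PO')
print(upper_plain)                   # None
# ...unless re.I (re.IGNORECASE) is passed
upper_ci = re.match('[cr][23][dp][o2]', 'C3PO', re.I).group()
print(upper_ci)                      # C3PO

# A range: one lowercase letter followed by one digit
pairs = re.findall('[a-z][0-9]', 'r2d2 c3po')
print(pairs)                         # ['r2', 'd2', 'c3']

# A leading ^ negates the set: every character that is not a vowel
consonants = re.findall('[^aeiou]', 'banana')
print(consonants)                    # ['b', 'n', 'n']
```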

Repetition, special characters, and grouping

>>> m = re.match('ab', 'ab')       # no subgroups
>>> m.group()                      # entire match
'ab'
>>> m.groups()                     # all subgroups
()
>>> m = re.match('(ab)', 'ab')
>>> m.group()
'ab'
>>> m.groups()
('ab',)
>>> m = re.match('(a)(b)', 'ab')
>>> m.group()
'ab'
>>> m.group(1)                     # subgroup 1
'a'
>>> m.group(2)                     # subgroup 2
'b'
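Two further details of subgroup numbering, sketched with illustrative inputs: nested groups are numbered by the position of their opening parenthesis, and an optional group that does not take part in the match is reported as None:

```python
import re

# Subgroups are numbered by the position of their opening parenthesis
m = re.match('(a(b))', 'ab')
print(m.group(1))  # ab
print(m.group(2))  # b
print(m.groups())  # ('ab', 'b')

# An optional group that takes no part in the match yields None
n = re.match(r'(\d{3})-?(\d{4})?', '555-')
print(n.groups())  # ('555', None)
```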

Match the beginning and end of a string, and word boundaries

>>> m = re.search('^The', 'The end.')
>>> m.group()
'The'
>>> m = re.search('^The', 'end.')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\bthe', 'is the yes')
>>> m.group()
'the'
>>> m = re.search(r'\bthe', 'isthe yes')    # boundary required
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.search(r'\Bthe', 'isthe yes')    # no boundary
>>> m.group()
'the'
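A sketch applying the same anchors with findall(). The re.M (re.MULTILINE) flag used at the end is not in the table above; it makes ^ match after every newline as well. Sample strings are illustrative:

```python
import re

s = 'the other theme the end'

# \b on both sides: 'the' only as a whole word
whole = re.findall(r'\bthe\b', s)
print(whole)    # ['the', 'the']

# \B: 'the' only where it is NOT at a word boundary (inside 'other')
inner = re.findall(r'\Bthe', s)
print(inner)    # ['the']

text = 'one two\nthree four'
# Without re.M, ^ matches only at the start of the whole string
first = re.findall(r'^\w+', text)
print(first)    # ['one']
# With re.M, ^ also matches right after each newline
per_line = re.findall(r'^\w+', text, re.M)
print(per_line)  # ['one', 'three']
```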

Use findall() and finditer() to find every occurrence

findall() searches a string for all non-overlapping occurrences of a regular expression pattern. It always returns a list, which is empty if nothing matches.
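A sketch of findall() and finditer(); the sample sentence and e-mail-like strings are illustrative:

```python
import re

s = 'carry the barcardi to the car'

print(re.findall('car', s))  # ['car', 'car', 'car']
print(re.findall('xyz', s))  # []  (still a list when nothing matches)

# Matches never overlap: 'aaaa' holds two 'aa' matches, not three
print(re.findall('aa', 'aaaa'))  # ['aa', 'aa']

# finditer() yields match objects, which also carry positions
starts = [m.start() for m in re.finditer('car', s)]
print(starts)  # [0, 13, 26]

# With subgroups, findall() returns the groups (as tuples), not whole matches
print(re.findall(r'(\w+)@(\w+)\.com', 'a@b.com c@d.com'))  # [('a', 'b'), ('c', 'd')]
```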

Use sub() and subn() to search and replace

The two functions are almost identical: both replace every part of a string that matches a regular expression. The replacement is usually a string, but it can also be a function that returns the replacement string. subn() works the same way as sub(), but it also returns the total number of replacements made, as a two-element tuple of the new string and the count.
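A sketch of the variants described above: a plain string replacement, subn()'s count, the count argument, subgroup references (\1, \2, ...) in the replacement, and a function as the replacement. All inputs are illustrative:

```python
import re

# sub() replaces every match; subn() also reports how many were made
print(re.sub('[ae]', 'X', 'abcdef'))   # XbcdXf
print(re.subn('[ae]', 'X', 'abcdef'))  # ('XbcdXf', 2)

# count limits the number of replacements
print(re.sub('a', 'A', 'aaa', count=2))  # AAa

# \1, \2, ... in the replacement refer to captured subgroups (swap day/month)
swapped = re.sub(r'(\d{1,2})/(\d{1,2})/(\d{2,4})', r'\2/\1/\3', '2/20/1991')
print(swapped)  # 20/2/1991

# repl can be a function that receives the match object and returns a string
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1b22')
print(doubled)  # a2b44
```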

Use split() to split a string on a pattern

If you do not want to split the string at every occurrence of the pattern, you can limit the number of splits by passing a nonzero value as the maxsplit argument.

If the delimiter is not a regular expression that uses special symbols to match multiple patterns, re.split() works the same way as str.split(), as in the example below:

>>> re.split(':', 'str1:str2:str3')
['str1', 'str2', 'str3']
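Where re.split() goes beyond str.split() is with patterns that match several different delimiters. A sketch with illustrative inputs, also showing maxsplit (passed as a keyword) and a capturing group that keeps the delimiters in the result:

```python
import re

# A character class splits on any of several delimiters
parts = re.split(r'[,;\s]+', 'a, b;c  d')
print(parts)    # ['a', 'b', 'c', 'd']

# maxsplit limits how many splits are made; the rest stays intact
limited = re.split(':', 'str1:str2:str3', maxsplit=1)
print(limited)  # ['str1', 'str2:str3']

# A capturing group in the pattern keeps the delimiters in the result
kept = re.split('(:)', 'a:b')
print(kept)     # ['a', ':', 'b']
```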

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
