Regular expressions in Python (re module)

Last Update:2018-07-29 Source: Internet

Author: User

First, Introduction

A regular expression is itself a small, highly specialized programming language. In Python it is available through the built-in re module, which programmers can call directly to perform regular expression matching. The regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.

Second, the meaning of common characters in regular expressions

1, ordinary characters and 11 metacharacters:

Ordinary characters	Match themselves	abc	abc
.	Matches any character except the newline character "\n" (in DOTALL mode it also matches the newline)	a.c	abc
\	Escape character; changes the meaning of the character that follows it	a\.c; a\\c	a.c; a\c
*	Matches the previous character 0 or more times	abc*	ab; abccc
+	Matches the previous character 1 or more times	abc+	abc; abccc
?	Matches the previous character 0 or 1 times	abc?	ab; abc
^	Matches the start of the string; in multiline mode, matches the start of each line	^abc	abc
$	Matches the end of the string; in multiline mode, matches the end of each line	abc$	abc
|	Or. Matches either the expression on its left or the one on its right, trying them from left to right. If | is not inside (), its scope is the whole regular expression	abc|def	abc; def
{}	{m} matches the previous character m times; {m,n} matches it m to n times; if n is omitted, it matches m or more times	ab{1,2}c	abc; abbc
[]	Character set. The corresponding position can be any one character from the set. Characters can be listed individually or given as a range, such as [abc] or [a-c]. [^abc] negates the set, that is, any character other than a, b or c. Most metacharacters lose their special meaning inside a set; characters that are special there (such as ], ^ and -) can be escaped with a backslash to match them literally.	a[bcd]e	abe; ace; ade
()	The enclosed expression forms a group. Groups are numbered from 1, counting opening parentheses "(" from the left. A group is treated as a whole and can be followed by a quantifier. A | inside a group applies only within that group.	(abc){2}; a(123|456)c	abcabc; a456c
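As a quick check of the table above, the following minimal sketch runs several of its patterns through re.fullmatch(), which succeeds only when the whole string matches. The names examples and results are illustrative and not part of the original article.

```python
import re

# Each pair is (pattern, text); the patterns are taken from the table above.
examples = [
    (r"a.c", "abc"),            # . matches any character except the newline
    (r"a\.c", "a.c"),           # \. matches a literal dot
    (r"abc*", "abccc"),         # * repeats the previous character 0 or more times
    (r"abc+", "ab"),            # + needs at least one 'c', so this one fails
    (r"ab{1,2}c", "abbc"),      # {1,2} allows one or two 'b'
    (r"a[bcd]e", "ace"),        # [] matches one character from the set
    (r"(abc){2}", "abcabc"),    # () groups 'abc' so {2} repeats the whole group
    (r"a(123|456)c", "a456c"),  # | is limited to the enclosing group
]

results = {}
for pattern, text in examples:
    # fullmatch() returns a match object only if the entire text matches
    results[(pattern, text)] = re.fullmatch(pattern, text) is not None
    print(pattern, text, results[(pattern, text)])
```

Only the abc+ line prints False, because "ab" contains no 'c' for + to match.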

The role of the backslash needs emphasis here: a backslash followed by a metacharacter removes its special function (the special character is escaped into an ordinary character); a backslash followed by an ordinary character gives it a special function (these are the predefined characters); a backslash followed by a number refers to the string matched by the group with that number.

import re
a = re.search(r'(tina)(fei)haha\2', 'tinafeihahafei tinafeihahatina').group()
print(a)
Result:
tinafeihahafei

2, predefined character sets (can also be used inside a character set [...])

\d	Digit: [0-9]	a\dc	a1c
\D	Non-digit: [^\d]	a\Dc	abc
\s	Any whitespace character: [<space>\t\r\n\f\v]	a\sc	a c
\S	Non-whitespace character: [^\s]	a\Sc	abc
\w	Any word character, that is, a letter, digit or underscore: [a-zA-Z0-9_]	a\wc	abc
\W	Any non-word character, that is, a special character: [^\w]	a\Wc	a c
\A	Matches only at the start of the string, like ^	\Aabc	abc
\Z	Matches only at the end of the string, like $	abc\Z	abc
\b	Matches between \w and \W, that is, at a word boundary: the position between a word and a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.	\babc\b; a\b!bc	<space>abc<space>; a!bc
\B	Not a word boundary (the opposite of \b)	a\Bbc	abc
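The predefined sets in the table above can be tried out with a small sketch like the following. The sample string and variable names are made up for illustration. The last two lines also show one difference between $ and \Z that the table glosses over: $ additionally matches just before a trailing newline.

```python
import re

text = "a1c a c abc a_c"

digits = re.findall(r"a\dc", text)      # \d: a digit between 'a' and 'c'
non_digits = re.findall(r"a\Dc", text)  # \D: anything that is not a digit
spaces = re.findall(r"a\sc", text)      # \s: a whitespace character
words = re.findall(r"a\wc", text)       # \w: a letter, digit or underscore
non_words = re.findall(r"a\Wc", text)   # \W: anything that is not a word character

print(digits)      # ['a1c']
print(non_digits)  # ['a c', 'abc', 'a_c']
print(spaces)      # ['a c']
print(words)       # ['a1c', 'abc', 'a_c']
print(non_words)   # ['a c']

# $ also matches just before a newline at the end of the string; \Z does not
dollar = re.search(r"abc$", "abc\n")
strict = re.search(r"abc\Z", "abc\n")
print(dollar is not None, strict is not None)  # True False
```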

Here the word boundary meaning of \b needs emphasis:
w = re.findall('\btina', 'tian tinaaaa')   # without the r prefix, '\b' is a backspace character
print(w)
s = re.findall(r'\btina', 'tian tinaaaa')
print(s)
v = re.findall(r'\btina', 'tian#tinaaaa')
print(v)
a = re.findall(r'\btina\b', 'tian#tina@aaa')
print(a)
The results are as follows:
[]
['tina']
['tina']
['tina']

3, special group usage:

(?P<name>...)	Group that gets an alias in addition to its number	(?P<id>abc){2}	abcabc
(?P=name)	Refers to the string matched by the group with the alias <name>	(?P<id>\d)abc(?P=id)	1abc1; 5abc5
\<number>	Refers to the string matched by the group with the number <number>	(\d)abc\1	1abc1; 5abc5
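The named group and backreference syntax from the table above might be used as in this short sketch. The sample strings and the variable names named, m and numbered are illustrative assumptions.

```python
import re

# (?P<id>...) names the group; (?P=id) must repeat whatever that group matched
named = re.compile(r"(?P<id>\d)abc(?P=id)")

m = named.search("x1abc1 5abc6")
print(m.group())      # 1abc1  (5abc6 does not match: 5 and 6 differ)
print(m.group("id"))  # 1, looked up by alias
print(m.group(1))     # 1, the same group looked up by number
print(m.groupdict())  # {'id': '1'}

# \1 is the numbered form of the same backreference
numbered = re.findall(r"(\d)abc\1", "1abc1 5abc5 5abc6")
print(numbered)       # ['1', '5']
```

Because the pattern contains one group, findall returns the text of that group rather than the whole match.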

Third, common functions in the re module

1, compile()

Compiles a regular expression pattern and returns a pattern object. (Frequently used regular expressions can be compiled into pattern objects, which is somewhat more efficient.)

Format:

re.compile(pattern, flags=0)

pattern: the expression string to compile.

flags: compilation flags that modify how the regular expression matches, such as case sensitivity, multiline matching, and so on. The commonly used flags are:

Flag	Meaning
re.S (DOTALL)	Makes . match every character, including the newline
re.I (IGNORECASE)	Makes matching case-insensitive
re.L (LOCALE)	Makes matching locale-aware (for French text, for example)
re.M (MULTILINE)	Multiline matching; affects ^ and $
re.X (VERBOSE)	Allows a more flexible pattern layout so that the regular expression is easier to read
re.U (UNICODE)	Interprets characters according to the Unicode character set; affects \w, \W, \b, \B

import re
tt = "Tina is a good girl, she's cool, clever, and"
rr = re.compile(r'\w*oo\w*')
print(rr.findall(tt))   # find all words that contain 'oo'
The results are as follows:
['good', 'cool']
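The effect of the flags listed in the table above can be seen in a minimal sketch like this one. The sample text and variable names are invented for illustration.

```python
import re

text = "First line\nsecond LINE"

# re.S: '.' also matches the newline, so the match can span both lines
dotall = re.search(r"First.*LINE", text, re.S)

# re.I: ignore case
ignorecase = re.findall(r"line", text, re.I)

# re.M: '^' matches at the start of every line, not only the start of the string
multiline = re.findall(r"^\w+", text, re.M)

# re.X: whitespace and comments inside the pattern are ignored
verbose = re.compile(r"""
    \d{3}   # first three digits
    -       # separator
    \d{6}   # remaining six digits
""", re.X)

# Several flags are combined with |
combined = re.findall(r"^second.line$", text, re.I | re.M)

print(dotall.group() == text)                         # True
print(ignorecase)                                     # ['line', 'LINE']
print(multiline)                                      # ['First', 'second']
print(verbose.fullmatch("010-628888") is not None)    # True
print(combined)                                       # ['second LINE']
```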

2, match()

Determines whether the RE matches at the start of the string. Note: this is not an exact match. If the string still has characters left when the pattern ends, the match is still considered successful. For an exact match, add the boundary match '$' to the end of the expression.

Format:

re.match(pattern, string, flags=0)

print(re.match('com', 'comwww.runcomoob').group())
print(re.match('com', 'Comwww.runcomoob', re.I).group())
The results are as follows:
com
Com
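The note above about match() not being an exact match can be sketched as follows. re.fullmatch() is not mentioned in the original article; it is a standard re function (Python 3.4+) shown here as one alternative to appending '$'.

```python
import re

# match() only anchors the start, so trailing characters are allowed
prefix = re.match(r"com", "comwww.runcomoob")

# With $ appended, the pattern has to reach the end of the string
exact = re.match(r"com$", "comwww.runcomoob")

# fullmatch() requires the whole string to match the pattern
full = re.fullmatch(r"com", "com")
too_long = re.fullmatch(r"com", "comwww")

print(prefix.group())  # com
print(exact)           # None
print(full.group())    # com
print(too_long)        # None
```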

3, search()

Format:

re.search(pattern, string, flags=0)

The re.search function looks for the pattern anywhere in the string and returns as soon as the first match is found; if nothing in the string matches, it returns None.

print(re.search(r'\dcom', 'www.4comrunoob.5com').group())
The results are as follows:
4com

* Note: when match and search succeed, they return a match object, which has the following methods:
group() returns the string matched by the RE
start() returns the position where the match starts
end() returns the position where the match ends
span() returns a tuple containing the (start, end) positions of the match

group() can also take several group numbers at once and then returns the strings matched by those groups:

a. group() returns the string matched by the whole RE.
b. group(n, m) returns the strings matched by groups n and m, and raises an IndexError exception if a group number does not exist.
c. groups() returns a tuple containing the strings of all groups in the regular expression, from group 1 to the last group. It usually takes no arguments; each element of the returned tuple is the string matched by one of the groups defined in the regular expression.

import re
a = "123abc456"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))   # 123abc456, the whole match
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))   # 123
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))   # abc
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3))   # 456
# group(1) is the part matched by the first pair of parentheses, group(2) the second, and group(3) the third.
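The other match object methods described above (start, end, span, groups, and group with several numbers) could be exercised like this. The input string with the leading "##" is an illustrative assumption, chosen so that the match does not start at position 0.

```python
import re

m = re.search(r"([0-9]+)([a-z]+)([0-9]+)", "##123abc456")

print(m.start())      # 2, the match begins after the two '#' characters
print(m.end())        # 11
print(m.span())       # (2, 11)
print(m.groups())     # ('123', 'abc', '456')
print(m.group(1, 3))  # ('123', '456'), several group numbers at once
print(m.start(2))     # 5, start() also accepts a group number

# Asking for a group that does not exist raises IndexError
try:
    m.group(4)
    missing_group_error = False
except IndexError:
    missing_group_error = True
print(missing_group_error)  # True
```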

4, findall()

re.findall scans the string and returns all matching substrings as a list.

Format:

re.findall(pattern, string, flags=0)

p = re.compile(r'\d+')
print(p.findall('o1n2m3k4'))
The results are as follows:
['1', '2', '3', '4']

import re
tt = "Tina is a good girl, she's cool, clever, and"
rr = re.compile(r'\w*oo\w*')
print(rr.findall(tt))
print(re.findall(r'(\w)*oo(\w)', tt))   # () marks a subexpression (group)
The results are as follows:
['good', 'cool']
[('g', 'd'), ('c', 'l')]
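As the two outputs above suggest, findall returns the group contents instead of the whole match as soon as the pattern contains groups. A minimal sketch of that behavior follows; the non-capturing form (?:...) is standard re syntax that the original article does not cover, and the variable names are illustrative.

```python
import re

tt = "good cool"

# No groups: each list item is the whole match
plain = re.findall(r"\w*oo\w*", tt)

# Two groups: each list item is a tuple with one entry per group
captured = re.findall(r"(\w*)oo(\w*)", tt)

# (?:...) groups without capturing, so the whole match is returned again
non_capturing = re.findall(r"(?:\w*)oo(?:\w*)", tt)

print(plain)          # ['good', 'cool']
print(captured)       # [('g', 'd'), ('c', 'l')]
print(non_capturing)  # ['good', 'cool']
```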

5, finditer()

Searches the string for every substring that the RE matches and returns an iterator that yields each result (a match object) in order.

Format:

re.finditer(pattern, string, flags=0)

matches = re.finditer(r'\d+', '12 drumm44ers drumming, 11 ... 10 ...')
for i in matches:
    print(i)
    print(i.group())
    print(i.span())
The results are as follows:
<_sre.SRE_Match object; span=(0, 2), match='12'>
12
(0, 2)
<_sre.SRE_Match object; span=(8, 10), match='44'>
44
(8, 10)
<_sre.SRE_Match object; span=(24, 26), match='11'>
11
(24, 26)
<_sre.SRE_Match object; span=(31, 33), match='10'>
10
(31, 33)

6, split()

Splits the string at every substring that the pattern matches and returns the pieces as a list.

re.split can be used to split strings; for example, re.split(r'\s+', text) splits the string into a list of words.

Format:

re.split(pattern, string[, maxsplit])

maxsplit specifies the maximum number of splits; if it is not specified, the string is split at every match.

print(re.split(r'\d+', 'one1two2three3four4five5'))
The results are as follows:
['one', 'two', 'three', 'four', 'five', '']
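The maxsplit parameter and the r'\s+' word split mentioned above can be illustrated with a short sketch. The sample strings and variable names are made up for the example.

```python
import re

# Splitting on runs of whitespace gives a list of words
words = re.split(r"\s+", "one  two\tthree")

# Without maxsplit the string is split at every match
all_parts = re.split(r"\d+", "one1two2three3four")

# With maxsplit=2 only the first two matches split the string;
# the rest of the string is returned unsplit as the last item
limited = re.split(r"\d+", "one1two2three3four", maxsplit=2)

print(words)      # ['one', 'two', 'three']
print(all_parts)  # ['one', 'two', 'three', 'four']
print(limited)    # ['one', 'two', 'three3four']
```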

7, sub()

Replaces every substring of string that the RE matches and returns the resulting string.

Format:

re.sub(pattern, repl, string, count)

import re
text = "JGood is a handsome boy, he is cool, clever, and so on..."
print(re.sub(r'\s+', '-', text))
The results are as follows:
JGood-is-a-handsome-boy,-he-is-cool,-clever,-and-so-on...

The second parameter is the replacement string, in this case '-'.

The fourth parameter is the number of replacements. The default is 0, which means that every match is replaced.

re.sub also accepts a function as the replacement, for more complex handling of each match.

For example, re.sub(r'\s', lambda m: '[' + m.group(0) + ']', text, 0) wraps each space in the string in brackets, giving '[ ]'.

import re
text = "JGood is a handsome boy, he is cool, clever, and so on..."
print(re.sub(r'\s+', lambda m: '[' + m.group(0) + ']', text, 0))
The results are as follows:
JGood[ ]is[ ]a[ ]handsome[ ]boy,[ ]he[ ]is[ ]cool,[ ]clever,[ ]and[ ]so[ ]on...
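The count parameter and the function form of the replacement described above might be combined as in the sketch below. The inputs and the doubling function are invented for illustration; count is passed by keyword here, which reads more clearly than a bare fourth argument.

```python
import re

text = "JGood is a handsome boy"

# count=2 replaces only the first two matches
first_two = re.sub(r"\s+", "-", text, count=2)

# A function receives each match object and returns the replacement text
def double(match):
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "a1 b22 c333")

print(first_two)  # JGood-is-a handsome boy
print(doubled)    # a2 b44 c666
```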

8, subn()

Performs the same replacement as sub() and returns a tuple containing the new string and the number of replacements made.

Format:

re.subn(pattern, repl, string, count=0, flags=0)

print(re.subn('[1-2]', 'A', '123456abcdef'))
print(re.sub("g.t", "have", 'I get A,  I got B, I gut C'))
print(re.subn("g.t", "have", 'I get A,  I got B, I gut C'))
The results are as follows:
('AA3456abcdef', 2)
I have A,  I have B, I have C
('I have A,  I have B, I have C', 3)

Fourth, some notes

1, the difference between re.match, re.search and re.findall:

re.match only matches at the beginning of the string: if the start of the string does not match the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match. re.findall returns every match in the string as a list.

a = re.search(r'[\d]', "abc33").group()
print(a)
p = re.match(r'[\d]', "abc33")
print(p)
b = re.findall(r'[\d]', "abc33")
print(b)
Result:
3
None
['3', '3']

2, greedy and non-greedy matching

*?, +?, ??, {m,n}?: the plain *, +, ? and so on are greedy, that is, they match as much as possible; adding a ? after them makes them non-greedy (lazy).

a = re.findall(r"a(\d+?)", 'a23b')
print(a)
b = re.findall(r"a(\d+)", 'a23b')
print(b)
Result:
['2']
['23']

a = re.match('<(.*)>', '<H1>title<H1>').group()
print(a)
b = re.match('<(.*?)>', '<H1>title<H1>').group()
print(b)
Result:
<H1>title<H1>
<H1>

a = re.findall(r"a(\d+)b", 'a3333b')
print(a)
b = re.findall(r"a(\d+?)b", 'a3333b')
print(b)
The results are as follows:
['3333']
['3333']

Note that when the pattern is constrained both before and after the quantifier, non-greedy mode makes no difference: the quantifier has to match everything between the two constraints either way.

3, a small pitfall with flags

print(re.split('a', '1A1a2A3', re.I))   # the output is not case-insensitive

This is because re.split(pattern, string, maxsplit, flags) takes four parameters. When we pass only three, re.I is taken as the third parameter, maxsplit, so it has no effect as a flag. To make re.I work here, write flags=re.I.
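The pitfall above can be made visible with a minimal sketch. It relies on the fact that re.I has the integer value 2, so passing it in the third position behaves like maxsplit=2; the sketch spells that out with maxsplit=int(re.I) instead of the positional call. Variable names are illustrative.

```python
import re

s = "1A1a2A3"

# What the positional call really means: re.I (value 2) ends up as maxsplit,
# and the split stays case-sensitive, so only the lowercase 'a' is used
as_maxsplit = re.split("a", s, maxsplit=int(re.I))

# Passing the flag by keyword gives the intended case-insensitive split
with_flag = re.split("a", s, flags=re.I)

print(int(re.I))    # 2
print(as_maxsplit)  # ['1A1', '2A3']
print(with_flag)    # ['1', '1', '2', '3']
```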

Fifth, small regular expression exercises

1, matching a telephone number

p = re.compile(r'\d{3}-\d{6}')
print(p.findall('010-628888'))

2, matching an IP address

print(re.search(r"(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])", "192.168.1.1").group())
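Because re.search accepts a match anywhere in the string, the pattern above also finds an address inside longer text. If the goal is to validate that a whole string is an IPv4 address, one possible variant is sketched below. The helper name is_ipv4, the octet sub-pattern, and the choice to reject leading zeros are assumptions of this sketch, not part of the original article.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero
octet = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four octets separated by dots; {{3}} is a literal {3} inside the f-string
ipv4 = re.compile(rf"{octet}(?:\.{octet}){{3}}")

def is_ipv4(s):
    """Return True if the whole string s is a dotted IPv4 address."""
    return ipv4.fullmatch(s) is not None

print(is_ipv4("192.168.1.1"))    # True
print(is_ipv4("192.168.1.255"))  # True
print(is_ipv4("256.1.1.1"))      # False, 256 is out of range
print(is_ipv4("192.168.1"))      # False, only three octets
```

Putting the longer alternatives (25[0-5], 2[0-4]\d) first keeps each octet from being matched by a shorter alternative, and fullmatch rules out trailing characters.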



Reproduced from: http://www.cnblogs.com/tina-python/p/5508402.html
