Regular Expressions in Python

Last Update:2015-05-09 Source: Internet

Author: User

http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

Raw strings (r'...') are recommended when writing regular expressions in Python, because backslashes in a raw string reach the regex engine unchanged.
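A minimal sketch of why raw strings matter. The sample text and variable names are chosen here for illustration only.

```python
import re

text = "a cat sat"

# In a normal string literal, "\b" is the backspace character (0x08).
normal = "\bcat\b"
# In a raw string, r"\b" stays as backslash + "b", which the regex
# engine reads as a word boundary.
raw = r"\bcat\b"

print(len(normal), len(raw))          # 5 7
print(re.search(normal, text))        # None
print(re.search(raw, text).group())   # cat

# Without a raw string, every backslash has to be doubled.
doubled = "\\bcat\\b"
print(doubled == raw)                 # True
```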

Greedy and non-greedy quantifiers

Regular expressions are typically used to find matching strings in text. Quantifiers in Python are greedy by default (a few languages may default to non-greedy): they try to match as many characters as possible. Non-greedy quantifiers, by contrast, try to match as few characters as possible. For example, the regular expression "ab*" finds "abbb" when it is used to search "abbbc", whereas the non-greedy form "ab*?" finds only "a".
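The difference can be checked directly. The sketch below reuses the "ab*" example and adds a second, hypothetical HTML-like string to show where greedy matching tends to surprise.

```python
import re

text = "abbbc"
greedy = re.search(r"ab*", text).group()    # 'abbb'
lazy = re.search(r"ab*?", text).group()     # 'a'

# Hypothetical input: two adjacent tags.
html = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)   # ['<b>one</b><b>two</b>']
lazy_tags = re.findall(r"<b>.*?</b>", html)    # ['<b>one</b>', '<b>two</b>']

print(greedy, lazy)
print(greedy_tags)
print(lazy_tags)
```

The greedy ".*" runs to the last "</b>" in the string, while the non-greedy ".*?" stops at the first one.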

Regular expression meta-characters

^ Matches the start of the string
$ Matches the end of the string
? Repeats the previous character 0 or 1 times
* Repeats the previous character 0 or more times
+ Repeats the previous character 1 or more times
{m} Repeats the previous character exactly m times
{m,n} Repeats the previous character m to n times
\d Matches any digit; equivalent to [0-9]
\D Matches any non-digit character; equivalent to [^0-9]
\s Matches any whitespace character; equivalent to [ \t\n\r\f\v]
\S Matches any non-whitespace character; equivalent to [^ \t\n\r\f\v]
\w Matches any alphanumeric character or underscore; equivalent to [a-zA-Z0-9_]
\W Matches any character that is not alphanumeric or an underscore; equivalent to [^a-zA-Z0-9_]
. Matches any character other than a line break
[...] Character set. Special characters lose their special meaning inside a set (except ], - and ^, which can be escaped with \)
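A few of these metacharacters in action. This is a small sketch; the sample strings are invented for illustration.

```python
import re

sample = "Order 42: 3 apples, 10 pears\tdone"

digits = re.findall(r"\d+", sample)           # ['42', '3', '10']
two_digits = re.findall(r"\d{2}", sample)     # ['42', '10']
first_word = re.findall(r"^\w+", sample)      # ['Order']
last_word = re.findall(r"\w+$", sample)       # ['done']
whitespace_count = len(re.findall(r"\s", sample))  # 6 (five spaces, one tab)

# "?" makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

# Inside a character set, "." and "$" are ordinary characters.
literal = re.findall(r"[.$]", "a.b$c")        # ['.', '$']

print(digits, two_digits, first_word, last_word, whitespace_count)
print(optional, literal)
```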

Matching mode

re.I (re.IGNORECASE): Ignore case (the full name is given in parentheses; the same applies below)
re.M (re.MULTILINE): Multiline mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line
re.S (re.DOTALL): Dot-matches-all mode; changes the behavior of '.' so that it also matches line breaks
re.L (re.LOCALE): Makes the predefined character classes \w \W \b \B \s \S depend on the current locale setting
re.U (re.UNICODE): Makes the predefined character classes \w \W \b \B \s \S \d \D depend on Unicode character properties (this is already the default for str patterns in Python 3)
re.X (re.VERBOSE): Verbose mode. The regular expression can span multiple lines, whitespace in the pattern is ignored, and comments can be added.
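The effect of the most common flags can be seen on a short two-line string. The text and patterns below are illustrative assumptions, not taken from the original article.

```python
import re

text = "Hello\nhello world"

plain = re.findall(r"hello", text)                    # ['hello']
ignore_case = re.findall(r"hello", text, re.I)        # ['Hello', 'hello']

# Without re.M, "^" only matches at the very start of the string.
start_only = re.findall(r"^hello", text, re.I)        # ['Hello']
each_line = re.findall(r"^hello", text, re.I | re.M)  # ['Hello', 'hello']

# Without re.S, "." does not match the newline between the two lines.
no_dotall = re.search(r"Hello.hello", text)           # None
dotall = re.search(r"Hello.hello", text, re.S).group()  # 'Hello\nhello'

# re.X lets the pattern span several lines and carry comments.
two_words = re.compile(r"""
    (\w+)   # first word
    \s+     # separating whitespace
    (\w+)   # second word
""", re.X)
groups = two_words.search("hello world").groups()     # ('hello', 'world')

print(plain, ignore_case, start_only, each_line)
print(no_dotall, repr(dotall), groups)
```

Flags are combined with the bitwise OR operator, as in re.I | re.M.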

Common regular expression processing functions

1. re.search(pattern, string, flags=0)
re.search scans the string for the pattern and stops at the first match. It returns None if no position in the string matches.

First parameter: the pattern
Second parameter: the string to search
Third parameter: flags that control how the regular expression is matched

import re

name = "Hello,my name is kuangl,nice to meet ..."
# group(0) is the whole match, group(1) is the first parenthesized group
k = re.search(r'k(uan)gl', name)
if k:
    print(k.group(0), k.group(1))
else:
    print("Sorry, not found!")
-------------------------
kuangl uan

2. re.match(pattern, string, flags=0)
re.match tries to match the pattern only at the beginning of the string, for example against its first word.

import re

name = "Hello,my name is kuangl,nice to meet ..."
# "H" followed by any four characters, anchored at the start of the string
k = re.match(r"(H....)", name)
if k:
    print(k.group(0), k.group(1))
else:
    print("Sorry, no match!")
--------------------------
Hello Hello

The difference between re.match and re.search: re.match only matches at the beginning of the string. If the start of the string does not conform to the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.
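A side-by-side sketch of the two functions on the same input. The string is a shortened, hypothetical variant of the one used in the examples above.

```python
import re

name = "Hello,my name is kuangl"

# "kuangl" is not at the start, so match fails but search succeeds.
match_result = re.match(r"kuangl", name)              # None
search_result = re.search(r"kuangl", name).group()    # 'kuangl'

# Both succeed when the pattern fits the start of the string.
match_start = re.match(r"Hello", name).group()        # 'Hello'

# Anchoring a search with "^" makes it fail in the same way as match here.
anchored = re.search(r"^kuangl", name)                # None

print(match_result, search_result, match_start, anchored)
```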

3. re.findall(pattern, string, flags=0)
Returns a list of all substrings that match the pattern, or an empty list if nothing matches.

mail = '<[email protected]> <[email protected]> [email protected]'
re.findall(r'(\[email protected][a-z]{3})', mail)
----------------------------------------------------
['[email protected]', '[email protected]', '[email protected]']
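A minimal sketch of re.findall on e-mail-like text. The addresses and the pattern below are hypothetical placeholders chosen for illustration, and the pattern is far too loose for real address validation.

```python
import re

# Hypothetical addresses on reserved example domains.
mail = "<alice@example.com> <bob@example.org> carol@example.net"
pattern = r"\w+@\w+\.[a-z]{3}"

found = re.findall(pattern, mail)
# ['alice@example.com', 'bob@example.org', 'carol@example.net']

# No match yields an empty list, not None.
none_found = re.findall(pattern, "no addresses here")   # []

# With exactly one group in the pattern, findall returns the group
# contents instead of the whole match.
users = re.findall(r"(\w+)@\w+\.[a-z]{3}", mail)         # ['alice', 'bob', 'carol']

print(found)
print(none_found, users)
```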

4. re.sub(pattern, repl, string, count=0)
re.sub replaces the parts of a string that match the pattern.
First parameter: the pattern
Second parameter: the replacement string
Third parameter: the string to process
Fourth parameter: the maximum number of replacements. The default is 0, which means that every match is replaced

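import re

test = "Hi, nice to meet you where are you from?"
re.sub(r'\s', '-', test)      # replace every whitespace character
re.sub(r'\s', '-', test, 5)   # replace only the first five
---------------------------------------
'Hi,-nice-to-meet-you-where-are-you-from?'
'Hi,-nice-to-meet-you-where are you from?'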

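5. re.split(pattern, string, maxsplit=0)
re.split splits the string at every match of the pattern. maxsplit limits the number of splits; the default of 0 means no limit.

import re

test = "Hi, nice to meet you where are you from?"
re.split(r"\s+", test)      # split at every run of whitespace
re.split(r"\s+", test, 3)   # split at most three times
--------------------------------------------------
['Hi,', 'nice', 'to', 'meet', 'you', 'where', 'are', 'you', 'from?']
['Hi,', 'nice', 'to', 'meet you where are you from?']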

6. re.compile(pattern, flags=0)
Compiles a regular expression into a pattern object that can be reused.

import re

pattern = re.compile(r'hello')
match = pattern.match('hello world!')
print(match.group())
-------------------------------------
hello
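A compiled pattern object offers the same operations as the module-level functions, so one object can be reused for several calls. The sketch below uses an illustrative pattern that is not from the original article.

```python
import re

# Flags are supplied once, at compile time.
word = re.compile(r"[a-z]+", re.I)

first = word.match("Hello world!").group()   # 'Hello'
all_words = word.findall("Hello world!")     # ['Hello', 'world']
masked = word.sub("X", "Hello world!")       # 'X X!'

print(first, all_words, masked)
print(word.pattern)                          # [a-z]+
```

Compiling once is mainly a readability and reuse convenience when the same pattern is applied in many places.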
