Regular Expressions (Python)

Last Update: 2016-10-16 Source: Internet

Author: User

Understanding Regular Expressions

A regular expression is a logical formula for operating on strings: a "rule string" built from predefined special characters and combinations of them, used to express filtering logic for strings. Regular expressions are a very powerful tool for matching strings. The concept exists in many other programming languages, and Python is no exception. With regular expressions, we can extract exactly what we want from the content of a returned web page.

The approximate matching process of regular expressions

Take each character of the expression in turn and compare it with the corresponding character in the text.
If every character matches, the match succeeds; as soon as one character fails to match, the match fails.
If the expression contains quantifiers or boundaries, the process is slightly different.
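To make the simple case above concrete, here is a minimal sketch that handles only literal characters (no quantifiers, character classes or boundaries). The helpers literal_match_at and literal_search are hypothetical names for illustration; the real re engine is far more elaborate.

```python
import re


def literal_match_at(expr: str, text: str, pos: int = 0) -> bool:
    """Hypothetical helper: compare a purely literal expression with text at pos."""
    if pos + len(expr) > len(text):
        return False  # not enough text left to compare every character
    for i, ch in enumerate(expr):
        if text[pos + i] != ch:
            return False  # the first mismatching character fails the match
    return True  # every character matched


def literal_search(expr: str, text: str) -> int:
    """Hypothetical helper: try each start position in turn; return the first hit or -1."""
    for pos in range(len(text) - len(expr) + 1):
        if literal_match_at(expr, text, pos):
            return pos
    return -1


print(literal_search("cat", "concatenate"))  # 3
# For literal expressions this agrees with the re module:
print(re.search(re.escape("cat"), "concatenate").start())  # 3
```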

Syntax rules for regular expressions (Python)

Greedy and non-greedy quantifiers

Regular expressions are typically used to find matching strings in text. Quantifiers in Python are greedy by default (a few languages default to non-greedy): they always try to match as many characters as possible. A non-greedy quantifier does the opposite and always tries to match as few characters as possible. For example, the regular expression "ab*" finds "abbb" in "abbbc", whereas the non-greedy version "ab*?" finds only "a".

Note: for extracting content, we generally use non-greedy mode.
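A short sketch of the "abbbc" example, plus a made-up HTML snippet showing why the non-greedy form is preferred for extraction: the greedy .* runs on to the last closing tag.

```python
import re

text = "abbbc"
print(re.search(r"ab*", text).group())   # abbb  (greedy)
print(re.search(r"ab*?", text).group())  # a     (non-greedy)

# Made-up snippet: extracting the text of each <b> element
html = "<b>first</b> and <b>second</b>"
print(re.findall(r"<b>(.*)</b>", html))   # ['first</b> and <b>second']
print(re.findall(r"<b>(.*?)</b>", html))  # ['first', 'second']
```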

The backslash problem

As in most programming languages, "\" is the escape character in regular expressions, which can cause backslash trouble. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the programming language first turns each pair into one backslash, leaving two, and the regex engine then reads those two as one literal backslash.

Python's raw strings solve this problem neatly: the regular expression in this example can be written as r"\\". Similarly, "\\d", which matches a digit, can be written as r"\d".
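A minimal sketch comparing the two spellings; the sample path string is made up for illustration.

```python
import re

text = r"C:\temp"  # raw string: the backslash is kept as a single character

# Ordinary literal: four backslashes become the two characters \\,
# which the regex engine reads as one literal backslash.
print(re.search("\\\\", text).group())  # prints a single backslash

# Raw literal: r"\\" is already those same two characters.
print(re.search(r"\\", text).group())   # same result

print("\\\\" == r"\\")               # True: identical pattern strings
print(re.findall(r"\d", "a1b22c"))   # ['1', '2', '2']
```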

Python re module

Python ships with the re module, which provides support for regular expressions. The main usage is as follows:

    # Returns a Pattern object
    re.compile(string[, flag])
    # The following functions are used for matching
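The list of matching functions is cut off in this copy of the article. Purely as an illustration, and not as a reconstruction of that list, this sketch exercises four module-level functions of Python 3's re module that the sections below do not cover: re.findall, re.finditer, re.split and re.subn. The sample string is made up.

```python
import re

text = "one1two22three333four"

# findall: every non-overlapping match, as a list of strings
print(re.findall(r"\d+", text))                       # ['1', '22', '333']

# finditer: the same matches, as an iterator of match objects
print([m.span() for m in re.finditer(r"\d+", text)])  # [(3, 4), (7, 9), (14, 17)]

# split: break the string wherever the pattern matches
print(re.split(r"\d+", text))                         # ['one', 'two', 'three', 'four']

# subn: like sub, but also returns the number of replacements made
print(re.subn(r"\d+", "-", text))                     # ('one-two-three-four', 3)
```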

The Pattern object

A Pattern can be understood as a matching pattern. How do we obtain one? Quite simply, with the re.compile method. For example:

    pattern = re.compile(r'hello')

Here we pass in a raw string; re.compile compiles it into a Pattern object, which we then use for further matching.
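A short sketch of reusing the compiled pattern from above; the sample strings are made up.

```python
import re

pattern = re.compile(r"hello")  # compile once, reuse many times

m = pattern.match("hello world")           # match at the start of the string
print(m.group())                           # hello
print(pattern.match("say hello"))          # None: "hello" is not at the start
print(pattern.search("say hello").span())  # (4, 9)
```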

You may also notice that re.compile takes a second, optional parameter, flags. Its meaning is explained here:

The flags parameter sets the matching mode. Several values can be combined with the bitwise OR operator | so that they take effect at the same time, for example re.I | re.M.

The optional values are:

• re.I (full name: IGNORECASE): ignore case (full names are given in parentheses, likewise below).
• re.M (full name: MULTILINE): multiline mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line.
• re.S (full name: DOTALL): dot-all mode; changes the behavior of '.' so that it also matches a newline.
• re.L (full name: LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale setting.
• re.U (full name: UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode.
• re.X (full name: VERBOSE): verbose mode. In this mode the regular expression can span multiple lines, whitespace is ignored, and comments can be added.

(In Python 3, Unicode matching is already the default for str patterns, and re.L can only be used with bytes patterns.)
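A sketch of several of these flags in action; all sample strings are made up, and the date pattern is only an illustration of verbose mode.

```python
import re

print(re.findall(r"hello", "Hello HELLO hello", re.I))  # ['Hello', 'HELLO', 'hello']
print(re.findall(r"^\w+", "one\ntwo", re.M))            # ['one', 'two']
print(re.search(r"a.b", "a\nb"))                         # None: '.' does not match '\n'
print(re.search(r"a.b", "a\nb", re.S) is not None)       # True with DOTALL

# Combine flags with |
print(re.findall(r"^abc", "ABC\nabc", re.I | re.M))     # ['ABC', 'abc']

# re.X: whitespace in the pattern is ignored and # starts a comment
date = re.compile(r"""
    (\d{4})   # year
    -(\d{2})  # month
    -(\d{2})  # day
""", re.X)
print(date.match("2016-10-16").groups())                 # ('2016', '10', '16')
```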

The resulting pattern is then used with the matching functions, such as re.match, which are described below.

Note: the flags parameter of the seven matching functions has the same meaning. If the flags were already specified when the pattern was compiled, they do not need to be passed to these functions again; in fact, passing non-zero flags together with a compiled pattern raises a ValueError.
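A minimal sketch of the note above: the flags travel with the compiled pattern, and supplying them a second time is rejected.

```python
import re

pattern = re.compile(r"hello", re.I)  # flags fixed at compile time

print(pattern.match("HELLO there").group())  # HELLO
print(re.match(pattern, "Hello").group())    # Hello: flags come from the pattern

try:
    re.match(pattern, "hello", re.I)  # flags passed a second time
    error = None
except ValueError as exc:
    error = exc
print("ValueError raised:", error is not None)  # True
```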

re.match function

re.match tries to match the pattern at the beginning of the string.

Function syntax:

    re.match(pattern, string, flags=0)

Function parameters:

Parameter	Description
pattern	The regular expression to match.
string	The string to match against.
flags	Flags that control how the regular expression is matched, such as case-insensitivity, multiline matching, and so on.

If the match succeeds, re.match returns a match object; otherwise it returns None.
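A quick sketch of both outcomes, with made-up input strings:

```python
import re

ok = re.match(r"\d+", "123abc")
print(ok is not None, ok.group())  # True 123

print(re.match(r"\d+", "abc123"))  # None: the digits are not at the start
```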

Use the match object's group(num) or groups() method to get the matched text:

Match object method	Description
group(num=0)	Returns the string matched by the whole expression (num=0, the default) or by group num. group() can take several group numbers at once, in which case it returns a tuple containing the corresponding values.
groups()	Returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.
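A short sketch of these methods on a pattern with two groups; the sample text is made up.

```python
import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
print(m.group())      # Isaac Newton  (same as group(0): the whole match)
print(m.group(1))     # Isaac
print(m.group(1, 2))  # ('Isaac', 'Newton')
print(m.groups())     # ('Isaac', 'Newton')
```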

Example:

    #!/usr/bin/env python3
    import re

    line = "Cats are smarter than dogs"

    # (.*) greedily captures the text before " are "; (.*?) lazily captures the next word
    matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

    if matchObj:
        print("matchObj.group() :", matchObj.group())
        print("matchObj.group(1) :", matchObj.group(1))
        print("matchObj.group(2) :", matchObj.group(2))
    else:
        print("No match!!")

Output of the example above:

    matchObj.group() : Cats are smarter than dogs
    matchObj.group(1) : Cats
    matchObj.group(2) : smarter

re.search method

re.search scans through the entire string and returns the first successful match.

Function syntax:

    re.search(pattern, string, flags=0)

Example:

    #!/usr/bin/env python3
    import re

    line = "Cats are smarter than dogs"

    searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

    if searchObj:
        print("searchObj.group() :", searchObj.group())
        print("searchObj.group(1) :", searchObj.group(1))
        print("searchObj.group(2) :", searchObj.group(2))
    else:
        print("No match!!")

Output:

    searchObj.group() : Cats are smarter than dogs
    searchObj.group(1) : Cats
    searchObj.group(2) : smarter

The difference between re.match and re.search

re.match matches only at the beginning of the string: if the start of the string does not match the regular expression, the match fails and the function returns None. re.search, by contrast, scans the entire string until it finds a match.

Example:

    #!/usr/bin/env python3
    import re

    line = "Cats are smarter than dogs"

    matchObj = re.match(r'dogs', line, re.M | re.I)
    if matchObj:
        print("match --> matchObj.group() :", matchObj.group())
    else:
        print("No match!!")

    searchObj = re.search(r'dogs', line, re.M | re.I)
    if searchObj:
        print("search --> searchObj.group() :", searchObj.group())
    else:
        print("No match!!")

Output:

    No match!!
    search --> searchObj.group() : dogs
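One consequence of this difference is easy to miss: the re.M flag does not change re.match, which still only tries the start of the whole string, whereas with re.search, ^ under re.M matches at the start of every line. A sketch with a made-up two-line string:

```python
import re

text = "first line\nsecond line"

# re.match only looks at the start of the string, even with re.M
print(re.match(r"second", text, re.M))            # None
# re.search scans the whole string
print(re.search(r"second", text).start())         # 11
# With re.M, ^ in re.search matches at the start of every line
print(re.search(r"^second", text, re.M).group())  # second
```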

Search and replace

The Python re module provides re.sub to replace matches in a string.

Syntax:

    re.sub(pattern, repl, string, count=0, flags=0)

re.sub returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with repl. If the pattern is not found, string is returned unchanged.

The optional parameter count is the maximum number of pattern occurrences to replace; it must be a non-negative integer. The default value, 0, replaces all matches.
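A minimal sketch of count and of the no-match case, using a made-up string (count is passed by keyword, which works in all Python 3 versions):

```python
import re

s = "a1b2c3"
print(re.sub(r"\d", "#", s))           # a#b#c#  (count=0: replace all)
print(re.sub(r"\d", "#", s, count=2))  # a#b#c3  (at most 2 replacements)
print(re.sub(r"x", "#", s))            # a1b2c3  (no match: returned unchanged)
```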

Example:

    #!/usr/bin/env python3
    import re

    phone = "2004-959-559 # This is Phone Number"

    # Delete Python-style comments
    num = re.sub(r'#.*$', "", phone)
    print("Phone Num :", num)

    # Remove anything other than digits (\D matches any non-digit)
    num = re.sub(r'\D', "", phone)
    print("Phone Num :", num)

Output:

    Phone Num : 2004-959-559
    Phone Num : 2004959559

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.