Python--Regular expressions
A regular expression is a special sequence of characters that can help you easily check whether a string matches a pattern.
Python has included the re module since version 1.5. It provides Perl-style regular expression patterns.
The re module gives the Python language full regular expression functionality.
The re.compile function builds a regular expression object from a pattern string and an optional flags argument. The object has a set of methods for regular expression matching and substitution.
The re module also provides module-level functions that behave the same as these methods; they take the pattern string as their first parameter.
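As a minimal sketch of the two equivalent styles described above, the snippet below compiles a pattern once and then repeats the same operations through the module-level functions. The pattern, the sample text, and the variable names are made up for illustration.

```python
import re

# Hypothetical pattern and text, chosen only for illustration.
pattern_text = r'\d+'
text = 'order 66 shipped in 3 days'

# Style 1: compile once, then call methods on the pattern object.
number_re = re.compile(pattern_text)
compiled_first = number_re.search(text).group()
compiled_masked = number_re.sub('N', text)

# Style 2: call the module-level functions, passing the pattern string first.
module_first = re.search(pattern_text, text).group()
module_masked = re.sub(pattern_text, 'N', text)

print(compiled_first, module_first)  # 66 66
print(compiled_masked)               # order N shipped in N days
print(module_masked)                 # order N shipped in N days
```

Compiling is mostly a convenience when the same pattern is reused many times; both styles return the same results.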
This article mainly introduces the regular expression processing functions commonly used in Python.
re.match function
re.match attempts to match a pattern at the start of the string. If the match is not successful, match() returns None.
Function syntax:
re.match(pattern, string, flags=0)
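Function parameters:
- pattern: The regular expression to match.
- string: The string to match against.
- flags: Optional flags that control how the pattern is matched, such as case-insensitive or multi-line matching.
If the match succeeds, re.match returns a match object; otherwise it returns None.
We can use the match object's group(num) or groups() method to get the matched text.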
# -*- coding: utf-8 -*-
import re

print(re.match('www', 'www.runoob.com').span())  # matches at the start: (0, 3)
print(re.match('com', 'www.runoob.com'))         # does not match at the start: None
import re

line = "Cats is smarter than dogs"

matchObj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
else:
    print("No match!!")

# Output:
# matchObj.group(): Cats is smarter than dogs
# matchObj.group(1): Cats
# matchObj.group(2): smarter
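The groups() method mentioned above returns every captured group at once as a tuple. The following is a small sketch that reuses the same sentence and pattern. It also shows why the result should be checked for None before calling methods on it. The variable names are illustrative only.

```python
import re

line = "Cats is smarter than dogs"

# Successful match: groups() returns all captured groups as a tuple.
match_obj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)
all_groups = match_obj.groups() if match_obj is not None else ()
print(all_groups)  # ('Cats', 'smarter')

# Failed match: re.match returns None, so calling .span() or .group()
# on the result directly would raise AttributeError.
failed = re.match('com', 'www.runoob.com')
print(failed is None)  # True
```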
re.search method
re.search scans the entire string and returns the first successful match.
Function syntax:
re.search(pattern, string, flags=0)
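Function parameters:
- pattern: The regular expression to search for.
- string: The string to search.
- flags: Optional flags that control how the pattern is matched, such as case-insensitive or multi-line matching.
If the match succeeds, re.search returns a match object; otherwise it returns None.
We can use the match object's group(num) or groups() method to get the matched text.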
# -*- coding: utf-8 -*-
import re

print(re.search('www', 'www.runoob.com').span())  # matches at the start: (0, 3)
print(re.search('com', 'www.runoob.com').span())  # matches away from the start: (11, 14)
import re

line = "Cats is smarter than dogs"

searchObj = re.search(r'(.*) is (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group():", searchObj.group())
    print("searchObj.group(1):", searchObj.group(1))
    print("searchObj.group(2):", searchObj.group(2))
else:
    print("Nothing found!!")

# Output:
# searchObj.group(): Cats is smarter than dogs
# searchObj.group(1): Cats
# searchObj.group(2): smarter
The difference between re.match and re.search
re.match matches only at the beginning of the string: if the start of the string does not conform to the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.
import re

line = "Cats is smarter than dogs"

matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group():", matchObj.group())
else:
    print("No match!!")

matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group():", matchObj.group())
else:
    print("No match!!")

# Output:
# No match!!
# search --> matchObj.group(): dogs
Retrieving and replacing
The Python re module provides re.sub to replace matches in a string.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Parameters: the first three are required; the last two are optional.
- pattern: The regular expression pattern string.
- repl: The replacement string, or a function.
- string: The original string to search and replace in.
- count: The maximum number of matches to replace. The default of 0 means that all matches are replaced.
- flags: Optional flags that control how the pattern is matched.
import re

phone = "2004-959-559 # This is a foreign phone number"

# Remove the Python-style comment from the string
num = re.sub(r'#.*$', "", phone)
print("The phone number is:", num)

# Remove every character that is not a digit (such as the hyphens)
num = re.sub(r'\D', "", phone)
print("The phone number is:", num)

# Output:
# The phone number is: 2004-959-559
# The phone number is: 2004959559
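The count parameter described in the list above is easiest to see side by side with the default. This is a minimal sketch that uses a shortened, made-up phone string:

```python
import re

phone = "2004-959-559"

# count=1 replaces only the first match.
first_only = re.sub(r'-', '', phone, count=1)

# count defaults to 0, which replaces every match.
all_removed = re.sub(r'-', '', phone)

print(first_only)   # 2004959-559
print(all_removed)  # 2004959559
```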
The repl parameter as a function
In the following example, each number matched in the string is multiplied by 2:
# -*- coding: utf-8 -*-
import re

# Multiply the matched number by 2
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)

s = 'a23g4hfd567'
print(re.sub(r'(?P<value>\d+)', double, s))
# value is 23, 4, and 567 in turn; double is called once per match
# and returns the value multiplied by 2

# Output:
# a46g8hfd1134
Regular expression modifiers: optional flags
A regular expression can take optional flag modifiers that control how the pattern is matched. Modifiers are passed as the optional flags argument, and multiple flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I (ignore case) and M (multi-line) flags.
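To make the effect of combining flags concrete, here is a small sketch with a made-up two-line string. The pattern only matches when both re.I and re.M are set, because the target word starts the second line and differs in case.

```python
import re

text = "first line\nSecond line"

# No flags: ^ anchors only at the start of the string, and case matters.
no_flags = re.search(r'^second', text)

# re.I alone fixes the case but ^ still anchors only at position 0.
only_ignorecase = re.search(r'^second', text, re.I)

# re.M alone lets ^ match after the newline, but the case still differs.
only_multiline = re.search(r'^second', text, re.M)

# Combined with bitwise OR, both conditions are satisfied.
both = re.search(r'^second', text, re.I | re.M)

print(no_flags, only_ignorecase, only_multiline)  # None None None
print(both.group())                               # Second
```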
Regular expression pattern
A pattern string uses a special syntax to represent a regular expression:
Letters and digits stand for themselves. A letter or digit in a regular expression pattern matches the same character in the string.
Most letters and digits take on a different meaning when they are preceded by a backslash.
Punctuation characters with a special meaning match themselves only when they are escaped.
A backslash itself must be escaped with another backslash.
Because regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character.
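The escaping rules above can be checked directly. The snippet below is an illustrative sketch with made-up sample strings; it compares a raw string with its escaped equivalent and shows an escaped versus unescaped dot.

```python
import re

# r'\t' and '\\t' are the same two-character string: a backslash and 't'.
same_string = (r'\t' == '\\t')
print(same_string, len(r'\t'), len('\t'))  # True 2 1

# As a pattern, r'\t' matches a tab character.
tab_span = re.search(r'\t', 'a\tb').span()
print(tab_span)  # (1, 2)

# An unescaped dot is special and matches any character except a newline...
loose = re.match(r'www.', 'wwwX')
# ...while an escaped dot matches only a literal '.'.
strict = re.match(r'www\.', 'wwwX')
print(loose is not None, strict is None)  # True True

# A literal backslash must itself be escaped in the pattern.
backslash_span = re.search(r'\\', 'C:\\temp').span()
print(backslash_span)  # (2, 3)
```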
The following table lists the special elements in the regular expression pattern syntax. If you supply optional flags along with a pattern, the meaning of some pattern elements changes.
Regular expression examples
Character matching
Character classes
Special character classes
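As a brief illustration of the headings above, the sketch below applies a bracketed character class and a few special character classes to a made-up string. The sample text is an assumption chosen for demonstration, and re.sub is used only to make the matched characters visible.

```python
import re

text = "Python 3 and python2, ruby"

# Character class: [Pp] matches either 'P' or 'p'.
either_case = re.sub(r'[Pp]ython', 'X', text)
print(either_case)  # X 3 and X2, ruby

# [0-9] and the special class \d both match a single digit.
digits_range = re.sub(r'[0-9]', '#', text)
digits_class = re.sub(r'\d', '#', text)
print(digits_range)  # Python # and python#, ruby

# \s matches a whitespace character.
no_spaces = re.sub(r'\s', '_', text)
print(no_spaces)  # Python_3_and_python2,_ruby

# \W matches any character that is not a letter, digit, or underscore.
words_only = re.sub(r'\W', '', text)
print(words_only)  # Python3andpython2ruby
```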
Regular expression example:
import re

line = "Cats is smarter than dogs"

matchObj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))
else:
    print("No match!!")

# Regular expression: r'(.*) is (.*?) .*'
Analysis:
First, the pattern is a string prefixed with r, which marks it as a raw string: Python does not treat backslashes in it as escape characters. This particular string contains no backslashes, so the r prefix is optional here.
- (.*) is the first group. .* matches any sequence of characters except a line break.
- (.*?) is the second group. The question mark after .* selects non-greedy mode, which matches as few characters as possible while still letting the overall match succeed.
- The final .* is not enclosed in parentheses, so it is not a group. It matches in the same way as the first .* but is not included in the group results.
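To see what the non-greedy question mark changes, the sketch below runs the same sentence through the pattern twice, once as written and once with the second group made greedy. The greedy variant is an illustrative modification, not part of the original example.

```python
import re

line = "Cats is smarter than dogs"

# Non-greedy second group: stops at the first space that lets the match succeed.
lazy = re.match(r'(.*) is (.*?) .*', line).group(2)

# Greedy second group: takes as much as possible, backing off only
# far enough to leave a space and the trailing .* something to match.
greedy = re.match(r'(.*) is (.*) .*', line).group(2)

print(lazy)    # smarter
print(greedy)  # smarter than
```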
matchObj.group() is equivalent to matchObj.group(0) and returns the entire matched text.
matchObj.group(1) returns the first group, i.e. the text matched by (.*).
matchObj.group(2) returns the second group, i.e. the text matched by (.*?).
The pattern contains only two groups, so calling matchObj.group(3) raises an IndexError.
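The group numbering rules above can be verified with a short sketch. The variable names are illustrative, and the out-of-range lookup is wrapped in try/except so the script keeps running.

```python
import re

line = "Cats is smarter than dogs"
match_obj = re.match(r'(.*) is (.*?) .*', line)

# group() with no argument is the same as group(0): the whole match.
whole_default = match_obj.group()
whole_zero = match_obj.group(0)
print(whole_default == whole_zero)  # True

# The pattern defines only two groups, so index 3 is out of range.
try:
    match_obj.group(3)
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

print(error_name)  # IndexError
```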
Results:
matchObj.group(): Cats is smarter than dogs
matchObj.group(1): Cats
matchObj.group(2): smarter