This article covers regular expressions in Python. Regular expression syntax is already covered by many tutorials on the Internet, so it is not explained in detail here. This article mainly introduces the regular expression functions commonly used in Python.
Method/Property | Role |
match() | Determines whether the RE matches at the beginning of the string |
search() | Scans the string to find the first location where the RE matches |
findall() | Finds all the substrings that the RE matches and returns them as a list |
finditer() | Finds all the substrings that the RE matches and returns them as an iterator |
match() only checks whether the RE matches at the beginning of the string, whereas search() scans the entire string.
match() reports a match only if it starts at position 0; if the match would start anywhere else, match() returns None.
search() scans the entire string and reports the first match it finds.
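The difference between the two functions can be seen in a minimal sketch. The sample string is made up for illustration and does not come from the original article:

```python
import re

text = "say hello"  # illustrative sample string

# match() only succeeds if the pattern matches starting at index 0.
at_start = re.match(r"hello", text)

# search() scans forward and reports the first match anywhere in the string.
anywhere = re.search(r"hello", text)

print(at_start)         # None, because "hello" does not start at position 0
print(anywhere.span())  # (4, 9)
```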
If the match succeeds, match() and search() return a Match object, and finditer() yields one Match object per match. A Match object has the following methods:
Method/Property | Role |
group() | Returns the string matched by the RE |
start() | Returns the position where the match starts |
end() | Returns the position where the match ends |
span() | Returns a tuple (start, end) containing the location of the match |
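As a small sketch of these four methods on one Match object (the pattern and the sample string are assumptions chosen for illustration):

```python
import re

m = re.search(r"\d+", "order 66 shipped")  # illustrative input

whole = m.group()   # the matched text
begin = m.start()   # index of the first matched character
finish = m.end()    # index one past the last matched character
both = m.span()     # (start, end) as a tuple

print(whole, begin, finish, both)  # 66 6 8 (6, 8)
```

Note that end() is exclusive, so text[m.start():m.end()] gives back the matched string.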
group() returns the string matched by the whole RE. It also accepts one or more group numbers and returns the strings matched by those groups.
1. group() or group(0) returns the whole string matched by the RE.
2. group(n, m) returns a tuple of the strings matched by groups n and m, and raises an IndexError exception if a group number does not exist.
#!python
>>> import re
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
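Passing several group numbers at once, and asking for a group that does not exist, might look like the following sketch. It reuses the pattern from the session above; the variable names are only for illustration:

```python
import re

m = re.match(r"(a(b)c)d", "abcd")

# Several group numbers in one call give a tuple, in the order requested.
pair = m.group(1, 2)
print(pair)  # ('abc', 'b')

# The pattern defines only groups 1 and 2, so group 3 raises IndexError.
try:
    m.group(3)
    missing = False
except IndexError:
    missing = True
print(missing)  # True
```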
The groups() method returns a tuple containing the strings matched by all groups in the regular expression, from group 1 up to the last group.
groups() is normally called without arguments; the returned tuple has one entry for each group defined in the regular expression.
#!python
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.groups()
('abc', 'b')
Index the tuple to get the content of an individual group, for example: m.groups()[0]
>>> p2 = re.compile(r'(\d)+\w', re.X)
>>> p2.match('123a b12123c').group()   # the string matched by the whole RE (\d)+\w
'123a'
>>> p2.match('123a b12123c').group(0)
'123a'
>>> p2.match('123a b12123c').group(1)  # the string matched by the first group, (\d); a repeated group keeps its last match
'3'
>>> p2.match('123a b12 123c').groups()
('3',)
re.match: matches from the beginning of the string and returns a Match object, or None
re.match tries to match a pattern at the beginning of the string. The following example matches the first word.
import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
m = re.match(r"(\w+)\s", text)
if m:
    # group(0) is the whole match (word plus trailing space); group(1) is the word
    print(m.group(0), '\n', m.group(1))
else:
    print('not match')
The function prototype of re.match is: re.match(pattern, string, flags)
The first parameter is the regular expression, here "(\w+)\s". If the match succeeds, a Match object is returned; otherwise None is returned.
The second parameter is the string to match against.
The third parameter is the flags value, which controls how the regular expression is matched, for example case-insensitive matching or multiline matching.
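To make the flags parameter more concrete, here is a hedged sketch using two standard flags, re.IGNORECASE and re.MULTILINE. The two-line sample text is an assumption for illustration:

```python
import re

text = "Python\npython"  # illustrative two-line string

# No flags: matching is case-sensitive and ^ anchors only at the start of the text.
plain = re.findall(r"^python", text)

# re.IGNORECASE makes letters match regardless of case.
ignore_case = re.match(r"python", text, re.IGNORECASE)

# re.MULTILINE lets ^ match at the start of every line; flags are combined with |.
per_line = re.findall(r"^python", text, re.IGNORECASE | re.MULTILINE)

print(plain)                # []
print(ignore_case.group())  # Python
print(per_line)             # ['Python', 'python']
```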
re.search: looks for a match anywhere in the string and returns a Match object for the first match, or None
The re.search function scans the string for the pattern and stops at the first match; it returns None if nothing in the string matches.
import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
m = re.search(r'\shan(ds)ome\s', text)
if m:
    # group(0) is ' handsome ' including the surrounding spaces; group(1) is 'ds'
    print(m.group(0), m.group(1))
else:
    print('not search')
The function prototype of re.search is: re.search(pattern, string, flags)
Each parameter has the same meaning as for re.match.
The difference between re.match and re.search: re.match matches only at the beginning of the string, so if the start of the string does not conform to the regular expression, the match fails and the function returns None; re.search scans the whole string until it finds a match.
re.sub: replaces every match and returns the resulting string; if nothing matches, the original string is returned unchanged
re.sub is used to replace matches in a string. For example, re.sub(r'\s+', '-', text) replaces each run of whitespace in text with '-'.
The function prototype of re.sub is: re.sub(pattern, repl, string, count)
The second parameter is the replacement string, in this case '-'.
The fourth parameter is the maximum number of replacements. The default is 0, which means that every match is replaced.
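The effect of the count parameter can be sketched as follows. The sample string is made up, and count is passed by keyword here, which is the form recommended by recent Python versions:

```python
import re

text = "a b c d"  # illustrative input

# count defaults to 0, which means every match is replaced.
all_replaced = re.sub(r"\s", "-", text)

# With count=2, replacement stops after the first two matches.
first_two = re.sub(r"\s", "-", text, count=2)

print(all_replaced)  # a-b-c-d
print(first_two)     # a-b-c d
```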
re.sub also accepts a function as the replacement, which allows more complex processing of each match. For example, re.sub(r'\s', lambda m: '[' + m.group(0) + ']', text, count=0) replaces each space ' ' in the string with '[ ]'.
The sub() method takes a replacement value, which can be a string or a function, and the string to be processed.
When the module-level re.sub() function is used, the pattern is passed as the first argument. The pattern may be a string or a compiled pattern object (a RegexObject). To specify regular expression flags, either compile the pattern with the flags and pass the pattern object as the first argument, or use an inline modifier in the pattern; current Python versions also accept a flags keyword argument. For example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
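The ways of supplying a flag to a substitution can be compared side by side. This is a sketch rather than the article's own code; the flags keyword of re.sub is part of the standard library in current Python versions:

```python
import re

source = "bbbb BBBB"

# 1. Inline modifier at the start of the pattern.
inline = re.sub(r"(?i)b+", "x", source)

# 2. Compile the pattern with the flag, then call the object's sub() method.
compiled = re.compile(r"b+", re.IGNORECASE).sub("x", source)

# 3. Pass the flag through the flags keyword of the module-level function.
keyword = re.sub(r"b+", "x", source, flags=re.IGNORECASE)

print(inline, compiled, keyword, sep=" | ")  # x x | x x | x x
```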
import re

def hexrepl(match):
    "Return the hex string for a decimal number"
    value = int(match.group())
    return hex(value)

p = re.compile(r'\d+')
print(p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.'))
# Call 0xffd2 for printing, 0xc000 for user code.
import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
print(re.sub(r'\s+', '-', text))
# JGood-is-a-handsome-boy,-he-is-cool,-clever,-and-so-on...
print(re.sub(r'\s', lambda m: '[' + m.group(0) + ']', text))
# JGood[ ]is[ ]a[ ]handsome[ ]boy,[ ]he[ ]is[ ]cool,[ ]clever,[ ]and[ ]so[ ]on...
print(re.sub(r'a', lambda m: '[' + m.group(0) + ']', text))  # put [] around every 'a'; str.replace() could also be used here
# JGood is [a] h[a]ndsome boy, he is cool, clever, [a]nd so on...
subn() works like sub(), but returns a tuple containing the new string and the number of replacements made.
print(re.subn('i', 'I', 'Paris in the spring'))  # ('ParIs In the sprIng', 3)
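Empty matches are replaced as well. Before Python 3.7 an empty match was replaced only when it was not adjacent to a previous match, so the call below returned '-a-b-d-'. Since Python 3.7 an empty match adjacent to a previous non-empty match is also replaced:
#!python
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b--d-'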
re.split: splits a string and returns the pieces as a list
You can use re.split to split a string. For example, re.split(r'\s+', text) splits the string into a list of words at runs of whitespace.
split(string[, maxsplit=0])
You can limit the number of splits by setting maxsplit. When maxsplit is nonzero, at most maxsplit splits are made, and the remainder of the string is returned as the last element of the list. In the following example, the delimiter is any sequence of non-alphanumeric characters.
#!python
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']
Sometimes you are interested not only in the text between delimiters but also in what the delimiters were. If capturing parentheses are used in the RE, the delimiter values are also returned as part of the list. Compare the following calls:
re.split("([ab])", "carbs")   # ['c', 'a', 'r', 'b', 's']  the delimiter is a or b, and the result includes the delimiters a and b
re.split("([ab]#)", "carbs")  # ['carbs']  the delimiter is a# or b#, which never occurs, so the string is not split
#!python
>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
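One consequence of keeping the delimiters is that the split loses no text. The following sketch, with variable names chosen only for illustration, separates words from delimiters and then rebuilds the original string:

```python
import re

sentence = "This... is a test."

# A capturing group around the delimiter pattern keeps the delimiters in the result.
pieces = re.split(r"(\W+)", sentence)

words = pieces[0::2]   # even positions: text between delimiters
delims = pieces[1::2]  # odd positions: the captured delimiters
rebuilt = "".join(pieces)

print(words)                # ['This', 'is', 'a', 'test', '']
print(delims)               # ['... ', ' ', ' ', '.']
print(rebuilt == sentence)  # True
```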
re.findall: returns all matching strings as a list
re.findall gets all the matching substrings in a string. For example, re.findall(r'\w*oo\w*', text) gets all the words in the string that contain 'oo'.
A parenthesized (pattern) matches pattern and captures that match. When the RE contains groups, findall returns the captured groups instead of the whole match.
import re

text = "JGood is a handsome boy,he is handsome and cool,clever,and so on...."
print(re.findall(r'\w*oo\w*', text))      # result: ['JGood', 'cool']
print(re.findall(r'(\w)*oo(\w)*', text))  # with () groups, the result holds the subexpressions: [('G', 'd'), ('c', 'l')]
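How the shape of the findall result depends on the number of groups can be sketched with a smaller input. The string and patterns below are assumptions for illustration:

```python
import re

text = "x=1, y=22"  # illustrative input

no_group = re.findall(r"\w=\d+", text)        # no groups: whole matches
one_group = re.findall(r"\w=(\d+)", text)     # one group: only the group's text
two_groups = re.findall(r"(\w)=(\d+)", text)  # several groups: one tuple per match

print(no_group)    # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```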
Since Python 2.2, you can also use the finditer() method, which returns an iterator of Match objects.
#!python
>>> p = re.compile(r'\d+')
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x401833ac>
>>> for match in iterator:
...     print(match.group(), match.span())
...
12 (0, 2)
11 (22, 24)
10 (29, 31)
re.compile
re.compile compiles a regular expression into a regular expression object. Compiling regular expressions that are used often into objects can improve efficiency somewhat. Here is an example of using a regular expression object:
import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
regex = re.compile(r'\w*oo\w*')
print(regex.findall(text))  # find all the words that contain 'oo'
print(regex.sub(lambda m: '[' + m.group(0) + ']', text))  # enclose every word containing 'oo' in []
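A compiled object offers the same operations as the module-level functions, just without the pattern argument. The sketch below checks that the two routes agree; the variable names are illustrative:

```python
import re

text = "JGood is a handsome boy, he is cool, clever, and so on..."
pattern = r"\w*oo\w*"
regex = re.compile(pattern)  # compile once, reuse many times

# Module-level function: the pattern is passed on every call.
via_module = re.findall(pattern, text)

# Compiled object: the pattern is already bound to the object.
via_object = regex.findall(text)

print(via_module == via_object)  # True
print(via_object)                # ['JGood', 'cool']
```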
Transferred from: http://www.python8.org/a/fenleiwenzhang/yuyanjichu/2009/0901/150.html
Python in Re (regular expression) module function learning