RegexObject instances have several methods and properties. Only the most important ones are covered here; for the full list, please refer to the Python Library Reference.
Method/Property | Purpose |
match() | Determines if the RE matches at the beginning of the string |
search() | Scans through the string, looking for any location where the RE matches |
findall() | Finds all substrings where the RE matches, and returns them as a list |
finditer() | Finds all substrings where the RE matches, and returns them as an iterator |
If no match can be found, match() and search() return None. If they succeed, a MatchObject instance is returned, containing information about the match: where it starts and ends, the substring it matched, and so on.
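As a quick orientation, the four methods can be compared side by side. This is only a minimal sketch: the sample string and variable names are made up for illustration, and it is written for a current Python 3 interpreter rather than the Python 2.2 session shown in this article.

```python
import re

p = re.compile(r'[a-z]+')
text = 'one 22 three'          # hypothetical sample input

# match() only succeeds if the RE matches at position 0.
at_start = p.match(text)
print(at_start.group())        # one

# The same RE fails with match() when the string starts with digits...
no_match = p.match('22 three')
print(no_match)                # None

# ...but search() scans forward until it finds a match.
found = p.search('22 three')
print(found.group())           # three

# findall() returns the matched substrings as a list.
words = p.findall(text)
print(words)                   # ['one', 'three']

# finditer() yields one match object per match.
spans = [m.span() for m in p.finditer(text)]
print(spans)                   # [(0, 3), (7, 12)]
```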
You can learn about this by experimenting interactively with the re module. If you have Tkinter available, you may also want to look at tools/scripts/redemo.py, a demonstration program included with the Python distribution.
First, run the Python interpreter, import the re module, and compile an RE:
#!python
Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
>>> import re
>>> p = re.compile('[a-z]+')
>>> p
<_sre.SRE_Pattern object at 80c3c28>
Now, you can try matching various strings against the RE [a-z]+. An empty string should not match at all, since + means "one or more repetitions". In this case match() returns None, which causes the interpreter to print no output. You can explicitly print the result of match() to make this clear.
#!python
>>> p.match("")
>>> print p.match("")
None
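The "one or more" rule can also be checked in a script. The sketch below is written for Python 3; the second pattern, which uses `*` ("zero or more"), is not part of this walkthrough and is added only as a contrast.

```python
import re

p = re.compile(r'[a-z]+')

# '+' needs at least one character, so the empty string cannot match.
empty_result = p.match('')
print(empty_result is None)        # True

# A single lowercase letter already satisfies "one or more".
one_letter = p.match('a')
print(one_letter.group())          # a

# Contrast (assumption: '*' is added here only for comparison): zero
# repetitions are allowed, so the empty string matches with an empty group.
star = re.compile(r'[a-z]*')
star_result = star.match('')
print(repr(star_result.group()))   # ''
```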
Now, let's try it on a string that it should match, such as "tempo". In this case, match() returns a MatchObject, so you should store the result in a variable for later use.
#!python
>>> m = p.match('tempo')
>>> print m
<_sre.SRE_Match object at 80c4f68>
Now you can query the MatchObject for information about the matching string. MatchObject instances also have several methods and properties; the most important ones are:
Method/Property | Purpose |
group() | Returns the string matched by the RE |
start() | Returns the starting position of the match |
end() | Returns the ending position of the match |
span() | Returns a tuple containing the (start, end) positions of the match |
Trying these methods will soon clarify their meaning:
#!python
>>> m.group()
'tempo'
>>> m.start(), m.end()
(0, 5)
>>> m.span()
(0, 5)
group() returns the substring that was matched by the RE. start() and end() return the starting and ending index of the match; the ending index is the position just past the last matched character. span() returns both the start and end indexes in a single tuple. Since the match method only checks whether the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.
#!python
>>> print p.match('::: message')
None
>>> m = p.search('::: message') ; print m
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
>>> m.span()
(4, 11)
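One way to read these numbers is that start() and end() are ordinary slice indexes into the searched string. The following Python 3 sketch checks that relationship for the search() result above; the variable names are illustrative only.

```python
import re

p = re.compile(r'[a-z]+')
s = '::: message'
m = p.search(s)

# Slicing the input with start() and end() reproduces group().
matched = s[m.start():m.end()]
print(matched)                              # message

# span() is simply the pair (start(), end()).
print(m.span() == (m.start(), m.end()))     # True

# Everything before start() is the text that search() skipped over.
skipped = s[:m.start()]
print(repr(skipped))                        # '::: '
```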
In actual programs, the most common style is to store the MatchObject in a variable and then check whether it is None. This usually looks like:
#!python
p = re.compile( ... )
m = p.match( 'string goes here' )
if m:
    print 'Match found: ', m.group()
else:
    print 'No match'
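The same check can be wrapped in a small function. This is only a sketch in Python 3 syntax: `describe` is a hypothetical helper name, and the `[a-z]+` pattern is borrowed from the earlier examples, not prescribed by the template above. The last part shows why the check matters: calling group() directly on a failed match raises an AttributeError, because None has no such method.

```python
import re

def describe(pattern, text):
    """Hypothetical helper: report whether pattern matches at the start of text."""
    m = pattern.match(text)
    if m:
        return 'Match found: ' + m.group()
    return 'No match'

p = re.compile(r'[a-z]+')
print(describe(p, 'tempo'))          # Match found: tempo
print(describe(p, '::: message'))    # No match

# Skipping the None check fails as soon as the match does not succeed.
try:
    p.match('::: message').group()
    failed = False
except AttributeError:
    failed = True
print(failed)                        # True
```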
Two RegexObject methods return all of the matches for a pattern. findall() returns a list of matching strings:
#!python
>>> p = re.compile('\d+')
>>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
['12', '11', '10']
findall() has to build the entire list before it can return a result. As of Python 2.2, you can also use the finditer() method, which returns the matches one at a time as an iterator of MatchObject instances.
#!python
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable-iterator object at 0x401833ac>
>>> for match in iterator:
...     print match.span()
...
(0, 2)
(22, 24)
(29, 31)
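The practical difference between the two methods is what you get back: plain strings from findall(), and match objects that still carry position information from finditer(). The Python 3 sketch below illustrates this on the full "drummers" sentence; the variable names are illustrative only.

```python
import re

p = re.compile(r'\d+')
text = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'

# findall() builds the complete list of matched strings up front.
numbers = p.findall(text)
print(numbers)                        # ['12', '11', '10']

# finditer() hands out match objects one at a time.
iterator = p.finditer(text)
first = next(iterator)
print(first.group(), first.span())    # 12 (0, 2)

# The remaining matches are still waiting in the iterator.
rest = [(m.group(), m.start()) for m in iterator]
print(rest)                           # [('11', 22), ('10', 40)]

# Once consumed, the iterator is exhausted and yields nothing more.
leftover = list(iterator)
print(leftover)                       # []
```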
Perform a match