A Detailed Introduction to Python's Regular Expression Module re

This module provides regular expression operations similar to those found in Perl. Both the pattern and the string being searched can be Unicode strings; there is nothing to worry about, because Python handles them just as gracefully as ASCII strings.

Regular expressions use the backslash (\) to escape special characters, so that they match the character itself instead of carrying a special meaning. This collides with the backslash escapes of Python string literals, which can be confusing. For example, to match a literal backslash you have to write '\\\\' as the pattern string: the regular expression itself must be \\, and each of those two backslashes must in turn be written as \\ inside an ordinary Python string literal.

You can avoid part of this confusion by prefixing the string literal with r. A Python string literal that starts with r is a raw string, in which backslashes are not treated as escapes: r'\n' is two characters, a backslash followed by the letter n, whereas '\n' is a single newline character. The '\\\\' above can therefore be written as r'\\', which is much easier to read. Consider the following session:

>>> import re
>>> s = '\x5c'  # 0x5c is the backslash
>>> print s
\
>>> re.match('\\\\', s)  # this matches
<_sre.SRE_Match object at 0xb6949e20>
>>> re.match(r'\\', s)  # so does this
<_sre.SRE_Match object at 0x80ce2c0>
>>> re.match('\\', s)  # but this does not work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)
>>>


It is also worth mentioning that most of the module-level functions in re are also available as methods of compiled RegexObject objects; the difference lies in efficiency, which is discussed later in this article.

Regular expression syntax

A regular expression (RE) specifies a set of strings that match it; the functions in this module let you check whether a given string matches a given regular expression.
Regular expressions can be concatenated to form new regular expressions: if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and a string q matches B, then the string pq matches AB, unless A or B contains boundary conditions or group references. In other words, complex regular expressions can be built by joining simple ones.
Regular expressions can contain both special and ordinary characters. Most characters, such as 'A', 'a' and '0', are ordinary: as regular expressions they simply match themselves. Because regular expressions can be concatenated, the expression last, built from four ordinary characters, matches the string 'last'. (From here on, regular expressions are written without quotes and the strings to be matched are written 'in single quotes'.)

The following is a description of the special characters of regular expressions:

'.'
The dot. In the default mode it matches any single character except a newline; if the DOTALL flag is specified, it matches any single character, including a newline.
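
The effect of DOTALL on the dot can be checked with a short sketch. It assumes Python 3 (the sessions in this article were recorded on Python 2.6), and the sample text is made up for illustration:

```python
import re

text = "ab\ncd"  # illustrative two-line string

# By default '.' does not match '\n', so 'b.c' cannot bridge the line break.
default_match = re.search(r"b.c", text)

# With re.DOTALL (alias re.S), '.' matches the newline as well.
dotall_match = re.search(r"b.c", text, re.DOTALL)

print(default_match)               # None
print(repr(dotall_match.group()))  # 'b\nc'
```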

'^'
The caret. It matches the start of the string; in MULTILINE mode it also matches immediately after each newline.

'$'
The dollar sign. It matches at the end of the string or just before the newline at the end of the string; in MULTILINE mode it also matches at the end of every line. In other words, in the default mode, searching 'foo1\nfoo2\n' with foo.$ finds only 'foo2', whereas in MULTILINE mode it finds 'foo1' as well. Searching 'foo\n' with a lone $ finds two empty matches: one just before the final newline and one at the end of the string. A demo:

>>> re.findall('(foo.$)', 'foo1\nfoo2\n')
['foo2']
>>> re.findall('(foo.$)', 'foo1\nfoo2\n', re.MULTILINE)
['foo1', 'foo2']
>>> re.findall('($)', 'foo\n')
['', '']

'*'
The asterisk. It makes the preceding RE repeat 0 or more times, always matching as many repetitions as possible.

'+'
The plus sign. It makes the preceding RE repeat 1 or more times, always matching as many repetitions as possible.

'?'
The question mark. It makes the preceding RE repeat 0 or 1 times, matching once if possible.

*?, +?, ??
As described above, '*', '+' and '?' are all greedy, which is not always what we want. Appending a question mark makes them non-greedy, so that they match as few characters as possible. The following example shows the difference:

>>> re.findall('<(.*)>', '<H1>title</H1>')
['H1>title</H1']
>>> re.findall('<(.*?)>', '<H1>title</H1>')
['H1', '/H1']

{m}
m is a number; the preceding RE is repeated exactly m times.

{m,n}
m and n are both numbers; the preceding RE is repeated from m to n times. For example, a{3,5} matches 3 to 5 consecutive 'a' characters. If m is omitted, the preceding RE is matched 0 to n times; if n is omitted, it is matched m or more times, with no upper limit. The comma cannot be omitted, otherwise the expression turns into the {m} form above.

{m,n}?
The {m,n} form described above is also greedy: given more than 5 consecutive 'a' characters, a{3,5} matches 5 of them. Appending a question mark changes this too: a{3,5}? matches only 3 'a' characters whenever it can.
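
The bounded repetition forms can be compared side by side. This is a minimal sketch assuming Python 3; the seven-character sample string is made up for illustration:

```python
import re

text = "aaaaaaa"  # seven consecutive 'a' characters

greedy = re.match(r"a{3,5}", text).group()      # takes as many as allowed
lazy = re.match(r"a{3,5}?", text).group()       # takes as few as allowed
exact = re.match(r"a{3}", text).group()         # exactly three
open_ended = re.match(r"a{3,}", text).group()   # upper bound omitted

print(greedy, lazy, exact, open_ended)
```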

'\'
The backslash. It either escapes special characters such as '*' and '?' so that they match literally, or introduces a special sequence (described in detail below).
For the reasons given earlier, using raw strings to write regular expressions is strongly recommended.

[]
Square brackets specify a set of characters. Characters can be listed individually, or a range can be given by joining two characters with '-'. Special characters lose their special meaning inside the brackets: [akm$] matches any of 'a', 'k', 'm' or '$', where $ is just an ordinary character. [a-z] matches any lowercase letter, and [a-zA-Z0-9] matches any letter or digit. To match ']' or '-' itself, escape it with a backslash or place it first in the set; for example, []] matches ']'.
A set can also be complemented so that it matches any character not in the set. This is done by putting '^' first in the set; a '^' anywhere else has no special meaning. For example, [^5] matches any character except '5', and [^^] matches any character except '^'.
Note: inside brackets, characters such as +, *, ( and ) lose their special meaning and are treated as ordinary characters. Backreferences cannot be used inside brackets.
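
The character-set rules above can be tried out as follows. This is a sketch assuming Python 3; all sample strings are made up for illustration:

```python
import re

# '$' is an ordinary character inside a set; 's' is not in the set.
literal_dollar = re.findall(r"[akm$]", "mask$")

# A leading '^' complements the set.
not_five = re.findall(r"[^5]", "1525")

# A '^' that is not first in the set is an ordinary character.
caret_inside = re.findall(r"[5^]", "5^6")

# ']' placed first in the set is taken literally.
bracket_first = re.findall(r"[]x]", "a]x")

# '-' escaped with a backslash is taken literally instead of forming a range.
escaped_dash = re.findall(r"[a\-z]", "a-mz")

print(literal_dollar, not_five, caret_inside, bracket_first, escaped_dash)
```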

'|'
The pipe symbol. If A and B are arbitrary REs, then A|B is a new RE that matches either A or B. Any number of REs can be joined with '|' in this way, and the construct can also be used inside groups (described below). REs separated by '|' are tried against the target string from left to right; as soon as one of them matches, the rest are not tried, even if a later one would produce a longer match. In other words, the '|' operator is not greedy. To match a literal '|', escape it with a backslash, \|, or enclose it in square brackets, [|].
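
The left-to-right behaviour of '|' is easy to observe when one alternative is a prefix of another. A minimal sketch, assuming Python 3, with made-up sample strings:

```python
import re

# 'a' is tried first and succeeds, so the longer 'ab' is never tried.
first_wins = re.match(r"a|ab", "ab").group()

# Reordering the alternatives yields the longer match.
longer_first = re.match(r"ab|a", "ab").group()

# Two ways of matching a literal pipe character.
literal_pipes = re.findall(r"\||[|]", "a|b|c")

print(first_wins, longer_first, literal_pipes)
```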

(...)
Matches the RE inside the parentheses and marks the start and end of a group. The contents of the group can be retrieved after the match, or matched again later in the pattern with a \number special sequence. To match a literal '(' or ')', escape it with a backslash, \( and \), or enclose it in square brackets, [(] and [)].

(?...)
This is the extension notation. The first character after the '?' determines the syntax and meaning of the whole construct. With the exception of (?P<name>...), extensions do not create a new group. The currently supported extensions are listed below.

(?iLmsux)
One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'. The construct matches no characters; it sets the corresponding flags instead: re.I (ignore case), re.L (locale dependent), re.M (multi-line mode), re.S (dot matches all characters), re.U (Unicode dependent) and re.X (verbose mode). The flags are described in their own section below. This syntax can be used instead of passing a flags argument to re.compile() or to the matching functions.
For example, the earlier demo can be rewritten as follows, with the same effect as passing re.MULTILINE:

>>> re.findall('(?m)(foo.$)', 'foo1\nfoo2\n')
['foo1', 'foo2']

Also note that the (?x) flag, if used, should be placed at the very start of the expression.

(?:...)
Matches the RE inside the parentheses, but does not create a group.
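
The difference between a capturing and a non-capturing group shows up in the groups reported by the match. A small sketch assuming Python 3; the pattern and sample string are made up for illustration:

```python
import re

capturing = re.match(r"(ab)+(c)", "ababc")
non_capturing = re.match(r"(?:ab)+(c)", "ababc")

# Both patterns match the same text ...
print(capturing.group(0), non_capturing.group(0))

# ... but only the first one records 'ab' as a group.
print(capturing.groups())
print(non_capturing.groups())
```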

(?P<name>...)
Similar to regular parentheses, but the substring matched by the group can also be retrieved through the name name. A group name must be a valid Python identifier and may be defined only once within a regular expression. A named group is still numbered like an ordinary group, so the name is simply an additional way to refer to it.
Demo:

>>> m = re.match('(?P<var>[a-zA-Z_]\w*)', 'abc=123')
>>> m.group('var')
'abc'
>>> m.group(1)
'abc'

(?P=name)
Matches whatever text was matched by the earlier group named name.
Demo:

>>> re.match('<(?P<tag>\w*)>.*</(?P=tag)>', '<h1>xxx</h2>')  # this does not match
>>> re.match('<(?P<tag>\w*)>.*</(?P=tag)>', '<h1>xxx</h1>')  # this matches
<_sre.SRE_Match object at 0xb69588e0>

(?#...)
A comment; the contents of the parentheses are ignored.

(?=...)
Matches if ... matches next, but does not consume any of the string. For example, Isaac (?=Asimov) matches 'Isaac ' only if it is followed by 'Asimov'. This is called a lookahead assertion.

(?!...)
The opposite of the above: matches only if ... does not match next. This is called a negative lookahead assertion.
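
Both kinds of lookahead can be demonstrated with the Isaac example. A minimal sketch assuming Python 3; the string 'Isaac Newton' is an extra sample added for illustration:

```python
import re

# Positive lookahead: 'Asimov' must follow, but is not part of the match.
positive = re.search(r"Isaac (?=Asimov)", "Isaac Asimov")
print(repr(positive.group()), positive.end())  # 'Isaac ' 6

# Negative lookahead: the match is rejected when 'Asimov' follows ...
rejected = re.search(r"Isaac (?!Asimov)", "Isaac Asimov")

# ... and accepted when something else follows.
accepted = re.search(r"Isaac (?!Asimov)", "Isaac Newton")
print(rejected, repr(accepted.group()))
```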

(?<=...)
Matches only if the text immediately before the current position matches .... This is called a positive lookbehind assertion. The regular expression (?<=abc)def finds a match in 'abcdef', because the assertion looks back 3 characters and checks whether they are abc. The contained RE must therefore match strings of a fixed length: abc or a|b are allowed, but a* and a{3,4} are not. Note that a pattern starting with a positive lookbehind assertion can never match at the very beginning of the string. For example, to find the word following a hyphen ('-'):

>>> m = re.search('(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'


(?<!...)
Similarly, this is called a negative lookbehind assertion: it matches only if the text immediately before the current position does not match .... The contained RE must again be of fixed length.
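
A negative lookbehind, written (?<!...), can be used to skip words that directly follow a hyphen. This is a sketch assuming Python 3; the pattern and the sample string are made up for illustration:

```python
import re

# Find words that start at a word boundary NOT preceded by '-'.
# 'egg' is skipped because the character before it is a hyphen.
words = re.findall(r"(?<!-)\b\w+", "spam-egg ham")
print(words)
```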

(?(id/name)yes-pattern|no-pattern)
If the group with the given id or name matched, the RE tries to match yes-pattern, otherwise no-pattern; no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) matches '<user@host.com>' as well as 'user@host.com', but not '<user@host.com'.
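
The conditional pattern from the description can be exercised with re.match(). A minimal sketch assuming Python 3; note that match() is used on purpose, since search() would be free to skip the leading '<':

```python
import re

# Group 1 is the optional '<'; the closing '>' is required only if group 1 matched.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)")

bracketed = email.match("<user@host.com>")
bare = email.match("user@host.com")
unbalanced = email.match("<user@host.com")

print(bracketed.group(), bare.group(), unbalanced)
```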

The following special sequences begin with '\'. If the character after the backslash is not listed below, the resulting RE simply matches that character itself; for example, \$ matches the literal character '$'.

\number
Matches the same text as the group with the given number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' and '55 55', but not 'the end'. This sequence can only refer to the first 99 groups; if number starts with 0 or is 3 octal digits long, it is interpreted as the character with that octal value instead. This sequence cannot be used inside square brackets.
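
The backreference example from the description can be run directly. A minimal sketch assuming Python 3:

```python
import re

# \1 must repeat exactly what group 1 matched, after a single space.
doubled = re.compile(r"(.+) \1")

results = {s: bool(doubled.match(s)) for s in ("the the", "55 55", "the end")}
print(results)
```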

\A
Matches only at the start of the string.

\b
Matches a word boundary (including the start and end of the string). A word is a sequence of letters, digits and underscores. \b is defined as the boundary between a \w character and a \W character, so its exact meaning depends on the UNICODE and LOCALE flags.

\B
The opposite of \b: matches where there is no word boundary. It likewise depends on the UNICODE and LOCALE flags.
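
The two boundary sequences can be contrasted on a string in which the same letters occur both as a whole word and inside other words. A sketch assuming Python 3, with made-up sample strings:

```python
import re

# \b...\b keeps only the standalone occurrences of 'cat'.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# \B requires that there is NO boundary, so only the 'cat' inside 'concat' qualifies.
inner = re.search(r"\Bcat", "cat concat")

print(whole_words, inner.start())
```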

\d
Without the UNICODE flag, matches any decimal digit and is equivalent to [0-9]. With the UNICODE flag, it also matches every other character classified as a digit in the Unicode character database. An example makes this easier to see (good examples are hard to find):

# \u2076 and \u2084 are superscript 6 and subscript 4; both are Unicode digits
>>> unistr = u'\u2076\u2084abc'
>>> print unistr
⁶₄abc
>>> print re.findall('\d+', unistr, re.U)[0]
⁶₄

\D
The opposite of \d: matches any character that is not a digit.

\s
Without the UNICODE and LOCALE flags, matches any whitespace character and is equivalent to [ \t\n\r\f\v]. With LOCALE, it also matches whatever the current locale treats as whitespace; with UNICODE, it also matches the characters classified as whitespace in the Unicode character database, such as the no-break space (\u00a0) and the ideographic space (\u3000).

\S
The opposite of \s: matches any character that is not whitespace.

\w
Without the UNICODE and LOCALE flags, equivalent to [a-zA-Z0-9_]. With LOCALE, it matches [0-9_] plus whatever the current locale defines as letters. With UNICODE, it matches [0-9_] plus every character classified as alphanumeric in the Unicode character database.

\W
The opposite of \w: matches any character that is not a word character.

\Z
Matches only at the end of the string.

Matching versus searching

Python offers two basic operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (which is what Perl does by default).
Note that match and search can differ even when the regular expression starts with '^': in MULTILINE mode '^' also matches right after each newline, whereas match only ever succeeds at the very start of the string.

>>> re.match("c", "abcdef")   # no match
>>> re.search("c", "abcdef")  # match
<_sre.SRE_Match object at ...>

Module functions and attributes

re.compile(pattern[, flags])
Compiles a regular expression pattern into a regular expression object, whose match() and search() methods can then be used.
The behaviour of the resulting object can be modified with flags, which may be any of the values listed below, combined with bitwise OR (|) if more than one is needed.
The following two snippets are equivalent:

prog = re.compile(pattern)
result = prog.match(string)

result = re.match(pattern, string)

The difference is that re.compile() lets you keep the regular expression object, which is more efficient when the same expression is used many times. The module-level functions do cache recently compiled patterns, but every call still pays for the cache lookup. To demonstrate with the earlier example, matching the same string with the same expression 1 million times shows the benefit of compiling first (measured on my Shenzhou laptop with a 1.86 GHz CPU):

>>> import timeit
>>> timeit.timeit(
...     setup='''import re; reg = re.compile('<(?P<tag>\w*)>.*</(?P=tag)>')''',
...     stmt='''reg.match('<h1>xxx</h1>')''',
...     number=1000000)
1.2062149047851562
>>> timeit.timeit(
...     setup='''import re''',
...     stmt='''re.match('<(?P<tag>\w*)>.*</(?P=tag)>', '<h1>xxx</h1>')''',
...     number=1000000)
4.4380838871002197



re.I
re.IGNORECASE

Performs case-insensitive matching, so that [A-Z] also matches lowercase letters. This flag is not affected by the current locale.
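
The effect of the flag, and of its inline form (?i), can be seen in a short sketch. It assumes Python 3, and the sample string is made up for illustration:

```python
import re

text = "Hello World"

# Without the flag, [A-Z] matches uppercase letters only.
upper_only = re.findall(r"[A-Z]+", text)

# With re.I (alias re.IGNORECASE) the same set also matches lowercase letters.
any_case = re.findall(r"[A-Z]+", text, re.I)

# The inline form (?i) at the start of the pattern has the same effect.
inline = re.findall(r"(?i)[A-Z]+", text)

print(upper_only, any_case, inline)
```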

re.L
re.LOCALE
Makes \w, \W, \b, \B, \s and \S depend on the current locale.

re.M
re.MULTILINE
Changes the behaviour of '^' and '$'. When specified, '^' also matches at the start of each line (immediately after each newline), and '$' also matches at the end of each line (immediately before each newline).

re.S
re.DOTALL
Changes the behaviour of '.'. Normally '.' matches any character except a newline; with this flag it matches newlines as well.

re.U
re.UNICODE
Makes \w, \W, \b, \B, \d, \D, \s and \S depend on the Unicode character database.

re.X
re.VERBOSE
This flag lets you write more readable regular expressions. Whitespace in the pattern is ignored, except inside square brackets or when escaped with a backslash, and on each line everything after an unescaped '#' is ignored, which makes it easy to put comments inside the expression. In other words, the following two regular expressions are equivalent:

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")

re.search(pattern, string[, flags])

Scans through string looking for a location where the regular expression pattern matches. Returns a MatchObject instance if one is found, otherwise None; note that returning None is different from finding a zero-length match at some point in the string. The search is affected by flags.

re.match(pattern, string[, flags])

If the beginning of string matches the regular expression pattern, returns the corresponding MatchObject instance; otherwise returns None.

Note: to find a match anywhere in the string, use search() above.

re.split(pattern, string[, maxsplit=0])

Splits string at each substring that matches pattern. If pattern contains capturing parentheses, the text matched by the groups is also included in the returned list. If maxsplit is nonzero, at most maxsplit splits are made and the remainder of the string is returned as the final element of the list.

>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']

If the pattern contains capturing parentheses and matches at the very start of the string, the first item of the result is an empty string. The same holds for the end of the string:

>>> re.split('(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']

Note that split() never splits on a zero-length match (this is the Python 2 behaviour shown here; since Python 3.7, split() also splits on empty matches). For example:

>>> re.split('x*', 'foo')
['foo']
>>> re.split("(?m)^$", "foo\n\nbar\n")
['foo\n\nbar\n']



re.findall(pattern, string[, flags])

Returns all non-overlapping matches of pattern in string as a list of strings. The string is scanned from left to right, and the matches are returned in the order found. If the pattern contains a group, a list of the group's matches is returned; if it contains several groups, each match becomes a tuple of groups and the result is a list of tuples.
Since this function does not involve MatchObject at all, it is probably one of the easiest functions for beginners to understand and use. A few simple examples:

# a simple findall
>>> re.findall('\w+', 'hello, world!')
['hello', 'world']
# this one returns a list of tuples
>>> re.findall('(\d+)\.(\d+)\.(\d+)\.(\d+)', 'My IP is 192.168.0.2, and yours is 192.168.0.3.')
[('192', '168', '0', '2'), ('192', '168', '0', '3')]

re.finditer(pattern, string[, flags])


Similar to findall() above, but returns an iterator over MatchObject instances.
Again, an example:

>>> for m in re.finditer('\w+', 'hello, world!'):
...     print m.group()
...
hello
world

re.sub(pattern, repl, string[, count])

Replaces the parts of string that match pattern with repl, performing at most count replacements (any remaining matches are left untouched), and returns the resulting string. If nothing in string matches pattern, string is returned unchanged. repl can be a string or a function. If repl is a string, backslash escapes in it are processed: \n is converted to a newline character, and a backslash followed by a number is replaced with the text of the corresponding group, so \6 stands for whatever group 6 of pattern matched.
Example:

>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'


If repl is a function, it is called once for every match of pattern. It receives the MatchObject for that match as its only argument and returns a string, which is inserted in place of the matched text.
Example:

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'


Zero-length matches are replaced as well. The output below is from Python 2; since Python 3.7 an empty match directly after a non-empty match is also replaced, giving '-a-b-c--d-'.

>>> re.sub('x*', '-', 'abcxxd')
'-a-b-c-d-'


In addition, the replacement string can use the \g<name> notation, which refers to the text matched by the named group name (defined with the (?P<name>...) syntax described earlier). \g<number> refers to a group by number, so \g<2> is normally equivalent to \2. However, if \2 needs to be followed immediately by a literal 0, it cannot be written as \20, because that means group 20; it has to be written as \g<2>0. Finally, \g<0> stands for the entire matched substring.
Example:

>>> re.sub('-(\d+)-', '-\g<1>0\g<0>', 'a-11-b-22-c')
'a-110-11-b-220-22-c'

re.subn(pattern, repl, string[, count])

Performs the same operation as sub() above, but returns a tuple (new string, number of substitutions made).
Again, an example:

>>> re.subn('-(\d+)-', '-\g<1>0\g<0>', 'a-11-b-22-c')
('a-110-11-b-220-22-c', 2)



re.escape(string)

Returns string with a backslash added before every character except letters and digits.

>>> print re.escape('abc123_@#$')
abc123\_\@\#\$

exception re.error

Raised when a string cannot be compiled into a valid regular expression, or when some other error occurs during compilation or matching. It is never raised merely because a string contains no match for a pattern.
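
The distinction between an invalid pattern and a pattern that simply finds nothing can be sketched as follows. This assumes Python 3; try_compile is a hypothetical helper written for this illustration, not part of the re module:

```python
import re

def try_compile(pattern):
    """Hypothetical helper: return a compiled pattern, or None if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error:
        return None

# An unbalanced parenthesis is a compilation error, so re.error is raised.
invalid = try_compile("(unclosed")

# A valid pattern that finds nothing just returns None; no exception is involved.
no_match = re.compile("x").search("abc")

print(invalid, no_match)
```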

Regular expression objects

Regular expression objects are returned by re.compile(). They have the following methods and attributes.

match(string[, pos[, endpos]])

Works like the module-level match() function; the difference lies in the two extra parameters.
pos is the index at which the search starts and defaults to 0. endpos is the index at which the search stops; if endpos is less than pos, no match is found. In other words, only the characters from pos to endpos - 1 are searched.
Example:

>>> pattern = re.compile("o")
>>> pattern.match("dog")     # no match, because "o" is not at the start of "dog"
>>> pattern.match("dog", 1)  # match, because "o" is the second character of "dog"
<_sre.SRE_Match object at ...>

search(string[, pos[, endpos]])

Works like the module-level search() function; the pos and endpos parameters behave as described for match() above.
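
The pos and endpos parameters can be tried out with search(). A minimal sketch assuming Python 3, with made-up sample strings; it also shows that passing pos is not quite the same as slicing the string, because '^' still refers to the real start of the string:

```python
import re

pattern = re.compile(r"o")

# Start looking at index 2: the first 'o' at index 1 is skipped.
from_two = pattern.search("dog food", 2)

# Only indices 2..4 ('g f') are searched, so nothing is found.
window = pattern.search("dog food", 2, 5)

caret = re.compile(r"^o")
with_pos = caret.search("dog", 1)    # '^' does not match at index 1
sliced = caret.search("dog"[1:])     # the slice really starts with 'o'

print(from_two.start(), window, with_pos, sliced.group())
```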

split(string[, maxsplit=0])
findall(string[, pos[, endpos]])
finditer(string[, pos[, endpos]])
sub(repl, string[, count=0])
subn(repl, string[, count=0])
These methods behave like the module-level functions of the same names.

flags
The flags that were given when the RE was compiled, or 0 if no flags were given.

>>> pattern = re.compile("o", re.S|re.U)
>>> pattern.flags
48

groups
The number of capturing groups in the RE.

groupindex

A dictionary mapping the names of named groups to their group numbers.
Example: this regular expression has 3 groups; the first is named quhao (area code), the last is named fenjihao (extension number), and the middle one is unnamed.

>>> pattern = re.compile("(?P<quhao>\d+)-(\d+)-(?P<fenjihao>\d+)")
>>> pattern.groups
3
>>> pattern.groupindex
{'fenjihao': 3, 'quhao': 1}

pattern

The pattern string from which the RE was compiled; its source code, so to speak.
Taking the regular expression above again, you can see that it is returned unchanged:

>>> print pattern.pattern
(?P<quhao>\d+)-(\d+)-(?P<fenjihao>\d+)

Match objects

A re.MatchObject always evaluates to True in a boolean context, so an if statement can be used to test whether match() succeeded.
Match objects have the following methods and attributes:

expand(template)

Expands template using the MatchObject, performing the same substitutions as sub(). See the example:

>>> m = re.match('a=(\d+)', 'a=100')
>>> m.expand('above a is \g<1>')
'above a is 100'
>>> m.expand(r'above a is \1')
'above a is 100'



group([group1, ...])

Returns one or more subgroups of the match. With a single argument the result is a single string; with multiple arguments it is a tuple with one item per argument. Without arguments, group1 defaults to 0 and the whole match is returned. If a group did not take part in the match, the corresponding result is None. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)     # the entire match
'Isaac Newton'
>>> m.group(1)     # the first subgroup
'Isaac'
>>> m.group(2)     # the second subgroup
'Newton'
>>> m.group(1, 2)  # a tuple of several subgroups
('Isaac', 'Newton')

If the regular expression uses the (?P<name>...) syntax to name a group, the groupN arguments can also be group names given as strings. For example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'

If a group matches more than once, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # the group matches 3 times
>>> m.group(1)                        # only the last match is returned
'c3'



groups([default])

Returns a tuple containing all the subgroups of the match. The default argument is the value used for groups that did not take part in the match; it defaults to None.
For example:

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')

The effect of default:

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()     # the second group defaults to None
('24', None)
>>> m.groups('0')  # now the second group defaults to '0'
('24', '0')

groupdict([default])

Returns a dictionary containing all the named subgroups of the match, keyed by group name. The default argument is the value used for groups that did not take part in the match; it defaults to None. For example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
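
The default argument of groupdict() only matters when a named group did not take part in the match. A sketch assuming Python 3; the group names whole and frac are hypothetical, chosen for this illustration:

```python
import re

# 'frac' is optional, so it stays unmatched for the input "24".
m = re.match(r"(?P<whole>\d+)\.?(?P<frac>\d+)?", "24")

without_default = m.groupdict()     # unmatched groups map to None
with_default = m.groupdict("0")     # unmatched groups map to '0'

print(without_default, with_default)
```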



start([group])
end([group])

Return the indices of the start and end of the substring matched by group. If group is omitted or 0, the whole match is used. If group exists but did not take part in the match, -1 is returned.
For a match object m and a group g that did take part in the match, m.group(g) is equivalent to m.string[m.start(g):m.end(g)].
Note: if group matched an empty string, m.start(group) and m.end(group) are equal.
For example:

>>> m = re.search('b(c?)', 'cba')
>>> m.start(0)
1
>>> m.end(0)
2
>>> m.start(1)
2
>>> m.end(1)
2


Here is an example that removes "remove_this" from an email address:

>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'

span([group])
Returns the tuple (m.start(group), m.end(group)).
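
span() is convenient when the positions are needed as a pair, for example to slice the original string. A minimal sketch assuming Python 3; the sample string is made up for illustration:

```python
import re

text = "ext 010-12345"
m = re.search(r"(\d+)-(\d+)", text)

whole_span = m.span()     # span of the entire match
first_span = m.span(1)    # span of group 1
second_span = m.span(2)   # span of group 2

# Slicing the original string with a span gives back the group's text.
start, end = second_span
print(whole_span, first_span, second_span, text[start:end])
```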

pos
The value of pos that was passed to the search() or match() method of the RE object; the index at which the RE engine started looking for a match.

endpos
The value of endpos that was passed to the search() or match() method of the RE object; the index beyond which the RE engine did not look.

lastindex
The integer index of the last matched capturing group, or None if no group was matched.
For example, (a)b, ((a)(b)) and ((ab)) all give lastindex == 1 when applied to 'ab', while (a)(b) gives lastindex == 2.
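
The four patterns from the description can be checked in one loop. A minimal sketch assuming Python 3:

```python
import re

patterns = ["(a)b", "((a)(b))", "((ab))", "(a)(b)"]

# lastindex reports the group that was closed last, so nested patterns
# whose outermost group closes last report 1.
last_indexes = [re.match(p, "ab").lastindex for p in patterns]

# With no groups at all, lastindex is None.
no_groups = re.match("ab", "ab").lastindex

print(last_indexes, no_groups)
```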

lastgroup
The name of the last matched capturing group, or None if no group was matched or the group has no name.

re
The regular expression object whose search() or match() method produced this match object.

string
The string passed to search() or match().


The examples that close the official documentation are omitted here, since I have added many examples of my own throughout the text. For more examples, refer to the English documentation (https://docs.python.org/2/library/re.html).
