Regular expressions support two basic operations: matching and substitution.
Matching searches a text string for a particular expression.
Substitution finds the parts of a string that match a particular expression and replaces them.
1. Basic elements
A regular expression is built from a set of special character elements that control how matching is performed.
Basic regular expression characters
Character | Description
text | Matches the literal text string
. | Matches any single character except a line break
^ | Matches the beginning of a string
$ | Matches the end of a string
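As a minimal sketch of the basic characters in the table above (the sample string is made up for illustration, and the code assumes Python 3):

```python
import re

text = "cat\ncot"  # two short lines, chosen only for illustration

# Literal text matches itself.
literal = re.search(r"cat", text).group()

# "." matches any single character except a line break.
dot_matches = re.findall(r"c.t", text)
print(dot_matches)  # ['cat', 'cot']

# "^" anchors the match at the start of the string, "$" at the end.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cot = re.search(r"cot$", text) is not None

# "." does not cross the line break, so "t.c" finds nothing here.
crosses_newline = re.search(r"t.c", text) is not None
print(starts_with_cat, ends_with_cot, crosses_newline)  # True True False
```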
In regular expressions, we can also use match quantifiers to constrain how many times an element is matched.
Match quantifiers
Maximum match (greedy) | Minimum match (non-greedy) | Description
* | *? | Matches the preceding expression 0 or more times
+ | +? | Matches the preceding expression 1 or more times
? | ?? | Matches the preceding expression 0 or 1 time
{m} | {m}? | Matches the preceding expression exactly m times
{m,} | {m,}? | Matches the preceding expression at least m times
{m,n} | {m,n}? | Matches the preceding expression at least m times and at most n times
In the table above, ".*" is a maximum (greedy) match: it matches as much of the source string as it can. ".*?" is a minimum (non-greedy) match: it stops at the first point where the rest of the pattern can match. For example, d.*g matches any string that starts with "d" and ends with "g", such as "debug" and "debugging", or even all of "dog is walking". In contrast, d.*?g matches only "debug" in "debugging", and only "dog" in "dog is walking".
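The difference between the two forms can be checked directly. This is a small sketch that reuses the example strings from the paragraph above; the variable names are only illustrative.

```python
import re

results = {}
for s in ("debugging", "dog is walking"):
    greedy = re.search(r"d.*g", s).group()   # maximum match
    lazy = re.search(r"d.*?g", s).group()    # minimum match
    results[s] = (greedy, lazy)
    print(repr(s), "->", repr(greedy), repr(lazy))

# 'debugging' -> 'debugging' 'debug'
# 'dog is walking' -> 'dog is walking' 'dog'
```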
More complex matches can use groups and operators.
Groups and operators
Group | Description
[...] | Matches one character from a set, such as [a-z], [1-9] or [,./;']
[^...] | Matches any character not in the set, that is, the inverse of [...]
A|B | Matches expression A or expression B, equivalent to an OR operation
(...) | Groups expressions; each pair of parentheses is one group, such as ([a-b]+)([a-z]+)([1-9]+)
\number | Matches the same text that was matched by the group with that number
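The groups and operators listed above can be tried out as follows. The sample strings are invented for illustration, and the sketch assumes Python 3.

```python
import re

# Character set and negated set.
vowels = re.findall(r"[aeiou]", "regex")          # ['e', 'e']
non_digits = re.findall(r"[^0-9]", "a1b2")        # ['a', 'b']

# Alternation: match either word.
pets = re.findall(r"cat|dog", "cat, dog, bird")   # ['cat', 'dog']

# Groups: each pair of parentheses captures part of the match.
m = re.search(r"([a-z]+)-([0-9]+)", "item-42")
parts = m.groups()                                # ('item', '42')

# Backreference: \1 must repeat whatever group 1 matched.
doubled = re.search(r"([a-z]+) \1", "it is is fine").group()
print(doubled)  # is is
```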
There is also a special set of character sequences that match specific character types or character contexts. For example, \b matches a word boundary: food\b matches "food" and "zoofood", but does not match "foodies".
Special character sequences
Character | Description
\A | Matches only at the beginning of the string
\b | Matches a word boundary
\B | Matches a position that is not a word boundary
\d | Matches any decimal digit character, equivalent to r'[0-9]' (for ASCII text)
\D | Matches any non-digit character, equivalent to r'[^0-9]' (for ASCII text)
\s | Matches any whitespace character (space, tab, line feed, carriage return, form feed, vertical tab)
\S | Matches any non-whitespace character
\w | Matches any alphanumeric character or underscore
\W | Matches any character that is not alphanumeric or underscore
\Z | Matches only at the end of the string
\\ | Matches a backslash character
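A short sketch of a few of these sequences in use. The word list comes from the food\b example; the other strings are made up for illustration.

```python
import re

# \b: "food" must end at a word boundary.
candidates = ["food", "zoofood", "foodies"]
matched = [w for w in candidates if re.search(r"food\b", w)]
print(matched)  # ['food', 'zoofood']

# \d, \w and \s pick out digits, word characters, and whitespace.
digits = re.findall(r"\d+", "room 101, floor 7")   # ['101', '7']
words = re.findall(r"\w+", "tab\tand space")       # ['tab', 'and', 'space']
fields = re.split(r"\s+", "a  b\tc\nd")            # ['a', 'b', 'c', 'd']

# \A and \Z anchor to the very start and end of the string.
whole = re.search(r"\A\d+\Z", "2024") is not None       # True
partial = re.search(r"\A\d+\Z", "2024 AD") is not None  # False
```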
A set of assertions (extension notations) tests for a specific condition or changes how a group behaves. All of them start with "(?".
Regular expression assertions
Assertion | Description
(?iLmsux) | Matches the empty string; the letters i, L, m, s, u, and x each turn on the corresponding regular expression flag (see the flags table below).
(?:...) | Matches the expression inside the parentheses, but does not save it as a numbered group.
(?P<name>...) | Matches the expression inside the parentheses, and the matched text can also be referred to by the group name "name".
(?P=name) | Matches the same text that was matched by the earlier group named "name".
(?#...) | Introduces a comment; the contents of the parentheses are ignored.
(?=...) | Matches only if the given expression matches the text that comes next, without consuming that text. This allows a look-ahead check without affecting how the rest of the regular expression is matched. For example, Martin(?=Brown) matches "Martin" only when it is followed by "Brown".
(?!...) | Matches only if the given expression does not match the text that comes next; the inverse of (?=...).
(?<=...) | Matches only if the text immediately before the current position matches the given expression. For example, (?<=abc)def matches the "def" in "abcdef". The expression must match a fixed number of characters.
(?<!...) | Matches only if the text immediately before the current position does not match the given expression; the inverse of (?<=...).
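The assertions can be illustrated with a few short searches. This sketch reuses the "Martin"/"Brown" and "abcdef" examples from the table; the remaining strings and variable names are assumptions made for illustration.

```python
import re

# (?=...) lookahead: "Brown" is checked but not consumed.
la = re.search(r"Martin(?=Brown)", "MartinBrown")
print(la.group())  # Martin

# (?!...) negative lookahead: fails here because "Brown" does follow.
no_la = re.search(r"Martin(?!Brown)", "MartinBrown")   # None

# (?<=...) lookbehind and its inverse (?<!...).
lb = re.search(r"(?<=abc)def", "abcdef")               # matches 'def'
no_lb = re.search(r"(?<!abc)def", "abcdef")            # None

# (?P<name>...) names a group; (?P=name) must repeat its text.
named = re.search(r"(?P<word>[a-z]+) (?P=word)", "go go now")

# (?:...) groups without capturing, so only (c) is a numbered group.
nc = re.search(r"(?:ab)+(c)", "ababc")

# (?#...) is a comment and does not affect the match.
commented = re.search(r"\d+(?# one or more digits)", "id 77")
```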
Regular expressions also support processing flags, which change how a pattern is matched.
Processing flags
Flag | Description
I or IGNORECASE | Ignores case when matching the expression against the text.
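A minimal sketch of the IGNORECASE flag, passed either as an argument or inline as (?i). The sample text is made up, and the code assumes Python 3.

```python
import re

text = "Python and PYTHON and python"

case_sensitive = re.findall(r"python", text)
with_flag = re.findall(r"python", text, re.IGNORECASE)
short_flag = re.findall(r"python", text, re.I)   # re.I is the short name
inline = re.findall(r"(?i)python", text)         # same flag, set inside the pattern

print(case_sensitive)  # ['python']
print(with_flag)       # ['Python', 'PYTHON', 'python']
```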
2. Operation
With the re module, we can use regular expressions in Python to search, extract, and replace strings. For example, the re.search() function performs a basic search and returns a MatchObject object (or None if nothing matches), while the re.findall() function returns a list of all matches.
The code is as follows:
>>> import re
>>> a = "This is my re module test"
>>> obj = re.search(r'.*is', a)
>>> print(obj)
<_sre.SRE_Match object at 0xb7d7a218>
>>> obj.group()
'This is'
>>> re.findall(r'.*is', a)
['This is']
MatchObject methods and attributes
Method | Description
m.expand(template) | Returns the template string with backslash escapes and group references replaced by the matched text.
m.group([group, ...]) | Returns the matched text. With no argument it returns the whole match; with one group number or name it returns that group as a string; with several groups it returns a tuple.
m.groups([default]) | Returns a tuple containing the text matched by every group in the pattern. If default is given, it is used as the value for groups that did not take part in the match. The default value is None.
m.groupdict([default]) | Returns a dictionary containing all named groups in the match. If default is given, it is used as the value for groups that did not take part in the match. The default value is None.
m.start([group]) | Returns the start position of the specified group, or of the whole match if no group is given.
m.end([group]) | Returns the end position of the specified group, or of the whole match if no group is given.
m.span([group]) | Returns a two-element tuple equivalent to (m.start(group), m.end(group)) for the given group or for the whole match.
m.pos | The pos value passed to match() or search().
m.endpos | The endpos value passed to match() or search().
m.lastindex | The index of the last group that matched, or None if no group matched.
m.lastgroup | The name of the last group that matched, or None if it has no name or no group matched.
m.re | The regular expression object that produced this MatchObject.
m.string | The string passed to match() or search().
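The following sketch exercises several of these methods and attributes on one match. The pattern and the input string are hypothetical, chosen so that one group does not take part in the match.

```python
import re

m = re.search(r"(?P<user>\w+)@(?P<host>\w+)(\.\w+)?", "mail bob@example now")

whole = m.group()            # 'bob@example'
first = m.group(1)           # 'bob'
host = m.group("host")       # 'example'
pair = m.group(1, 2)         # ('bob', 'example')

all_groups = m.groups()      # ('bob', 'example', None): group 3 did not match
filled = m.groups("")        # ('bob', 'example', '')
by_name = m.groupdict()      # {'user': 'bob', 'host': 'example'}

span = m.span()              # (5, 16), same as (m.start(), m.end())
host_span = m.span("host")   # (9, 16)

# expand() fills a template with the matched groups.
expanded = m.expand(r"\g<host>:\1")   # 'example:bob'
print(whole, span, expanded)
```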
Use the sub() or subn() function to perform a substitution on a string. The basic format of the sub() function is as follows:
sub(pattern, replace, string[, count])
Example
The code is as follows:
>>> text = 'The dog on my bed'
>>> rep = re.sub('dog', 'cat', text)
>>> print(rep)
The cat on my bed
The replace parameter can also be a function. Use the subn() function to get the number of replacements: it returns a tuple containing the substituted text and the number of substitutions made.
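A small sketch of both points: a function used as the replacement, and subn() reporting the count. The helper name double and the sample strings are made up for illustration.

```python
import re

def double(match):
    # The replacement function receives a match object and returns a string.
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# count limits how many replacements are made.
limited = re.sub(r"a", "A", "banana", count=2)
print(limited)  # bAnAna

# subn() returns the new text together with the number of substitutions.
new_text, n = re.subn("dog", "cat", "dog eat dog")
print(new_text, n)  # cat eat cat 2
```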
If we need to match the same regular expression many times, we can compile it into an internal form to improve processing speed. Compilation is done with the compile() function, whose basic format is: compile(str[, flags])
str is the regular expression string to compile, and flags is the set of modifier flags. The compiled regular expression is an object that has several methods and properties.
Regular expression object methods/properties
Method/Property | Description
r.search(string[, pos[, endpos]]) | Same as the search() function, but also lets you specify the start and end positions of the search
r.match(string[, pos[, endpos]]) | Same as the match() function, but also lets you specify the start and end positions of the search
r.split(string[, maxsplit]) | Same as the split() function
r.findall(string) | Same as the findall() function
r.sub(replace, string[, count]) | Same as the sub() function
r.subn(replace, string[, count]) | Same as the subn() function
r.flags | The flags defined when the object was created
r.groupindex | A dictionary that maps the group names defined with r'(?P<id>...)' to their group numbers
r.pattern | The pattern used when the object was created
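As a sketch of how a compiled object might be reused, assuming Python 3. The pattern, the name date_re, and the sample lines are hypothetical.

```python
import re

# Compile once, then reuse the object for every line.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})", re.IGNORECASE)
lines = ["2017-08", "n/a", "released 2016-12"]

# match() only tries at the start position; search() scans the string.
matched = [date_re.match(line) is not None for line in lines]
searched = [date_re.search(line) is not None for line in lines]
print(matched)   # [True, False, False]
print(searched)  # [True, False, True]

# The pos argument moves the start position for match().
year = date_re.match("released 2016-12", 9).group("year")  # '2016'

# groupindex maps group names to group numbers.
index = dict(date_re.groupindex)   # {'year': 1, 'month': 2}
has_ignorecase = bool(date_re.flags & re.IGNORECASE)

pieces = re.compile(r",\s*").split("a, b,c")   # ['a', 'b', 'c']
```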
To escape the special characters in a string so that it is matched literally, use the re.escape() function.
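A minimal sketch of why escaping matters. The sample strings are made up; without re.escape(), characters such as "+", "(" and "?" are read as operators.

```python
import re

literal = "1+1=2 (really?)"
haystack = "proof: 1+1=2 (really?)"

# Used directly as a pattern, the special characters change the meaning,
# so the literal text is not found.
unescaped = re.search(literal, haystack)

# re.escape() backslash-escapes the special characters first.
escaped = re.search(re.escape(literal), haystack)

print(unescaped)         # None
print(escaped.group())   # 1+1=2 (really?)
```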
Getting object references through getattr
The code is as follows:
>>> li = ['a', 'b']
>>> getattr(li, 'append')
>>> getattr(li, 'append')('c')  # equivalent to li.append('c')
>>> li
['a', 'b', 'c']
>>> handler = getattr(li, 'append', None)
>>> handler
<built-in method append of list object at 0xb7d4a52c>
>>> handler('cc')  # equivalent to li.append('cc')
>>> li
['a', 'b', 'c', 'cc']
>>> result = handler('bb')
>>> li
['a', 'b', 'c', 'cc', 'bb']
>>> print(result)
None
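One common use of this pattern is calling a method by name, with the third argument of getattr as a fallback when the attribute does not exist. The sketch below is only an illustration; the class Greeter and the helper call_by_name are hypothetical names.

```python
class Greeter:
    # Hypothetical class used only for this example.
    def hello(self, name):
        return "hello, " + name


def call_by_name(obj, method_name, *args):
    # getattr returns None here instead of raising AttributeError
    # when the attribute does not exist.
    handler = getattr(obj, method_name, None)
    if not callable(handler):
        return None
    return handler(*args)


g = Greeter()
found = call_by_name(g, "hello", "world")
missing = call_by_name(g, "goodbye", "world")
print(found)    # hello, world
print(missing)  # None
```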
This article is from the "Big Plum" blog; please keep this source link when reposting: http://n1lixing.blog.51cto.com/11772222/1954242
Python Regular Expressions