Regular Expressions
Roughly speaking, a regular expression is matched by comparing the pattern against the text character by character: if every character matches, the match succeeds; otherwise the match fails.
In Python, the re module provides the regular expression operations.
I. Functions for matching
1. findall(pattern, string, flags=0)
Finds every non-overlapping match of the pattern in the string and returns the matches as a list.
import re
obj = re.findall(r'\d+', 'hhh90080mmmbb2233pp')
print(obj)  # ['90080', '2233']
2. match(pattern, string, flags=0)
Tries to match the pattern at the starting position of the string, and matches at most once. On success it returns a match object.
Note: the beginning of the string must match the regular expression; otherwise None is returned.
import re
obj = re.match(r'\d+', '0008lkk')
print(obj, type(obj))
if obj:
    print(obj.group())
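The note about the starting position is easiest to see by running match() and search() on the same input. The following is a small sketch; the sample string "lkk0008" is made up for illustration.

```python
import re

text = "lkk0008"

# match() only tries at position 0, so digits later in the string are not found.
at_start = re.match(r"\d+", text)
print(at_start)  # None

# search() scans forward and returns the first match anywhere in the string.
anywhere = re.search(r"\d+", text)
print(anywhere.group())  # 0008
```

Because match() may return None, it is safest to test the result before calling group() on it.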
3. search(pattern, string, flags=0)
Scans the string for the first location where the pattern matches, and matches at most once. On success it returns a match object; otherwise it returns None.
import re
obj = re.search(r'\d+', 'hhh90080mmmbb2233pp')
if obj:
    print(obj.group())  # 90080
4. group() and groups()
Groups are created with (): any part of a regular expression can be enclosed in parentheses to form a group.
group() returns the substring captured by one or more groups; with no argument, or with 0, it returns the whole match.
groups() returns the substrings captured by all groups as a tuple.
import re
a = "123abc456ooo"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))  # 123abc456; 0 means the entire matched substring, and group() with no argument is equivalent to group(0)
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1, 2))  # with several arguments, the results are returned as a tuple: ('123', 'abc')
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))  # with a single argument, the string 'abc' is returned
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).groups())  # ('123', 'abc', '456')
5. sub(pattern, repl, string, count=0, flags=0)
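Replaces the substrings that match a regular expression.
import re
s = "Hello World 001,789 Welcome"
s_c = re.sub(r'\d+', 'amy', s, count=1)  # replace only the first match; without count, every match is replaced
print(s_c)  # Hello World amy,789 Welcome
This is more powerful than str.replace(), which can only replace a fixed substring.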
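To illustrate the comparison with str.replace(), here is a short sketch of two things re.sub() can do that a fixed-string replacement cannot: reuse the matched text through a backreference, and compute the replacement with a function. The variable names are only illustrative.

```python
import re

s = "Hello World 001,789 Welcome"

# str.replace() needs the exact substring; it cannot express "any run of digits".
print(s.replace("001", "amy"))  # Hello World amy,789 Welcome

# Backreference: \1 in the replacement refers to the text captured by group 1.
bracketed = re.sub(r"(\d+)", r"[\1]", s)
print(bracketed)  # Hello World [001],[789] Welcome

# Function as replacement: it receives each match object and returns a string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), s)
print(doubled)  # Hello World 2,1578 Welcome
```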
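6. split(pattern, string, maxsplit=0, flags=0)
Splits the string at every place where the regular expression matches.
Note: if the pattern matches at the very end of the string, the resulting list ends with an empty string.
import re
n = "split1nnnnn2mmmmm3"
n_c = re.split('[0-9]', n, maxsplit=1)  # split only once; without maxsplit, the string is split at every match
print(n_c)  # ['split', 'nnnnn2mmmmm3']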
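The note about a match at the end of the string can be checked directly. This sketch reuses the sample string from this section and splits without maxsplit; filtering out empty pieces is just one possible way to clean up the result.

```python
import re

n = "split1nnnnn2mmmmm3"

# The final "3" is a match, so an empty string follows it in the result.
parts = re.split(r"[0-9]", n)
print(parts)  # ['split', 'nnnnn', 'mmmmm', '']

# Dropping empty pieces afterwards, if they are not wanted.
cleaned = [p for p in parts if p]
print(cleaned)  # ['split', 'nnnnn', 'mmmmm']
```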
7. compile(pattern[, flags])
Compiles a regular expression given as a string into a pattern object.
import re
text = "you is so cool, oo"
regex = re.compile(r'\w*oo\w*')  # compile the regular expression into a pattern object
print(regex.findall(text))  # find all the words that contain "oo": ['cool', 'oo']
When text is matched with the pattern object's match() or search() method, the result is a match object, or None when nothing matches.
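A compiled pattern is mainly useful when the same expression is applied many times. The sketch below reuses one pattern object across several strings; the sample lines and the name word_with_oo are invented for illustration.

```python
import re

# Compile once, then call the pattern object's own methods.
word_with_oo = re.compile(r"\w*oo\w*")

lines = ["you is so cool, oo", "no match here", "a good book"]
results = [word_with_oo.findall(line) for line in lines]
print(results)  # [['cool', 'oo'], [], ['good', 'book']]

first = word_with_oo.search("a good book")
print(first.group())  # good

# match() anchors at the start of the string, so it returns None here.
missing = word_with_oo.match("no match here")
print(missing)  # None
```

The pattern object offers the same operations as the module-level functions (findall, match, search, sub, split), without the pattern argument.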
II. Matching syntax
| Syntax | Description | Example pattern | Matched string |
| --- | --- | --- | --- |
| Characters | | | |
| Ordinary characters | Match themselves | abc | abc |
| . | Matches any character except a newline. If the DOTALL flag is specified, matches any character, including a newline | a.c | akc |
| \ | Escape character; makes the character after it literal | a\.c | a.c |
| [...] | Character set. [a*] matches a or *; [a-z] matches any character from a to z; [^f] matches any character that is not f; [\d] matches a digit. Inside a set, special characters lose their special meaning, except - ^ \ ] | [a*]k | ak, *k |
| Quantifiers (placed after a character or a (...) group) | | | |
| * | Matches the preceding character 0 or more times | abc* | ab, abccc |
| + | Matches the preceding character 1 or more times | abc+ | abc, abccc |
| ? | Matches the preceding character 0 or 1 times | abc? | ab, abc |
| {m} | Matches the preceding character exactly m times | ab{2}c | abbc |
| {m,n} | Matches the preceding character m to n times; if m is omitted, 0 to n times; if n is omitted, m or more times | ab{1,2}c | abc, abbc |
| Predefined character classes (can also be written inside a [...] set) | | | |
| \d | Digit: [0-9] | a\dc | a1c |
| \D | Non-digit: [^\d] | a\Dc | abc |
| \w | Word character (letter, digit, or underscore): [a-zA-Z0-9_] | a\wc | abc |
| \W | Non-word character: [^\w] | a\Wc | a c |
| Boundary matching | | | |
| ^ | Matches the start of the string | ^abc | abc |
| $ | Matches the end of the string | abc$ | abc |
| \A | Matches only at the start of the string | \Aabc | abc |
| \Z | Matches only at the end of the string | abc\Z | abc |
| \b | Matches between \w and \W, that is, at the start or end of a word | a\b!bc | a!bc |
| \B | [^\b]: matches where there is no word boundary | a\Bbc | abc |
| Logic and grouping | | | |
| \| | Matches either the expression on its left or the one on its right. The left side is tried first; if it matches, the right side is skipped | abc\|def | abc, def |
| (...) | The enclosed expression becomes a group. A group is treated as a unit and can be followed by a quantifier. A \| inside the group applies only within that group. Groups are numbered from 1, counting opening parentheses from the left of the expression | (abc){2}, a(123\|456)c | abcabc, a456c |
Note: in the table, \| stands for the single character |; the backslash is there only to keep the table columns intact.
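Several rows of the syntax table can be checked with a few lines of Python. This is only a sketch: the sample strings are invented, and re.fullmatch() is used so that a pattern has to cover the entire string.

```python
import re

# Quantifiers
assert re.fullmatch(r"abc*", "ab")                   # * allows zero repetitions
assert re.fullmatch(r"abc+", "abccc")                # + needs at least one
assert re.fullmatch(r"ab{2}c", "abbc")               # exactly two b's
assert re.fullmatch(r"ab{1,2}c", "abbbc") is None    # three b's is one too many

# Character sets and predefined classes
in_set = re.findall(r"[a*]k", "ak *k bk")
print(in_set)  # ['ak', '*k']
non_digits = re.findall(r"\D+", "ab12cd")
print(non_digits)  # ['ab', 'cd']

# Word boundaries: only the standalone word is found
whole_word = re.findall(r"\bcat\b", "cat concat cats")
print(whole_word)  # ['cat']

# Alternation inside a group, and group numbering starting at 1
m = re.search(r"a(123|456)c", "xa456cx")
print(m.group(0), m.group(1))  # a456c 456
repeated = re.fullmatch(r"(abc){2}", "abcabc")
print(repeated.group(1))  # abc (a repeated group keeps its last repetition)
```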
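III. Flags
The flags argument accepts the following constants from the re module:
re.I (re.IGNORECASE): ignore case
re.L (re.LOCALE): assume the current 8-bit locale
re.U (re.UNICODE): assume the Unicode locale
re.M (re.MULTILINE): make anchors look for newlines
re.S (re.DOTALL): make dot match newline
re.X (re.VERBOSE): ignore whitespace and comments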
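The effect of the most common flags can be seen side by side in a short sketch. The sample text and the pattern for a decimal number are made up for illustration; flags are combined with the | operator.

```python
import re

text = "Hello\nworld"

ignore_case = re.findall(r"hello", text, re.I)
print(ignore_case)  # ['Hello']

# Without re.M, ^ matches only at the start of the whole string.
single = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, re.M)
print(single, multi)  # ['Hello'] ['Hello', 'world']

# Without re.S, the dot does not match the newline between the two words.
no_dotall = re.search(r"Hello.world", text)
with_dotall = re.search(r"Hello.world", text, re.S)
print(no_dotall, with_dotall is not None)  # None True

# re.X lets the pattern be spread over lines and commented.
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", re.X)
decimal = number.search("pi is 3.14").group()
print(decimal)  # 3.14

combined = re.findall(r"^world$", "Hello\nWORLD", re.I | re.M)
print(combined)  # ['WORLD']
```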
IV. Raw strings (the r prefix)
Regular expressions use "\" as the escape character, and so do ordinary string literals, so backslashes pile up quickly. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs 4 backslashes, "\\\\": the programming language first turns each pair into one backslash, leaving two, and the regular expression engine then reads those two as one literal backslash. Python's raw strings solve this problem neatly: the same regular expression can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about dropping a backslash, and the expressions are easier to read.
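The equivalence between the two spellings can be verified directly. In this sketch the Windows-style path is just a convenient sample string that contains one backslash.

```python
import re

path = "C:\\temp"  # the actual text is C:\temp, with a single backslash

# Both literals produce the same two-character pattern: backslash, backslash.
assert "\\\\" == r"\\"
print(len("\\\\"), len(r"\\"))  # 2 2

# Either spelling matches the one literal backslash in the text.
pieces = re.split(r"\\", path)
print(pieces)  # ['C:', 'temp']

# The same holds for \d: the raw form needs no doubling.
assert "\\d" == r"\d"
digits = re.findall(r"\d", "a1b2")
print(digits)  # ['1', '2']
```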
Learning content from: http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html