In principle, and to say only a little about the theory, a regular expression is a finite state machine. It can only describe regular languages, so it cannot track context such as nested structure.
In practice, though, it is very flexible: skilled users can do remarkably clever things with it, and it saves a lot of effort.
1. Common Regular Expression symbols
Symbol | Description | Example
literal | Match the literal text of the string | foo
re1|re2 | Match regular expression re1 or re2 | foo|bar
. | Match any character except \n | b.b
^ | Match the start of the string | ^dear
$ | Match the end of the string | /bin/*sh$
* | Match 0 or more occurrences of the preceding regular expression | [A-Za-z0-9]*
+ | Match 1 or more occurrences of the preceding regular expression | [a-z]+\.com
? | Match 0 or 1 occurrence of the preceding regular expression | goo?
{N} | Match exactly N occurrences of the preceding regular expression | [0-9]{3}
{M,N} | Match from M to N occurrences of the preceding regular expression | [0-9]{5,9}
[...] | Match any single character from the character set | [aeiou]
[...x-y...] | Match any single character in the range x to y | [0-9], [A-Za-z]
[^...] | Match any single character that is not in the character set, including any ranges it contains | [^aeiou], [^A-Za-z0-9]
(*|+|?|{})? | Non-greedy versions of the repetition symbols above | .*?[a-z]
(...) | Match the enclosed regular expression and save it as a subgroup | ([0-9]{3})?, f(oo|u)bar
\d | Match any decimal digit | data\d.txt
\D | Match any character that is not a decimal digit |
\w | Match any alphanumeric character or underscore; equivalent to [A-Za-z0-9_] for ASCII text | [A-Za-z]\w+
\s | Match any whitespace character, [ \n\t\r\v\f] | of\sthe
\S | Match any non-whitespace character, [^ \n\t\r\v\f] |
\b | Match any word boundary | \bthe\b
\B | The opposite of \b |
\N | Match the saved subgroup N |
\c | Match the special character c verbatim (that is, escaped, without its special meaning) | \. \\ \*
\A (\Z) | Match the start (end) of the string |
Extended notation
(?aiLmsux) | Embed one or more flags in the regular expression itself (instead of passing them to a function or method). i: ignore case; m: multiline, ^ and $ match at the start and end of each line (\A and \Z do not); s: "." matches every character, including \n; a: ASCII-only matching; u: Unicode matching; L: locale-dependent; x: verbose | (?x), (?im)
(?:...) | A group that matches without being saved | (?:\w+\.)*
(?P<name>...) | Like a regular group, but it can be referred to by name (it still receives a numeric ID as well) | (?P<data>)
(?P=name) | Match the text previously matched by the group (?P<name>...) in the same string | (?P=data)
(?#...) | A comment; all of its content is ignored | (?#comment)
(?=...) | Match only if ... comes next, without consuming any of the input string; called a positive lookahead assertion | (?=.com)
(?!...) | Match only if ... does not come next, without consuming any of the input string; called a negative lookahead assertion | (?!.net)
(?<=...) | Match only if ... comes immediately before the current position, without consuming any of the input string; called a positive lookbehind assertion | (?<=800-)
(?<!...) | Match only if ... does not come immediately before the current position, without consuming any of the input string; called a negative lookbehind assertion | (?<!192\.168\.)
(?(id/name)Y|N) | If the group with the given id or name matched, continue matching with Y, otherwise with N; |N is optional | (?(1)y|x)
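A few of the symbols in the table are easier to grasp when run. The following is a minimal sketch, not taken from the original post; the sample strings and variable names are made up for illustration.

```python
import re

# Greedy vs. non-greedy: ".*" takes as much as it can, ".*?" as little.
text = "<b>bold</b> and <i>italic</i>"
greedy = re.search(r"<.*>", text).group(0)
lazy = re.search(r"<.*?>", text).group(0)

# Named group plus a backreference to it: find a doubled word.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was the the best")

# Positive lookbehind: digits that directly follow the prefix "800-".
after_prefix = re.search(r"(?<=800-)\d+", "call 800-5551212 now")

print(greedy)                  # the whole string, up to the last ">"
print(lazy)                    # <b>
print(doubled.group("word"))   # the
print(after_prefix.group(0))   # 5551212
```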
2. Core functions of the re module
Function or attribute | Description
compile(pattern, flags=0) | Compile the regular expression pattern with any optional flags and return a regular expression object
match(pattern, string, flags=0) | Try to match the pattern, with optional flags, at the start of the string; return a match object on success, None on failure
search(pattern, string, flags=0) | Search for the first occurrence of the pattern anywhere in the string, with optional flags; return a match object on success, None on failure
findall(pattern, string[, flags]) | Find all non-overlapping occurrences of the pattern in the string; return a list of matches
finditer(pattern, string[, flags]) | Same as findall, but return an iterator instead of a list; the iterator yields a match object for each match
split(pattern, string, maxsplit=0) | Split the string into a list, using the pattern as the delimiter, and return the list; at most maxsplit splits are performed (by default, split at every match)
sub(pattern, repl, string, count=0) | Replace the occurrences of the pattern in the string with repl; all occurrences are replaced unless count is given
purge() | Clear the cache of implicitly compiled regular expressions
Match object methods
group(num=0) | Return the entire match, or the specific subgroup numbered num
groups(default=None) | Return a tuple containing all matched subgroups (an empty tuple if the pattern has no groups)
groupdict(default=None) | Return a dictionary containing all matched named subgroups, with the group names as keys
Flags
re.I, re.IGNORECASE | Case-insensitive matching
re.L, re.LOCALE | Make \w, \W, \b, \B, \s and \S depend on the current locale
re.M, re.MULTILINE | ^ and $ match at the start and end of each line of the target string, instead of only at the start and end of the whole string
re.S, re.DOTALL | "." normally matches any single character except \n (newline); this flag makes "." match every character
re.X, re.VERBOSE | Whitespace and # (with all text after it on the same line) are ignored, unless escaped with a backslash or inside a character class; this allows comments and improves readability
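As a rough illustration of the flags listed above, here is a small sketch; the sample text and the `phone` pattern are invented for this example and do not come from the original post.

```python
import re

text = "first line\nSecond LINE"

# re.I: case-insensitive matching.
ignore_case = re.findall(r"line", text, re.I)

# re.M: ^ matches at the start of every line, not only the string start.
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: "." also matches the newline, so one match can span both lines.
with_dotall = re.search(r"first.*LINE", text, re.S) is not None
without_dotall = re.search(r"first.*LINE", text) is not None

# re.X: whitespace and comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})   # first block of digits
    -         # separator
    (\d{4})   # second block of digits
""", re.X)

print(ignore_case)                              # ['line', 'LINE']
print(line_starts)                              # ['first', 'Second']
print(with_dotall, without_dotall)              # True False
print(phone.search("call 555-1234").groups())   # ('555', '1234')
```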
3. Common MatchObject methods:
re.match(pattern, string, flags) and re.search(pattern, string, flags) both return a MatchObject when the match succeeds. Here are some of the more commonly used MatchObject methods, which I find very practical.
1. start(groupnum=0)
This method returns the position in the string where the match for group groupnum begins.
>>> import re
>>> string = 'xyz123123xyz'
>>> pattern = '(123)'
>>> m = re.search(pattern, string)
>>> m.group(0)
'123'
>>> m.group(1)
'123'
>>> m.start(1)  # start position of the first matched '123' within 'xyz123123xyz'
3
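2. end(groupnum=0)
Same as above, except that the result is the end position. (endpos, by contrast, is an attribute rather than a method: it holds the index up to which the search was allowed to look.)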
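The relationship between start(), end() and the matched text can be sketched as follows. This is only an illustration that reuses the string from the start() example; span() is the standard match object method that returns both positions at once.

```python
import re

source = "xyz123123xyz"
m = re.search(r"(123)", source)

start, end = m.start(1), m.end(1)
both = m.span(1)                 # the same two numbers as a tuple

# Slicing the source with these positions gives back the group text.
sliced = source[start:end]

print(start, end)   # 3 6
print(both)         # (3, 6)
print(sliced)       # 123
```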
3. group(num=0), groups(), groupdict()
These methods return the parts of the matched string that were saved by groups. group(0) is the entire match, and the saved groups are numbered from 1. groups() returns a tuple of all the saved groups.
groupdict() returns a dictionary of the form groupname: groupvalue.
>>> pattern = '1([abc]+)3'
>>> string = '1bc3'
>>> m = re.search(pattern, string)
>>> m.group(0)
'1bc3'
>>> m.group(1)
'bc'
>>> m.groups()
('bc',)
>>> m.groupdict()
{}
groupdict() with a named group:
>>> string
'1bc3'
>>> pattern = r'1(?P<g1>[abc]+)3'
>>> m = re.search(pattern, string)
>>> m
<_sre.SRE_Match object at 0x00bd2e20>
>>> m.groupdict()
{'g1': 'bc'}
4. expand(template)
For example, if m.group(1) == 'bc',
then m.expand(r"xxxx \1 zzzzz") returns 'xxxx bc zzzzz': the \1 is replaced with the value of group(1).
A similar function is re.sub(pattern, replacement, string), which replaces the parts of the string that match the pattern with the replacement.
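To make the comparison concrete, here is a minimal sketch of expand() next to re.sub(); the input string 'x1bc3y' and the `<\1>` replacement are made up for illustration.

```python
import re

pattern = r"1([abc]+)3"
source = "x1bc3y"

m = re.search(pattern, source)

# expand() builds a new string from the template alone.
expanded = m.expand(r"xxxx \1 zzzzz")

# re.sub() returns the source string with each match replaced.
replaced = re.sub(pattern, r"<\1>", source)

print(expanded)   # xxxx bc zzzzz
print(replaced)   # x<bc>y
```

The difference is that expand() discards the text around the match, while re.sub() keeps it.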
4. Common re module functions
Pre-compilation
re.compile(pattern, flags) compiles a regular expression into an internal representation, which makes later uses of it faster.
Matching
re.match(pattern, string, flags) and re.search(pattern, string, flags) are both used for matching. The difference is that match must match at the beginning of the string: if the beginning does not match, match fails even when a match exists further on. search returns the first successful match anywhere in the string.
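The difference between match and search can be seen in a short sketch like the following; the sample string is an assumption chosen so that the digits are not at the start.

```python
import re

text = "xyz123"

at_start = re.match(r"\d+", text)    # the string does not begin with a digit
anywhere = re.search(r"\d+", text)   # scans forward until something matches

# A pattern compiled once can be reused; its methods mirror the module functions.
digits = re.compile(r"\d+")
compiled_hit = digits.search(text)

print(at_start)                # None
print(anywhere.group(0))       # 123
print(compiled_hit.group(0))   # 123
```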
Finding
re.findall(pattern, string, flags) and re.finditer(pattern, string, flags) return all matching parts. findall builds the whole list at once and so uses more memory; finditer yields matches one at a time and saves memory.
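A small sketch of the two functions side by side, using the same sample string as the split and sub examples in this section (the variable names are illustrative):

```python
import re

text = "a12b3223d55"

# findall returns plain strings (or tuples, when the pattern has several groups).
numbers = re.findall(r"\d+", text)
pairs = re.findall(r"([a-z])(\d+)", text)

# finditer yields match objects lazily, so positions are available too.
positions = [(m.group(0), m.start()) for m in re.finditer(r"\d+", text)]

print(numbers)     # ['12', '3223', '55']
print(pairs)       # [('a', '12'), ('b', '3223'), ('d', '55')]
print(positions)   # [('12', 1), ('3223', 4), ('55', 9)]
```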
Splitting
re.split(pattern, string) splits the string at every place that matches the pattern.
>>> string = "a12b3223d55"
>>> pattern = r'[\d]+'
>>> s = re.split(pattern, string)
>>> s
['a', 'b', 'd', '']
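Two variations on the split example may be worth knowing; this is a hedged sketch on the same sample string, showing the maxsplit parameter and the effect of a capturing group in the pattern.

```python
import re

text = "a12b3223d55"

# maxsplit limits how many splits are performed.
first_only = re.split(r"\d+", text, maxsplit=1)

# With a capturing group, the delimiters are kept in the result list.
with_delimiters = re.split(r"(\d+)", text)

print(first_only)        # ['a', 'b3223d55']
print(with_delimiters)   # ['a', '12', 'b', '3223', 'd', '55', '']
```

The trailing empty string appears because the text ends with a delimiter.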
Replacing
re.sub(pattern, repl, string[, count, flags])
>>> string = "a12b3223d55"
>>> pattern = r'[\d]+'
>>> s = re.sub(pattern, ' hello ', string)
>>> s
'a hello b hello d hello '
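Beyond a fixed replacement string, re.sub also accepts a count and a function as repl. The following sketch shows both, plus re.subn, on the same sample string; the "#" replacement and the lambda are assumptions made for illustration.

```python
import re

text = "a12b3223d55"

# count limits the number of replacements.
first_only = re.sub(r"\d+", "#", text, count=1)

# repl may be a function that receives each match object.
lengths = re.sub(r"\d+", lambda m: str(len(m.group(0))), text)

# subn also reports how many replacements were made.
result, n = re.subn(r"\d+", "#", text)

print(first_only)   # a#b3223d55
print(lengths)      # a2b4d2
print(result, n)    # a#b#d# 3
```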
5. A few less commonly used points
1. Use of groups
The most common use of groups is converting a date format.
Convert yyyy/mm/dd to dd-mm-yyyy
string = '2016/6/24'
pattern = r'([\d]{4})/([\d]{1,2})/([\d]{1,2})'
a = re.match(pattern, string)
a.groups() -----> ('2016', '6', '24')
Format conversion:
re.sub(pattern, r'\3-\2-\1', string) -----> '24-6-2016'
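The same conversion can also be written with named groups, which some readers find easier to follow than \3-\2-\1. This is a sketch only; the names `DATE`, `to_day_first`, `year`, `month` and `day` are hypothetical and not from the original post.

```python
import re

# Named groups document what each part of the date means.
DATE = re.compile(r"(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})")


def to_day_first(text):
    """Rewrite every yyyy/mm/dd date in text as dd-mm-yyyy."""
    return DATE.sub(r"\g<day>-\g<month>-\g<year>", text)


converted = to_day_first("2016/6/24")
parts = DATE.match("2016/6/24").groupdict()

print(converted)   # 24-6-2016
print(parts)       # {'year': '2016', 'month': '6', 'day': '24'}
```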
2. Positive lookahead, negative lookahead, positive lookbehind, negative lookbehind (known in the jargon as "lookaround")
string = 'abcdefg'
Positive lookahead: pattern = 'cd(?=ef)' means match 'cd', but only if 'cd' is followed by 'ef'. re.search(pattern, string) returns a match for 'cd'.
Negative lookahead: pattern = 'cd(?!ef)' means match 'cd', but only if 'cd' is not followed by 'ef'. re.search(pattern, string) returns None.
I will not write out the other two; the idea is much the same. Does this not look a little like the k in the LR(k) parsing algorithm from compiler theory? It is not the same thing, though: regular expressions can only be used in the lexical analysis phase, because they do not track context information.
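For completeness, the two lookbehind forms can be sketched on the same 'abcdefg' string. The patterns below are illustrative choices, not the author's; the last lines also show that a lookahead does not consume the text it checks.

```python
import re

string = "abcdefg"

# Positive lookbehind: 'cd' only if it is preceded by 'ab'.
behind_yes = re.search(r"(?<=ab)cd", string)

# Negative lookbehind: 'cd' only if it is NOT preceded by 'ab'.
behind_no = re.search(r"(?<!ab)cd", string)

# Lookaround is zero-width: the match ends right after 'cd', before 'ef'.
ahead = re.search(r"cd(?=ef)", string)

print(behind_yes.group(0))          # cd
print(behind_no)                    # None
print(ahead.group(0), ahead.end())  # cd 4
```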
3. When grouping, the difference between putting + or * inside the () and outside the ()
........
Just read the broken
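One way to see the difference named in the heading of point 3 is the sketch below. It is an illustration written for this purpose, under the assumption that the heading refers to patterns such as (\d)+ versus (\d+); the sample strings are made up.

```python
import re

# Quantifier outside the group: the group is repeated, and only the
# text from its last repetition is saved.
outside = re.search(r"(\d)+", "abc123")

# Quantifier inside the group: the whole repeated run is saved.
inside = re.search(r"(\d+)", "abc123")

# The same effect shows up in findall, which returns the saved group.
repeated_group = re.findall(r"(ab)+", "ababab")
repeated_char = re.findall(r"(ab+)", "abbab")

print(outside.group(0), outside.group(1))   # 123 3
print(inside.group(0), inside.group(1))     # 123 123
print(repeated_group)                       # ['ab']
print(repeated_char)                        # ['abb', 'ab']
```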