Just read the broken

Source: Internet
Author: User

In principle a regular expression is not much more than a state machine (a finite automaton). It can therefore only describe regular languages. It cannot handle grammars that need to track context, such as arbitrarily nested brackets.

Even so, regular expressions are very flexible. In skilled hands they can do remarkable things and save a lot of effort.
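The sketch below makes the "state machine" remark concrete. It hand-codes a finite automaton for the pattern ab*c, which was chosen only for illustration, and cross-checks it against re.fullmatch. Python's re module is actually implemented as a backtracking matcher rather than a table-driven automaton like this one. So the sketch illustrates the theory, not how re works internally.

```python
import re

# Hypothetical transition table for the regex ab*c, written as a finite automaton.
# States: 0 = start, 1 = seen "a" plus any number of "b", 2 = seen the final "c".
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}


def dfa_matches(text: str) -> bool:
    """Return True if the whole text is accepted by the automaton for ab*c."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition for this character: reject
            return False
    return state in ACCEPTING


# Cross-check the hand-written automaton against Python's re engine.
for sample in ["ac", "abbbc", "abcb", "bc", "a"]:
    assert dfa_matches(sample) == bool(re.fullmatch(r"ab*c", sample))
    print(sample, dfa_matches(sample))
```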

1. Common Regular Expression symbols

Symbol          Description (example)
literal         Matches the literal text of the string. Example: foo
re1|re2         Matches regular expression re1 or re2. Example: foo|bar
.               Matches any single character except newline (\n). Example: b.b
^               Matches the start of the string. Example: ^Dear
$               Matches the end of the string. Example: /bin/*sh$
*               Matches 0 or more occurrences of the preceding expression. Example: [A-Za-z0-9]*
+               Matches 1 or more occurrences of the preceding expression. Example: [a-z]+\.com
?               Matches 0 or 1 occurrence of the preceding expression. Example: goo?
{N}             Matches exactly N occurrences of the preceding expression. Example: [0-9]{3}
{M,N}           Matches M to N occurrences of the preceding expression. Example: [0-9]{5,9}
[...]           Matches any single character from the character set. Example: [aeiou]
[..x-y..]       Matches any single character in the range x to y. Example: [0-9], [A-Za-z]
[^...]          Matches any single character that is NOT in the set (ranges allowed). Example: [^aeiou], [^A-Za-z0-9]
*? +? ?? {}?    Non-greedy versions of the repetition operators above. Example: .*?[a-z]
(...)           Matches the enclosed expression and saves it as a subgroup. Example: ([0-9]{3})?, f(oo|u)bar
\d              Matches any decimal digit. Example: data\d.txt
\D              Matches any character that is not a decimal digit
\w              Matches a word character; for ASCII text, same as [A-Za-z0-9_]. Example: [A-Za-z_]\w+
\W              Matches any character that is not a word character
\s              Matches any whitespace character, same as [ \n\t\r\v\f]. Example: of\sthe
\S              Matches any non-whitespace character, same as [^ \n\t\r\v\f]
\b              Matches a word boundary. Example: \bthe\b
\B              Matches anywhere except at a word boundary (the opposite of \b)
\N              Matches the text saved by subgroup number N
\c              Matches the special character c literally (i.e. escapes it). Example: \., \\, \*
\A (\Z)         Matches the start (end) of the string
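The sketch below runs several symbols from the table through Python's re module. The sample strings are made up for illustration.

```python
import re

# Alternation, dot and anchors
print(bool(re.fullmatch(r"foo|bar", "bar")))       # True
print(bool(re.fullmatch(r"b.b", "bob")))           # True
print(bool(re.fullmatch(r"b.b", "b\nb")))          # False: "." skips \n
print(bool(re.match(r"^Dear", "Dear Sir")))        # True
# In /bin/*sh$ the * applies only to the "/" right before it.
print(bool(re.search(r"/bin/*sh$", "/bin/sh")))    # True
print(bool(re.search(r"/bin/*sh$", "/bin/bash")))  # False

# Quantifiers
print(bool(re.fullmatch(r"[a-z]+\.com", "python.com")))  # True
print(re.findall(r"goo?", "go goo gooo"))                # ['go', 'goo', 'goo']
print(bool(re.fullmatch(r"[0-9]{5,9}", "1234")))         # False: too short

# Greedy vs non-greedy
print(re.match(r"<.*>", "<a><b>").group())    # <a><b>
print(re.match(r"<.*?>", "<a><b>").group())   # <a>

# Character classes and shorthand classes
print(re.findall(r"[aeiou]", "regular"))                   # ['e', 'u', 'a']
print(re.findall(r"\d", "data1.txt data22.txt"))           # ['1', '2', '2']
print(re.findall(r"\bthe\b", "the other theme, the end"))  # ['the', 'the']

# Backreference \N: \1 must repeat whatever group 1 matched
print(re.search(r"(\w+) \1", "it is is fine").group())     # is is
```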
Extended notation

(?aiLmsux)      Embeds one or more flags in the pattern itself, instead of passing them as a function argument. Example: (?x), (?im)
                  i: case-insensitive matching
                  m: multiline; ^ and $ match at the start and end of each line, while \A and \Z still match only at the start and end of the whole string
                  s: dotall; "." also matches the newline character
                  a: ASCII-only matching for \w, \d, \s and similar classes
                  u: Unicode matching (the default for str patterns in Python 3)
                  L: locale-dependent matching
                  x: verbose; whitespace and # comments in the pattern are ignored
(?:...)         A group that matches but is not saved. Example: (?:\w+\.)*
(?P<name>...)   Like a regular group, but it can be referred to by name as well as by number. Example: (?P<data>...)
(?P=name)       Matches the same text that the group called name matched earlier. Example: (?P=data)
(?#...)         A comment; its contents are ignored. Example: (?#comment)
(?=...)         Matches only if ... comes next, without consuming it (positive lookahead). Example: (?=\.com)
(?!...)         Matches only if ... does not come next (negative lookahead). Example: (?!\.net)
(?<=...)        Matches only if the current position is preceded by ... (positive lookbehind). Example: (?<=800-)
(?<!...)        Matches only if the current position is not preceded by ... (negative lookbehind). Example: (?<!192\.168\.)
(?(id/name)Y|N) If the group with the given id or name has matched, matches Y; otherwise matches N. The |N part is optional. Example: (?(1)y|x)
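The sketch below exercises the extension notation in Python's re module. The strings are invented examples. The conditional-group pattern follows the style of the email example in the Python documentation.

```python
import re

# (?:...) groups without saving: only the last word before "@" is captured.
m_user = re.match(r"(?:\w+\.)*(\w+)@", "first.middle.last@example.com")
print(m_user.groups())  # ('last',)

# (?P<name>...) and (?P=name): a named group is still numbered as well.
m_word = re.search(r"(?P<word>\w+) (?P=word)", "say hello hello world")
print(m_word.group("word"), m_word.group(1))  # hello hello

# (?#...) is a comment inside the pattern and is ignored.
print(bool(re.fullmatch(r"\d{3}(?#prefix)-\d{4}", "555-1234")))  # True

# (?(1)>|$): if group 1 (an opening "<") matched, require a closing ">";
# otherwise require the end of the string.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
for s in ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]:
    print(s, bool(email.match(s)))  # True, True, False, False
```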

2. Core functions of the re module

compile(pattern, flags=0): Compiles the regular expression pattern, with any optional flags, and returns a regular expression object.

match(pattern, string, flags=0): Tries to match the pattern at the beginning of string, with optional flags. It returns a match object on success and None on failure.

search(pattern, string, flags=0): Searches string for the first occurrence of the pattern, with optional flags. It returns a match object on success and None on failure.

findall(pattern, string[, flags]): Finds all non-overlapping occurrences of the pattern in string and returns them as a list.

finditer(pattern, string[, flags]): Same as findall(), but returns an iterator instead of a list. For each match, the iterator yields a match object.

split(pattern, string, maxsplit=0): Splits string into a list, using the pattern as the delimiter, and returns the list. If maxsplit is nonzero, at most maxsplit splits are made.

sub(pattern, repl, string, count=0): Replaces every occurrence of the pattern in string with repl. If count is given, at most count occurrences are replaced.
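The sketch below shows several of these functions side by side. It assumes a made-up phone-number pattern. It calls methods on a compiled pattern object, which mirror the module-level functions.

```python
import re

text = "Call 800-555-1234 or 800-555-9876"

# compile() once, then reuse the pattern object.
phone = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

# match() only tries position 0; search() scans for the first hit anywhere.
print(phone.match(text))           # None: the string starts with "Call"
print(phone.search(text).group())  # 800-555-1234

# split() with maxsplit: at most 2 splits, the rest stays in the last item.
print(re.split(r"\s*,\s*", "a , b,c ,d", maxsplit=2))  # ['a', 'b', 'c ,d']

# sub() with count=1: only the first occurrence is replaced.
print(phone.sub("XXX-XXX-XXXX", text, count=1))
# Call XXX-XXX-XXXX or 800-555-9876

# purge() clears the cache behind re.match(), re.search(), etc.
re.purge()
```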

purge(): Clears the cache of implicitly compiled regular expressions.

The following are methods of match objects:

group(num=0): Returns the entire match, or the specific subgroup numbered num.
groups(default=None): Returns a tuple containing all the subgroups. A subgroup that did not take part in the match gets the value default. If the pattern has no groups, it returns an empty tuple.
groupdict(default=None): Returns a dictionary containing all the named subgroups, with the group names as keys.
re.I, re.IGNORECASE: Case-insensitive matching.
re.L, re.LOCALE: Makes \w, \W, \b, \B and similar classes depend on the current locale. In Python 3 this flag can only be used with bytes patterns.
re.M, re.MULTILINE: ^ and $ match at the start and end of each line in the target string, instead of only at the start and end of the whole string.
re.S, re.DOTALL: "." (dot) normally matches any single character except \n (newline). With this flag it matches every character, including newline.
re.X, re.VERBOSE: Whitespace in the pattern is ignored unless it is escaped with a backslash or sits inside a character class. A # starts a comment that runs to the end of the line. This lets you annotate a pattern and makes it easier to read.
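The sketch below demonstrates the flag constants on an invented two-line string. It also shows the inline (?i) form from the extended notation.

```python
import re

# re.I, or the inline form (?i): case-insensitive matching.
print(bool(re.search(r"python", "I like PYTHON", re.I)))  # True
print(bool(re.search(r"(?i)python", "I like PYTHON")))    # True

text = "first line\nsecond line"

# re.M: ^ matches at the start of every line, not only of the whole string.
print(re.findall(r"^\w+", text))        # ['first']
print(re.findall(r"^\w+", text, re.M))  # ['first', 'second']

# re.S: "." also matches the newline character.
print(re.search(r"line.second", text))                      # None
print(repr(re.search(r"line.second", text, re.S).group()))  # 'line\nsecond'

# re.X: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})     # year
    /
    (\d{1,2})   # month
    /
    (\d{1,2})   # day
""", re.X)
print(date.match("2016/6/24").groups())  # ('2016', '6', '24')

# Flags can be combined with the | operator.
print(re.findall(r"^S\w+", text, re.I | re.M))  # ['second']
```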

3. Common MatchObject methods

re.match(pattern, string, flags) and re.search(pattern, string, flags) both return a match object (MatchObject) when they succeed. Match objects have several commonly used methods, and they are quite practical.

1. start(groupnum=0)

Returns the position in the string where group groupnum of the match begins.

>>> import re
>>> string = 'xyz123123xyz'
>>> pattern = '(123)'
>>> m = re.search(pattern, string)
>>> m.group(0)
'123'
>>> m.group(1)
'123'
>>> m.start(1)  # where the first 123 starts within xyz123123xyz
3

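2. end(groupnum=0)

Same as start(), except that it returns the position where the match ends (one past the last matched character). Note that endpos is an attribute of the match object holding the end of the searched region; it is not a method.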
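The sketch below calls start() and end() on the same example string. It also uses span() and the endpos attribute. These are related parts of the match object API that are not otherwise covered here.

```python
import re

string = "xyz123123xyz"
m = re.search(r"(123)", string)

print(m.start(1), m.end(1))  # 3 6
print(m.span(1))             # (3, 6)
# end() is exclusive, so a slice recovers exactly the matched text.
print(string[m.start(1):m.end(1)])  # 123
# endpos is an attribute (the end of the searched region), not a method.
print(m.endpos)              # 12
```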

3. group(num=0), groups(), groupdict()

These methods return the parts of the matched string that were captured by groups. group(0) is the entire match. group(1), group(2), ... are the captured subgroups, numbered from 1. groups() returns a tuple of all the captured subgroups.

groupdict() returns a dictionary that maps each group name to the text it matched.

>>> pattern = '1([abc]+)3'
>>> string = '1bc3'
>>> m = re.search(pattern, string)
>>> m.group(0)
'1bc3'
>>> m.group(1)
'bc'
>>> m.groups()
('bc',)
>>> m.groupdict()
{}

groupdict() is empty here because the pattern has no named groups.

groupdict() with a named group:

>>> string = '1bc3'
>>> pattern = r'1(?P<g1>[abc]+)3'
>>> m = re.search(pattern, string)
>>> m
<re.Match object; span=(0, 4), match='1bc3'>
>>> m.groupdict()
{'g1': 'bc'}

4. expand(template)

For example, suppose m.group(1) == 'bc'. Then m.expand(r"xxxx \1 zzzzz") returns 'xxxx bc zzzzz'. The \1 is replaced by the value of group(1).

re.sub(pattern, repl, string, count=0, flags=0) works in a similar way. It replaces every part of string that matches pattern with repl.
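The sketch below contrasts expand() on a single match with re.sub() over a whole string. The input strings are illustrative.

```python
import re

m = re.search(r"1([abc]+)3", "x1bc3y")
print(m.expand(r"xxxx \1 zzzzz"))  # xxxx bc zzzzz

# re.sub() applies the same kind of template to every match in the string.
print(re.sub(r"1([abc]+)3", r"<\1>", "1bc3 and 1a3"))  # <bc> and <a>
```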

4. Common re module functions

Precompilation

re.compile(pattern, flags) compiles a regular expression into an internal representation. This speeds up later uses of the same pattern.

Matching

re.match(pattern, string, flags) and re.search(pattern, string, flags) are both used for matching. The difference is that match only matches at the beginning of the string. If the pattern does not match at the start, match returns None, even if the pattern occurs later in the string. search scans the string and returns the first successful match.

Finding

re.findall(pattern, string, flags) and re.finditer(pattern, string, flags) return all the matching parts. findall builds the complete list at once, so it uses more memory. finditer returns an iterator, which saves memory.
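The sketch below compares findall and finditer on the sample string from the split example. It also shows that findall returns tuples of groups when the pattern contains several groups.

```python
import re

text = "a12b3223d55"

# findall builds the whole list up front and returns strings, not Match objects.
print(re.findall(r"\d+", text))  # ['12', '3223', '55']

# With several groups in the pattern, each item is a tuple of the groups.
print(re.findall(r"([a-z])(\d+)", text))  # [('a', '12'), ('b', '3223'), ('d', '55')]

# finditer is lazy: each Match object is produced only when the loop asks for it.
for m in re.finditer(r"\d+", text):
    print(m.group(), m.start())  # 12 1, then 3223 4, then 55 9
```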

Splitting

re.split(pattern, string) splits the string at every place where the pattern matches.

>>> string = "a12b3223d55"
>>> pattern = r'[\d]+'
>>> s = re.split(pattern, string)
>>> s
['a', 'b', 'd', '']

The trailing empty string appears because the string ends with a match.

Replacing

re.sub(pattern, repl, string[, count, flags])

>>> string = "a12b3223d55"
>>> pattern = r'[\d]+'
>>> s = re.sub(pattern, 'hello', string)
>>> s
'ahellobhellodhello'

5. A few less commonly used features

1. Using groups

A typical use is converting date formats, for example from yyyy/mm/dd to dd-mm-yyyy.

>>> string = '2016/6/24'
>>> pattern = r'([\d]{4})/([\d]{1,2})/([\d]{1,2})'
>>> a = re.match(pattern, string)
>>> a.groups()
('2016', '6', '24')

Format conversion:

>>> re.sub(pattern, r'\3-\2-\1', string)
'24-6-2016'

2. Positive lookahead, negative lookahead, positive lookbehind and negative lookbehind (collectively called "lookaround")

string = 'abcdefg'

Positive lookahead: pattern = 'cd(?=ef)' matches cd, but only if cd is followed by 'ef'. Here re.search(pattern, string) matches 'cd'.

Negative lookahead: pattern = 'cd(?!ef)' matches cd, but only if cd is not followed by 'ef'. Here re.search(pattern, string) returns None.

I won't write out the other two; the idea is similar. Lookahead may remind you of the k in the LR(k) parsing algorithm from compiler theory, but it is not really the same thing. Regular expressions are only suitable for the lexical analysis phase, because they do not keep track of context.
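For completeness, here is a sketch of the two lookbehind forms. It uses the examples from the extended notation table, with made-up phone numbers and IP addresses. Python requires a lookbehind pattern to have a fixed width.

```python
import re

numbers = "800-555-1234 900-555-0000"
# Positive lookbehind (?<=800-): three digits that come right after "800-".
print(re.findall(r"(?<=800-)\d{3}", numbers))  # ['555']

ips = ["192.168.0.1", "10.0.0.1", "192.169.1.1"]
# Negative lookbehind (?<!192\.168\.): after the first two octets, the text
# consumed so far must not end with "192.168.".
outside = [ip for ip in ips if re.fullmatch(r"\d+\.\d+\.(?<!192\.168\.)\d+\.\d+", ip)]
print(outside)  # ['10.0.0.1', '192.169.1.1']
```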

3. When grouping, the difference between putting + or * inside () and outside ()

。。。。。。。。
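The explanation of this point is cut off above. The sketch below shows the usual difference, as an illustration rather than the author's own text. With the quantifier inside the group, the group captures the whole repeated run. With the quantifier outside, the group itself is repeated, and only its last repetition is kept.

```python
import re

text = "abc123"

# Quantifier inside the group: group 1 captures the whole run of digits.
print(re.search(r"(\d+)", text).group(1))  # 123

# Quantifier outside the group: the group repeats, but only the last
# repetition is saved, so group 1 holds a single digit.
print(re.search(r"(\d)+", text).group(1))  # 3

# The overall match, group(0), is the same in both cases.
print(re.search(r"(\d)+", text).group(0))  # 123

# findall returns group contents, so the difference shows up there too.
print(re.findall(r"(\d)+", "12 345"))  # ['2', '5']
print(re.findall(r"(\d+)", "12 345"))  # ['12', '345']
```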
