Just read the broken

Last Update:2016-06-24 Source: Internet

Author: User


In principle, a regular expression is a small thing: it is just a finite state machine, so it can only recognize patterns that do not depend on context (regular languages).

In practice, though, it is very flexible. Skilled users can do remarkable things with it and save a great deal of work.

1. Common Regular Expression symbols

Symbol	Description	Example
literal	Matches the literal text of the string	foo
re1|re2	Matches expression re1 or expression re2	foo|bar
.	Matches any character except the newline (\n)	b.b
^	Matches the start of the string	^dear
$	Matches the end of the string	/bin/*sh$
*	Matches 0 or more occurrences of the preceding expression	[A-Za-z0-9]*
+	Matches 1 or more occurrences of the preceding expression	[a-z]+\.com
?	Matches 0 or 1 occurrence of the preceding expression	goo?
{N}	Matches exactly N occurrences of the preceding expression	[0-9]{3}
{M,N}	Matches M to N occurrences of the preceding expression	[0-9]{5,9}
[...]	Matches any single character from the character set	[aeiou]
[..x-y..]	Matches any single character in the range x to y	[0-9], [A-Za-z]
[^...]	Matches any single character that is not in the character set, including any ranges in it	[^aeiou], [^A-Za-z0-9]
(*|+|?|{})?	Non-greedy versions of the repetition symbols above	.*?[a-z]
(...)	Matches the enclosed expression and saves it as a subgroup	([0-9]{3})?, f(oo|u)bar
\d	Matches any decimal digit	data\d.txt
\D	Matches any character that is not a decimal digit
\w	Matches any alphanumeric character or underscore; equivalent to [A-Za-z0-9_] for ASCII text	[A-Za-z]\w+
\s	Matches any whitespace character, such as space, \n, \t, \r, \v, \f	of\sthe
\S	Matches any character that is not whitespace
\b	Matches any word boundary	\bthe\b
\B	The opposite of \b
\N	Matches saved subgroup N
\c	Matches the special character c literally (that is, escapes it)	\., \\, \*
\A (\Z)	Matches the start (end) of the string
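
A few of the symbols in the table can be tried out directly. The following is only a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

text = "Contact: alice@example.com, phone 555-1234"

# [a-z]+\.com : one or more lowercase letters followed by a literal ".com"
domain = re.search(r"[a-z]+\.com", text).group()

# [0-9]{3} : exactly three digits in a row
area = re.search(r"[0-9]{3}", text).group()

# Greedy versus non-greedy repetition on the same input
html = "<b>bold</b>"
greedy = re.search(r"<.*>", html).group()   # runs to the last ">"
lazy = re.search(r"<.*?>", html).group()    # stops at the first ">"

# \b only matches at word boundaries, so "other" is skipped
words = re.findall(r"\bthe\b", "the other the")

print(domain, area, greedy, lazy, words)
```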

Extended notation
(?aiLmsux)	Embeds one or more flags in the regular expression itself (instead of passing them to a function or method). i: ignore case; m: multiline, so ^ and $ match at the start and end of each line, while \A and \Z do not; s: "." also matches the newline; a: ASCII-only matching; u: Unicode matching; L: locale-dependent matching; x: verbose	(?x), (?im)
(?:...)	A group that is matched but not saved	(?:\w+\.)*
(?P<name>...)	Like a regular group, but identified by name in addition to its numeric ID	(?P<data>)
(?P=name)	Matches the text previously matched by the group (?P<name>...) in the same string	(?P=data)
(?#...)	A comment; all of its content is ignored	(?#comment)
(?=...)	Matches if ... comes next, without consuming any of the input string; called a positive lookahead assertion	(?=.com)
(?!...)	Matches if ... does not come next, without consuming any of the input string; called a negative lookahead assertion	(?!.net)
(?<=...)	Matches if ... immediately precedes the current position, without consuming any of the input string; called a positive lookbehind assertion	(?<=800-)
(?<!...)	Matches if ... does not immediately precede the current position, without consuming any of the input string; called a negative lookbehind assertion	(?<!192\.168\.)
(?(id/name)Y|N)	If the group with the given id or name matched, match Y; otherwise match N. The |N part is optional	(?(1)y|x)
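
Several of the extended notations are easier to follow in action. This is a rough sketch using Python's re module; the input strings and variable names are invented for the example.

```python
import re

# (?i) sets the ignore-case flag from inside the pattern
greeting = re.search(r"(?i)hello", "Say HELLO").group()

# (?:...) groups without saving, so only the last part is captured
host = re.match(r"(?:\w+\.)*(\w+\.com)", "www.mail.example.com").group(1)

# (?P<name>...) saves a named group; (?P=name) must match the same text again
doubled = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "it is is fine")

# (?(1)>) requires a closing ">" only if group 1 (the opening "<") matched
bracketed = re.compile(r"(<)?\w+(?(1)>)")
balanced = bracketed.fullmatch("<user>") is not None
plain = bracketed.fullmatch("user") is not None
unbalanced = bracketed.fullmatch("<user") is not None

print(greeting, host, doubled.group("word"), balanced, plain, unbalanced)
```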

2. re Module Core Functions

compile(pattern, flags=0)	Compiles the regular expression pattern with any optional flags and returns a regular expression object
match(pattern, string, flags=0)	Attempts to match the pattern, with optional flags, at the start of the string; returns a match object on success and None on failure
search(pattern, string, flags=0)	Searches the string for the first occurrence of the pattern, with optional flags; returns a match object on success and None on failure
findall(pattern, string[, flags])	Finds all non-overlapping occurrences of the pattern in the string and returns them as a list
finditer(pattern, string[, flags])	The same as findall, but returns an iterator instead of a list; for each match, the iterator yields a match object
split(pattern, string, maxsplit=0)	Splits the string into a list, using the pattern as the delimiter, and returns the list; at most maxsplit splits are made (by default, every occurrence is split)

sub(pattern, repl, string, count=0)	Replaces occurrences of the pattern in the string with repl; all occurrences are replaced unless count is given
purge()	Clears the cache of implicitly compiled regular expressions

group(num=0)	Returns the entire match, or the specific subgroup numbered num
groups(default=None)	Returns a tuple containing all matched subgroups (an empty tuple if the pattern has no groups)
groupdict(default=None)	Returns a dictionary of all matched named groups, with the group names as keys

re.I, re.IGNORECASE	Case-insensitive matching
re.L, re.LOCALE	Makes \w, \W, \b, \B, \s and \S depend on the current locale
re.M, re.MULTILINE	^ and $ match at the start and end of each line of the target string, instead of only at the beginning and end of the entire string
re.S, re.DOTALL	"." (dot) normally matches any single character except \n (newline); this flag makes "." match every character, including the newline
re.X, re.VERBOSE	Whitespace and # (together with all text after it on that line) are ignored unless escaped with a backslash or placed inside a character class; this allows comments and improves readability
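
The effect of the flags is easiest to see side by side. A minimal sketch, assuming a small made-up three-line string; all variable names are illustrative.

```python
import re

text = "first line\nSecond LINE\nthird line"

# re.I: "line" also matches "LINE"
ignore_case = re.findall(r"line", text, re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.M)

# re.S: "." also matches the newline, so ".*" runs through the whole string
one_line = re.match(r".*", text).group()
whole = re.match(r".*", text, re.S).group()

# re.X: whitespace and comments inside the pattern are ignored
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
""", re.X)
parts = phone.search("call 555-1234").groups()

print(ignore_case, first_only, each_line, one_line, parts)
```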

3. Common Match Object Methods

re.match(pattern, string, flags) and re.search(pattern, string, flags) both return a match object when they succeed. A few of the match object's methods are especially practical and are introduced here.

1. start(groupnum=0)

This method returns the position in the string where the given group of the match begins.

>>> import re
>>> string = 'xyz123123xyz'
>>> pattern = '(123)'
>>> m = re.search(pattern, string)
>>> m.group(0)
'123'
>>> m.group(1)
'123'
>>> m.start(1)  # start position of the first '123' within 'xyz123123xyz'
3

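2. end(groupnum=0)

Same as above, except that the result is the end position of the group (the index just past its last character). Note that endpos is a separate attribute of the match object rather than a method: it holds the index at which the search was told to stop.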
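
Because start() and end() follow Python's slicing convention, they can be used directly to cut the matched text out of the original string. A small sketch reusing the string from the example above; the variable names are illustrative.

```python
import re

s = 'xyz123123xyz'
m = re.search('(123)', s)

start, end = m.start(1), m.end(1)
piece = s[start:end]      # slicing with start/end gives back the group text
both = m.span(1)          # span() returns (start, end) in one call
limit = m.endpos          # attribute: where the search stopped, here len(s)

print(start, end, piece, both, limit)
```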

3. group(num=0), groups(), groupdict()

These methods return the parts of the matched string that were saved by groups. group(0) is the entire match, and the saved groups are numbered starting from 1. groups() returns a tuple of all the saved groups.

groupdict() returns a dictionary of the form {groupname: groupvalue}.

>>> pattern = '1([abc]+)3'
>>> string = '1bc3'
>>> m = re.search(pattern, string)
>>> m.group(0)
'1bc3'
>>> m.group(1)
'bc'
>>> m.groups()
('bc',)
>>> m.groupdict()
{}

groupdict()

>>> string
'1bc3'
>>> pattern = r'1(?P<g1>[abc]+)3'
>>> m = re.search(pattern, string)
>>> m
<_sre.SRE_Match object at 0x00bd2e20>
>>> m.groupdict()
{'g1': 'bc'}

4. expand(template)

For example, suppose m.group(1) == 'bc'.

Then m.expand(r"xxxx \1 zzzzz") returns 'xxxx bc zzzzz': the \1 is replaced with the value of group(1).

re.sub(pattern, repl, string, count, flags) works in a similar way: it replaces every part of the string that matches the pattern with repl.
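
The relationship between expand() and re.sub() can be checked with a short sketch. The pattern is the one used in the group examples above; the second input string and the variable names are made up.

```python
import re

pattern = r'1([abc]+)3'

# expand() fills a template from a single match object
m = re.search(pattern, '1bc3')
expanded = m.expand(r'xxxx \1 zzzzz')

# sub() applies the same kind of template to every match in the string
replaced = re.sub(pattern, r'<\1>', 'a1bc3b 1a3')

print(expanded)
print(replaced)
```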

4. re Module Common Functions

Pre-compilation

re.compile(pattern, flags) compiles a regular expression into an internal representation, which speeds up later uses of the same expression.
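
A compiled pattern exposes the same operations as methods, which is convenient when one expression is applied many times. The sketch below is only illustrative: word_re and the sample lines are made-up names, and the actual speed-up depends on the workload, since the re module also caches recently used patterns on its own.

```python
import re

# Compile once, outside the loop
word_re = re.compile(r"\b\w+\b")

lines = ["alpha beta", "gamma"]

# The compiled object has findall/search/match/sub methods of its own
counts = [len(word_re.findall(line)) for line in lines]

print(counts)
```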

Match

re.match(pattern, string, flags) and re.search(pattern, string, flags) are both used for matching. The difference is that match must succeed at the very beginning of the string: if the start of the string does not match, it returns None even when a match exists further along. search scans the string and returns the first successful match.
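
The difference shows up as soon as the pattern can only match somewhere in the middle of the string. A minimal sketch with a made-up input:

```python
import re

s = "xyz123"

at_start = re.match(r"\d+", s)      # the string starts with "x", so this fails
anywhere = re.search(r"\d+", s)     # search scans forward and finds "123"
prefix = re.match(r"xyz", s)        # match succeeds when the start fits

print(at_start, anywhere.group(), prefix.group())
```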

Find

re.findall(pattern, string, flags) and re.finditer(pattern, string, flags) both return all matching parts. findall builds the whole list at once and therefore uses more memory; finditer returns an iterator and saves some memory.
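
Besides memory use, the two differ in what they hand back: findall gives plain strings (or tuples when the pattern has groups), while finditer gives match objects that still carry positions. A short sketch with illustrative names:

```python
import re

s = "a12b3223d55"

# findall: a list of the matched strings
numbers = re.findall(r"\d+", s)

# With groups in the pattern, findall returns tuples of the group contents
pairs = re.findall(r"([a-z])(\d+)", s)

# finditer: match objects, produced one at a time
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", s)]

print(numbers, pairs, positions)
```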

Segmentation

re.split(pattern, string) splits the string at every place that matches the pattern.

>>> string = "a12b3223d55"
>>> pattern = r'[\d]+'
>>> s = re.split(pattern, string)
>>> s
['a', 'b', 'd', '']

Replace

re.sub(pattern, repl, string[, count, flags])

>>> string = "a12b3223d55"
>>> pattern = r'[\d]+'
>>> s = re.sub(pattern, ' hello ', string)
>>> s
'a hello b hello d hello '

5. A Few Less Commonly Used Features

1. Use of groups

A typical use is converting a date from one format to another.

Convert yyyy/mm/dd to dd-mm-yyyy

string = '2016/6/24'

pattern = r'([\d]{4})/([\d]{1,2})/([\d]{1,2})'

a = re.match(pattern, string)

a.groups() -----> ('2016', '6', '24')

Format conversion:

re.sub(pattern, r'\3-\2-\1', string) -----> '24-6-2016'
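
The same conversion can also be written with named groups, which keeps the replacement readable when the groups are reordered. This is just one possible sketch: DATE_RE, to_dmy and to_padded are hypothetical names, and the zero-padding variant is an extra illustration that is not part of the original example.

```python
import re

DATE_RE = re.compile(r"(?P<y>\d{4})/(?P<m>\d{1,2})/(?P<d>\d{1,2})")

def to_dmy(text):
    # \g<name> refers to a named group in the replacement template
    return DATE_RE.sub(r"\g<d>-\g<m>-\g<y>", text)

def to_padded(text):
    # repl may also be a function that receives each match object
    def repl(m):
        return "{:02d}-{:02d}-{}".format(int(m.group("d")), int(m.group("m")), m.group("y"))
    return DATE_RE.sub(repl, text)

print(to_dmy("2016/6/24"))
print(to_padded("due 2016/6/24"))
```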

2. Positive lookahead, negative lookahead, positive lookbehind, negative lookbehind (collectively known as "lookaround")

string = 'abcdefg'

Positive lookahead: pattern = 'cd(?=ef)' means match 'cd', but only where 'cd' is followed by 'ef'. re.search(pattern, string) matches 'cd'.

Negative lookahead: pattern = 'cd(?!ef)' means match 'cd', but only where 'cd' is not followed by 'ef'. re.search(pattern, string) returns None.

The other two (positive and negative lookbehind) are not written out here; they work the same way in the opposite direction. Lookaround is somewhat reminiscent of the k in the LR(k) parsing algorithm from compiler theory, but it is not the same thing: regular expressions are only suited to the lexical analysis phase, because they do not track contextual information.
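
For completeness, the two lookbehind forms can be sketched on the same string. The patterns below are illustrative choices, not taken from the original text.

```python
import re

string = 'abcdefg'

# Positive lookbehind: match 'de' only where it is preceded by 'bc'
pos_behind = re.search(r'(?<=bc)de', string)

# Negative lookbehind: match 'de' only where it is NOT preceded by 'bc'
neg_behind = re.search(r'(?<!bc)de', string)

# Lookaround consumes nothing: the lookahead match still covers only 'cd'
ahead = re.search(r'cd(?=ef)', string)

print(pos_behind.group(), neg_behind, ahead.group(), ahead.span())
```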

3. When grouping, the difference between putting + or * inside the () and outside the ()

。。。。。。。。
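
The original leaves this point unelaborated. As one possible illustration (an assumption about what was meant, not the author's own example), the sketch below shows what ends up in the saved group in each case.

```python
import re

# + inside the group: the group saves the whole repeated run
inside = re.match(r'(a+)', 'aaa')

# + outside the group: the group is repeated and keeps only its last iteration
outside = re.match(r'(a)+', 'aaa')

# The same effect is visible with findall, which reports group contents
last_only = re.findall(r'(ab)+', 'ababab')
whole_run = re.findall(r'((?:ab)+)', 'ababab')

print(inside.group(1), outside.group(0), outside.group(1), last_only, whole_run)
```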
