(4) RegEx, RegEx, and RegEx

Last Update:2016-05-24 Source: Internet

Author: User


Content of this chapter:

Decorator
Regular Expression

Decorator

The decorator is a well-known design pattern, often used for cross-cutting concerns; classic examples are logging, performance testing, and transaction handling. With a decorator, we can pull out large amounts of identical code that is unrelated to the function's own logic and reuse it. In short, the purpose of a decorator is to add extra functionality to an existing object.
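As a rough illustration of the performance-testing use case mentioned above, the sketch below times each call to a function. The names `timed`, `timings`, and `total` are hypothetical and chosen only for this example.

```python
import time

timings = []  # hypothetical store: one (function name, seconds) tuple per call


def timed(func):
    # Wrap func so that every call is timed and recorded in timings.
    def inner(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        timings.append((func.__name__, time.perf_counter() - start))
        return result
    return inner


@timed
def total(n):
    return sum(range(n))


answer = total(1000)
print(answer)         # 499500
print(timings[0][0])  # total
```

The timing code lives in one place, and `total` itself contains nothing but its own logic.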

First define a basic decorator:

    ########## Basic decorator ##########
    def orter(func):                        # define the decorator
        def inner():
            print("This is inner before.")
            s = func()                      # call the original function that was passed in
            print("This is inner after.")
            return s                        # return the original function's return value
        return inner                        # return inner, which replaces name

    @orter                                  # apply the decorator (name is passed to orter as func)
    def name():
        print("This is name.")
        return True                         # the original function returns True

    ret = name()
    print(ret)

    # Output:
    # This is inner before.
    # This is name.
    # This is inner after.
    # True

Decorating a function that takes parameters:

    ########## Decorating a function that takes parameters ##########
    def orter(func):
        def inner(a, b):                    # receive the two arguments
            print("This is inner before.")
            s = func(a, b)                  # pass both arguments on to the original function
            print("This is inner after.")
            return s
        return inner

    @orter
    def name(a, b):                         # takes two arguments; the whole function is passed to orter
        print("This is name. %s, %s" % (a, b))
        return True

    ret = name('nick', 'jenny')             # pass in two arguments
    print(ret)

    # Output:
    # This is inner before.
    # This is name. nick, jenny
    # This is inner after.
    # True

Accepting arbitrary arguments with *args and **kwargs:

    ########## Decorator that accepts any arguments ##########
    def orter(func):
        def inner(*args, **kwargs):         # accept any positional and keyword arguments
            print("This is inner before.")
            s = func(*args, **kwargs)       # forward all of them to the original function
            print("This is inner after.")
            return s
        return inner

    @orter
    def name(a, b, c, k1='nick'):           # takes several arguments
        print("This is name. %s, %s" % (a, b))
        return True

    ret = name('nick', 'jenny', 'car')
    print(ret)

    # Output:
    # This is inner before.
    # This is name. nick, jenny
    # This is inner after.
    # True
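One side effect of wrapping is that the decorated name now refers to `inner`, so attributes such as `__name__` and the docstring belong to the wrapper. The sketch below shows this and one common remedy, the standard library's `functools.wraps`. The decorator names `plain` and `preserving` and the functions `first` and `second` are hypothetical.

```python
import functools


def plain(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner


def preserving(func):
    @functools.wraps(func)  # copy __name__, __doc__, and similar attributes onto inner
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner


@plain
def first(a, b, k1="nick"):
    """Add two numbers."""
    return a + b


@preserving
def second(a, b, k1="nick"):
    """Add two numbers."""
    return a + b


print(first.__name__)   # inner
print(second.__name__)  # second
```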

The method for applying multiple decorators to a function is as follows:

    ########## Applying multiple decorators to one function ##########
    def orter(func):
        def inner(*args, **kwargs):
            print("This is inner one before.")
            print("This is inner one before again.")
            s = func(*args, **kwargs)
            print("This is inner one after.")
            print("This is inner one after again.")
            return s
        return inner

    def orter_2(func):
        def inner(*args, **kwargs):
            print("This is inner two before.")
            print("This is inner two before again.")
            s = func(*args, **kwargs)
            print("This is inner two after.")
            print("This is inner two after again.")
            return s
        return inner

    @orter              # outermost: receives the function already wrapped by orter_2
    @orter_2            # innermost: receives the original name function
    def name(a, b, c, k1='nick'):
        print("This is name. %s and %s." % (a, b))
        return True

    ret = name('nick', 'jenny', 'car')
    print(ret)

    # Output:
    # This is inner one before.
    # This is inner one before again.
    # This is inner two before.
    # This is inner two before again.
    # This is name. nick and jenny.
    # This is inner two after.
    # This is inner two after again.
    # This is inner one after.
    # This is inner one after again.
    # True
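The nesting order in the output above follows from how stacked decorators are applied: the decorator closest to the function wraps it first, and the one above wraps the result. A minimal sketch, using hypothetical names (`outer_one`, `outer_two`, `calls`) and writing the stacking out by hand instead of with the @ syntax:

```python
calls = []  # records the order in which the pieces run


def outer_one(func):
    def inner(*args, **kwargs):
        calls.append("one before")
        result = func(*args, **kwargs)
        calls.append("one after")
        return result
    return inner


def outer_two(func):
    def inner(*args, **kwargs):
        calls.append("two before")
        result = func(*args, **kwargs)
        calls.append("two after")
        return result
    return inner


def name():
    calls.append("name")
    return True


# Equivalent to placing @outer_one above @outer_two above "def name".
wrapped = outer_one(outer_two(name))
ret = wrapped()
print(calls)
# ['one before', 'two before', 'name', 'two after', 'one after']
```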

Regular Expression

Regular expressions are a powerful tool for matching strings, and they are available in many other programming languages as well. In essence, a regular expression (or RE) is a small, highly specialized programming language embedded in Python and made available through the re module. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.

    # Import the re module
    import re

    s = 'nick jenny nice'

    # Matching method (1)
    b = re.match(r'nick', s)
    q = b.group()
    print(q)

    # Matching method (2)
    # Create a Pattern object; the r prefix marks a raw string
    a = re.compile(r'nick')
    print(type(a))      # <class '_sre.SRE_Pattern'> (<class 're.Pattern'> on Python 3.7+)

    b = a.match(s)
    print(b)            # <_sre.SRE_Match object; span=(0, 4), match='nick'>

    q = b.group()
    print(q)

    # The string that was searched is stored in string
    print(b.string)     # nick jenny nice
    # The compiled pattern is stored in re
    print(b.re)         # re.compile('nick')

The difference between the two approaches: in the first, shorthand form, the pattern string is passed on every call, and re has to compile it (or look it up in its internal cache of recently used patterns) each time. In the second, the pattern is compiled (parsed) once in advance, so it does not need to be compiled again when it is reused.
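A small sketch of the second approach in practice: the pattern is compiled once and the same object is reused for every line. The sample data and the names `lines`, `digits`, `found`, and `same` are hypothetical.

```python
import re

lines = ["nick 1", "jenny 22", "no digits here", "car 333"]

# Compile once, then reuse the pattern object for every line.
digits = re.compile(r"\d+")
found = [m.group() for m in (digits.search(line) for line in lines) if m]
print(found)  # ['1', '22', '333']

# The module-level function gives the same result for a single call.
same = re.search(r"\d+", "jenny 22").group()
print(same)   # 22
```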

Matching rules:

.	Matches any character except \n
\	Escape character
[...]	Matches any one character in the set


    # "." matches any character (except \n)
    a = re.match(r".", "95nick")
    b = a.group()
    print(b)

    # [...] matches any one character in the set
    a = re.match(r"[a-zA-Z0-9]", "123Nick")
    b = a.group()
    print(b)

\d	Matches any decimal digit; equivalent to the class [0-9]
\D	Matches any non-digit character; equivalent to the class [^0-9]
\s	Matches any whitespace character; equivalent to the class [ \t\n\r\f\v]
\S	Matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v]
\w	Matches any word character (letter, digit, or underscore); equivalent to the class [a-zA-Z0-9_]
\W	Matches any non-word character; equivalent to the class [^a-zA-Z0-9_]

These equivalences describe ASCII matching. With str patterns in Python 3, the classes also cover Unicode digits, letters, and whitespace unless the re.ASCII flag is set.

    # \d \D match a digit / non-digit
    a = re.match(r"\D", "nick")
    b = a.group()
    print(b)

    # \s \S match a whitespace / non-whitespace character
    a = re.match(r"\s", " ")
    b = a.group()
    print(b)

    # \w \W match a word character [a-zA-Z0-9_] / non-word character
    a = re.match(r"\w", "123Nick")
    b = a.group()
    print(b)

    a = re.match(r"\W", "+-*/")
    b = a.group()
    print(b)

*	Matches the preceding element 0 or more times
+	Matches the preceding element 1 or more times
?	Matches the preceding element 0 or 1 time
{m} {m,n}	Matches the preceding element exactly m times, or from m to n times
*? +? ??	Non-greedy versions of *, +, and ?: match as few characters as possible

    # "*" matches the preceding element 0 or more times
    a = re.match(r"[A-Z][a-z]*", "Aaaaaa123")    # matches 'Aaaaaa'; '123' is not matched
    b = a.group()
    print(b)

    # "+" matches the preceding element 1 or more times
    a = re.match(r"[_a-zA-Z]+", "nick")
    b = a.group()
    print(b)

    # "?" matches the preceding element 0 or 1 time
    a = re.match(r"[0-8]?[0-9]", "95")           # [0-8] cannot match 9, so only '9' is matched
    b = a.group()
    print(b)

    # {m} {m,n} match the preceding element m times, or from m to n times
    a = re.match(r"[\w]{6,10}@qq.com", "630571017@qq.com")
    b = a.group()
    print(b)

    # *? +? ?? switch to non-greedy matching (as few characters as possible)
    a = re.match(r"[0-9][a-z]*?", "9nick")       # matches '9'
    b = a.group()
    print(b)

    a = re.match(r"[0-9][a-z]+?", "9nick")       # matches '9n'
    b = a.group()
    print(b)
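The difference between greedy and non-greedy quantifiers is easiest to see when a delimiter occurs more than once. A short sketch on a hypothetical tagged string:

```python
import re

text = "<b>nick</b> and <i>jenny</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r"<.*>", text)
# Non-greedy: .*? stops at the first ">" it can.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>nick</b> and <i>jenny</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```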

^	Matches the start of the string; in multiline mode, also matches the start of each line
$	Matches the end of the string; in multiline mode, also matches the end of each line
\A	Matches only at the start of the string
\Z	Matches only at the end of the string
\b	Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string)

    # "^" matches the start of the string; in multiline mode, the start of each line
    li = "nick\nnjenny\nsuo"
    a = re.search("^s.*", li, re.M)
    b = a.group()
    print(b)

    # "$" matches the end of the string; in multiline mode, the end of each line
    li = "nick\njenny\nnick"
    a = re.search(".*y$", li, re.M)
    b = a.group()
    print(b)

    # \A matches only at the start of the string
    li = "nickjennyk"
    a = re.findall(r"\Anick", li)
    print(a)

    # \Z matches only at the end of the string
    li = "nickjennyk"
    a = re.findall(r"nick\Z", li)
    print(a)

    # \b matches a word boundary, such as the position between a word and a space
    a = re.search(r"\bnick\b", "jenny nick car")
    b = a.group()
    print(b)
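The `\b` example depends on the r prefix. Without it, Python itself turns `"\b"` into a backspace character before re ever sees the pattern. A brief sketch of the difference (variable names are hypothetical):

```python
import re

text = "jenny nick car"

raw = re.search(r"\bnick\b", text)     # \b reaches re as a word boundary
not_raw = re.search("\bnick\b", text)  # "\b" is a backspace character here

print(raw.group())  # nick
print(not_raw)      # None
```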

|	Matches either the expression on its left or the one on its right
(ab)	The expression in parentheses is treated as a group
\<number>	References the string matched by the group with that number
(?P<name>...)	Named group; groupdict() returns the matches as a dictionary keyed by name
(?P=name)	References the string matched by the group named name

    # "|" matches either expression
    a = re.match(r"nick|jenny", "jenny")
    b = a.group()
    print(b)

    # (ab): the expression in parentheses acts as a group
    a = re.match(r"[\w]{6,10}@(qq|163)\.com", "630571017@qq.com")
    b = a.group()
    print(b)

    # \<number> references the string matched by the numbered group
    a = re.match(r"<([\w]+>)[\w]+</\1", "<book>nick</book>")
    b = a.group()
    print(b)

    # (?P<name>...): named groups, returned as a dictionary by groupdict()
    li = 'nick jenny nnnk'
    a = re.match(r"(?P<k1>n)(?P<k2>\w+).*(?P<k3>n\w+)", li)
    print(a.groupdict())
    # Output: {'k1': 'n', 'k2': 'ick', 'k3': 'nk'}

    # (?P<name>...) gives a group a name
    # (?P=name) references the string matched by the named group
    a = re.match(r"<(?P<jenny>[\w]+>)[\w]+</(?P=jenny)", "<book>nick</book>")
    b = a.group()
    print(b)
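Named groups and named backreferences can be combined to find repeated text. The following sketch looks for a word that appears twice in a row. The group name `word` and the sample sentence are hypothetical.

```python
import re

# (?P<word>\w+) captures a word; (?P=word) requires the same text again.
pattern = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
m = pattern.search("nick said that that was fine")

print(m.group())        # that that
print(m.group("word"))  # that
print(m.groupdict())    # {'word': 'that'}
```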

Module functions:

match	Matches only at the beginning of the string
search	Scans the whole string and returns the first match
findall	Returns a list of all matching parts
finditer	Like findall, but returns an iterator of match objects
sub	Replaces the parts of the string that match the regular expression with another value
split	Splits the string at each match and returns the pieces as a list

    ########## Module functions ##########
    # match: matches only at the beginning of the string
    # search: scans the whole string and returns the first match
    # findall: returns a list of all matching parts

    # findall and parentheses: with groups, findall returns the groups instead of the whole match
    li = 'nick jenny nick car girl'

    r = re.findall(r'n\w+', li)
    print(r)
    # Output: ['nick', 'nny', 'nick']

    r = re.findall(r'(n\w+)', li)
    print(r)
    # Output: ['nick', 'nny', 'nick']

    r = re.findall(r'n(\w+)', li)
    print(r)
    # Output: ['ick', 'ny', 'ick']

    r = re.findall(r'(n)(\w+)(k)', li)
    print(r)
    # Output: [('n', 'ic', 'k'), ('n', 'ic', 'k')]

    r = re.findall(r'(n)((\w+)(c))(k)', li)
    print(r)
    # Output: [('n', 'ic', 'i', 'c', 'k'), ('n', 'ic', 'i', 'c', 'k')]

    # finditer: like findall, but returns an iterator
    li = 'nick jenny nnnk'
    a = re.finditer(r'n\w+', li)
    for i in a:
        print(i.group())

    # sub: replace the parts of the string that match with another value
    li = 'this is 95'
    a = re.sub(r"\d+", "100", li)
    print(a)

    li = "nick njenny ncar ngirl"
    a = re.compile(r"\bn")
    b = a.sub('cool', li, 3)        # the last argument is the maximum number of replacements
    print(b)
    # Output: coolick cooljenny coolcar ngirl

    # split: split the string at each match and return the pieces as a list
    li = 'nick,suo jenny:nice card'
    a = re.split(r":| |,", li)      # | means "or"
    print(a)

    li = 'nick1jenny2car3girl5'
    a = re.compile(r"\d")
    b = a.split(li)
    print(b)
    # Output: ['nick', 'jenny', 'car', 'girl', '']
    # Note the empty element at the end
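Two related details of sub may be worth a quick sketch: the replacement string can refer back to groups with `\1`, `\2`, and so on, and `re.subn` works like `re.sub` but also reports how many replacements were made. The sample strings and variable names are hypothetical.

```python
import re

li = "nick:jenny car:girl"

# Swap the two sides of each colon by referring to the groups in the replacement.
swapped = re.sub(r"(\w+):(\w+)", r"\2:\1", li)
print(swapped)        # jenny:nick girl:car

# subn returns a tuple: (new string, number of replacements).
result, count = re.subn(r"\d", "#", "nick1jenny2car3")
print(result, count)  # nick#jenny#car# 3
```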

group()	Returns the string matched by the RE
groups()	Returns a tuple with the strings of all groups, from group 1 up to the last group in the pattern
groupdict()	Returns a dictionary of the named groups defined with (?P<name>...)
start()	Returns the starting position of the match
end()	Returns the position at which the match ends
span()	Returns a tuple (start, end) with the positions of the match


    li = 'nick jenny nnnk'

    a = re.match(r"n\w+", li)
    print(a.group())

    a = re.match(r"(n)(\w+)", li)
    print(a.groups())

    a = re.match(r"(?P<k1>n)(?P<k2>\w+).*(?P<k3>n\w+)", li)
    print(a.groupdict())

    -----------------------------------------------
    import re
    a = "123abc456"
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0)   # 123abc456, the whole match
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1)   # 123
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2)   # abc
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3)   # 456
    -----------------------------------------------

group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second pair, and group(3) the part matched by the third pair.
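The position methods from the table (start(), end(), span()) are not exercised above, so here is a small sketch that reuses the same sample string. The list name `spans` is hypothetical.

```python
import re

li = "nick jenny nnnk"

# Collect the matched text and its (start, end) positions for every match.
spans = [(m.group(), m.span()) for m in re.finditer(r"n\w+", li)]
print(spans)
# [('nick', (0, 4)), ('nny', (7, 10)), ('nnnk', (11, 15))]

m = re.search(r"n\w+", li)
print(m.start(), m.end())         # 0 4
print(li[m.start():m.end()])      # nick
```

Slicing the original string with start() and end() gives back the same text as group().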

re.I	Makes matching case-insensitive
re.L	Performs locale-aware matching
re.M	Multiline matching; affects ^ and $
re.S	Makes . match every character, including line breaks
re.U	Interprets characters according to the Unicode character set; affects \w, \W, \b, \B
re.X	Verbose mode: whitespace in the pattern is ignored and # starts a comment

    # re.I makes matching case-insensitive
    a = re.search(r"nick", "NIck", re.I)
    print(a.group())

    # re.L performs locale-aware matching
    # re.U interprets characters according to the Unicode character set; affects \w, \W, \b, \B

    # re.S: . also matches a line break (by default, the dot does not match \n)
    a = re.findall(r".", "nick\njenny", re.S)
    print(a)
    # Output: ['n', 'i', 'c', 'k', '\n', 'j', 'e', 'n', 'n', 'y']

    # re.M: ^ and $ match at every line.
    # By default, ^ matches only at the start of the string and $ only at the end
    n = "12 drummers drumming,\n11 pipers piping,\n10 lords a-leaping"
    p = re.compile(r"^\d+")
    p_multi = re.compile(r"^\d+", re.M)
    print(re.findall(p, n))         # ['12']
    print(re.findall(p_multi, n))   # ['12', '11', '10']
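The re.X flag has no example above, so here is a minimal sketch. It rewrites the earlier qq/163 mail pattern in verbose form, with the dot before com escaped. The name `qq_mail` is hypothetical.

```python
import re

# With re.X, whitespace in the pattern is ignored and "#" starts a comment,
# so a long pattern can be laid out over several annotated lines.
qq_mail = re.compile(r"""
    \w{6,10}    # 6 to 10 word characters before the at sign
    @
    (qq|163)    # group 1: one of the two domains
    \.com       # a literal dot followed by com
""", re.X)

m = qq_mail.match("630571017@qq.com")
print(m.group())   # 630571017@qq.com
print(m.group(1))  # qq
```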

Common regular expression examples:

Matching mobile phone number:

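    # Match a mobile phone number
    phone_num = '000000'            # placeholder value from the source; it is too short to match
    a = re.compile(r"^1\d{10}")     # a 1 followed by ten more digits
    b = a.match(phone_num)
    print(b.group() if b else None) # match() returns None when nothing matches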

Match IPv4:

    # Match an IPv4 address
    ip = '192.168.1.1'
    # Three "octet followed by a dot" parts, then a final octet; ^ and $ anchor the whole string
    a = re.compile(r"^(((1?[0-9]?[0-9])|(2[0-4][0-9])|(25[0-5]))\.){3}((1?[0-9]?[0-9])|(2[0-4][0-9])|(25[0-5]))$")
    b = a.search(ip)
    print(b)
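The same idea can be made easier to read by naming the repeated octet pattern and checking the whole string with fullmatch. This is only a sketch: `OCTET`, `IPV4`, and `is_ipv4` are hypothetical names, and the pattern still accepts leading zeros such as 01.

```python
import re

# One octet: 250-255, 200-249, or 0-199.
OCTET = r"(25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])"
# An octet followed by three ".octet" parts.
IPV4 = re.compile(OCTET + r"(\." + OCTET + r"){3}")


def is_ipv4(text):
    # fullmatch succeeds only if the entire string fits the pattern.
    return IPV4.fullmatch(text) is not None


print(is_ipv4("192.168.1.1"))    # True
print(is_ipv4("256.1.1.1"))      # False
print(is_ipv4("192.168.1"))      # False
```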

Matching email:

    # Match an email address
    email = '2017@qq.com'
    # Up to 26 characters, an @, up to 20 word characters, a literal dot, up to 8 word characters
    a = re.compile(r"(.{1,26})@(\w{1,20})\.(\w{1,8})")
    b = a.search(email)
    print(b.group())

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
