(4) Decorators and Regular Expressions
Contents of this chapter:
- Decorators
- Regular expressions (the re module)
The decorator is a well-known design pattern, often used for cross-cutting concerns. Classic examples are logging, performance measurement, and transaction handling. With a decorator we can pull out large amounts of repeated code that has nothing to do with a function's own logic and reuse it. In short, the purpose of a decorator is to add extra behavior to an existing object.
First define a basic decorator:
    ########## Basic decorator ##########
    def orter(func):  # define the decorator
        def inner():
            print("This is inner before.")
            s = func()  # call the original function that was passed in
            print("This is inner after.")
            return s  # return the original function's return value
        return inner  # return the inner function, which replaces name

    @orter  # apply the decorator: name is passed to orter as its argument
    def name():
        print("This is name.")
        return True  # the original function returns True

    ret = name()
    print(ret)

Output:

    This is inner before.
    This is name.
    This is inner after.
    True
Decorating a function that takes arguments:

    ########## Decorator for a function with arguments ##########
    def orter(func):
        def inner(a, b):  # receives the two arguments
            print("This is inner before.")
            s = func(a, b)  # passes both arguments on to the original function
            print("This is inner after.")
            return s
        return inner

    @orter
    def name(a, b):  # takes two arguments; the whole function is passed to orter
        print("This is name. %s, %s" % (a, b))
        return True

    ret = name('nick', 'jenny')  # pass two arguments
    print(ret)

Output:

    This is inner before.
    This is name. nick, jenny
    This is inner after.
    True
A decorator that accepts arbitrary arguments:

    ########## Decorator with *args and **kwargs ##########
    def orter(func):
        def inner(*args, **kwargs):  # accepts any positional and keyword arguments
            print("This is inner before.")
            s = func(*args, **kwargs)  # forwards every argument to the original function
            print("This is inner after.")
            return s
        return inner

    @orter
    def name(a, b, c, k1='nick'):  # takes several arguments
        print("This is name. %s, %s" % (a, b))
        return True

    ret = name('nick', 'jenny', 'car')
    print(ret)

Output:

    This is inner before.
    This is name. nick, jenny
    This is inner after.
    True
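The performance-measurement use case mentioned at the start of this chapter has the same shape as the decorator that accepts arbitrary arguments. The sketch below is one possible version and is not taken from the original text: the names `timed` and `slow_add` are hypothetical, and `functools.wraps` is added so that the decorated function keeps its own name.

```python
import functools
import time

def timed(func):  # hypothetical decorator name, for illustration only
    @functools.wraps(func)  # copy func's name and docstring onto the wrapper
    def inner(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("%s took %.6f seconds" % (func.__name__, elapsed))
        return result
    return inner

@timed
def slow_add(a, b):  # hypothetical function to measure
    time.sleep(0.01)
    return a + b

total = slow_add(1, 2)
print(total)
```

The timing code lives in one place, and any function can opt in with a single `@timed` line.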
Applying multiple decorators to one function:

    ########## Multiple decorators on one function ##########
    def orter(func):
        def inner(*args, **kwargs):
            print("This is inner one before.")
            print("This is inner one before again.")
            s = func(*args, **kwargs)
            print("This is inner one after.")
            print("This is inner one after again.")
            return s
        return inner

    def orter_2(func):
        def inner(*args, **kwargs):
            print("This is inner two before.")
            print("This is inner two before again.")
            s = func(*args, **kwargs)
            print("This is inner two after.")
            print("This is inner two after again.")
            return s
        return inner

    @orter    # the function below, already wrapped by orter_2, is passed to orter
    @orter_2  # the function below is passed to orter_2
    def name(a, b, c, k1='nick'):
        print("This is name. %s and %s." % (a, b))
        return True

    ret = name('nick', 'jenny', 'car')
    print(ret)

Output:

    This is inner one before.
    This is inner one before again.
    This is inner two before.
    This is inner two before again.
    This is name. nick and jenny.
    This is inner two after.
    This is inner two after again.
    This is inner one after.
    This is inner one after again.
    True
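Stacked decorators are applied from the bottom up, so the decorator written first ends up outermost. A minimal sketch of that equivalence, using hypothetical decorators `tag_one` and `tag_two` that record the call order in a list instead of printing:

```python
calls = []  # records the order in which the wrappers run

def tag_one(func):  # hypothetical decorator
    def inner(*args, **kwargs):
        calls.append("one before")
        result = func(*args, **kwargs)
        calls.append("one after")
        return result
    return inner

def tag_two(func):  # hypothetical decorator
    def inner(*args, **kwargs):
        calls.append("two before")
        result = func(*args, **kwargs)
        calls.append("two after")
        return result
    return inner

@tag_one
@tag_two
def greet():
    calls.append("greet")
    return True

greet()
stacked_order = list(calls)

# The same wrapping written out by hand, without the @ syntax
def greet_plain():
    calls.append("greet")
    return True

calls.clear()
manual = tag_one(tag_two(greet_plain))
manual()
manual_order = list(calls)

print(stacked_order)
print(stacked_order == manual_order)  # True
```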
Regular expressions are a powerful tool for matching strings, and they are available in most programming languages. In Python, a regular expression (RE) is essentially a small, highly specialized language embedded in Python and made available through the re module. A pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.
    # Import the re module
    import re

    s = 'nick jenny nice'

    # Matching method (1)
    b = re.match(r'nick', s)
    q = b.group()
    print(q)

    # Matching method (2)
    # Build a Pattern object; the r prefix marks a raw string
    a = re.compile(r'nick')
    print(type(a))  # <class '_sre.SRE_Pattern'> (<class 're.Pattern'> on Python 3.7+)

    b = a.match(s)
    print(b)  # <_sre.SRE_Match object; span=(0, 4), match='nick'>

    q = b.group()
    print(q)

    # The string that was searched is stored in .string
    print(b.string)  # nick jenny nice
    # The pattern that was used is stored in .re
    print(b.re)  # re.compile('nick')
The difference between the two methods: the first is a shorthand in which the pattern is handed to the module on every call, where it is compiled or looked up in the re module's internal cache. The second compiles the pattern once in advance, so the resulting Pattern object can be reused for later matches without being compiled again.
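When the same pattern is applied to many strings, the precompiled form lets one Pattern object be reused. A small sketch with a made-up list of names:

```python
import re

names = ['nick', 'jenny', 'nice', 'car']  # made-up sample data

# Compile once, then reuse the Pattern object for every string
starts_with_n = re.compile(r'n\w*')
matched = [name for name in names if starts_with_n.match(name)]
print(matched)  # ['nick', 'nice']

# The module-level shorthand gives the same result
same = [name for name in names if re.match(r'n\w*', name)]
print(same == matched)  # True
```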
Matching rules:
| Pattern | Meaning |
| --- | --- |
| . | Matches any character except a newline (\n) |
| \ | Escape character |
| [...] | Matches any one character from the set |
    # "." matches any character (except \n)
    a = re.match(r".", "95nick")
    b = a.group()
    print(b)

    # [...] matches any one character from the set
    a = re.match(r"[a-zA-Z0-9]", "123Nick")
    b = a.group()
    print(b)
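The escape character is listed in the table of matching rules but has no example of its own. The following sketch, with made-up input strings, shows how escaping "." changes what matches:

```python
import re

# Unescaped "." matches any character, so "axb" matches as well
any_char = re.match(r"a.b", "axb")
print(any_char is not None)   # True

# Escaped "\." matches only a literal dot
literal_miss = re.match(r"a\.b", "axb")
literal_hit = re.match(r"a\.b", "a.b")
print(literal_miss)           # None
print(literal_hit.group())    # a.b
```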
| Pattern | Meaning |
| --- | --- |
| \d | Matches any decimal digit; equivalent to the class [0-9] |
| \D | Matches any non-digit character; equivalent to the class [^0-9] |
| \s | Matches any whitespace character; equivalent to the class [ \t\n\r\f\v] |
| \S | Matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v] |
| \w | Matches any word character; for ASCII text, equivalent to the class [a-zA-Z0-9_] |
| \W | Matches any non-word character; for ASCII text, equivalent to the class [^a-zA-Z0-9_] |
    # \d \D match a digit / a non-digit
    a = re.match(r"\D", "nick")
    b = a.group()
    print(b)

    # \s \S match a whitespace / a non-whitespace character
    a = re.match(r"\s", " ")
    b = a.group()
    print(b)

    # \w \W match a word character [a-zA-Z0-9_] / a non-word character
    a = re.match(r"\w", "123Nick")
    b = a.group()
    print(b)

    a = re.match(r"\W", "+-*/")
    b = a.group()
    print(b)
| Pattern | Meaning |
| --- | --- |
| * | Matches the preceding character 0 or more times |
| + | Matches the preceding character 1 or more times |
| ? | Matches the preceding character 0 or 1 time |
| {m} {m,n} | Matches the preceding character exactly m times, or from m to n times |
| *? +? ?? | Makes the quantifier non-greedy (it matches as few characters as possible) |
    # "*" matches the preceding character 0 or more times
    a = re.match(r"[A-Z][a-z]*", "Aaaaaa123")  # matches "Aaaaaa"; "123" is not matched
    b = a.group()
    print(b)

    # "+" matches the preceding character 1 or more times
    a = re.match(r"[_a-zA-Z]+", "nick")
    b = a.group()
    print(b)

    # "?" matches the preceding character 0 or 1 time
    a = re.match(r"[0-8]?[0-9]", "95")  # [0-8] cannot match 9, so only "9" is matched
    b = a.group()
    print(b)

    # {m} {m,n} match the preceding character m times, or m to n times
    a = re.match(r"[\w]{6,10}@qq\.com", "630571017@qq.com")
    b = a.group()
    print(b)

    # *? +? ?? make the quantifier non-greedy (as few characters as possible)
    a = re.match(r"[0-9][a-z]*?", "9nick")
    b = a.group()
    print(b)

    a = re.match(r"[0-9][a-z]+?", "9nick")
    b = a.group()
    print(b)
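The difference between greedy and non-greedy matching is easiest to see when a pattern could stop at more than one place. A short sketch on a made-up string with two tagged names:

```python
import re

text = "<b>nick</b> and <b>jenny</b>"  # made-up sample text

# Greedy: ".*" runs to the last "</b>" in the string
greedy = re.findall(r"<b>.*</b>", text)
# Non-greedy: ".*?" stops at the first "</b>" it can
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>nick</b> and <b>jenny</b>']
print(lazy)    # ['<b>nick</b>', '<b>jenny</b>']
```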
| Pattern | Meaning |
| --- | --- |
| ^ | Matches the start of the string; in multi-line mode, also the start of each line |
| $ | Matches the end of the string; in multi-line mode, also the end of each line |
| \A | Matches only at the start of the string |
| \Z | Matches only at the end of the string |
| \b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space |
    # "^" matches the start of the string; in multi-line mode, the start of each line
    li = "nick\nnjenny\nsuo"
    a = re.search(r"^s.*", li, re.M)
    b = a.group()
    print(b)

    # "$" matches the end of the string; in multi-line mode, the end of each line
    li = "nick\njenny\nnick"
    a = re.search(r".*y$", li, re.M)
    b = a.group()
    print(b)

    # \A matches only at the start of the string
    li = "nickjennyk"
    a = re.findall(r"\Anick", li)
    print(a)

    # \Z matches only at the end of the string
    li = "nickjennyk"
    a = re.findall(r"nick\Z", li)
    print(a)

    # \b matches a word boundary, e.g. the position between a word and a space
    a = re.search(r"\bnick\b", "jenny nick car")
    b = a.group()
    print(b)
| Pattern | Meaning |
| --- | --- |
| "|" | Matches either the expression on its left or the one on its right |
| (ab) | The expression in parentheses is treated as a group |
| \<number> | Refers back to the string matched by the group with that number |
| (?P<key>value) | Defines a named group; the matches can be returned as a dictionary keyed by name |
| (?P=name) | Refers back to the string matched by the group named name |
    # "|" matches either expression
    a = re.match(r"nick|jenny", "jenny")
    b = a.group()
    print(b)

    # (ab) the expression in parentheses acts as a group
    a = re.match(r"[\w]{6,10}@(qq|163)\.com", "630571017@qq.com")
    b = a.group()
    print(b)

    # \<number> refers back to the string matched by that group
    a = re.match(r"<([\w]+>)[\w]+</\1", "<book>nick</book>")
    b = a.group()
    print(b)

    # (?P<key>value) named groups, returned as a dictionary
    li = 'nick jenny nnnk'
    a = re.match(r"(?P<k1>n)(?P<k2>\w+).*(?P<k3>n\w+)", li)
    print(a.groupdict())
    # output: {'k1': 'n', 'k2': 'ick', 'k3': 'nk'}

    # (?P<name>...) gives a group a name
    # (?P=name) refers back to the string matched by that named group
    a = re.match(r"<(?P<jenny>[\w]+>)[\w]+</(?P=jenny)", "<book>nick</book>")
    b = a.group()
    print(b)
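A common use of back-references is finding a word that appears twice in a row. The sketch below uses a made-up sentence and shows both the numbered and the named form:

```python
import re

text = "nick said that that jenny is is nice"  # made-up sentence

# \1 refers back to whatever group 1 matched
doubled = re.findall(r"\b(\w+) \1\b", text)
print(doubled)  # ['that', 'is']

# The same idea with a named group and (?P=name)
m = re.search(r"\b(?P<word>\w+) (?P=word)\b", text)
print(m.group("word"))  # that
print(m.group())        # that that
```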
Module methods:

| Method | Description |
| --- | --- |
| match | Matches only at the start of the string |
| search | Scans the whole string until a match is found |
| findall | Finds every match and returns them as a list |
| finditer | Returns an iterator over the matches |
| sub | Replaces the parts of the string that match the pattern with another value |
| split | Splits the string at each match and returns the pieces as a list |
    ######## Module methods ########
    # match: matches only at the start of the string
    # search: scans the whole string until a match is found
    # findall: finds every match and returns them as a list

    # findall and parentheses
    li = 'nick jenny nick car girl'

    r = re.findall(r'n\w+', li)
    print(r)
    # output: ['nick', 'nny', 'nick']

    r = re.findall(r'(n\w+)', li)
    print(r)
    # output: ['nick', 'nny', 'nick']

    r = re.findall(r'n(\w+)', li)
    print(r)
    # output: ['ick', 'ny', 'ick']

    r = re.findall(r'(n)(\w+)(k)', li)
    print(r)
    # output: [('n', 'ic', 'k'), ('n', 'ic', 'k')]

    # Nested groups are numbered by their opening parenthesis
    r = re.findall(r'(n)((\w+)(c))(k)', li)
    print(r)
    # output: [('n', 'ic', 'i', 'c', 'k'), ('n', 'ic', 'i', 'c', 'k')]

    # finditer returns an iterator; otherwise it works like findall
    li = 'nick jenny nnnk'
    a = re.finditer(r'n\w+', li)
    for i in a:
        print(i.group())

    # sub replaces the parts of the string that match the pattern with another value
    li = 'this is 95'
    a = re.sub(r"\d+", "100", li)
    print(a)

    li = "nick njenny ncar ngirl"
    a = re.compile(r"\bn")
    b = a.sub('cool', li, 3)  # the third argument is the maximum number of replacements
    print(b)
    # output: coolick cooljenny coolcar ngirl

    # split cuts the string at each match and returns the pieces as a list
    li = 'nick,suo jenny:nice card'
    a = re.split(r":|,", li)  # "|" means "or"
    print(a)

    li = 'nick1jenny2car3girl5'
    a = re.compile(r"\d")
    b = a.split(li)
    print(b)
    # output: ['nick', 'jenny', 'car', 'girl', '']
    # note the empty element at the end
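The code for the module methods describes match and search only in comments. A short sketch of how they differ on the same made-up string:

```python
import re

li = 'nick jenny nick car'

# match succeeds only if the pattern matches at position 0
at_start = re.match(r'jenny', li)
print(at_start)                # None

# search scans forward and returns the first match anywhere in the string
m = re.search(r'jenny', li)
print(m.group(), m.span())     # jenny (5, 10)

# findall returns every non-overlapping match as a list
every = re.findall(r'nick', li)
print(every)                   # ['nick', 'nick']
```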
| Method | Description |
| --- | --- |
| group() | Returns the string matched by the RE |
| groups() | Returns a tuple of all group strings, from group 1 up to the number of groups in the pattern |
| groupdict() | Returns a dictionary of the groups defined with (?P<key>value), keyed by name |
| start() | Returns the position where the match starts |
| end() | Returns the position where the match ends |
| span() | Returns a tuple (start, end) with the positions of the match |
    li = 'nick jenny nnnk'

    a = re.match(r"n\w+", li)
    print(a.group())

    a = re.match(r"(n)(\w+)", li)
    print(a.groups())

    a = re.match(r"(?P<k1>n)(?P<k2>\w+).*(?P<k3>n\w+)", li)
    print(a.groupdict())

-----------------------------------------------

    import re

    a = "123abc456"
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0)  # 123abc456, the whole match
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1)  # 123
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2)  # abc
    re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3)  # 456

group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second pair, and group(3) the part matched by the third pair.

-----------------------------------------------
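The position methods start(), end() and span() can be tried on the same sample string. A minimal sketch:

```python
import re

li = 'nick jenny nnnk'
m = re.search(r"j\w+", li)

print(m.group())  # jenny
print(m.start())  # 5
print(m.end())    # 10
print(m.span())   # (5, 10)

# The positions can be used to slice the original string
piece = li[m.start():m.end()]
print(piece)      # jenny
```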
| Flag | Meaning |
| --- | --- |
| re.I | Makes matching case-insensitive |
| re.L | Performs locale-aware matching |
| re.M | Multi-line matching; affects ^ and $ |
| re.S | Makes "." match any character, including newlines |
| re.U | Interprets characters according to the Unicode character set; affects \w, \W, \b, \B |
| re.X | Verbose mode: whitespace in the pattern is ignored and comments are allowed |
    # re.I makes matching case-insensitive
    a = re.search(r"nick", "NIck", re.I)
    print(a.group())

    # re.L performs locale-aware matching
    # re.U interprets characters according to the Unicode character set; it affects \w, \W, \b, \B

    # re.S: "." also matches a newline; by default "." does not match a newline
    a = re.findall(r".", "nick\njenny", re.S)
    print(a)
    # output: ['n', 'i', 'c', 'k', '\n', 'j', 'e', 'n', 'n', 'y']

    # re.M: ^ and $ match at every line
    # By default ^ matches only at the start of the string and $ only at the end
    n = "12 drummers drumming,\n11 pipers piping,\n10 lords a-leaping"
    p = re.compile(r"^\d+")
    p_multi = re.compile(r"^\d+", re.M)
    print(re.findall(p, n))        # ['12']
    print(re.findall(p_multi, n))  # ['12', '11', '10']
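The re.X flag has no example above. The sketch below rewrites the qq/163 mailbox pattern from earlier in this chapter in verbose form; the variable name `qq_mail` is made up for illustration.

```python
import re

# Verbose pattern: whitespace is ignored and "#" starts a comment
qq_mail = re.compile(r"""
    [\w]{6,10}   # 6 to 10 word characters
    @            # a literal @
    (qq|163)     # provider: qq or 163
    \.com        # a literal .com
""", re.X)

m = qq_mail.match("630571017@qq.com")
print(m.group())   # 630571017@qq.com
print(m.group(1))  # qq
```

Splitting a long pattern across lines this way leaves the matching behavior unchanged and only makes the pattern easier to read.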
Common regular expression examples:
Match a mobile phone number:

    # Match a mobile phone number: 1 followed by ten digits
    phone_num = '13800000000'  # placeholder number for illustration
    a = re.compile(r"^1\d{10}$")
    b = a.match(phone_num)
    print(b.group())
Match an IPv4 address:

    # Match an IPv4 address: four numbers from 0 to 255 separated by dots
    ip = '192.168.1.1'
    a = re.compile(r"^((1?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.){3}(1?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$")
    b = a.search(ip)
    print(b)
Match an email address:

    # Match an email address (simplified; it does not cover every valid address)
    email = '2017@qq.com'
    a = re.compile(r"^[\w.+-]{1,26}@\w{1,20}\.\w{1,8}$")
    b = a.search(email)
    print(b.group())