Python Expert Path [5]: Regular Expressions in Python
The following lists the regular expression metacharacters and syntax supported by Python:
Character . (dot): matches any single character except a newline
import re
st = 'python'
result = re.findall('p.t', st)
print(result)
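As a small sketch of how the dot behaves around newlines and literal dots, the example below uses a made-up sample string (`text` is an assumption, not from the original). It compares the default behaviour, the `re.DOTALL` flag, and an escaped dot.

```python
import re

# Hypothetical sample text containing a newline and a literal dot.
text = "p\nt pat p.t"

# By default, '.' matches any character except a newline.
default_matches = re.findall(r"p.t", text)

# With re.DOTALL, '.' also matches the newline character.
dotall_matches = re.findall(r"p.t", text, re.DOTALL)

# An escaped dot matches only a literal '.'.
literal_matches = re.findall(r"p\.t", text)

print(default_matches)  # ['pat', 'p.t']
print(dotall_matches)   # ['p\nt', 'pat', 'p.t']
print(literal_matches)  # ['p.t']
```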
Character ^: matches at the start of the string
import re
st = 'python'
result = re.findall('^py', st)
print(result)
Character $: matches at the end of the string
import re
st = 'python'
result = re.findall('n$', st)
print(result)
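The two anchors apply to the whole string by default. As a hedged illustration with a made-up multi-line string (`text` is an assumption), the sketch below shows how the standard `re.MULTILINE` flag makes `^` and `$` match at every line boundary instead.

```python
import re

# Hypothetical three-line sample text.
text = "python\nruby\npython"

# Without flags, ^ matches only at the very start of the string
# and $ only at the very end.
start_default = re.findall(r"^py", text)
end_default = re.findall(r"n$", text)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
start_multi = re.findall(r"^py", text, re.MULTILINE)
end_multi = re.findall(r"n$", text, re.MULTILINE)

print(start_default, end_default)  # ['py'] ['n']
print(start_multi, end_multi)      # ['py', 'py'] ['n', 'n']
```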
Character *: matches the previous character any number of times, including 0
import re
st = 'I looooooove python'
result = re.findall('lo*ve', st)  # the character o may be absent or repeated any number of times; both cases match
print(result)
Character +: matches the previous character one or more times
import re
st = 'I looooooove python'
result = re.findall('lo+ve', st)  # cannot match if the character o is absent
print(result)
Character ?: matches the previous character 0 times or once
import re
st = 'I love python'
result = re.findall('lo?ve', st)  # also matches when the character o is absent
print(result)
{m}: matches the previous character exactly m times
import re
st = 'I loooove python'
result = re.findall('o{3}', st)  # match exactly 3 o characters
print(result)
{m,n}: matches the previous character between m and n times
import re
st = 'I loooove python'
result = re.findall('lo{1,4}ve', st)
print(result)
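To compare the quantifiers side by side, here is a minimal sketch. The sample words and the helper `o_counts` are hypothetical, introduced only for illustration; `re.fullmatch` requires the whole word to match the pattern.

```python
import re

# Hypothetical test words: "l", then 0, 1, 2 or 6 letters "o", then "ve".
samples = ["l" + "o" * n + "ve" for n in (0, 1, 2, 6)]

patterns = ["lo*ve", "lo+ve", "lo?ve", "lo{2}ve", "lo{1,4}ve"]


def o_counts(pattern):
    """Return the number of 'o' characters in each sample the pattern fully matches."""
    return [s.count("o") for s in samples if re.fullmatch(pattern, s)]


results = {p: o_counts(p) for p in patterns}
for pattern, counts in results.items():
    print(f"{pattern:10} matches words where o repeats {counts} times")
```

Under these assumptions, `lo*ve` accepts all four words, `lo+ve` rejects only the word without an o, `lo?ve` accepts zero or one o, `lo{2}ve` accepts exactly two, and `lo{1,4}ve` accepts one or two of the sampled counts.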
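[abc] or [a-c]: matches any one of the characters listed in []
import re
st = 'I loooove python'
result = re.findall('l[0-z]*e', st)  # [0-z] covers the ASCII range from '0' to 'z' (digits, letters and some punctuation)
print(result)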
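[a|b]: matches the character a, the character b, or a literal |. Inside [] the | is an ordinary character, not "or"; to match only a or b, write [ab], or use (a|b) outside a character class
import re
st = 'I lbve python'
result = re.findall('l[a|b]ve', st)
print(result)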
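Inside a character class, `|` is treated as a literal character rather than as alternation. The sketch below, using a made-up sample string, contrasts `[a|b]` with `[ab]` and with a non-capturing alternation group `(?:a|b)`.

```python
import re

# Hypothetical sample text; note the word containing a literal '|'.
text = "lave lbve l|ve lcve"

# Inside [], '|' is just another character in the set.
class_with_pipe = re.findall(r"l[a|b]ve", text)

# [ab] matches a or b only.
plain_class = re.findall(r"l[ab]ve", text)

# Outside [], '|' means alternation; (?:...) groups without capturing.
alternation = re.findall(r"l(?:a|b)ve", text)

print(class_with_pipe)  # ['lave', 'lbve', 'l|ve']
print(plain_class)      # ['lave', 'lbve']
print(alternation)      # ['lave', 'lbve']
```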
[^0-9]: a ^ at the start of [] negates the class, so it matches any character not listed; here ^ does not mean "start of string"
import re
st = 'I lb2ve python6'
result = re.findall('[^0-9]', st)
print(result)
# Output: ['I', ' ', 'l', 'b', 'v', 'e', ' ', 'p', 'y', 't', 'h', 'o', 'n']
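\ (backslash):
- Followed by a metacharacter, it removes that character's special meaning (for example, \. matches a literal dot)
- Followed by certain ordinary characters, it gives them a special function (for example, \d matches a digit and \w matches a word character)
- Followed by a number, it refers to the string matched by the group with that number (for example, \1)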
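The three uses of the backslash can be sketched as follows. The sample strings are assumptions chosen for illustration; the patterns use raw strings so that Python itself does not interpret the backslashes.

```python
import re

# Hypothetical sample strings.
prices = "3.50 or 3x50"
sentence = "this is is a test"

# 1. Escaping a metacharacter: \. matches only a literal dot.
escaped = re.findall(r"3\.50", prices)
unescaped = re.findall(r"3.50", prices)

# 2. Special sequences: \d matches a digit.
digits = re.findall(r"\d+", prices)

# 3. Backreference: \1 must repeat the text captured by group 1.
repeated = re.search(r"\b(\w+) \1\b", sentence)

print(escaped)                              # ['3.50']
print(unescaped)                            # ['3.50', '3x50']
print(digits)                               # ['3', '50', '3', '50']
print(repeated.group(), repeated.group(1))  # is is is
```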
Greedy and non-greedy modes of quantifiers
Regular expressions are usually used to search text for matching strings. In Python, quantifiers are greedy by default (in a few languages they may be non-greedy by default) and always try to match as many characters as possible; non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example, searching "abbbc" with the regular expression "ab*" finds "abbb", whereas the non-greedy quantifier "ab*?" finds "a".
import re
result = re.findall(r'ab*', 'abbbc')
print(result)
# Output: ['abbb']
import re
result = re.findall(r'ab*?', 'abbbc')  # the trailing ? turns off greedy mode
print(result)
# Output: ['a']
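A common place where the difference shows up is text with repeated delimiters. The sketch below uses a made-up tag-like string (an assumption, not from the original) to contrast `.*` with `.*?`.

```python
import re

# Hypothetical sample text with several angle-bracket tags.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the first '>' after each '<'.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```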
re.match(): matches from the beginning of the string
import re
origin = "hello poe bcd jet who are you 20"
r = re.match(r"h\w+", origin)
print(r.group())      # the whole matched string
print(r.groups())     # all captured groups, as a tuple
print(r.groupdict())  # all named groups, as a dictionary
# Output:
# hello
# ()
# {}
r = re.match(r"(h)(\w+)", origin)
print(r.group())      # the whole matched string
print(r.groups())     # all captured groups, as a tuple
print(r.groupdict())  # all named groups, as a dictionary
# Output:
# hello
# ('h', 'ello')
# {}
# (?P<n1>...) is the fixed syntax for a named group: the name n1 becomes the key
# and the text matched by the group becomes the value in the groupdict() dictionary.
r = re.match(r"(?P<n1>h)(?P<n2>\w+)", origin)
print(r.group())      # the whole matched string
print(r.groups())     # all captured groups, as a tuple
print(r.groupdict())  # all named groups, as a dictionary
# Output:
# hello
# ('h', 'ello')
# {'n1': 'h', 'n2': 'ello'}
re.search(): scans the whole string and returns the first match
Its usage is similar to re.match(), except that the match may begin anywhere in the string:
import re
origin = "hello poe bcd jet poe who are you 20"
# (?P<name>...) is the fixed syntax for a named group: the name becomes the key
# and the text matched by the group becomes the value in the groupdict() dictionary.
r = re.search(r"p(\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole matched string
print(r.groups())     # all captured groups, as a tuple
print(r.groupdict())  # all named groups, as a dictionary
# Output:
# poe bcd jet poe who are you 20
# ('oe', '0')
# {'name': '0'}
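The difference between the two functions is easiest to see when the pattern does not occur at the start of the string. In the minimal sketch below, `re.match()` returns `None` while `re.search()` finds the first occurrence, so the result should be checked before calling `.group()`.

```python
import re

origin = "hello poe bcd jet poe who are you 20"

# re.match only tries at position 0, where the string starts with "hello".
at_start = re.match(r"poe", origin)

# re.search scans forward and stops at the first occurrence.
anywhere = re.search(r"poe", origin)

if at_start is None:
    print("re.match found nothing at the start")

if anywhere is not None:
    print(anywhere.group(), anywhere.span())  # poe (6, 9)
```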
re.findall(): returns all matches as a list
Note: empty matches are also included in the result, for example:
import re
result = re.findall("", "a2b3c4d5")
print(result)
print(len(result))
# Output:
# ['', '', '', '', '', '', '', '', '']
# 9
Pay attention to how groups affect the result of re.findall():
import re
origin = "hello poe bcd jet poe who are you 20"
# If the pattern has no group, the whole match is placed in the result list
r = re.findall(r"p\w+", origin)
print(r)
# Output: ['poe', 'poe']
# If the pattern has a group, only the text matched by the group is placed in the result list
r = re.findall(r"p(\w+)", origin)
print(r)
# Output: ['oe', 'oe']
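Two further cases are worth a quick sketch: with more than one group, `re.findall()` returns a list of tuples, and a non-capturing group `(?:...)` allows grouping without changing what is returned. The `origin` string is repeated here so the block runs on its own.

```python
import re

origin = "hello poe bcd jet poe who are you 20"

# Two capturing groups: each result is a tuple with one item per group.
two_groups = re.findall(r"(p)(\w+)", origin)

# A non-capturing group does not count as a group, so whole matches are returned.
non_capturing = re.findall(r"p(?:\w+)", origin)

print(two_groups)     # [('p', 'oe'), ('p', 'oe')]
print(non_capturing)  # ['poe', 'poe']
```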
re.finditer(): like re.findall(), but returns an iterator of match objects
import re
origin = "hello poe bcd jet poe who are you 20"
r = re.finditer(r"(p)(\w+(e))", origin)
for i in r:
    print(i.group())
    print(i.groups())
    print(i.groupdict())
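Because each item yielded by `re.finditer()` is a match object, position information is available as well. A minimal sketch, collecting the matched text with its start and end offsets (the list name `positions` is an assumption for illustration):

```python
import re

origin = "hello poe bcd jet poe who are you 20"

# Collect (text, start, end) for every match; end is exclusive.
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer(r"p\w+", origin)]

print(positions)  # [('poe', 6, 9), ('poe', 18, 21)]
```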
re.split(): splits the string at each match of the pattern
If the pattern has no group, the matched string does not appear in the result:
import re
origin = "hello poe bcd jet poe who are you 20"
r = re.split(r"a\w+", origin, maxsplit=1)  # maxsplit=1: split at most once
print(r)
# Output: ['hello poe bcd jet poe who ', ' you 20']
If the pattern has a group, the string matched by the group also appears in the result:
import re
origin = "hello poe bcd jet poe who are you 20"
r = re.split(r"a(\w+)", origin, maxsplit=1)
print(r)
# Output: ['hello poe bcd jet poe who ', 're', ' you 20']
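A typical use of `re.split()` is splitting on several different delimiters at once, which `str.split()` cannot do directly. The sketch below uses a made-up input string; the character class treats any run of commas, semicolons, or whitespace as one separator.

```python
import re

# Hypothetical input with mixed separators.
raw = "a, b;c  d"

# One or more of: comma, semicolon, or whitespace.
fields = re.split(r"[,;\s]+", raw)

# A separator at the very start produces an empty first item.
leading = re.split(r",", ",a")

print(fields)   # ['a', 'b', 'c', 'd']
print(leading)  # ['', 'a']
```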
re.sub(): regular expression replacement
import re
origin = "1yiuoosfd234kuiuadf789v, xznfa978"
# count=1 replaces only the first match; count=2 would replace the first two matches
new_str = re.sub(r"\d+", "KKK", origin, count=1)
print(new_str)
# Output: KKKyiuoosfd234kuiuadf789v, xznfa978
# (with count=2 the result would be KKKyiuoosfdKKKkuiuadf789v, xznfa978)
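The replacement argument of `re.sub()` can also refer to captured groups or be a function. The following sketch uses made-up inputs to show both forms: `\1`, `\2`, `\3` in the replacement string insert the corresponding groups, and a callable receives each match object and returns the replacement text.

```python
import re

# Hypothetical inputs for illustration.
date = "2024-01-15"
mixed = "a1b22c333"

# Backreferences in the replacement reorder the captured groups.
swapped = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", date)

# A function replacement: double every number found.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), mixed)

print(swapped)  # 15/01/2024
print(doubled)  # a2b44c666
```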
re.subn(): works like re.sub() but also returns the number of replacements made, for example:
import re
origin = "1yiuoosfd234kuiuadf789v, xznfa978"
new_str, count = re.subn(r"\d+", "KKK", origin)  # no limit given, so every match is replaced
print(new_str, count)
# Output: KKKyiuoosfdKKKkuiuadfKKKv, xznfaKKK 4
The 4 indicates that four replacements were made.