The re module
Preface:
The re module is Python's standard-library interface for working with regular expressions.
'.'      By default matches any character except \n; with the DOTALL flag it matches any character, including the newline
'^'      Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, so re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) finds a match
'$'      Matches the end of the string (or just before a trailing newline); with MULTILINE it also matches at the end of each line, so re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() also works
'*'      Matches the character before the * zero or more times; re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'      Matches the previous character one or more times; re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'      Matches the previous character zero or one time
'{m}'    Matches the previous character exactly m times
'{n,m}'  Matches the previous character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'      Matches the expression on the left or on the right of the |; re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)'  Group matching; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'[a-z]'  Matches any one character from a to z
'[^()]'  Matches any one character except ( and )
r''      Raw-string prefix: a \ inside the quotes is kept as a literal character; see the supplementary section on r'' below
'\A'     Matches only at the start of the string; re.search(r"\Aabc", "alexabc") finds no match
'\Z'     Matches only at the very end of the string (unlike $, it does not match before a trailing newline)
'\d'     Matches a digit 0-9 (for str patterns, also other Unicode decimal digits)
'\D'     Matches any non-digit
'\w'     Matches [A-Za-z0-9_] (for str patterns, also other Unicode word characters)
'\W'     Matches any character that \w does not match
'\s'     Matches whitespace characters such as space, \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() gives '\t'
'(?P<name>...)'  Named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}
re.IGNORECASE    Ignores case, e.g. re.search(r'(\A|\s)red(\s+|$)', i, re.IGNORECASE), where i is the string being searched
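The anchor and dot rules in the table are easy to mix up, so here is a minimal sketch that exercises them on one string. The sample text and variable names are made up for illustration.

```python
import re

text = "foo\nbar\n"

# Without flags '.' stops at a newline; with re.DOTALL it crosses it.
dot_default = re.search(r"foo.bar", text)
dot_all = re.search(r"foo.bar", text, flags=re.DOTALL)

# '$' also matches just before a trailing newline; '\Z' only at the very end.
dollar = re.search(r"bar$", text)
strict_end = re.search(r"bar\Z", text)

# With re.MULTILINE, '^' matches at the start of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(dot_default)            # None
print(repr(dot_all.group()))  # 'foo\nbar'
print(dollar.group())         # bar
print(strict_end)             # None
print(line_starts)            # ['foo', 'bar']
```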
Flags are pattern modifiers: without changing the regular expression itself, they change how it is interpreted, which adjusts the matching behavior:
# flags
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE  # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE          # assume current 8-bit locale (locale-aware matching)
U = UNICODE = sre_compile.SRE_FLAG_UNICODE        # assume unicode locale (match according to Unicode character properties)
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE    # make anchors look for newline (multi-line matching)
S = DOTALL = sre_compile.SRE_FLAG_DOTALL          # make dot match newline ("." matches any character, including line breaks)
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE        # ignore whitespace and comments
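As a rough illustration of how these flags are used in practice, the sketch below combines several of them with the bitwise OR operator and also shows the inline (?i) form. The pattern and sample text are invented for the example.

```python
import re

# Flags are combined with the bitwise OR operator.
pattern = re.compile(
    r"""
    ^hello      # the word at the start of a line
    \s+         # one or more whitespace characters
    world$      # the word at the end of a line
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)

text = "intro\nHELLO   World\noutro"
match = pattern.search(text)
print(match.group())  # HELLO   World

# Inline form: (?i) at the start of the pattern turns on IGNORECASE.
inline = re.findall(r"(?i)red", "Red RED red")
print(inline)  # ['Red', 'RED', 'red']
```

re.VERBOSE lets the pattern carry its own comments, which helps once an expression grows beyond a few tokens.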
Greedy mode, lazy mode:
import re

result1 = re.search("p.*y", "abcdfphp435pythony_py")   # greedy mode: .* takes as much as possible
print(result1)  # <_sre.SRE_Match object; span=(5, 21), match='php435pythony_py'>

result2 = re.search("p.*?y", "abcdfphp435pythony_py")  # lazy mode: .*? takes as little as possible
print(result2)  # <_sre.SRE_Match object; span=(5, 13), match='php435py'>
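The same difference shows up clearly with findall on tag-like text. This is only a sketch with a made-up input string; it is not a recommendation to parse HTML with regular expressions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)   # runs on to the last '>'
lazy = re.findall(r"<.*?>", html)    # stops at the first '>' each time

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```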
match
Tries to match the pattern at the beginning of the string only:
# match
import re

obj = re.match(r'\d+', '123uua123sf')  # match one or more digits starting at the first character
print(obj)  # <_sre.SRE_Match object; span=(0, 3), match='123'>
if obj:  # runs only when something matched; skipped when the result is None
    print(obj.group())  # print the matched text
    # 123
Matching an IP address:
import re

ip = '255.255.255.253'
result = re.match(r'^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.'
                  r'([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$', ip)
print(result)  # <_sre.SRE_Match object; span=(0, 15), match='255.255.255.253'>
search
Scans the string and returns the first place where the pattern matches (not necessarily at the start of the string):
# search
import re

obj = re.search(r'\d+', 'a123uu234asf')  # find the first run of one or more digits
print(obj)  # <_sre.SRE_Match object; span=(1, 4), match='123'>
if obj:  # runs only when something matched; skipped when the result is None
    print(obj.group())  # print the matched text
    # 123

obj = re.search(r'\([^()]+\)', 'sdds(a1fwewe2(3uusfdsf2)34as)f')  # match the innermost parenthesised content
print(obj)  # <_sre.SRE_Match object; span=(13, 24), match='(3uusfdsf2)'>
if obj:  # runs only when something matched; skipped when the result is None
    print(obj.group())  # print the matched text
    # (3uusfdsf2)
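To make the contrast between match and search concrete, the following sketch runs both on the same string and adds re.fullmatch, which requires the whole string to match. The variable names are illustrative.

```python
import re

text = "a123uu234asf"

at_start = re.match(r"\d+", text)         # only tries position 0
anywhere = re.search(r"\d+", text)        # first match anywhere in the string
whole = re.fullmatch(r"\d+", "123")       # the entire string must match
partial = re.fullmatch(r"\d+", "123abc")  # trailing letters make it fail

print(at_start)          # None
print(anywhere.group())  # 123
print(whole.group())     # 123
print(partial)           # None
```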
The difference between group and groups:
# difference between group and groups
import re

a = "123abc456"
b = re.search("([0-9]*)([a-z]*)([0-9]*)", a)
print(b)             # <_sre.SRE_Match object; span=(0, 9), match='123abc456'>
print(b.group())     # 123abc456
print(b.group(0))    # 123abc456
print(b.group(1))    # 123
print(b.group(2))    # abc
print(b.group(3))    # 456
print(b.groups())    # ('123', 'abc', '456')
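Named groups work with the same methods. The sketch below reuses the ID-number pattern from the syntax table and shows group() with a name, group() with several indexes, groupdict(), and span() for a single group.

```python
import re

m = re.search(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
    "371481199306143242",
)

print(m.group("city"))     # 81
print(m.group(1, 3))       # ('3714', '1993')
print(m.groups())          # ('3714', '81', '1993')
print(m.groupdict())       # {'province': '3714', 'city': '81', 'birthday': '1993'}
print(m.span("birthday"))  # (6, 10)
```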
findall
The two methods above (match and search) each return a single match. To get every match in a string, use findall. It returns a list rather than a match object, so its result has no group() method.
# findall
import re

obj = re.findall(r'\d+', 'a123uu234asf')  # find every match
if obj:  # runs only when something matched; an empty list skips it
    print(obj)  # the result is a list
    # ['123', '234']
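What findall returns depends on how many capture groups the pattern has. The sketch below shows the cases and uses re.finditer for when match objects (and their positions) are needed. The patterns are chosen only for illustration.

```python
import re

text = "a123uu234asf"

# One group: findall returns the text of that group for each match.
one_group = re.findall(r"[a-z]+(\d+)", text)

# Several groups: findall returns a list of tuples.
two_groups = re.findall(r"([a-z]+)(\d+)", text)

# finditer yields match objects, so group() and span() are available.
with_spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(one_group)   # ['123', '234']
print(two_groups)  # [('a', '123'), ('uu', '234')]
print(with_spans)  # [('123', (1, 4)), ('234', (6, 9))]
```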
sub
Replaces the parts of the string that match the pattern: re.sub(pattern, repl, string, count=0, flags=0)
# sub
import re

content = "123abc456"
new_content = re.sub(r'\d+', 'ABC', content)  # replace every run of digits
print(new_content)
# ABCabcABC
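The remaining parameters of re.sub are easiest to see side by side. This sketch limits the number of replacements with count, passes a function as repl, uses backreferences in the replacement, and calls re.subn to get the replacement count as well.

```python
import re

content = "123abc456"

first_only = re.sub(r"\d+", "ABC", content, count=1)                  # replace just the first match
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), content)  # repl may be a function
swapped = re.sub(r"(\d+)([a-z]+)", r"\2\1", content)                  # backreferences to groups
counted = re.subn(r"\d+", "#", content)                               # (new string, number of substitutions)

print(first_only)  # ABCabc456
print(doubled)     # 246abc912
print(swapped)     # abc123456
print(counted)     # ('#abc#', 2)
```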
split
Splits the string at each match of the pattern: re.split(pattern, string, maxsplit=0, flags=0)
# split
import re

content = "1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))"
new_content = re.split(r'\*', content)  # split on *, producing a list
print(new_content)
# ['1 - 2 ', ' ((60-30+1', '(9-2', '5/3+7/3', '99/4', '2998+10', '568/14))-(-4', '3)/(16-3', '2))']

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))'"
new_content = re.split(r'[\+\-\*\/]+', content)  # split on any run of operators
# new_content = re.split(r'\*', content, maxsplit=1)
print(new_content)
# ["'1 ", ' 2 ', ' ((60', '30', '1', '(9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14))',
#  '(', '4', '3)', '(16', '3', "2))'"]

inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7/3*99/4*2998 +10 * 568/14))-(-4*3)/(16-3*2))'
inpp = re.sub(r'\s+', '', inpp)  # remove whitespace
print(inpp)
# split once around the first innermost parenthesised pair of numbers; the captured group is kept in the result
new_content = re.split(r'\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, maxsplit=1)
print(new_content)
# ['1-2*((60-30+', '-40-5', '*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))']
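Two details of re.split matter in the examples above: a capture group in the pattern keeps the delimiters in the result, and maxsplit limits how many splits are made. A minimal sketch on a shorter, made-up expression:

```python
import re

expr = "9-2*5/3+7"

operands = re.split(r"[+\-*/]", expr)         # delimiters are dropped
with_ops = re.split(r"([+\-*/])", expr)       # a capture group keeps the delimiters
limited = re.split(r"\*", "1*2*3", maxsplit=1)  # at most one split

print(operands)  # ['9', '2', '5', '3', '7']
print(with_ops)  # ['9', '-', '2', '*', '5', '/', '3', '+', '7']
print(limited)   # ['1', '2*3']
```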
Supplement: the r'' raw-string prefix and escaping:
Contents of the test file (opened as lzl.txt in the code that follows):
fdfdsfds\fds
sfdsfds&@$
First, note that when the program reads a line containing \ from the file and appends it to a list, printing the list displays the backslash as \\ (the string itself still holds a single backslash):
import re, sys

li = []
with open('lzl.txt', 'r', encoding="utf-8") as file:
    for line in file:
        li.append(line)
print(li)  # note: the single backslash from the file is displayed as a double backslash in the list
# ['fdfdsfds\\fds\n', 'sfdsfds&@$']
print(li[0])  # print shows the single backslash
# fdfdsfds\fds
What the r prefix means: inside a raw string, \ is not treated as an escape character; it is just an ordinary character:
import re, sys

li = []
with open('lzl.txt', 'r', encoding="utf-8") as file:
    for line in file:
        print(re.findall(r's\\f', line))    # first way to match
        # print(re.findall('\\\\', line))   # second way to match
        li.append(line)
print(li)  # note: the single backslash from the file is displayed as a double backslash in the list
# ['s\\f']
# []
# ['fdfdsfds\\fds\n', 'sfdsfds&@$']
Addendum: the following code may leave you even more confused at first:
import re

line = 'fdfdsfds\\fds\n'  # a line as read from lzl.txt
re.findall(r'\\', line)   # must be written this way; r'\' is not valid
print(r'\\')              # r'\' is not allowed: a raw string cannot end in an odd number of backslashes
# \\
# to print a single \, write:
print('\\')               # in a normal string, '\\' is one backslash
# \
Summary: a single backslash \ in the file is displayed as \\ when the string is shown inside a list (its repr), and print shows it as a single \. To match a backslash from the file with a regular expression, use r'\\' (two backslashes in a raw string) or, without r, '\\\\' (four backslashes).
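The summary can be checked directly by looking at string lengths. In this sketch the variable line stands in for a line read from the file, and re.escape is shown as one more way to build a pattern for a literal backslash.

```python
import re

# A normal string literal '\\' holds ONE backslash; the raw string r'\\' holds TWO.
lengths = (len("\\"), len(r"\\"), len("\\\\"))
print(lengths)  # (1, 2, 2)

line = "fdfdsfds\\fds"  # stands in for the file line fdfdsfds\fds
print(len(line))        # 12: only one backslash is stored
print(repr(line))       # 'fdfdsfds\\fds' (repr doubles it for display)
print(line)             # fdfdsfds\fds

# Three equivalent ways to write a regex that matches one literal backslash.
raw_hits = re.findall(r"\\", line)
plain_hits = re.findall("\\\\", line)
escaped_hits = re.findall(re.escape("\\"), line)
print(raw_hits == plain_hits == escaped_hits)  # True
```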
Compile function:
Description
Python provides support for regular expressions through the re module. The usual workflow is to call re.compile() to compile the regular expression string into a Pattern instance, then use the Pattern instance to process text and obtain a result (a Match instance), and finally use the Match instance to extract information or carry out further operations.
A simple example: find all the English letters in a string:
import re

pattern = re.compile('[a-zA-Z]')
result = pattern.findall('as3SiOPdj#@23awe')
print(result)
# ['a', 's', 'S', 'i', 'O', 'P', 'd', 'j', 'a', 'w', 'e']
Match IP address (255.255.255.255):
import re

pattern = re.compile(r'^(([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$')
result = pattern.match('255.255.255.255')
print(result)  # <_sre.SRE_Match object; span=(0, 15), match='255.255.255.255'>
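A compiled pattern pays off when it is reused. The sketch below wraps the same octet expression in a small helper and checks several inputs. The names OCTET, IPV4_PATTERN and is_ipv4 are hypothetical, introduced only for this example, and fullmatch is used in place of the ^...$ anchors so that a trailing newline is rejected.

```python
import re

# One octet: 0-99 without a leading zero, 100-199, 200-249, or 250-255.
OCTET = r"(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])"

# Compiled once at import time and reused by every call.
IPV4_PATTERN = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(text):
    """Return True if text is a dotted-quad IPv4 address (hypothetical helper)."""
    return IPV4_PATTERN.fullmatch(text) is not None


for candidate in ["255.255.255.255", "192.168.0.1", "256.1.1.1", "1.2.3", "1.2.3.4\n"]:
    print(repr(candidate), is_ipv4(candidate))
```

For real input validation, the standard library's ipaddress module is usually a simpler choice than a hand-written pattern.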