Common Regular Expression symbols
'.'      By default, matches any character except the newline \n; if the DOTALL flag is specified, it matches any character, including a newline
'^'      Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) finds the 'a'
'$'      Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, so re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() works too
'*'      Matches the preceding character 0 or more times; re.findall("ab*", "cabb3abcbbac") returns ['abb', 'ab', 'a']
'+'      Matches the preceding character 1 or more times; re.findall("ab+", "ab+cd+abb+bba") returns ['ab', 'abb']
'?'      Matches the preceding character 0 or 1 times
'{m}'    Matches the preceding character exactly m times
'{n,m}'  Matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|'      Matches either the expression to the left or the one to the right of |; re.search("abc|ABC", "ABCBabcCD").group() returns 'ABC'
'(...)'  Group matching; re.search("(abc){2}a(123|456)c", "abcabca456c").group() returns 'abcabca456c'
'\A'     Matches only at the start of the string; re.search(r"\Aabc", "alexabc") finds no match
'\Z'     Matches only at the very end of the string; unlike '$', it ignores MULTILINE and does not match before a trailing newline
'\d'     Matches a digit 0-9
'\D'     Matches a non-digit
'\w'     Matches a word character [A-Za-z0-9_] (with str patterns, other Unicode letters and digits as well)
'\W'     Matches a non-word character
'\s'     Matches whitespace such as \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() returns '\t'
'(?P<name>...)'  Named group matching; re.search(r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() returns {'province': '3714', 'city': '81', 'birthday': '1993'}
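The examples quoted in the symbol table above can be checked directly. This is a minimal sketch that re-runs them with the standard re module; the variable names are only illustrative.

```python
import re

# '*' : zero or more of the preceding character
star = re.findall("ab*", "cabb3abcbbac")
# '+' : one or more of the preceding character
plus = re.findall("ab+", "ab+cd+abb+bba")
# '{n,m}' : between n and m repetitions of the preceding character
counted = re.findall("ab{1,3}", "abb abc abbcbbb")
# '|' : left or right alternative; search() stops at the first match
alt = re.search("abc|ABC", "ABCBabcCD").group()
# '(...)' : a group can be repeated as a unit
grouped = re.search("(abc){2}a(123|456)c", "abcabca456c").group()
# '\A' : anchored at the very start of the string, so nothing is found here
anchored = re.search(r"\Aabc", "alexabc")
# '\s+' : one or more whitespace characters
spaces = re.search(r"\s+", "ab\tc1\n3").group()
# '(?P<name>...)' : named groups, collected into a dict by groupdict()
named = re.search(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
    "371481199306143242",
).groupdict()

print(star, plus, counted, alt, grouped, anchored, repr(spaces), named)
```

Note that groupdict() takes an optional default for groups that did not participate in the match, not a group name, so it is called here without arguments.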
The most commonly used matching functions
1 re.match: matches only from the beginning of the string
2 re.search: scans the string and returns the first match found anywhere in it
3 re.findall: returns all matches as the elements of a list
4 re.split: splits the string into a list, using the matched text as the separator
5 re.sub: replaces the matched text
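The five functions above can be compared on one string. A minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "2024 alex 25 egon 30"  # hypothetical sample string

# re.match: only tries at the beginning of the string
m = re.match(r"\d+", text)          # matches "2024"
no_m = re.match(r"[a-z]+", text)    # None: the string does not start with a letter
# re.search: scans the whole string and returns the first match
s = re.search(r"[a-z]+", text)      # finds "alex"
# re.findall: every non-overlapping match, as a list of strings
all_nums = re.findall(r"\d+", text)
# re.split: the matched text acts as the separator
parts = re.split(r"\d+", "alex25egon30jack")
# re.sub: every match is replaced by the replacement string
masked = re.sub(r"\d", "*", text)

print(m.group(), no_m, s.group(), all_nums, parts, masked)
```

match() and search() return a match object (or None), so .group() is needed to get the text, while findall(), split() and sub() return plain lists or strings.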
The trouble with backslashes
As in most programming languages, regular expressions use "\" as an escape character, which leads to a backslash problem. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the language's string parser turns each pair into one backslash, leaving two, and the regex engine then reads those two as one literal backslash. Python's raw strings solve this neatly: the same expression can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expression is easier to read.
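A minimal sketch of the two spellings described above; the sample path string is arbitrary.

```python
import re

text = "C:\\temp\\new"  # the string itself contains two single backslashes

# Ordinary string literal: four backslashes reach the regex engine as \\ (one literal backslash)
plain = re.findall("\\\\", text)
# Raw string: r"\\" already passes \\ to the regex engine
raw = re.findall(r"\\", text)

# The same applies to escapes such as \d: both literals produce the same pattern text
same_pattern = "\\d" == r"\d"
digits = re.findall(r"\d", "a1b2")

print(plain, raw, same_pattern, digits)
```

Each element of plain and raw is a one-character string holding a single backslash; Python just displays it as '\\'.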
A few matching modes worth knowing in passing
re.I (re.IGNORECASE): ignore case (the full name is in parentheses; the same applies below)
re.M (re.MULTILINE): multiline mode; changes the behavior of '^' and '$' (see the symbol table above)
re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'
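A short sketch of the three flags, each passed through the flags argument; the sample strings are only illustrative.

```python
import re

# re.I / re.IGNORECASE: letters match regardless of case
ignore = re.findall("abc", "ABC abc Abc", flags=re.I)

# re.M / re.MULTILINE: '^' and '$' also match at each line start/end
text = "abc\nabd\nxyz"
single = re.findall(r"^ab.", text)             # only the start of the whole string
multi = re.findall(r"^ab.", text, flags=re.M)  # the start of every line

# re.S / re.DOTALL: '.' also matches the newline character
no_dotall = re.findall(r"a.b", "a\nb")
dotall = re.findall(r"a.b", "a\nb", flags=re.S)

print(ignore, single, multi, no_dotall, dotall)
```

Flags can be combined with |, for example flags=re.I | re.M.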
import re

# plain str methods, for comparison
s = 'hello world'
print(s.find('ll'))             # 2
ret = s.replace('ll', 'xx')
print(ret)                      # hexxo world
print(s.split('w'))             # ['hello ', 'orld']
ret = re.findall(r'w\w{2}l', 'hello world')
print(ret)                      # ['worl']
ret = re.findall('alex', 'hiudfgiusiohalexlkshd')
print(ret)                      # ['alex']

# . wildcard: matches any character except the newline \n
ret = re.findall('w..l', 'hello world')
print(ret)                      # ['worl']

# ^ caret: matches only at the start of the string
ret = re.findall('^h...o', 'hjasdflhello')
print(ret)                      # []
# $ dollar: matches only at the end of the string
ret = re.findall('h...o$', 'hjasdflhello')
print(ret)                      # ['hello']

# * repeats the preceding item [0, +oo) times
ret = re.findall('a.*li', 'husihfiosalexlihuidh')
print(ret)                      # ['alexli']
# + repeats [1, +oo) times
ret = re.findall('a.+li', 'husihfiosalexlihuidh')
print(ret)                      # ['alexli']
# ? repeats [0, 1] times
ret = re.findall('a.?li', 'husihfiosalexlihuidh')
print(ret)                      # []
# {} sets the count yourself: {5} exactly five times, {1,3} one to three times
ret = re.findall('a{5}b', 'aaaaab')
print(ret)                      # ['aaaaab']
# * equals {0,}, + equals {1,}, ? equals {0,1}

# [] character set: matches any one of the listed characters
ret = re.findall('a[cd]x', 'acx')   # a comma inside [] would itself be matched literally
print(ret)                      # ['acx']
# inside [] metacharacters lose their special meaning (exceptions: \ ^ -)
ret = re.findall('a[c*]x', 'a*x')
print(ret)                      # ['a*x']
# ^ at the start of [] negates the set
ret = re.findall('[^45]', 'ysdgufi4x245df')
print(ret)

# \ followed by a metacharacter removes its special meaning;
# \ followed by certain ordinary characters gives them a special meaning:
# \d any decimal digit, [0-9]             \D any non-digit, [^0-9]
# \s any whitespace, [ \t\n\r\f\v]        \S any non-whitespace, [^ \t\n\r\f\v]
# \w any word character, like [a-zA-Z0-9_] for ASCII text   \W any non-word character
# \b a word boundary: the position between a word character and a non-word character such as a space
print(re.findall(r'\d{10}', '9074892365982475896245692835'))   # ['9074892365', '9824758962']
print(re.findall(r'\sasd', 'fak asd'))       # [' asd']
print(re.findall(r'\w', 'fak asd'))
print(re.findall(r'I\b', 'I am a LIST'))     # ['I']

# search returns a match object for the first match only
ret = re.search('sb', 'shukdsbjfhsb')
print(ret.group())              # sb

ret = re.findall(r"\\", "sdyfjd\\c")
print(ret)                      # ['\\']

# () grouping and | alternation
ret = re.search('(as)+', 'sdfghjasas').group()
print(ret)                      # asas
print(re.search('(as)|3', 'as').group())   # as

# Methods of regular expressions
# 1 findall()  returns all matches in a list
# 2 search()   returns a match object for the first match; call .group() on it
# 3 match()    like search(), but only matches at the beginning of the string
# 4 split('[ab]', s)  splits s at every 'a' or 'b'
# 5 sub()      takes the pattern to replace, the replacement, and the original string
# 6 compile()  compiles a pattern into a regex object: obj = re.compile(...); obj.split(...)
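The methods list above mentions compile() only briefly. A minimal sketch of that usage, with an arbitrary pattern and sample strings: the compiled object offers the same methods as the module-level functions, minus the pattern argument.

```python
import re

# Compile once, then reuse the Pattern object
obj = re.compile(r"[ab]")

# split at every 'a' or 'b'
pieces = obj.split("1a2b3a4")
# Pattern.sub(repl, string, count=0): count limits how many matches are replaced
first_two = obj.sub("-", "xaybzaw", count=2)
everything = obj.sub("-", "xaybzaw")
# findall() behaves like re.findall() with this pattern
found = obj.findall("cabbage")

print(pieces, first_two, everything, found)
```

Compiling mainly helps readability when the same pattern is used in several places; the module-level functions also cache recently used patterns internally.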
The Python module re