The Python module re

Last Update:2017-11-05 Source: Internet

Author: User


Common Regular Expression symbols

'.'           Matches any character except \n by default; if the DOTALL flag is specified, matches any character, including a line break.
'^'           Matches the beginning of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'           Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, e.g. re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()
'*'           Matches the preceding character 0 or more times; re.findall("ab*", "cabb3abcbbac") returns ['abb', 'ab', 'a']
'+'           Matches the preceding character 1 or more times; re.findall("ab+", "ab+cd+abb+bba") returns ['ab', 'abb']
'?'           Matches the preceding character 1 or 0 times
'{m}'         Matches the preceding character m times
'{n,m}'       Matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|'           Matches the expression on the left or on the right of |; re.search("abc|ABC", "ABCBabcCD").group() returns 'ABC'
'(...)'       Group matching; re.search("(abc){2}a(123|456)c", "abcabca456c").group() returns 'abcabca456c'
'\A'          Matches only at the beginning of the string; re.search(r"\Aabc", "alexabc") does not match
'\Z'          Matches only at the end of the string; like $ without MULTILINE, except that $ also matches just before a trailing newline
'\d'          Matches a digit 0-9
'\D'          Matches a non-digit
'\w'          Matches [A-Za-z0-9_]
'\W'          Matches anything not in [A-Za-z0-9_]
'\s'          Matches whitespace characters such as space, \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() returns '\t'
'(?P<name>...)'  Named group matching; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() returns {'province': '3714', 'city': '81', 'birthday': '1993'}
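As a rough illustration of how the anchors in the list above behave, the following sketch compares ^ and $ with and without MULTILINE, and contrasts them with \A and \Z. The sample string and variable names are made up for this example.

```python
import re

# Hypothetical sample text: two lines, ending with a newline.
text = "foo\nbar foo\n"

# Without MULTILINE, ^ and $ refer to the whole string
# ($ also matches just before a trailing newline).
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With MULTILINE, ^ and $ also match at every line break.
starts_multi = re.findall(r"^\w+", text, flags=re.MULTILINE)
ends_multi = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \A and \Z ignore MULTILINE and always refer to the whole string.
only_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)
at_very_end = re.search(r"foo\Z", text)  # None: the string ends with "\n"

print(starts_default, ends_default)
print(starts_multi, ends_multi)
print(only_start, at_very_end)
```

The difference between $ and \Z only shows up when the string ends with a newline, which is common when reading lines from a file.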

The most commonly used match syntax

1 re.match      match from the beginning of the string
2 re.search     match anywhere in the string
3 re.findall    return all matches as elements of a list
4 re.split      split the string into a list, using the matched text as the separator
5 re.sub        match text and replace it
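The five functions listed above can be compared side by side on one input. This is a minimal sketch; the sample string and variable names are invented for illustration.

```python
import re

# Hypothetical sample text: names followed by ages.
text = "alex22jack23rain31"

# match() only succeeds if the pattern matches at the very start.
head = re.match(r"[a-z]+", text)
no_head = re.match(r"\d+", text)       # None: the text starts with a letter

# search() scans forward and returns the first match anywhere.
first_num = re.search(r"\d+", text)

# findall() returns every non-overlapping match as a list of strings.
all_nums = re.findall(r"\d+", text)

# split() cuts the string wherever the pattern matches.
names = re.split(r"\d+", text)

# sub() replaces every match with the replacement string.
masked = re.sub(r"\d+", "#", text)

print(head.group(), no_head, first_num.group())
print(all_nums, names, masked)
```

Note that match() and search() return a match object (or None), so check the result before calling group() on it.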

The backslash plague
As in most programming languages, regular expressions use "\" as the escape character, which leads to a pile-up of backslashes. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the string literal first turns each pair into a single backslash, leaving two, and the regular expression engine then reads those two as one literal backslash. Python's raw strings solve this problem neatly: the regular expression in this example can be written as r"\\". Similarly, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expressions are easier to read.
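The two levels of escaping can be checked directly. The sketch below assumes a made-up Windows-style path as the sample text.

```python
import re

# Hypothetical sample text; the actual characters are C:\temp
text = "C:\\temp"

# Ordinary literal: four backslashes in source become two in the string.
plain_pattern = "\\\\"
# Raw literal: what you type is what the regex engine receives.
raw_pattern = r"\\"

same_pattern = (plain_pattern == raw_pattern)   # both are two characters long
pattern_length = len(raw_pattern)

# Either pattern matches one literal backslash in the text.
backslashes = re.findall(raw_pattern, text)

# The same idea for \d: "\\d" and r"\d" are the same pattern.
digits_plain = re.findall("\\d", "a1b2")
digits_raw = re.findall(r"\d", "a1b2")

print(same_pattern, pattern_length, backslashes)
print(digits_plain, digits_raw)
```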

A few matching flags worth knowing

re.I (re.IGNORECASE): ignore case (the full name is given in parentheses; the same applies below)
re.M (re.MULTILINE): multiline mode; changes the behavior of '^' and '$' (see the symbol list above)
re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'
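A short sketch of the three flags in action, including how to combine them with the | operator. The sample string and variable names are assumptions made for this example.

```python
import re

# Hypothetical two-line sample text.
text = "Hello\nWORLD"

# IGNORECASE: letters match regardless of case.
case_sensitive = re.findall(r"world", text)
case_insensitive = re.findall(r"world", text, flags=re.I)

# DOTALL: '.' also matches the newline between the two lines.
dot_default = re.search(r"Hello.WORLD", text)             # None
dot_all = re.search(r"Hello.WORLD", text, flags=re.S)

# MULTILINE: ^ and $ match at the start and end of each line.
lines = re.findall(r"^\w+$", text, flags=re.M)

# Flags can be combined with the bitwise OR operator.
combined = re.findall(r"^world$", text, flags=re.I | re.M)

print(case_sensitive, case_insensitive)
print(dot_default, dot_all.group() == text)
print(lines, combined)
```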

import re

# Plain string methods, for comparison
s = 'hello world'
print(s.find('ll'))
ret = s.replace('ll', 'xx')
print(ret)
print(s.split('w'))

ret = re.findall(r"w\w{2}l", 'hello world')
print(ret)
ret = re.findall("alex", 'hiudfgiusiohalexlkshd')
print(ret)

# . wildcard: matches any single character except a newline
ret = re.findall("w..l", 'hello world')
print(ret)

# ^ caret: matches only at the start of the string
ret = re.findall('^h..o', 'hjasdflhello')
print(ret)

# $ matches only at the end of the string
ret = re.findall('h..o$', 'hjasdflhello')
print(ret)

# * repeats the preceding item [0, +oo) times
ret = re.findall('a.*li', 'husihfiosalexlihuidh')
print(ret)

# + repeats the preceding item [1, +oo) times
ret = re.findall('a.+li', 'husihfiosalexlihuidh')
print(ret)

# ? repeats the preceding item [0, 1] times
ret = re.findall('a.?li', 'husihfiosalexlihuidh')
print(ret)

# {} repeats a given number of times; {1,3} matches one to three times
ret = re.findall('a{5}b', 'aaaaab')
print(ret)
# * is equivalent to {0,}
# + is equivalent to {1,}
# ? is equivalent to {0,1}

# Character sets
# [] means "or": exactly one of the listed characters is chosen
# (a comma inside [] is a literal comma, not a separator)
ret = re.findall('a[c,d]x', 'acx')
print(ret)
# Inside [], metacharacters lose their special meaning (except \ ^ -)
ret = re.findall('a[c,*]x', 'a*x')
print(ret)
# ^ at the start of []: negation
ret = re.findall('[^4,5]', 'ysdgufi4x245df')
print(ret)

# \ before a metacharacter removes its special meaning
# \ before certain ordinary characters gives them a special meaning:
# \d matches any decimal digit; equivalent to [0-9]
# \D matches any non-digit character; equivalent to [^0-9]
# \s matches any whitespace character; equivalent to [ \t\n\r\f\v]
# \S matches any non-whitespace character; equivalent to [^ \t\n\r\f\v]
# \w matches any alphanumeric character or underscore; equivalent to [a-zA-Z0-9_]
# \W matches any other character; equivalent to [^a-zA-Z0-9_]
# \b matches a word boundary, i.e. the position between a word character and a non-word character such as a space
print(re.findall(r'\d{10}', '9074892365982475896245692835'))
print(re.findall(r'\sasd', 'fak asd'))
print(re.findall(r'\w', 'fak asd'))
print(re.findall(r'I\b', 'I am a LIST'))

# search() returns only the first match that satisfies the pattern
ret = re.search('sb', 'shukdsbjfhsb')
print(ret.group())

ret = re.findall(r"\\", "sdyfjd\\c")
print(ret)

# () and |: grouping and alternation
ret = re.search('(as)+', 'sdfghjasas').group()
print(ret)
print(re.search('(as)|3', 'as').group())

# Methods of regular expressions
# 1 findall() returns all results in a list
# 2 search() returns a match object for the first match; call group() on it to get the text
# 3 match() is like search() but only matches at the beginning of the string; the object can call group()
# 4 split('[a,b]', s) splits s at every 'a' and every 'b' (and at commas, since the comma in [] is literal)
# 5 sub() takes three arguments: the pattern to match, the replacement, and the string to operate on
# 6 compile() creates a regular expression object from a pattern: obj = re.compile(pattern), then obj.split(s) and so on
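The notes on split() and compile() above can be tried out as follows. This is only a sketch; the sample strings and variable names are invented, and it also shows that a comma inside [] is treated as an ordinary character.

```python
import re

# compile() builds a reusable pattern object; its methods mirror the
# module-level functions but take the pattern from the object itself.
ab_pattern = re.compile(r"[ab]")

parts = ab_pattern.split("xaybz")        # cut at every 'a' or 'b'
found = ab_pattern.findall("xaybz")      # every 'a' or 'b' in order
dashed = ab_pattern.sub("-", "xaybz")    # replace every 'a' or 'b'

# A comma inside [] is a literal character, so '[a,b]' also splits on commas.
comma_parts = re.split(r"[a,b]", "1a2,3b4")

print(parts, found, dashed)
print(comma_parts)
```

Compiling is mostly a convenience when the same pattern is used many times; the module-level functions give the same results for one-off calls.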

