The Python re module in detail

Source: Internet
Author: User



Common regular expression symbols
'.' By default matches any character except \n; if the re.DOTALL flag is given, it matches any character, including a newline
'^' Matches the start of the string; with the re.MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) finds 'a'
'$' Matches the end of the string; with the re.MULTILINE flag it also matches at the end of each line, e.g. re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() returns 'foo'
'*' Matches the character before the * zero or more times; re.findall("ab*", "cabb3abcbbac") returns ['abb', 'ab', 'a']
'+' Matches the previous character one or more times; re.findall("ab+", "ab+cd+abb+bba") returns ['ab', 'abb']
'?' Matches the previous character zero or one time
'{m}' Matches the previous character exactly m times
'{n,m}' Matches the previous character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|' Matches the expression to the left or to the right of the |; re.search("abc|ABC", "ABCBabcCD").group() returns 'ABC'
'(...)' Group match; re.search("(abc){2}a(123|456)c", "abcabca456c").group() returns 'abcabca456c'

'\A' Matches only at the start of the string (like ^, but not affected by re.MULTILINE); re.search(r"\Aabc", "alexabc") finds no match
'\Z' Matches only at the very end of the string (similar to $, except that $ also matches just before a trailing newline)
'\d' Matches a digit 0-9
'\D' Matches a non-digit
'\w' Matches a word character, [A-Za-z0-9_] (for str patterns in Python 3, Unicode letters and digits as well)
'\W' Matches any character that is not a word character
'\s' Matches whitespace characters such as space, \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() returns '\t'
'(?P<name>...)' Named group match; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() returns {'province': '3714', 'city': '81', 'birthday': '1993'}
Note: ?P<name> is fixed syntax for naming a group
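
Several of the one-line examples in the table can be checked directly. The following is a minimal sketch that runs them with lowercase patterns and raw strings; the variable names are illustrative and not part of the original article.

```python
import re

# Quantifiers apply to the single character (or group) just before them.
star = re.findall(r"ab*", "cabb3abcbbac")
plus = re.findall(r"ab+", "ab+cd+abb+bba")
bounded = re.findall(r"ab{1,3}", "abb abc abbcbbb")

# Alternation and grouping.
alt = re.search(r"abc|ABC", "ABCBabcCD").group()
grouped = re.search(r"(abc){2}a(123|456)c", "abcabca456c").group()

# \A anchors at the very start, so "abc" later in the string is not found.
anchored = re.search(r"\Aabc", "alexabc")

print(star, plus, bounded, alt, grouped, anchored)
```

Writing patterns as raw strings (r"...") keeps backslash sequences such as \A or \d from being interpreted by Python's own string escaping.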



The re module offers several matching methods:
The match method matches only from the beginning of the string (used less often)
Example:
import re
res = re.match('^chen', 'chenronghua123')  # arguments: pattern, string
print(res)
# Output: <re.Match object; span=(0, 4), match='chen'>  (older Python 3 releases print <_sre.SRE_Match object; ...>)
# res = re.match('r.+', 'chen123ronghua123')   # no match (None): match only tries at the beginning of the string
# res = re.search('r.+', 'chen123ronghua123')  # search scans the entire text
# print(res.group())
# Result: ronghua123 (the greedy .+ runs to the end of the string)
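
The difference between match and search can be sketched as follows. re.fullmatch does not appear in the text above and is included only for comparison; the variable names are illustrative.

```python
import re

text = "chen123ronghua123"

# re.match only tries at position 0, so a pattern starting with "r" fails.
at_start = re.match(r"r[a-z]+", text)

# re.search scans forward and returns the first match found anywhere.
anywhere = re.search(r"r[a-z]+", text)

# re.fullmatch succeeds only if the whole string matches the pattern.
whole = re.fullmatch(r"[a-z]+\d+[a-z]+\d+", text)

print(at_start)           # None
print(anywhere.group())   # ronghua
print(whole is not None)  # True
```

Because a failed match returns None, calling .group() on the result without checking it first raises AttributeError.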

The following four are the most commonly used:
1. search scans the entire text and returns the first match it finds
2. findall scans the entire text, matches greedily, and returns every match as a list; the result has no group method
3. split splits the string at each match
4. sub replaces each match
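
Since findall returns plain strings, it has no group method. When the match objects themselves are needed, re.finditer is one option. A minimal sketch follows; finditer is not covered in the text above, and the variable names are illustrative.

```python
import re

text = "aa1xe2pp345lex"

# findall returns a list of strings.
numbers = re.findall(r"[0-9]+", text)

# finditer yields match objects, which expose .group() and .span().
spans = [(m.group(), m.span()) for m in re.finditer(r"[0-9]+", text)]

print(numbers)  # ['1', '2', '345']
print(spans)    # [('1', (2, 3)), ('2', (5, 6)), ('345', (8, 11))]
```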

A few matching flags are worth knowing briefly:
1. re.I (re.IGNORECASE): ignore case (the full name is given in parentheses, same below)
2. re.M (re.MULTILINE): multiline mode, changes the behavior of '^' and '$' [rarely used]
3. re.S (re.DOTALL): dot-matches-all mode, changes the behavior of '.'
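
Flags can be combined with the bitwise OR operator. The sketch below is only an illustration: the sample text and the use of re.compile are assumptions, not part of the original article.

```python
import re

text = "Foo\nfoo bar\nFOO"

# Without flags, ^ and $ anchor only at the ends of the whole string
# and matching is case-sensitive, so nothing is found.
plain = re.findall(r"^foo$", text)

# re.I ignores case; re.M lets ^ and $ match at every line boundary.
pattern = re.compile(r"^foo$", flags=re.I | re.M)
hits = pattern.findall(text)

print(plain)  # []
print(hits)   # ['Foo', 'FOO']
```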




split method:
res = re.split('[0-9]+', 'abc12de3f45gh')
print(res)
Output: ['abc', 'de', 'f', 'gh']

sub method:
res = re.sub('[0-9]+', '|', 'abc12de3f45gh', count=2)  # count limits the number of replacements
print(res)
Output: abc|de|f45gh
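
Both methods accept a few further options. The following sketch shows maxsplit for split, a replacement function for sub, and re.subn, none of which are covered above. It reuses the same sample string, and the variable names are illustrative.

```python
import re

text = "abc12de3f45gh"

# maxsplit limits how many splits re.split performs.
parts = re.split(r"[0-9]+", text, maxsplit=1)

# A replacement function receives each match object and returns the new text.
doubled = re.sub(r"[0-9]+", lambda m: str(int(m.group()) * 2), text)

# re.subn returns the new string together with the number of replacements.
replaced, n = re.subn(r"[0-9]+", "|", text)

print(parts)        # ['abc', 'de3f45gh']
print(doubled)      # abc24de6f90gh
print(replaced, n)  # abc|de|f|gh 3
```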


1. re.I (re.IGNORECASE): ignore case
res = re.search('[a-z]+', 'abcGH', flags=re.I)
print(res.group())
Output: abcGH

2. re.M (re.MULTILINE): multiline mode, changes the behavior of '^' and '$'
res = re.search(r"^a", "\nabc\neee", flags=re.M)
print(res.group())
Output: a

3. re.S (re.DOTALL): dot-matches-all mode, changes the behavior of '.'
res = re.search(".+", "\nabc\neee", flags=re.S)
print(repr(res.group()))
Output: '\nabc\neee' (with re.S the dot also matches newlines, so the whole string is matched)
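
The effect of re.S is easiest to see by running the same pattern with and without the flag. This is a minimal sketch using the sample string from the example above; the variable names are illustrative.

```python
import re

text = "\nabc\neee"

# Default: '.' does not match '\n', so the match is confined to one line.
default = re.search(r".+", text).group()

# With re.S (re.DOTALL) the dot also matches newlines.
dotall = re.search(r".+", text, flags=re.S).group()

print(repr(default))  # 'abc'
print(repr(dotall))   # '\nabc\neee'
```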


Examples:
'.' By default matches any character except \n; if the re.DOTALL flag is given, it matches any character, including a newline

res = re.match('.+', 'chen123ronghua123')
print(res.group())
Output:
chen123ronghua123




'$' Matches the end of the string; with re.MULTILINE it also matches at the end of each line, e.g. re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() returns 'foo'

res = re.match('r.+', 'chen123ronghua123')   # returns None: match only tries at the beginning of the string
res = re.search('r.+', 'chen123ronghua123')  # search scans the entire text
print(res.group())
Result: ronghua123



'+' Matches the previous character one or more times; re.findall("ab+", "ab+cd+abb+bba") returns ['ab', 'abb']
res = re.search('r[a-z]+a', 'chen123ronghua123')  # matches ronghua
print(res.group())
Result: ronghua

res = re.search('#.+#', '1123#hello#')
print(res.group())
Result: #hello#



'?' Matches the previous character zero or one time
res0 = re.search('aal?', 'aalex')
res1 = re.search('aal?', 'aaex')
print(res0.group())
print(res1.group())
Output:
aal
aa



'{m}' Matches the previous character exactly m times
res = re.search('[0-9]{3}', 'aa1xe2pp345lex')  # match three digits in a row
print(res.group())
Output: 345

'{n,m}' Matches the previous character n to m times
res = re.search('[0-9]{1,3}', 'aa1xe2pp345lex')  # match one to three digits in a row
print(res.group())
Output: 1

findall matches greedily
res = re.findall('[0-9]{1,3}', 'aa1xe2pp345lex')  # findall, greedy: every run of one to three digits
print(res)
Output: ['1', '2', '345']  # returned as a list
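
"Greedy" means a quantifier takes as many characters as it can. Appending ? to a quantifier makes it non-greedy. That form is not covered above, so the following is only a sketch with a made-up sample string.

```python
import re

# Hypothetical sample with three '#' characters.
text = "1123#hello#world#"

# Greedy: .+ runs as far as possible, stopping at the last '#'.
greedy = re.search(r"#.+#", text).group()

# Non-greedy: .+? stops at the first '#' that completes the match.
lazy = re.search(r"#.+?#", text).group()

print(greedy)  # #hello#world#
print(lazy)    # #hello#
```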


'|' Matches the expression to the left or to the right of the |; re.search("abc|ABC", "ABCBabcCD").group() returns 'ABC'

res = re.search('abc|ABC', 'ABCBabcCD')
print(res.group())
Output: ABC

res = re.findall('abc|ABC', 'ABCBabcCD')
print(res)
Output: ['ABC', 'abc']




'(...)' Group match; re.search("(abc){2}a(123|456)c", "abcabca456c").group() returns 'abcabca456c'

res = re.search('(abc){2}', 'alexabcabc')
print(res.group())
Output: abcabc

res = re.search(r'(abc){2}(\|\|=){2}', 'alexabcabc||=||=')  # match "||=" twice; note that | must be escaped
print(res.group())
Output: abcabc||=||=
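
Besides group(), a match object gives access to each captured group. The sketch below reuses the patterns from this section; group(1), groups() and the non-capturing form (?:...) are not discussed above and are shown only as an illustration.

```python
import re

m = re.search(r"(abc){2}a(123|456)c", "abcabca456c")

whole = m.group()        # the entire match
first = m.group(1)       # a repeated group keeps its last repetition
second = m.group(2)      # the alternative that matched
all_groups = m.groups()  # every captured group as a tuple

# With a capturing group in the pattern, findall returns the group contents,
# not the whole match; (?:...) groups without capturing.
captured = re.findall(r"(abc){2}", "alexabcabc")
uncaptured = re.findall(r"(?:abc){2}", "alexabcabc")

print(whole, first, second, all_groups)  # abcabca456c abc 456 ('abc', '456')
print(captured, uncaptured)              # ['abc'] ['abcabc']
```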






'\D' Matches a non-digit
res = re.search(r'\D+', '123$-a')
print(res.group())
Output: $-a



'\w' Matches a word character, [A-Za-z0-9_], i.e. anything except special characters

res = re.search(r'\w+', '123$-a')
print(res.group())
Output: 123

'\W' Matches any character that is not a word character, i.e. only special characters
res = re.search(r'\W+', '123$-...a')
print(res.group())
Output: $-...
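
In Python 3, \w in a str pattern also matches the underscore and Unicode letters and digits. The re.ASCII flag restricts it to [a-zA-Z0-9_]. The following is a small sketch with a made-up sample string; re.ASCII is not mentioned in the text above.

```python
import re

# Hypothetical sample; "\u00e9" is the letter e with an acute accent.
text = "user_name-42 caf\u00e9"

# Default (Unicode) behaviour: underscore and accented letters count as \w.
words = re.findall(r"\w+", text)

# With re.ASCII only [a-zA-Z0-9_] counts, so the accented letter is excluded.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(words)
print(ascii_words)
```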



'\s' Matches whitespace characters such as space, \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() returns '\t'
res = re.findall(r'\s', '123$- \r\n\t...a')
print(res)
Output: [' ', '\r', '\n', '\t']

>>> re.search(r'\s+', '123$- \t\r\n')
<re.Match object; span=(5, 9), match=' \t\r\n'>




'\A' Matches only at the start of the string (like ^, but not affected by re.MULTILINE); re.search(r"\Aabc", "alexabc") finds no match
'\Z' Matches only at the very end of the string (similar to $, except that $ also matches just before a trailing newline)
'\d' Matches a digit 0-9

Example:
res = re.search(r'\A[0-9]+[a-z]\Z', '123a')
print(res.group())
Output: 123a
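
\Z and $ behave differently when the string ends with a newline. A minimal sketch follows; the sample string with a trailing newline is an assumption added for illustration.

```python
import re

text = "123a\n"

# $ matches at the end of the string and also just before a trailing newline.
dollar = re.search(r"[a-z]$", text)

# \Z matches only at the very end, so the trailing newline prevents a match.
strict = re.search(r"[a-z]\Z", text)

print(dollar.group())  # a
print(strict)          # None
```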


*: zero or more repetitions
+: one or more repetitions

res = re.match(r'^chen\d+', 'chen123ronghua123')
print(res)
print(res.group())  # show the matched text

Output: <re.Match object; span=(0, 7), match='chen123'>
chen123




'(?P<name>...)' Named group match
res = re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict()
print(res)
Result: {'province': '3714', 'city': '81', 'birthday': '1993'}
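
Named groups can also be read one at a time, and the optional argument of groupdict is a default value for groups that did not take part in the match. The sketch below reuses the ID-number pattern; the second pattern and all variable names are illustrative assumptions.

```python
import re

pattern = re.compile(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})"
)
m = pattern.search("371481199306143242")

city = m.group("city")  # look up a single group by name
fields = m.groupdict()  # all named groups as a dict

# The argument to groupdict() fills in groups that did not participate.
opt = re.search(r"(?P<word>[a-z]+)(?P<num>[0-9]+)?", "abc")
with_default = opt.groupdict("n/a")

print(city)          # 81
print(fields)        # {'province': '3714', 'city': '81', 'birthday': '1993'}
print(with_default)  # {'word': 'abc', 'num': 'n/a'}
```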

