I. Earlier we covered simple string operations. Combining them with regular expressions through the re module makes string handling far more powerful.
Before going through the common regular expression symbols, review the basic functions of the re module:
import re
text = 'c++ python2 python3 perl ruby lua java javascript php4 php5 c'
# match, search, findall, split, sub
re.match(r'java', text)  # matches only at the start of the string; returns None when nothing matches there
# None
re.search(r'java', text)  # scans the string and returns the first match found
# <re.Match object; span=(34, 38), match='java'>
re.match(r'c\++', text), re.match(r'c\+\+', text)  # same effect
# each returns <re.Match object; span=(0, 3), match='c++'>
re.findall(r'python', text)  # returns every occurrence of 'python'
# ['python', 'python']
re.split(r'perl', text)  # splits the string around the matched text
# ['c++ python2 python3 ', ' ruby lua java javascript php4 php5 c']
re.sub(r'ruby', 'fortran', text)  # replaces the matched text
# 'c++ python2 python3 perl fortran lua java javascript php4 php5 c'
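The difference between re.match and re.search is easiest to see side by side. The following is a minimal, self-contained sketch on the sample string; the variable names (m_start, m_any, and so on) are illustrative and not from the original.

```python
import re

text = 'c++ python2 python3 perl ruby lua java javascript php4 php5 c'

# re.match only tries the pattern at position 0, so 'java' is not found.
m_start = re.match(r'java', text)

# re.search scans forward and returns the first match anywhere in the string.
m_any = re.search(r'java', text)

# A match object exposes the matched text and its (start, end) positions.
found = m_any.group()
position = m_any.span()

# findall, split and sub work on every occurrence of the pattern.
pythons = re.findall(r'python', text)
parts = re.split(r'perl', text)
replaced = re.sub(r'ruby', 'fortran', text)

print(m_start, found, position)
print(pythons, parts, replaced, sep='\n')
```

Because re.match returns None when nothing matches at the start, it is safest to test the result before calling group() on it.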
II. Commonly used regular expression syntax
text = 'c++ python2 python3 perl ruby lua java javascript php4 php5 c'
1 ^ start: matches from the beginning of the string
Example: re.findall(r'^c..', text)
Output: # ['c++']
2 . matches any character except the newline \n
re.findall(r'^c', text)
# ['c']
re.findall(r'^c.', text)
# ['c+']
3 + 1~inf: matches the preceding item one or more times
re.findall(r'c+', text)
# ['c', 'c', 'c']
re.findall(r'c\++', text)
# ['c++']
4 $ end: matches at the end of the string
re.findall(r'c$', text)
# ['c']
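As a quick check of the four symbols so far (^, ., + and $), here is a small sketch. The strings 'ccc+ abc' and 'a\nc' are made up for illustration and do not come from the original text.

```python
import re

sample = 'ccc+ abc'  # hypothetical sample string

starts = re.findall(r'^c', sample)         # ^ anchors at the start: at most one match
runs = re.findall(r'c+', sample)           # + groups consecutive c's into one match
literal_plus = re.findall(r'c\+', sample)  # \+ is a literal plus sign, not a quantifier
ends = re.findall(r'c$', sample)           # $ anchors at the end of the string

# . matches any single character except a newline
dot_same_line = re.findall(r'a.c', 'abc')
dot_newline = re.findall(r'a.c', 'a\nc')

print(starts, runs, literal_plus, ends, dot_same_line, dot_newline)
```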
5 [] or: matches any one of the characters listed inside the brackets
re.findall(r'p[a-zA-Z]+', text)  # p followed by letters, lowercase a-z or uppercase A-Z; + is the same as {1,}, from 1 to infinity
# ['python', 'python', 'perl', 'pt', 'php', 'php']
6 * 0~inf: matches the preceding item zero or more times
re.findall(r'p[a-zA-Z]*', text)
# ['python', 'python', 'perl', 'pt', 'php', 'php']
7 ? 0~1: matches the preceding item zero or one time
re.findall(r'p[a-zA-Z]?', text)
# ['py', 'py', 'pe', 'pt', 'ph', 'p', 'ph', 'p']
re.findall(r'p[a-zA-Z0-9]{3,}', text)  # {3,} means three or more repetitions of the preceding item
# ['python2', 'python3', 'perl', 'php4', 'php5']
re.findall(r'c[a-zA-Z]*', text)
# ['c', 'cript', 'c']
re.findall(r'c[^a-zA-Z]*', text)  # ^ inside the brackets means negation: here it matches non-letter characters
# ['c++ ', 'c', 'c']  (the first match includes the trailing space, which is also a non-letter)
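To compare the quantifiers directly, the sketch below runs them over one short string. The strings 'p pa pab pabc' and 'p1 p+ pa' are hypothetical, chosen so that each quantifier gives a visibly different result.

```python
import re

sample = 'p pa pab pabc'  # hypothetical sample string

zero_or_more = re.findall(r'p[a-z]*', sample)    # * also accepts a bare 'p'
one_or_more = re.findall(r'p[a-z]+', sample)     # + needs at least one letter after 'p'
zero_or_one = re.findall(r'p[a-z]?', sample)     # ? takes at most one letter
at_least_two = re.findall(r'p[a-z]{2,}', sample) # {2,} needs two or more letters

# [^a-z] is a negated class: it matches anything that is not a lowercase letter,
# including digits, punctuation and spaces.
negated = re.findall(r'p[^a-z]*', 'p1 p+ pa')

print(zero_or_more, one_or_more, zero_or_one, at_least_two, negated, sep='\n')
```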
8 | or: alternation can also be written with |; note how it differs from []
re.findall(r'[pj][a-zA-Z]+', text)  # + is {1,inf}
# ['python', 'python', 'perl', 'java', 'javascript', 'php', 'php']
Rewriting the pattern above with |:
re.findall(r'p|j[a-zA-Z]+', text)  # | separates everything before it from everything after it, so the pattern has to be adjusted
# ['p', 'p', 'p', 'java', 'javascript', 'p', 'p', 'p', 'p']
re.findall(r'p[a-zA-Z]+|j[a-zA-Z]+', text)  # equivalent to [pj][a-zA-Z]+, written as two separate alternatives
re.findall(r'p[^0-9]+|j[a-zA-Z]+', text)  # note that a space also counts as a non-digit and is matched
# ['python', 'python', 'perl ruby lua java javascript php', 'php']
re.findall(r'p[^0-9 ]+|j[a-zA-Z]+', text)  # excluding the space as well as the digits
# ['python', 'python', 'perl', 'java', 'javascript', 'php', 'php']
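The scope of | is the usual stumbling block, so here is a minimal sketch on a made-up string. The sample 'pig jig big' is hypothetical, and the non-capturing group (?:...) in the last line is an addition that the text above does not cover.

```python
import re

sample = 'pig jig big'  # hypothetical sample string

# A character class picks one character from the set.
char_class = re.findall(r'[pj]ig', sample)

# Without grouping, | splits the whole pattern: either 'p' alone, or 'jig'.
loose_alt = re.findall(r'p|jig', sample)

# Spelling out both alternatives gives the same result as the class.
explicit_alt = re.findall(r'pig|jig', sample)

# A non-capturing group limits | to the part inside the parentheses.
grouped_alt = re.findall(r'(?:p|j)ig', sample)

print(char_class, loose_alt, explicit_alt, grouped_alt, sep='\n')
```

The group is written as (?:...) rather than (...) because re.findall returns only the captured group when a pattern contains one.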
9 \w is [a-zA-Z0-9_]: matches any letter, digit, or underscore; \W is the negation of \w
re.findall(r'p\w+', text)
# ['python2', 'python3', 'perl', 'pt', 'php4', 'php5']
10 \d is [0-9]: matches any digit; \D is the negation of \d
re.findall(r'p\w+\d', text)
re.findall(r'p\w+[0-9]', text)
# both return ['python2', 'python3', 'php4', 'php5']
re.findall(r'p\w{5,9}', text)  # p followed by 5 to 9 word characters
# ['python2', 'python3']
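The shorthand classes and their negations can be checked against each other on one string. This is only a sketch; the sample 'php_5 c++ py3' is invented for illustration.

```python
import re

sample = 'php_5 c++ py3'  # hypothetical sample string

word_runs = re.findall(r'\w+', sample)  # letters, digits and underscore
non_word = re.findall(r'\W+', sample)   # everything \w does not match
digits = re.findall(r'\d', sample)      # single digits
non_digits = re.findall(r'\D+', sample) # runs of anything that is not a digit

print(word_runs, non_word, digits, non_digits, sep='\n')
```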
11 \s is [ \t\n\r\f\v]: matches any whitespace character; \S is the negation of \s
12 \b word boundary: matches the position where a word begins or ends
re.findall(r'\bp[^0-9]', text)
# ['py', 'py', 'pe', 'ph', 'ph']
re.findall(r'p[^0-9]\b', text)
# ['pt']
\B is the negation of \b
13 \A start of input, similar to ^
\Z end of input, similar to $
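The position-based symbols (\b, \B, \A, \Z) match places in the string rather than characters, which the following sketch tries to make visible. The sample strings are hypothetical.

```python
import re

sample = 'php4 javascript php'  # hypothetical sample string

at_word_start = re.findall(r'\bp\w', sample)  # p at the start of a word
inside_word = re.findall(r'\Bp\w', sample)    # p that is not at the start of a word
whole_word = re.findall(r'\bphp\b', sample)   # 'php' as a complete word only

at_input_start = re.findall(r'\Aphp', sample) # only at the very start of the input
at_input_end = re.findall(r'php\Z', sample)   # only at the very end of the input

# \s+ treats any run of spaces, tabs or newlines as one separator.
fields = re.split(r'\s+', 'a \t b\nc')

print(at_word_start, inside_word, whole_word, at_input_start, at_input_end, fields, sep='\n')
```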
14 Greedy and non-greedy
* 0~inf greedy: matches as many as possible
*? 0~inf non-greedy: matches as few as possible
+? 1~inf non-greedy: matches as few as possible
re.findall(r'p[a-z]*', text)
# ['python', 'python', 'perl', 'pt', 'php', 'php']
re.findall(r'p[a-z]*?', text)
# ['p', 'p', 'p', 'p', 'p', 'p', 'p', 'p']
re.findall(r'p[a-z]+?\b', text)
# ['perl', 'pt']
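A common place where greedy and non-greedy matching differ is delimited text. The sketch below uses a made-up HTML-like string; it is an illustration only, not a recommendation to parse HTML with regular expressions.

```python
import re

html = '<b>bold</b><i>it</i>'  # hypothetical sample string

# Greedy: .+ runs to the end of the string and backs off to the last '>'.
greedy = re.findall(r'<.+>', html)

# Non-greedy: .+? stops at the first '>' it can reach.
lazy = re.findall(r'<.+?>', html)

print(greedy)
print(lazy)
```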
15 Groups
(?P<name>pattern)
a = re.search(r'(p[a-zA-Z]+)([0-9])', 'python2', re.X)  # re.X is optional here; with re.X, whitespace in the pattern is ignored and comments can be written inside it
a.group(1), a.group(2)
# 'python'
# '2'
a = re.search(r'(?P<name>p[a-zA-Z]+)(?P<version>[0-9])', 'python2')  # named groups can be output as a dictionary
a.group('name'), a.group('version')
a.groupdict()
# {'name': 'python', 'version': '2'}
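The different ways of reading groups from a match object can be put side by side. This is a small sketch; the input 'perl5' and the variable names are assumptions made for the example.

```python
import re

# Hypothetical input: a lowercase name followed by a single version digit.
m = re.search(r'(?P<name>[a-z]+)(?P<version>[0-9])', 'perl5')

whole = m.group(0)         # group 0 is the entire match
by_index = m.group(1, 2)   # several indices at once return a tuple
by_name = m.group('name')  # named groups can be read by name
all_groups = m.groups()    # every group, in order, as a tuple
as_dict = m.groupdict()    # named groups only, as a dictionary

print(whole, by_index, by_name, all_groups, as_dict, sep='\n')
```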
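16 Combined usage
pattern = re.compile(r'(?P<name>p[a-zA-Z]+)(?P<version>[0-9])')  # compile the pattern once, then reuse it
results = pattern.search('python2')  # pass in the string to search
print(results.groupdict())
# {'name': 'python', 'version': '2'}
results = pattern.search('python3')
print(results.groupdict())
# {'name': 'python', 'version': '3'}
results = pattern.search('php4')
print(results.groupdict())
# {'name': 'php', 'version': '4'}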
17 Looping and collecting dictionaries
text = 'c++ python2 python3 perl ruby lua java javascript php4 php5 c'
pattern = re.compile(r'(?P<name>p[a-zA-Z]+)(?P<version>[0-9])')  # the pattern
for t in text.split(' '):
    results = pattern.search(t)
    if results:  # search returns None when a word does not match
        print(results.groupdict())
# {'name': 'python', 'version': '2'}
# {'name': 'python', 'version': '3'}
# {'name': 'php', 'version': '4'}
# {'name': 'php', 'version': '5'}
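The loop above can be wrapped in a function that returns the dictionaries instead of printing them. This is one possible sketch; the names PATTERN and parse_versions are hypothetical and not part of the original.

```python
import re

# Compiled once at module level so that every call reuses it.
PATTERN = re.compile(r'(?P<name>p[a-zA-Z]+)(?P<version>[0-9])')


def parse_versions(text):
    """Return one {'name', 'version'} dict per whitespace-separated word that matches."""
    found = []
    for token in text.split():
        m = PATTERN.search(token)
        if m:  # search returns None for words such as 'perl' or 'c++'
            found.append(m.groupdict())
    return found


versions = parse_versions('c++ python2 python3 perl ruby lua java javascript php4 php5 c')
print(versions)
```

Returning a list keeps the parsing separate from the output, so the caller can print, filter, or store the results.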
18 Verbose patterns with re.X
a = re.compile(r"""\d+  # integral part
\.   # decimal point
\d*  # digits of the fractional part
""", re.X)  # can be collapsed into one line:
b = re.compile(r"\d+\.\d*")
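To confirm that the verbose and the one-line pattern behave the same, the sketch below runs both over a few inputs. The sample list and the variable names are made up for the example.

```python
import re

# With re.X, whitespace in the pattern is ignored and # starts a comment.
verbose = re.compile(r"""
    \d+   # integral part
    \.    # decimal point
    \d*   # fractional digits (may be empty)
""", re.X)
compact = re.compile(r"\d+\.\d*")

samples = ['3.14', '10.', 'abc', '.5']  # hypothetical inputs
verbose_hits = [bool(verbose.search(s)) for s in samples]
compact_hits = [bool(compact.search(s)) for s in samples]

print(verbose_hits)
print(compact_hits)
```

'.5' is rejected by both because \d+ requires at least one digit before the decimal point.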