Official documentation
module re
{
The re module provides an interface to the regular expression engine that lets you compile regular expressions (REs) into objects and then use them for matching.
}
Note: when a string literal is prefixed with r (a raw string), the backslashes in it receive no special processing.
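To make the note about the r prefix concrete, here is a minimal sketch. The sample strings are made up for illustration; the point is only that "\b" without the prefix becomes a backspace character, while r"\b" reaches the regex engine as a word boundary.

```python
import re

# Without the r prefix, "\b" is a single backspace character.
plain = "\bcat\b"
# With the r prefix, the backslash is kept, so the engine sees \b (word boundary).
raw = r"\bcat\b"

print(len(plain), len(raw))             # 5 7
print(re.findall(plain, "a cat sat"))   # []
print(re.findall(raw, "a cat sat"))     # ['cat']
```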
{
Compilation flags
DOTALL [S]: makes . match any character, including line breaks
IGNORECASE [I]: makes the match case-insensitive
LOCALE [L]: performs locale-aware matching, e.g. for French text
MULTILINE [M]: multi-line matching; affects ^ and $
VERBOSE [X]: allows the verbose form of REs, which is easier to read and understand
}
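Flags can be combined with the | operator. The following sketch uses a made-up three-line string to show how IGNORECASE and MULTILINE change the result of the same pattern.

```python
import re

text = "Hello\nhello\nHELLO"
pattern = r"^hello$"

no_flags = re.findall(pattern, text)                 # ^ and $ only apply to the whole string
only_m = re.findall(pattern, text, re.M)             # ^ and $ apply to every line
both = re.findall(pattern, text, re.I | re.M)        # every line, ignoring case

print(no_flags)  # []
print(only_m)    # ['hello']
print(both)      # ['Hello', 'hello', 'HELLO']
```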
1. [...] denotes a set of characters. Characters can be listed individually: [io] matches 'i' or 'o'
import re
s = 'tip top'
r = r't[io]p'
a = re.findall(r, s)
print(a)
A set can also contain ranges: [0-9a-zA-Z] matches 0 to 9, a to z and A to Z
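A short sketch of ranges inside a character set, using made-up input strings:

```python
import re

# One letter from a to c followed by one digit.
pairs = re.findall(r"[a-c][0-9]", "a1 b2 d4 c9")
# Any single ASCII letter or digit; '-' and '_' are not in the set.
alnum = re.findall(r"[0-9a-zA-Z]", "x-7_Q")

print(pairs)  # ['a1', 'b2', 'c9']
print(alnum)  # ['x', '7', 'Q']
```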
2. ^
⑴ As the first character inside [], it negates the set: [^io] matches any character other than i and o
import re
s = 'tip top'
r = r't[^io]p'
a = re.findall(r, s)
print(a)
⑵ Outside [], it matches the beginning of a string
import re
s = 'ahello hello'
r = r'^hello'
a = re.findall(r, s)
print(a)
3. $ matches the end of a string
import re
s = 'hello hello'
r = r'hello$'
a = re.findall(r, s)
print(a)
4. Inside a character set, ^ anywhere other than the first position (as in [...^...]) and $ in any position (as in [$...], [...$...] or [...$]) have no special effect and are treated as ordinary characters.
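The rule in item 4 can be checked with a small sketch (the test string is made up):

```python
import re

s = "a^b$c"

# ^ is not first and $ is inside the set, so both are ordinary characters here.
literal = re.findall(r"[a^$]", s)
# The first ^ negates the set; the second ^ and the $ are still ordinary characters.
negated = re.findall(r"[^a^$]", s)

print(literal)  # ['a', '^', '$']
print(negated)  # ['b', 'c']
```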
5. Matching the literal string ^abc causes a problem, because ^ is a special character
import re
s = '^abc ^abc ^abc'
r = r'^abc'
a = re.findall(r, s)
print(a)
The problem can be solved with the escape character \
import re
s = '^abc ^abc ^abc'
r = r'\^abc'
a = re.findall(r, s)
print(a)
6. \ (backslash)
- A backslash followed by certain characters expresses various special meanings
- It can also be used to escape any metacharacter so that it is matched literally, e.g. \[ or \\
- For ASCII text, the following classes are available:
⑴\d matches any decimal digit; equivalent to the class [0-9]
⑵\D matches any non-digit character; equivalent to the class [^0-9]
⑶\s matches any whitespace character; equivalent to the class [ \t\n\r\f\v]
⑷\w matches any alphanumeric character or underscore; equivalent to the class [a-zA-Z0-9_]
⑸\W matches any character that is not alphanumeric or underscore; equivalent to the class [^a-zA-Z0-9_]
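A minimal sketch of these classes on a made-up ASCII string:

```python
import re

s = "id_7: ok\t42"

digits = re.findall(r"\d", s)       # decimal digits
non_digits = re.findall(r"\D", "a1b2")
spaces = re.findall(r"\s", s)       # the space and the tab
word = re.findall(r"\w", s)         # letters, digits and underscore
non_word = re.findall(r"\W", s)     # everything else

print(digits)      # ['7', '4', '2']
print(non_digits)  # ['a', 'b']
print(spaces)      # [' ', '\t']
print(word)        # ['i', 'd', '_', '7', 'o', 'k', '4', '2']
print(non_word)    # [':', ' ', '\t']
```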
7. Repetition
When matching a phone number, writing \d once for every digit is tedious. How can this be avoided?
import re
n = '18829789854'
r = r'^1\d\d\d\d\d\d\d\d\d\d'
print(re.findall(r, n))
Solution: use {n}, where n is the number of repetitions
import re
n = '18829789854'
r = r'^1\d{10}$'  # no other characters may appear before or after the phone number, so use ^ and $
print(re.findall(r, n))
8. * specifies that the preceding character can be matched 0 or more times instead of exactly once. The matching engine repeats as many times as it can (up to the limit of an integer, about 2 billion)
import re
import urllib.request
h = urllib.request.urlopen('http://www.baidu.com')
s = h.read().decode('utf-8', errors='ignore')
r = r'www\.\w*\.com'  # match any number of letters or digits between the two dots
li = re.findall(r, s)
for i in li:
    print(i)
9. + differs from * in that the preceding character must match at least once (number of times >= 1), so for the URL match above + is the better choice
import re
import urllib.request
h = urllib.request.urlopen('http://www.baidu.com')
s = h.read().decode('utf-8', errors='ignore')
r = r'www\.\w+\.com'  # the part between the two dots cannot occur 0 times
li = re.findall(r, s)
for i in li:
    print(i)
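The difference between * and + can also be seen without a network connection. This sketch uses a made-up string in place of the downloaded page:

```python
import re

s = "www..com www.baidu.com www.a1.com"

star = re.findall(r"www\.\w*\.com", s)   # zero or more word characters between the dots
plus = re.findall(r"www\.\w+\.com", s)   # at least one word character between the dots

print(star)  # ['www..com', 'www.baidu.com', 'www.a1.com']
print(plus)  # ['www.baidu.com', 'www.a1.com']
```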
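10. ?
⑴ Indicates that the preceding symbol occurs once or 0 times
⑵ After a quantifier, makes the match non-greedy (as short as possible). For example, the greedy pattern below matches the whole string, when the shortest match, ab, may be what is wanted
import re
s = 'abbbbbbb'
r = r'ab+'
print(re.findall(r, s))
Shortest:
import re
s = 'abbbbbbb'
r = r'ab+?'
print(re.findall(r, s))
⑶ It also appears in the non-capturing group (?:...):
import re
r = r'(?:\d{1}){2}'  # the contents of the group are repeated twice in a row
s = 'fas14214jiojoi2412'
str_re = re.compile(r)
print(str_re.findall(s))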
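The first use of ? (the preceding symbol occurs once or not at all) has no example above. A minimal sketch with a made-up string:

```python
import re

# u? means the letter u may appear once or be absent.
found = re.findall(r"colou?r", "color colour colouur")

print(found)  # ['color', 'colour']
```

The last word is not matched because it contains two u characters, and ? allows at most one.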
11. Use {m,n} to repeat the preceding symbol a number of times within a range
import re
s = 'abbbb'
r = r'ab{1,3}'  # b occurs 1 to 3 times, inclusive
print(re.findall(r, s))
12. If a regular expression is used repeatedly, compile it in advance to improve efficiency:
import re
s = 'abcab'  # sample string to search
r = re.compile(r'ab')  # compile the regular expression in advance
print(r.findall(s))
13. To ignore case when matching, pass a flag when compiling:
- re.compile(pattern, re.I)  # re.I means ignore case
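A short sketch comparing the same pattern compiled with and without re.I (the input string is made up):

```python
import re

s = "Python PYTHON python"

strict = re.compile(r"python")
relaxed = re.compile(r"python", re.I)  # ignore case

strict_hits = strict.findall(s)
relaxed_hits = relaxed.findall(s)

print(strict_hits)   # ['python']
print(relaxed_hits)  # ['Python', 'PYTHON', 'python']
```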
14. Methods
- match() determines whether the RE matches at the start of the string
- search() scans the string for the first position where the RE matches
Both of the methods above return a match object (or None if there is no match). To see the result, call group():
import re
s = ' aab'
r = re.compile(r'ab')
t = r.search(s)
print(t.group())
- groups() returns all the groups:
import re
r = r'(\d)\w*(\d)'
s = 'fas14214jiojoi2412'
str_re = re.compile(r)
str_object = str_re.search(s)
print(str_object.groups())  # outputs the groups as a tuple
- findall() finds all the substrings that the RE matches and returns them as a list
- finditer() finds all the substrings that the RE matches and returns them as an iterator
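The four methods can be compared side by side. This is a minimal sketch on a made-up string:

```python
import re

s = "ab12cd345"
p = re.compile(r"\d+")

at_start = p.match(s)                            # None: the string does not start with a digit
first = p.search(s).group()                      # first match anywhere in the string
every = p.findall(s)                             # list of all matched substrings
positions = [m.span() for m in p.finditer(s)]    # iterate over match objects

print(at_start)   # None
print(first)      # 12
print(every)      # ['12', '345']
print(positions)  # [(2, 4), (6, 9)]
```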
15. MatchObject (match object)
- group() returns the string matched by the RE
- start() returns the position where the match starts
- end() returns the position where the match ends
- span() returns a tuple containing the positions of the match: (start, end)
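A small sketch of the match object methods, using a made-up string:

```python
import re

m = re.search(r"b+", "aabbbc")

print(m.group())  # bbb
print(m.start())  # 2
print(m.end())    # 5
print(m.span())   # (2, 5)
```

Note that end() is the index just past the last matched character, so s[m.start():m.end()] gives back the matched text.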
16. To replace the text that matches a regular expression, use sub():
import re
s = ' aab'
r = re.compile(r'ab')
t = r.sub('AB', s)  # 'AB' replaces the text that matches the regular expression
print(t)
There is also subn(), which returns the new string together with the number of replacements made
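A minimal sketch of subn() on a made-up string:

```python
import re

r = re.compile(r"ab")
result = r.subn("AB", "ab aab cab")  # (new string, number of replacements)

print(result)  # ('AB aAB cAB', 3)
```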
17. To split a string at the places where a regular expression matches, use split():
import re
s = ' 1+2-3*4/5'
r = re.compile(r'[\+\-\*/]')  # +, - and * are escaped with \ here; inside [] only - strictly needs it
t = r.split(s)
print(t)
18. . (dot) matches any character except a newline. When the re.DOTALL (re.S) flag is specified, it matches any character, including a newline.
By default a pattern containing . will not match across \n, so specify the flag at compile time:
re.compile(pattern, re.S)
import re
r = re.compile(r'.net', re.S)
print(r.findall('\nnet'))
19. When working with a multi-line string, use re.M, for example when processing files:
import re
s = '''ab
abc
abcd'''
r = re.compile(r'^a', re.M)
print(r.findall(s))
20. When a regular expression is written across multiple lines, use re.X:
import re
rn = r'''
a
'''
s = 'a'
r = re.compile(rn, re.X)
print(r.findall(s))
21. Grouping: to choose among alternative string fragments, use (...|...|...)
Example 1:
import re
s = 'www.baidu.cn'
r = re.compile(r'www\.\w+\.(com|cn)', re.X)
print(r.match(s))
If you use findall() instead, the result is different:
import re
s = 'www.baidu.cn'
r = re.compile(r'www\.\w+\.(com|cn)', re.X)
print(r.findall(s))
Result:
['cn']
Only the contents of the group are returned
Example 2:
import re
s = 'name=1 name=2'
r = re.compile(r'name=(\d)', re.X)
print(r.findall(s))
Result:
['1', '2']
Only the contents of the group are returned. Without the grouping parentheses, the name= part would be returned as well.
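If the alternatives still need to be grouped but findall() should return the whole match, a non-capturing group (?:...) can be used. The sketch below uses made-up strings and also shows that several capturing groups produce a list of tuples:

```python
import re

s = "www.baidu.cn www.python.com"

captured = re.findall(r"www\.\w+\.(com|cn)", s)      # only the group contents
whole = re.findall(r"www\.\w+\.(?:com|cn)", s)       # non-capturing group: whole match
pairs = re.findall(r"(\w+)=(\d)", "name=1 age=2")    # two groups: list of tuples

print(captured)  # ['cn', 'com']
print(whole)     # ['www.baidu.cn', 'www.python.com']
print(pairs)     # [('name', '1'), ('age', '2')]
```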
Python regular expressions in detail (explaining the commonly used key characters)