Python regular expressions in detail (commonly used special characters explained)

Source: Internet
Author: User
Tags: expression engine

re module
{
The re module provides an interface to the regular expression engine. It lets you compile regular expressions (REs) into pattern objects and then use those objects for matching.
}
Note: when a string literal is prefixed with r (a raw string), the backslashes in it receive no special processing
{
Compile flags
DOTALL [S]: makes . match every character, including line breaks
IGNORECASE [I]: makes matching case-insensitive
LOCALE [L]: locale-aware matching (for French text, for example)
MULTILINE [M]: multi-line matching; affects ^ and $
VERBOSE [X]: enables the verbose form of REs, which can be laid out so that they are easier to read
}
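
To make the raw-string note and the flag names concrete, here is a minimal sketch. The sample path and the variable names are made up for illustration; only the re module calls come from the standard library.

```python
import re

# In a normal literal '\n' is a single newline character;
# in a raw literal r'\n' is two characters: a backslash and 'n'.
normal = '\n'
raw = r'\n'
print(len(normal), len(raw))

# To match one literal backslash, the pattern needs two backslashes.
# The raw form r'\\' is the same string as the normal form '\\\\'.
path = r'C:\temp'                      # hypothetical sample text
backslashes = re.findall(r'\\', path)
print(backslashes)

# Each flag has a short and a long name that refer to the same value.
same_flags = (re.S == re.DOTALL) and (re.X == re.VERBOSE)
print(same_flags)
```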

1. [...] represents a set of characters, listed individually: [io] matches 'i' or 'o'

    import re
    s = 'tip top'
    r = r't[io]p'
    a = re.findall(r, s)
    print(a)    # ['tip', 'top']

[...] can also hold ranges: [0-9a-zA-Z] covers 0 to 9, a to z, and A to Z
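
A short sketch of ranges inside a character set follows. The sample string is invented for this example.

```python
import re

s = 'Room 42, Block B7'    # hypothetical sample text

# [0-9] matches a single digit
digits = re.findall(r'[0-9]', s)
print(digits)

# [a-zA-Z][0-9] matches one ASCII letter immediately followed by one digit
letter_digit = re.findall(r'[a-zA-Z][0-9]', s)
print(letter_digit)
```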

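2. ^ ⑴ As the first character inside [], it negates the set: [^io] matches any character other than i and o

    import re
    s = 'tip top'
    r = r't[^io]p'
    a = re.findall(r, s)
    print(a)    # [] because both words have i or o in the middle

⑵ Matches the beginning of the string

    import re
    s = 'ahello hello'
    r = r'^hello'
    a = re.findall(r, s)
    print(a)    # [] because s does not start with hello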

3. $ matches the end of the string

    import re
    s = 'hello hello'
    r = r'hello$'
    a = re.findall(r, s)
    print(a)    # ['hello'], only the one at the end

4. Inside [], a ^ that is not in the first position, as in [...^...], and a $ in any position, as in [$...], [...$...] or [...$], have no special effect and are treated as ordinary characters.
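
A minimal sketch of this rule, using a made-up sample string:

```python
import re

s = 'a^b$c'    # hypothetical sample text

# ^ is not the first character in the set and $ is inside a set,
# so both are matched as ordinary characters here.
specials = re.findall(r'[b^$]', s)
print(specials)
```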

5. Matching the literal string ^abc is a problem, because ^ is a special character

    import re
    s = '^abc ^abc ^abc'
    r = r'^abc'
    a = re.findall(r, s)
    print(a)    # [] because ^ is read as "start of string"

The problem can be solved with an escape character

    import re
    s = '^abc ^abc ^abc'
    r = r'\^abc'
    a = re.findall(r, s)
    print(a)    # ['^abc', '^abc', '^abc']

6. \
- A backslash followed by certain characters expresses various special meanings
- It can also be used to cancel the special meaning of any metacharacter: \[ or \\
⑴ \d matches any decimal digit; equivalent to the class [0-9]
⑵ \D matches any non-digit character; equivalent to the class [^0-9]
⑶ \s matches any whitespace character; equivalent to the class [ \t\n\r\f\v]
⑷ \w matches any alphanumeric character or underscore; equivalent to the class [a-zA-Z0-9_]
⑸ \W matches any character that \w does not match; equivalent to the class [^a-zA-Z0-9_]
These equivalences hold for ASCII text; with str patterns in Python 3 the classes also cover Unicode digits, whitespace, and letters.
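
The five classes can be compared side by side on one short string. This is only a sketch; the sample string is made up.

```python
import re

s = 'a_1 b!'    # hypothetical sample text

digits = re.findall(r'\d', s)        # decimal digits
non_digits = re.findall(r'\D', s)    # everything that is not a digit
spaces = re.findall(r'\s', s)        # whitespace characters
word_chars = re.findall(r'\w', s)    # letters, digits, and underscore
non_word = re.findall(r'\W', s)      # everything \w does not match

print(digits, non_digits, spaces, word_chars, non_word)
```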

7. Repetition
When matching a phone number, writing \d again for every digit is tedious. How can this be solved?

    import re
    n = '18829789854'
    r = r'^1\d\d\d\d\d\d\d\d\d\d'
    print(re.findall(r, n))

Solution: use {number}, where number is the repetition count

    import re
    n = '18829789854'
    r = r'^1\d{10}$'    # no other characters may come before or after the phone number, so use ^ and $
    print(re.findall(r, n))

8. * specifies that the preceding character may be matched zero or more times, rather than exactly once. The matching engine tries to repeat as many times as it can (up to the range of a C int, about 2 billion)

    import re
    import urllib.request
    h = urllib.request.urlopen('http://www.baidu.com')
    s = h.read().decode('utf-8', errors='ignore')
    r = r'www\.\w*\.com'    # any number of letters or digits between the two dots
    li = re.findall(r, s)
    for i in li:
        print(i)

9. + differs from * in that the preceding character must match at least once (count >= 1). For matching URLs, therefore, + is the one to use

    import re
    import urllib.request
    h = urllib.request.urlopen('http://www.baidu.com')
    s = h.read().decode('utf-8', errors='ignore')
    r = r'www\.\w+\.com'    # there is never nothing between the two dots
    li = re.findall(r, s)
    for i in li:
        print(i)
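
The two examples above need network access. As an offline sketch of the same difference between * and +, the following uses a made-up string in which one candidate has nothing between the dots.

```python
import re

s = 'www..com www.baidu.com'    # hypothetical sample text

# \w* accepts zero word characters between the dots
star = re.findall(r'www\.\w*\.com', s)
print(star)

# \w+ requires at least one word character between the dots
plus = re.findall(r'www\.\w+\.com', s)
print(plus)
```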

10. ? ⑴ indicates that the preceding symbol occurs once or zero times
⑵ placed after a quantifier, it requests the shortest possible match. By default, ab+ matches as much as it can:

    import re
    s = 'abbbbbbb'
    r = r'ab+'
    print(re.findall(r, s))    # ['abbbbbbb']

Shortest:

    import re
    s = 'abbbbbbb'
    r = r'ab+?'
    print(re.findall(r, s))    # ['ab']

It has one more use: (?:...) groups without capturing

    import re
    r = r'(?:\d{1}){2}'    # repeat the content of the group a second time
    s = 'fas14214jiojoi2412'
    str_re = re.compile(r)
    print(str_re.findall(s))    # ['14', '21', '24', '12']

11. Use {m,n} to match the preceding symbol a number of times within a range

    import re
    s = 'abbbb'
    r = r'ab{1,3}'    # b occurs 1 to 3 times, both 1 and 3 included
    print(re.findall(r, s))    # ['abbb']

12. If a regular expression is used repeatedly, compile it in advance to improve efficiency. The code is as follows

    import re
    s = 'abab'    # any string to search
    r = re.compile(r'ab')    # compile the regular expression ahead of time
    print(r.findall(s))

13. To match without regard to case, pass a flag when compiling.

    • re.compile(regular expression, re.I)    # re.I means ignore case
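
A minimal sketch of case-insensitive matching with a compiled pattern; the sample string is made up.

```python
import re

r = re.compile(r'ab', re.I)    # re.I makes the match ignore case
variants = r.findall('ab AB aB Ab')
print(variants)
```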

14. Methods

    • match() determines whether the RE matches at the start of the string
    • search() scans the string to find the first position where the RE matches

Both return a match object (None when there is no match); to see the result, call group()

    import re
    s = ' aab'
    r = re.compile(r'ab')
    t = r.search(s)
    print(t.group())    # ab
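
To show how match() and search() differ on the same input, here is a small sketch that reuses the string from the example above. The variable names are illustrative.

```python
import re

r = re.compile(r'ab')
s = ' aab'

at_start = r.match(s)     # None, because s does not begin with 'ab'
anywhere = r.search(s)    # a match object for the 'ab' found later in s

print(at_start)
print(anywhere.group())
```

Because either method can return None, code that calls group() usually checks the result first.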

groups() returns the captured groups

    import re
    r = r'(\d)\w*(\d)'
    s = 'fas14214jiojoi2412'
    str_re = re.compile(r)
    str_object = str_re.search(s)
    print(str_object.groups())    # prints the groups as a tuple: ('1', '2')

    • findall() finds all the substrings that the RE matches and returns them as a list
    • finditer() finds all the substrings that the RE matches and returns them as an iterator
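
The difference between the last two methods can be sketched as follows. findall() gives back plain strings, while finditer() yields match objects, so positions are available as well. The pattern \d+ is chosen only for illustration.

```python
import re

r = re.compile(r'\d+')
s = 'fas14214jiojoi2412'

# findall() returns the matched text only
as_list = r.findall(s)
print(as_list)

# finditer() yields match objects, which also carry the start position
with_positions = [(m.group(), m.start()) for m in r.finditer(s)]
print(with_positions)
```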

15. MatchObject (match object)

    • group() returns the string matched by the RE
    • start() returns the position where the match starts
    • end() returns the position where the match ends
    • span() returns a tuple containing the (start, end) positions of the match
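
A short sketch of the four methods on one match object, using the same string as the earlier search() example:

```python
import re

m = re.search(r'ab', ' aab')

text = m.group()     # the matched text
first = m.start()    # index of the first matched character
last = m.end()       # index just past the last matched character
both = m.span()      # (start, end) as a tuple

print(text, first, last, both)
```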

16. How do you replace the text that matches a regular expression?

    import re
    s = ' aab'
    r = re.compile(r'ab')
    t = r.sub('AB', s)    # 'AB' replaces the text that matches the regular expression
    print(t)

There is also subn(), which returns the new string together with the number of replacements made
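
A minimal sketch of subn(); the input string is made up for this example.

```python
import re

r = re.compile(r'ab')

# subn() returns a tuple: (new string, number of replacements)
new_s, count = r.subn('AB', 'ab aab cab')
print(new_s)
print(count)
```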

17. Splitting a string at the places that match a regular expression

    import re
    s = ' 1+2-3*4/5'
    r = re.compile(r'[\+\-\*/]')    # + - * are special characters, so they are escaped with \
    t = r.split(s)
    print(t)    # [' 1', '2', '3', '4', '5']

18. . (dot) matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.

19. By default . does not match \n. To have it match newlines as well, pass the flag at compile time.
re.compile(regular expression, re.S)

    import re
    r = re.compile(r'.net', re.S)
    print(r.findall('\nnet'))    # ['\nnet']

20. When working with multi-line strings, for example when processing files, use re.M

    import re
    s = '''ab
    abc
    abcd'''
    r = re.compile(r'^a', re.M)
    print(r.findall(s))    # one 'a' for each line that starts with a

21. To write a regular expression across multiple lines, use re.X

    import re
    rn = r'''
    a
    '''
    s = 'a'
    r = re.compile(rn, re.X)
    print(r.findall(s))    # ['a']
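
Verbose mode is most useful when each part of a longer pattern gets its own line and comment. The sketch below rewrites the phone-number pattern from item 7 in that style; the layout and comments are illustrative. In verbose mode, whitespace in the pattern is ignored and # starts a comment.

```python
import re

phone = re.compile(r'''
    ^1        # a leading 1
    \d{10}    # followed by exactly ten digits
    $         # and nothing after that
''', re.X)

eleven_digits = phone.findall('18829789854')
ten_digits = phone.findall('1882978985')

print(eleven_digits)
print(ten_digits)
```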

22. Grouping: to choose between alternative string fragments, use (...|...|...)
Example 1:

    import re
    s = 'www.baidu.cn'
    r = re.compile(r'www\.\w+\.(com|cn)', re.X)
    print(r.match(s))    # prints a match object

If findall() is used for the match instead, the following happens

    import re
    s = 'www.baidu.cn'
    r = re.compile(r'www\.\w+\.(com|cn)', re.X)
    print(r.findall(s))

Results:
['cn']

Only the contents of the group are returned

Example 2:

    import re
    s = 'name=1 name=2'
    r = re.compile(r'name=(\d)', re.X)
    print(r.findall(s))

Results:
['1', '2']

Only the contents of the group are returned. Without the grouping parentheses, the name= part would be returned as well.
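
When the alternation is needed but the full match is wanted from findall(), one option is the non-capturing group (?:...) from item 10. A minimal sketch, reusing the strings from the two examples:

```python
import re

s = 'www.baidu.cn'

# A capturing group makes findall() return only the group's text
captured = re.findall(r'www\.\w+\.(com|cn)', s)
print(captured)

# A non-capturing group keeps the alternation but returns the whole match
whole = re.findall(r'www\.\w+\.(?:com|cn)', s)
print(whole)

# With no group at all, findall() also returns the whole match
no_group = re.findall(r'name=\d', 'name=1 name=2')
print(no_group)
```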
