Python regular expressions in detail (commonly used special characters explained)

Last Update:2016-05-12 Source: Internet

Author: User




Module re
{
The re module provides an interface to the regular expression engine: it lets you compile REs (regular expressions) into pattern objects and then use those objects for matching.
}
Note: when a string literal is prefixed with r (a raw string), backslashes in it receive no special processing, so they reach the regular expression engine unchanged.
{
Compilation flags (flags)
DOTALL [S]: makes . match any character, including newlines
IGNORECASE [I]: makes matching case-insensitive
LOCALE [L]: does locale-aware matching (e.g. for French text); in Python 3 it can only be used with bytes patterns
MULTILINE [M]: multiline matching, which affects ^ and $
VERBOSE [X]: enables verbose REs, which can be laid out so they are easier to understand
}
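To see why the r prefix matters, here is a minimal sketch with a made-up sample text: in an ordinary literal "\b" is the backspace character, while in a raw string it stays as backslash plus b, which the re module reads as a word boundary.

```python
import re

plain = "\bcat\b"   # ordinary literal: \b becomes the backspace character \x08
raw = r"\bcat\b"    # raw literal: backslash and b are kept as two characters

print(len(plain), len(raw))  # 5 7

text = "a cat sat"
print(re.findall(plain, text))  # [] (the text contains no backspace characters)
print(re.findall(raw, text))    # ['cat'] (\b is a word boundary to the re module)
```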

1. [...] represents a set of characters, listed individually: [io] matches 'i' or 'o'.

    import re
    s = 'tip top'
    r = r't[io]p'
    a = re.findall(r, s)
    print(a)  # ['tip', 'top']

Inside [...] you can also give ranges: [0-9a-zA-Z] matches any character from 0 to 9, a to z, or A to Z.
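A small sketch of a character class that mixes ranges; the sample text is made up. Without a quantifier the class matches one character at a time, and adding + collects runs of them.

```python
import re

text = "id=A7, code=x9_!"

# One character from the set: a digit, a lowercase letter or an uppercase letter.
single = re.findall(r"[0-9a-zA-Z]", text)
# Adding + collects consecutive runs of such characters.
runs = re.findall(r"[0-9a-zA-Z]+", text)

print(single)  # ['i', 'd', 'A', '7', 'c', 'o', 'd', 'e', 'x', '9']
print(runs)    # ['id', 'A7', 'code', 'x9']
```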

2. ^ ⑴ As the first character inside [], it matches characters not in the set: [^io] matches any character other than i and o.

    import re
    s = 'tip top'
    r = r't[^io]p'
    a = re.findall(r, s)
    print(a)  # []: the middle letter of both words is i or o

⑵ Outside [], ^ matches the beginning of the string.

    import re
    s = 'ahello hello'
    r = r'^hello'
    a = re.findall(r, s)
    print(a)  # []: the string starts with 'a', not 'hello'

3. $ matches the end of the string.

    import re
    s = 'hello hello'
    r = r'hello$'
    a = re.findall(r, s)
    print(a)  # ['hello']: only the final hello

4. Inside [], ^ is special only when it is the first character. In [...^...], or with $ anywhere inside brackets as in [...$], the character has no special effect and is treated as an ordinary character.
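A quick sketch of the position rule, using made-up strings: ^ negates only at the start of the class, and $ inside brackets is always literal.

```python
import re

# ^ negates the set only when it is the first character inside [].
negated = re.findall(r"[^ab]", "ab^c")
# Anywhere else inside [], ^ is an ordinary character.
literal_caret = re.findall(r"[a^]", "ab^c")
# $ inside [] is always an ordinary character.
literal_dollar = re.findall(r"[$]", "cost: $5")

print(negated)         # ['^', 'c']
print(literal_caret)   # ['a', '^']
print(literal_dollar)  # ['$']
```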

5. If you want to match the literal string ^abc, there is a problem, because ^ is a special character:

    import re
    s = '^abc ^abc ^abc'
    r = r'^abc'
    a = re.findall(r, s)
    print(a)  # []: ^ is read as the start-of-string anchor

The problem can be solved by escaping ^ with a backslash:

    import re
    s = '^abc ^abc ^abc'
    r = r'\^abc'
    a = re.findall(r, s)
    print(a)  # ['^abc', '^abc', '^abc']
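Instead of escaping by hand, the standard re.escape() function can insert the backslashes for you, which is handy when the literal text comes from user input. A small sketch reusing the string from the example above:

```python
import re

s = '^abc ^abc ^abc'
pattern = re.escape('^abc')    # backslash-escapes the special character ^
print(pattern)                 # \^abc
print(re.findall(pattern, s))  # ['^abc', '^abc', '^abc']
```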

6. \ (backslash)
- A backslash can be followed by various characters to express different special meanings.
- It can also be used to escape any metacharacter so that it loses its special meaning, e.g. \[ or \\.
⑴\d matches any decimal digit; equivalent to the class [0-9]
⑵\D matches any non-digit character; equivalent to the class [^0-9]
⑶\s matches any whitespace character; equivalent to the class [ \t\n\r\f\v]
⑷\w matches any alphanumeric character or underscore; equivalent to the class [a-zA-Z0-9_]
⑸\W matches any character that \w does not match; equivalent to the class [^a-zA-Z0-9_]
(For str patterns in Python 3 these classes are Unicode-aware; the bracketed equivalents hold exactly with the re.ASCII flag.)
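A sketch running each shorthand class over one made-up string, plus one non-ASCII digit to show the Unicode behaviour of \d and the effect of re.ASCII:

```python
import re

text = "Tel: 010-8888, id_7"

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of non-digits
spaces = re.findall(r"\s", text)       # whitespace characters
words = re.findall(r"\w+", text)       # runs of word characters (_ included)
non_words = re.findall(r"\W", text)    # single non-word characters

print(digits)      # ['010', '8888', '7']
print(non_digits)  # ['Tel: ', '-', ', id_']
print(spaces)      # [' ', ' ']
print(words)       # ['Tel', '010', '8888', 'id_7']
print(non_words)   # [':', ' ', '-', ',', ' ']

# For str patterns, \d is Unicode-aware; re.ASCII restricts it to [0-9].
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_hits = re.findall(r"\d", arabic_three)
ascii_hits = re.findall(r"\d", arabic_three, re.ASCII)
print(unicode_hits)  # ['٣']
print(ascii_hits)    # []
```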

7. Repetition
To match a phone number, writing \d over and over for every digit is tedious. How can this be solved?

    import re
    n = '18829789854'
    r = r'^1\d\d\d\d\d\d\d\d\d\d'
    print(re.findall(r, n))  # ['18829789854']

Solution: use {n}, where n is the number of repetitions.

    import re
    n = '18829789854'
    r = r'^1\d{10}$'  # nothing may appear before or after the phone number, hence ^ and $
    print(re.findall(r, n))  # ['18829789854']
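In Python 3.4 and later, re.fullmatch() requires the entire string to match, so the anchors can be dropped. A sketch; the helper name is_phone is hypothetical, and "1 followed by ten digits" is only the article's example rule, not a complete phone-number validator. One design point: $ also matches just before a trailing newline, which fullmatch() does not allow.

```python
import re

PHONE = re.compile(r"1\d{10}")  # the article's pattern without ^ and $

def is_phone(s: str) -> bool:
    """Hypothetical helper: True if the whole string is 1 followed by ten digits."""
    return PHONE.fullmatch(s) is not None

print(is_phone("18829789854"))    # True
print(is_phone("188297898541"))   # False (12 digits)
print(is_phone("x18829789854"))   # False (extra leading character)

# $ accepts a trailing newline, fullmatch() does not.
print(bool(re.match(r"^1\d{10}$", "18829789854\n")))  # True
print(is_phone("18829789854\n"))                     # False
```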

8. * specifies that the preceding character can be matched zero or more times, instead of exactly once. The matching engine tries to repeat it as many times as possible (greedy matching); the repeat count is capped internally at roughly 2 billion.

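    import re
    import urllib.request
    h = urllib.request.urlopen('http://www.baidu.com')
    s = h.read().decode('utf-8', errors='ignore')
    r = r'www\.\w*\.com'  # match any number of letters or digits between the two dots
    li = re.findall(r, s)
    for i in li:
        print(i)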

9. + differs from * in that it requires at least one occurrence: it matches the preceding character one or more times (count >= 1). So for the URL match above, + is the right choice.

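    import re
    import urllib.request
    h = urllib.request.urlopen('http://www.baidu.com')
    s = h.read().decode('utf-8', errors='ignore')
    r = r'www\.\w+\.com'  # there cannot be zero characters between the two dots
    li = re.findall(r, s)
    for i in li:
        print(i)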
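The download above depends on the network and on the live page, so here is an offline sketch of the same * versus + difference; the html string is made up.

```python
import re

# A made-up snippet standing in for the downloaded page.
html = "www..com www.baidu.com www.a1.com"

star = re.findall(r"www\.\w*\.com", html)  # \w* also accepts zero characters
plus = re.findall(r"www\.\w+\.com", html)  # \w+ needs at least one

print(star)  # ['www..com', 'www.baidu.com', 'www.a1.com']
print(plus)  # ['www.baidu.com', 'www.a1.com']
```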

10. ? ⑴ Means the preceding item is repeated zero times or once.
⑵ After another repetition operator, it asks for the minimal (non-greedy) match. For example, ab+ on its own gives the longest match:

    import re
    s = 'abbbbbbb'
    r = r'ab+'
    print(re.findall(r, s))  # ['abbbbbbb']

Shortest:

    import re
    s = 'abbbbbbb'
    r = r'ab+?'
    print(re.findall(r, s))  # ['ab']
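Both meanings of ? side by side, as a small sketch with made-up strings: after a plain character it makes that character optional, after a repetition operator it makes the repetition non-greedy.

```python
import re

# ? after a character: that character is optional (0 or 1 times).
optional = re.findall(r"colou?r", "color colour colouur")

# ? after a repetition operator: non-greedy (as short as possible).
tags = "<b>bold</b>"
greedy = re.findall(r"<.+>", tags)
lazy = re.findall(r"<.+?>", tags)

print(optional)  # ['color', 'colour']
print(greedy)    # ['<b>bold</b>']
print(lazy)      # ['<b>', '</b>']
```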

? has one more use: in (?:...) it marks a non-capturing group, which groups a pattern without capturing it:

    import re
    r = r'(?:\d{1}){2}'  # repeat the grouped digit pattern twice in a row, without capturing
    s = 'fas14214jiojoi2412'
    str_re = re.compile(r)
    print(str_re.findall(s))  # ['14', '21', '24', '12']
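Why the ?: matters for findall(): with a capturing group, findall() returns the group's text (for a repeated group, only the last repetition) instead of the whole match. A sketch on the same string:

```python
import re

s = 'fas14214jiojoi2412'

# Capturing group: findall() returns what the group captured on its last repetition.
captured = re.findall(r'(\d{1}){2}', s)
# Non-capturing group: findall() returns the whole match.
whole = re.findall(r'(?:\d{1}){2}', s)

print(captured)  # ['4', '1', '4', '2']
print(whole)     # ['14', '21', '24', '12']
```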

11. To match the preceding item a number of times within a range, use {m,n}:

    import re
    s = 'abbbb'
    r = r'ab{1,3}'  # b occurs 1 to 3 times, 1 and 3 included
    print(re.findall(r, s))  # ['abbb']
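The other brace forms follow the same idea: {m,} means m or more, {m} means exactly m. A sketch on a made-up string; the \b word boundaries keep each run of a's from matching partially.

```python
import re

text = "a aa aaa aaaa"

two_to_three = re.findall(r"\ba{2,3}\b", text)  # between 2 and 3
two_or_more = re.findall(r"\ba{2,}\b", text)    # 2 or more
exactly_two = re.findall(r"\ba{2}\b", text)     # exactly 2

print(two_to_three)  # ['aa', 'aaa']
print(two_or_more)   # ['aa', 'aaa', 'aaaa']
print(exactly_two)   # ['aa']
```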

12. If a regular expression is used repeatedly, compile it in advance to improve efficiency. The code is as follows:

    import re
    r = re.compile(r'ab')  # compile the regular expression in advance
    print(r.findall('ab cab'))  # ['ab', 'ab']

13. To make matching case-insensitive, pass a flag when compiling:

    re.compile(pattern, re.I)  # re.I means ignore case
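Combining the two previous points, a sketch that compiles a case-insensitive pattern once and reuses it on several made-up lines:

```python
import re

pattern = re.compile(r"python", re.I)  # compiled once, reused below
lines = ["Python is fun", "I like PYTHON", "no match here"]

results = [pattern.findall(line) for line in lines]
print(results)  # [['Python'], ['PYTHON'], []]
```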

14. Methods

match(): determines whether the RE matches at the beginning of the string
search(): scans through the string, looking for the first location where the RE matches. Both return a match object (or None if there is no match); to see the matched text, call group() on it:

    import re
    s = ' aab'
    r = re.compile(r'ab')
    t = r.search(s)
    print(t.group())  # ab

groups(): returns all the groups of the match as a tuple

    import re
    r = r'(\d)\w*(\d)'
    s = 'fas14214jiojoi2412'
    str_re = re.compile(r)
    str_object = str_re.search(s)
    print(str_object.groups())  # the groups as a tuple: ('1', '2')

findall(): finds all substrings where the RE matches and returns them as a list
finditer(): finds all substrings where the RE matches and returns them as an iterator of match objects
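A sketch contrasting match() and search() on the string used above, and showing that finditer() yields match objects, so positions are available as well as text:

```python
import re

r = re.compile(r'ab')
s = ' aab'

print(r.match(s))           # None: the string does not start with 'ab'
print(r.search(s).group())  # ab: search() scans the whole string

# finditer() yields match objects, not strings.
positions = [(m.group(), m.start()) for m in r.finditer('ab xab')]
print(positions)  # [('ab', 0), ('ab', 4)]
```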

15. MatchObject (match object) methods

group(): returns the string matched by the RE
start(): returns the position where the match starts
end(): returns the position where the match ends
span(): returns a tuple containing the (start, end) positions of the match
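The four methods on one match object, as a minimal sketch with a made-up string:

```python
import re

m = re.search(r'\d+', 'abc123def')
print(m.group())  # 123
print(m.start())  # 3
print(m.end())    # 6
print(m.span())   # (3, 6)
```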

16. What if you want to replace a string that matches a regular expression?

    import re
    s = ' aab'
    r = re.compile(r'ab')
    t = r.sub('AB', s)  # replace the text matched by the regular expression with 'AB'
    print(t)  # the result is ' aAB'

There is also subn(), which returns a tuple of the new string and the number of substitutions made.
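A sketch of subn() next to sub() with its count argument, which limits how many replacements are made; the input string is made up.

```python
import re

r = re.compile(r'ab')
new, n = r.subn('AB', 'ab ab ab')
print(new, n)  # AB AB AB 3

first_only = r.sub('AB', 'ab ab ab', count=1)
print(first_only)  # AB ab ab
```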

17. To split a string wherever the regular expression matches, use split():

    import re
    s = ' 1+2-3*4/5'
    r = re.compile(r'[\+\-\*/]')  # operators escaped with \; inside [] only - strictly needs it
    t = r.split(s)
    print(t)  # [' 1', '2', '3', '4', '5']
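Two split() variations on the same string, as a sketch: a capturing group in the pattern keeps the separators in the result, and maxsplit limits the number of splits.

```python
import re

s = ' 1+2-3*4/5'

with_ops = re.split(r'([+\-*/])', s)         # the captured operators are kept
limited = re.split(r'[+\-*/]', s, maxsplit=2)  # at most two splits

print(with_ops)  # [' 1', '+', '2', '-', '3', '*', '4', '/', '5']
print(limited)   # [' 1', '2', '3*4/5']
```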

18. . (dot) matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.

19. If you want . to also match newline characters such as \n, pass re.S when compiling:
    re.compile(pattern, re.S)

    import re
    r = re.compile(r'.net', re.S)
    print(r.findall('\nnet'))  # ['\nnet']
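The same pattern with and without the flag, as a minimal sketch, makes the difference explicit:

```python
import re

text = '\nnet'
default = re.findall(r'.net', text)         # . does not match \n by default
dotall = re.findall(r'.net', text, re.S)    # with re.S (re.DOTALL) it does

print(default)  # []
print(dotall)   # ['\nnet']
```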

20. When processing a multi-line string (for example, the contents of a file), use re.M:

    import re
    s = 'ab\nabc\nabcd'
    r = re.compile(r'^a', re.M)
    print(r.findall(s))  # ['a', 'a', 'a']
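A sketch comparing ^ and $ with and without re.M on a made-up three-line string: without the flag ^ anchors only at the start of the whole string; with it, ^ and $ anchor at every line.

```python
import re

s = 'ab\nabc\nabcd'

start_only = re.findall(r'^a\w*', s)         # ^ only at the start of the string
line_starts = re.findall(r'^a\w*', s, re.M)  # ^ at the start of every line
line_ends = re.findall(r'\w+$', s, re.M)     # $ at the end of every line

print(start_only)   # ['ab']
print(line_starts)  # ['ab', 'abc', 'abcd']
print(line_ends)    # ['ab', 'abc', 'abcd']
```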

21. To write a regular expression over multiple lines, use re.X:

    import re
    rn = r'''
    a
    '''
    s = 'a'
    r = re.compile(rn, re.X)  # whitespace and newlines in the pattern are ignored
    print(r.findall(s))  # ['a']
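A more realistic verbose pattern, as a sketch: the phone-number regex from section 7 rewritten with re.X, where whitespace and # comments inside the pattern are ignored. The "1 followed by ten digits" rule is only the article's example.

```python
import re

phone = re.compile(r"""
    ^1        # starts with 1
    \d{10}    # followed by exactly ten more digits
    $         # and nothing after them
""", re.X)

ok = bool(phone.match('18829789854'))
too_short = bool(phone.match('1882978985'))
print(ok)         # True
print(too_short)  # False
```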

22. Grouping: to choose between alternative string fragments, use (...|...|...).
Example 1:

    import re
    s = 'www.baidu.cn'
    r = re.compile(r'www\.\w+\.(com|cn)', re.X)
    print(r.match(s))  # prints a match object for 'www.baidu.cn'

If you use findall() instead, the following happens:

    import re
    s = 'www.baidu.cn'
    r = re.compile(r'www\.\w+\.(com|cn)', re.X)
    print(r.findall(s))

Result:
['cn']

findall() returns only the contents of the group.

Example 2:

    import re
    s = 'name=1 name=2'
    r = re.compile(r'name=(\d)', re.X)
    print(r.findall(s))

Result:
['1', '2']

Only the contents of the group are returned; without the grouping parentheses, the whole match, including name=, would be returned.
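If you need the whole match while still grouping alternatives, a non-capturing group (?:...) or the match object's group() method helps. A sketch reusing the two example strings above:

```python
import re

s = 'www.baidu.cn'
# (?:...) groups without capturing, so findall() returns the whole match.
whole = re.findall(r'www\.\w+\.(?:com|cn)', s)
print(whole)  # ['www.baidu.cn']

# With a capturing group, the match object gives both the whole match and the group.
m = re.match(r'www\.\w+\.(com|cn)', s)
print(m.group(0), m.group(1))  # www.baidu.cn cn

# Several groups: findall() returns a list of tuples.
pairs = re.findall(r'(\w+)=(\d)', 'name=1 name=2')
print(pairs)  # [('name', '1'), ('name', '2')]
```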


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
