A very detailed guide to using Python regular expressions

Source: Internet
Author: User


Chapter 1 covers character groups (character classes).
In a regular expression, a character group represents "the various characters that may appear at the same position". It is written by listing all the possible characters between the brackets [ and ]. Simple character groups include [ab], [314], and [#.].
Points to note:
1) The range notation - must be written in ascending code order (ASCII order): [0-9] is valid, but [9-0] raises an error.

In [1]: import re

In [2]: re.search("^[0-9]$", "2") is not None
Out[2]: True

In [3]: re.search("^[9-0]$", "2") is not None
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
error: bad character range
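
The error above can also be caught in code instead of being read from a traceback. The following is a minimal sketch; the helper name is_valid_pattern is hypothetical and only used for illustration. It relies on re.compile raising re.error for a malformed pattern.

```python
import re

def is_valid_pattern(pattern):
    """Return True if the pattern compiles, False if the re module rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for malformed patterns such as a descending range.
        return False

print(is_valid_pattern("^[0-9]$"))  # True: ascending range
print(is_valid_pattern("^[9-0]$"))  # False: descending range is rejected
```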

2) To match digits and letters, do not use [0-z], because that range also includes many punctuation marks. It is better to write [0-9a-zA-Z].

In [4]: re.search("^[0-z]$", "A") is not None
Out[4]: True

In [5]: re.search("^[0-z]$", ":") is not None
Out[5]: True
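
To see exactly which punctuation marks slip into [0-z], one option is to test every ASCII character against both groups. This is only an illustrative sketch; the variable names loose, strict, and extras are made up for the example.

```python
import re

loose = re.compile(r"^[0-z]$")
strict = re.compile(r"^[0-9a-zA-Z]$")

# Every ASCII character accepted by [0-z] but rejected by [0-9a-zA-Z].
extras = [
    chr(code)
    for code in range(128)
    if loose.search(chr(code)) and not strict.search(chr(code))
]
print("".join(extras))  # prints :;<=>?@[\]^_`
```

These are the characters that sit between 9 and A, and between Z and a, in ASCII order.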

3) A metacharacter must be escaped before it can be used as an ordinary character. In Python code, the escape needs two backslashes ("\\") in an ordinary string but only one ("\") in a raw string, so raw strings are recommended.

# Equivalence of a raw string and an ordinary string
In [6]: r"^[0\-9]$" == "^[0\\-9]$"
Out[6]: True
# Escaping is much simpler in a raw string
In [7]: re.search(r"^[0\-9]$", "3") is not None
Out[7]: False
In [8]: re.search(r"^[0\-9]$", "-") is not None
Out[8]: True
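
The same rule applies to any metacharacter, not only the hyphen inside a character group. As a small sketch (the pattern and the sample strings are chosen for illustration), here is the dot escaped in both string styles:

```python
import re

plain = "^3\\.14$"   # ordinary string: the backslash itself must be doubled
raw = r"^3\.14$"     # raw string: one backslash is enough
print(plain == raw)  # True: both spell the same regular expression

print(re.search(raw, "3.14") is not None)        # True: escaped dot matches a literal "."
print(re.search(raw, "3x14") is not None)        # False: and nothing else
print(re.search(r"^3.14$", "3x14") is not None)  # True: unescaped dot matches any character
```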

4) The meaning of ] depends on where it appears. An unescaped ] closes the character group at the first ] after the opening [, while an escaped \] is an ordinary member of the group.


# Unescaped ]
In [9]: re.search(r"^[012]345]$", "2345") is not None
Out[9]: False

In [10]: re.search(r"^[012]345]$", "2345]") is not None
Out[10]: True

In [11]: re.search(r"^[012]345]$", "5") is not None
Out[11]: False

In [12]: re.search(r"^[012]345]$", "]") is not None
Out[12]: False
# Escaped ]
In [13]: re.search(r"^[012\]345]$", "2345") is not None
Out[13]: False

In [14]: re.search(r"^[012\]345]$", "5") is not None
Out[14]: True

In [15]: re.search(r"^[012\]345]$", "]") is not None
Out[15]: True
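
Besides the backslash, Python's re module also treats a ] placed immediately after the opening [ as a literal member of the group. The sketch below compares the two spellings; the variable names are illustrative only.

```python
import re

escaped = re.compile(r"^[012\]345]$")  # \] is a member of the group
leading = re.compile(r"^[]012345]$")   # ] right after [ is also taken literally

for text in ["]", "5", "2345"]:
    print(repr(text), escaped.search(text) is not None, leading.search(text) is not None)
# ']' and '5' match both patterns; '2345' matches neither,
# because each pattern describes exactly one character.
```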

5) Character group shorthands

The common character group shorthands are \d, \w, and \s. On the surface they seem unrelated to [...], but in fact they mean the same thing. \d is equivalent to [0-9], where d stands for "digit"; \w is equivalent to [0-9a-zA-Z_], where w stands for "word character"; \s is equivalent to [ \t\r\n\v\f] (the first character is a space), where s stands for "whitespace". (In Python 3 these equivalences are exact for str patterns only with the re.ASCII flag; by default the shorthands also match Unicode digits, letters, and whitespace.)


re.search(r"^\d$", "8") is not None   # => True
re.search(r"^\d$", "a") is not None   # => False

re.search(r"^\w$", "8") is not None   # => True
re.search(r"^\w$", "a") is not None   # => True
re.search(r"^\w$", "_") is not None   # => True

re.search(r"^\s$", " ") is not None   # => True
re.search(r"^\s$", "\t") is not None  # => True
re.search(r"^\s$", "\n") is not None  # => True
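
One caveat when running these examples on Python 3: for str patterns the shorthands are Unicode-aware by default, so \d and \w accept more than the ASCII sets listed above. The sketch below assumes Python 3 and uses the re.ASCII flag to restore the narrow meaning; the helper matches_one is hypothetical.

```python
import re

def matches_one(shorthand, ch, ascii_only=False):
    """Return True if the single character ch matches the shorthand."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(shorthand, ch, flags=flags) is not None

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a digit outside 0-9
e_acute = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE

print(matches_one(r"\d", "8"))                            # True
print(matches_one(r"\d", arabic_three))                   # True by default
print(matches_one(r"\d", arabic_three, ascii_only=True))  # False with re.ASCII
print(matches_one(r"\w", e_acute))                        # True by default
print(matches_one(r"\w", e_acute, ascii_only=True))       # False with re.ASCII
```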

6) For the three common shorthands \d, \w, and \s, regular expressions also provide shorthands for the corresponding excluded (negated) character groups: \D, \W, and \S, the same letters in uppercase. These shorthands match complementary characters: whatever \s matches, \S does not; whatever \w matches, \W does not; whatever \d matches, \D does not. Example 1-19 demonstrates these shorthands.


# \d and \D
re.search(r"^\d$", "8") is not None   # => True
re.search(r"^\d$", "a") is not None   # => False
re.search(r"^\D$", "8") is not None   # => False
re.search(r"^\D$", "a") is not None   # => True
# \w and \W
re.search(r"^\w$", "c") is not None   # => True
re.search(r"^\w$", "!") is not None   # => False
re.search(r"^\W$", "c") is not None   # => False
re.search(r"^\W$", "!") is not None   # => True
# \s and \S
re.search(r"^\s$", "\t") is not None  # => True
re.search(r"^\s$", "0") is not None   # => False
re.search(r"^\S$", "\t") is not None  # => False
re.search(r"^\S$", "0") is not None   # => True
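
The "complementary" claim can be checked mechanically rather than by spot examples. The sketch below, with the made-up helper exactly_one_matches, tests every ASCII character against each lowercase and uppercase pair and confirms that exactly one of the two matches.

```python
import re

pairs = [(r"\d", r"\D"), (r"\w", r"\W"), (r"\s", r"\S")]

def exactly_one_matches(lower, upper, ch):
    """True if ch matches one shorthand of the pair but not the other."""
    return (re.fullmatch(lower, ch) is not None) != (re.fullmatch(upper, ch) is not None)

complementary = all(
    exactly_one_matches(lower, upper, chr(code))
    for lower, upper in pairs
    for code in range(128)
)
print(complementary)  # True
```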

2.1 General Forms

Table 2-1 General form of quantifiers
Quantifier   Description
{n}          The preceding element must appear exactly n times
{m,n}        The preceding element appears at least m times and at most n times
{m,}         The preceding element appears at least m times, with no upper limit
{0,n}        The preceding element may be absent, or may appear up to n times
             (in some languages you can write {,n})
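
Each row of Table 2-1 can be tried out directly. The following sketch uses a small hypothetical helper, matches, built on re.fullmatch so that the whole string must satisfy the quantifier.

```python
import re

def matches(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"\d{6}", "201203"))    # True: exactly 6 digits
print(matches(r"\d{6}", "20120"))     # False: only 5 digits
print(matches(r"\d{2,4}", "123"))     # True: between 2 and 4 digits
print(matches(r"\d{2,4}", "12345"))   # False: more than 4 digits
print(matches(r"\d{2,}", "1234567"))  # True: at least 2, no upper limit
print(matches(r"\d{0,2}", ""))        # True: zero occurrences are allowed
```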
2.2 Common quantifiers
Table 2-2 Common quantifiers
Quantifier   {m,n} equivalent   Description
*            {0,}               May or may not appear; no limit on the number of occurrences
+            {1,}               Appears at least once; no limit on the number of occurrences
?            {0,1}              Appears at most once; may not appear at all
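
The equivalences in Table 2-2 can be spot-checked by running each short form and its {m,n} form over the same inputs. This is only a sketch over a handful of sample strings (chosen for illustration), not a proof.

```python
import re

equivalents = [
    (r"ab*c", r"ab{0,}c"),
    (r"ab+c", r"ab{1,}c"),
    (r"ab?c", r"ab{0,1}c"),
]
samples = ["ac", "abc", "abbc", "abbbc"]

for short_form, long_form in equivalents:
    # Both forms should agree on every sample string.
    agree = all(
        (re.fullmatch(short_form, s) is None) == (re.fullmatch(long_form, s) is None)
        for s in samples
    )
    print(short_form, long_form, agree)  # agree is True for each pair
```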
Table 2-3 Matching the various kinds of tag
Expression that matches all tags: <[^>]+>
Tag category       Expression that matches tags of that category
Open tag           <[^/>][^>]*>
Close tag          </[^>]+>
Self-closing tag   <[^>/]+/>
Note: these expressions are not very rigorous. For example, the expression for open tags also matches self-closing tags. The author notes that the material covered so far is not enough to solve this problem, which requires further study.
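
The overlap mentioned in the note is easy to reproduce. The sketch below uses the three expressions from Table 2-3; the names TAG_PATTERNS and classify are hypothetical. One possible workaround, shown here as an assumption rather than the author's solution, is to test the self-closing expression before the open-tag expression.

```python
import re

# Order matters: the self-closing pattern is tried before the open-tag pattern.
TAG_PATTERNS = [
    ("self-closing", re.compile(r"<[^>/]+/>")),
    ("close", re.compile(r"</[^>]+>")),
    ("open", re.compile(r"<[^/>][^>]*>")),
]

def classify(tag):
    """Return the first category whose pattern matches the whole tag, else None."""
    for name, pattern in TAG_PATTERNS:
        if pattern.fullmatch(tag):
            return name
    return None

print(classify("<b>"))    # open
print(classify("</b>"))   # close
print(classify("<br/>"))  # self-closing

# The open-tag expression on its own also accepts a self-closing tag.
print(re.fullmatch(r"<[^/>][^>]*>", "<br/>") is not None)  # True
```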
2.3 Data Extraction
In [1]: import re
In [2]: re.search(r"\d{6}", "ab123456cd").group(0)
Out[2]: '123456'
In [3]: re.search(r"^<[^>]+>$", "<bold>").group(0)
Out[3]: '<bold>'
In [4]: re.findall(r"\d{6}", "zipcode1:201203, zipcode2:100859")
Out[4]: ['201203', '100859']
2.4 The dot
The dot (.) matches any character except a line break.
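
A short sketch of this behavior follows. The helper dot_matches is hypothetical; re.DOTALL is the standard flag that makes the dot match a line break as well.

```python
import re

def dot_matches(ch, flags=0):
    """Return True if the single character ch is matched by the dot."""
    return re.fullmatch(r".", ch, flags=flags) is not None

print(dot_matches("a"))                    # True
print(dot_matches(" "))                    # True
print(dot_matches("\n"))                   # False: line break is excluded
print(dot_matches("\n", flags=re.DOTALL))  # True: DOTALL lifts the exclusion
```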
2.5 The problem of abusing the dot
Because the dot matches almost any character, many people use .* or .+ freely for convenience in practice, and the result often backfires.
The quantifiers described earlier all belong to one category, called matching-precedence quantifiers (greedy quantifiers).
As the name suggests, when a matching-precedence quantifier is unsure whether to match, it tries to match first and records that state so that it can "regret" the choice later.
This "regret" process is called backtracking.
re.search(r'".*"', '"quoted string" and another"').group(0)     # => '"quoted string" and another"'
re.search(r'"[^"]*"', '"quoted string" and another"').group(0)  # => '"quoted string"'
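
The difference becomes more visible when a string contains several quoted parts. In this sketch (the sample text is made up), the greedy .* runs to the end of the string and backtracks only as far as the last quotation mark, while the excluded character group [^"] can never cross a quotation mark.

```python
import re

text = '"first" and "second"'

greedy_dot = re.findall(r'".*"', text)
negated_group = re.findall(r'"[^"]*"', text)

print(greedy_dot)     # ['"first" and "second"']
print(negated_group)  # ['"first"', '"second"']
```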
2.6 Ignore-precedence quantifiers
Corresponding to the matching-precedence quantifiers are the ignore-precedence quantifiers (non-greedy, or lazy, quantifiers), which are written by adding ? after the corresponding matching-precedence quantifier.
Both kinds allow the same number of repetitions, and both need to backtrack when a match fails. The only difference is that an ignore-precedence quantifier prefers to "ignore" first, while a matching-precedence quantifier prefers to "match" first.
Table 2-4 Matching-precedence quantifiers and ignore-precedence quantifiers
Matching precedence   Ignore precedence   Number of occurrences
*                     *?                  May or may not appear; no upper limit
+                     +?                  At least once; no upper limit
?                     ??                  At most once; may not appear at all
{m,n}                 {m,n}?              At least m times, at most n times
{m,}                  {m,}?               At least m times; no upper limit
{,n}                  {,n}?               May or may not appear; at most n times
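
The following sketch contrasts a few rows of Table 2-4 on small inputs. The sample strings are chosen for illustration: each pair of calls uses the same quantifier, once in its matching-precedence form and once in its ignore-precedence form.

```python
import re

text = '"quoted string" and another"'
greedy = re.search(r'".*"', text).group(0)
lazy = re.search(r'".*?"', text).group(0)
print(greedy)  # "quoted string" and another"
print(lazy)    # "quoted string"

html = "<b>bold</b>"
print(re.findall(r"<.+>", html))   # ['<b>bold</b>']
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>']

print(re.search(r"\d{2,4}", "12345").group(0))   # 1234
print(re.search(r"\d{2,4}?", "12345").group(0))  # 12
```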
2.7 Escape
To match the characters of a quantifier literally, escape them with a backslash.
Quantifier   Escaped form
{n}          \{n}
{m,n}        \{m,n}
{m,}         \{m,}
{,n}         \{,n}
*            \*
+            \+
?            \?
*?           \*\?
+?           \+\?
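
As a quick check of the escaped forms, the sketch below pairs several hand-escaped patterns with the literal text each one should match. It also uses re.escape, a standard-library helper that adds the backslashes automatically; the dictionary name hand_escaped is illustrative only.

```python
import re

# Hand-escaped patterns paired with the literal text each should match.
hand_escaped = {
    r"\{3}": "{3}",
    r"\{2,4}": "{2,4}",
    r"\*": "*",
    r"\+": "+",
    r"\?": "?",
    r"\*\?": "*?",
    r"\+\?": "+?",
}

for pattern, literal in hand_escaped.items():
    print(pattern, re.fullmatch(pattern, literal) is not None)  # True for each

# re.escape builds an escaped pattern from the literal text.
for literal in hand_escaped.values():
    print(literal, re.fullmatch(re.escape(literal), literal) is not None)  # True for each

# Unescaped, {3} is a quantifier; escaped, it is ordinary text.
print(re.fullmatch(r"a{3}", "aaa") is not None)    # True
print(re.fullmatch(r"a\{3}", "a{3}") is not None)  # True
```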
