Chapter 1 covers character classes (also called character groups).
In a regular expression, a character class stands for "any one of several characters that may appear at the same position". It is written by listing the candidate characters between the square brackets [ and ]. Simple examples are [ab], [314], and [#.].
Points to note:
1) Range notation must follow ASCII order: [0-9] is valid, while [9-0] raises an error.
In [1]: import re
In [2]: re.search("^[0-9]$", "2") is not None
Out[2]: True
In [3]: re.search("^[9-0]$", "2") is not None
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
error: bad character range
2) To match digits and letters, do not use [0-z]: that range also includes many punctuation marks. Write [0-9a-zA-Z] instead.
In [4]: re.search("^[0-z]$", "A") is not None
Out[4]: True
In [5]: re.search("^[0-z]$", ":") is not None
Out[5]: True
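To see exactly which unwanted characters [0-z] lets through, the following minimal sketch enumerates every character in the range and keeps those that [0-9a-zA-Z] rejects. The variable names are illustrative only.

```python
import re

# Every character between "0" and "z" in ASCII order.
candidates = [chr(code) for code in range(ord("0"), ord("z") + 1)]

# Keep the characters that [0-z] accepts but [0-9a-zA-Z] rejects.
extras = [
    ch for ch in candidates
    if re.search(r"^[0-z]$", ch) and not re.search(r"^[0-9a-zA-Z]$", ch)
]

print("".join(extras))  # prints :;<=>?@[\]^_`
```

The thirteen extra characters are the punctuation marks that sit between the digits, the uppercase letters, and the lowercase letters in the ASCII table.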
3) A metacharacter used as an ordinary character must be escaped. In Python code, an ordinary string needs two backslashes (\\) for the escape, while a raw string needs only one (\). Raw strings are recommended.
# A raw string and its ordinary-string equivalent
In [6]: r"^[0\-9]$" == "^[0\\-9]$"
Out[6]: True
# Escaping is much simpler in a raw string
In [7]: re.search(r"^[0\-9]$", "3") is not None
Out[7]: False
In [8]: re.search(r"^[0\-9]$", "-") is not None
Out[8]: True
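As a small illustration of the escaping rule, the sketch below builds the same pattern three ways: as a raw string, as an ordinary string, and with the standard library helper re.escape, which adds the backslashes itself. The variable names are illustrative.

```python
import re

raw = r"^[0\-9]$"                        # raw string: one backslash
plain = "^[0\\-9]$"                      # ordinary string: backslash doubled
built = "^[" + re.escape("0-9") + "]$"   # re.escape inserts the backslash

same_text = raw == plain

# The escaped hyphen is a literal "-", not a range operator.
hyphen_results = [re.search(p, "-") is not None for p in (raw, plain, built)]
three_results = [re.search(p, "3") is not None for p in (raw, plain, built)]

print(same_text, hyphen_results, three_results)
```

All three patterns accept "-" and reject "3", because the class now contains only the characters 0, -, and 9.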
4) A ] means different things depending on where it appears: the regular expression treats the first unescaped ] as the end of the character class, so a ] intended as a member of the class must be escaped.
# Unescaped ]
In [9]: re.search(r"^[012]345]$", "2345") is not None
Out[9]: False
In [10]: re.search(r"^[012]345]$", "2345]") is not None
Out[10]: True
In [11]: re.search(r"^[012]345]$", "5") is not None
Out[11]: False
In [12]: re.search(r"^[012]345]$", "]") is not None
Out[12]: False
# Escaped ]
In [13]: re.search(r"^[012\]345]$", "2345") is not None
Out[13]: False
In [14]: re.search(r"^[012\]345]$", "5") is not None
Out[14]: True
In [15]: re.search(r"^[012\]345]$", "]") is not None
Out[15]: True
5) Shorthand character classes
The common shorthand character classes are \d, \w, and \s. On the surface they look unrelated to [...], but they work the same way. \d is equivalent to [0-9], where d stands for "digit"; \w is equivalent to [0-9a-zA-Z_], where w stands for "word character"; \s is equivalent to [ \t\r\n\v\f] (the first character is a space), where s stands for "whitespace".
re.search(r"^\d$", "8") is not None   # => True
re.search(r"^\d$", "a") is not None   # => False
re.search(r"^\w$", "8") is not None   # => True
re.search(r"^\w$", "a") is not None   # => True
re.search(r"^\w$", "_") is not None   # => True
re.search(r"^\s$", " ") is not None   # => True
re.search(r"^\s$", "\t") is not None  # => True
re.search(r"^\s$", "\n") is not None  # => True
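One caveat about these equivalences: in Python 3, for str patterns, the shorthand classes also match non-ASCII characters unless the re.ASCII flag is given. The sketch below shows the difference for \d and \w; the sample characters are arbitrary choices.

```python
import re

arabic_three = "\u0663"   # ARABIC-INDIC DIGIT THREE, a digit outside ASCII
e_acute = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE

# Default (Unicode) matching in Python 3.
digit_unicode = re.search(r"^\d$", arabic_three) is not None
word_unicode = re.search(r"^\w$", e_acute) is not None

# With re.ASCII, \d behaves like [0-9] and \w like [0-9a-zA-Z_].
digit_ascii = re.search(r"^\d$", arabic_three, re.ASCII) is not None
word_ascii = re.search(r"^\w$", e_acute, re.ASCII) is not None

print(digit_unicode, word_unicode, digit_ascii, word_ascii)
```

If only ASCII digits or letters should be accepted, passing re.ASCII or writing the bracketed class explicitly avoids surprises.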
6) Regular expressions also provide negated shorthand for the three common classes \d, \w, and \s. They are written \D, \W, and \S: the same letters, only in uppercase. Each pair is complementary: whatever \s matches, \S does not; whatever \w matches, \W does not; whatever \d matches, \D does not. Example 1-19 demonstrates the use of these negated shorthand classes.
# \d and \D
re.search(r"^\d$", "8") is not None   # => True
re.search(r"^\d$", "a") is not None   # => False
re.search(r"^\D$", "8") is not None   # => False
re.search(r"^\D$", "a") is not None   # => True
# \w and \W
re.search(r"^\w$", "c") is not None   # => True
re.search(r"^\w$", "!") is not None   # => False
re.search(r"^\W$", "c") is not None   # => False
re.search(r"^\W$", "!") is not None   # => True
# \s and \S
re.search(r"^\s$", "\t") is not None  # => True
re.search(r"^\s$", "0") is not None   # => False
re.search(r"^\S$", "\t") is not None  # => False
re.search(r"^\S$", "0") is not None   # => True
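The complementary relationship can be checked mechanically. This minimal sketch tests every ASCII character against each pair and confirms that exactly one side matches; the helper name is_complement is hypothetical, and re.fullmatch is used so that a trailing newline is not silently accepted by $.

```python
import re

pairs = [(r"\d", r"\D"), (r"\w", r"\W"), (r"\s", r"\S")]
ascii_chars = [chr(code) for code in range(128)]

def is_complement(lower, upper):
    # For every character, exactly one of the two classes must match.
    return all(
        (re.fullmatch(lower, ch) is not None)
        != (re.fullmatch(upper, ch) is not None)
        for ch in ascii_chars
    )

complementary = all(is_complement(lower, upper) for lower, upper in pairs)
print(complementary)
```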
Table 2-1 General form of quantifiers
| Quantifier | Description |
|---|---|
| {n} | The preceding element must appear exactly n times |
| {m,n} | The preceding element appears at least m times and at most n times |
| {m,} | The preceding element appears at least m times, with no upper limit |
| {0,n} | The preceding element may be absent or appear up to n times (in some languages you can write {,n}) |
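The four general forms can be tried out on runs of the letter a of increasing length. This is a minimal sketch; the helper name matches and the list names are illustrative.

```python
import re

def matches(pattern, text):
    # True only if the whole text is matched by the pattern.
    return re.fullmatch(pattern, text) is not None

lengths = range(6)  # test strings "", "a", "aa", ... "aaaaa"

exactly_three = [matches(r"a{3}", "a" * n) for n in lengths]
two_to_four = [matches(r"a{2,4}", "a" * n) for n in lengths]
at_least_two = [matches(r"a{2,}", "a" * n) for n in lengths]
up_to_two = [matches(r"a{0,2}", "a" * n) for n in lengths]

print(exactly_three)  # [False, False, False, True, False, False]
print(two_to_four)    # [False, False, True, True, True, False]
print(at_least_two)   # [False, False, True, True, True, True]
print(up_to_two)      # [True, True, True, False, False, False]
```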
Table 2-2 Common quantifiers
| Common quantifier | Equivalent {m,n} form | Description |
|---|---|---|
| * | {0,} | May be absent or appear any number of times |
| + | {1,} | Appears at least once, with no upper limit |
| ? | {0,1} | Appears at most once, and may be absent |
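A quick way to convince yourself of these equivalences is to run both forms over the same samples and compare what each accepts. The sample strings and the helper name accepted below are arbitrary choices for illustration.

```python
import re

samples = ["", "a", "aa", "aaa", "b", "ab"]
pairs = [(r"a*", r"a{0,}"), (r"a+", r"a{1,}"), (r"a?", r"a{0,1}")]

def accepted(pattern):
    # Samples that the pattern matches in full.
    return [s for s in samples if re.fullmatch(pattern, s) is not None]

equivalent = all(accepted(short) == accepted(long) for short, long in pairs)

print(accepted(r"a*"))  # ['', 'a', 'aa', 'aaa']
print(accepted(r"a+"))  # ['a', 'aa', 'aaa']
print(accepted(r"a?"))  # ['', 'a']
print(equivalent)       # True
```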
Table 2-3 Expressions for matching tags
| Tag category | Matching expression |
|---|---|
| Any tag | <[^>]+> |
| Open tag | <[^/>][^>]*> |
| Close tag | </[^>]+> |
| Self-closing tag | <[^>/]+/> |
Note: These expressions are not rigorous. For example, the open-tag expression also matches self-closing tags. The author says that the material covered so far is not enough to solve this problem, and that it requires further study.
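The limitation mentioned in the note is easy to reproduce. The sketch below applies the four expressions from Table 2-3 to a short, made-up HTML fragment; the constant names are illustrative.

```python
import re

ANY_TAG = r"<[^>]+>"
OPEN_TAG = r"<[^/>][^>]*>"
CLOSE_TAG = r"</[^>]+>"
SELF_CLOSING_TAG = r"<[^>/]+/>"

html = "<p>text<br/></p>"  # made-up sample input

all_tags = re.findall(ANY_TAG, html)
open_tags = re.findall(OPEN_TAG, html)
close_tags = re.findall(CLOSE_TAG, html)
self_closing_tags = re.findall(SELF_CLOSING_TAG, html)

print(all_tags)           # ['<p>', '<br/>', '</p>']
print(open_tags)          # ['<p>', '<br/>']
print(close_tags)         # ['</p>']
print(self_closing_tags)  # ['<br/>']
```

The open-tag expression returns <br/> as well as <p>, because [^>]* is free to consume the trailing slash.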
In [2]: re.search(r"\d{6}", "ab123456cd").group(0)
In [3]: re.search(r"^<[^>]+>$", "<bold>").group(0)
In [4]: re.findall(r"\d{6}", "zipcode1:201203, zipcode2:100859")
Out[4]: ['201203', '100859']
The dot (.) matches any character except a line break.
2.5 The problem of overusing the dot
Because the dot matches almost any character, many people use .* or .+ freely in practice because it is convenient, and the result often backfires.
The quantifiers described so far all belong to one category, called greedy quantifiers (also translated as match-first quantifiers).
As the name suggests, when a greedy quantifier is unsure whether to match, it tries to match first and records that state so that it can go back on the choice later.
Going back to a recorded state in this way is called backtracking.
re.search(r'".*"', '"quoted string" and another"').group(0)     # => '"quoted string" and another"'
re.search(r'"[^"]*"', '"quoted string" and another"').group(0)  # => '"quoted string"'
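The same effect shows up when a line contains several quoted strings. In this minimal sketch (the sample text is made up), the greedy .* first runs to the end of the text and then backtracks only as far as the last quotation mark, so it swallows everything in between, while the negated class [^"]* cannot cross a quotation mark at all.

```python
import re

text = 'say "one" then "two" done'  # made-up sample input

# Greedy dot: one long match from the first quote to the last quote.
greedy_dot = re.findall(r'".*"', text)

# Negated class: each quoted string is matched separately.
negated_class = re.findall(r'"[^"]*"', text)

print(greedy_dot)      # ['"one" then "two"']
print(negated_class)   # ['"one"', '"two"']
```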
2.6 Lazy quantifiers (ignore-first quantifiers)
Every greedy quantifier has a lazy counterpart, written by appending ? to the greedy quantifier.
The two allow the same number of repetitions, and both backtrack when a match cannot proceed. The only difference is that a lazy quantifier prefers to "ignore" the element first, whereas a greedy quantifier prefers to "match" it.
Table 2-4 Greedy quantifiers and lazy quantifiers
| Greedy quantifier | Lazy quantifier | Allowed repetitions |
|---|---|---|
| * | *? | May be absent or appear any number of times |
| + | +? | At least once, with no upper limit |
| ? | ?? | At most once, and may be absent |
| {m,n} | {m,n}? | At least m times and at most n times |
| {m,} | {m,}? | At least m times, with no upper limit |
| {,n} | {,n}? | May be absent or appear up to n times |
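The difference between the two columns is easiest to see by running each pair on the same input. The following is a minimal sketch with made-up sample strings; each lazy form stops as early as the rest of the pattern allows.

```python
import re

text = 'say "one" then "two" done'  # made-up sample input

# .* takes as much as possible; .*? takes as little as possible.
greedy_quotes = re.findall(r'".*"', text)
lazy_quotes = re.findall(r'".*?"', text)

# {2,4} prefers four digits; {2,4}? is satisfied with two.
greedy_digits = re.search(r"\d{2,4}", "12345").group(0)
lazy_digits = re.search(r"\d{2,4}?", "12345").group(0)

# ? prefers to match once; ?? prefers to match nothing.
greedy_optional = re.search(r"\d?", "7").group(0)
lazy_optional = re.search(r"\d??", "7").group(0)

print(greedy_quotes, lazy_quotes)                # ['"one" then "two"'] ['"one"', '"two"']
print(greedy_digits, lazy_digits)                # 1234 12
print(repr(greedy_optional), repr(lazy_optional))  # '7' ''
```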
2.7 Escape
| Quantifier | Escaped form |
|---|---|
| {n} | \{n} |
| {m,n} | \{m,n} |
| {m,} | \{m,} |
| {,n} | \{,n} |
| * | \* |
| + | \+ |
| ? | \? |
| *? | \*\? |
| +? | \+\? |
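To match quantifier characters literally, the escaped forms can be used directly or generated with the standard library function re.escape. The sketch below uses a made-up sample text and illustrative variable names.

```python
import re

text = "a*? b{2,3} c+"  # made-up sample input containing literal quantifiers

# Unescaped, {2,3} is a quantifier: "b" repeated two or three times.
unescaped = re.search(r"b{2,3}", text)

# Escaped forms match the characters themselves.
literal_braces = re.findall(r"\{2,3}", text)
literal_star_lazy = re.findall(r"\*\?", text)

# re.escape produces an escaped pattern from a literal string.
literal_plus = re.findall(re.escape("+"), text)

print(unescaped)          # None
print(literal_braces)     # ['{2,3}']
print(literal_star_lazy)  # ['*?']
print(literal_plus)       # ['+']
```

Note that both characters of *? need a backslash: escaping only the first would leave ? acting as a quantifier on the literal asterisk.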