Training course 6: using the re module
re workflow: pattern string -> compiled pattern object -> match against the target text
Common metacharacters:
.: Any character except a line break
\: Escape character, used when a metacharacter must be matched as an ordinary character
[]: Character set matching
\d: Digit, 0-9
\D: Non-digit, [^0-9]
\s: Whitespace character, [ \t\r\n\f\v]
\S: Non-whitespace character, [^\s]
\w: Letter, digit or underscore, [A-Za-z0-9_]
\W: Anything that is not a letter, digit or underscore, [^\w]
# Put simply, each uppercase class is exactly the complement of its lowercase class.
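The complement relationship between the lowercase and uppercase classes can be checked directly. The following is a minimal sketch written for Python 3; the sample text is made up for illustration and does not come from the course.

```python
import re

# Hypothetical sample text, chosen to contain digits, letters, an underscore and whitespace.
text = "Room 42, floor_3\tok"

digits = re.findall(r"\d", text)   # every single digit character
words = re.findall(r"\w+", text)   # runs of letters, digits and underscores
spaces = re.findall(r"\s", text)   # every whitespace character

print(digits)  # ['4', '2', '3']
print(words)   # ['Room', '42', 'floor_3', 'ok']
print(spaces)  # [' ', ' ', '\t']

# Every character matches exactly one of \d and \D: the two classes are complements.
complementary = all(
    bool(re.fullmatch(r"\d", ch)) != bool(re.fullmatch(r"\D", ch)) for ch in text
)
print(complementary)  # True
```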
# Compiling the pattern once with compile is faster when the pattern is reused
>>> p = re.compile(r'nihao')
>>> type(p)
<type '_sre.SRE_Pattern'>
# Prefixing the pattern string with r (raw string) avoids unnecessary escaping
# Here p is assumed to have been recompiled with a pattern that matches 'abc'
In [10]: rs = p.match('abc ABCDE abcdd')
# The match method returns a match object if the match succeeds; if it fails, None is returned.
In [11]: print rs
<_sre.SRE_Match object at 0x03638918>
In [12]: rs.group() # match returns only the single matched substring.
Out[12]: 'abc'
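The workflow "pattern string -> compiled pattern object -> match" can be sketched end to end as follows. This is written for Python 3, and the pattern `abc` is an assumption made for the example, since the transcript does not show which pattern produced 'abc'.

```python
import re

pattern = re.compile(r"abc")             # pattern string -> compiled pattern object

hit = pattern.match("abc ABCDE abcdd")   # match only tries at the start of the text
miss = pattern.match("xyz abc")          # 'abc' is not at position 0, so this is None

print(hit.group())  # 'abc'
print(hit.span())   # (0, 3)
print(miss)         # None

# Because a failed match returns None, guard before calling group().
result = miss.group() if miss else "no match"
print(result)       # 'no match'
```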
Quantifiers:
+: One or more times
?: Zero or one time
*: Any number of times: zero, one or many
{m}: Match the previous character exactly m times
{m,n}: Match the previous character m to n times
Quantifier followed by ?: Non-greedy mode, e.g. [I]*?
In [14]: p = re.compile('[abc]{2,3}') # Greedy by default: always matches as much as possible
In [15]: p.findall('abcabcbc')
Out[15]: ['abc', 'abc', 'bc']
In [16]: p = re.compile('[abc]{2,3}?') # With ? appended the quantifier becomes non-greedy and the shortest match is returned.
In [17]: p.findall('abcabcbc')
Out[17]: ['ab', 'ca', 'bc', 'bc']
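Greedy versus non-greedy matching is easiest to see when the pattern contains a wildcard. The following Python 3 sketch uses made-up sample strings to show the same contrast as the transcript above.

```python
import re

# Hypothetical sample text with four tags.
html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)   # .+ grabs as much as it can, up to the last '>'
lazy = re.findall(r"<.+?>", html)    # .+? stops at the first '>' it can reach

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# The same applies to bounded repetition {m,n}.
bounded_greedy = re.findall(r"a{2,3}", "aaaaa")   # takes 3, then the remaining 2
bounded_lazy = re.findall(r"a{2,3}?", "aaaaa")    # takes 2 each time, one 'a' is left over

print(bounded_greedy)  # ['aaa', 'aa']
print(bounded_lazy)    # ['aa', 'aa']
```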
Boundaries:
^: Matches the start of the string. In multiline mode it also matches the start of each line.
$: Matches the end of the string. In multiline mode it also matches the end of each line.
\A: Matches only the start of the string
\Z: Matches only the end of the string
\b: Matches the boundary between \w and \W
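The difference between ^/$ and \A/\Z only shows up in multiline mode. The sketch below (Python 3, with an invented two-line string) compares them and also shows \b selecting whole words.

```python
import re

# Hypothetical two-line sample text.
text = "cat concat\ncat scatter"

line_starts = re.findall(r"^cat", text, re.M)    # ^ matches at the start of every line
string_start = re.findall(r"\Acat", text, re.M)  # \A matches only at the very start
line_ends = re.findall(r"\w+$", text, re.M)      # $ matches at the end of every line
string_end = re.findall(r"\w+\Z", text, re.M)    # \Z matches only at the very end
whole_words = re.findall(r"\bcat\b", text)       # 'cat' as a standalone word only

print(line_starts)   # ['cat', 'cat']
print(string_start)  # ['cat']
print(line_ends)     # ['concat', 'scatter']
print(string_end)    # ['scatter']
print(whole_words)   # ['cat', 'cat']
```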
Logical grouping:
|: OR operation
(...): Capturing group; the matched value is returned
(?:...): Non-capturing group; no matched value is returned
\<number>: Refers to the string matched by the group with that number.
(?P<name>...): Named group
(?P=name): Reference to a named group; matches the same text that the group matched.
# Grouping is closely related to the group method.
# Non-capturing group: nothing is extracted separately
In [52]: re.findall(r'(?:a)b', 'abc')
Out[52]: ['ab']
# Named group
In [54]: m = re.match(r'(?P<a>a)(b)', 'abc')
In [55]: m.groups()
Out[55]: ('a', 'b')
In [56]: m.groupdict()
Out[56]: {'a': 'a'}
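Backreferences by number and by name are listed above but not demonstrated. The following Python 3 sketch uses invented sample strings and group names (`mark`, `body`) to show both, plus how capturing changes what findall returns.

```python
import re

# \1 refers back to whatever group 1 matched: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))    # 'is'

# (?P=mark) must match the same text that the named group 'mark' matched.
emph = re.search(r"(?P<mark>[*_])(?P<body>\w+)(?P=mark)", "an *emph* word")
print(emph.group("body"))  # 'emph'
print(emph.groupdict())    # {'mark': '*', 'body': 'emph'}

# findall returns the group contents when the pattern has a capturing group,
# and the whole match when the group is non-capturing.
capturing = re.findall(r"(ab)+c", "ababc abc")
non_capturing = re.findall(r"(?:ab)+c", "ababc abc")
print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['ababc', 'abc']
```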
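Special constructs:
(?:...): Non-capturing group
(?iLmsux): Inline mode flags
(?#...): Comment
(?=...): Lookahead; matches only if followed by ..., and the text checked is not part of the result.
(?!...): Negative lookahead; matches only if not followed by ...
(?<=...): Lookbehind; matches only if preceded by ...
(?<!...): Negative lookbehind; matches only if not preceded by ...
(?(id/name)yes|no): Conditional; uses the yes pattern if the group with that id or name matched, otherwise the no pattern
In [69]: re.findall(r'a(?=\d)', 'a3 a4')
Out[69]: ['a', 'a']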
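Only the lookahead form is demonstrated above. As a rough sketch in Python 3, the remaining constructs behave as follows; the sample strings and the parenthesized-number pattern are invented for illustration.

```python
import re

# Hypothetical sample text.
text = "price: $30, cost: 45, tax: $7"

followed = re.findall(r"a(?=\d)", "a3 ab a4")      # 'a' only when a digit follows
not_followed = re.findall(r"a(?!\d)", "a3 ab a4")  # 'a' only when no digit follows
after_dollar = re.findall(r"(?<=\$)\d+", text)     # numbers preceded by '$'
no_dollar = re.findall(r"(?<!\$)\b\d+", text)      # whole numbers not preceded by '$'

print(followed)      # ['a', 'a']
print(not_followed)  # ['a']
print(after_dollar)  # ['30', '7']
print(no_dollar)     # ['45']

# Conditional: require a closing ')' only if an opening '(' was captured by group 1.
cond = re.compile(r"(\()?\d+(?(1)\))")
balanced = [s for s in ["(12)", "12", "(12", "12)"] if cond.fullmatch(s)]
print(balanced)      # ['(12)', '12']
```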
Matching modes:
re.I: Case-insensitive
re.M: Multiline mode; changes the behavior of ^ and $ so that they also match at each line break
re.S: Dot-matches-all mode; . matches any character, including a line break
re.U: \w \W \b \B \s \S \d \D depend on the Unicode character properties
re.X: Verbose mode
re.L: Locale mode; \w \W \b \B \s \S depend on the current locale settings
(?iLmsux): Inline mode flags, generally placed at the beginning of the expression.
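The effect of the most common flags can be sketched as follows (Python 3; all sample strings and the date pattern are made up for the example).

```python
import re

# re.I: ignore case.
any_case = re.findall(r"python", "Python PYTHON python", re.I)
print(any_case)      # ['Python', 'PYTHON', 'python']

# The same flag written inline at the start of the expression.
inline = re.findall(r"(?i)python", "Python PYTHON python")
print(inline)        # ['Python', 'PYTHON', 'python']

# re.S: let '.' match a line break as well.
without_s = re.findall(r"a.b", "a\nb")
with_s = re.findall(r"a.b", "a\nb", re.S)
print(without_s)     # []
print(with_s)        # ['a\nb']

# re.M: let ^ match at the start of every line.
first_words = re.findall(r"^\w+", "one\ntwo", re.M)
print(first_words)   # ['one', 'two']

# re.X: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
""", re.X)
date_groups = date.match("2024-05").groups()
print(date_groups)   # ('2024', '05')
```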
Common functions:
compile
Pattern (the compiled pattern object)
match
search
split
findall
finditer
sub
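A short Python 3 sketch of the functions listed above that were not demonstrated earlier. The key=value text and the variable names are hypothetical.

```python
import re

# Hypothetical sample text of key=value pairs.
text = "alice=1; bob=22;carol=333"
pair = re.compile(r"(\w+)=(\d+)")

parts = re.split(r";\s*", text)   # split on ';' plus any following whitespace
pairs = pair.findall(text)        # list of (name, number) tuples, one per match
starts = [(m.group(1), m.start()) for m in pair.finditer(text)]  # match objects, lazily
masked = pair.sub(r"\1=#", text)  # replace each number, keeping the name via \1

print(parts)   # ['alice=1', 'bob=22', 'carol=333']
print(pairs)   # [('alice', '1'), ('bob', '22'), ('carol', '333')]
print(starts)  # [('alice', 0), ('bob', 9), ('carol', 16)]
print(masked)  # 'alice=#; bob=#;carol=#'
```

findall is convenient when only the matched text is needed; finditer yields match objects, which also expose positions through start(), end() and span().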
Question: what is the difference between match and search?
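One way to explore this: match only tries the pattern at the start of the string, while search scans forward until it finds a position where the pattern matches. A minimal Python 3 sketch with invented sample strings:

```python
import re

text = "say hello"

m_fail = re.match(r"hello", text)    # 'hello' is not at position 0 -> None
s_hit = re.search(r"hello", text)    # search scans ahead and finds it
m_hit = re.match(r"say", text)       # 'say' is at position 0 -> match

print(m_fail)         # None
print(s_hit.span())   # (4, 9)
print(m_hit.group())  # 'say'

# Even in multiline mode, match still only tries the very start of the string,
# whereas search with ^ can match at the start of a later line.
multi = "say\nhello"
m_multi = re.match(r"^hello", multi, re.M)
s_multi = re.search(r"^hello", multi, re.M)
print(m_multi)          # None
print(s_multi.group())  # 'hello'
```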