Symbol Analysis of Common Python Regular Expressions
Understanding regular expressions in Python is mostly a matter of understanding their symbols. This article goes through the regular expression symbols most commonly used in Python. The main symbols are:
.
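By default, matches any single character except a newline. If the DOTALL flag is set, it also matches a newline.
^
Matches the start of the string. If the MULTILINE flag is set, it also matches at the start of each line.
$
Matches the end of the string (or just before a newline at the very end). If the MULTILINE flag is set, it also matches at the end of each line.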
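The effect of the flags is easiest to see on a two-line string. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line"

# Default: "." stops at a newline, so ".+" covers only the first line.
dot_default = re.match(r".+", text).group()
# re.DOTALL: "." also matches "\n", so ".+" covers the whole string.
dot_all = re.match(r".+", text, re.DOTALL).group()
# Default: "^" anchors only at the start of the string.
starts_default = re.findall(r"^\w+", text)
# re.MULTILINE: "^" and "$" also anchor at every line boundary.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)

print(dot_default)        # first line
print(dot_all == text)    # True
print(starts_default)     # ['first']
print(starts_multiline)   # ['first', 'second']
print(ends_multiline)     # ['line', 'line']
```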
*
Matches zero or more repetitions of the preceding expression.
+
Matches one or more repetitions.
?
Matches zero or one repetition.
*?, +?, ??
Non-greedy versions of *, + and ?: they match as few characters as possible.
{m}, {m,n}, {m,n}?
Match exactly m repetitions, from m to n repetitions, and from m to n repetitions in non-greedy mode.
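The difference between greedy and non-greedy matching shows up whenever more than one match length is possible. The following is a small sketch with made-up sample strings:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" takes as much as possible, so the match runs to the last ">".
greedy = re.match(r"<.*>", html).group()
# Non-greedy: ".*?" takes as little as possible, so it stops at the first ">".
lazy = re.match(r"<.*?>", html).group()

digits = "1234567"
# {m}: exactly 3 digits.
exactly_three = re.match(r"\d{3}", digits).group()
# {m,n}: 2 to 5 digits, preferring as many as possible.
two_to_five = re.match(r"\d{2,5}", digits).group()
# {m,n}?: 2 to 5 digits, preferring as few as possible.
two_to_five_lazy = re.match(r"\d{2,5}?", digits).group()

print(greedy)            # <b>bold</b> and <i>italic</i>
print(lazy)              # <b>
print(exactly_three)     # 123
print(two_to_five)       # 12345
print(two_to_five_lazy)  # 12
```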
\
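Escapes a special character so that it is matched literally, or introduces a special sequence such as \d.
[]
Character set: [abc] matches a, b or c; [a-z] matches any lowercase letter; [^a-z] matches any character that is not a lowercase letter.
|
Alternation: a|b matches either a or b.
(...)
Group: matches the expression inside the parentheses and captures the matched text.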
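Escaping, character sets, alternation and groups can each be tried on one short string. The sample inputs below are invented for illustration:

```python
import re

# "\+" matches a literal plus sign instead of acting as a quantifier.
escaped = re.search(r"1\+1", "1+1=2").group()
# [aeiou] matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular")
# [^a-z] matches any character that is NOT a lowercase letter.
non_lower = re.findall(r"[^a-z]", "abc-123")
# cat|dog matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")
# (...) captures what it matched; groups are numbered from 1.
m = re.match(r"(\d+)-(\d+)", "2012-79")

print(escaped)      # 1+1
print(vowels)       # ['e', 'u', 'a']
print(non_lower)    # ['-', '1', '2', '3']
print(pets)         # ['cat', 'dog']
print(m.group(1))   # 2012
print(m.groups())   # ('2012', '79')
```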
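(?iLmsux)
Sets the flags that correspond to the given letters (for example, i for IGNORECASE, m for MULTILINE, s for DOTALL, x for VERBOSE).
(?:...)
Non-capturing group: groups the expression without saving the matched text.
(?P<name>...)
Named group: the matched text can be retrieved by name.
>>> re.match('(?P<name>abc){2}', 'abcabc').groupdict()
{'name': 'abc'}
(?P=name)
Backreference to a named group: matches the same text that the group called name matched earlier.
>>> re.match(r'(?P<name>abc)----(?P=name)', 'abc----abc').group()
'abc----abc'
(?#...)
Comment: the content inside the parentheses is ignored.
(?=...)
Lookahead: matches only if the text that follows matches ..., without consuming it.
>>> re.match(r'phone(?=\d{3})', 'phone123').group()
'phone'
(?!...)
Negative lookahead: matches only if the text that follows does not match ....
>>> re.match(r'phone(?!\d{3})', 'phoneabc123').group()
'phone'
(?<=...)
Lookbehind: matches only if the text immediately before the current position matches ....
(?<!...)
Negative lookbehind: matches only if the text immediately before the current position does not match ....
(?(id/name)yes-pattern|no-pattern)
Conditional: tries yes-pattern if the group with the given id or name matched, otherwise no-pattern.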
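The extensions that have no example above (inline flags, non-capturing groups, lookbehind and the conditional pattern) can be sketched as follows. The sample strings are made up, and the e-mail pattern is only a rough illustration of the conditional syntax, not a real address validator:

```python
import re

# (?i) turns on case-insensitive matching for the pattern.
case_insensitive = re.match(r"(?i)python", "PYTHON").group()
# (?:ab)+ groups "ab" for repetition without creating a capture group.
non_capturing = re.match(r"(?:ab)+(c)", "ababc").groups()
# (?<=\$) requires a "$" right before the digits, but does not include it.
price = re.search(r"(?<=\$)\d+", "cost: $42").group()
# (?<!un) rejects "happy" when it is directly preceded by "un".
second_happy = re.search(r"(?<!un)happy", "unhappy happy").start()

# (?(1)>|$): if group 1 (the "<") matched, require ">", otherwise require the end.
EMAIL = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
bracketed = EMAIL.match("<user@host.com>") is not None
bare = EMAIL.match("user@host.com") is not None
unbalanced = EMAIL.match("<user@host.com") is not None

print(case_insensitive)   # PYTHON
print(non_capturing)      # ('c',)
print(price)              # 42
print(second_happy)       # 8
print(bracketed, bare, unbalanced)   # True True False
```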
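\number matches the contents of the group with that number
\A matches only at the start of the string
\b matches a word boundary
\B is the opposite of \b: it matches where there is no word boundary
\d is equivalent to [0-9] for ASCII text
\D is equivalent to [^0-9] for ASCII text
\s is equivalent to [ \t\r\n\f\v] for ASCII text
\S matches any non-whitespace character
\w is equivalent to [a-zA-Z0-9_] for ASCII text
\W is the opposite of \w
\Z matches only at the end of the string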
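The special sequences can be checked quickly with findall and search. The following is a short sketch on invented sample strings:

```python
import re

sample = "id_7 costs 42 USD"

digits = re.findall(r"\d+", sample)          # runs of digits
words = re.findall(r"\w+", sample)           # \w also matches "_"
fields = re.split(r"\s+", "a b\tc\nd")       # split on any whitespace

# \b requires a word boundary on both sides, so only the standalone "cat" matches.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
# \B requires that there is no boundary, so only the "cat" inside "concat" matches.
inner_start = re.search(r"\Bcat", "cat concat").start()

# Unlike ^ and $, \A and \Z ignore MULTILINE and anchor only to the whole string.
first_only = re.findall(r"\A\w+", "one\ntwo", re.MULTILINE)
last_only = re.findall(r"\w+\Z", "one\ntwo", re.MULTILINE)

# \1 matches the same text that group 1 matched: a doubled character here.
doubled = re.search(r"(\w)\1", "hello").group()

print(digits)        # ['7', '42']
print(words)         # ['id_7', 'costs', '42', 'USD']
print(fields)        # ['a', 'b', 'c', 'd']
print(whole_word)    # ['cat']
print(inner_start)   # 7
print(first_only)    # ['one']
print(last_only)     # ['two']
print(doubled)       # ll
```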
Retrieving text information using Python Regular Expressions
import re

text = open(r'log.txt').read()    # read the whole log file first
sys_bok = text.split('bck')       # split the text into the sys part and the bck part
syss = sys_bok[0].split('\n')
bcks = sys_bok[1].split('\n')
print('sys')
for sys_line in syss:
    s = re.findall(r'[0-9]+', sys_line)   # every run of digits on this line
    print(' '.join(s))
print('bck')
for bck_line in bcks:
    b = re.findall(r'[0-9]+', bck_line)
    print(' '.join(b))
This prints the data in the following format:
sys
20 12 79
20 13 81
20 14 12
bck
20 12 164
20 13 278
20 14 128
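The script above depends on a log.txt file whose contents are not shown. The following is a self-contained sketch of the same idea, assuming a made-up log layout. SAMPLE_LOG and extract_numbers are illustrative names that do not come from the original script, and lines without digits are skipped here rather than printed as blank lines.

```python
import re

# Hypothetical stand-in for the contents of log.txt (the real format is not shown).
SAMPLE_LOG = """sys
20:12 load=79
20:13 load=81
bck
20:12 load=164
20:13 load=278
"""


def extract_numbers(block):
    """Return one space-joined string of numbers for each line that contains digits."""
    rows = []
    for line in block.split("\n"):
        found = re.findall(r"[0-9]+", line)
        if found:                       # skip header and empty lines
            rows.append(" ".join(found))
    return rows


# Everything before "bck" is the sys part, everything after it is the bck part.
sys_part, bck_part = SAMPLE_LOG.split("bck")
sys_rows = extract_numbers(sys_part)
bck_rows = extract_numbers(bck_part)

print("sys")
print("\n".join(sys_rows))
print("bck")
print("\n".join(bck_rows))
```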
Python Regular Expressions
That is not correct. In r"2x\+5y", the r prefix means that the "\" in the string literal is not treated as a string escape.
In the regular expression itself, "\+" escapes the "+", because "+" has a special meaning in regular expressions. This has nothing to do with string escaping.
To put it more plainly: whether you write "\\+" or r"\+" in your program, what is stored in memory is one "\" followed by one "+". As long as the regular expression engine reads a consecutive "\" and "+" from memory, it understands that you want to match a literal "+" character.
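This can be verified directly. The following is a small sketch with illustrative variable names, comparing the raw and the escaped spelling of the same pattern:

```python
import re

raw_pattern = r"2x\+5y|7y-3z"        # raw string: the backslash is kept as written
escaped_pattern = "2x\\+5y|7y-3z"    # normal string: "\\" produces one backslash

# Both literals store exactly the same characters in memory.
same_text = raw_pattern == escaped_pattern
backslashes = raw_pattern.count("\\")

# With "\+" the pattern matches a literal plus sign.
matches_plus = re.fullmatch(raw_pattern, "2x+5y") is not None
matches_alt = re.fullmatch(raw_pattern, "7y-3z") is not None

# Without the backslash, "x+" means "one or more x", so "2x+5y" no longer matches itself.
unescaped_literal = re.fullmatch(r"2x+5y", "2x+5y") is not None
unescaped_repeat = re.fullmatch(r"2x+5y", "2xx5y") is not None

print(same_text, backslashes)                 # True 1
print(matches_plus, matches_alt)              # True True
print(unescaped_literal, unescaped_repeat)    # False True
```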
Therefore, if you do not put r before the string, the regular expression string should be written as follows:
"2x\\+5y|7y-3z"