Personally, I mainly use regular expressions for complex string analysis and for extracting the information I need.
Learning principle: learn just enough to get by, and go deeper when the need arises.
The summary is as follows:
Special symbols in regular expressions:
"." matches any character (except a newline, by default)
"^" matches the start of the string
"$" matches the end of the string
"*", "+", "?" repeat the preceding character: 0 or more times, 1 or more times, 0 or 1 time
*?, +?, ?? match as few characters as possible (non-greedy); plain *, +, ? are greedy
{m} matches the preceding character exactly m times
{m,n} matches it m to n times; m or n can be omitted
For example, 'a.*b' matches any string that begins with a and ends with b.
a{5} matches 5 consecutive a's
[] denotes a set of characters: [abcd] matches a, b, c or d; [^a] matches any character except a
| A|B matches A or B, where A and B are arbitrary regular expressions. Note that | is not greedy: if A matches, B is not tried
(...) a group, used to extract information; it is best understood with an example (see Three)
\d [0-9]
\D any non-\d character
\s a whitespace character
\S a non-whitespace character
\w [a-zA-Z0-9_]
\W any non-\w character
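As a quick check of the symbols listed above, here is a minimal Python sketch. The sample strings are made up for illustration and are not part of the original notes.

```python
import re

# Greedy vs. non-greedy repetition on a made-up sample string.
text = "a1b2b"
greedy = re.search(r"a.*b", text).group()   # takes as much as possible
lazy = re.search(r"a.*?b", text).group()    # takes as little as possible

# {m} repetition and character sets.
five_a = re.search(r"a{5}", "xaaaaay").group()
not_a = re.findall(r"[^a]", "banana")

# Alternation tries the left branch first and stops once it matches.
alt = re.search(r"ab|abc", "abc").group()

# Shorthand classes: \d digits, \w word characters.
digits = re.findall(r"\d+", "tel 555 1234")
words = re.findall(r"\w+", "foo_bar baz-9")

print(greedy, lazy, five_a, not_a, alt, digits, words)
```

Note how `ab|abc` returns `ab` even though the right branch would give a longer match: this is what "| is not greedy" means.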
One: Functions of the re module
1: compile(pattern[, flags])
Generates a regular expression object (see Two) from the regular expression string pattern and the optional flags
The flags are defined as follows:
I case-insensitive matching
L makes some special character classes depend on the current locale
M multi-line mode: ^ and $ match at the start and end of each line, in addition to the start and end of the string
S "." matches any character, including '\n'; otherwise "." does not match '\n'
U makes \w, \W, \b, \B, \d, \D, \s and \S depend on the Unicode character properties database
X verbose mode: makes regular expressions more readable by ignoring whitespace and comments after #
Of these, S is the most commonly used.
It is used as follows:
import re
re.compile(..., re.S)
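The effect of the flags is easier to see on small inputs. The following is a minimal sketch in Python 3; the patterns and strings are illustrative assumptions, not taken from the original notes.

```python
import re

# re.I: case-insensitive matching.
ci = re.compile(r"python", re.I)
found = ci.findall("Python, PYTHON, python")

# re.S: "." also matches "\n".
no_dotall = re.search(r"a.b", "a\nb")
dotall = re.search(r"a.b", "a\nb", re.S)

# re.X: whitespace and "#" comments inside the pattern are ignored.
number = re.compile(r"""
    (\d+)   # integer part
    \.      # decimal point
    (\d*)   # fractional part
""", re.X)
parts = number.match("3.14").groups()

# Flags can be combined with "|".
both = re.compile(r"^a.b$", re.I | re.S)
combined = both.match("A\nB") is not None

print(found, no_dotall, dotall is not None, parts, combined)
```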
2: match(pattern, string[, flags])
Matches pattern against the beginning of string; flags are the same as for compile
Returns a MatchObject (see Three), or None if there is no match
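A point that is easy to miss: match only succeeds at the start of the string, while search (used later in these notes) scans the whole string. A small sketch with made-up inputs:

```python
import re

m1 = re.match(r"\d+", "123abc")    # digits at position 0: a match object
m2 = re.match(r"\d+", "abc123")    # no digits at the start: None
m3 = re.search(r"\d+", "abc123")   # search() finds them further along

print(m1.group(), m2, m3.group())
```

Because a failed match returns None, the result should be tested before calling group() on it.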
3: split(pattern, string[, maxsplit=0])
Splits string wherever pattern matches
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
Parentheses '()' have a special function in pattern; please check the manual
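The special function of parentheses mentioned above is that a capturing group in the pattern makes split return the separators too. A minimal sketch, also showing maxsplit (the comma-separated sample string is an assumption for illustration):

```python
import re

# maxsplit limits the number of splits; the rest of the string is kept whole.
limited = re.split(r",\s*", "a, b,c,  d", maxsplit=1)

# Without a group, the separators are dropped from the result.
plain = re.split(r"\W+", "Words, words, words.")

# With a capturing group, the matched separators are returned as well.
with_seps = re.split(r"(\W+)", "Words, words, words.")

print(limited)
print(plain)
print(with_seps)
```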
4: findall(pattern, string[, flags])
Quite commonly used.
Finds all substrings of string that match pattern and returns them as a list
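What findall puts in the list depends on how many groups the pattern has. A short sketch with an invented input string:

```python
import re

text = "x=1, y=22, z=333"

# No group: the list holds the whole matches.
no_group = re.findall(r"\d+", text)

# One group: the list holds what the group captured.
one_group = re.findall(r"(\w)=\d+", text)

# Several groups: the list holds one tuple per match.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(no_group, one_group, two_groups)
```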
5: sub(pattern, repl, string[, count])
repl can be a string or a function
When repl is a string,
every substring of string that matches pattern is replaced with repl
When repl is a function, it is called as repl(matchobj) for each non-overlapping match of pattern in string,
and the matched substring is replaced with the return value
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
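Two details of sub are worth a further sketch: the count argument, and backreferences such as \1 in a string repl. The helper name `double` and the sample strings below are hypothetical, chosen only for illustration.

```python
import re

# count limits how many matches are replaced, working left to right.
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)

# \1 and \2 in a string repl refer to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# A function repl receives the match object and returns the replacement text.
def double(matchobj):
    return str(int(matchobj.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples, 10 pears")

print(first_only, swapped, doubled)
```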
Two: Regular expression objects
How to create one: returned by re.compile(pattern[, flags])
match(string[, pos[, endpos]]): matches pattern against string[pos:endpos] and returns
a MatchObject (see Three)
split(string[, maxsplit=0])
findall(string[, pos[, endpos]])
sub(repl, string[, count=0])
These methods do the same as the functions of the same name in the re module; only the calling form differs
If the same pattern is used many times in one program, calling the methods of a compiled regular expression object is more efficient than calling the re module functions.
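A minimal sketch of reusing one compiled object across many inputs, assuming a made-up "number-number" range format. Note that the re module also keeps an internal cache of recently compiled patterns, so the gain in practice may be modest.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")   # compiled once, reused below

lines = ["10-20", "no range here", "3-4 and 5-6"]
ranges = []
for line in lines:
    for m in pattern.finditer(line):
        ranges.append((int(m.group(1)), int(m.group(2))))

# The optional pos argument restricts where the search starts.
tail = pattern.search("1-2 3-4", 3).group()

print(ranges, tail)
```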
Three: MatchObject
Returned by re.match(...) and re.compile(...).match(...)
The object has the following methods and attributes:
Methods:
group([group1, ...])
groups([default])
groupdict([default])
start([group])
end([group])
The best way to illustrate these methods is with an example
import re
matchObj = re.compile(r"(?P<int>\d+)\.(\d*)")
m = matchObj.match('3.14sss')
# m = re.match(r"(?P<int>\d+)\.(\d*)", '3.14sss')
print(m.group())
print(m.group(0))
print(m.group(1))
print(m.group(2))
print(m.group(1, 2))
print(m.group(0, 1, 2))
print(m.groups())
print(m.groupdict())
print(m.start(2))
print(m.string)
The output is as follows:
3.14
3.14
3
14
('3', '14')
('3.14', '3', '14')
('3', '14')
{'int': '3'}
2
3.14sss
So group() and group(0) both return the substring matched by the entire expression
In addition, group(i) returns the content matched by the i-th "()" in the regular expression
('3.14', '3', '14') illustrates this best.
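The remaining methods can be sketched on the same '3.14sss' input: access by group name, positions via start()/end()/span(), and the default argument of groups(). The second pattern, (a)(b)?, is an assumption added only to show a group that did not take part in the match.

```python
import re

m = re.match(r"(?P<int>\d+)\.(\d*)", "3.14sss")

by_name = m.group("int")           # a named group can be read by name
span_all = (m.start(), m.end())    # position of the whole match
span_2 = m.span(2)                 # (start, end) of group 2

# groups(default) fills in groups that did not participate in the match.
with_default = re.match(r"(a)(b)?", "a").groups("none")

print(by_name, span_all, span_2, with_default)
```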
String substitution
1. Replace all matching substrings
Replaces all substrings of subject that match the regular expression regex with newstring
result, number = re.subn(regex, newstring, subject)
2. Replace all matching substrings (using a regular expression object)
reobj = re.compile(regex)
result, number = reobj.subn(newstring, subject)
String splitting
1. String splitting
result = re.split(regex, subject)
2. String splitting (using a regular expression object)
reobj = re.compile(regex)
result = reobj.split(subject)
Matching
Several ways of matching with Python regular expressions are listed below:
1. Test whether the regular expression matches all or part of the string
regex = r"..."  # the regular expression
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()
2. Test whether the regular expression matches the entire string
regex = r"...\Z"  # the regular expression ends with \Z
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()
3. Create a match object, and then get the matching details through the object
regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()
4. Get the substring that the regular expression matches
(Get the part of a string matched by the regex)
regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""
5. Get the substring that a capturing group matches
(Get the part of a string matched by a capturing group)
regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""
6. Get the substring that a named group matches
(Get the part of a string matched by a named group)
regex = r"..."  # the regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""
7. Put all matching substrings of the string into a list
(Get an array of all regex matches in a string)
result = re.findall(regex, subject)
8. Iterate over all matching substrings
(Iterate over all matches in a string)
for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass
9. Create a regular expression object from a regular expression string
(Create an object to use the same regex for many operations)
reobj = re.compile(regex)
10. The regular expression object version of usage 1
(Use a regex object for an if/else branch on whether (part of) a string can be matched)
reobj = re.compile(regex)
if reobj.search(subject):
    do_something()
else:
    do_anotherthing()
11. The regular expression object version of usage 2
(Use a regex object for an if/else branch on whether a string can be matched entirely)
reobj = re.compile(r"...\Z")  # the regular expression ends with \Z
if reobj.match(subject):
    do_something()
else:
    do_anotherthing()
12. Create a regular expression object, and then get the matching details through the object
(Create an object with details about how the regex object matches (part of) a string)
reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()
13. Get the matching substring with a regular expression object
(Use a regex object to get the part of a string matched by the regex)
reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""
14. Get the substring matched by a capturing group with a regular expression object
(Use a regex object to get the part of a string matched by a capturing group)
reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""
15. Get the substring matched by a named group with a regular expression object
(Use a regex object to get the part of a string matched by a named group)
reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group("groupname")
else:
    result = ""
16. Get all matching substrings as a list with a regular expression object
(Use a regex object to get an array of all regex matches in a string)
reobj = re.compile(regex)
result = reobj.findall(subject)
17. Iterate over all matching substrings with a regular expression object
(Use a regex object to iterate over all matches in a string)
reobj = re.compile(regex)
for match in reobj.finditer(subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass
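The usages above leave regex and subject as placeholders. The sketch below fills them with invented values to show several of the patterns working together; the helper `first_match`, the tag pattern and the sample string are all hypothetical and not part of the original list.

```python
import re

def first_match(regex, subject, group=0):
    """Return the text of `group` from the first match, or "" if there is none."""
    match = re.search(regex, subject)
    return match.group(group) if match else ""

subject = "<b>bold</b> and <i>italic</i>"
tag_regex = r"<(\w+)>(?P<body>.*?)</\1>"

whole = first_match(tag_regex, subject)          # usage 4: whole match
tag = first_match(tag_regex, subject, 1)         # usage 5: capturing group
body = first_match(tag_regex, subject, "body")   # usage 6: named group
missing = first_match(r"\d+", subject)           # no match: empty string

# Usage 8: iterate over all matches and record their positions.
spans = [(m.start(), m.end(), m.group("body"))
         for m in re.finditer(tag_regex, subject)]

# Usage 2: \Z makes match() succeed only if the entire string matches.
entire_ok = re.match(r"\d+\Z", "123") is not None
entire_bad = re.match(r"\d+\Z", "123abc") is not None

print(whole, tag, body, repr(missing), spans, entire_ok, entire_bad)
```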
Non-greedy and multi-line matching examples
Some regular expression tips:
1 The non-greedy flag
>>> re.findall(r"a(\d+?)", "a23b")
['2']
>>> re.findall(r"a(\d+)", "a23b")
['23']
Compare with this case, where the trailing b forces \d+? to consume both digits:
>>> re.findall(r"a(\d+)b", "a23b")
['23']
>>> re.findall(r"a(\d+?)b", "a23b")
['23']
2 To match across multiple lines, add the re.S and re.M flags
re.S: "." will match newlines; by default "." does not match newline characters
>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b")
[]
>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b", re.S)
[('23', '34')]
re.M: ^ and $ will match at every line; by default ^ and $ only match at the start and end of the whole string
>>> re.findall(r"^a(\d+)b", "a23b\na34b")
['23']
>>> re.findall(r"^a(\d+)b", "a23b\na34b", re.M)
['23', '34']
However, if the pattern has no ^,
>>> re.findall(r"a(\d+)b", "a23b\na23b")
['23', '23']
so re.M is clearly not needed
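The same two tips on slightly more realistic input, as a sketch: the HTML fragment and the log text are made-up examples.

```python
import re

# Greedy ".*" runs to the last </p>; non-greedy ".*?" stops at the first one.
html = "<p>one</p><p>two</p>"
greedy = re.findall(r"<p>(.*)</p>", html)
lazy = re.findall(r"<p>(.*?)</p>", html)

# Without re.M, "$" only matches at the end of the whole string;
# with re.M it also matches at the end of every line.
log = "ok: 1\nerr: 2\nok: 3"
last_only = re.findall(r"\d$", log)
each_line = re.findall(r"\d$", log, re.M)

print(greedy, lazy, last_only, each_line)
```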