Regular expressions match fragments of text. This post introduces Python's re module for anyone interested in using it.
What is a regular expression?
A regular expression is a pattern that can match a piece of text.
For example, the regular expression 'python' matches the string 'python'.
Regular expressions are powerful, and Python supports them through the standard re module.
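To make the idea concrete, here is a minimal sketch: a pattern made only of ordinary characters matches exactly that text. The sample sentence is an illustrative assumption.

```python
import re

# A pattern without special characters matches itself literally.
text = "I am learning python today"
m = re.search("python", text)
print(m.group())  # python
print(m.span())   # (14, 20): start and end index of the match
```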
Wildcard
The dot (.) matches any single character except a newline:
'.ython' matches 'python' and 'fython'
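A small sketch of the wildcard, using made-up words. It uses fullmatch() so that the pattern must cover the whole word, and shows that re.DOTALL lets the dot accept a newline as well.

```python
import re

# '.' stands for exactly one character; by default it does not match a newline.
pattern = re.compile(r".ython")

words = ["python", "fython", "jython", "ython", "\nython"]
# fullmatch() requires the pattern to cover the whole word.
results = {w: pattern.fullmatch(w) is not None for w in words}
print(results)

# With re.DOTALL, '.' also accepts the newline.
dotall = re.compile(r".ython", re.DOTALL)
print(dotall.fullmatch("\nython") is not None)  # True
```

'ython' is rejected because the dot must consume exactly one character before 'ython'.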
Escaping special characters:
'python\\.org' (or the raw string r'python\.org') matches 'python.org', because the backslash turns the dot into a literal dot.
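The following sketch contrasts an unescaped and an escaped dot. It uses raw strings (r'...') so that Python does not interpret the backslash before the re module sees it; 'pythonXorg' is just an illustrative counterexample.

```python
import re

# Unescaped, '.' is a wildcard, so 'pythonXorg' is accepted too.
loose = re.fullmatch(r"python.org", "pythonXorg") is not None
# Escaped, '\.' accepts only a literal dot.
strict_other = re.fullmatch(r"python\.org", "pythonXorg") is not None
strict_dot = re.fullmatch(r"python\.org", "python.org") is not None
print(loose, strict_other, strict_dot)  # True False True

# Without a raw string the backslash itself must be doubled.
print("python\\.org" == r"python\.org")  # True
```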
Character Set
'[pj]ython' matches 'python' and 'jython'
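A quick sketch of character sets on an illustrative string, including a range inside the brackets:

```python
import re

# [pj] matches exactly one character, either 'p' or 'j'.
pattern = re.compile(r"[pj]ython")
found = pattern.findall("python, jython, cython")
print(found)  # ['python', 'jython']

# Ranges are allowed inside a set: [a-z] is any lowercase ASCII letter.
ranged = re.findall(r"[a-z]ython", "python cython Jython")
print(ranged)  # ['python', 'cython']
```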
Negated Character Set
'[^abc]' matches any character except a, b, or c
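A short sketch of a negated set on a made-up string. Note that '^' only negates when it is the first character inside the brackets.

```python
import re

# [^abc] matches one character that is NOT 'a', 'b' or 'c'.
others = re.findall(r"[^abc]", "abcdcba!")
print(others)  # ['d', '!']

# Anywhere else inside a set, '^' is an ordinary character.
literal = re.findall(r"[a^]", "a^b")
print(literal)  # ['a', '^']
```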
Alternation
Use the pipe symbol |: 'python|perl' matches either 'python' or 'perl'.
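A minimal sketch of alternation on an illustrative sentence. Because | has very low precedence, parentheses are used to restrict it to part of a pattern.

```python
import re

# '|' tries the alternatives at each position of the string.
langs = re.findall(r"python|perl", "perl or python? python!")
print(langs)  # ['perl', 'python', 'python']

# Group the alternatives to limit their reach:
# 'p(ython|erl)' means 'p' followed by either 'ython' or 'erl'.
grouped = re.fullmatch(r"p(ython|erl)", "perl") is not None
print(grouped)  # True
```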
Optional
Adding a question mark after a subpattern makes it optional:
r'(http://)?(www\.)?python\.org' matches exactly the following strings (when the whole string must match):
'http://www.python.org'
'http://python.org'
'www.python.org'
'python.org'
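The claim above holds when the whole string has to match, so this sketch checks it with fullmatch(). The extra 'ftp://python.org' candidate is an assumption added to show a rejected input.

```python
import re

# Each '?' makes the parenthesized subpattern before it optional.
pattern = re.compile(r"(http://)?(www\.)?python\.org")

candidates = [
    "http://www.python.org",
    "http://python.org",
    "www.python.org",
    "python.org",
    "ftp://python.org",
]
# fullmatch() demands that the whole string fit the pattern.
accepted = [url for url in candidates if pattern.fullmatch(url)]
print(accepted)  # the first four URLs

# search() is less strict: it still finds 'python.org' inside the ftp URL.
print(pattern.search("ftp://python.org").group())  # python.org
```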
Repetition
(pattern)*: the pattern may repeat zero or more times.
(pattern)+: the pattern may repeat one or more times.
(pattern){m,n}: the pattern may repeat from m to n times.
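A sketch of the three repetition operators applied to a 'w' prefix; the helper name full() is hypothetical, introduced only to keep the checks short.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# '*': the preceding item may appear zero or more times.
star_zero = full(r"(w)*\.python\.org", ".python.org")      # True
star_three = full(r"(w)*\.python\.org", "www.python.org")  # True
# '+': the preceding item must appear at least once.
plus_zero = full(r"(w)+\.python\.org", ".python.org")      # False
# '{m,n}': between m and n times.
range_ok = full(r"(w){1,3}\.python\.org", "www.python.org")    # True
range_bad = full(r"(w){1,3}\.python\.org", "wwww.python.org")  # False
print(star_zero, star_three, plus_zero, range_ok, range_bad)
```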
Of course, regular expression syntax has many more rules than the ones above. We stop here, though, because this post focuses on Python's re module.
The re module gives Python full regular expression support.
The compile function builds a regular expression object from a pattern string and optional flags. This object has methods for matching and substitution.
The re module also provides module-level functions that do exactly what these methods do; they take the pattern string as their first argument.
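The two styles described above can be compared side by side in this sketch; the sample strings are illustrative, and re.IGNORECASE is one of the standard flags accepted by compile().

```python
import re

# Option 1: compile once, then call methods on the pattern object.
digits = re.compile(r"\d+")
print(digits.findall("a1b22c333"))     # ['1', '22', '333']

# Option 2: call the module-level function, passing the pattern first.
print(re.findall(r"\d+", "a1b22c333"))  # ['1', '22', '333']

# Flags are passed to compile(), e.g. case-insensitive matching.
py = re.compile(r"python", re.IGNORECASE)
print(py.search("I like PYTHON").group())  # PYTHON
```

Compiling is convenient when the same pattern is reused many times; the module-level functions are handy for one-off matches.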
Important functions in the re module:
compile(pattern[, flags]) creates a pattern object from a string containing a regular expression.
search(pattern, string[, flags]) searches the string for the first match of the pattern.
match(pattern, string[, flags]) matches the pattern at the beginning of the string.
split(pattern, string[, maxsplit=0]) splits the string at every match of the pattern.
findall(pattern, string) returns a list of all matches of the pattern in the string.
sub(pat, repl, string[, count=0]) replaces the matches of pat in string with repl (all of them when count is 0).
escape(string) escapes all special regular expression characters in the string.
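match(), search() and sub() get their own examples in this post; this sketch covers split(), findall() and escape() on made-up inputs.

```python
import re

# split(): cut the string wherever the pattern matches.
parts = re.split(r"[, ]+", "alpha, beta,gamma  delta")
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits the number of splits.
limited = re.split(r"[, ]+", "alpha, beta,gamma  delta", maxsplit=2)
print(limited)  # ['alpha', 'beta', 'gamma  delta']

# findall(): every non-overlapping match, as a list of strings.
words = re.findall(r"[a-z]+", "one 2 three 4 five")
print(words)  # ['one', 'three', 'five']

# escape(): backslash-escape special characters so the text is matched literally.
escaped = re.escape("www.python.org")
print(escaped)  # www\.python\.org
```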
Here are some simple examples:
Using match
import re
print(re.match('www', 'www.runoob.com').span())  # matches at the start position
print(re.match('com', 'www.runoob.com'))         # does not match at the start position
Using search
import re
print(re.search('www', 'www.runoob.com').span())  # matches at the start position
print(re.search('com', 'www.runoob.com').span())  # matches, although not at the start position
Let's pause here: what is the difference between match and search?
First, look at the results.
Results of the match example:
(0, 3)
None
Results of the search example:
(0, 3)
(11, 14)
match() only checks whether the pattern matches at the start of the string, while search() scans the whole string for a match.
In other words, match() succeeds only if the pattern matches at position 0; otherwise it returns None.
search() scans the entire string and returns the first successful match.
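A sketch of the difference on the same string, also showing that search() stops at the first match. finditer() is a standard re function not listed earlier, used here to collect every match.

```python
import re

text = "www.runoob.com"

# match() only tries position 0, so 'com' is not found.
print(re.match("com", text))          # None
# search() tries every position and stops at the first hit.
print(re.search("com", text).span())  # (11, 14)
# Anchoring with '^' (no MULTILINE flag) makes search() as strict as match().
print(re.search("^com", text))        # None
# search() reports only the first 'o'; finditer() yields all of them.
first_o = re.search("o", text).start()
all_o = [m.start() for m in re.finditer("o", text)]
print(first_o, all_o)  # 7 [7, 8, 12]
```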
Using sub
Python's re module provides re.sub() to replace matches in a string.
#!/usr/bin/python3
import re
phone = "2004-959-559 # This is Phone Number"
# Delete Python-style comments (everything from '#' to the end of the string)
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)
# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)
Result:
Phone Num: 2004-959-559
Phone Num: 2004959559
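re.sub() can do more than delete text. This sketch reuses the phone string to show the count parameter, group references in the replacement, and a replacement function; the particular rewrites are illustrative choices, not part of the original example.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# count limits how many matches are replaced (0, the default, means all).
once = re.sub(r"-", " ", phone, count=1)
print(once)  # 2004 959-559 # This is Phone Number

# The replacement may refer to groups: here the three blocks are reordered.
swapped = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", phone)
print(swapped)  # 559/959/2004 # This is Phone Number

# The replacement may also be a function that receives each match object.
lengths = re.sub(r"\d+", lambda m: str(len(m.group())), phone)
print(lengths)  # 4-3-3 # This is Phone Number
```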
Finally, a quick reference of regular expression syntax:
^ Matches the start of the string.
$ Matches the end of the string.
. Matches any character except a newline; when re.DOTALL is specified, it matches any character including a newline.
[...] A set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...] Any character not in the set: [^abc] matches any character other than a, b, or c.
re* Matches zero or more occurrences of the preceding expression.
re+ Matches one or more occurrences of the preceding expression.
re? Matches zero or one occurrence of the preceding expression (greedy).
re{n} Matches exactly n occurrences of the preceding expression.
re{n,} Matches n or more occurrences of the preceding expression.
re{n,m} Matches n to m occurrences of the preceding expression (greedy).
a|b Matches a or b.
(re) Matches the expression inside the parentheses and also captures it as a group.
(?imx) Turns on the i, m, or x flags for the whole regular expression (since Python 3.11 it must appear at the start of the pattern).
(?-imx) Turns off the i, m, or x flags. Python accepts this only in the scoped form (?-imx:re).
(?:re) Like (re), but does not create a group.
(?imx:re) Applies the i, m, or x flags only inside the parentheses.
(?-imx:re) Turns off the i, m, or x flags only inside the parentheses.
(?#...) A comment; its contents are ignored.
(?=re) Positive lookahead assertion. It succeeds if re matches at the current position, but it consumes no characters: after the check, the rest of the pattern is still tried from the same position.
(?!re) Negative lookahead assertion, the opposite of (?=re): it succeeds only if re does not match at the current position.
(?>re) Atomic group: matches re as an independent pattern and discards its backtracking positions (supported by Python's re since 3.11).
\w Matches a word character: a letter, a digit, or the underscore.
\W Matches a non-word character.
\s Matches any whitespace character; for ASCII text this is equivalent to [ \t\n\r\f\v].
\S Matches any non-whitespace character.
\d Matches any digit; for ASCII text this is equivalent to [0-9].
\D Matches any non-digit.
\A Matches only at the start of the string.
\Z Matches only at the end of the string. (In Perl-style flavors \Z can also match just before a final newline and \z is the strict end; in Python, \Z is already the strict end.)
\G Matches the position where the previous match ended. This is a Perl/PCRE feature; Python's re module does not support it.
\b Matches a word boundary, i.e. the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\n, \t, etc. Match a newline, a tab, and so on.
\1...\9 Match the text captured by the nth group.
\10 Matches the text captured by group 10. (Perl-style flavors fall back to an octal escape if there is no such group; Python reports an error instead.)
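A few entries from the list above, tried on small made-up strings. The scoped flag form (?i:...) requires Python 3.6 or later.

```python
import re

# \b: 'er' only at the end of a word.
never = re.search(r"er\b", "never") is not None  # True
verb = re.search(r"er\b", "verb") is not None    # False

# (?=...): 'Isaac' only when followed by ' Asimov'; the lookahead consumes nothing.
isaac = re.search(r"Isaac(?= Asimov)", "Isaac Asimov").group()  # 'Isaac'

# (?!...): words starting with 'foo' that are not followed by 'bar'.
foos = re.findall(r"\bfoo(?!bar)\w*", "foobar foobaz food")  # ['foobaz', 'food']

# \1: the same word repeated.
twice = re.search(r"(\w+) \1", "hello hello world").group()  # 'hello hello'

# (?:...) groups without capturing, so only (c) is reported.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()  # ('c',)

# (?i:...): case-insensitive only inside the group.
scoped_ok = re.fullmatch(r"(?i:py)thon", "PYthon") is not None   # True
scoped_bad = re.fullmatch(r"(?i:py)thon", "PYTHON") is not None  # False
print(never, verb, isaac, foos, twice, groups, scoped_ok, scoped_bad)
```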
re module regular expression syntax
The main syntax elements are summarized below:
Syntax | Meaning | Description
"." | Any character | Does not match a newline unless re.DOTALL is set
"^" | Start of string | '^hello' matches 'helloworld' but not 'aaaahellobb'
"$" | End of string | The counterpart of "^": 'world$' matches 'helloworld' but not 'worldaaaa'
"*" | 0 or more repetitions (greedy) | '<.*>' matches all of '<a><b>'
"+" | 1 or more repetitions (greedy) | Same as above
"?" | 0 or 1 repetition (greedy) | 'colou?r' matches 'color' and 'colour'
"*?", "+?", "??" | Non-greedy versions of the three above: they match as little as possible | '<.*?>' matches only '<a>' in '<a><b>'
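The greedy and non-greedy rows of the table can be checked with this sketch; '<a><b>' is an illustrative input.

```python
import re

html = "<a><b>"

# Greedy: '.*' grabs as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", html).group()    # '<a><b>'
# Non-greedy: '.*?' stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.*?>", html).group()     # '<a>'
all_tags = re.findall(r"<.*?>", html)        # ['<a>', '<b>']

# '??' prefers zero occurrences, '+?' prefers exactly one.
lazy_opt = re.search(r"a??", "aaa").group()   # ''
lazy_plus = re.search(r"a+?", "aaa").group()  # 'a'
print(greedy, lazy, all_tags, repr(lazy_opt), lazy_plus)
```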