What is a regular expression?
Regular expressions are patterns that can match text fragments.
For example, the regular expression 'python' matches the string 'python'.
Regular expressions are very useful, and Python naturally supports them.
So today's post covers the re module in Python.
The re module provides support for regular expressions.
Wildcard character
. matches any single character (except a newline by default):
'.ython' matches 'python' and 'fython'
To match a special character literally, escape it with a backslash:
'python\.org' matches 'python.org'
Character set
'[pj]ython' matches 'python' and 'jython'
Negated character set
'[^abc]' matches any single character except a, b, or c
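The building blocks above can be checked quickly in the interpreter. The following is a minimal sketch: the helper name `matches` and the extra sample strings are made up for illustration, and re.fullmatch is used so that the whole string has to match the pattern.

```python
import re

def matches(pattern, text):
    """Return True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Wildcard: . matches any single character except a newline
wildcard = [matches(r'.ython', s) for s in ('python', 'jython', 'ython')]

# Escaping: \. matches a literal dot only
escaped = [matches(r'python\.org', s) for s in ('python.org', 'pythonzorg')]

# Character set and negated character set
char_set = [matches(r'[pj]ython', s) for s in ('python', 'jython', 'cython')]
negated = [matches(r'[^abc]', s) for s in ('a', 'd')]

print(wildcard)  # [True, True, False]
print(escaped)   # [True, False]
print(char_set)  # [True, True, False]
print(negated)   # [False, True]
```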
Alternation
Use the pipe symbol | to choose between alternatives: 'a|b' matches 'a' or 'b'
Optional subpatterns
Add a question mark after a subpattern to make it optional:
r'(http://)?(www\.)?python\.org' matches only the following strings:
'http://www.python.org'
'http://python.org'
'www.python.org'
'python.org'
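As a rough check of the optional-subpattern example, the sketch below tests a handful of candidate strings. The list `candidates` and the extra 'ftp://' entry are assumptions added for illustration; fullmatch is used so that partial matches do not count.

```python
import re

# Each parenthesized group followed by ? may appear once or not at all
pattern = re.compile(r'(http://)?(www\.)?python\.org')

candidates = [
    'http://www.python.org',
    'http://python.org',
    'www.python.org',
    'python.org',
    'ftp://python.org',   # hypothetical counter-example, not covered by the pattern
]

accepted = [s for s in candidates if pattern.fullmatch(s)]
print(accepted)
```

Only the first four strings are accepted; the 'ftp://' entry is rejected because nothing in the pattern can match that prefix.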
Repeating subpatterns
*: allows the pattern to repeat 0 or more times
+: allows the pattern to repeat 1 or more times
{m,n}: allows the pattern to repeat between m and n times
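The three repetition operators can be compared side by side. This is a small sketch with made-up sample strings; each pattern is applied to the whole string with re.fullmatch.

```python
import re

samples = ('b', 'ba', 'baaa')

# a* allows zero or more 'a' characters after the 'b'
star = [s for s in samples if re.fullmatch(r'ba*', s)]

# a+ requires at least one 'a'
plus = [s for s in samples if re.fullmatch(r'ba+', s)]

# a{2,3} requires between 2 and 3 repetitions
bounded = [s for s in ('a', 'aa', 'aaa', 'aaaa') if re.fullmatch(r'a{2,3}', s)]

print(star)     # ['b', 'ba', 'baaa']
print(plus)     # ['ba', 'baaa']
print(bounded)  # ['aa', 'aaa']
```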
Of course, regular expression syntax has many more rules than these. We only touch on them here, because the purpose of this blog post is to introduce a Python module: the re module.
The re module gives Python full regular expression functionality.
The compile function generates a regular expression object from a pattern string and an optional flags parameter. The object has a series of methods for regular expression matching and substitution.
The re module also provides module-level functions that do the same things as these methods; they take a pattern string as their first parameter.
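The relationship between a compiled pattern object and the module-level functions can be sketched as follows. The pattern r'run\w+' is an assumption chosen for illustration; both calls should find the same match.

```python
import re

text = 'www.runoob.com'

# Module-level function: the pattern string is the first argument
m1 = re.search(r'run\w+', text)

# Compiled pattern object: the same operation as a method
pattern = re.compile(r'run\w+')
m2 = pattern.search(text)

print(m1.group(), m2.group())  # runoob runoob
print(m1.span() == m2.span())  # True
```

Compiling once is mainly a convenience when the same pattern is reused many times.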
Important functions in re:
compile(pattern[, flags]) creates a pattern object from a string containing a regular expression
search(pattern, string[, flags]) searches for the pattern anywhere in the string
match(pattern, string[, flags]) matches the pattern at the beginning of the string
split(pattern, string[, maxsplit=0]) splits the string at each match of the pattern
findall(pattern, string) returns a list of all occurrences of the pattern in the string
sub(pat, repl, string[, count=0]) replaces all occurrences of pat in string with repl
escape(string) escapes all special regular expression characters in the string
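A few of the functions listed above, tried on a made-up sample string. This is only a sketch; the variable names and the input text are assumptions for illustration.

```python
import re

text = 'alpha, beta,gamma  delta'

# split: break the string on runs of commas and spaces
parts = re.split(r'[, ]+', text)

# maxsplit limits the number of splits that are performed
first_split = re.split(r'[, ]+', text, maxsplit=1)

# findall: every non-overlapping match, as a list of strings
words = re.findall(r'[a-z]+', text)

# escape: make a literal string safe to use inside a pattern
escaped = re.escape('python.org')

print(parts)        # ['alpha', 'beta', 'gamma', 'delta']
print(first_split)  # ['alpha', 'beta,gamma  delta']
print(words)        # ['alpha', 'beta', 'gamma', 'delta']
print(escaped)      # python\.org
```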
The following are some simple applications:
Using match
import re
print(re.match('www', 'www.runoob.com').span())  # matches at the start
print(re.match('com', 'www.runoob.com'))         # does not match at the start
Using search
import re
print(re.search('www', 'www.runoob.com').span())  # matches at the start
print(re.search('com', 'www.runoob.com').span())  # matches, though not at the start
So what is the difference between match and search?
Look at the results first:
Results from the match example:
(0, 3)
None
Results from the search example:
(0, 3)
(11, 14)
The match() function only checks whether the pattern matches at the beginning of the string, while search() scans the entire string looking for a match.
That is, match() returns a match object only if the match succeeds at position 0; if the pattern does not match at the start position, match() returns None.
search() scans the entire string and returns the first successful match.
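Because both functions return None when nothing matches, calling .span() directly on the result can raise an AttributeError. One way to guard against that is sketched below; the helper `span_or_none` is a hypothetical name introduced for illustration.

```python
import re

def span_or_none(match):
    """Return the match span, or None when there was no match (hypothetical helper)."""
    return match.span() if match else None

url = 'www.runoob.com'

at_start = span_or_none(re.match('com', url))        # match() only looks at position 0
anywhere = span_or_none(re.search('com', url))       # search() scans the whole string
with_prefix = span_or_none(re.match('.*com', url))   # match() succeeds if the pattern covers the prefix

print(at_start)     # None
print(anywhere)     # (11, 14)
print(with_prefix)  # (0, 14)
```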
Using sub
The Python re module provides re.sub to replace matches in a string.
#!/usr/bin/python
import re

phone = "2004-959-559 # This is Phone number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone num:", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone num:", num)
Results:
Phone num: 2004-959-559
Phone num: 2004959559
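re.sub also accepts a count argument and, in place of a replacement string, a function that receives each match object. The sketch below reuses the phone string from the example; the variable names are assumptions for illustration.

```python
import re

phone = "2004-959-559 # This is Phone number"

# Strip the trailing comment together with the whitespace before it
number = re.sub(r'\s*#.*$', '', phone)

# count limits how many replacements are made
one_dash_removed = re.sub(r'-', '', number, count=1)

# A function can compute each replacement from the match object;
# here every run of digits is replaced by its length
lengths = re.sub(r'\d+', lambda m: str(len(m.group())), number)

print(number)            # 2004-959-559
print(one_dash_removed)  # 2004959-559
print(lengths)           # 4-3-3
```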
Finally, a summary of the pattern syntax:
^ matches the beginning of the string.
$ matches the end of the string.
. matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.
[...] matches any one character from the set: [amk] matches 'a', 'm' or 'k'.
[^...] matches any one character not in the set: [^abc] matches any character other than a, b, or c.
re* matches 0 or more occurrences of the preceding expression.
re+ matches 1 or more occurrences of the preceding expression.
re? matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (*?, +?, ??), it makes that quantifier non-greedy.
re{n} matches exactly n occurrences of the preceding expression.
re{n,} matches n or more occurrences of the preceding expression.
re{n,m} matches n to m occurrences of the preceding expression, greedily.
a|b matches a or b.
(re) matches the expression in parentheses and also forms a group.
(?imx) turns on the optional flags i, m, or x. When placed inside parentheses, it affects only that area.
(?-imx) turns off the optional flags i, m, or x. When placed inside parentheses, it affects only that area.
(?:re) is like (...), but does not form a group.
(?imx:re) turns on the optional flags i, m, or x within the parentheses.
(?-imx:re) turns off the optional flags i, m, or x within the parentheses.
(?#...) is a comment.
(?=re) is a positive lookahead assertion. It succeeds if the contained expression matches at the current position and fails otherwise. Once the contained expression has been tried, the matching engine does not advance; the remainder of the pattern is tried from the same position.
(?!re) is a negative lookahead assertion. It is the opposite of the positive assertion: it succeeds when the contained expression does not match at the current position of the string.
(?>re) matches an independent (atomic) group, eliminating backtracking into it.
\w matches an alphanumeric character or underscore.
\W matches a character that is not alphanumeric or underscore.
\s matches any whitespace character; for ASCII text this is equivalent to [ \t\n\r\f\v].
\S matches any non-whitespace character.
\d matches any digit; for ASCII text this is equivalent to [0-9].
\D matches any non-digit.
\A matches the start of the string.
\Z matches the end of the string. In Perl-style engines it also matches just before a trailing newline; in Python's re it matches only at the very end.
\z matches the very end of the string (Perl-style; older Python versions do not accept it and use \Z instead).
\G matches the position where the last match finished (Perl-style; not supported by Python's re module).
\b matches a word boundary, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never', but not the 'er' in 'verb'.
\B matches a non-word boundary. 'er\B' matches the 'er' in 'verb', but not the 'er' in 'never'.
\n, \t, and so on match a newline, a tab character, and so on.
\1...\9 matches the text matched by the nth group.
\10 matches the text matched by the 10th group if that group has matched. Otherwise, it refers to the character with that octal character code.
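Some of the less obvious entries in the list above can be tried out directly. The following is a sketch only; the sample strings other than 'never' and 'verb' are made up for illustration.

```python
import re

# Word boundary vs. non-word boundary, using the examples from the list
boundary = [bool(re.search(r'er\b', w)) for w in ('never', 'verb')]
non_boundary = [bool(re.search(r'er\B', w)) for w in ('never', 'verb')]
print(boundary, non_boundary)  # [True, False] [False, True]

# Positive lookahead: the name is matched only when '.org' follows,
# and '.org' itself is not part of the match
lookahead = re.findall(r'\w+(?=\.org)', 'python.org perl.com')
print(lookahead)  # ['python']

# Negative lookahead: 'python' must not be followed by '.org'
negative = [bool(re.search(r'python(?!\.org)', s)) for s in ('python.org', 'python.com')]
print(negative)  # [False, True]

# Backreference: \1 repeats whatever group 1 matched
repeated = re.search(r'\b(\w+) \1\b', 'this is is a test').group()
print(repeated)  # is is

# re.DOTALL lets . match a newline as well
dot_default = re.fullmatch(r'a.b', 'a\nb') is None
dot_all = re.fullmatch(r'a.b', 'a\nb', re.DOTALL) is not None
print(dot_default, dot_all)  # True True
```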
Regular expression syntax in re
The regular expression syntax table is as follows:
Syntax | Meaning | Description
"." | any character |
"^" | start of string | '^hello' matches 'helloworld' but not 'aaaahellobbb'
"$" | end of string |
"*" | 0 or more repetitions of the preceding expression (greedy match) | <*> match
"+" | 1 or more repetitions of the preceding expression (greedy match) | same as above
"?" | 0 or 1 repetition of the preceding expression (greedy match) | same as above
"*?", "+?", "??" | the three quantifiers above, stopping at the first possible match (non-greedy match) | <*> matching
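The difference between greedy and non-greedy quantifiers is easiest to see on text that contains several angle brackets. The sample string below is an assumption made up for illustration, not taken from the table.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as possible, so one match spans the whole string
greedy = re.findall(r'<.*>', text)

# Non-greedy: .*? stops at the first possible '>', giving one match per tag
lazy = re.findall(r'<.*?>', text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# The same idea applies to + and ?
plus_greedy = re.match(r'a+', 'aaa').group()      # 'aaa'
plus_lazy = re.match(r'a+?', 'aaa').group()       # 'a'
optional_greedy = re.match(r'ab?', 'ab').group()  # 'ab'
optional_lazy = re.match(r'ab??', 'ab').group()   # 'a'
print(plus_greedy, plus_lazy, optional_greedy, optional_lazy)
```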