Regular expressions are a great tool, whether you work in JavaScript or in Python web development (http://www.maiziedu.com/course/python-px/). We all run into them sooner or later. JavaScript and Python differ only slightly in their regular expression syntax, but regular expressions are an essential part of Python, so today let's talk about the re module in Python.
The re module contains Python's support for regular expressions.
What is a regular expression:
Regular expressions are patterns that can match text fragments.
The regular expression 'python' matches the string 'python'.
Wildcard characters
. matches any single character:
'.ython' can match 'python' and 'fython'
To escape a special character:
'python\.org' matches 'python.org'
Character sets
'[pj]ython' can match 'python' and 'jython'
Inverse character sets
'[^abc]' can match any character except a, b, or c
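The wildcard, escape, and character-set patterns above can be tried out directly. This is a minimal sketch that uses re.fullmatch from the standard library, so each pattern has to cover the whole candidate string. The helper name matches_whole is made up for this illustration.

```python
import re

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# The wildcard . stands for any single character (except a newline by default).
print(matches_whole(r'.ython', 'python'))   # True
print(matches_whole(r'.ython', 'fython'))   # True

# An escaped dot matches only a literal dot.
print(matches_whole(r'python\.org', 'python.org'))   # True
print(matches_whole(r'python\.org', 'pythonXorg'))   # False

# A character set matches exactly one of the listed characters.
print(matches_whole(r'[pj]ython', 'jython'))   # True

# A negated set matches any one character that is NOT listed.
print(matches_whole(r'[^abc]', 'd'))   # True
print(matches_whole(r'[^abc]', 'a'))   # False
```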
Alternation
Use the pipe symbol |: 'python|perl' matches 'python' or 'perl'
Optional items
Add a question mark after a sub-pattern to make it optional:
r'(http://)?(www\.)?python\.org' can only match the following strings:
'http://www.python.org'
'http://python.org'
'www.python.org'
'python.org'
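To check the optional groups and the pipe symbol, the sketch below compiles the URL pattern and tests it against the four strings listed above, plus one made-up counterexample ('ftp://python.org'). The dots are escaped here on the assumption that they are meant to be literal dots. The variable names are chosen only for this illustration.

```python
import re

# Each question mark makes the parenthesised group before it optional.
# The dots are escaped so that they match only a literal '.'.
url_pattern = re.compile(r'(http://)?(www\.)?python\.org')

candidates = [
    'http://www.python.org',
    'http://python.org',
    'www.python.org',
    'python.org',
    'ftp://python.org',   # counterexample: not covered by the pattern
]
for candidate in candidates:
    print(candidate, bool(url_pattern.fullmatch(candidate)))

# The pipe symbol chooses between alternatives.
language_pattern = re.compile(r'python|perl')
print(bool(language_pattern.fullmatch('perl')))   # True
print(bool(language_pattern.fullmatch('ruby')))   # False
```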
Repeating sub-patterns
*: allows the pattern to repeat 0 or more times
+: allows the pattern to repeat 1 or more times
{m,n}: allows the pattern to repeat m to n times
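A short sketch of the three repetition operators. The sample strings and the name two_to_four_digits are invented for the example. Note that Python treats the braces as a quantifier only when there is no space after the comma.

```python
import re

# * : zero or more repetitions of the preceding item
print(re.fullmatch(r'ab*', 'a') is not None)      # True
print(re.fullmatch(r'ab*', 'abbb') is not None)   # True

# + : one or more repetitions
print(re.fullmatch(r'ab+', 'a') is not None)      # False
print(re.fullmatch(r'ab+', 'abb') is not None)    # True

# {m,n} : between m and n repetitions (no space inside the braces)
two_to_four_digits = re.compile(r'\d{2,4}')
for sample in ['7', '42', '2024', '12345']:
    print(sample, two_to_four_digits.fullmatch(sample) is not None)
```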
Of course, regular expression syntax has many more rules than these. We can only touch on them briefly here, because the purpose of this blog post is to introduce a Python module: the re module.
The re module gives the Python language full regular expression functionality.
The compile function generates a regular expression object based on a pattern string and an optional flags parameter. The object has a series of methods for regular expression matching and substitution.
The re module also provides module-level functions that mirror these methods; they take a pattern string as their first parameter.
Important functions in re:
compile(pattern[, flags]) creates a pattern object from a string containing a regular expression
search(pattern, string[, flags]) searches for the pattern anywhere in the string
match(pattern, string[, flags]) matches the pattern at the beginning of the string
split(pattern, string[, maxsplit=0]) splits the string at each match of the pattern
findall(pattern, string) lists all occurrences of the pattern in the string
sub(pat, repl, string[, count=0]) replaces all matches of pat in string with repl
escape(string) escapes all special regular expression characters in the string
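The functions in this list that are not demonstrated later in the post (compile, split, findall, and escape) can be sketched as follows. The sample text and variable names are invented for the example.

```python
import re

text = 'alpha, beta,gamma ,delta'

# compile() builds a reusable pattern object.
separator = re.compile(r'\s*,\s*')

# split() cuts the string wherever the pattern matches.
words = separator.split(text)
print(words)   # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits the number of cuts.
first_cut = separator.split(text, maxsplit=1)
print(first_cut)   # ['alpha', 'beta,gamma ,delta']

# findall() returns every non-overlapping match as a list of strings.
print(re.findall(r'[a-z]+', text))   # ['alpha', 'beta', 'gamma', 'delta']

# escape() backslash-escapes characters that have a special meaning,
# so the text can be searched for literally.
literal = re.escape('python.org')
found = re.search(literal, 'see pythonXorg or python.org')
print(found.span())   # (18, 28)
```

Compiling a pattern once is mainly worthwhile when the same pattern is reused many times; for one-off calls the module-level functions read just as well.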
The following are some simple applications:
Using match
import re
print(re.match('www', 'www.runoob.com').span())  # matches at the start position
print(re.match('com', 'www.runoob.com'))  # does not match at the start position
Using search
import re
print(re.search('www', 'www.runoob.com').span())  # matches at the start position
print(re.search('com', 'www.runoob.com').span())  # matches, though not at the start position
So what is the difference between match and search?
Look at the results first:
Results from the match example:
(0, 3)
None
Results from the search example:
(0, 3)
(11, 14)
The match() function only checks whether the pattern matches at the start of the string, while search() scans the entire string to find a match.
That is, match() returns a match object only if the pattern matches at position 0; if the match does not begin at the start of the string, match() returns None.
search() scans the entire string and returns the first successful match.
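Because both functions return None when nothing matches, calling .span() directly on the result raises an AttributeError in that case. The sketch below checks for None first. The helper describe is a made-up name for this illustration, and re.fullmatch is included only for comparison.

```python
import re

def describe(result):
    """Format a match result; the helper name is made up for this sketch."""
    if result is None:
        return 'no match'
    return f'{result.group()!r} at {result.span()}'

url = 'www.runoob.com'

print(describe(re.match('com', url)))       # no match
print(describe(re.search('com', url)))      # 'com' at (11, 14)

# Anchoring a search with ^ (without re.MULTILINE) behaves like match().
print(describe(re.search('^com', url)))     # no match

# fullmatch() requires the pattern to cover the entire string.
print(describe(re.fullmatch('www', url)))   # no match
```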
Using sub
The Python re module provides re.sub to replace matches in a string.
#!/usr/bin/python
import re
phone = "2004-959-559 # This is Phone number"
# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone num:", num)
# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone num:", num)
Results:
Phone num: 2004-959-559
Phone num: 2004959559
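Two further features of re.sub are worth a quick sketch: the count parameter from the function list, and the fact that the replacement can be a function instead of a string. The function name double and the sample strings are invented for the example.

```python
import re

phone = "2004-959-559 # This is Phone number"

# count limits how many matches are replaced (0, the default, means all).
first_dash_removed = re.sub(r'-', '', phone, count=1)
print(first_dash_removed)   # 2004959-559 # This is Phone number

# The replacement may also be a function that receives each match object.
def double(match):
    """Return the matched number multiplied by two, as text."""
    return str(int(match.group()) * 2)

doubled_text = re.sub(r'\d+', double, 'A23G4HFD567')
print(doubled_text)   # A46G8HFD1134
```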
Finally, a summary of regular expression syntax:
^ matches the beginning of the string.
$ matches the end of the string (or just before a trailing newline).
. matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.
[...] represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...] matches characters not in the set: [^abc] matches any character other than a, b, or c.
re* matches 0 or more occurrences of the preceding expression.
re+ matches 1 or more occurrences of the preceding expression.
re? matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (re*?, re+?, re??), it makes that quantifier non-greedy.
re{n} matches exactly n occurrences of the preceding expression.
re{n,} matches n or more occurrences of the preceding expression.
re{n,m} matches n to m occurrences of the preceding expression, greedily.
a|b matches a or b.
(re) matches the expression in parentheses and also forms a group.
(?imx) turns on the optional flags i, m, or x. In Python's re, this form applies to the whole pattern and belongs at the start of the pattern.
(?-imx) turns off the optional flags i, m, or x. Python's re accepts flag removal only in the scoped form (?-imx:re).
(?:re) is like (...), but does not form a capturing group.
(?imx:re) turns on the flags i, m, or x within the parentheses only.
(?-imx:re) turns off the flags i, m, or x within the parentheses only.
(?#...) is a comment.
(?=re) is a positive lookahead assertion. It succeeds if the contained expression matches at the current position, but it consumes no characters: the remainder of the pattern is then tried from that same position.
(?!re) is a negative lookahead assertion. The opposite of the positive lookahead: it succeeds when the contained expression does not match at the current position.
(?>re) matches an independent (atomic) sub-pattern, eliminating backtracking into it. Python's re supports this from version 3.11.
\w matches an alphanumeric character or underscore.
\W matches a character that is not alphanumeric or underscore.
\s matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v].
\S matches any non-whitespace character.
\d matches any digit; for ASCII text, equivalent to [0-9].
\D matches any non-digit character.
\A matches the start of the string.
\Z matches the end of the string. In Perl-style engines it also matches just before a final newline; in Python's re it matches only at the very end.
\z matches the end of the string. Python's re has traditionally used \Z for this.
\G matches the position where the last match finished. Python's re does not support it.
\b matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' can match the 'er' in 'never', but not the 'er' in 'verb'.
\B matches a non-word boundary. 'er\B' can match the 'er' in 'verb', but not the 'er' in 'never'.
\n, \t, etc. match a newline, a tab character, and so on.
\1...\9 matches the content of the nth group.
\10 matches the content of the 10th group if that group exists. Python's re reads escapes that start with 0 or have three octal digits (for example \012) as octal character codes instead.
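A few entries in this summary are easier to follow with concrete input. The sketch below exercises word boundaries, a backreference, a non-capturing group, and the two lookahead assertions. All sample strings and variable names are invented for the example.

```python
import re

# \b marks a word boundary; \B is a position that is not one.
print(re.search(r'er\b', 'never').span())   # (3, 5)
print(re.search(r'er\b', 'verb'))           # None
print(re.search(r'er\B', 'verb').span())    # (1, 3)

# \1 refers back to whatever the first group matched.
doubled = re.search(r'\b(\w+) \1\b', 'it was the the best')
print(doubled.group(1))   # the

# (?:...) groups without capturing, so findall returns whole matches.
repeats = re.findall(r'(?:ab)+', 'ab abab')
print(repeats)   # ['ab', 'abab']

# (?=...) checks what follows without consuming it.
python_files = re.findall(r'\w+(?=\.py\b)', 'run.py read.txt test.py')
print(python_files)   # ['run', 'test']

# (?!...) succeeds only when the contained expression does NOT follow.
not_foobar = re.search(r'foo(?!bar)', 'foobar foobaz')
print(not_foobar.span())   # (7, 10)
```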
That covers the basic syntax of Python regular expressions and the re module. I hope it helps everyone.