Python basics: regular expression syntax and the re module

Source: Internet
Author: User
What is a regular expression?

Regular expressions are patterns that can match text fragments.

The regular expression 'python' matches the string 'python'.

Regular expressions are very useful, and Python of course supports them.

So today's post covers the re module in Python.

The re module contains support for regular expressions.

Wildcard characters

A period (.) matches any single character except a newline:

'.ython' can match 'python' and 'fython'

To match a special character literally, escape it with a backslash:

'python\.org' matches 'python.org'
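The wildcard and escaping rules above can be checked with a short sketch. It uses re.fullmatch, which succeeds only when the whole string matches; the test strings 'ython' and 'pythonXorg' are made up for illustration.

```python
import re

# '.' stands for exactly one character (anything except a newline).
wildcard = re.compile(r'.ython')
print(bool(wildcard.fullmatch('python')))   # True
print(bool(wildcard.fullmatch('fython')))   # True
print(bool(wildcard.fullmatch('ython')))    # False: '.' needs one character

# Unescaped, '.' also accepts characters other than a literal dot.
unescaped = re.fullmatch(r'python.org', 'pythonXorg')
escaped = re.fullmatch(r'python\.org', 'pythonXorg')
print(bool(unescaped))  # True
print(bool(escaped))    # False
print(bool(re.fullmatch(r'python\.org', 'python.org')))  # True
```

Writing patterns as raw strings (the r prefix) keeps the backslash from being interpreted by Python's own string escaping.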

Character sets

'[pj]ython' can match 'python' and 'jython'

Inverse character sets

'[^abc]' can match any character except a, b, and c
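A small sketch of both kinds of character set follows. The word list and the string 'aXbYcZ' are illustrative assumptions, not part of the original examples.

```python
import re

# [pj] accepts exactly one character, either 'p' or 'j'.
words = ['python', 'jython', 'cython']
matched = [w for w in words if re.fullmatch(r'[pj]ython', w)]
print(matched)  # ['python', 'jython']

# [^abc] accepts one character that is NOT 'a', 'b', or 'c'.
others = re.findall(r'[^abc]', 'aXbYcZ')
print(others)  # ['X', 'Y', 'Z']
```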

Alternatives

Use the pipe symbol | to separate alternatives; the pattern matches if any one of the alternatives matches.

Optional subpatterns

Add a question mark after a subpattern to make it optional:

r'(http://)?(www\.)?python\.org' can only match the following strings:

'http://www.python.org'
'http://python.org'
'www.python.org'
'python.org'
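The optional-subpattern rule can be tried against the four strings listed above. This is a minimal sketch: the extra candidate 'ftp://python.org' and the alternation pattern 'python|perl' are assumptions added for contrast.

```python
import re

url_pattern = re.compile(r'(http://)?(www\.)?python\.org')

candidates = [
    'http://www.python.org',
    'http://python.org',
    'www.python.org',
    'python.org',
    'ftp://python.org',   # hypothetical non-matching case
]
# fullmatch() requires the whole string to fit the pattern.
results = {s: bool(url_pattern.fullmatch(s)) for s in candidates}
for s, ok in results.items():
    print(s, ok)

# The pipe symbol chooses between alternatives.
alt = re.compile(r'python|perl')
print(bool(alt.fullmatch('perl')))   # True
print(bool(alt.fullmatch('ruby')))   # False
```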

Repeating subpatterns

*: allows the preceding pattern to repeat 0 or more times
+: allows the preceding pattern to repeat 1 or more times
{m,n}: allows the preceding pattern to repeat from m to n times
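The three repetition operators can be compared on one subpattern. The subpattern (ab) and the test strings below are chosen only for illustration.

```python
import re

def fits(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(r'(ab)*', ''))              # True: zero repetitions are allowed
print(fits(r'(ab)+', ''))              # False: at least one is required
print(fits(r'(ab)+', 'ababab'))        # True
print(fits(r'(ab){2,3}', 'ab'))        # False: only one repetition
print(fits(r'(ab){2,3}', 'abab'))      # True
print(fits(r'(ab){2,3}', 'abababab'))  # False: four repetitions
```

Note that the pattern text must be written as {2,3} without a space after the comma; with a space, the braces are treated as literal characters rather than a repetition count.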

Of course, regular expression syntax has many more rules than these, but we will stop here, because the purpose of this post is to introduce Python's re module.

The re module gives Python full regular expression functionality.

The compile function generates a regular expression object from a pattern string and an optional flags argument. The object has a series of methods for regular expression matching and substitution.

The re module also provides module-level functions that behave the same way as these methods; they take a pattern string as their first argument.
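As a rough illustration of the two calling styles, the sketch below runs the same search through a compiled pattern object and through the module-level function. The sample text 'order 42 shipped' is an assumption.

```python
import re

text = 'order 42 shipped'

# Style 1: compile once, then call methods on the pattern object.
digits = re.compile(r'\d+')
from_method = digits.search(text).group()

# Style 2: pass the pattern string to the module-level function.
from_function = re.search(r'\d+', text).group()

print(from_method, from_function)  # 42 42

# The optional flags argument changes matching behavior,
# for example re.IGNORECASE for case-insensitive matching.
ci = re.compile(r'python', re.IGNORECASE)
ci_match = ci.search('I like PYTHON')
print(ci_match.group())  # PYTHON
```

Compiling is mostly a convenience when the same pattern is reused many times; for a one-off search the module-level function reads just as well.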

Important functions in re:

compile(pattern[, flags]) creates a pattern object from a string containing a regular expression

search(pattern, string[, flags]) searches for the pattern anywhere in the string

match(pattern, string[, flags]) matches the pattern at the beginning of the string

split(pattern, string[, maxsplit=0]) splits a string at each match of the pattern

findall(pattern, string) lists all occurrences of the pattern in the string

sub(pat, repl, string[, count=0]) replaces all matches of pat in the string with repl

escape(string) escapes all special regular expression characters in the string
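Three of the functions listed above (split, findall, and escape) are easy to try in isolation. The sketch below uses a made-up input string; only standard re calls are involved.

```python
import re

text = 'alpha, beta,,gamma'

# split: cut the string wherever a run of commas/spaces matches.
parts = re.split(r'[, ]+', text)
print(parts)    # ['alpha', 'beta', 'gamma']

# maxsplit limits the number of cuts.
limited = re.split(r'[, ]+', text, maxsplit=1)
print(limited)  # ['alpha', 'beta,,gamma']

# findall: collect every non-overlapping match.
found = re.findall(r'[a-z]+', text)
print(found)    # ['alpha', 'beta', 'gamma']

# escape: make a literal string safe to use as a pattern.
escaped = re.escape('www.python.org')
print(escaped)  # www\.python\.org
print(re.fullmatch(escaped, 'wwwXpythonXorg'))  # None
```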

The following is a simple application:

Using match

    import re
    print(re.match('www', 'www.runoob.com').span())  # matches at the start
    print(re.match('com', 'www.runoob.com'))         # does not match at the start

Using search

    import re
    print(re.search('www', 'www.runoob.com').span())  # matches at the start
    print(re.search('com', 'www.runoob.com').span())  # matches, but not at the start

So what is the difference between match and search?

Look at the results first:

Results from the match example:

(0, 3)
None

Results from the search example:

(0, 3)
(11, 14)

The match() function only checks whether the pattern matches at the beginning of the string, while search() scans the entire string to find a match.
That is, match() returns a match object only if the match succeeds at position 0; if the pattern does not match at the start position, match() returns None.

search() scans the entire string and returns the first successful match.
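Because both functions return None on failure, calling .span() directly on the result raises AttributeError when nothing matches. A minimal sketch of a guard, using a hypothetical helper named find_span:

```python
import re

def find_span(pattern, text, anchored=False):
    """Return the (start, end) span of the first match, or None.

    anchored=True uses re.match (start of string only);
    anchored=False uses re.search (anywhere in the string).
    """
    finder = re.match if anchored else re.search
    m = finder(pattern, text)
    return m.span() if m else None

print(find_span('www', 'www.runoob.com', anchored=True))  # (0, 3)
print(find_span('com', 'www.runoob.com', anchored=True))  # None
print(find_span('com', 'www.runoob.com'))                 # (11, 14)
```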

Using sub

The Python re module provides re.sub to replace matches in a string.

    #!/usr/bin/python
    import re

    phone = "2004-959-559 # This is Phone number"

    # Delete Python-style comments
    num = re.sub(r'#.*$', "", phone)
    print("Phone num:", num)

    # Remove anything other than digits
    num = re.sub(r'\D', "", phone)
    print("Phone num:", num)

Results:

Phone num: 2004-959-559
Phone num: 2004959559
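Two further features of re.sub are worth a quick sketch: the count argument from the function list, and group references in the replacement string. The replacement choices below are illustrative assumptions, reusing the phone string from the example.

```python
import re

phone = "2004-959-559 # This is Phone number"

# count=1 replaces only the first match.
first_only = re.sub(r'-', ' ', phone, count=1)
print(first_only)  # 2004 959-559 # This is Phone number

# \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d+)-(\d+)-(\d+)', r'\3-\2-\1', '2004-959-559')
print(swapped)     # 559-959-2004

# The replacement can also be a function that receives the match object.
lengths = re.sub(r'\d+', lambda m: str(len(m.group())), '2004-959-559')
print(lengths)     # 4-3-3
```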

Finally, a summary of pattern syntax:

^ matches the beginning of the string.
$ matches the end of the string.
. matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.
[...] represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...] matches characters not in the set: [^abc] matches any character other than a, b, and c.
re* matches 0 or more occurrences of the preceding expression.
re+ matches 1 or more occurrences of the preceding expression.
re? matches 0 or 1 occurrence of the preceding expression; a ? placed after another quantifier makes that quantifier non-greedy.
re{n} matches exactly n occurrences of the preceding expression.
re{n,} matches n or more occurrences of the preceding expression.
re{n,m} matches from n to m occurrences of the preceding expression, greedily.
a|b matches a or b.
(re) groups the expression in parentheses and remembers the matched text.
(?imx) turns on the optional flags i, m, or x within a regular expression. If used inside parentheses, only that area is affected.
(?-imx) turns off the optional flags i, m, or x within a regular expression. If used inside parentheses, only that area is affected.
(?:re) is like (...), but does not create a group.
(?imx:re) turns on the i, m, or x flags inside the parentheses.
(?-imx:re) turns off the i, m, or x flags inside the parentheses.
(?#...) is a comment.
(?=re) is a positive lookahead assertion. It succeeds if the contained expression matches at the current position, but it consumes none of the string: once the contained expression has been tried, the matching engine does not advance, and the rest of the pattern is tried from the same position.
(?!re) is a negative lookahead assertion. It is the opposite of the positive assertion: it succeeds when the contained expression does not match at the current position of the string.
(?>re) matches an independent (atomic) pattern, eliminating backtracking.
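A few of the grouping constructs above can be seen in action in the following sketch. All sample strings are assumptions chosen to keep the output short.

```python
import re

# Capturing groups remember what they matched.
m = re.search(r'(\d+)-(\d+)', 'call 959-559 now')
print(m.group(0))   # 959-559  (the whole match)
print(m.group(1))   # 959
print(m.groups())   # ('959', '559')

# With a capturing group, findall returns the group's text;
# with a non-capturing group (?:...), it returns the whole match.
capturing = re.findall(r'(ab)+', 'ababxab')
non_capturing = re.findall(r'(?:ab)+', 'ababxab')
print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['abab', 'ab']

# Positive lookahead: match a word only if '.org' follows,
# without including '.org' in the result.
before_org = re.findall(r'\w+(?=\.org)', 'python.org perl.com')
print(before_org)     # ['python']

# Negative lookahead: match 'python' only if '.org' does NOT follow.
print(re.search(r'python(?!\.org)', 'python.org'))          # None
print(re.search(r'python(?!\.org)', 'python.com').group())  # python

# An inline flag at the start of the pattern: (?i) ignores case.
print(re.search(r'(?i)python', 'PYTHON').group())           # PYTHON
```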
\w matches a word character (a letter, digit, or underscore).
\W matches a non-word character.
\s matches any whitespace character, equivalent to [ \t\n\r\f\v].
\S matches any non-whitespace character.
\d matches any digit, equivalent to [0-9].
\D matches any non-digit.
\A matches the start of the string.
\Z matches the end of the string. In Perl-style engines it also matches just before a trailing newline; in Python's re module it matches only at the very end.
\z matches the very end of the string (only accepted by recent versions of Python's re module; \Z is the usual spelling in Python).
\G matches the position where the last match finished (not supported by Python's re module).
\b matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' can match the 'er' in 'never', but not the 'er' in 'verb'.
\B matches a non-word boundary. 'er\B' can match the 'er' in 'verb', but not the 'er' in 'never'.
\n, \t, and so on match a newline, a tab, and so on.
\1...\9 matches the content matched by the nth group.
\10 matches the content of the nth group if that group has matched. Otherwise, it refers to an octal character code.
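The word-boundary examples from the table, together with a few of the character classes and a backreference, can be verified with the sketch below. The strings other than 'never' and 'verb' are illustrative assumptions.

```python
import re

# \b requires a word boundary after 'er'; \B requires there to be none.
print(re.search(r'er\b', 'never').span())  # (3, 5)
print(re.search(r'er\b', 'verb'))          # None
print(re.search(r'er\B', 'verb').span())   # (1, 3)
print(re.search(r'er\B', 'never'))         # None

# \d / \D and \s split text into digit, non-digit, and whitespace runs.
digit_runs = re.findall(r'\d+', 'a1b22c333')
non_digit_runs = re.findall(r'\D+', 'a1b22c333')
fields = re.split(r'\s+', 'a b\tc\nd')
print(digit_runs)      # ['1', '22', '333']
print(non_digit_runs)  # ['a', 'b', 'c']
print(fields)          # ['a', 'b', 'c', 'd']

# \1 refers back to whatever group 1 matched: here, a repeated word.
repeat = re.search(r'\b(\w+) \1\b', 'this is is a test')
print(repeat.group())   # is is
print(repeat.group(1))  # is
```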

Regular expression syntax for re

The regular expression syntax table is as follows:

Syntax          Meaning                                   Notes
"."             any character
"^"             start of string                           '^hello' matches 'helloworld' but not 'aaaahellobbb'
"$"             end of string
"*"             0 or more repetitions (greedy match)      e.g. <.*>
"+"             1 or more repetitions (greedy match)      same as above
"?"             0 or 1 repetition (greedy match)          same as above
"*?, +?, ??"    non-greedy versions of the above three    e.g. <.*?>
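The difference between the greedy and non-greedy rows of the table is easiest to see on a string with several angle-bracket pairs. The sample string '<a> b <c>' is an assumption chosen for illustration.

```python
import re

sample = '<a> b <c>'

# Greedy: .* grabs as much as possible, so one match spans everything.
greedy = re.findall(r'<.*>', sample)
print(greedy)  # ['<a> b <c>']

# Non-greedy: .*? grabs as little as possible, so each tag matches alone.
lazy = re.findall(r'<.*?>', sample)
print(lazy)    # ['<a>', '<c>']

# The ^ anchor from the table: match only at the start of the string.
print(bool(re.search(r'^hello', 'helloworld')))    # True
print(bool(re.search(r'^hello', 'aaaahellobbb')))  # False
```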