A study of regular expressions and the re module in Python development

Source: Internet
Author: User

Regular expressions are extremely useful. Whether we work in JavaScript or in Python web development (http://www.maiziedu.com/course/python-px/), we all run into them. The regular expression syntax of JavaScript and Python differs only slightly, but regular expressions are an essential part of Python, so today let's talk about the re module in Python.

The re module provides support for regular expressions.

What is a regular expression:
A regular expression is a pattern that can match fragments of text.
The regular expression 'python' matches the string 'python'.

Wildcard characters
A period (.) matches any single character except a newline:
'.ython' can match 'python' and 'fython'

Escaping special characters
To match a special character literally, put a backslash in front of it:
'python\.org' matches 'python.org'

Character sets
'[pj]ython' can match 'python' and 'jython'

Negated character sets
'[^abc]' can match any single character except a, b, or c

Alternation
Use the pipe symbol | to choose between alternatives; for example, 'python|perl' matches either 'python' or 'perl'.
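The building blocks described so far can be tried out with a short sketch like the one below. It uses re.fullmatch and re.findall from the standard library; the sample strings are made up for illustration and are not from the original article.

```python
import re

# '.' matches any single character except a newline, so exactly one
# character must precede 'ython'.
dot_hits = [w for w in ('python', 'fython', 'ython') if re.fullmatch(r'.ython', w)]

# A backslash makes the dot literal, so only a real '.' matches.
escaped_ok = bool(re.fullmatch(r'python\.org', 'python.org'))
escaped_bad = bool(re.fullmatch(r'python\.org', 'pythonXorg'))

# A character set matches exactly one of the listed characters.
set_hits = re.findall(r'[pj]ython', 'python jython cython')

# A negated set matches any one character that is not listed.
negated_hits = re.findall(r'[^abc]', 'aXbYc')

# The pipe chooses between whole alternatives.
alt_hits = re.findall(r'python|perl', 'perl, python, ruby')

print(dot_hits, escaped_ok, escaped_bad, set_hits, negated_hits, alt_hits)
```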

Optional items
Add a question mark after a subpattern to make it optional:
r'(http://)?(www\.)?python\.org' can only match the following strings:

'http://www.python.org'

'http://python.org'

'www.python.org'

'python.org'
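As a quick check of the optional-group pattern above, the following sketch tests a handful of candidate strings against it. The 'ftp://python.org' entry is an extra counterexample added here for illustration.

```python
import re

# Both the scheme and the 'www.' prefix are optional groups.
url_pattern = re.compile(r'(http://)?(www\.)?python\.org')

candidates = [
    'http://www.python.org',
    'http://python.org',
    'www.python.org',
    'python.org',
    'ftp://python.org',  # hypothetical counterexample, not in the article
]

# fullmatch requires the whole string to match the pattern.
accepted = [c for c in candidates if url_pattern.fullmatch(c)]
print(accepted)
```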

Repeating subpatterns
*: allows the pattern to repeat 0 or more times
+: allows the pattern to repeat 1 or more times
{m,n}: allows the pattern to repeat from m to n times (written without a space after the comma)
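A small sketch of the three repetition operators, using made-up strings of 'a' followed by some number of 'b' characters:

```python
import re

# 'b*' allows zero b's, so a lone 'a' matches.
star = re.fullmatch(r'ab*', 'a')

# 'b+' needs at least one b, so a lone 'a' does not match.
plus = re.fullmatch(r'ab+', 'a')

# 'b{2,3}' accepts two or three b's, nothing fewer or more.
bounded = [s for s in ('ab', 'abb', 'abbb', 'abbbb') if re.fullmatch(r'ab{2,3}', s)]

print(star is not None, plus is not None, bounded)
```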

Of course, regular expression syntax has many more rules than these, but we can only touch on them here, because the purpose of this post is to introduce one Python module: the re module.

The re module gives Python full regular expression functionality.
The compile function builds a regular expression object from a pattern string and an optional flags argument. That object has a set of methods for regular expression matching and substitution.
The re module also provides module-level functions that do exactly what these methods do; they take a pattern string as their first argument.
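To make the relationship between compiled pattern objects and module-level functions concrete, here is a minimal sketch. The sample text and variable names are illustrative assumptions.

```python
import re

# Compile once, then call methods on the pattern object.
pattern = re.compile(r'\d+')
method_result = pattern.findall('a1b22c333')

# The module-level function takes the pattern string as its first argument
# and gives the same result.
function_result = re.findall(r'\d+', 'a1b22c333')

# The optional flags argument changes matching behaviour, here case folding.
flagged = re.compile(r'python', re.IGNORECASE)
case_insensitive = flagged.search('I like PYTHON').group()

print(method_result, function_result, case_insensitive)
```

Compiling is mainly a convenience when the same pattern is reused many times; for one-off matches the module-level functions are usually enough.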

Important functions in re:

compile(pattern[, flags]) creates a pattern object from a string containing a regular expression

search(pattern, string[, flags]) searches for the pattern anywhere in the string

match(pattern, string[, flags]) matches the pattern at the beginning of the string

split(pattern, string[, maxsplit=0]) splits the string at each match of the pattern

findall(pattern, string) lists all occurrences of the pattern in the string

sub(pat, repl, string[, count=0]) replaces all matches of pat in string with repl

escape(string) escapes all special regular expression characters in the string
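The article goes on to demonstrate match, search, and sub. For the remaining functions in the list, a rough sketch with invented sample strings might look like this:

```python
import re

# split: break the string wherever the pattern matches.
parts = re.split(r'[,;]\s*', 'alpha, beta;gamma,  delta')

# maxsplit limits how many splits are made.
limited = re.split(r'[,;]\s*', 'alpha, beta;gamma', maxsplit=1)

# findall: return every non-overlapping match as a list of strings.
words = re.findall(r'[a-z]+', 'one 2 three')

# escape: make a literal string safe to use as a pattern, so the dots
# in the domain no longer act as wildcards.
escaped = re.escape('www.python.org')
literal_hit = re.search(escaped, 'see www.python.org') is not None
wildcard_miss = re.search(escaped, 'wwwXpython.org') is None

print(parts, limited, words, literal_hit, wildcard_miss)
```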

The following are some simple applications:

Using match

import re

print(re.match('www', 'www.runoob.com').span())  # matches at the start position

print(re.match('com', 'www.runoob.com'))  # does not match at the start position

Using search

import re

print(re.search('www', 'www.runoob.com').span())  # matches at the start position

print(re.search('com', 'www.runoob.com').span())  # matches, but not at the start position

So what is the difference between match and search?
Look at the results first:

Results from the match example:

(0, 3)

None

Results from the search example:

(0, 3)

(11, 14)

The match() function only checks whether the pattern matches at the start of the string, while search() scans the entire string for a match.
That is, match() returns a match object only if the pattern matches at position 0; if the match does not begin at the start of the string, match() returns None.

search() scans the entire string and returns the first successful match.
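Because both functions return None when nothing matches, calling .span() directly on the result can raise an AttributeError. The sketch below wraps the call in a small helper; the name describe is hypothetical and only serves this illustration.

```python
import re

def describe(func, pattern, text):
    """Return the match span, or None when the pattern does not match."""
    result = func(pattern, text)
    return result.span() if result else None

text = 'www.runoob.com'

match_www = describe(re.match, 'www', text)    # pattern found at position 0
match_com = describe(re.match, 'com', text)    # not at the start, so None
search_com = describe(re.search, 'com', text)  # found later in the string

print(match_www, match_com, search_com)
```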

Using sub
The Python re module provides re.sub to replace matches in a string.

#!/usr/bin/python3
import re

phone = "2004-959-559 # This is Phone number"

# Delete Python-style comments

num = re.sub(r'#.*$', "", phone)
print("Phone num:", num)

# Remove anything other than digits

num = re.sub(r'\D', "", phone)
print("Phone num:", num)

Results:

Phone num: 2004-959-559

Phone num: 2004959559
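re.sub also accepts a count argument and, in place of a replacement string, a function that receives each match object. A minimal sketch building on the phone example follows; the mask helper is a hypothetical name introduced here.

```python
import re

phone = "2004-959-559 # This is Phone number"

# Strip the comment together with the whitespace in front of it.
no_comment = re.sub(r'\s*#.*$', '', phone)

# count limits the number of replacements; here only the first dash changes.
first_dash_only = re.sub(r'-', ' ', no_comment, count=1)

def mask(match):
    """Replace each run of digits with the same number of asterisks."""
    return '*' * len(match.group())

# A callable replacement is invoked once per match.
masked = re.sub(r'\d+', mask, no_comment)

print(no_comment, first_dash_only, masked, sep=' | ')
```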

Finally, a summary of the pattern syntax:

^ matches the beginning of the string

$ matches the end of the string.

. matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.

[...] represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.

[^...] matches characters not in the brackets: [^abc] matches any character other than a, b, or c.

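re* matches 0 or more occurrences of the preceding expression.

re+ matches 1 or more occurrences of the preceding expression.

re? matches 0 or 1 occurrence of the fragment defined by the preceding regular expression, greedily; appending ? to a quantifier (as in re*?, re+?, re??) makes it non-greedy.

re{n} matches exactly n occurrences of the preceding expression.

re{n,} matches n or more occurrences of the preceding expression.

re{n,m} matches from n to m occurrences of the fragment defined by the preceding regular expression, greedily.

a|b matches a or b.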

(re) matches the expression in parentheses and also forms a group.

(?imx) turns on the optional flags i, m, or x. In Python's re module these inline flags apply to the entire pattern and should appear at its start.

(?-imx) turns off the optional flags i, m, or x. Python's re module supports this only in the scoped form (?-imx:re) listed below.

(?:re) is like (...), but does not form a group.

(?imx:re) turns on the optional flags i, m, or x inside the parentheses only.

(?-imx:re) turns off the optional flags i, m, or x inside the parentheses only.

(?#...) is a comment.

(?=re) is a positive lookahead assertion. It succeeds if the contained expression matches at the current position and fails otherwise. Once the contained expression has been tried, the matching engine does not advance; the rest of the pattern is matched starting from the same position.

(?!re) is a negative lookahead assertion. It is the opposite of the positive assertion: it succeeds when the contained expression does not match at the current position.

(?>re) matches an independent (atomic) group, eliminating backtracking (available in Python's re module since Python 3.11).

\w matches alphanumeric characters and the underscore.

\W matches any character that is not alphanumeric or an underscore.

\s matches any whitespace character; for ASCII text it is equivalent to [ \t\n\r\f\v].

\S matches any non-whitespace character.

\d matches any digit; for ASCII text it is equivalent to [0-9].

\D matches any non-digit character.

\A matches the start of the string.

\Z matches the end of the string. In Python's re module, \Z matches only at the very end of the string, not before a trailing newline.

\z matches the end of the string in other regular expression flavors; in Python, \Z plays this role.

\G matches the position where the last match finished in other regular expression flavors; Python's re module does not support it.

\b matches a word boundary, the position between a word character and a non-word character. For example, 'er\b' can match 'er' in 'never', but not 'er' in 'verb'.

\B matches a non-word boundary. 'er\B' can match 'er' in 'verb', but not 'er' in 'never'.

\n, \t, etc. match a newline, a tab character, and so on.

\1...\9 matches the content of the nth group.

\10 matches the content of the 10th group if that group exists. Otherwise, it refers to the character with that octal code.
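Several entries in the summary above are easier to grasp with a concrete run. The following sketch, using invented sample strings, touches on greedy versus non-greedy quantifiers, re.DOTALL, lookahead assertions, word boundaries, and backreferences.

```python
import re

html = '<b>bold</b><i>italic</i>'

# Greedy: '.*' grabs as much as possible, so one match spans everything.
greedy = re.findall(r'<.*>', html)
# Non-greedy: '.*?' stops at the first '>' it can.
lazy = re.findall(r'<.*?>', html)

# '.' skips newlines unless re.DOTALL is given.
no_dotall = re.search(r'a.b', 'a\nb')
with_dotall = re.search(r'a.b', 'a\nb', re.DOTALL)

# Positive lookahead: the '.org' must follow but is not part of the match.
before_org = re.findall(r'\w+(?=\.org)', 'python.org perl.com')
# Negative lookahead: a 'q' that is not followed by 'u'.
q_without_u = re.findall(r'q(?!u)', 'qa qu')

# \b needs a word boundary after 'er'; \B needs the absence of one.
boundary_span = re.search(r'er\b', 'never verb').span()
non_boundary_span = re.search(r'er\B', 'never verb').span()

# \1 refers back to whatever group 1 matched: find doubled words.
doubled = re.findall(r'\b(\w+) \1\b', 'the the quick brown fox fox')

print(greedy, lazy, before_org, q_without_u, boundary_span, non_boundary_span, doubled)
```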

The above covers the basic Python regular expression syntax and the re module in detail; I hope it helps everyone.
