Common methods of Python's regular expression module re

Source: Internet
Author: User
1. Introduction to re
Python's re module cannot handle every complex matching need, but in most cases it is sufficient to parse complex strings and extract the relevant information. Python compiles regular expressions into bytecode and runs them on a matching engine written in C, which performs depth-first matching.

The code is as follows:

import re
print(re.__doc__)


This prints the documentation of the re module, including its functions. The sections below illustrate them with several examples.

2. re regular expression syntax

The regular expression syntax table is as follows:

Syntax         Meaning                                                   Example
"."            Any character except a newline
"^"            Start of string                                           '^hello' matches 'helloworld' but not 'aaaahellobbb'
"$"            End of string                                             Works the same way as "^"
"*"            0 or more of the preceding item (greedy)                  '<.*>' matches all of '<title>chinaunix</title>'
"+"            1 or more of the preceding item (greedy)                  Works the same way as "*"
"?"            0 or 1 of the preceding item (greedy)                     Works the same way as "*"
*?, +?, ??     Non-greedy forms of the three above (shortest match)      '<.*?>' matches only '<title>'
{m,n}          m to n repetitions of the preceding item; {m} also works  a{6} matches 6 a's, a{2,4} matches 2 to 4 a's
{m,n}?         m to n repetitions, taking as few as possible             a{2,4}? in 'aaaaaa' matches only 2 a's
"\"            Escapes a special character or starts a special sequence
[]             Character set                                             [0-9], [a-z], [A-Z], [^0]
"|"            Or                                                        A|B matches A or B
(...)          Groups and matches the expression in parentheses
(?#...)        Comment; ignored
(?=...)        Matches if ... matches next, but does not consume it      'hello(?=test)' matches 'hello' in 'hellotest'
(?!...)        Matches if ... does not match next                        'hello(?!test)' matches 'hello' only if 'test' does not follow
(?<=...)       Matches if preceded by ... (must be fixed length)         '(?<=hello)test' matches 'test' in 'hellotest'
(?<!...)       Matches if not preceded by ... (must be fixed length)     '(?<!hello)test' does not match 'test' in 'hellotest'
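
The greedy, non-greedy, and lookaround rows of the table can be checked with a short sketch. The variable names are illustrative only, and the lookaround patterns assume the 'hello'/'test' examples are meant as written in the comments:

```python
import re

text = '<title>chinaunix</title>'

# Greedy: .* takes as much as it can, so the match spans both tags.
greedy = re.search(r'<.*>', text).group()

# Non-greedy: .*? stops at the first '>' that allows a match.
lazy = re.search(r'<.*?>', text).group()

# Lookahead: 'hello' matches only if 'test' follows; 'test' is not consumed.
ahead = re.search(r'hello(?=test)', 'hellotest').group()

# Lookbehind: 'test' matches only if it is preceded by 'hello'.
behind = re.search(r'(?<=hello)test', 'hellotest').group()

# Negative lookbehind: the only 'test' here follows 'hello', so no match.
neg_behind = re.search(r'(?<!hello)test', 'hellotest')

print(greedy)      # <title>chinaunix</title>
print(lazy)        # <title>
print(ahead)       # hello
print(behind)      # test
print(neg_behind)  # None
```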

The regular expression special sequence table is as follows:

Special sequence   Meaning
\A                 Matches only at the start of the string
\Z                 Matches only at the end of the string
\b                 Matches the empty string at the beginning or end of a word
\B                 Matches the empty string when not at the beginning or end of a word
\d                 Any digit; equivalent to [0-9]
\D                 Any non-digit; equivalent to [^0-9]
\s                 Any whitespace character: [ \t\n\r\f\v]
\S                 Any non-whitespace character: [^ \t\n\r\f\v]
\w                 Any letter, digit, or underscore: [a-zA-Z0-9_]
\W                 Any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_]

These equivalences hold for ASCII matching. With re.U (the default for str patterns in Python 3) the classes also cover Unicode digits, whitespace, and word characters.
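
As a rough illustration of the special sequences, the sketch below runs a few of them on made-up sample strings (the strings and variable names are not from the original article):

```python
import re

s = 'tel: 555 1234, id_7'

digits = re.findall(r'\d+', s)             # runs of digits
non_digits = re.findall(r'\D+', 'a1b22c')  # runs of non-digits
words = re.findall(r'\w+', s)              # letters, digits, and underscore
parts = re.split(r'\s+', 'a \t b\nc')      # split on any run of whitespace

# \b matches only at a word boundary, so the 'cat' inside
# 'concatenate' is skipped.
whole_word = re.findall(r'\bcat\b', 'cat concatenate cat.')

# \A anchors the pattern at the very start of the string.
at_start = re.search(r'\Atel', s) is not None

print(digits)      # ['555', '1234', '7']
print(non_digits)  # ['a', 'b', 'c']
print(words)       # ['tel', '555', '1234', 'id_7']
print(parts)       # ['a', 'b', 'c']
print(whole_word)  # ['cat', 'cat']
print(at_start)    # True
```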

3. Main functions of re

Commonly used functions include compile, search, match, split, findall (finditer), and sub (subn).
Compile
re.compile(pattern[, flags])
Function: compiles a regular expression pattern into a regular expression object
The flags include:
re.I: ignore case
re.L: makes the special character sets \w, \W, \b, \B, \s, \S depend on the current locale
re.M: multi-line mode
re.S: makes '.' match any character, including newlines (note: without this flag, '.' does not match newlines)
re.U: makes the special character sets \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
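
A small sketch of how re.I, re.M, and re.S change matching; the sample text and variable names are chosen only for illustration:

```python
import re

text = 'Hello\nworld'

# re.I: case-insensitive matching.
ci = re.search(r'hello', text, re.I) is not None

# re.M: ^ and $ also match at the start and end of each line.
no_m = re.findall(r'^\w+', text)
with_m = re.findall(r'^\w+', text, re.M)

# re.S: '.' also matches a newline.
no_s = re.search(r'Hello.world', text)
with_s = re.search(r'Hello.world', text, re.S)

# Flags can be combined with |.
both = re.compile(r'^world$', re.I | re.M).search('HELLO\nWORLD')

print(ci)                  # True
print(no_m)                # ['Hello']
print(with_m)              # ['Hello', 'world']
print(no_s)                # None
print(with_s is not None)  # True
print(both.group())        # WORLD
```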

Search
re.search(pattern, string[, flags])
search(string[, pos[, endpos]])
Function: scans the string for the first position where the regular expression pattern matches and returns a MatchObject instance, or None if no position matches.
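
The returned match object carries the matched text and its position. The following sketch, with an illustrative pattern and sample strings, shows group(), span(), and the pos/endpos arguments of the compiled pattern's search() method:

```python
import re

m = re.search(r'(\d+)-(\d+)', 'range 10-25 units')
whole = m.group()    # the entire match
first = m.group(1)   # the first parenthesized group
span = m.span()      # (start, end) indexes of the match

# A compiled pattern's search() also accepts pos and endpos,
# which limit the part of the string that is scanned.
p = re.compile(r'o')
from_start = p.search('foo boo').start()
from_pos = p.search('foo boo', 3).start()
limited = p.search('foo boo', 3, 5)   # scans only ' b', so no match

print(whole, first, span)    # 10-25 10 (6, 11)
print(from_start, from_pos)  # 1 5
print(limited)               # None
```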

Match
re.match(pattern, string[, flags])
match(string[, pos[, endpos]])
Function: match() tries to match the regular expression only at the beginning of the string, so it reports a match only if one starts at position 0, whereas search() scans the entire string for a match. To find a match anywhere in the string, use search().

Here are a few examples.
Example: the most basic usage, calling the methods of a compiled RegexObject

The code is as follows:

#!/usr/bin/env python
import re

r1 = re.compile(r'world')
# match() only succeeds at position 0, so this reports a failure
if r1.match('helloworld'):
    print('match succeeds')
else:
    print('match fails')
# search() scans the whole string, so this succeeds
if r1.search('helloworld'):
    print('search succeeds')
else:
    print('search fails')

Explanation: the r prefix means raw. Ordinary string literals interpret escape sequences, such as '\n' for a newline. To write a literal backslash followed by 'n' without the r prefix, you have to write '\\n'. With the r prefix you can write r'\n' instead, which is much clearer.
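
The difference between ordinary and raw string literals can be seen directly. This is only a small sketch with illustrative variable names:

```python
import re

newline = '\n'    # one character: a newline
escaped = '\\n'   # two characters: a backslash and 'n'
raw = r'\n'       # the same two characters, written as a raw string

print(len(newline), len(escaped), len(raw))  # 1 2 2
print(escaped == raw)                        # True

# For patterns, the raw form avoids doubling every backslash.
same_pattern = (r'\d+' == '\\d+')
print(same_pattern)                          # True

# To match one literal backslash, the pattern itself needs two.
backslashes = re.findall(r'\\', r'a\b\c')
print(len(backslashes))                      # 2
```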

Example: setting flags

The code is as follows:

import re

#r2 = re.compile(r'n$', re.S)
#r2 = re.compile('\n$', re.S)
r2 = re.compile('world$', re.I)
# '$' also matches just before a trailing newline, so this succeeds
if r2.search('helloworld\n'):
    print('search succeeds')
else:
    print('search fails')

Example: calling the module-level function directly

The code is as follows:

import re

if re.search(r'abc', 'helloaaabcdworldn'):
    print('search succeeds')
else:
    print('search fails')

Split
re.split(pattern, string[, maxsplit=0, flags=0])
split(string[, maxsplit=0])
Function: splits the string at the parts that match the regular expression and returns a list
Example: simple parsing of an IP address

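The code is as follows:

#!/usr/bin/env python
import re

# \W+ matches runs of non-word characters, here the dots
r1 = re.compile(r'\W+')
print(r1.split('192.168.1.1'))
# With a capturing group, the separators are kept in the result
print(re.split(r'(\W+)', '192.168.1.1'))
# maxsplit=1 splits only at the first separator
print(re.split(r'(\W+)', '192.168.1.1', 1))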

The results are as follows:
['192', '168', '1', '1']
['192', '.', '168', '.', '1', '.', '1']
['192', '.', '168.1.1']

Findall
re.findall(pattern, string[, flags])
findall(string[, pos[, endpos]])
Function: finds all substrings of the string that match the regular expression and returns them as a list
Example: finding content enclosed in [] (greedy and non-greedy), along with a few other patterns

The code is as follows:

#!/usr/bin/env python
import re

# Greedy: one match from the first '[' to the last ']'
r1 = re.compile(r'(\[.*\])')
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
# Non-greedy: each bracketed part is matched separately
r1 = re.compile(r'(\[.*?\])')
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
print(re.findall(r'[0-9]{2}', 'fdskfj1323jfkdj'))
print(re.findall(r'([0-9][a-z])', "fdskfj1323jfkdj"))
# Lookahead and lookbehind match empty strings at the qualifying positions
print(re.findall(r'(?=www)', "afdsfwwwfkdjfsdfsdwww"))
print(re.findall(r'(?<=www)', "afdsfwwwfkdjfsdfsdwww"))
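
The article does not list the output of these calls. The sketch below assumes the patterns are meant as bracket matching (r'(\[.*\])' and r'(\[.*?\])'), two-digit runs, a digit followed by a lowercase letter, and the lookahead/lookbehind forms (?=www) and (?<=www), with lowercase sample strings. Under those assumptions the results would be:

```python
import re

s = 'hello[hi]heldfsdsf[iwonder]lo'
greedy = re.findall(r'(\[.*\])', s)
lazy = re.findall(r'(\[.*?\])', s)
pairs = re.findall(r'[0-9]{2}', 'fdskfj1323jfkdj')
digit_letter = re.findall(r'([0-9][a-z])', 'fdskfj1323jfkdj')

# Lookarounds consume nothing, so each hit is an empty string.
w = 'afdsfwwwfkdjfsdfsdwww'
ahead = re.findall(r'(?=www)', w)
behind = re.findall(r'(?<=www)', w)

print(greedy)        # ['[hi]heldfsdsf[iwonder]']
print(lazy)          # ['[hi]', '[iwonder]']
print(pairs)         # ['13', '23']
print(digit_letter)  # ['3j']
print(ahead)         # ['', '']
print(behind)        # ['', '']
```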

Finditer
re.finditer(pattern, string[, flags])
finditer(string[, pos[, endpos]])
Description: similar to findall, but instead of a list of substrings it returns an iterator over the match objects for all matches of the regular expression in the string. The second form above is the equivalent RegexObject method.
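
A minimal sketch of finditer, using an illustrative digit pattern and sample string. Each item produced by the iterator is a match object, so the position of every match is available as well as its text:

```python
import re

p = re.compile(r'\d+')
text = 'a1 bb22 ccc333'

# Collect the text and span of every match.
found = [(m.group(), m.span()) for m in p.finditer(text)]

# The module-level function works the same way.
starts = [m.start() for m in re.finditer(r'\d+', text)]

print(found)   # [('1', (1, 2)), ('22', (5, 7)), ('333', (11, 14))]
print(starts)  # [1, 5, 11]
```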

Sub
re.sub(pattern, repl, string[, count, flags])
sub(repl, string[, count=0])
Description: finds all substrings of string that match the regular expression pattern and replaces them with repl. If nothing matches the pattern, the string is returned unchanged. repl can be either a string or a function.
Example:

The code is as follows:

#!/usr/bin/env python
import re

p = re.compile(r'(one|two|three)')
# count=2 replaces only the first two matches
print(p.sub('num', 'one word two words three words apple', 2))
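
Since repl can also be a function, here is a hedged sketch of that form. The mapping word_to_digit and the helper to_digit are hypothetical names introduced for illustration. The function receives each match object and returns the replacement text:

```python
import re

# Hypothetical lookup table used only for this example.
word_to_digit = {'one': '1', 'two': '2', 'three': '3'}

def to_digit(match):
    # Called once per match; the return value replaces the match.
    return word_to_digit[match.group(1)]

p = re.compile(r'(one|two|three)')
text = 'one word two words three words apple'

by_function = p.sub(to_digit, text)
first_two = p.sub('num', text, 2)

print(by_function)  # 1 word 2 words 3 words apple
print(first_two)    # num word num words three words apple
```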

Subn
re.subn(pattern, repl, string[, count, flags])
subn(repl, string[, count=0])

Description: works the same way as sub(), but returns a tuple containing the new string and the number of substitutions made. The second form above is the equivalent RegexObject method.
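
A short sketch of subn, reusing the pattern and sentence from the sub example (the variable names are illustrative):

```python
import re

p = re.compile(r'(one|two|three)')
text = 'one word two words three words apple'

# subn returns (new_string, number_of_substitutions).
new_text, n = p.subn('num', text)

# count limits how many substitutions are made.
limited = re.subn(r'(one|two|three)', 'num', text, count=2)

print(new_text, n)  # num word num words num words apple 3
print(limited)      # ('num word num words three words apple', 2)
```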
