Learning Python's re regular expression module

Source: Internet
Author: User

How to use the re module in Python

The Python re module provides a variety of matching operations for regular expressions (regex). Regular expressions are a useful tool for text parsing, parsing complex strings, and extracting information. Below I summarize the common methods of re.

1. Introduction to re
The Python re module cannot handle every complex matching scenario, but in most cases it is enough to parse complex strings effectively and extract the relevant information. CPython compiles each regular expression into an internal bytecode, which is executed by a matching engine written in C that searches for a match depth-first, backtracking when a path fails.

    import re
    print(re.__doc__)

Printing re.__doc__ shows the module's documentation, including a summary of its functions. Several of them are illustrated with examples below.

2. Regular expression syntax of re
The regular expression syntax table is as follows:

Syntax    Meaning and example
.         Any character except a newline
^         Start of the string. '^hello' matches 'helloworld' but not 'aaaahellobbb'
$         End of the string. 'world$' matches 'helloworld' but not 'worldhello'
*         0 or more repetitions of the preceding item (greedy). '<.*>' matches all of '<title>chinaunix</title>'
+         1 or more repetitions of the preceding item (greedy). 'ab+' matches 'abbb' in 'abbbc'
?         0 or 1 repetition of the preceding item (greedy). 'ab?' matches 'ab' or 'a'
*? +? ??  Non-greedy forms of the three above: take the shortest possible match. '<.*?>' matches only '<title>'
{m,n}     m to n repetitions of the preceding item; {m} means exactly m. 'a{6}' matches 6 a's, 'a{2,4}' matches 2 to 4 a's
{m,n}?    m to n repetitions, as few as possible. In 'aaaaaa', 'a{2,4}?' matches only 2 a's
\         Escapes a special character or introduces a special sequence
[]        Character set, e.g. [0-9], [a-z], [A-Z], [^0]
|         Alternation (or): 'a|b' matches 'a' or 'b'
(...)     Matches the expression inside the parentheses and captures it as a group
(?#...)   Comment; ignored
(?=...)   Lookahead: matches if ... matches next, but does not consume the string. 'hello(?=test)' matches 'hello' in 'hellotest'
(?!...)   Negative lookahead: matches if ... does not match next. 'hello(?!test)' matches 'hello' only when it is not followed by 'test'
(?<=...)  Lookbehind: matches if preceded by ... (must be fixed length). '(?<=hello)test' matches 'test' in 'hellotest'
(?<!...)  Negative lookbehind: matches if not preceded by ... (must be fixed length). '(?<!hello)test' does not match 'test' in 'hellotest'
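A few rows of the table can be checked directly. The sketch below (Python 3; the sample strings are only illustrative) contrasts greedy and non-greedy quantifiers and shows that lookahead and lookbehind assertions test the surrounding text without including it in the match.

```python
import re

html = '<title>chinaunix</title>'

# Greedy '*' runs on to the last '>' it can reach.
greedy = re.match(r'<.*>', html).group()
# Non-greedy '*?' stops at the first '>'.
lazy = re.match(r'<.*?>', html).group()

# {m,n} is greedy; {m,n}? takes as few repetitions as possible.
most = re.match(r'a{2,4}', 'aaaaaa').group()
least = re.match(r'a{2,4}?', 'aaaaaa').group()

# Lookaround checks context without making it part of the match.
ahead = re.search(r'hello(?=test)', 'hellotest').group()
behind = re.search(r'(?<=hello)test', 'hellotest').group()
neg_behind = re.search(r'(?<!hello)test', 'hellotest')  # no match -> None

print(greedy, lazy, most, least, ahead, behind, neg_behind)
```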

The regular expression special sequence list is as follows:

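Sequence  Meaning
\A        Matches only at the start of the string
\Z        Matches only at the end of the string
\b        Matches the empty string at the beginning or end of a word
\B        Matches the empty string only where it is not at the beginning or end of a word
\d        A digit; equivalent to [0-9]
\D        A non-digit; equivalent to [^0-9]
\s        Any whitespace character; equivalent to [ \t\n\r\f\v]
\S        Any non-whitespace character; equivalent to [^ \t\n\r\f\v]
\w        Any letter, digit, or underscore; equivalent to [a-zA-Z0-9_]
\W        Any character other than a letter, digit, or underscore; equivalent to [^a-zA-Z0-9_]

The bracket equivalents hold for ASCII text (or with the re.ASCII flag); in Python 3, str patterns match Unicode digits, whitespace, and word characters by default.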
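The following sketch (Python 3, with made-up sample text) exercises several of the special sequences. The text is plain ASCII, so the bracket equivalents in the list hold even though Python 3 str patterns are Unicode-aware by default.

```python
import re

text = 'cat catalog 42 concat'

# \b: 'cat' as a whole word only (not inside 'catalog' or 'concat').
whole = [m.start() for m in re.finditer(r'\bcat\b', text)]
# \B: 'cat' that does not begin at a word boundary (the tail of 'concat').
inner = [m.start() for m in re.finditer(r'\Bcat', text)]
# \d: runs of digits.
digits = re.findall(r'\d+', text)
# \s: split on runs of whitespace.
tokens = re.split(r'\s+', text)
# \A and \Z anchor to the very start and end of the string.
starts = bool(re.search(r'\Acat', text))
ends = bool(re.search(r'concat\Z', text))
# \w also matches the underscore.
ident = re.match(r'\w+', 'room_12!').group()

print(whole, inner, digits, tokens, starts, ends, ident)
```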

3. Main functions of re
Common functions include compile, search, match, split, findall (finditer), and sub (subn).
compile
re.compile(pattern[, flags])
Function: compiles a regular expression pattern into a regular expression object, whose methods can then be called repeatedly
Flags include:
re.I (re.IGNORECASE): ignore case
re.L (re.LOCALE): make \w, \W, \b, \B, \s, \S depend on the current locale (Python 2 behavior; Python 3 allows this flag only with bytes patterns)
re.M (re.MULTILINE): multi-line mode; '^' and '$' also match at the start and end of each line
re.S (re.DOTALL): make '.' match any character, including a newline (without this flag, '.' does not match a newline)
re.U (re.UNICODE): make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database (already the default for str patterns in Python 3)
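A minimal sketch of the most common flags in Python 3 follows; the two-line sample string is an illustrative assumption. Flags are passed as the flags argument and can be combined with |.

```python
import re

text = 'first line\nSecond Line'

# re.I: case-insensitive matching.
ci = re.findall(r'line', text, flags=re.I)
# Without re.M, '^' matches only at the start of the whole string.
starts_plain = re.findall(r'^\w+', text)
# With re.M, '^' also matches right after each newline.
starts_multi = re.findall(r'^\w+', text, flags=re.M)
# Without re.S, '.' does not match a newline; with re.S it does.
no_dotall = re.search(r'line.Second', text)
dotall = re.search(r'line.Second', text, flags=re.S)
# Flags combine with the | operator.
combined = re.findall(r'^second', text, flags=re.I | re.M)

print(ci, starts_plain, starts_multi, no_dotall, repr(dotall.group()), combined)
```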
search
re.search(pattern, string[, flags])
Pattern.search(string[, pos[, endpos]])
Function: scans the string for the first position where the regular expression matches and returns a match object (MatchObject in Python 2), or None if no position matches.
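To make the return value concrete, here is a short sketch (Python 3; the pattern and text are illustrative) showing what the match object carries, how pos narrows the search, and that a failed search returns None instead of raising an exception.

```python
import re

pattern = re.compile(r'\d+')
text = 'order 1234, box 56'

m = pattern.search(text)
if m is not None:
    # group() is the matched text; span() is its (start, end) position.
    print(m.group(), m.span())

# pos makes the search begin at index 11 (just after the comma).
m2 = pattern.search(text, 11)
# When nothing matches, search() returns None.
missing = pattern.search('no digits here')
print(m2.group(), missing)
```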

match
re.match(pattern, string[, flags])
Pattern.match(string[, pos[, endpos]])
Function: match() tries to match the regular expression only at the beginning of the string; that is, it reports only a match that starts at position 0 (or at pos, when a compiled pattern's match() is given one). search(), by contrast, scans the whole string for a match. If you want to find a match anywhere in the string, use search().
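The difference between match() and search(), and the role of pos, can be seen in this small sketch (Python 3, illustrative strings). Note that '^' still refers to the real start of the string even when pos is given.

```python
import re

r = re.compile(r'world')

at_start = r.match('helloworld')      # tries only position 0 -> None
anywhere = r.search('helloworld')     # scans the whole string
from_pos = r.match('helloworld', 5)   # tries only position 5
# '^' anchors at the real start of the string, not at pos.
caret = re.compile(r'^world').match('helloworld', 5)

print(at_start, anywhere.span(), from_pos.group(), caret)
```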

Here are a few examples:
Example: the most basic usage, calling methods on a compiled pattern object (a RegexObject in Python 2)

    #!/usr/bin/env python3
    import re

    r1 = re.compile(r'world')
    # match() fails: 'world' is not at position 0
    if r1.match('helloworld'):
        print('match succeeds')
    else:
        print('match fails')
    # search() succeeds: it scans the whole string
    if r1.search('helloworld'):
        print('search succeeds')
    else:
        print('search fails')

Note: r stands for raw. Ordinary string literals interpret escape sequences such as the newline '\n', so a literal backslash has to be written as '\\'. To get the two characters '\' followed by 'n' without the r prefix, you must write '\\n'; with a raw string you can simply write r'\n', which is much clearer.
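A quick way to see the difference, sketched in Python 3 with illustrative strings: the raw string keeps the backslash, which matters for sequences such as \b, where a plain string literal produces a backspace character instead of a word boundary.

```python
import re

plain = '\n'   # one character: a newline
raw = r'\n'    # two characters: a backslash and 'n'
print(len(plain), len(raw), raw == '\\n')   # 1 2 True

# In a plain literal, '\b' becomes a backspace character (\x08),
# so the pattern no longer means "word boundary".
with_raw = re.search(r'\bcat\b', 'a cat here')
without_raw = re.search('\bcat\b', 'a cat here')
print(with_raw is not None, without_raw is not None)   # True False
```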

Example: setting a flag

    import re

    # r2 = re.compile(r'\n$', re.S)
    # r2 = re.compile('\n$', re.S)
    r2 = re.compile('world$', re.I)
    # '$' also matches just before a trailing newline, so this succeeds
    if r2.search('helloworld\n'):
        print('search succeeds')
    else:
        print('search fails')

Example: calling the module-level function directly

    import re

    if re.search(r'abc', 'helloaaabcdworldn'):
        print('search succeeds')
    else:
        print('search fails')

split
re.split(pattern, string[, maxsplit=0, flags=0])
Pattern.split(string[, maxsplit=0])
Function: splits the string at every match of the regular expression and returns the pieces as a list
Example: simple IP address parsing

    #!/usr/bin/env python3
    import re

    r1 = re.compile(r'\W+')
    print(r1.split('192.168.1.1'))
    # a capturing group keeps the separators in the result
    print(re.split(r'(\W+)', '192.168.1.1'))
    # maxsplit=1: split only at the first separator
    print(re.split(r'(\W+)', '192.168.1.1', maxsplit=1))

The results are as follows:
['192', '168', '1', '1']
['192', '.', '168', '.', '1', '.', '1']
['192', '.', '168.1.1']
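Two further split behaviours are worth knowing; the inputs below are hypothetical. A character class can cover several separators at once, and a separator at either end of the string yields an empty string at that end. In Python 3.13 and later, passing maxsplit positionally to re.split() is deprecated, so this sketch uses the keyword form.

```python
import re

# One character class covers commas, semicolons and whitespace.
fields = re.split(r'[,;\s]+', 'a, b;c  d')
# Leading and trailing separators produce empty strings at the ends.
parts = re.split(r'\W+', '.192.168.')
# maxsplit passed by keyword.
first = re.split(r'\.', '192.168.1.1', maxsplit=1)

print(fields, parts, first)
```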

findall
re.findall(pattern, string[, flags])
Pattern.findall(string[, pos[, endpos]])
Function: finds all substrings of the string that match the regular expression and returns them as a list
Example: find what is enclosed in [] (greedy and non-greedy search)

    #!/usr/bin/env python3
    import re

    text = "hello[hi]heldfsdsf[iwonder]lo"
    # greedy: .* runs to the last ']' -> ['[hi]heldfsdsf[iwonder]']
    r1 = re.compile(r'(\[.*\])')
    print(r1.findall(text))
    # non-greedy: .*? stops at the first ']' -> ['[hi]', '[iwonder]']
    r1 = re.compile(r'(\[.*?\])')
    print(r1.findall(text))
    print(re.findall(r'[0-9]{2}', 'fdskfj1323jfkdj'))      # ['13', '23']
    print(re.findall(r'([0-9][a-z])', 'fdskfj1323jfkdj'))  # ['3j']
    # bare lookarounds match empty strings at the matching positions -> ['', '']
    print(re.findall(r'(?=www)', 'afdsfwwwfkdjfsdfsdwww'))
    print(re.findall(r'(?<=www)', 'afdsfwwwfkdjfsdfsdwww'))
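When a pattern contains groups, findall() returns the groups rather than the whole match; with more than one group it returns tuples. A short Python 3 sketch with a made-up key=value string:

```python
import re

log = 'alice=30 bob=25 carol=41'

# No groups: whole matches.
pairs = re.findall(r'\w+=\d+', log)
# One group: only that group's text.
names = re.findall(r'(\w+)=\d+', log)
# Several groups: one tuple per match.
items = re.findall(r'(\w+)=(\d+)', log)

print(pairs, names, items)
```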

finditer
re.finditer(pattern, string[, flags])
Pattern.finditer(string[, pos[, endpos]])
Description: similar to findall, but instead of a list of substrings it returns an iterator that yields a match object for each match. Compiled pattern objects provide the same method (second signature above).
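A sketch of finditer() on the same string as the findall() example (Python 3). Because it yields match objects, each result carries its position as well as its text.

```python
import re

text = 'hello[hi]heldfsdsf[iwonder]lo'
pattern = re.compile(r'\[(.*?)\]')

# Each item is a match object; group(1) is the text inside the brackets.
for m in pattern.finditer(text):
    print(m.group(1), m.span())

spans = [m.span() for m in pattern.finditer(text)]
inner = [m.group(1) for m in pattern.finditer(text)]
```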

sub
re.sub(pattern, repl, string[, count, flags])
Pattern.sub(repl, string[, count=0])
Description: finds every substring of string that matches pattern and replaces it with repl. If nothing matches, the string is returned unchanged. repl can be either a string or a function. The optional count limits the number of replacements (0, the default, means replace all).
Example:

    #!/usr/bin/env python3
    import re

    p = re.compile(r'(one|two|three)')
    # count=2: only the first two matches are replaced
    print(p.sub('num', 'one word two words three words apple', count=2))
    # -> num word num words three words apple
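Since repl can be a string containing a group backreference or a function, here is a small Python 3 sketch of both on the same sentence. The digits mapping is a hypothetical helper introduced only for this illustration.

```python
import re

p = re.compile(r'(one|two|three)')
text = 'one word two words three words apple'

# \1 in repl reinserts the text captured by group 1.
bracketed = p.sub(r'[\1]', text)

# A function repl receives each match object and returns the replacement.
digits = {'one': '1', 'two': '2', 'three': '3'}  # hypothetical mapping
numbered = p.sub(lambda m: digits[m.group(1)], text)

print(bracketed)
print(numbered)
```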

subn
re.subn(pattern, repl, string[, count, flags])
Pattern.subn(repl, string[, count=0])
Description: works like sub(), but returns a tuple (new_string, number_of_substitutions) instead of just the new string. Compiled pattern objects provide the same method.
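A minimal subn() sketch in Python 3, reusing the pattern and sentence from the sub() example:

```python
import re

p = re.compile(r'(one|two|three)')
text = 'one word two words three words apple'

# subn() returns the new string together with the number of replacements.
new_text, count = p.subn('num', text)
# count limits replacements just as it does for sub().
limited = p.subn('num', text, count=2)

print(new_text, count)
print(limited)
```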
