Python 3: How to Use Regular Expressions Gracefully (Part 2)

Source: Internet
Author: User

Using regular expressions

Now let's start writing some simple regular expressions. Python provides an interface to the regular expression engine through the re module, which lets you compile regular expressions into pattern objects and use them for matching.

Little Turtle explains: The matching engine behind the re module is written in C, so it runs efficiently, and compiling a regular expression once with compile() saves repeated work when the same pattern is reused. From here on we will often refer to a "pattern", which means the pattern object that a regular expression is compiled into.


Compiling regular expressions

A regular expression is compiled into a pattern object, which has various methods for working with strings, such as searching for pattern matches or performing string substitutions.

    >>> import re
    >>> p = re.compile('ab*')
    >>> p
    re.compile('ab*')
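As a small sketch of the "searching" and "substitution" methods just mentioned, the snippet below calls findall() and sub() on the same compiled pattern. The sample text is made up for illustration only.

```python
import re

# Compile the pattern once, then reuse the pattern object.
p = re.compile('ab*')

# Sample text invented for this sketch.
text = 'abbb a ab'

# findall() collects every substring the pattern matches.
found = p.findall(text)
print(found)      # ['abbb', 'a', 'ab']

# sub() replaces every match with the given replacement string.
replaced = p.sub('X', text)
print(replaced)   # X X X
```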


re.compile() also accepts an optional flags argument, which turns on various special features and syntax variations. We'll cover these later.

Now let's look at a simple example:

    >>> p = re.compile('ab*', re.IGNORECASE)
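To see what the flag changes, here is a minimal sketch that compiles the same pattern with and without re.IGNORECASE. The variable names and the sample string are assumptions chosen for illustration.

```python
import re

# The same pattern, compiled with and without the IGNORECASE flag.
strict = re.compile('ab*')
relaxed = re.compile('ab*', re.IGNORECASE)

# Sample input invented for this sketch.
text = 'ABBB'

# Without the flag, uppercase 'A' does not match lowercase 'a'.
print(strict.match(text))            # None

# With the flag, letter case is ignored.
print(relaxed.match(text).group())   # ABBB
```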

The regular expression is passed to re.compile() as a string argument. Regular expressions are not part of the core Python language, so there is no special syntax for them, and they can only be written as strings. (Some applications don't need regular expressions at all, so the Python community didn't think it necessary to build them into the core language.) Instead, the re module is simply a C extension module included with Python, like the socket and zlib modules.

Using strings to represent regular expressions keeps Python's style simple and consistent, but it has a downside, which we discuss next.


The troublesome backslash

As mentioned in the previous article, regular expressions use the backslash character '\' either to give some ordinary characters a special meaning (for example, \d matches any decimal digit) or to strip special characters of their special meaning (for example, \[ matches a literal left square bracket '['). This conflicts with Python's own use of the same character for the same purpose in string literals.

Little Turtle explains: That's a mouthful, but it will make sense as you read on.

Suppose you need to use a regular expression to match the string '\section' in a LaTeX file. Because the backslash is a special character in regular expressions, you have to put another backslash in front of it to remove its special meaning. So the regular expression itself is written as '\\section'.

But don't forget that Python also uses backslashes in string literals for special meanings. So, to pass '\\section' intact to re.compile(), we need to add two more backslashes:

Characters         Stage
\section           The string to be matched
\\section          The regular expression uses '\\' to match the character '\'
"\\\\section"      Unfortunately, a Python string literal also needs '\\' to represent each '\'


In short, to match a single backslash character we need to write four backslashes in a regular Python string. Heavy use of backslashes in a regular expression therefore leads to a backslash storm that makes the string extremely hard to read.

The solution is to write regular expressions as Python raw strings (just put an r in front of the string, remember?):

Regular string     Raw string
"ab*"              r"ab*"
"\\\\section"      r"\\section"
"\\w+\\s+\\1"      r"\w+\s+\1"


Little Turtle explains: Using raw strings for regular expressions is strongly recommended.
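The equivalence of the two spellings can be checked directly. The following is a minimal sketch; the sample LaTeX line and the variable names are assumptions made up for illustration.

```python
import re

# Sample text: one backslash followed by the word "section".
text = r'\section{Introduction}'

# Both literals produce the same regular expression: two backslashes + "section".
regular = "\\\\section"
raw = r"\\section"
print(regular == raw)   # True
print(len(raw))         # 9

# The regular expression matches the single literal backslash in the text.
m = re.search(raw, text)
print(m.group())        # \section
```

The raw string is shorter and shows the regular expression exactly as the engine will see it, which is why it is the easier form to read.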


Performing matches

Once you have compiled a regular expression, you have a pattern object. What can you do with it? Pattern objects have many methods and attributes; the most important ones are listed below:

Method        Function
match()       Determines whether the regular expression matches at the beginning of the string
search()      Scans through the string for the first location where the regular expression matches
findall()     Finds all substrings where the regular expression matches and returns them as a list
finditer()    Finds all substrings where the regular expression matches and returns them as an iterator


If no match is found, match() and search() return None. If the match succeeds, they return a match object that holds all the information about the match: where it starts, where it ends, the substring it matched, and so on.


Let's walk through this step by step:

    >>> import re
    >>> p = re.compile('[a-z]+')
    >>> p
    re.compile('[a-z]+')


Now you can try matching various strings against the regular expression [a-z]+.

For example:

    >>> p.match("")
    >>> print(p.match(""))
    None


The empty string does not match, because + means one or more repetitions. So match() returns None.

Now let's try a string that does match:

    >>> m = p.match('fishc')
    >>> m
    <_sre.SRE_Match object; span=(0, 5), match='fishc'>


In this example, match() returns a match object, which we store in the variable m for later use.


Let's take a look at what's inside the match object. Match objects have many methods and attributes; the most important are:

Method        Function
group()       Returns the string that was matched
start()       Returns the starting position of the match
end()         Returns the ending position of the match
span()        Returns a tuple (start, end) giving the position of the match


For example:

    >>> m.group()
    'fishc'
    >>> m.start()
    0
    >>> m.end()
    5
    >>> m.span()
    (0, 5)


Here start() always returns 0, because match() only checks whether the regular expression matches at the start of the string.

With the search() method, however, the result can differ:

    >>> print(p.match('^_^fishc'))
    None
    >>> m = p.search('^_^fishc')
    >>> print(m)
    <_sre.SRE_Match object; span=(3, 8), match='fishc'>
    >>> m.group()
    'fishc'
    >>> m.span()
    (3, 8)


In real programs, the most common approach is to store the match object in a variable and then check whether it is None.

The form is usually as follows:

    p = re.compile(...)
    m = p.match('string goes here')
    if m:
        print('Match found: ', m.group())
    else:
        print('No match')
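The same check can be wrapped in a small function. This is only a sketch: describe() and WORD are hypothetical names invented here, not part of the re module, and the pattern is the [a-z]+ example used earlier.

```python
import re

# Hypothetical module-level pattern, compiled once and reused.
WORD = re.compile(r'[a-z]+')

def describe(text):
    """Report whether text begins with one or more lowercase letters."""
    m = WORD.match(text)
    if m is None:
        # match() found nothing at the start of the string.
        return 'No match'
    return 'Match found: ' + m.group()

print(describe('fishc'))      # Match found: fishc
print(describe('^_^fishc'))   # No match
```

Testing "m is None" explicitly says the same thing as "if m:" here, because a match object is always truthy.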


Two methods return all the matches: findall() and finditer().

findall() returns a list:

    >>> p = re.compile(r'\d+')
    >>> p.findall('3 little turtles, 15 legs, where are the extra 3?')
    ['3', '15', '3']


findall() has to build the complete list before it returns, whereas finditer() returns the match objects as an iterator:

    >>> iterator = p.finditer('3 little turtles, 15 legs, and 3 more.')
    >>> iterator
    <callable_iterator object at 0x10511b588>
    >>> for match in iterator:
    ...     print(match.span())
    ...
    (0, 1)
    (18, 20)
    (32, 33)


Little Turtle explains: If the list of results is large, returning an iterator is much more efficient. For more on iterators, see "Zero-Basics Introduction to Learning Python", 048 | Magic Methods: Iterators.
