How to Use Regular Expressions Gracefully in Python 3 (Part 2)

Last Update: 2015-01-12 Source: Internet

Author: User

Using regular expressions

Now let's start writing some simple regular expressions. Python provides an interface to the regular expression engine through the re module. The module lets you compile regular expressions into pattern objects and then use those objects to perform matches.

The Turtle explains: The re module is written in C, so it is far more efficient than doing the same work yourself with ordinary string methods. Compiling a regular expression improves efficiency further. From here on, "pattern" means the pattern object that a regular expression has been compiled into.

Compiling regular expressions

Compiling a regular expression produces a pattern object. That object has methods for various string operations, such as searching for matches of the pattern or performing string substitutions.

>>> import re
>>> p = re.compile('ab*')
>>> p
re.compile('ab*')
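
This minimal sketch makes the two operations just mentioned concrete. It uses search() to find a match and sub() to perform a substitution. The sample strings are made up for illustration:

```python
import re

# Compile once; the resulting pattern object can be reused.
p = re.compile(r'ab*')

# Searching: find the first place in the string where the pattern matches.
m = p.search('xxabbbyy')
print(m.group())                  # abbb

# Substitution: replace every match with another string.
print(p.sub('-', 'a ab abb b'))   # - - - b
```

The lone 'b' at the end is left alone because the pattern requires an 'a' first.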

re.compile() also accepts an optional flags argument. Flags enable various special features and syntax variations, and we'll cover them later.

Now let's look at a simple example:

>>> p = re.compile('ab*', re.IGNORECASE)
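
Here is a quick sketch of what the flag changes, using the arbitrary test string 'ABB'. The same pattern gives different results when compiled without and with re.IGNORECASE:

```python
import re

# Without flags, matching is case-sensitive: 'ab*' cannot match 'ABB'.
p_plain = re.compile(r'ab*')
print(p_plain.match('ABB'))            # None

# With re.IGNORECASE, upper- and lower-case letters are treated as equal.
p_nocase = re.compile(r'ab*', re.IGNORECASE)
print(p_nocase.match('ABB').group())   # ABB
```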

The regular expression is passed to re.compile() as a string. Regular expressions are not part of Python's core language, so there is no special syntax for writing them; they can only be expressed as strings. (Many applications don't need regular expressions at all, so the Python community saw no need to build them into the core of the language.) Instead, the re module is simply included with Python as a C extension module, just like the socket and zlib modules.

Writing regular expressions as strings keeps Python simple and consistent. However, it also has a downside, which we discuss next.

The troublesome backslash

As mentioned in the previous article, regular expressions use the backslash character '\' in two ways. It gives some ordinary characters a special meaning: for example, \d matches any decimal digit. It also takes away the special meaning of special characters: for example, \[ matches a literal left square bracket '['. This conflicts with Python string literals, which use the backslash for the same purpose.

The Turtle explains: That's a mouthful, but it will make sense once you see an example.
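
This minimal sketch shows the two jobs the backslash does inside a regular expression. The sample strings are arbitrary, and the r prefix marks a raw string, which is explained below:

```python
import re

# \d gives the ordinary letter d a special meaning: "any decimal digit".
digits = re.compile(r'\d').findall('a1b22')
print(digits)                            # ['1', '2', '2']

# \[ removes the special meaning of [, so it matches a literal bracket.
pos = re.compile(r'\[').search('list[0]').start()
print(pos)                               # 4
```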

Suppose you want a regular expression that matches the string '\section', as found in LaTeX files. The backslash is a special character, so to match it literally you must put another backslash in front of it to remove its special meaning. The regular expression therefore has to be written as '\\section'.

But don't forget that Python also gives the backslash a special meaning in ordinary string literals. So to pass '\\section' to re.compile() unchanged, each of its two backslashes has to be escaped once more in the string literal. That means adding two more backslashes:

Characters	Stage
\section	The text string to be matched
\\section	Escaped backslash for re.compile()
"\\\\section"	Escaped backslashes for a Python string literal

In short, to match a single literal backslash you have to write '\\\\' in the string: four backslashes. Regular expressions that use backslashes frequently therefore end up buried in backslashes, which makes the strings very hard to read.
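
The following sketch shows both levels of escaping at work, using a made-up LaTeX snippet as input. The literal "\\\\section" produces the nine characters \\section that re.compile() receives. That regular expression in turn matches a single backslash followed by "section":

```python
import re

# In an ordinary string literal every backslash must be doubled, so the
# literal "\\\\section" produces the 9 characters \\section ...
pattern = "\\\\section"
print(pattern)         # \\section
print(len(pattern))    # 9

# ... and the regex \\section matches one literal backslash plus "section".
text = "Some LaTeX: \\section{Intro}"   # contains a single backslash
m = re.compile(pattern).search(text)
print(m.group())       # \section
```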

The solution is to write regular expressions as Python raw strings: just put an r in front of the string literal (you remember this, right?). Backslashes inside a raw string are not treated specially:

Regular string	Raw string
"ab*"	r"ab*"
"\\\\section"	r"\\section"
"\\w+\\s+\\1"	r"\w+\s+\1"

The Turtle explains: We strongly recommend always writing regular expressions as raw strings.
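
As a sanity check, this sketch compares each row of the table above. It then uses the raw-string form to match a literal backslash. The third pair is only compared as text: compiling \w+\s+\1 on its own would fail, because \1 refers to a group the pattern does not define.

```python
import re

# Each pair holds the same pattern written as a regular literal and as a raw string.
pairs = [
    ("ab*", r"ab*"),
    ("\\\\section", r"\\section"),
    ("\\w+\\s+\\1", r"\w+\s+\1"),
]
for regular, raw in pairs:
    print(regular == raw)        # True (three times)

# With a raw string, two backslashes are enough to match one literal backslash.
m = re.compile(r"\\section").search(r"\section{Intro}")
print(m.group())                 # \section
```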

Performing matches

Compiling a regular expression gives you a pattern object. What can you do with it? Pattern objects have many methods and attributes; the most important ones are listed below:

Method	Purpose
match()	Determines whether the regular expression matches at the beginning of the string
search()	Scans through the string, looking for the first position where the regular expression matches
findall()	Finds all substrings where the regular expression matches and returns them as a list
finditer()	Finds all substrings where the regular expression matches and returns them as an iterator of match objects

match() and search() return None if no match can be found. If the match succeeds, they return a match object holding information about the match: where it starts and ends, the substring it matched, and so on.

Let's try these out step by step in the interactive interpreter:

>>> import re
>>> p = re.compile('[a-z]+')
>>> p
re.compile('[a-z]+')

Now you can try matching various strings against the regular expression [a-z]+.

For example:

>>> p.match("")
>>> print(p.match(""))
None

The empty string doesn't match, because + means "one or more times", so match() returns None.
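
For comparison, this small sketch swaps + for *. The [a-z]* pattern is an illustration, not part of the original walkthrough. Since * means "zero or more times", it succeeds on the empty string with a zero-length match:

```python
import re

plus = re.compile(r'[a-z]+')   # one or more lowercase letters
star = re.compile(r'[a-z]*')   # zero or more lowercase letters

print(plus.match(''))          # None: + needs at least one character
print(star.match('').span())   # (0, 0): * accepts a zero-length match
```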

Now let's try a string that does match:

>>> m = p.match('fishc')
>>> m
<_sre.SRE_Match object; span=(0, 5), match='fishc'>

This time match() returns a match object, which we store in the variable m for later use.

Let's see what the match object contains. Match objects have many methods and attributes; the most important are:

Method	Purpose
group()	Returns the string matched by the regular expression
start()	Returns the starting position of the match
end()	Returns the ending position of the match
span()	Returns a tuple (start, end) giving the position of the match

For example:

>>> m.group()
'fishc'
>>> m.start()
0
>>> m.end()
5
>>> m.span()
(0, 5)

Here start() always returns 0, because match() only checks whether the regular expression matches at the start of the string.
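
The match object's methods are tied together. span() is simply (start(), end()), and group() is the slice of the original string between those positions. Here is a small sketch, assuming the arbitrary input 'fishc123':

```python
import re

text = 'fishc123'
m = re.compile(r'[a-z]+').match(text)

# span() is just (start(), end()) ...
print(m.span() == (m.start(), m.end()))          # True
# ... and group() is the slice of the original string between them.
print(m.group() == text[m.start():m.end()])      # True
print(m.span(), m.group())                       # (0, 5) fishc
```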

The search() method, on the other hand, scans the whole string:

>>> print(p.match('^_^fishc'))
None
>>> m = p.search('^_^fishc')
>>> print(m)
<_sre.SRE_Match object; span=(3, 8), match='fishc'>
>>> m.group()
'fishc'
>>> m.span()
(3, 8)

In practice, the most common style is to store the match object in a variable and then check whether it is None.

It usually looks like this:

import re

p = re.compile( ... )    # ... stands for your regular expression
m = p.match('string goes here')
if m:
    print('Match found: ', m.group())
else:
    print('No match')
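
Here is the same idiom filled in with the [a-z]+ pattern from above and wrapped in a helper. report() is a hypothetical name chosen for this sketch, and the input strings are arbitrary:

```python
import re

p = re.compile(r'[a-z]+')

def report(s):
    """Hypothetical helper: match, then check the result for None."""
    m = p.match(s)
    if m:
        return 'Match found: ' + m.group()
    else:
        return 'No match'

print(report('fishc!'))   # Match found: fishc
print(report('^_^'))      # No match
```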

There are two methods that return all matches: findall() and finditer().

findall() returns a list:

>>> p = re.compile(r'\d+')
>>> p.findall('3 little turtles, 15 legs, where did the extra 3 come from?')
['3', '15', '3']

findall() has to build the complete list before it returns, whereas finditer() returns the match objects through an iterator:

>>> iterator = p.finditer('3 little turtles, 15 legs, where did the extra 3 come from?')
>>> iterator
<callable_iterator object at 0x10511b588>
>>> for match in iterator:
...     print(match.span())
...
(0, 1)
(18, 20)
(47, 48)

The Turtle explains: If there are many matches, returning an iterator is much more efficient, because the full list never has to be built in memory. For more about iterators, see "Learning Python from Scratch" 048 | Magic Methods: Iterators.
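
This sketch shows how the two methods relate. It checks that collecting group() from each finditer() match gives the same list as findall(). The equality holds for a pattern without groups, such as \d+. If the pattern contains groups, findall() returns the group contents instead.

```python
import re

p = re.compile(r'\d+')
text = '3 little turtles, 15 legs, where did the extra 3 come from?'

# findall() builds the complete list of matched strings before returning.
all_at_once = p.findall(text)

# finditer() hands out match objects one at a time; collect their text.
one_by_one = [m.group() for m in p.finditer(text)]

print(all_at_once)                  # ['3', '15', '3']
print(all_at_once == one_by_one)    # True
```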

This article is an English version of an article originally published in Chinese on aliyun.com.
