Regular expressions as a concept are not unique to Python, but Python's implementation has a few minor differences in practical use.
This is the first article in a series about Python regular expressions. Here we focus on how to use regular expressions in Python and highlight some features that are specific to Python.
We'll cover some of the ways Python searches for and locates strings. Then we'll look at how to use grouping to work with the sub-parts of the matches we find.
The Python module for regular expressions that we are interested in is called 're'.
>>> import re
1. Python raw strings
Python uses '\' (backslash) as the escape character in string literals.
If the backslash is followed by a character sequence that Python recognizes, the entire escape sequence is replaced by the corresponding special character (for example, '\n' is replaced by a newline character).
This poses a problem for regular expressions, because the 're' module also uses backslashes to escape special characters (such as * and +) in patterns.
Sometimes you have to escape the escape character itself (when the sequence is recognized by both Python and the regular expression engine), and at other times you do not (when the sequence is recognized only by Python).
Instead of working out how many backslashes we need, we can use a raw string.
A raw string is created by placing the single character 'r' in front of the opening quote of an ordinary string literal. When a string is raw, Python does not attempt to make any substitutions in it. In essence, you are telling Python not to interfere with your string at all.
>>> string = 'This is a\nnormal string'
>>> rawString = r'and this is a\nraw string'
>>> print(string)
This is a
normal string
>>> print(rawString)
and this is a\nraw string
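The clash between the two levels of escaping can be seen with '\b', which means "backspace" to Python but "word boundary" to the 're' module. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
import re

text = 'dog cat dog'

# In an ordinary string, Python turns each \b into a backspace character
# before the re module ever sees the pattern.
plain_pattern = '\bcat\b'
# In a raw string, the backslashes reach the re module untouched,
# where \b means "word boundary".
raw_pattern = r'\bcat\b'

plain_result = re.search(plain_pattern, text)  # text contains no backspaces
raw_result = re.search(raw_pattern, text)      # 'cat' as a whole word

# Without a raw string, every backslash meant for re has to be doubled.
doubled_pattern = '\\bcat\\b'
same_pattern = (doubled_pattern == raw_pattern)

print(plain_result)         # None
print(raw_result.group(0))  # cat
print(same_pattern)         # True
```

The doubled form and the raw form build the same pattern; the raw form is simply easier to read.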
Use regular expressions in Python for lookups
The 're' module provides several methods for searching an input string. The methods we will discuss are:
re.match()
re.search()
re.findall()
Each method takes a regular expression and a string to search. Let's examine each of these methods in more detail to understand how they work and how they differ from one another.
2. Find using re.match - matching at the start
Let's take a look at the match() method first. match() finds a match only when the pattern matches at the beginning of the searched string.
For example, calling match() on the string 'dog cat dog' to look for the pattern 'dog' finds a match:
>>> re.match(r'dog', 'dog cat dog')
<_sre.SRE_Match object at 0xb743e720>
>>> match = re.match(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
We will talk more about the group() method later. For now, we just need to know that we called it with 0 as its argument, and that it returned the text matched by the pattern.
I'm also skipping over the SRE_Match object for the moment; we'll talk about it soon.
However, if we call match() on the same string and look for the pattern 'cat', we will not find a match:
>>> re.match(r'cat', 'dog cat dog')
>>>
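The empty result above is None, so code that uses match() generally has to check for it before calling group(). A minimal sketch follows; the helper name match_at_start is hypothetical and not part of the 're' module.

```python
import re

def match_at_start(pattern, text):
    """Return the matched text if pattern matches at the start of text, else None."""
    match = re.match(pattern, text)
    if match is None:
        # Calling match.group(0) here would raise AttributeError.
        return None
    return match.group(0)

print(match_at_start(r'dog', 'dog cat dog'))  # dog
print(match_at_start(r'cat', 'dog cat dog'))  # None
```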
3. Find using re.search - matching at any location
The search() method is similar to match(), but it does not restrict us to matches at the beginning of the string, so looking up 'cat' in our sample string finds a match:
>>> match = re.search(r'cat', 'dog cat dog')
>>> match.group(0)
'cat'
However, the search() method stops searching after it finds a match, so using search() to find 'dog' in our sample string only finds its first occurrence.
>>> match = re.search(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
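The difference between the two methods, and the fact that search() reports only the first match, can be sketched side by side. The pattern r'\w+' (one or more word characters) is used here only so that several substrings could match.

```python
import re

text = 'dog cat dog'

at_start = re.match(r'cat', text)     # 'cat' is not at the beginning
anywhere = re.search(r'cat', text)    # search() scans the whole string

# r'\w+' could match 'dog', 'cat' or 'dog'; search() returns only the first.
first_word = re.search(r'\w+', text)

print(at_start)             # None
print(anywhere.group(0))    # cat
print(first_word.group(0))  # dog
```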
4. Find using re.findall - all matches
The search method I've used most in Python so far is findall(). When we call findall(), we get a list of all matching substrings instead of a match object (we'll talk more about match objects in the next section). I find that easier to work with. Calling findall() on the sample string gives:
>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
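Because findall() returns an ordinary list, the usual list operations apply, and a pattern that matches nothing gives an empty list rather than None. A short sketch (the pattern 'bird' is just an example of something absent from the string):

```python
import re

text = 'dog cat dog'

dogs = re.findall(r'dog', text)
birds = re.findall(r'bird', text)   # no match anywhere in the string

print(len(dogs))  # 2
print(birds)      # []
```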
5. Use match.start and match.end methods
So what exactly are the 'match' objects that search() and match() returned to us earlier?
Rather than simply returning the matched part of the string, search() and match() return a "match object", which is a wrapper class around the matched substring.
Earlier you saw that I could get the matched substring by calling the group() method (as we will see in the next section, match objects are especially useful when dealing with groups), but a match object also holds more information about the matched substring.
For example, the match object can tell us where the matched content starts and ends in the original string:
>>> match = re.search(r'dog', 'dog cat dog')
>>> match.start()
0
>>> match.end()
3
Knowing this information is sometimes useful.
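One way to see what these positions mean is to use them to slice the original string. The sketch below searches for 'cat' so that the offsets are not zero; span() is the match object method that returns both positions at once.

```python
import re

text = 'dog cat dog'
match = re.search(r'cat', text)

start = match.start()   # index of the first matched character
end = match.end()       # index just past the last matched character

print(start)            # 4
print(end)              # 7
print(text[start:end])  # cat
print(match.span())     # (4, 7)
```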
6. Use match.group to group by number
As I mentioned earlier, match objects are very handy when working with groups.
Grouping is the ability to address specific sub-parts of the text matched by a regular expression. We can define a group as a part of the entire regular expression, and then retrieve the text matched by that part on its own.
Let's see how it works:
>>> contactInfo = 'Doe, John: 555-1212'
The string I just created is similar to a snippet taken from someone's address book. We can match this line with a regular expression like this:
>>> re.search(r'\w+, \w+: \S+', contactInfo)
<_sre.SRE_Match object at 0xb74e1ad8>
By surrounding specific parts of a regular expression with parentheses (the '(' and ')' characters), we can group the content and then process these subgroups separately.
>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)
These groups can be obtained by using the group() method of the match object. They are numbered (starting from 1) in the order they appear in the regular expression, from left to right:
>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'
The group numbering starts at 1 because group 0 is reserved for the entire match (we saw it earlier when we looked at the match() and search() methods).
>>> match.group(0)
'Doe, John: 555-1212'
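When several groups are needed at once, calling group() repeatedly is not the only option. As a small sketch, the match object's groups() method returns all numbered groups as a tuple, and group() also accepts several numbers in one call. The variable names below are illustrative.

```python
import re

contact_info = 'Doe, John: 555-1212'
match = re.search(r'(\w+), (\w+): (\S+)', contact_info)

# All groups from 1 upward, in order, as a tuple.
all_groups = match.groups()
# Several groups in one call, in whatever order is requested.
first_then_last = match.group(2, 1)

# Tuple unpacking gives each part its own name.
last, first, phone = all_groups

print(all_groups)       # ('Doe', 'John', '555-1212')
print(first_then_last)  # ('John', 'Doe')
print(phone)            # 555-1212
```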
7. Use match.group to group by name
Occasionally, especially when a regular expression has many groups, referring to groups by their position becomes impractical. Python also lets you give a group a name, using the following syntax:
>>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)
We can still use the group() method to get the contents of a group, but this time we pass the group name we specified instead of the position number used previously.
>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'
This greatly improves the clarity and readability of the code. As you can imagine, as regular expressions get more complex, it becomes harder to understand what each group captures. Naming your groups tells you and your readers exactly what you intended.
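Named groups can also be collected in one step. The following sketch uses the match object's groupdict() method, which returns a dictionary keyed by group name; it also shows that a named group keeps its position number.

```python
import re

contact_info = 'Doe, John: 555-1212'
match = re.search(
    r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contact_info)

# Every named group as a name -> text mapping.
fields = match.groupdict()

# A named group can still be reached by its number.
same_group = (match.group(1) == match.group('last'))

print(fields['first'])  # John
print(fields['phone'])  # 555-1212
print(same_group)       # True
```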
Although findall() does not return match objects, it can also make use of groups. When the pattern contains groups, findall() returns a list of tuples, where the Nth element of each tuple corresponds to the Nth group in the regular expression.
>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]
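However, group names have no effect on the result of findall(): the pattern may contain named groups, but the result is still a list of plain tuples indexed by position.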
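If the names are needed for every match, one possible workaround (not covered in this article, so treat it as a sketch) is re.finditer(), which yields a match object for each match instead of a tuple. The second address book entry below is made-up sample data.

```python
import re

# Two entries; the second one is invented for this example.
address_book = 'Doe, John: 555-1212\nRoe, Jane: 555-3434'
pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'

# findall() ignores the names and returns positional tuples.
as_tuples = re.findall(pattern, address_book)

# finditer() yields match objects, so groupdict() is available for each one.
as_dicts = [m.groupdict() for m in re.finditer(pattern, address_book)]

print(as_tuples[1])          # ('Roe', 'Jane', '555-3434')
print(as_dicts[1]['phone'])  # 555-3434
```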
In this article we covered some of the basics of using regular expressions in Python. We learned about raw strings (and how they spare you some of the headaches of using regular expressions). We also learned how to use the match(), search(), and findall() methods for basic queries, and how to use groups to handle the sub-parts of a match.
As always, the Python official documentation for the re module is a great resource if you want to find out more about this topic.
In a future article, we'll dig deeper into the use of regular expressions in Python. We'll learn more about match objects, see how to use regular expressions to replace text in strings, and even use them to parse Python data structures from text files.