As a concept, regular expressions are not unique to Python. However, regular expressions in Python have some minor differences in actual use.
This article is the first in a series about Python regular expressions. In it, we'll focus on using regular expressions in Python and highlight some of the features that are unique to Python.
We'll introduce some of the ways to search for and find strings in Python. Then we'll talk about how to use grouping to handle the subcomponents of the match objects we find.
The Python module for regular expressions that we are interested in is called 're'.
>>> import re
1. Raw strings in Python
The Python compiler uses '\' (backslash) to represent an escape character in a string constant.
If the backslash is followed by a special character that the compiler recognizes, the entire escape sequence is replaced with the corresponding special character (for example, '\n' is replaced by the compiler with a newline character).
But this poses a problem for using regular expressions in Python, because the backslash is also used in the 're' module to escape special characters (such as * and +) in regular expressions.
The combination of these two mechanisms means that sometimes you have to escape the escape character itself (when the special character is recognized by both the Python compiler and the regular expression compiler), but at other times you don't (when the special character is recognized only by the Python compiler).
Instead of spending effort on figuring out how many backslashes are needed, we can use raw strings.
A raw string is created simply by adding the character 'r' before the opening quotation mark of a normal string. When a string is raw, the Python compiler does not attempt any substitutions on it. Essentially, you're telling the compiler not to interfere with your string at all.
>>> string = 'This is a\nnormal string'
>>> rawstring = r'and this is a\nraw string'
>>> print(string)
This is a
normal string
>>> print(rawstring)
and this is a\nraw string
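To see why raw strings matter for regular expressions in particular, the following minimal sketch compares two spellings of the same pattern. The variable names and the sample text are illustrative assumptions, not part of the original article.

```python
import re

# In a normal string, '\n' is one character (a newline);
# in a raw string, r'\n' is two characters (a backslash and the letter n).
normal = 'a\nb'
raw = r'a\nb'
print(len(normal))  # 3
print(len(raw))     # 4

# The same regular expression written without and with a raw string.
# Without the r prefix, every backslash meant for the regex is doubled.
escaped_pattern = '\\d+\\.\\d+'
raw_pattern = r'\d+\.\d+'
print(escaped_pattern == raw_pattern)  # True

# Either spelling matches a decimal number such as '3.14'.
print(re.search(raw_pattern, 'pi is roughly 3.14').group(0))  # 3.14
```

Both spellings produce the identical pattern; the raw form is simply easier to read and harder to get wrong.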
Searching with regular expressions in Python
The 're' module provides several methods for searching an input string. The methods we will be discussing are:
re.match()
re.search()
re.findall()
Each method takes a regular expression and a string in which to look for a match. Let's look at each of these methods in more detail to see how they work and how they differ.
2. Using re.match: matching at the start
Let's take a look at the match() method first. The match() method finds a match only if the pattern matches at the very start of the string being searched.
For example, calling the match() method on the string 'dog cat dog' with the pattern 'dog' finds a match:
>>> re.match(r'dog', 'dog cat dog')
<_sre.SRE_Match object at 0xb743e720>
>>> match = re.match(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
We'll talk more about the group() method later. For now we just need to know that calling it with 0 as its argument returns the text matched by the pattern.
I'm also going to skip over the returned SRE_Match object for now; we'll discuss it soon.
However, if we call the match() method on the same string with the pattern 'cat', no match is found:
>>> re.match(r'cat', 'dog cat dog')
>>>
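The empty prompt above reflects that match() returned None. Calling group() on None raises an AttributeError, so code outside the interactive shell usually checks the result first. A minimal sketch, where the helper name starts_with is a hypothetical chosen for illustration:

```python
import re

def starts_with(pattern, text):
    """Return the matched text if pattern matches at the start of text, else None."""
    match = re.match(pattern, text)
    # re.match returns None when nothing matches, so check before calling group().
    if match is None:
        return None
    return match.group(0)

print(starts_with(r'dog', 'dog cat dog'))  # dog
print(starts_with(r'cat', 'dog cat dog'))  # None
```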
3. Using re.search: matching anywhere
The search() method is similar to match(), but it does not restrict us to finding matches at the beginning of the string, so searching for 'cat' in our sample string finds a match:
>>> match = re.search(r'cat', 'dog cat dog')
>>> match.group(0)
'cat'
However, the search() method stops looking after it finds a match, so searching for 'dog' in our sample string with the search() method finds only the first occurrence:
>>> match = re.search(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
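The difference between the two methods can be put side by side in a short sketch. The alternation pattern 'cat|dog' (meaning "cat or dog") is an addition for illustration and is not used elsewhere in this article.

```python
import re

text = 'dog cat dog'

# match() only looks at the start of the string, so 'cat' is not found.
print(re.match(r'cat', text) is None)    # True

# search() scans forward through the string and finds 'cat' in the middle.
print(re.search(r'cat', text).group(0))  # cat

# search() stops at the first position where the pattern matches.
# With the alternation 'cat|dog', the leftmost hit is the 'dog' at index 0.
first = re.search(r'cat|dog', text)
print(first.group(0))                    # dog
```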
4. Using re.findall: all matches
The search method I've used most often in Python so far is findall(). When we call the findall() method, we simply get a list of all the matching substrings rather than a match object (we'll talk more about match objects in the next section). I find that easier to work with. Calling the findall() method on our sample string gives:
>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
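Because findall() returns an ordinary list, the usual list operations apply to its result. The sketch below also shows the no-match case, using 'bird' as an arbitrary pattern that does not occur in the sample string.

```python
import re

text = 'dog cat dog'

dogs = re.findall(r'dog', text)
print(dogs)       # ['dog', 'dog']
print(len(dogs))  # 2

# When nothing matches, findall() returns an empty list rather than None,
# so the result can be looped over without a separate check.
birds = re.findall(r'bird', text)
print(birds)      # []
```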
5. Using the match.start and match.end methods
So what exactly is the 'match' object that the search() and match() methods returned to us earlier?
Rather than simply returning the matching part of the string, search() and match() return a match object, which is a wrapper class around the matching substring.
Earlier you saw that I could get the matching substring by calling the group() method (as we'll see in the next section, match objects are especially useful when dealing with groups), but the match object also contains more information about the matching substring.
For example, the match object can tell us where the matched content begins and ends in the original string:
>>> match = re.search(r'dog', 'dog cat dog')
>>> match.start()
0
>>> match.end()
3
Knowing this information is sometimes very useful.
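One place the positions come in handy is slicing the original string. The following is a small sketch under the assumption that we want to swap the matched word for another one by hand; the replacement word 'bird' is arbitrary.

```python
import re

text = 'dog cat dog'
match = re.search(r'cat', text)

# start() is the index of the first matched character,
# end() is the index just past the last matched character.
start, end = match.start(), match.end()
print(start, end)       # 4 7
print(text[start:end])  # cat

# span() returns both positions as a tuple.
print(match.span())     # (4, 7)

# The positions let us rebuild the string around the match.
replaced = text[:start] + 'bird' + text[end:]
print(replaced)         # dog bird dog
```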
6. Using match.group to group by number
As I mentioned earlier, match objects are handy when working with groups.
Grouping is the ability to address a specific part of the text matched by a regular expression. We can define a group as part of the overall regular expression and then separately retrieve the content that corresponds to that part.
Let's take a look at how it works:
>>> contactInfo = 'Doe, John: 555-1212'
The string I just created resembles a fragment taken out of someone's address book. We can match this line with a regular expression:
>>> re.search(r'\w+, \w+: \S+', contactInfo)
<_sre.SRE_Match object at 0xb74e1ad8>
By enclosing specific parts of the regular expression in parentheses (the characters '(' and ')'), we can group the content and then handle each subgroup separately.
>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)
These groups can be retrieved with the group() method of the match object. They are addressed by the numeric order in which they appear in the regular expression, from left to right (starting from 1):
>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'
The reason group numbering starts at 1 is that group 0 is reserved for the entire match (we saw this earlier when we looked at the match() and search() methods).
>>> match.group(0)
'Doe, John: 555-1212'
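When all the groups are needed at once, the match object's groups() method returns them as a tuple, and group() also accepts several numbers in one call. Neither is covered in the text above, so treat this as a supplementary sketch; the snake_case variable names are illustrative.

```python
import re

contact_info = 'Doe, John: 555-1212'
match = re.search(r'(\w+), (\w+): (\S+)', contact_info)

# groups() returns every numbered group (1, 2, 3, ...) as a tuple.
print(match.groups())      # ('Doe', 'John', '555-1212')

# The tuple can be unpacked directly into variables.
last, first, phone = match.groups()
print(first, last, phone)  # John Doe 555-1212

# group() with several arguments returns a tuple in the order requested.
print(match.group(2, 1))   # ('John', 'Doe')
```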
7. Using match.group to group by name
Sometimes, especially when a regular expression has many groups, it becomes impractical to address a group by its position. Python also lets you give a group a name, using the following syntax:
>>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)
We can still use the group() method to get the content of a group, but this time we pass the name we specified rather than the group's position number.
>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'
This greatly enhances the clarity and readability of the code. You can imagine that as regular expressions become more complex, it becomes harder and harder to understand what each group captures. Naming your groups makes your intentions clear to you and your readers.
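Named groups can also be collected in one step with the match object's groupdict() method, which returns a dictionary keyed by group name. This method is not discussed in the article itself, so the following is only a supplementary sketch.

```python
import re

contact_info = 'Doe, John: 555-1212'
pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'
match = re.search(pattern, contact_info)

# groupdict() maps each group name to the text it captured.
fields = match.groupdict()
print(fields['first'])  # John
print(fields['phone'])  # 555-1212

# Named groups are still numbered, so both ways of addressing them agree.
print(match.group(1) == match.group('last'))  # True
```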
Although the findall() method does not return match objects, it can also make use of groups. In that case, the findall() method returns a list of tuples in which the nth element of each tuple corresponds to the nth group in the regular expression.
>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]
However, group names are of no use with the findall() method: the pattern may contain named groups, but the result is still a list of plain tuples addressed by position.
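If both "every match" and "access by name" are wanted, one possible workaround is re.finditer(), which yields a match object for each match instead of a tuple. This goes beyond what the article covers, and the two-line address book below is made-up sample data.

```python
import re

# Hypothetical sample data: two address book entries, one per line.
address_book = 'Doe, John: 555-1212\nRoe, Jane: 555-3434'
pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'

# findall() accepts the named groups but still returns positional tuples.
tuples = re.findall(pattern, address_book)
print(tuples)  # [('Doe', 'John', '555-1212'), ('Roe', 'Jane', '555-3434')]

# finditer() yields match objects, so the group names remain usable.
records = [m.groupdict() for m in re.finditer(pattern, address_book)]
print(records[1]['first'])  # Jane
```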
In this article we introduced some of the basics of using regular expressions in Python. We learned about raw strings (and how they spare us some of the headaches of using regular expressions). We also learned how to use the match(), search(), and findall() methods for basic searches, and how to use groups to handle the subcomponents of match objects.
As always, if you want to read more about this topic, the official Python documentation for the re module is a very good resource.
In future articles, we'll discuss the application of regular expressions in Python in more depth. We'll take a closer look at match objects, learn how to use them to replace parts of strings, and even use them to parse Python data structures from text files.