A detailed example of using regular expressions in Python

Source: Internet
Author: User
As a concept, regular expressions are not unique to Python. However, regular expressions in Python do have some small differences in practice.

This article is part of a series on Python regular expressions. In this first article of the series, we'll focus on how to use regular expressions in Python and highlight some of the features that are unique to Python.

We'll cover some of the ways to search for and find strings in Python. Then we'll talk about how to use grouping to work with the subcomponents of the matches we find.

Python's regular expression support lives in a module called 're':

>>> import re

1. Raw strings in Python

In Python string literals, the backslash (\) introduces an escape sequence.

If the backslash is followed by characters that Python recognizes as an escape sequence, the whole sequence is replaced with the corresponding special character (for example, \n is replaced with a newline character).

This causes a problem when writing regular expressions in Python, because the 're' module also uses the backslash, to escape characters that have a special meaning in a regular expression (such as * and +).

Combined, these two rules mean that sometimes you have to escape the backslash itself (when the escape sequence is recognized by both Python and the regular expression engine), while at other times you do not (when the sequence is recognized only by Python). For example, \b means backspace to Python but a word boundary to the regular expression engine, so in a normal string the word-boundary pattern must be written '\\b'.

Rather than working out how many backslashes we need each time, we can use raw strings instead.

A raw string is created simply by putting the letter r in front of the opening quotation mark of a normal string literal. In a raw string, Python does not process escape sequences: backslashes are kept exactly as written. Essentially, you're telling Python not to interfere with your string at all.

>>> string = 'This is a\nnormal string'
>>> rawstring = r'and this is a\nraw string'
>>> print(string)
This is a
normal string
>>> print(rawstring)
and this is a\nraw string

That's what a raw string looks like: the \n is kept as a backslash followed by n instead of being turned into a newline.
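
The backslash rules above are easier to see in code. The following sketch (variable names are illustrative only) compares the lengths of a normal and a raw \n, and shows that a pattern matching one literal backslash needs four backslashes in a normal literal but only two in a raw one:

```python
import re

# In a normal literal, '\n' is one character (a newline);
# in a raw literal, r'\n' is two characters: a backslash and 'n'.
normal = '\n'
raw = r'\n'
print(len(normal), len(raw))  # 1 2

# To match one literal backslash, the regex engine must receive \\ (two chars).
# In a normal literal each of those backslashes must be escaped again for
# Python, so the source needs four; a raw literal needs only two.
pattern_normal = '\\\\'
pattern_raw = r'\\'
print(pattern_normal == pattern_raw)  # True

text = 'C:\\temp'  # the string itself contains a single backslash: C:\temp
print(re.findall(pattern_raw, text))  # ['\\'], i.e. one single-backslash match
```
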
Searching with regular expressions in Python

The 're' module provides several functions for searching an input string. The ones we will discuss are:

re.match()
re.search()
re.findall()

Each function takes a regular expression pattern and the string to search. Let's look at each of them in more detail to see how they work and how they differ.

2. Using re.match – matching at the start

Let's look at match() first. match() only tries to match the pattern at the beginning of the string being searched; if it succeeds, it returns a match object.

For example, calling match() on the string 'dog cat dog' with the pattern 'dog' finds a match:

>>> re.match(r'dog', 'dog cat dog')
<_sre.SRE_Match object at 0xb743e720>
>>> match = re.match(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'

We'll say more about the group() method later. For now, all we need to know is that calling it with 0 as its argument returns the entire matched text.

I've also glossed over the SRE_Match object that was returned; we'll discuss it shortly.

However, if we call match() on the same string with the pattern 'cat', no match is found, because 'cat' does not occur at the start of the string. match() returns None, so the interactive prompt prints nothing:

>>> re.match(r'cat', 'dog cat dog')
>>>
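
Because match() returns None when the pattern does not match at the start of the string, calling .group() on the result without checking first raises an AttributeError. A minimal sketch of the usual guard follows; the helper name match_at_start is hypothetical and not part of the 're' module:

```python
import re

def match_at_start(pattern, text):
    """Return the text matched at the start of `text`, or None.

    match_at_start is a hypothetical helper for illustration only.
    """
    m = re.match(pattern, text)
    # re.match returns None when the pattern does not match at position 0,
    # so check before calling .group().
    if m is None:
        return None
    return m.group(0)

print(match_at_start(r'dog', 'dog cat dog'))  # dog
print(match_at_start(r'cat', 'dog cat dog'))  # None
```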

3. Using re.search – matching anywhere

The search() method is similar to match(), but it does not restrict us to matches at the beginning of the string, so searching for 'cat' in our sample string does find a match:

>>> match = re.search(r'cat', 'dog cat dog')
>>> match.group(0)
'cat'

However, search() stops as soon as it finds a match, so searching for 'dog' in our sample string finds only its first occurrence:

>>> match = re.search(r'dog', 'dog cat dog')
>>> match.group(0)
'dog'
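
The difference between the two functions is easiest to see side by side. This sketch runs both on the sample string and uses the match object's span() method, which returns the (start, end) positions of the match, to confirm that search() reported the first 'dog' rather than the second:

```python
import re

text = 'dog cat dog'

# match() only tries position 0; search() scans forward until it finds a match.
print(re.match(r'cat', text))            # None
print(re.search(r'cat', text).group(0))  # cat

# search() stops at the first match, so the second 'dog' is never reported.
first = re.search(r'dog', text)
print(first.span())                      # (0, 3): the first 'dog', not the one at 8
```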

4. Using re.findall – all matches

The search function I have used most often in Python so far is findall(). Instead of returning a match object (we'll talk more about match objects in the next section), findall() simply returns a list of all the matched strings, which I find easier to work with. Calling findall() on the sample string gives:

>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
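
Since findall() returns an ordinary list, counting occurrences is just a matter of taking its length, and a pattern that never matches yields an empty list rather than None. A short sketch on the same sample string:

```python
import re

text = 'dog cat dog'

dogs = re.findall(r'dog', text)
print(dogs, len(dogs))            # ['dog', 'dog'] 2

# Unlike match() and search(), findall() returns [] (not None) when nothing
# matches, so the result can always be iterated or measured safely.
print(re.findall(r'bird', text))  # []
```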

5. Using match.start() and match.end()

So what exactly is the match object that search() and match() returned earlier?

Rather than returning a plain string, search() and match() return a match object, which wraps the matched substring.

You've already seen that calling its group() method gives us the matched substring (and in the next section we'll see that match objects are especially useful when working with groups), but a match object also carries more information about the match.

For example, a match object can tell us where the matched text starts and ends in the original string:

>>> match = re.search(r'dog', 'dog cat dog')
>>> match.start()
0
>>> match.end()
3

This information is often useful, for example when you want to slice out the text before or after the match.
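
One common use is slicing the original string around the match. The sketch below searches for 'cat' in the same sample string; span() is the match-object method that returns (start(), end()) as a single tuple:

```python
import re

text = 'dog cat dog'
m = re.search(r'cat', text)

start, end = m.start(), m.end()
print(start, end)            # 4 7
print(m.span())              # (4, 7): both positions as one tuple

# start() and end() are ordinary slice bounds into the original string.
print(text[start:end])       # cat
print(repr(text[:start]))    # 'dog '  (text before the match)
print(repr(text[end:]))      # ' dog'  (text after the match)
```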

6. Using match.group to access groups by number

As I mentioned before, match objects are handy when working with groups.

Grouping lets us pick out specific substrings of the text matched by a regular expression. We define a group as part of the whole regular expression and can then retrieve the text matched by that part on its own.

Let's take a look at how it works:

>>> contactInfo = 'Doe, John: 555-1212'

The string I just created is like a fragment taken from someone's address book. We can match this line with a regular expression like this:

>>> re.search(r'\w+, \w+: \S+', contactInfo)
<_sre.SRE_Match object at 0xb74e1ad8>

By enclosing parts of the regular expression in parentheses (the characters ( and )), we can group them and then work with each subgroup individually:

>>> match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)

The groups can be retrieved with the match object's group() method. They are numbered from left to right in the order they appear in the regular expression, starting at 1:

>>> match.group(1)
'Doe'
>>> match.group(2)
'John'
>>> match.group(3)
'555-1212'

Group numbering starts at 1 because group 0 is reserved for the entire match (which we already saw when looking at match() and search()):

>>> match.group(0)
'Doe, John: 555-1212'
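
When you want all the numbered groups at once, match objects also provide groups(), which returns them as a tuple in numbering order (group 0 is not included). A minimal sketch reusing the address-book entry; the function name parse_entry is hypothetical:

```python
import re

def parse_entry(line):
    """Split a 'Last, First: phone' entry into its parts (hypothetical helper)."""
    m = re.search(r'(\w+), (\w+): (\S+)', line)
    if m is None:
        return None
    return m.groups()  # tuple of groups 1..n; group 0 is not included

last, first, phone = parse_entry('Doe, John: 555-1212')
print(last, first, phone)  # Doe John 555-1212
```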

7. Using match.group to access groups by name

Sometimes, especially when a regular expression has many groups, referring to each group by its position becomes impractical. Python also lets you give a group a name with the (?P<name>...) syntax:

>>> match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)

We can still use the group() method to retrieve the contents of each group, but now we pass the names we specified instead of the group numbers we used earlier:

>>> match.group('last')
'Doe'
>>> match.group('first')
'John'
>>> match.group('phone')
'555-1212'
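
Named groups can also be collected in one step: the match object's groupdict() method returns a dictionary mapping each group name to the text it matched. A short sketch with the same pattern and entry:

```python
import re

contactInfo = 'Doe, John: 555-1212'
match = re.search(r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)', contactInfo)

# Named groups keep their numbers too, so both forms refer to the same text.
print(match.group('last') == match.group(1))  # True

# groupdict() maps every group name to its matched substring
# (here a dict with the keys 'last', 'first' and 'phone').
print(match.groupdict())
```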

This greatly improves the clarity and readability of the code. As regular expressions become more complex, it gets harder and harder to tell what each group is capturing; naming your groups makes your intent clear to you and to your readers.

Although findall() does not return match objects, it can also use groups. When the pattern contains several groups, findall() returns a list of tuples, where the nth element of each tuple corresponds to the nth group in the regular expression:

>>> re.findall(r'(\w+), (\w+): (\S+)', contactInfo)
[('Doe', 'John', '555-1212')]
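
The exact shape of findall()'s result depends on how many groups the pattern has: with no groups you get a list of whole-match strings, with exactly one group a list of that group's strings, and with two or more a list of tuples. A small sketch over a hypothetical two-entry address book:

```python
import re

# Hypothetical address book, one entry per line.
book = 'Doe, John: 555-1212\nSmith, Jane: 555-3434'

print(re.findall(r'\w+, \w+: \S+', book))    # no groups: whole matches
print(re.findall(r'\w+, \w+: (\S+)', book))  # one group: ['555-1212', '555-3434']
print(re.findall(r'(\w+), (\w+): (\S+)', book))
# several groups: [('Doe', 'John', '555-1212'), ('Smith', 'Jane', '555-3434')]
```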

However, because findall() returns plain tuples, group names cannot be used to look up its results.
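
If you need group names across multiple matches, one option (a different function from the three covered in this article) is re.finditer(), which yields a match object for each match, so named access and groupdict() remain available. A sketch assuming the same hypothetical address book:

```python
import re

book = 'Doe, John: 555-1212\nSmith, Jane: 555-3434'
pattern = r'(?P<last>\w+), (?P<first>\w+): (?P<phone>\S+)'

# finditer() yields match objects (not tuples), so named access still works.
phones = {m.group('last'): m.group('phone') for m in re.finditer(pattern, book)}
print(phones['Smith'])  # 555-3434
```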

In this article we've covered some of the basics of using regular expressions in Python. We looked at raw strings (and the backslash headaches they help you avoid when writing regular expressions). We also learned how to perform basic searches with match(), search() and findall(), and how to use groups to work with the subcomponents of a match.

As always, if you want to learn more about this topic, the official Python documentation for the 're' module is a great resource.

In a future article, we'll look at Python's regular expressions in more depth. We'll learn more about match objects, how to use regular expressions to make substitutions in strings, and even how to use them to parse Python data structures from text files.
