Python re Regular Expressions
I. Brief Introduction
Regular expressions are a small, highly specialized programming language. They are not unique to Python; they are a basic and important part of many programming languages. In Python, they are implemented mainly through the re module.
A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C. What are the common use cases of regular expressions?
- Specifying a rule for the set of strings you want to match;
- That set may consist of e-mail addresses, Internet addresses, phone numbers, or any custom strings you need;
- Checking whether a string conforms to the matching rule you defined;
- Finding the part of a string that matches the rule;
- Modifying, splitting, and other text-processing operations;
- ...
II. Special symbols and characters (metacharacters)
This section introduces some common metacharacters, which give regular expressions their power and flexibility. Table 2-1 lists the common symbols and characters.
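As a rough illustration, the sketch below exercises a handful of widely used metacharacters. The patterns and sample strings are chosen purely for demonstration and do not come from the table.

```python
import re

# Each entry is (pattern, sample text); both are illustrative assumptions.
samples = [
    (r'b.g', 'bag big bug'),        # .  matches any single character except a newline
    (r'\d+', 'room 101, floor 3'),  # \d matches a digit; + means one or more
    (r'colou?r', 'color colour'),   # ?  makes the preceding item optional
    (r'[aeiou]', 'bugs'),           # [] matches any one character from the set
    (r'^bugs', 'bugsbunny'),        # ^  anchors the match at the start of the string
    (r'cat|dog', 'dog and cat'),    # |  matches either alternative
]

results = [re.findall(pattern, text) for pattern, text in samples]

for (pattern, text), found in zip(samples, results):
    print(pattern, '->', found)
```

Each line of output pairs a pattern with the list of substrings it matched, which makes it easy to experiment with other metacharacters by adding entries to the list.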
III. Regular Expressions
1. Use the compile() function to compile regular expressions
Python code is eventually translated into bytecode and then executed by the interpreter, and a regular expression pattern likewise has to be compiled before it can be used. It is therefore more efficient to precompile regular expressions that are used frequently in your code.
Most functions in the re module have the same name and behavior as the methods of compiled regular expression objects and match objects.
Example:
>>> import re
>>> r1 = r'bugs'  # with the "r" prefix, backslashes in the string are not treated specially; a good habit, although not needed here
>>> re.findall(r1, 'bugsbunny')  # use the re module function directly
['bugs']
>>> r2 = re.compile(r1)  # if the rule r1 is used frequently, precompile it to improve efficiency
>>> r2  # the compiled regular expression object (the repr varies by Python version)
<_sre.SRE_Pattern object at 0x7f5d7db99bb0>
>>> r2.findall('bugsbunny')  # the object's findall() method gives the same result as above
['bugs']
# So the re module functions and the methods of the compiled object share names and behavior.
The re.compile() function also accepts optional flag arguments, which enable special behavior and syntax variations. The same flags can be passed to most re module functions, and several flags can be combined with the bitwise OR operator (|).
Example:
>>> import re
>>> r1 = r'bugs'
>>> r2 = re.compile(r1, re.I)  # case-insensitive flag; re.I is short for re.IGNORECASE
>>> r2.findall('BugsBunny')
['Bugs']
# re.S: make . match every character, including newlines
# re.M: multi-line matching; affects ^ and $
# re.X: verbose mode, which lets you write more readable patterns
For a complete list of flag parameters and usage, refer to the relevant official documentation.
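To make the effect of combining flags concrete, here is a minimal sketch. The sample text and variable names are assumptions made for illustration; only re.I, re.M, and the | operator come from the discussion above.

```python
import re

# Hypothetical sample text with three lines.
text = 'Bugs\nbunny\nBUGS'

# re.I ignores case; re.M lets ^ match at the start of every line.
pattern = re.compile(r'^bugs', re.I | re.M)
combined = pattern.findall(text)

# With re.I alone, ^ only matches at the very start of the string.
only_ignorecase = re.findall(r'^bugs', text, re.I)

print(combined)         # ['Bugs', 'BUGS']
print(only_ignorecase)  # ['Bugs']
```

The second call also shows that flags can be passed straight to a module-level function instead of to re.compile().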
2. Using regular expressions
The re module provides an interface for the Regular Expression Engine. The following describes some common functions and methods.
- Match objects and the group() and groups() methods
When working with regular expressions, there is one more object type besides the regular expression object: the match object. This is the object returned by a successful call to match() or search(). Match objects have two main methods: group() and groups().
group() returns either the entire match or a specific subgroup on request. groups() returns a tuple containing all subgroups, whether there is one or several. If the pattern has no subgroups, group() still returns the entire match, while groups() returns an empty tuple. The function examples that follow demonstrate these methods.
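The difference between a pattern with and without subgroups can be sketched as follows. The patterns and the example.com address are made up for illustration.

```python
import re

# No subgroups in the pattern: group() is the whole match, groups() is empty.
plain = re.match(r'\w+', 'bugsbunny')
print(plain.group())    # bugsbunny
print(plain.groups())   # ()

# One subgroup: groups() is a one-element tuple.
single = re.match(r'(\w+)@example\.com', 'bugs@example.com')
print(single.group())   # bugs@example.com
print(single.group(1))  # bugs
print(single.groups())  # ('bugs',)
```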
- Use the match() method to match a string
The match() function tries to match the pattern at the start of the string. If the match succeeds, a match object is returned; if it fails, None is returned. The match object's group() method can be used to display the text that matched.
Example:
>>> m = re.match('bugs', 'bugsbunny')  # match the pattern against the string
>>> if m is not None:  # if the match succeeds, output the matched text
...     m.group()
...
'bugs'
>>> m  # confirm that a match object was returned (the repr varies by Python version)
<_sre.SRE_Match object at 0x7f5d7da1f168>
- Use search() to search within a string
The search() method works much like match(). The difference is that search() looks for the first place in the string where the given pattern matches. In other words, the match can succeed at any position, not only at the start of the string. For this reason, search() is used more widely than match().
Example:
>>> m = re.search('bugs', 'hello bugsbunny')
>>> if m is not None:
...     m.group()
...
'bugs'
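Running both functions on the same input makes the contrast explicit. This is a small sketch that reuses the string from the example above; the variable names are ours.

```python
import re

text = 'hello bugsbunny'

# match() only looks at the start of the string, so it fails here.
at_start = re.match('bugs', text)

# search() scans forward and finds the first occurrence.
anywhere = re.search('bugs', text)

print(at_start)          # None
print(anywhere.group())  # bugs
print(anywhere.start())  # 6
```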
- Use findall() and finditer() to locate each occurrence
findall() finds all non-overlapping occurrences of a regular expression pattern in a string and returns them as a list. finditer() differs from findall() in that it returns an iterator that yields a match object for each match.
Example:
>>> m = re.findall('bugs', 'bugsbunnybugs')
>>> m
['bugs', 'bugs']
>>> m = re.finditer('bugs', 'bugsbunnybugs')
>>> next(m)  # each step of the iterator returns a match object
<_sre.SRE_Match object at 0x7f5d7da71a58>
>>> next(m).group()  # display the matched text with the group() method
'bugs'
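In practice the iterator from finditer() is usually consumed in a loop, which gives access to the position of each match as well as its text. The sketch below assumes a made-up input string and also shows what "non-overlapping" means for findall().

```python
import re

text = 'bugsbunny bugs'

# Collect the matched text with its start and end offsets.
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer('bugs', text)]
print(positions)  # [('bugs', 0, 4), ('bugs', 10, 14)]

# Matches never overlap: 'aaaa' contains 'aa' at offsets 0 and 2 only.
pairs = re.findall('aa', 'aaaa')
print(pairs)      # ['aa', 'aa']
```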
- Search and replace using sub() and subn()
Both functions replace the parts of a string that match the regular expression. sub() returns the string after replacement; by default every occurrence is replaced, and the number of replacements can be limited. subn() works the same as sub() but also returns the total number of replacements: the new string and the count are returned together as a two-element tuple.
Example:
>>> r = 'a.b'
>>> m = 'acb abc aab aac'
>>> re.sub(r, 'hello', m)
'hello abc hello aac'
>>> re.subn(r, 'hello', m)
('hello abc hello aac', 2)
Strings also have a replace() method, but it only handles literal text. For pattern-based (fuzzy) search and replace, you need the more flexible sub() method.
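Limiting the number of replacements is done with the optional count argument of sub() and subn(). The following sketch reuses the pattern and string from the example above; the variable names are ours.

```python
import re

text = 'acb abc aab aac'

# count=1 replaces only the first match.
limited = re.sub(r'a.b', 'hello', text, count=1)
print(limited)          # hello abc aab aac

# subn() reports how many replacements were actually made.
limited_with_total = re.subn(r'a.b', 'hello', text, count=1)
print(limited_with_total)  # ('hello abc aab aac', 1)

# str.replace() works on literal text only, so 'a.b' finds nothing here.
literal = text.replace('a.b', 'hello')
print(literal)          # acb abc aab aac
```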
- Use split() to separate strings
Similarly, strings also have a split() method, but it cannot split on a regular expression. In the re module, split() uses the regular expression pattern as the separator: it splits the string at every match and returns the resulting pieces as a list.
Example:
>>> s = '1+2-3*4'
>>> re.split(r'[\+\-\*]', s)
['1', '2', '3', '4']
Sometimes we only want to extract specific pieces of information from a match, or to classify the extracted information. In that case, we group parts of the regular expression pattern by adding parentheses ().
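For comparison, the sketch below runs str.split() and re.split() on the same expression. The last call relies on a behavior not covered in the text above: when the separator pattern contains a capturing group, re.split() also keeps the separators in the result.

```python
import re

s = '1+2-3*4'

# str.split() accepts a single literal separator.
by_plus = s.split('+')
print(by_plus)    # ['1', '2-3*4']

# re.split() can split on any of several operators at once.
operands = re.split(r'[+*-]', s)
print(operands)   # ['1', '2', '3', '4']

# A capturing group around the separator keeps the operators too.
tokens = re.split(r'([+*-])', s)
print(tokens)     # ['1', '+', '2', '-', '3', '*', '4']
```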
Example:
>>> m = re.match(r'(\w{3})-(\d{3})', 'abc-123')
>>> m.group()  # complete match
'abc-123'
>>> m.group(1)  # subgroup 1
'abc'
>>> m.group(2)  # subgroup 2
'123'
>>> m.groups()  # all subgroups
('abc', '123')
As the preceding example shows, group() is normally used to return the entire match, but it can also return an individual subgroup when given its number. The groups() method returns a tuple containing all matched subgroups.