Definition of regular expressions
A regular expression (often abbreviated regex or re) is a logical formula for operating on strings: a "rule string", built from predefined special characters and combinations of them, that expresses filtering logic for a string.
In Python, you can use regular expressions through the built-in re module.
Function:
- You can specify rules for the corresponding set of strings that you want to match
- The set of strings may contain English sentences, e-mail addresses, commands, or anything else you like.
Regular expression metacharacters
1. [ ]
- Used to specify a character set: [abc]; [a-z]
- Most metacharacters lose their special meaning inside a character set: in [akm$], $ is a literal dollar sign
- A leading ^ complements the set, so that it matches any character not listed: [^5]
2. ^
- Matches the beginning of the line. Unless the multiline flag is set, it matches only at the beginning of the string. In multiline mode, it also matches immediately after each newline in the string.
3. $
- Matches the end of the line, which is defined as either the end of the string or any position followed by a newline character.
Repetition
*
- Specifies that the previous character can be matched zero or more times. The matching engine tries to repeat as many times as possible (up to an internal limit of about 2 billion). For example: a[bcd]*b matched against "abcbd".
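The character sets, the ^ and $ anchors, and the greedy * described above can be sketched as follows. The sample strings are illustrative assumptions, not taken from the original text.

```python
import re

# Character set: [abc] matches exactly one of a, b or c.
set_hits = re.findall(r"[abc]", "a cab")

# Inside a set, $ is a literal dollar sign, not an end-of-line anchor.
dollar_hits = re.findall(r"[akm$]", "mask $5")

# Complemented set: [^5] matches any single character except 5.
not_five = re.findall(r"[^5]", "1525")

text = "first\nsecond"  # illustrative two-line string

# Without re.MULTILINE, ^ matches only at the start of the whole string.
start_default = re.findall(r"^[a-z]+", text)
# With re.MULTILINE, ^ also matches right after each newline,
# and $ also matches right before each newline.
start_multiline = re.findall(r"^[a-z]+", text, flags=re.MULTILINE)
end_multiline = re.findall(r"[a-z]+$", text, flags=re.MULTILINE)

# Greedy *: [bcd]* first consumes "bcbd", then backs off one character
# at a time until the final b in the pattern can match.
greedy = re.match(r"a[bcd]*b", "abcbd").group()

print(set_hits, dollar_hits, not_five)
print(start_default, start_multiline, end_multiline)
print(greedy)
```

Here the greedy match ends at the last b it can reach, so the trailing d is left out of the result.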
+
- Matches one or more times.
- Note the difference between * and +: * matches zero or more times, so the repeated item may not appear at all, whereas + requires at least one occurrence.
?
- Matches once or zero times; you can think of it as marking something as optional.
{m,n}
- Here m and n are decimal integers. The qualifier means at least m repetitions and at most n. For example: a/{1,3}b
- Omitting m makes the lower bound 0, while omitting n makes the upper bound infinite (in practice about 2 billion).
- {0,} is the same as *, {1,} is the same as +, and {0,1} is the same as ?. When you can, it is better to use *, +, or ?, simply because they are shorter and easier to read.
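A minimal sketch of the repetition qualifiers, built around the a/{1,3}b pattern mentioned above. The sample inputs and the helper name matching are hypothetical, chosen only for illustration.

```python
import re

# Illustrative inputs with zero to four slashes between a and b.
samples = ["ab", "a/b", "a//b", "a///b", "a////b"]


def matching(pattern):
    """Return the samples that the pattern matches in full (hypothetical helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]


star = matching(r"a/*b")         # zero or more slashes
plus = matching(r"a/+b")         # one or more slashes
optional = matching(r"a/?b")     # zero or one slash
bounded = matching(r"a/{1,3}b")  # between one and three slashes

# The {m,n} spellings of the same three qualifiers.
star_braces = matching(r"a/{0,}b")
plus_braces = matching(r"a/{1,}b")
optional_braces = matching(r"a/{0,1}b")

print(star, plus, optional, bounded, sep="\n")
```

re.fullmatch is used here so that each pattern has to cover the whole sample; with re.match, a prefix match would be enough.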
Use of escape character \
- You can use \ to cancel the special meaning of any metacharacter: for example, \[ matches a literal [
- \d matches any decimal digit, which is equivalent to the class [0-9]
- \D matches any non-digit character, which is equivalent to the class [^0-9]
- \s matches any whitespace character, which is equivalent to the class [ \t\n\r\f\v]
- \S matches any non-whitespace character, which is equivalent to the class [^ \t\n\r\f\v]
- \w matches any alphanumeric character (or underscore), which is equivalent to the class [a-zA-Z0-9_]
- \W matches any non-alphanumeric character, which is equivalent to the class [^a-zA-Z0-9_]
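The escape sequences above can be tried out with a short sketch like the one below. The sample text is an illustrative assumption. Note that in Python 3, for str patterns, these classes are Unicode-aware by default, so the bracketed equivalents hold exactly only for ASCII input or with the re.ASCII flag.

```python
import re

text = "Room 42, floor_3\tWest"  # illustrative sample, not from the original text

digits = re.findall(r"\d+", text)     # runs of decimal digits
non_space = re.findall(r"\S+", text)  # runs of non-whitespace characters
words = re.findall(r"\w+", text)      # runs of letters, digits and underscore
non_word = re.findall(r"\W", text)    # single characters that \w does not match

# A backslash cancels the special meaning of a metacharacter such as [.
literal_bracket = re.findall(r"\[", "a[0]")

# Unicode-aware by default: \w matches an accented letter unless re.ASCII is set.
unicode_word = re.findall(r"\w", "\u00e9")
ascii_word = re.findall(r"\w", "\u00e9", flags=re.ASCII)

print(digits, non_space, words, non_word, literal_bracket)
```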
Use of the re module in Python
To use regular expressions, Python provides the re module, which contains all of the regular expression functionality.
- Because a Python string literal itself also uses \ as its escape character, take care. For example:
s = 'ABC\\-001' # Python string
The corresponding regular expression string becomes:
'ABC\-001'
Therefore, to avoid conflicts, it is recommended to use the r prefix in Python, so that you do not have to think about escaping at the Python level.
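The effect of the r prefix can be checked with a short sketch. The variable names are illustrative; the pattern is the ABC\-001 example from above.

```python
import re

escaped = 'ABC\\-001'  # ordinary literal: \\ stands for a single backslash
raw = r'ABC\-001'      # raw literal: the backslash is kept as written

# Both literals produce the same 8-character string: A B C \ - 0 0 1
same_pattern = (escaped == raw)
pattern_length = len(raw)

# In the regular expression, \- is an escaped hyphen, so it matches "-".
matched = re.match(raw, 'ABC-001') is not None

print(same_pattern, pattern_length, matched)
```

With the raw literal, what you type is what the regular expression engine receives, which is why the r prefix is the usual choice for patterns.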
- Python ships with its own re module; to use it, import it with
import re
The re module has many functions. IPython can list all of them and show what each one does.
Some common methods are listed below:
- 1. re.findall(pattern, string, flags=0): returns a list of all non-overlapping matches of the pattern in the string, in the order they are found. Repeated matches are kept.
Example 1:
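A minimal sketch of re.findall. The sample strings and patterns are illustrative assumptions, not from the original text.

```python
import re

text = "cat bat cat"  # illustrative sample

# findall returns a list of every match, in order; repeated matches are kept.
hits = re.findall(r"[a-z]at", text)

# Matches do not overlap: scanning resumes after the end of the previous match.
pairs = re.findall(r"aa", "aaaa")

# The flags argument changes matching behaviour, here ignoring case.
any_case = re.findall(r"cat", "Cat cat", flags=re.IGNORECASE)

# When nothing matches, the result is an empty list.
nothing = re.findall(r"dog", text)

print(hits, pairs, any_case, nothing)
```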
- 2. re.match(pattern, string, flags=0): tries to match the regular expression at the beginning of the string. If it matches, it returns a match object; if not, it returns None. It is commonly used in an if statement.
Example 2:
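A minimal sketch of re.match used in an if statement. The pattern, the sample inputs, and the function name describe are hypothetical, chosen only to illustrate the behaviour.

```python
import re


def describe(s):
    """Report whether s starts with three digits, a hyphen, and four digits."""
    m = re.match(r"\d{3}-\d{4}", s)
    if m:  # a match object is truthy, None is falsy
        return "matched " + m.group()
    return "no match"


at_start = describe("555-1234 ext. 9")
not_at_start = describe("call 555-1234")

print(at_start)
print(not_at_start)
```

The second call returns "no match" because re.match only tries the pattern at the beginning of the string, even though the digits appear later in it.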