Many technical engineers who need to process texts must have a comprehensive and in-depth understanding of Python regular expressions, not only what is a Python regular expression, you also need to know Python Regular Expression characters.
In addition, there are a few special characters that do not match themselves, but match something other than its literal value, which may be a character set, number of repetitions, or location. Common metacharacters include:
. ^ $ * +? {} [] \ | ()
This article will introduce these special characters one after another. However, let's take a look at the metacharacters used to match characters. First, the sentence ". "This metacharacter is usually used to match" any character ": Generally, it matches any character except the line break, but in alternate mode (re. DOTALL), which matches any character in the true sense, including line breaks.
The following metacharacters are [and]. They are often paired to specify a character set to match, that is, any element in the set can meet our requirements. The characters in the set can be listed individually. If these characters are consecutive, you can use the "-" separator to specify a character range.
For example, [abc] matches any character in "a", "B", or "c". Of course, the interval [a-c] can also be used to represent the same character set, the two indicate that the method is equivalent. To match all the vowels in a string, use the following code:
- import re
- def re_show(pat, s):
- print re.compile(pat, re.M).sub("{\g<0>}", s.rstrip()),'\n'
-
- s = '''In company or association with respect to place or time;
- as, to live together in one house; to live together in the
- same age; they walked together to the town.'''
- re_show(r"[aeiou]",s)
The running result is as follows:
- In c{o}mp{a}ny {o}r {a}ss{o}c{i}{a}t{i}{o}n w{i}th r{e}sp{e}ct t{o} pl{a}c{e} {o}r t{i}m{e};
-
- {a}s, t{o} l{i}v{e} t{o}g{e}th{e}r {i}n {o}n{e} h{o}{u}s{e}; t{o} l{i}v{e} t{o}g{e}th{e}r {i}n th{e}
-
- s{a}m{e} {a}g{e}; th{e}y w{a}lk{e}d t{o}g{e}th{e}r t{o} th{e} t{o}wn.
Note that metacharacters are "downgraded" to common characters in square brackets. For example, [a.] matches any one of the characters "a" or ".". As mentioned earlier, "." is generally used as a metacharacter. However, in character set combination, its particularity will be deprived and restored to normal characters. In this case, you can modify the above Code to perform an experiment.
Sometimes you need to find characters that do not belong to a character set. For example, if you want to search for any character except the number 6, you need to use the negative effect: the metacharacter "^" is used as the first character of the set. For example, [^ 5] will match any character except "6.
The Backslash "\" is a very important metacharacter. We know that in Python strings, The backslash is also used as a special character (or escape character). It can be followed by different characters to indicate different special meanings; it can also be used to cancel all metacharacters so that you can match them in the mode. For example, if you need to match the character "\", you can use a backslash to cancel their special meaning :\\.
- Python source code compilation skills
- Detailed analysis of Python script problems
- Introduction to the Python Programming Language
- In-depth analysis of the Python testing framework
- Detailed description of the Python operator Style
◆ \ D matches any decimal number. It is equivalent to the combination of character sets [0-9].
◆ \ D matches any non-numeric character. It is equivalent to the character set [^ 0-9].
◆ \ S matches any blank character. It is equivalent to the character set [\ t \ n \ r \ f \ v].
◆ \ S matches any non-blank characters. It is equivalent to the character set combination [^ \ t \ n \ r \ f \ v].
◆ \ W matches any letter, digit, and underline character; it is equivalent to character set in combination with [a-zA-Z0-9 _].
◆ \ W matches any non-alphanumeric underline character; it is equivalent to character set in combination with [^ a-zA-Z0-9 _].
We have already discussed how to specify the number of repetitions for a single character-simply add a qualifier after the character. Now let's take a look at the eight methods for repeating multiple characters: you can use parentheses to refer to the subexpression (also called grouping). Then you can specify the number of repetitions of the subexpression. You can also perform other operations on the subexpression.
We know that IP addresses are the four numbers separated by periods, and each number cannot exceed 255. (\ D {1, 3 }\.) {3} \ d {} is a simple IP address matching expression, where: \ d {} matches 1 to 3 digits (\ d }\.) {3} matches three digits with an English ending (this group is used as a whole), repeats three times, and finally adds one to three digits (\ d {1, 3 }).
However, it will also match an impossible IP address such as 256.300.888.999. If arithmetic comparison can be used, this problem may be solved simply. However, regular expressions do not provide any mathematical functions, so they can only use lengthy groups.
Select the Python Regular Expression and Character Set combination to describe a correct IP Address: (2 [0-4] \ d | 25 [0-5] | [01]? \ D ?) \.) {3} (2 [0-4] \ d | 25 [0-5] | [01]? \ D ?). The key to understanding this expression is to understand 2 [0-4] \ d | 25 [0-5] | [01]? \ D ?, After the above introduction, I believe that the readers can analyze its meaning.