3.2.1 Regular Expression Syntax
Special characters:
'.'
Dot. By default, it matches any character except a newline. If the DOTALL flag is set, it matches any character, including a newline.
Example:
# re
import re
m = re.findall('a.', 'ab a2 bb a+')
if m:
    print(m)
The output is as follows:
['ab', 'a2', 'a+']
In this example, the regular expression library re is first imported with the import statement. The findall function then finds every substring that matches the regular expression 'a.' and returns them as a list, which is saved in m. Finally, the program checks whether anything was found and, if so, prints it. The regular expression 'a.' consists of the letter a followed by a dot, so it matches any two-character string that starts with a. "Any character" here does not include the newline.
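The effect of the DOTALL flag mentioned above can be seen with a string that contains a real newline. The following is a minimal sketch; the sample text is made up for illustration.

```python
import re

text = 'a\nab'

# Without DOTALL, '.' refuses to match the newline after the first 'a'.
default_matches = re.findall('a.', text)

# With DOTALL, '.' also matches the newline.
dotall_matches = re.findall('a.', text, re.DOTALL)

print(default_matches)  # ['ab']
print(dotall_matches)   # ['a\n', 'ab']
```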
'^'
Caret. It matches only at the start of the string; when the MULTILINE flag is set, it also matches at the start of each new line. Without this character, the pattern can match at any position in the string.
Example:
# re
import re
m = re.findall('a.', 'ab a2 bb a+')
if m:
    print(m)
m = re.findall('a..', 'ab a2 bb a+')
if m:
    print(m)
m = re.findall('^a.', 'ab a2 bb a+')
if m:
    print(m)
The output is as follows:
['ab', 'a2', 'a+']
['ab ', 'a2 ']
['ab']
In this example, the last line of output contains only the match at the very start of the string.
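The MULTILINE behavior of '^' can be sketched with a string that spans several lines. The three-line sample text below is an assumption chosen for illustration.

```python
import re

text = 'ab\na2\nbb'

# By default, '^' anchors only at the start of the whole string.
start_only = re.findall('^a.', text)

# With MULTILINE, '^' also anchors right after each newline.
each_line = re.findall('^a.', text, re.MULTILINE)

print(start_only)  # ['ab']
print(each_line)   # ['ab', 'a2']
```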
'$'
Dollar sign. It matches at the end of the string or just before the newline at the end of the string; when the MULTILINE flag is set, it also matches before each newline. For example, matching foo$ against the strings foo and foobar succeeds only for foo, because in foobar the text foo is not at the end. With the regular expression foo.$ and the string 'foo1\nfoo2\n', only foo2 matches when MULTILINE is not set; when it is set, foo1 matches as well. Searching the string 'foo\n' for a single $ finds two empty matches: one just before the newline and one at the end of the string.
Example:
print('$')
m = re.findall('a.$', 'ab a2 bb a+')
if m:
    print(m)
m = re.findall('$a.', 'ab a2 bb a+')
if m:
    print(m)
# In a raw string, \n is a backslash followed by n, not a newline.
m = re.findall(r'foo$', r'foobar\n')
if m:
    print(m)
print(r'foobar\n testfoo')
m = re.findall(r'foo$', r'foobar\n testfoo')
if m:
    print(m)
m = re.findall(r'foo.$', r'foo1\nfoo2')
if m:
    print(m)
m = re.findall(r'$', r'foo\n')
if m:
    print(m)
The output is as follows:
$
['a+']
foobar\n testfoo
['foo']
['foo2']
['']
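The listing above passes raw strings as the text to search, so its \n is a literal backslash and n rather than a newline. To see the newline-related behavior described for '$', the subject needs a real newline. A minimal sketch, reusing the strings from the description:

```python
import re

text = 'foo1\nfoo2\n'

# Without MULTILINE, '$' matches only at the end of the string
# (or just before a trailing newline).
last_line_only = re.findall(r'foo.$', text)

# With MULTILINE, '$' also matches before every newline.
every_line = re.findall(r'foo.$', text, re.MULTILINE)

# A lone '$' matches twice in 'foo\n': before the newline and at the end.
empty_matches = re.findall(r'$', 'foo\n')

print(last_line_only)  # ['foo2']
print(every_line)      # ['foo1', 'foo2']
print(empty_matches)   # ['', '']
```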
'*'
Asterisk. It matches 0 or more repetitions of the character before it. For example, ab* matches a, ab, or a followed by any number of b characters, such as abbbbbbb.
Example:
print('*')
m = re.findall(r'ab*', r'a ab abc abbb abbbb2 abbbbbbbbb')
if m:
    print(m)
The output is as follows:
*
['a', 'ab', 'ab', 'abbb', 'abbbb', 'abbbbbbbbb']
'+'
Plus sign. It matches 1 or more repetitions of the character before it. For example, ab+ matches a followed by at least one b, that is, ab, abb, or a followed by any larger number of b characters. Unlike with the asterisk, a on its own does not match.
Example:
print('+')
m = re.findall(r'ab+', r'a ab abc abbb abbbb2 abbbbbbbbb')
if m:
    print(m)
The output is as follows:
+
['ab', 'ab', 'abbb', 'abbbb', 'abbbbbbbbb']
'?'
Question mark. It matches 0 or 1 occurrence of the character before it. For example, ab? matches only a or ab.
Example:
print('?')
m = re.findall(r'ab?', r'a ab abc abbb abbbb2 abbbbbbbbbbbbb')
if m:
    print(m)
The output is as follows:
?
['a', 'ab', 'ab', 'ab', 'ab', 'ab']
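The difference between the three quantifiers is easiest to see when each pattern has to match a whole string. The following sketch uses re.fullmatch for that; the candidate strings are chosen for illustration.

```python
import re

candidates = ['a', 'ab', 'abbb']

# For each quantifier, record which candidates the pattern matches in full.
accepted = {
    pattern: [s for s in candidates if re.fullmatch(pattern, s)]
    for pattern in (r'ab*', r'ab+', r'ab?')
}

for pattern, strings in accepted.items():
    print(pattern, strings)
# ab* ['a', 'ab', 'abbb']
# ab+ ['ab', 'abbb']
# ab? ['a', 'ab']
```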
*?, +?, ??
The asterisk, plus sign, and question mark are all greedy: they match as many characters as possible. Appending a ? makes them non-greedy, so they match as few characters as possible. For example:
print('*?')
m = re.findall(r'ab*?', r'a ab abc abbb abbbb2 abbbbbbbbbbbbb')
if m:
    print(m)
print('+?')
m = re.findall(r'ab+?', r'a ab abc abbb abbbb2 abbbbbbbbbbbbb')
if m:
    print(m)
print('??')
m = re.findall(r'ab??', r'a ab abc abbb abbbb2 abbbbbbbbbbbbb')
if m:
    print(m)
The output is as follows:
*?
['a', 'a', 'a', 'a', 'a', 'a']
+?
['ab', 'ab', 'ab', 'ab', 'ab']
??
['a', 'a', 'a', 'a', 'a', 'a']
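Greedy and non-greedy matching differ most visibly when the pattern has a closing delimiter. A small sketch, using a made-up tag-like string:

```python
import re

text = '<a> b <c>'

# Greedy: '.*' runs to the last '>' in the string.
greedy = re.findall(r'<.*>', text)

# Non-greedy: '.*?' stops at the first '>' it can.
lazy = re.findall(r'<.*?>', text)

print(greedy)  # ['<a> b <c>']
print(lazy)    # ['<a>', '<c>']
```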
{m}
A number in braces specifies exactly how many repetitions of the preceding character to match. For example, a{6} matches exactly six consecutive a characters. A run of fewer than six does not match; a run of more than six does, but only six of its characters are consumed by the match.
Example:
print('a{6}')
m = re.findall(r'a{6}', r'baaaaaad aabaaaa aaaaa')
if m:
    print(m)
The output is as follows:
a{6}
['aaaaaa']
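The three cases for {m} (too few, more than enough, and a whole-string check) can be sketched as follows; the run lengths are chosen for illustration.

```python
import re

# Five a's are too few for a{6}.
five = re.findall(r'a{6}', 'a' * 5)

# Eight a's contain one match of six; the remaining two are not enough
# for a second match.
eight = re.findall(r'a{6}', 'a' * 8)

# fullmatch requires the whole string to be exactly six a's.
exact = re.fullmatch(r'a{6}', 'a' * 8)

print(five)   # []
print(eight)  # ['aaaaaa']
print(exact)  # None
```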
{m,n}
Braces containing m and n set a lower and an upper limit on the repetition. m is the minimum number of repetitions and may be 0; n is the maximum number of repetitions and may be omitted to mean no upper limit. As many repetitions as possible are matched. For example, a{2,3}b finds 2 to 3 consecutive a characters followed by b, and a{2,}b finds two or more a characters followed by b.
Example:
print('a{2,3}b')
m = re.findall(r'a{2,3}b', r'aab baaab ab aaa aaaaaab')
if m:
    print(m)
print('a{2,}b')
m = re.findall(r'a{2,}b', r'aab baaab ab aaa aaaaaab')
if m:
    print(m)
print('a{0,3}b')
m = re.findall(r'a{0,3}b', r'aab baaab ab aaa aaaaaab')
if m:
    print(m)
The output is as follows:
a{2,3}b
['aab', 'aaab', 'aaab']
a{2,}b
['aab', 'aaab', 'aaaaaab']
a{0,3}b
['aab', 'b', 'aaab', 'ab', 'aaab']
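The earlier quantifiers can be read as special cases of the brace form: * behaves like {0,}, + like {1,}, and ? like {0,1}. The sketch below checks this on a made-up sample string and also shows that the lower limit can be left out, in which case it defaults to 0.

```python
import re

text = 'a ab abb abbb'

# Each pair of patterns should produce identical results.
star_same = re.findall(r'ab{0,}', text) == re.findall(r'ab*', text)
plus_same = re.findall(r'ab{1,}', text) == re.findall(r'ab+', text)
opt_same = re.findall(r'ab{0,1}', text) == re.findall(r'ab?', text)

# Omitting m means "from 0 up to n".
up_to_two = re.findall(r'ab{,2}', text)

print(star_same, plus_same, opt_same)  # True True True
print(up_to_two)                       # ['a', 'ab', 'abb', 'abb']
```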
{m,n}?
Braces containing m and n set a lower and an upper limit on the repetition. m is the minimum number of repetitions and may be 0; n is the maximum number of repetitions and may be omitted to mean no upper limit. With the trailing ?, as few repetitions as possible are matched. For example, a{2,3}?b still finds 2 to 3 consecutive a characters followed by b. Against the string aaaaaa, a{3,5} matches as many as it can, five a characters, while a{3,5}? matches only three.
Example:
print('a{2,3}')
m = re.findall(r'a{2,3}', r'aa abaaaaaaa')
if m:
    print(m)
print('a{2,3}?')
m = re.findall(r'a{2,3}?', r'aa abaaaaaaa')
if m:
    print(m)
The output is as follows:
a{2,3}
['aa', 'aaa', 'aaa']
a{2,3}?
['aa', 'aa', 'aa', 'aa']
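The aaaaaa case from the description can be checked directly. This sketch uses re.search to look at the first match only:

```python
import re

text = 'aaaaaa'

# Greedy: takes as many a's as the upper limit allows.
greedy = re.search(r'a{3,5}', text).group()

# Non-greedy: stops as soon as the lower limit is satisfied.
lazy = re.search(r'a{3,5}?', text).group()

print(greedy)  # aaaaa
print(lazy)    # aaa
```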
'\'
Backslash. It escapes the character that follows it, so that the character is treated as an ordinary literal. For example, \* matches the asterisk character itself.
Example:
print(r'a*\*')
m = re.findall(r'a*\*', r'a*abaaa*abbbb')
if m:
    print(m)
The output is as follows:
a*\*
['a*', 'aaa*']
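When the text to match literally comes from elsewhere, the standard library function re.escape can add the backslashes automatically. A minimal sketch, with a made-up literal and sample text:

```python
import re

literal = '1+1=2'
text = 'we know 1+1=2, not 11=2'

# Used directly as a pattern, '+' acts as a quantifier, so '1+1=2' means
# "one or more 1s, then 1=2" and matches the wrong substring.
unescaped = re.findall(literal, text)

# re.escape() adds the backslashes needed to match the text literally.
escaped = re.findall(re.escape(literal), text)

print(unescaped)  # ['11=2']
print(escaped)    # ['1+1=2']
```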
[]
Square brackets indicate a set of characters.
- Characters can be listed individually. For example, [amk] matches a, m, or k.
- Characters can be specified by range, written as two characters joined by '-'. For example, [a-z] is the set of lowercase letters from a to z; [0-5][0-9] matches the two-digit numbers from 00 to 59; [0-9A-Fa-f] matches any hexadecimal digit. If '-' is escaped with a backslash, as in [a\-z], or placed at the beginning or end of the set, as in [a-], it matches the literal '-' character.
- Characters with special meanings lose them inside square brackets and stand only for themselves. For example, [(+*)] matches any of the literal characters (, +, *, or ).
- Character classes such as \w or \s can be used inside a set. These classes are defined later.
- A set can be complemented. If ^ is the first character after the opening bracket, every character not in the set matches. For example, [^5] matches any character except 5, and [^^] matches any character except ^. If ^ is not the first character of the set, it has no special meaning.
- To match a literal ] inside a set, escape it with a backslash or place it first in the set. For example, [()[\]{}] and []()[{}] both match a parenthesis, bracket, or brace; note that in the second form ] is placed first.
Example:
print('[amk]')
m = re.findall(r'[amk]', r'e a d m k b c')
if m:
    print(m)
print('[a-z]')
m = re.findall(r'[a-z]', r'1 3 e a d m k b c 8 9')
if m:
    print(m)
print('[0-5][0-9]')
m = re.findall(r'[0-5][0-9]', r'01 a1 99 49 38')
if m:
    print(m)
The output is as follows:
[amk]
['a', 'm', 'k']
[a-z]
['e', 'a', 'd', 'm', 'k', 'b', 'c']
[0-5][0-9]
['01', '49', '38']
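The remaining rules for character sets (an escaped '-', a complemented set, a leading ], and the hexadecimal range) can be sketched the same way. The sample strings are made up for illustration.

```python
import re

# An escaped '-' is a literal hyphen, not a range.
hyphen = re.findall(r'[a\-z]', 'a-z b')

# A leading '^' complements the set.
not_five = re.findall(r'[^5]', '1556')

# A ']' placed first in the set is a literal closing bracket.
brackets = re.findall(r'[]()[{}]', 'f(x)[0]{}')

# Several ranges can be combined in one set.
hex_words = re.findall(r'[0-9A-Fa-f]+', 'ff 1A zz 09')

print(hyphen)     # ['a', '-', 'z']
print(not_five)   # ['1', '6']
print(brackets)   # ['(', ')', '[', ']', '{', '}']
print(hex_words)  # ['ff', '1A', '09']
```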
'|'
A|B, where A and B are regular expressions, matches either A or B. The match succeeds if expression A matches or expression B matches. The alternatives are tried from left to right: once expression A matches completely, expression B is not tried. To match a literal |, use \| or [|].
Example:
print('[2-4]|[7-9]')
m = re.findall(r'[2-4]|[7-9]', r'1 2 3 4 5 6 7 8 9')
if m:
    print(m)
The output is as follows:
[2-4]|[7-9]
['2', '3', '4', '7', '8', '9']
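Because the alternatives are tried from left to right, their order matters when one alternative is a prefix of another. A short sketch with made-up patterns:

```python
import re

# 'ab' is tried first and succeeds, so 'abc' is never considered.
short_first = re.findall(r'ab|abc', 'abc')

# Putting the longer alternative first lets it win.
long_first = re.findall(r'abc|ab', 'abc')

# An escaped '|' is an ordinary character.
pipes = re.findall(r'\|', 'a|b|c')

print(short_first)  # ['ab']
print(long_first)   # ['abc']
print(pipes)        # ['|', '|']
```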
(...)
Parentheses match whatever regular expression is inside them and mark it as a group. After a successful match, the text captured by the group is what is returned. Here the three dots inside the parentheses match any three characters as one group; four dots would match any four characters as one group. To match literal parentheses, escape them with a backslash, or use [(] or [)].
Example:
print('(...)')
m = re.findall(r'(...)', r'2017 123 abcdefghijk')
if m:
    print(m)
The output is as follows:
(...)
['201', '7 1', '23 ', 'abc', 'def', 'ghi']
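How findall reports groups depends on how many the pattern contains: with one group it returns the group's text, and with several it returns one tuple per match. The sketch below illustrates this, along with escaped parentheses; the patterns and sample text are assumptions chosen for illustration.

```python
import re

text = 'f(12) 10-20 30-40'

# One group: findall returns only the text captured by that group.
one_group = re.findall(r'(\d+)-\d+', text)

# Two groups: findall returns a tuple per match.
two_groups = re.findall(r'(\d+)-(\d+)', text)

# Escaped parentheses match literal '(' and ')' and create no group.
literal_parens = re.findall(r'\(\d+\)', text)

print(one_group)       # ['10', '30']
print(two_groups)      # [('10', '20'), ('30', '40')]
print(literal_parens)  # ['(12)']
```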
Cai Junsheng, QQ: 9073204, Shenzhen
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.