Metacharacters: symbols that match a character, a position, a repetition count, or a pattern.
List of common metacharacters
. matches any character except a line break
\b matches the beginning or end of a word
\d matches a digit
\w matches a letter, digit, underscore, or Chinese character
\s matches any whitespace character, including spaces, tabs, line breaks, Chinese full-width spaces, and so on
^ matches the start of the string
$ matches the end of the string
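As a quick illustration of the metacharacters above, here is a minimal sketch using Python's re module. The sample string and variable names are made up for the demonstration.

```python
import re

# Hypothetical sample text containing a line break.
text = "Order 66 shipped at 10:30\nto Mr_Fox"

digits = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\b\w+\b", text)    # whole words (letters, digits, underscore)
first = re.findall(r"^\w+", text)       # word at the very start of the string
last = re.findall(r"\w+$", text)        # word at the very end of the string
no_dot = re.findall(r"30.", text)       # "." does not match the line break after "30"
spaces = re.findall(r"\s", text)        # every whitespace character, including "\n"

print(digits)   # ['66', '10', '30']
print(words)    # ['Order', '66', 'shipped', 'at', '10', '30', 'to', 'Mr_Fox']
print(first)    # ['Order']
print(last)     # ['Mr_Fox']
print(no_dot)   # []
```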
Common quantifiers
* repeats 0 or more times
+ repeats 1 or more times
? repeats 0 or 1 times
{n} repeats exactly n times
{n,} repeats n or more times
{n,m} repeats n to m times
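The quantifiers can be compared side by side with a small sketch. The test string is invented for illustration, and \b is added on both sides so that each pattern has to cover a whole word.

```python
import re

# Hypothetical sample: "a" followed by zero to four "b" characters.
s = "a ab abb abbb abbbb"

star = re.findall(r"\bab*\b", s)         # 0 or more b
plus = re.findall(r"\bab+\b", s)         # 1 or more b
opt = re.findall(r"\bab?\b", s)          # 0 or 1 b
exact = re.findall(r"\bab{2}\b", s)      # exactly 2 b
at_least = re.findall(r"\bab{2,}\b", s)  # 2 or more b
between = re.findall(r"\bab{1,3}\b", s)  # 1 to 3 b

print(star)      # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(plus)      # ['ab', 'abb', 'abbb', 'abbbb']
print(opt)       # ['a', 'ab']
print(exact)     # ['abb']
print(at_least)  # ['abb', 'abbb', 'abbbb']
print(between)   # ['ab', 'abb', 'abbb']
```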
Common negated forms
\W matches any character that is not a letter, digit, or underscore
\S matches any character that is not whitespace
\D matches any character that is not a digit
\B matches a position that is not the beginning or end of a word
[^a] matches any character except a
[^abcde] matches any character other than the letters a, b, c, d, e
[^(123|abc)] matches any character except 1, 2, 3, a, b, c; inside a character class the parentheses and | are literal characters, so they are excluded as well
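A short sketch of the negated forms follows; the sample strings are made up. The last line shows that parentheses and | inside a character class are treated as ordinary characters.

```python
import re

s = "A1 b_2!"  # hypothetical sample

non_word = re.findall(r"\W", s)     # not a letter, digit, or underscore
non_space = re.findall(r"\S+", s)   # runs of non-whitespace
non_digit = re.findall(r"\D", s)    # anything that is not a digit
inner = re.findall(r"\Bat\B", "at cat attic batch")  # "at" strictly inside a word
not_a = re.findall(r"[^a]", "banana")
not_set = re.findall(r"[^(123|abc)]", "a1(x|y)c3")

print(non_word)   # [' ', '!']
print(non_space)  # ['A1', 'b_2!']
print(non_digit)  # ['A', ' ', 'b', '_', '!']
print(inner)      # ['at']  (only the one inside "batch")
print(not_a)      # ['b', 'n', 'n']
print(not_set)    # ['x', 'y']
```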
Back references:
Wrapping a subexpression in parentheses makes it a group. By default, each group is automatically assigned a number: groups are numbered from left to right by the position of their opening parenthesis, and the first group is number 1. A back reference such as \1 then matches the same text that group 1 captured.
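The numbering rule is easiest to see with nested groups. The sketch below uses a made-up date string and a made-up sentence with repeated words.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.match(r"((\d{4})-(\d{2}))-(\d{2})", "2024-05-17")
whole = m.group(0)   # the entire match
g1 = m.group(1)      # outer group opens first
g2 = m.group(2)
g3 = m.group(3)
g4 = m.group(4)

# \1 refers back to whatever group 1 captured: here, a repeated word.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

print(whole, g1, g2, g3, g4)  # 2024-05-17 2024-05 2024 05 17
print(doubled)                # ['is', 'test']
```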
Table: common grouping syntax
Category | Syntax | Meaning
Capture | (exp) | matches exp and captures the text into an automatically numbered group
Capture | (?P<name>exp) | matches exp and captures the text into a group named name
Capture | (?:exp) | matches exp, does not capture the matched text, and does not assign a group number
Zero-width assertion | (?=exp) | matches the position immediately before exp
Zero-width assertion | (?<=exp) | matches the position immediately after exp
Zero-width assertion | (?!exp) | matches a position that is not followed by exp
Zero-width assertion | (?<!exp) | matches a position that is not preceded by exp
Comment | (?#comment) | has no effect on how the regular expression is processed; it only provides a comment for people to read
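The capture and comment rows of the table can be tried out as follows. This is only a sketch; the group name year and the sample strings are chosen for illustration.

```python
import re

# One named group, one non-capturing group, one plain numbered group.
m = re.search(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "Released 2024-05-17")

by_name = m.group("year")   # named groups can be read by name...
by_number = m.group(1)      # ...and still receive a number
day = m.group(2)            # (?:...) was skipped when numbering
all_groups = m.groups()

# (?#...) is ignored by the engine.
commented = re.fullmatch(r"\d{3}(?#prefix)-\d{4}", "555-1234")

print(by_name, by_number, day)  # 2024 2024 17
print(all_groups)               # ('2024', '17')
print(commented is not None)    # True
```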
Zero-width assertions: like \b and ^, these match a position rather than characters, and that position must satisfy a condition. We call this condition an assertion, or a zero-width assertion.
Zero-width positive lookahead assertion: (?=exp) asserts that the text after this position matches the expression exp. For example, \b[a-z]+(?=ing\b) matches the part of a word that precedes a trailing ing; searching "I love cooking and singing" matches cook and sing.
A lookahead is evaluated as follows: the engine tries the preceding expression at each position from left to right, then checks that the text immediately after the candidate match satisfies the assertion (here, ing). If the check fails, it backtracks to a shorter candidate or moves on to the next position and tries again.
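A minimal sketch of the positive lookahead on the sentence above. The pattern uses [a-z]+ with word boundaries so that no empty matches are returned; that exact form is a choice made for this example.

```python
import re

sentence = "I love cooking and singing"

# Match the stem of each word that ends in "ing"; "ing" itself is not consumed.
stems = re.findall(r"\b[a-z]+(?=ing\b)", sentence)

print(stems)  # ['cook', 'sing']
```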
Zero-width positive lookbehind assertion: (?<=exp) asserts that the text before this position matches the expression exp. For example, (?<=abc).* matches whatever follows abc: in abcdefgabc it matches defgabc, not abcdefg. Lookbehind is the mirror image of lookahead: scanning from the left end of the string, the engine finds the first position preceded by the assertion expression and matches the rest of the pattern from there; if that fails, it moves on to the next such position, and so on.
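The lookbehind example can be checked directly. This is a minimal sketch using the same string as the text.

```python
import re

# The match starts right after the first "abc"; "abc" itself is not included.
m = re.search(r"(?<=abc).*", "abcdefgabc")
tail = m.group()

print(tail)  # defgabc
```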
Zero-width negative lookahead assertion: (?!exp) asserts that the text after this position does not match the expression exp. For example, \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.
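A sketch of the negative lookahead on a made-up word list. The group is written as non-capturing, (?:...), only so that re.findall returns the whole word instead of the last captured character.

```python
import re

words = "xabcx hello abc cab"  # hypothetical sample

# Each character of the word must not be the start of "abc".
clean = re.findall(r"\b(?:(?!abc)\w)+\b", words)

print(clean)  # ['hello', 'cab']
```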
Zero-width negative lookbehind assertion: (?<!exp) asserts that the text before this position does not match the expression exp. For example, (?<![a-z])\d{7} matches a seven-digit number that is not preceded by a lowercase letter.
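The negative lookbehind can be tried on a few invented inputs: one number preceded by a lowercase letter, one by a space, and one by an uppercase letter.

```python
import re

s = "x1234567 7654321 A1112223"  # hypothetical sample

# Seven digits whose preceding character is not a lowercase letter.
numbers = re.findall(r"(?<![a-z])\d{7}", s)

print(numbers)  # ['7654321', '1112223']
```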
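(?<=<(\w+)>).*(?=<\/\1>) matches the content inside a simple HTML tag that has no attributes. Note that Python's re module only accepts fixed-width lookbehind, so this expression needs an engine that supports variable-width lookbehind, such as .NET.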
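Python's re module rejects the pattern above because the lookbehind contains \w+, which has no fixed width. One possible workaround, shown here as a sketch rather than the original author's method, is to capture the tag name and the content with ordinary groups. The sample HTML is made up.

```python
import re

# The lookbehind version does not compile in Python's re module.
try:
    re.compile(r"(?<=<(\w+)>).*(?=<\/\1>)")
    compiled = True
except re.error:
    compiled = False

# Alternative: capture the tag name, then the content, then require the same closing tag.
html = "<b>bold</b> and <i>it</i>"
pairs = re.findall(r"<(\w+)>(.*?)</\1>", html)

print(compiled)  # False
print(pairs)     # [('b', 'bold'), ('i', 'it')]
```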
Greed and laziness
When a regular expression contains a quantifier that allows repetition, the default behavior is to match as many characters as possible (while still letting the whole expression match).
This is greedy mode. Take the expression a\w+b as an example: when searching a12b34b it matches as much as possible, so it matches the entire a12b34b rather than a12b.
To match only a12b, switch to lazy mode by changing a\w+b to a\w+?b.
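The difference between the two modes on the string from the text can be seen in a minimal sketch:

```python
import re

s = "a12b34b"

greedy = re.search(r"a\w+b", s).group()   # takes as much as possible
lazy = re.search(r"a\w+?b", s).group()    # stops at the first possible "b"

print(greedy)  # a12b34b
print(lazy)    # a12b
```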
Lazy quantifiers
*? repeats any number of times, but as few times as possible
+? repeats 1 or more times, but as few times as possible
?? repeats 0 or 1 times, but as few times as possible
{n,m}? repeats n to m times, but as few times as possible
{n,}? repeats n or more times, but as few times as possible
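Each lazy quantifier settles for the minimum it is allowed. The sketch below matches at the start of a made-up string of four a characters.

```python
import re

s = "aaaa"  # hypothetical sample

lazy_star = re.match(r"a*?", s).group()        # zero repetitions are enough
lazy_plus = re.match(r"a+?", s).group()        # one repetition is the minimum
lazy_opt = re.match(r"a??", s).group()         # prefers zero
lazy_range = re.match(r"a{2,3}?", s).group()   # prefers the lower bound
lazy_open = re.match(r"a{2,}?", s).group()     # prefers the lower bound
greedy_range = re.match(r"a{2,3}", s).group()  # greedy form for comparison

print(repr(lazy_star))  # ''
print(lazy_plus)        # a
print(repr(lazy_opt))   # ''
print(lazy_range)       # aa
print(lazy_open)        # aa
print(greedy_range)     # aaa
```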
Table: Python-specific matching rules
Syntax | Meaning | Example expression | Strings it fully matches
\A | matches only at the start of the string | \Aabc | abc
\Z | matches only at the end of the string | abc\Z | abc
(?P<name>...) | a group that, in addition to its number, is given an alias | (?P<word>abc){2} | abcabc
(?P=name) | refers to the text matched by the group with the alias name | (?P<id>\d)abc(?P=id) | 1abc1, 5abc5
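The rows of the table can be verified with a short sketch. The counterexamples (xabc, a trailing line break, 1abc5) are added here for contrast and are not from the table.

```python
import re

start_ok = re.search(r"\Aabc", "abc") is not None
start_bad = re.search(r"\Aabc", "xabc") is not None

end_ok = re.search(r"abc\Z", "abc") is not None
# Unlike $, \Z does not match before a trailing line break.
end_newline_Z = re.search(r"abc\Z", "abc\n") is not None
end_newline_dollar = re.search(r"abc$", "abc\n") is not None

named_twice = re.fullmatch(r"(?P<word>abc){2}", "abcabc") is not None

same_digit = re.fullmatch(r"(?P<id>\d)abc(?P=id)", "1abc1") is not None
diff_digit = re.fullmatch(r"(?P<id>\d)abc(?P=id)", "1abc5") is not None

print(start_ok, start_bad)                        # True False
print(end_ok, end_newline_Z, end_newline_dollar)  # True False True
print(named_twice, same_digit, diff_digit)        # True True False
```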
In Python, regular expression matching is done mainly through several functions in the re module.
1.
Python Regular Expressions