1. Meta-characters:
. Matches any character except a newline. In the alternate mode (re.DOTALL) it matches newlines as well.
^ matches the beginning of the line. Unless you set the MULTILINE flag, it only matches the beginning of the string.
$ matches the end of the line, where the end of the line is defined as either the end of the string or any position followed by a newline character.
* Repeats the preceding item 0 or more times
+ Repeats the preceding item 1 or more times
? Repeats the preceding item 0 or 1 times
{m,n} This qualifier means at least m and at most n repetitions of the preceding item
[] Used to specify a character class, that is, the set of characters you want to match.
\ The backslash can be followed by various characters to signal special sequences. It can also be used to escape any meta-character so that you can match it literally in a pattern.
| Alternation, or the "or" operator.
() Grouping
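The meta-characters listed above can be tried out with a short sketch like the following. The sample strings and variable names are made up for illustration; only the standard `re` module is used.

```python
import re

# "." matches any character except a newline by default.
dot_default = re.findall(r"a.c", "abc a\nc")

# With re.DOTALL, "." also matches the newline.
dot_all = re.findall(r"a.c", "abc a\nc", re.DOTALL)

# "|" chooses between alternatives; "()" groups (and captures) them.
alt_group = re.search(r"(cat|dog)s?", "two dogs").group(1)

# A backslash cancels the special meaning of "$" so it matches literally.
escaped = re.findall(r"\$\d+", "cost: $15 or $20")

print(dot_default)  # ['abc']
print(dot_all)      # ['abc', 'a\nc']
print(alt_group)    # dog
print(escaped)      # ['$15', '$20']
```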
2. [akm$] will match any one of the characters "a", "k", "m", or "$". "$" is usually a meta-character, but inside a character class it loses its special meaning and is treated as an ordinary character.
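A minimal sketch of the character-class behaviour, using invented sample strings: inside the brackets "$" is a literal, the class is case-sensitive, and outside the brackets "$" is an anchor again.

```python
import re

# Inside [...] the "$" is an ordinary character.
in_class = re.findall(r"[akm$]", "make $5")

# The class is case-sensitive: upper-case letters are not matched.
upper = re.findall(r"[akm$]", "AKM")

# Outside a class, "$" anchors the match to the end of the string.
anchored = re.search(r"km$", "akm").group()

print(in_class)  # ['m', 'a', 'k', '$']
print(upper)     # []
print(anchored)  # km
```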
3. Predefined character sets represented by special characters starting with "\"
\d matches any decimal digit; it is equivalent to the class [0-9].
\D matches any non-digit character; it is equivalent to the class [^0-9].
\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character; it is equivalent to the class [a-zA-Z0-9_].
\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].
\A matches only at the start of the string. Outside MULTILINE mode, \A and ^ are effectively the same. In MULTILINE mode they differ: \A still matches only at the start of the string, while ^ can also match at any position that follows a newline.
\Z matches only at the end of the string.
\b Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character.
\B Another zero-width assertion, the exact opposite of \b: it matches only when the current position is not at a word boundary.
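The special sequences above can be compared side by side with a small sketch. The sample text is invented. One caveat worth noting: in Python 3, with str patterns, \d, \s and \w are Unicode-aware by default, so the bracketed equivalents hold exactly only when the re.ASCII flag is used.

```python
import re

text = "Order 66 shipped\nitem_7 pending"

digits = re.findall(r"\d+", text)      # runs of digits
words = re.findall(r"\w+", text)       # runs of word characters
non_words = re.findall(r"\W+", text)   # everything between the words

# In MULTILINE mode "^" matches after each newline, "\A" only at the very start.
caret = re.findall(r"^\w+", text, re.MULTILINE)
start_only = re.findall(r"\A\w+", text, re.MULTILINE)

# "\b" requires a word boundary, "\B" requires that there is none.
sample = "class subclass classy"
whole_word_at = re.search(r"\bclass\b", sample).start()
inside_word_at = re.search(r"\Bclass\b", sample).start()

print(digits)          # ['66', '7']
print(words)           # ['Order', '66', 'shipped', 'item_7', 'pending']
print(non_words)       # [' ', ' ', '\n', ' ']
print(caret)           # ['Order', 'item_7']
print(start_only)      # ['Order']
print(whole_word_at)   # 0
print(inside_word_at)  # 9
```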
4. Repeating meta-characters
* Repeats the preceding item 0 or more times
+ Repeats the preceding item 1 or more times
? Repeats the preceding item 0 or 1 times
{m,n}, where m and n are decimal integers. This qualifier means at least m and at most n repetitions of the preceding item
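As a quick illustration of the four repetition qualifiers, the sketch below applies each one to the letter "b" following an "a". The input strings are invented for the example.

```python
import re

star = re.findall(r"ab*", "a ab abbb")               # zero or more "b"
plus = re.findall(r"ab+", "a ab abbb")               # one or more "b"
optional = re.findall(r"ab?", "a ab abbb")           # zero or one "b"
bounded = re.findall(r"ab{2,3}", "ab abb abbb abbbb")  # two to three "b"

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab', 'ab']
print(bounded)   # ['abb', 'abbb', 'abbb']
```

Each qualifier is greedy, which is why "abbbb" yields "abbb" for {2,3} and leaves the last "b" unmatched.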
5. Common methods of RegexObject objects
match() Determines whether the RE matches at the beginning of the string
search() Scans through the string, looking for any location where the RE matches
findall() Finds all substrings that the RE matches and returns them as a list
finditer() Finds all substrings that the RE matches and returns them as an iterator
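The four methods can be contrasted on one compiled pattern, as in this sketch. The pattern and text are arbitrary examples.

```python
import re

pattern = re.compile(r"\d+")
text = "tel 555 ext 42"

# match() only succeeds if the pattern matches at position 0.
at_start = pattern.match(text)

# search() scans forward and returns the first match anywhere.
first = pattern.search(text).group()

# findall() returns every matching substring as a list.
all_found = pattern.findall(text)

# finditer() yields match objects lazily; here we collect their spans.
spans = [m.span() for m in pattern.finditer(text)]

print(at_start)   # None
print(first)      # 555
print(all_found)  # ['555', '42']
print(spans)      # [(4, 7), (12, 14)]
```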
6. Common methods of MatchObject objects
group() Returns the string matched by the RE
start() Returns the position where the match starts
end() Returns the position where the match ends
span() Returns a tuple containing the (start, end) positions of the match
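A short sketch of these match-object methods, using a made-up pattern and input string:

```python
import re

m = re.search(r"\w+@\w+\.com", "mail bob@example.com now")

matched = m.group()  # the matched text
begin = m.start()    # index of the first matched character
finish = m.end()     # index just past the last matched character
both = m.span()      # (start, end) as a tuple

print(matched)  # bob@example.com
print(begin)    # 5
print(finish)   # 20
print(both)     # (5, 20)
```

Note that search() returns None when nothing matches, so real code usually checks the result before calling these methods.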
7. Compilation flags
DOTALL, S Makes . match any character, including newlines
IGNORECASE, I Makes matching case-insensitive
LOCALE, L Performs locale-aware matching
MULTILINE, M Multi-line matching, which affects ^ and $
VERBOSE, X Enables verbose REs, which can be laid out more clearly and are easier to understand
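The effect of most of these flags can be seen in a sketch like the one below. The sample text and the year-month pattern are invented. LOCALE is left out because in Python 3 it can only be used with bytes patterns.

```python
import re

text = "first line\nSecond line"

# DOTALL (S): "." is allowed to cross the newline.
without_s = re.search(r"line.Second", text)
with_s = re.search(r"line.Second", text, re.S)

# IGNORECASE (I): letters match regardless of case.
ignorecase = re.findall(r"second", text, re.I)

# MULTILINE (M): "^" matches at the start of every line.
multiline = re.findall(r"^\w+", text, re.M)

# VERBOSE (X): whitespace and "#" comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
parts = year_month.match("2024-05").groups()

print(without_s)           # None
print(with_s is not None)  # True
print(ignorecase)          # ['Second']
print(multiline)           # ['first', 'Second']
print(parts)               # ('2024', '05')
```

Flags can be combined with the | operator, for example re.I | re.M.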