Regular expression pattern
A pattern string uses a special syntax to represent a regular expression:
Letters and digits stand for themselves: a letter or digit in a pattern matches the same character in the string.
Most letters and digits take on a different meaning when they are preceded by a backslash.
Punctuation marks match themselves only when they are escaped; otherwise they have special meanings.
Backslashes themselves need to be escaped with backslashes.
Because regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element written this way (such as r'\t', equivalent to '\\t') matches the corresponding special character.
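The escaping rules above can be checked with a short sketch using Python's re module. The sample strings and variable names are invented for illustration only.

```python
import re

# A raw string keeps the backslash, so the regex engine receives \t unchanged.
assert r'\t' == '\\t'      # two characters: a backslash and the letter t
assert len('\t') == 1      # an ordinary string literal turns \t into one tab

# Both spellings produce the same pattern, which matches a literal tab.
tab_raw = re.search(r'\t', 'a\tb')
tab_escaped = re.search('\\t', 'a\tb')
print(tab_raw.span(), tab_escaped.span())   # (1, 2) (1, 2)

# An unescaped dot matches any character; an escaped dot matches only a dot.
loose = re.findall(r'1.5', '1.5 and 1x5')
strict = re.findall(r'1\.5', '1.5 and 1x5')
print(loose)    # ['1.5', '1x5']
print(strict)   # ['1.5']

# A literal backslash is written as \\ inside a raw-string pattern.
backslashes = re.findall(r'\\', r'C:\temp\new')
print(len(backslashes))   # 2
```

When arbitrary text has to be matched literally, re.escape() adds the required backslashes, so the pattern does not have to be escaped by hand.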
The following table lists the special elements of the regular expression pattern syntax. If you supply optional flag arguments along with the pattern, the meaning of some pattern elements changes.
Pattern  Description
^  Matches the beginning of the string.
$  Matches the end of the string.
.  Matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.
[...]  Matches any single character in the set: [amk] matches 'a', 'm', or 'k'.
[^...]  Matches any single character not in the set: [^abc] matches any character other than a, b, or c.
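re*  Matches 0 or more occurrences of the preceding expression.
re+  Matches 1 or more occurrences of the preceding expression.
re?  Matches 0 or 1 occurrence of the preceding expression, greedily. Placed after another quantifier, as in re*? or re+?, it makes that quantifier non-greedy.
re{n}  Matches exactly n occurrences of the preceding expression.
re{n,}  Matches n or more occurrences of the preceding expression.
re{n,m}  Matches from n to m occurrences of the preceding expression, greedily.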
a|b  Matches either a or b.
(re)  Matches the expression in parentheses and captures it as a group.
(?imx)  Turns on the i, m, or x optional flags within the regular expression. If used inside parentheses, only that area is affected.
(?-imx)  Turns off the i, m, or x optional flags within the regular expression. If used inside parentheses, only that area is affected.
(?:re)  Like (...), but does not capture a group.
(?imx:re)  Turns on the i, m, or x optional flags within the parentheses.
(?-imx:re)  Turns off the i, m, or x optional flags within the parentheses.
(?#...)  Comment.
(?=re)  Positive lookahead assertion. Succeeds if the contained expression matches at the current position and fails otherwise. Once the contained expression has been tried, the matching engine does not advance; the rest of the pattern is tried right where the assertion started.
(?!re)  Negative lookahead assertion. The opposite of the positive assertion: it succeeds when the contained expression does not match at the current position in the string.
(?>re)  Matches an independent pattern without backtracking.
\w  Matches a word character (a letter, a digit, or an underscore).
\W  Matches a non-word character.
\s  Matches any whitespace character, equivalent to [ \t\n\r\f\v].
\S  Matches any non-whitespace character.
\d  Matches any digit, equivalent to [0-9].
\D  Matches any non-digit character.
\A  Matches the start of the string.
\Z  Matches the end of the string; if the string ends with a newline, matches just before that newline.
\z  Matches the end of the string.
\G  Matches the position where the last match finished.
\b  Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
\B  Matches a non-word boundary. For example, 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.
\n, \t, etc.  Match a newline, a tab, and so on.
\1...\9  Matches the content of the nth captured group.
\10  Matches the content of the nth captured group if that group has matched; otherwise refers to the octal representation of a character code.
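Several rows of the table are easier to follow with concrete input. The sketch below uses Python's re module and covers greedy versus non-greedy quantifiers, capturing and non-capturing groups, backreferences, lookahead assertions, and the effect of a flag on ^. All sample strings and variable names are hypothetical.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy vs. non-greedy: .* takes as much as it can, .*? as little as it can.
greedy = re.findall(r'<.*>', text)
lazy = re.findall(r'<.*?>', text)
print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']

# (?:...) groups without capturing, so the surname is group 1.
m = re.search(r'(?:Mr|Ms)\. (\w+)', 'Dear Ms. Lovelace,')
surname = m.group(1)
print(surname)  # Lovelace

# Backreference: \1 must repeat exactly what group 1 matched.
doubled = re.findall(r'\b(\w+) \1\b', 'it is is what it it is')
print(doubled)  # ['is', 'it']

# Lookahead checks the text that follows without consuming it.
before_px = re.findall(r'\d+(?=px)', '10px 20em 30px')
not_px = re.findall(r'\d+(?!px|\d)', '10px 20em 30px')
print(before_px)  # ['10', '30']
print(not_px)     # ['20']

# A flag changes what ^ means: with re.MULTILINE it also matches after each newline.
lines = "first\nsecond"
start_default = re.findall(r'^\w+', lines)
start_multiline = re.findall(r'^\w+', lines, re.MULTILINE)
print(start_default)    # ['first']
print(start_multiline)  # ['first', 'second']
```

The negative lookahead also excludes a following digit; without that alternative the engine would backtrack and report the '1' of '10px' as a match.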
Regular expression examples
Character matching
Example  Description
Python  Matches "Python".
Character classes
Example  Description
[Pp]ython  Matches "Python" or "python".
Rub[ye]  Matches "Ruby" or "Rube".
[aeiou]  Matches any one of the letters within the brackets.
[0-9]  Matches any digit; equivalent to [0123456789].
[a-z]  Matches any lowercase letter.
[A-Z]  Matches any uppercase letter.
[a-zA-Z0-9]  Matches any letter or digit.
[^aeiou]  Matches any character other than the letters a, e, i, o, and u.
[^0-9]  Matches any character other than a digit.
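The character classes above can be tried directly with re.findall, which returns every non-overlapping match. This is a small sketch with made-up input strings, not an exhaustive test.

```python
import re

# A class matches exactly one character drawn from the listed set.
pythons = re.findall(r'[Pp]ython', 'Python and python')
rubies = re.findall(r'Rub[ye]', 'Ruby, Rube, Rubx')
print(pythons)  # ['Python', 'python']
print(rubies)   # ['Ruby', 'Rube']

# Ranges are shorthand for listing every character between the endpoints.
lower = re.findall(r'[a-z]', 'aBcD')
upper = re.findall(r'[A-Z]', 'aBcD')
alnum = re.findall(r'[a-zA-Z0-9]', 'a-B_9!')
print(lower)    # ['a', 'c']
print(upper)    # ['B', 'D']
print(alnum)    # ['a', 'B', '9']

# A leading ^ negates the class.
vowels = re.findall(r'[aeiou]', 'regular')
consonants = re.findall(r'[^aeiou]', 'regular')
non_digits = re.findall(r'[^0-9]', 'a1b22')
print(vowels)      # ['e', 'u', 'a']
print(consonants)  # ['r', 'g', 'l', 'r']
print(non_digits)  # ['a', 'b']
```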
Special character classes
Example  Description
.  Matches any single character except "\n". To match any character including '\n', use a pattern like '[\s\S]' or specify the re.DOTALL flag; inside brackets, as in '[.\n]', the dot is only a literal dot.
\d  Matches a digit; equivalent to [0-9].
\D  Matches a non-digit character; equivalent to [^0-9].
\s  Matches any whitespace character, including spaces, tabs, form feeds, and so on; equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v].
\w  Matches any word character, including the underscore; equivalent to '[A-Za-z0-9_]'.
\W  Matches any non-word character; equivalent to '[^A-Za-z0-9_]'.
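A brief sketch of the shorthand classes, again with Python's re module and an invented sample string. It assumes ASCII input: for str patterns Python's \d, \w, and \s also match non-ASCII digits, letters, and whitespace unless the re.ASCII flag is given, so the bracket equivalents listed above hold exactly only in ASCII mode.

```python
import re

text = "id_42:\tok\nnext line"

digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)
non_word = re.findall(r'\W', text)
print(digits)    # ['42']
print(words)     # ['id_42', 'ok', 'next', 'line']
print(spaces)    # ['\t', '\n', ' ']
print(non_word)  # [':', '\t', '\n', ' ']

# By default the dot stops at a newline.
dot_default = re.search(r'ok.next', text)
print(dot_default)  # None

# Two ways to let a single position match a newline as well.
dot_all = re.search(r'ok.next', text, re.DOTALL)
any_char = re.search(r'ok[\s\S]next', text)
print(dot_all.group() == any_char.group())  # True

# Inside brackets the dot is literal: this class matches only '.' or a newline.
literal_dot = re.findall(r'[.\n]', 'a.b\nc')
print(literal_dot)  # ['.', '\n']
```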
Common Regular Expressions