Python Regular Expressions

Source: Internet
Author: User

Character Meaning
. Matches any character except a newline
Note: If the re.DOTALL flag is set, . matches any character, including a newline
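A minimal sketch of the effect of re.DOTALL on . (the sample text is an arbitrary assumption):

```python
import re

text = "first line\nsecond line"

# Without flags, . stops at the newline, so the match ends at "first line"
plain = re.match(r".+", text).group()

# With re.DOTALL, . also matches "\n", so the whole string is consumed
dotall = re.match(r".+", text, re.DOTALL).group()

print(repr(plain))   # 'first line'
print(repr(dotall))  # 'first line\nsecond line'
```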
| A|B matches either regular expression A or regular expression B
^ 1. (Caret) Matches the start of the input string
2. If the re.MULTILINE flag is set, ^ also matches the position immediately after each newline
$ 1. Matches the end of the input string (or the position just before a newline at the very end of the string)
2. If the re.MULTILINE flag is set, $ also matches the position immediately before each newline
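A short sketch comparing ^ and $ with and without re.MULTILINE; the three-line sample string is an illustrative assumption:

```python
import re

text = "alpha\nbeta\ngamma"

# Default: ^ matches only at the very start, $ only at the very end
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# re.MULTILINE: ^ also matches after each "\n", $ also before each "\n"
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)

print(starts_default, ends_default)  # ['alpha'] ['gamma']
print(starts_multi, ends_multi)      # ['alpha', 'beta', 'gamma'] ['alpha', 'beta', 'gamma']
```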
\ 1. Turns an ordinary character into a special sequence, for example \d matches any decimal digit
2. Removes the special meaning of a metacharacter, for example \. matches a literal dot
3. Followed by a number, refers to the string matched by the subgroup with that number
4. See the list of special sequences below
[...] A character class: matches any one of the characters it contains
Note 1: A hyphen - denotes a range of characters when it appears between two characters; in the first (or last) position it is an ordinary character
Note 2: Inside a class, most special characters lose their meaning: the backslash \ still escapes characters, while others such as *, + and ? are matched as ordinary characters
Note 3: A caret ^ in the first position negates the class, so it matches any character not listed; anywhere else in the class, ^ is an ordinary character
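The three notes on character classes can be checked with a few findall calls; the sample strings are arbitrary assumptions chosen to show each rule:

```python
import re

# Hyphen between two characters: a range, here a through f
ranged = re.findall(r"[a-f]", "face-off")
# Hyphen in the first position: an ordinary "-"
literal_hyphen = re.findall(r"[-a]", "a-b")
# *, + and ? lose their special meaning inside a class
literal_specials = re.findall(r"[*+?]", "a*b+c?")
# ^ in the first position negates the class
negated = re.findall(r"[^0-9]", "a1b2")
# ^ anywhere else in the class is an ordinary character
literal_caret = re.findall(r"[0-9^]", "2^3")

print(ranged)            # ['f', 'a', 'c', 'e', 'f', 'f']
print(literal_hyphen)    # ['a', '-']
print(literal_specials)  # ['*', '+', '?']
print(negated)           # ['a', 'b']
print(literal_caret)     # ['2', '^', '3']
```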
{m,n} m and n are non-negative integers with m <= n; the preceding RE is matched from m to n times
Note 1: {m,} matches at least m times
Note 2: {,n} is equivalent to {0,n}
Note 3: {n} matches exactly n times
* Matches the preceding subexpression 0 or more times, equivalent to {0,}
+ Matches the preceding subexpression one or more times, equivalent to {1,}
? Matches the preceding subexpression 0 or 1 times, equivalent to {0,1}
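A sketch of the repetition forms listed above; the inputs ("aaaa", "color" and so on) are illustrative assumptions:

```python
import re

bounded = re.match(r"a{2,3}", "aaaa").group()     # greedy: takes 3 of the 4 "a"s
at_least = re.match(r"a{2,}", "aaaa").group()     # at least 2: takes all 4
up_to = re.match(r"a{,2}", "bbb").group()         # {,n} == {0,n}: an empty match is allowed
exact = re.fullmatch(r"\d{2}", "42") is not None  # exactly 2 digits

star = re.match(r"ab*", "a").group()              # * allows zero "b"s
plus = re.match(r"ab+", "a")                      # + needs at least one "b", so None
optional = re.match(r"colou?r", "color").group()  # ? allows zero or one "u"

print(bounded, at_least, repr(up_to), exact)  # aaa aaaa '' True
print(star, plus, optional)                   # a None color
```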
*?, +?, ?? By default, *, + and ? are greedy (they match as much text as possible); adding a trailing ? enables the corresponding non-greedy mode, which matches as little as possible
For example: For the string "FishCCC", the regular expression FishC+ matches the entire string, whereas FishC+? matches only "FishC"
{m,n}? Same as above: the non-greedy form, which matches as few repetitions as possible (only m if the rest of the pattern allows it)
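The greedy/non-greedy example can be reproduced directly; the HTML-like string is an extra, hypothetical input showing the typical practical use:

```python
import re

s = "FishCCC"
greedy = re.search(r"FishC+", s).group()      # takes every trailing "C"
lazy = re.search(r"FishC+?", s).group()       # stops after the first "C"
lazy_range = re.search(r"C{1,3}?", s).group() # {m,n}? stops after m = 1 repetition

# A common practical case: greedy .* runs to the last ">"
html = "<b>bold</b>"
greedy_tags = re.findall(r"<.*>", html)
lazy_tags = re.findall(r"<.*?>", html)

print(greedy, lazy, lazy_range)  # FishCCC FishC C
print(greedy_tags)               # ['<b>bold</b>']
print(lazy_tags)                 # ['<b>', '</b>']
```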
(...) Matches the regular expression inside the parentheses and marks the start and end of a subgroup
Note: After a match, the contents of a subgroup can be referred to again with \number
For example: (\w+) \1 matches "FishC FishC" in the string "FishC FishC.com" (note the space)
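The subgroup and backreference example, run through re.search:

```python
import re

m = re.search(r"(\w+) \1", "FishC FishC.com")

whole = m.group()    # the full match, including the repeated word
first = m.group(1)   # the text captured by subgroup 1
print(whole)         # FishC FishC
print(first)         # FishC
print(m.groups())    # ('FishC',)
```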
(?...) Extension syntax, beginning with (? (the entries below are all the extensions Python supports)
(?aiLmsux) 1. (? can be followed by one or more of the letters 'a', 'i', 'L', 'm', 's', 'u', 'x'; this group may only be used at the beginning of the regular expression
2. Each letter sets the corresponding flag for the entire regular expression: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multiline mode), re.S (. matches any character), re.U (Unicode matching), re.X (verbose mode)
3. This is useful when you want to put the flags inside the pattern itself instead of passing a flags argument to re.compile()
Note that because (?x) determines how the regular expression is parsed, it should always be placed at the very beginning of the pattern. In older Python versions the results were undefined if other characters came before it; since Python 3.11, global flags anywhere other than the start of the pattern raise an error.
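A minimal sketch of inline flags; the phone-number pattern is a hypothetical example chosen to show verbose mode with comments:

```python
import re

words = "FishC fishc FISHC"

# (?i) at the start of the pattern is equivalent to passing re.IGNORECASE
inline = re.findall(r"(?i)fishc", words)
passed = re.findall(r"fishc", words, re.IGNORECASE)

# (?x) verbose mode: whitespace and "#" comments in the pattern are ignored
phone = re.compile(r"""(?x)
    (\d{3})   # first block of digits
    -         # literal hyphen
    (\d{4})   # second block of digits
""")
parts = phone.match("555-1234").groups()

print(inline)   # ['FishC', 'fishc', 'FISHC']
print(passed)   # ['FishC', 'fishc', 'FISHC']
print(parts)    # ('555', '1234')
```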
(?:...) A non-capturing group: the substring matched by the group cannot be retrieved afterwards
(?P<name>...) A named group: the substring matched by the group can be accessed by its name
(?P=name) A backreference to a named group: matches the same text that the named group matched earlier
(?#...) A comment; the contents of the parentheses are ignored
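A sketch of the four group forms above; the "ID-2024" string and the group name word are illustrative assumptions:

```python
import re

# (?:...) groups without capturing: only the digits are captured
m1 = re.match(r"(?:ID-)?(\d{4})", "ID-2024")
captured = m1.groups()

# (?P<name>...) defines a named group; (?P=name) must repeat its text
m2 = re.search(r"(?P<word>\w+) (?P=word)", "say hello hello")
named = m2.group("word")
doubled = m2.group()

# (?#...) is a comment and has no effect on matching
digits = re.match(r"\d+(?#digits only)", "123abc").group()

print(captured)        # ('2024',)
print(named, doubled)  # hello hello hello
print(digits)          # 123
```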
(?=...) Positive lookahead assertion. Succeeds if the regular expression it contains (represented here by ...) matches at the current position, and fails otherwise. The assertion consumes no characters: once the matching engine has tested it, the rest of the pattern continues from the position where the assertion began.
For example: Love(?=FishC) matches "Love" only if it is followed by "FishC"
(?!...) Negative lookahead assertion. The opposite of the positive assertion: it succeeds if the contents do not match at the current position, and fails if they do
For example: FishC(?!\.com) matches "FishC" only if it is not followed by ".com"
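Using re.sub makes it easy to see which occurrence a lookahead selects and that the asserted text is not consumed; the sample strings are assumptions:

```python
import re

# Positive lookahead: only the "Love" followed by "FishC" is replaced,
# and "FishC" itself stays in the result because it was not consumed
pos = re.sub(r"Love(?=FishC)", "<3", "LoveFishC LovePython")

# Negative lookahead: only the "FishC" NOT followed by ".com" is replaced
neg = re.sub(r"FishC(?!\.com)", "X", "FishC.com FishC.net")

print(pos)  # <3FishC LovePython
print(neg)  # FishC.com X.net
```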
(?<=...) Positive lookbehind assertion. Works like the positive lookahead assertion, but looks in the opposite direction
For example: (?<=Love)FishC matches "FishC" only if it is preceded by "Love"
(?<!...) Negative lookbehind assertion. Works like the negative lookahead assertion, but looks in the opposite direction
For example: (?<!FishC)\.com matches ".com" only if it is not preceded by "FishC"
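The same re.sub approach works for lookbehind. One restriction worth knowing: Python's re requires the contents of a lookbehind to match a fixed-width string, so a variable-width body such as Lo+ve (a hypothetical pattern) is rejected with re.error:

```python
import re

# Positive lookbehind: only the "FishC" preceded by "Love" is replaced
after_love = re.sub(r"(?<=Love)FishC", "X", "LoveFishC IFishC")

# Negative lookbehind: only the ".com" NOT preceded by "FishC" is replaced
not_after_fishc = re.sub(r"(?<!FishC)\.com", "[dot-com]", "FishC.com python.com")

print(after_love)       # LoveX IFishC
print(not_after_fishc)  # FishC.com python[dot-com]

# A variable-width lookbehind body cannot be compiled
try:
    re.compile(r"(?<=Lo+ve)FishC")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)
```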
(?(id/name)yes-pattern|no-pattern) 1. If the subgroup with the given number or name has matched, try yes-pattern; otherwise try no-pattern
2. no-pattern is optional
For example: (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a regular expression that matches an e-mail address with or without surrounding angle brackets; it matches user@host.com and <user@host.com>, but not <user@host.com or user@host.com>
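A check of the conditional e-mail pattern. The addresses are placeholders, and re.match is used because re.search could still find an unbracketed address inside "<user@host.com":

```python
import re

# If group 1 captured "<", a closing ">" is required; otherwise the address
# must run to the end of the string
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

samples = ["user@host.com", "<user@host.com>", "<user@host.com", "user@host.com>"]
results = {s: bool(email.match(s)) for s in samples}

for s, ok in results.items():
    print(f"{s!r:20} {ok}")
```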
\ The special sequences formed by '\' followed by another character are listed below. Note that '\' followed by a metacharacter removes that metacharacter's special meaning
\number 1. Refers to the string matched by the subgroup with that number; subgroups are numbered from 1
2. If the number starts with 0, or is 3 octal digits long, it is not a subgroup reference; instead it matches the character whose code is given by that octal value
For example: (.+) \1 matches "FishC FishC" or "55 55", but not "FishCFishC" (because the pattern has a space after the subgroup)
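A sketch of both readings of \number: a group reference, and an octal escape when the number starts with 0 or has three octal digits:

```python
import re

pat = re.compile(r"(.+) \1")
repeat_word = bool(pat.match("FishC FishC"))
repeat_digits = bool(pat.match("55 55"))
no_space = bool(pat.match("FishCFishC"))  # the literal space in the pattern is missing

# \101 has three octal digits, so it is the character 0o101, i.e. "A"
octal_a = re.match(r"\101", "A").group()
# \0 starts with 0, so it is the NUL character, not a group reference
nul = bool(re.match(r"\0", "\x00"))

print(repeat_word, repeat_digits, no_space)  # True True False
print(octal_a, nul)                          # A True
```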
\A Matches only at the start of the input string
\Z Matches only at the end of the input string
\b Matches a word boundary, where a word is a sequence of Unicode alphanumeric characters or underscores
For example: \bFishC\b matches "FishC" in the strings "love FishC", "FishC." or "(FishC)"
\B Matches a non-word boundary, the opposite of \b
For example: py\B matches "py" in "python", "py3" or "py2", but not in "py", "py." or "py!"
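A sketch of the four zero-width anchors above. It also shows one difference from $: \Z does not match before a trailing newline. The sample strings are assumptions:

```python
import re

at_start = bool(re.search(r"\AFishC", "FishC rocks"))
not_start = bool(re.search(r"\AFishC", "I love FishC"))
at_end = bool(re.search(r"FishC\Z", "I love FishC"))

# $ accepts a trailing newline, \Z does not
dollar_nl = bool(re.search(r"FishC$", "I love FishC\n"))
z_nl = bool(re.search(r"FishC\Z", "I love FishC\n"))

# \b: "FishCs" is excluded because "s" continues the word
words = re.findall(r"\bFishC\b", "love FishC, FishC. (FishC) FishCs")

# \B: "py" must be followed by another word character
non_boundary = {s: bool(re.search(r"py\B", s))
                for s in ["python", "py3", "py2", "py", "py.", "py!"]}

print(at_start, not_start, at_end)  # True False True
print(dollar_nl, z_nl)              # True False
print(words)                        # ['FishC', 'FishC', 'FishC']
print(non_boundary)
```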
\d 1. For Unicode (str) patterns: matches any Unicode decimal digit, which includes [0-9] and many other digit characters; if the re.ASCII flag is set, matches only [0-9]
2. For 8-bit (bytes) patterns: matches any digit in [0-9]
\D Matches any character that is not a Unicode decimal digit, the opposite of \d; with the re.ASCII flag it is equivalent to [^0-9]
\s 1. For Unicode (str) patterns: matches Unicode whitespace characters (including [ \t\n\r\f\v] and many other whitespace characters); if the re.ASCII flag is set, matches only [ \t\n\r\f\v]
2. For 8-bit (bytes) patterns: matches the whitespace characters defined in ASCII, i.e. [ \t\n\r\f\v]
\S Matches any character that is not a Unicode whitespace character, the opposite of \s; with the re.ASCII flag it is equivalent to [^ \t\n\r\f\v]
\w 1. For Unicode (str) patterns: matches any Unicode word character, which covers letters from almost every language as well as digits and the underscore; if the re.ASCII flag is set, matches only [a-zA-Z0-9_]
2. For 8-bit (bytes) patterns: matches the ASCII alphanumeric characters and the underscore, i.e. [a-zA-Z0-9_]
\W Matches any character that is not a Unicode word character, the opposite of \w; with the re.ASCII flag it is equivalent to [^a-zA-Z0-9_]
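A sketch of how re.ASCII and bytes patterns narrow \d, \s and \w. The non-ASCII test characters (ARABIC-INDIC DIGIT THREE U+0663, "é" U+00E9, IDEOGRAPHIC SPACE U+3000) are arbitrary choices, written as escapes:

```python
import re

digits = "3\u0663"            # ASCII "3" and Arabic-Indic digit three
word = "caf\u00e9_1"          # "café_1"
spaced = "a\u3000b"           # ideographic space between two letters

d_unicode = re.findall(r"\d", digits)             # both digits
d_ascii = re.findall(r"\d", digits, re.ASCII)     # only "3"
nd_ascii = re.findall(r"\D", digits, re.ASCII)    # the Arabic-Indic digit is now a non-digit

w_unicode = re.findall(r"\w+", word)              # one word: "café_1"
w_ascii = re.findall(r"\w+", word, re.ASCII)      # "é" splits it: "caf", "_1"

s_unicode = re.findall(r"\s", spaced)             # the ideographic space
s_ascii = re.findall(r"\s", spaced, re.ASCII)     # nothing

d_bytes = re.findall(rb"\d", b"3x")               # bytes patterns use ASCII rules

print(d_unicode, d_ascii, nd_ascii)
print(w_unicode, w_ascii)
print(s_unicode, s_ascii, d_bytes)
```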
Escape sequences Regular expressions also support most of the escape sequences of Python strings: \a, \b, \f, \n, \r, \t, \u, \U, \v, \x, \\
Note 1: \b normally matches a word boundary; it means the backspace character only inside a character class
Note 2: \u and \U are recognized only in Unicode (str) patterns
Note 3: Octal escapes (\number) are restricted: if the first digit is 0, or there are 3 octal digits, the sequence is treated as an octal escape; otherwise it is treated as a subgroup reference. As with string literals, octal escapes are at most 3 digits long
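A sketch of the three notes on escape sequences; the inputs are illustrative assumptions:

```python
import re

# Inside [...], \b is the backspace character (code 8), not a word boundary
backspace = re.findall(r"[\b]", "a\bb")

# \x and \u escapes written in the pattern itself (raw string, so re parses them)
escaped = re.findall(r"\x41\u00e9\t", "A\u00e9\t")

# \u is only recognized in str patterns; a bytes pattern rejects it
try:
    re.compile(rb"\u00e9")
    bytes_u_ok = True
except re.error:
    bytes_u_ok = False

# Octal escapes stop after 3 digits: \1010 is "\101" ("A") followed by "0"
octal_limit = re.match(r"\1010", "A0").group()

print(backspace, escaped, bytes_u_ok, octal_limit)
```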
