Python Regular Expressions

Last Update:2018-01-03 Source: Internet

Author: User

Character	Meaning
.	Matches any character except a line break. Note: if the re.DOTALL flag is set, . matches any character, including line breaks.
|	A|B matches either regular expression A or regular expression B
^	1. (Caret) Matches the start of the input string. 2. If the re.MULTILINE flag is set, ^ also matches immediately after each line break.
$	1. Matches the end of the input string. 2. If the re.MULTILINE flag is set, $ also matches immediately before each line break.
\	1. Turns an ordinary character into a special sequence, for example \d matches any decimal digit. 2. Removes the special meaning of a metacharacter, for example \. matches a literal dot. 3. References the string matched by the subgroup with the given number. 4. See the list below.
[...]	A character class that matches any one of the characters it contains. Note 1: A hyphen - indicates a character range when it appears between two characters; when it appears first, it is an ordinary character. Note 2: Among the special characters, only the backslash \ keeps its special meaning (for escaping); other special characters such as *, +, ? match as ordinary characters. Note 3: A caret ^ in the first position means the class matches any character it does not contain; a ^ anywhere else matches as an ordinary character.
{m,n}	m and n are non-negative integers with m <= n; the preceding RE matches m to n times. Note 1: {m,} means at least m times. Note 2: {,n} is equivalent to {0,n}. Note 3: {n} means exactly n times.
*	Matches the preceding subexpression 0 or more times, equivalent to {0,}
+	Matches the preceding subexpression one or more times, equivalent to {1,}
?	Matches the preceding subexpression 0 or 1 times, equivalent to {0,1}
*?, +?, ??	By default, *, + and ? are greedy (they match as much as possible); *?, +? and ?? enable the corresponding non-greedy mode. For example: for the string "FishCCC", the regular expression FishC+ matches the entire string, while FishC+? matches only "FishC".
{m,n}?	As above, enables non-greedy mode: matches as few repetitions as possible, starting from m
(...)	Matches the regular expression inside the parentheses and marks the start and end of a subgroup. Note: the contents of a subgroup can be referenced again after the match with \number. For example: (\w+) \1 matches "FishC FishC" in the string "FishC FishC.com" (note the space).
(?...)	(? begins an extension syntax (the rows below are all the extension syntaxes supported by Python)
(?aiLmsux)	1. (? can be followed by one or more of the characters 'a', 'i', 'L', 'm', 's', 'u', 'x'; it can only be used at the beginning of the regular expression. 2. Each character corresponds to a matching flag: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line mode), re.S (. matches any character), re.U (Unicode matching), re.X (verbose expression); the flags affect the entire regular expression. 3. This is useful when you don't want to set the flags through re.compile(). Note that because (?x) determines how the regular expression is parsed, it should always be placed first (preceded at most by whitespace). If a non-whitespace character comes before (?x), then (?x) does not work.
(?:...)	A non-capturing group: the string matched by the subgroup cannot be retrieved or referenced afterwards
(?P<name>...)	A named group: the string matched by the subgroup can be accessed through the group's name
(?P=name)	A backreference to a named group: matches whatever text the group called name matched
(?#...)	A comment; the contents of the parentheses are ignored
(?=...)	Positive lookahead assertion. Succeeds if the contained regular expression (shown here as ...) matches at the current position, and fails otherwise. The assertion consumes no characters: once it has been tried, the rest of the pattern continues matching from where the assertion began. For example: love(?=FishC) matches "love" only when it is followed by "FishC".
(?!...)	Negative lookahead assertion. The opposite of the positive lookahead (no match means success, a match means failure). For example: FishC(?!\.com) matches "FishC" only when it is not followed by ".com".
(?<=...)	Positive lookbehind assertion. The same as the positive lookahead, but in the opposite direction. For example: (?<=love)FishC matches "FishC" only when it is preceded by "love".
(?<!...)	Negative lookbehind assertion. The same as the negative lookahead, but in the opposite direction. For example: (?<!FishC)\.com matches ".com" only when it is not preceded by "FishC".
(?(id/name)yes-pattern|no-pattern)	1. If the subgroup with the given number or name exists, try to match yes-pattern; otherwise try no-pattern. 2. no-pattern is optional. For example: (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a regular expression for email-style addresses; used with re.match, it matches an address enclosed in a pair of angle brackets and an address with no brackets, but not an address with only an opening < or only a closing >.
\	The special sequences formed by '\' followed by another character are listed below. Note that '\' followed by a metacharacter removes the metacharacter's special meaning.
\number	1. References the string matched by the subgroup with that number; subgroups are numbered from 1. 2. If the number starts with 0, or is 3 octal digits long, it is not a subgroup reference; it matches the character whose code is that octal value. For example: (.+) \1 matches "FishC FishC" or "55 55", but not "FishCFishC" (note the space after the subgroup).
\A	Matches only at the start of the input string
\Z	Matches only at the end of the input string
\b	Matches a word boundary; a word is defined as a sequence of Unicode alphanumeric or underscore characters. For example: \bFishC\b matches the strings "love FishC", "FishC." or "(FishC)".
\B	Matches a non-word boundary, the opposite of \b. For example: py\B matches the strings "python", "py3" or "py2", but not "py", "py." or "py!".
\d	1. For Unicode (str) patterns: matches any decimal digit, including [0-9] and other digit characters; if the re.ASCII flag is set, matches only [0-9]. 2. For 8-bit (bytes) patterns: matches any digit in [0-9].
\D	Matches any character that is not a decimal digit, the opposite of \d; if the re.ASCII flag is set, equivalent to [^0-9]
\s	1. For Unicode (str) patterns: matches Unicode whitespace characters (including [ \t\n\r\f\v] and other whitespace characters); if the re.ASCII flag is set, matches only [ \t\n\r\f\v]. 2. For 8-bit (bytes) patterns: matches the whitespace characters defined in ASCII, i.e. [ \t\n\r\f\v].
\S	Matches any non-whitespace character, the opposite of \s; if the re.ASCII flag is set, equivalent to [^ \t\n\r\f\v]
\w	1. For Unicode (str) patterns: matches any Unicode word character, which covers letters in most languages as well as digits and the underscore; if the re.ASCII flag is set, matches only [a-zA-Z0-9_]. 2. For 8-bit (bytes) patterns: matches the alphanumeric characters and underscore defined in ASCII, i.e. [a-zA-Z0-9_].
\W	Matches any character that is not a word character, the opposite of \w; if the re.ASCII flag is set, equivalent to [^a-zA-Z0-9_]
Escape symbols	Regular expressions also support most of the escape sequences of Python string literals: \a, \b, \f, \n, \r, \t, \u, \U, \v, \x, \\ Note 1: \b normally matches a word boundary; it represents the backspace character only inside a character class. Note 2: \u and \U are recognized only in Unicode (str) patterns. Note 3: Octal escapes (\number) are restricted: if the first digit is 0, or if there are 3 octal digits, the sequence is treated as an octal escape; in all other cases it is treated as a subgroup reference. As with string literals, octal escapes are at most 3 digits long.
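
A few rows of the table can be checked directly in the interpreter. The following is a minimal sketch using only the standard re module; the sample strings and the group name "word" are illustrative choices, not part of the original table.

```python
import re

# Greedy vs. non-greedy quantifiers on the "FishCCC" example.
greedy = re.match(r"FishC+", "FishCCC").group()
lazy = re.match(r"FishC+?", "FishCCC").group()

# Numbered backreference: \1 must repeat what group 1 matched.
repeated = re.search(r"(\w+) \1", "FishC FishC.com").group()

# Named group plus named backreference ("word" is a made-up name).
named = re.search(r"(?P<word>\w+) (?P=word)", "55 55")

# re.MULTILINE lets ^ match after each line break.
first_words_multiline = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
first_words_default = re.findall(r"^\w+", "one\ntwo")

# re.DOTALL lets . match a line break; (?i) is the inline form of re.I.
dot_default = re.match(r"a.b", "a\nb")
dot_all = re.match(r"a.b", "a\nb", re.DOTALL)
inline_ignorecase = re.match(r"(?i)fishc", "FishC")

# \d matches non-ASCII digits in str patterns unless re.ASCII is set.
digits_text = "1\uff11"  # ASCII "1" followed by FULLWIDTH DIGIT ONE
unicode_digits = re.findall(r"\d", digits_text)
ascii_digits = re.findall(r"\d", digits_text, re.ASCII)

print(greedy, lazy, repeated, named.group("word"))
print(first_words_multiline, first_words_default)
print(unicode_digits, ascii_digits)
```

Raw string literals (r"...") are used so that backslashes reach the regular expression engine unchanged.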

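The lookaround assertions and the conditional (?(id/name)yes-pattern|no-pattern) construct are easier to follow with concrete input. This sketch assumes only the standard re module; the test strings and the address user@example.com are hypothetical, and the conditional pattern follows the email-style example commonly given for this construct.

```python
import re

# Positive lookahead: "love" only when followed by "FishC".
ahead = re.findall(r"love(?=FishC)", "loveFishC loveYou")

# Negative lookahead: "FishC" only when not followed by ".com".
not_com = [m.start() for m in re.finditer(r"FishC(?!\.com)", "FishC.com FishC.org")]

# Positive lookbehind: "FishC" only when preceded by "love".
behind = re.findall(r"(?<=love)FishC", "loveFishC hateFishC")

# Negative lookbehind: ".com" only when not preceded by "FishC".
not_fishc = re.findall(r"(?<!FishC)\.com", "FishC.com python.com")

# Conditional: require ">" only if group 1 matched "<", otherwise require
# the end of the string.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
candidates = [
    "<user@example.com>",
    "user@example.com",
    "<user@example.com",
    "user@example.com>",
]
results = {s: email.match(s) is not None for s in candidates}

print(ahead, not_com, behind, not_fishc)
for candidate, ok in results.items():
    print(candidate, ok)
```

Lookarounds consume no characters, which is why the match objects above cover only "love", "FishC" or ".com" and not the text that was asserted.
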
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
