Python regular expressions
A summary of what I learned about Python regular expressions.
1. Recommended learning websites:
Runoob (Cainiao tutorial): http://www.runoob.com/python/python-reg-expressions.html
imooc: http://www.imooc.com/learn/550
Ziqiang Xuetang: http://code.ziqiangxuetang.com/regexp/regexp-tutorial.html
2. Recommended books: Basic Python Tutorial and Core Python Programming (for Python fundamentals).
3. My personal summary:
The most important part of learning Python is reading source code!
The re module was added in Python 1.5. It provides Perl-style regular expression patterns and gives Python full regular expression support.
Regular expression matching mainly covers: single character matching, boundary matching, character set matching, restriction and negation, group matching, and extended notation. A summary follows:
1. Single character matching
\cx | Matches the control character specified by x. For example, \cM matches Control-M (carriage return). x must be a letter in A-Z or a-z; otherwise c is treated as a literal 'c'.
\d | Matches a digit. Equivalent to [0-9].
\D | Matches a non-digit character. Equivalent to [^0-9].
\f | Matches a form feed. Equivalent to \x0c and \cL.
\n | Matches a linefeed. Equivalent to \x0a and \cJ.
\r | Matches a carriage return. Equivalent to \x0d and \cM.
\s | Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t | Matches a tab. Equivalent to \x09 and \cI.
\v | Matches a vertical tab. Equivalent to \x0b and \cK.
\w | Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_].
\W | Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\xn | Matches n, where n is a hexadecimal escape value of exactly two digits. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This lets ASCII codes be used in regular expressions.
\num | Matches num, where num is a positive integer: a backreference to a captured group. For example, '(.)\1' matches two consecutive identical characters.
\n | Identifies an octal escape value or a backreference. If at least n subexpressions were captured before \n, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm | Identifies an octal escape value or a backreference. If at least nm subexpressions were captured before \nm, nm is a backreference. If at least n subexpressions were captured before \nm, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml | If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.
\un | Matches n, where n is a Unicode character written as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
Note: some rows above describe regular expressions in general rather than Python. In Python's re module, \cx is not supported (use \r or \x0d instead), and \1 through \99 are always backreferences; an octal escape needs a leading 0 or exactly three octal digits. The [0-9], [ \f\n\r\t\v] and [A-Za-z0-9_] equivalences are exact only for bytes patterns or with re.ASCII; Unicode str patterns also match the corresponding Unicode characters.
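As a quick check of a few escapes from the table above, here is a minimal sketch using Python's re module. The sample strings and variable names are made up for illustration only.

```python
import re

# \d, \w and \s each match one character of the named class.
digits = re.findall(r"\d", "a1b22")          # every single digit
words = re.findall(r"\w+", "foo_bar baz-9")  # runs of word characters
parts = re.split(r"\s+", "a \t b\nc")        # split on runs of whitespace

# \x41 is the two-digit hexadecimal escape for "A".
hex_match = re.search(r"\x41", "xxAxx")

# (.)\1 uses a backreference to find a doubled character.
doubled = re.search(r"(.)\1", "hello")

print(digits)             # ['1', '2', '2']
print(words)              # ['foo_bar', 'baz', '9']
print(parts)              # ['a', 'b', 'c']
print(hex_match.group())  # A
print(doubled.group())    # ll
```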
2. Boundary matching
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B | Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
^, \A | Matches the start of the input string. If the re.MULTILINE flag is set, ^ also matches the position right after each '\n'; \A still matches only at the very start.
$, \Z | Matches the end of the input string ($ also matches just before a trailing '\n'). If the re.MULTILINE flag is set, $ also matches the position before each '\n'; \Z still matches only at the very end.
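The boundary rules above can be tried out with a short sketch. It assumes the "Multiline" setting corresponds to Python's re.MULTILINE flag; the sample text is invented for the example.

```python
import re

end_of_word = re.search(r"er\b", "never")   # match: "er" ends the word
not_at_end = re.search(r"er\b", "verb")     # None: "er" is followed by "b"
inside_word = re.search(r"er\B", "verb")    # match: "er" is inside the word
inside_never = re.search(r"er\B", "never")  # None: "er" is at the word end

text = "one\ntwo"
# Without re.MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
# \A is not affected by re.MULTILINE.
starts_anchor = re.findall(r"\A\w+", text, flags=re.MULTILINE)

print(starts_default)    # ['one']
print(starts_multiline)  # ['one', 'two']
print(starts_anchor)     # ['one']
```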
3. Character set matching
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz] | Negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
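A small sketch of both kinds of set, using the same "plain" example. The variable names are illustrative.

```python
import re

# [abc] matches the first character that is a, b or c.
first_in_set = re.search(r"[abc]", "plain").group()
# [^abc] matches the first character that is none of a, b, c.
first_not_in_set = re.search(r"[^abc]", "plain").group()
# findall shows every character the negated set accepts.
all_not_in_set = re.findall(r"[^abc]", "plain")

print(first_in_set)      # a
print(first_not_in_set)  # p
print(all_not_in_set)    # ['p', 'l', 'i', 'n']
```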
4. Restriction and negation
[a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.
[^a-z] | Negated character range. Matches any character outside the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
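Ranges behave like sets and can be combined inside one pair of brackets. A minimal sketch with made-up input:

```python
import re

lower = re.findall(r"[a-z]", "Ab3cD")       # lowercase letters only
not_lower = re.findall(r"[^a-z]", "Ab3cD")  # everything else

# Several ranges in one set: a hexadecimal digit check.
is_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1aF") is not None
not_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1aG") is not None

print(lower)      # ['b', 'c']
print(not_lower)  # ['A', '3', 'D']
print(is_hex)     # True
print(not_hex)    # False
```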
5. Group matching
\d+(\.\d*)? | Matches a string that represents a simple floating-point number.
<([\w]+)>\w+</\1> | Matches a simple HTML or XML element whose closing tag repeats the opening tag name, for example <span>python</span>.
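Here is a sketch of both group patterns in use. The tag pattern `<(\w+)>\w+</\1>` is an assumption about what the tag example is meant to do (capture the opening tag name and require the same name in the closing tag); it only handles the simplest elements and is no substitute for an HTML parser.

```python
import re

# Simple floating-point number: digits, then an optional fractional group.
FLOAT = re.compile(r"\d+(\.\d*)?")
m = FLOAT.fullmatch("3.14")
whole, fraction = m.group(0), m.group(1)
no_fraction = FLOAT.fullmatch("42").group(1)  # optional group did not take part

# Assumed tag pattern: \1 must repeat whatever group 1 captured.
TAG = re.compile(r"<(\w+)>\w+</\1>")
tag_name = TAG.fullmatch("<span>python</span>").group(1)
mismatched = TAG.fullmatch("<span>python</div>")

print(whole, fraction)  # 3.14 .14
print(no_fraction)      # None
print(tag_name)         # span
print(mismatched)       # None
```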
6. Extended notation
(?:pattern) | Matches pattern but does not capture the result; the group is not stored for later use. For example, 'industr(?:y|ies)' is a more concise expression than 'industry|industries'.
(?=pattern) | Positive lookahead: matches at a position where the text that follows matches pattern, without consuming that text.
(?!pattern) | Negative lookahead: matches at a position where the text that follows does not match pattern.
x|y | Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
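The extended notations above can be sketched as follows. The "Windows" strings are invented test data, not from the original notes.

```python
import re

# (?:...) groups without capturing, so findall returns whole matches.
plural = re.findall(r"industr(?:y|ies)", "industry and industries")

# (?=...) succeeds only if the lookahead text follows; it is not consumed.
followed = re.search(r"Windows(?=95|98)", "Windows98")
# (?!...) succeeds only if the lookahead text does NOT follow.
rejected = re.search(r"Windows(?!95|98)", "Windows98")
accepted = re.search(r"Windows(?!95|98)", "Windows10")

# Alternation, with and without a group limiting its scope.
either = re.fullmatch(r"z|food", "food") is not None
grouped = re.fullmatch(r"(z|f)ood", "zood") is not None

print(plural)            # ['industry', 'industries']
print(followed.group())  # Windows
print(rejected)          # None
print(accepted.group())  # Windows
print(either, grouped)   # True True
```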
The most important features are named groups and non-greedy matching:
(*|+|?|{})?  # the non-greedy versions of the repetition operators above (*?, +?, ??, {m,n}?), which match as few characters as possible.
(?P<name>...)  # name is a valid identifier used to name the capture group.
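A minimal sketch of both features. The HTML fragment, the date string, and the group names year and month are hypothetical examples.

```python
import re

html = "<b>bold</b><i>it</i>"
# Greedy: .+ grabs as much as possible, up to the last ">".
greedy = re.search(r"<.+>", html).group()
# Non-greedy: .+? stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.+?>", html).group()

# Named groups are read back by name instead of by position.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
year = date.group("year")
month = date.group("month")
fields = date.groupdict()

print(greedy)  # <b>bold</b><i>it</i>
print(lazy)    # <b>
print(fields)  # {'year': '2024', 'month': '05'}
```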
In addition:
It is best to define patterns in Python as raw strings: pattern = r'pattern'
Attached, the Python docs for the re module: https://docs.python.org/3/library/re.html
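To see why the raw-string advice matters, here is a short sketch. In an ordinary string literal Python turns "\b" into a backspace character before the regex engine ever sees it. The sample sentence is made up.

```python
import re

sentence = "a cat sat"

# Ordinary string: "\b" is the backspace character (\x08), not a word boundary.
plain = re.search("\bcat\b", sentence)
# Raw string: the backslashes reach the regex engine unchanged.
raw = re.search(r"\bcat\b", sentence)
# Without a raw string, every regex backslash has to be doubled.
doubled = re.search("\\bcat\\b", sentence)

print(plain)            # None
print(raw.group())      # cat
print(doubled.group())  # cat
```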