Objective:
What is a regular expression?
A regular expression (also called an RE, regex, or regex pattern) is essentially a tiny, highly specialized programming language.
Regular expressions are not part of the Python language itself; they are embedded in Python and made available to programmers through the re module. With a regular expression, you specify rules that describe the set of strings you want to match. That set might contain English sentences, e-mail addresses, TeX commands, or anything else you like.
[Figure: the process of matching with a regular expression]
You can practice with an online regular expression tester.
First, a brief introduction
1. Ordinary characters
>>> wildcard character (.)
A regular expression can match more than one string, and the wildcard (.) matches any single character except the line break "\n".
Example: Text to be matched: "school"
Regular expression: "s....l"
Match result: "school"
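The wildcard behaviour can be checked with a minimal sketch using Python's re module. The sample strings and variable names are illustrative assumptions, not part of the original example.

```python
import re

# "." matches any single character except a newline.
wildcard_match = re.search(r"s....l", "school")
print(wildcard_match.group())  # school

# Without the re.DOTALL flag, "." does not match "\n".
newline_match = re.search(r"a.b", "a\nb")
print(newline_match)  # None

# With re.DOTALL, "." matches the newline as well.
dotall_match = re.search(r"a.b", "a\nb", re.DOTALL)
print(dotall_match is not None)  # True
```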
>>> escape character (\)
The escape character changes the meaning of the character that follows it. To match a character that has a special meaning in a regular expression, such as the period, put a backslash in front of it (\.).
Example: Text to be matched: "W3.school"
Regular expression: "W3\.school"
Match result: "W3.school"
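A short sketch of the difference between an escaped and an unescaped period, assuming an illustrative input string that contains both a literal period and another character in the same position:

```python
import re

text = "W3.school W3xschool"

# An unescaped "." is a wildcard, so it also matches the "x".
unescaped = re.findall(r"W3.school", text)
# An escaped "\." matches only a literal period.
escaped = re.findall(r"W3\.school", text)

print(unescaped)  # ['W3.school', 'W3xschool']
print(escaped)    # ['W3.school']
```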
>>> character set ([])
Square brackets ([]) enclose a set of characters, any one of which may match. You can use a range, such as '[a-z]' to match any lowercase letter from a to z, or combine several ranges, such as '[a-zA-Z0-9]' to match any uppercase or lowercase letter or digit. A set matches one character at a time; the results below show all matched characters together.
Example: Text to be matched: "W3school"
Regular expression: "[a-z]"
Match result: "school"
Regular expression: "[0-9]"
Match result: "3"
Regular expression: "[A-Z]"
Match result: "W"
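As a rough illustration, re.findall() returns each single-character match separately, which makes it easy to see what a set matches. The variable names here are assumptions chosen for the sketch.

```python
import re

text = "W3school"

lower = re.findall(r"[a-z]", text)   # lowercase letters only
digits = re.findall(r"[0-9]", text)  # digits only
upper = re.findall(r"[A-Z]", text)   # uppercase letters only

print(lower)   # ['s', 'c', 'h', 'o', 'o', 'l']
print(digits)  # ['3']
print(upper)   # ['W']
```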
>>> negated character set ([^])
Placing the ^ character at the beginning of a set inverts it: '[^abc]' matches any character except a, b, and c.
Example: Text to be matched: "W3school"
Regular expression: "[^a-z]"
Match result: "W3"
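A minimal sketch of a negated set, again using re.findall() so that every matched character is visible (the variable name is illustrative):

```python
import re

# [^a-z] matches any character that is NOT a lowercase letter.
not_lower = re.findall(r"[^a-z]", "W3school")
print(not_lower)  # ['W', '3']
```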
2. Pre-defined character set
>>> digit (\d): matches a digit character. Equivalent to [0-9] for ASCII text.
Example: Text to be matched: "W3school"
Regular expression: "\d"
Match result: "3"
>>> non-digit (\D): matches a non-digit character. Equivalent to [^0-9].
Example: Text to be matched: "W3school"
Regular expression: "\D"
Match result: "Wschool"
>>> whitespace character (\s): matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
Example: Text to be matched: "W3 school"
Regular expression: "W3\sschool"
Match result: "W3 school"
>>> non-whitespace character (\S): matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
Example: Text to be matched: "W3 school"
Regular expression: "\S"
Match result: "W3school"
>>> any word character (\w): matches any word character, including the underscore. Equivalent to '[a-zA-Z0-9_]'.
Example: Text to be matched: "W3_school"
Regular expression: "\w"
Match result: "W3_school"
>>> any non-word character (\W): matches any non-word character. Equivalent to '[^a-zA-Z0-9_]'.
Example: Text to be matched: "[email protected] #W3school"
Regular expression: "\W"
Match result: "[email protected]#]
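The six predefined classes can be compared side by side. This is a sketch on an assumed sample string; note that the lowercase class and its uppercase counterpart always match complementary sets of characters.

```python
import re

text = "W3_school 2024"

digits = re.findall(r"\d", text)       # digit characters
non_digits = re.findall(r"\D+", text)  # runs of non-digit characters
spaces = re.findall(r"\s", text)       # whitespace characters
non_spaces = re.findall(r"\S+", text)  # runs of non-whitespace characters
words = re.findall(r"\w+", text)       # runs of word characters (letters, digits, _)
non_words = re.findall(r"\W", text)    # non-word characters

print(digits)      # ['3', '2', '0', '2', '4']
print(non_digits)  # ['W', '_school ']
print(spaces)      # [' ']
print(non_spaces)  # ['W3_school', '2024']
print(words)       # ['W3_school', '2024']
print(non_words)   # [' ']
```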
3. Quantifiers
>>> (pattern)*: allows the pattern to repeat 0 or more times
Example: Text to be matched: "www.jianshu.com"
Regular expression: "w*\.jianshu\.com"
Match result: "www.jianshu.com"
>>> (pattern)+: allows the pattern to repeat 1 or more times
Example: Text to be matched: "www.jianshu.com"
Regular expression: "w+\.jianshu\.com"
Match result: "www.jianshu.com"
>>> (pattern){m,n}: allows the pattern to repeat m to n times
Example: Text to be matched: "wwwww.jianshu.com"
Regular expression: "w{1,3}\.jianshu\.com"
Match result: "www.jianshu.com"
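The three repetition forms can be compared on the same input with a small sketch. The second input string, which has no leading "w", is an assumption added to show the difference between "0 or more" and "1 or more".

```python
import re

text = "wwwww.jianshu.com"

star = re.search(r"w*\.jianshu\.com", text).group()
plus = re.search(r"w+\.jianshu\.com", text).group()
# {1,3} allows at most three "w", so the match starts at the third "w".
bounded = re.search(r"w{1,3}\.jianshu\.com", text).group()

print(star)     # wwwww.jianshu.com
print(plus)     # wwwww.jianshu.com
print(bounded)  # www.jianshu.com

# With no leading "w": "*" still matches (zero repetitions), "+" does not.
no_w = ".jianshu.com"
star_zero = re.search(r"w*\.jianshu\.com", no_w)
plus_zero = re.search(r"w+\.jianshu\.com", no_w)
print(star_zero.group())  # .jianshu.com
print(plus_zero)          # None
```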
4. Boundary matching
>>> match string start (^): matches the beginning of the string; in multiline mode, also matches the beginning of each line
Example: Text to be matched: "W3school"
Regular expression: "^W3"
Match result: "W3"
>>> match string end ($): matches the end of the string; in multiline mode, also matches the end of each line
Example: Text to be matched: "W3school"
Regular expression: "ool$"
Match result: "ool"
>>> matches only the beginning of the string (\A)
Example: Text to be matched: "W3school"
Regular expression: "\AW3"
Match result: "W3"
>>> matches only the end of the string (\Z)
Example: Text to be matched: "W3school"
Regular expression: "ool\Z"
Match result: "ool"
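The difference between the line anchors and the string-only anchors only shows up on multi-line input. The following sketch uses an assumed two-line string and the re.M (multiline) flag to make that visible.

```python
import re

text = "W3school\nW3cschool"

# Without re.M, ^ and $ anchor only at the start and end of the whole string.
start_default = re.findall(r"^W3", text)
end_default = re.findall(r"ool$", text)

# With re.M, ^ and $ also anchor at the start and end of every line.
start_multiline = re.findall(r"^W3", text, re.M)
end_multiline = re.findall(r"ool$", text, re.M)

# \A and \Z ignore re.M and anchor only at the string boundaries.
start_absolute = re.findall(r"\AW3", text, re.M)
end_absolute = re.findall(r"ool\Z", text, re.M)

print(start_default, end_default)      # ['W3'] ['ool']
print(start_multiline, end_multiline)  # ['W3', 'W3'] ['ool', 'ool']
print(start_absolute, end_absolute)    # ['W3'] ['ool']
```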
5. Logic, grouping
>>> pipe symbol (|): matches either the expression on its left or the one on its right. If the | is not inside (), its scope is the entire regular expression.
Example: Text to be matched: "W3school"
Regular expression: "W3|ol"
Match results: "W3", "ol"
>>> grouping (): the enclosed expression becomes a group. Groups are numbered from the left of the expression: each opening parenthesis '(' encountered increases the number by 1. A group can be followed by a quantifier and is then repeated as a whole. A | inside a group applies only within that group.
Example: Text to be matched: "W3schoolW3school"
Regular expression: "(W3school){2}"
Match result: "W3schoolW3school"
Regular expression: "W3(sch|ol)"
Match result: "W3sch"
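Alternation and grouping can be tried together in a short sketch. The variable names are illustrative; group(1) is used to read back what the first numbered group captured.

```python
import re

text = "W3schoolW3school"

# Outside parentheses, "|" splits the whole expression into two alternatives.
alternation = re.findall(r"W3|ol", text)
print(alternation)  # ['W3', 'ol', 'W3', 'ol']

# A quantifier after a group repeats the group as a whole.
repeated = re.search(r"(W3school){2}", text).group()
print(repeated)  # W3schoolW3school

# Inside a group, "|" chooses only between the alternatives in that group.
grouped = re.search(r"W3(sch|ol)", text)
print(grouped.group())   # W3sch
print(grouped.group(1))  # sch
```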
>>> (?P<name>...): a group that, in addition to its number, is given the alias name.
>>> (?P=name): refers to the string matched by the group whose alias is name.
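A minimal sketch of a named group and a back-reference to it, assuming a hypothetical pattern that finds a word repeated twice in a row:

```python
import re

# (?P<word>\w+) captures a word; (?P=word) requires the same text again.
doubled = re.compile(r"(?P<word>\w+) (?P=word)")
match = doubled.search("hello hello world")

print(match.group())        # hello hello
print(match.group("word"))  # hello
print(match.group(1))       # hello (the group keeps its number as well)
```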
Second, re module
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns. The re module gives Python full regular expression functionality.
Some important functions in the RE module, such as:
1. re.compile(strPattern[, flag])
This function is the factory for Pattern objects: it compiles a regular expression given as a string into a Pattern object. The second parameter, flag, sets the matching mode; several flags can be combined with the bitwise OR operator '|' so that they take effect together, for example re.I | re.M.
[Table: regular expression modifiers (optional flags)]
Example:
b = re.compile(r"\d+\.\d*", re.I | re.M)
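As a rough sketch of how a compiled pattern is then used, the object returned by re.compile() exposes methods such as findall(). The input strings and variable names below are assumptions for illustration.

```python
import re

# Compile once, then reuse the Pattern object.
number = re.compile(r"\d+\.\d*")
numbers = number.findall("pi is 3.14, e is 2.718, and 7. is allowed")
print(numbers)  # ['3.14', '2.718', '7.']

# Flags combined with "|": re.I ignores case, re.M makes ^ match at each line.
line_start = re.compile(r"^w3", re.I | re.M)
starts = line_start.findall("W3school\nw3school")
print(starts)  # ['W3', 'w3']
```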
2. re.search(pattern, string[, flags])
This function looks for a substring of string that matches the pattern. It tries to match the pattern starting at index pos of string and returns a Match object if the pattern matches there; otherwise it tries again from pos + 1, and so on, returning None if nothing has matched by the time pos reaches endpos. The defaults for pos and endpos are 0 and len(string). The module-level re.search() does not accept these two parameters (the search() method of a compiled pattern does); the flags parameter specifies the matching mode used when the pattern is compiled.
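A short sketch of the scanning behaviour described above, on an assumed sample sentence. The pos argument is shown on a compiled pattern, since the module-level function does not take it.

```python
import re

text = "order 66 shipped in 2 days"

# The first match found while scanning from the left is returned.
first = re.search(r"\d+", text)
print(first.group(), first.start(), first.end())  # 66 6 8

# A compiled pattern's search() accepts pos, the index to start scanning from.
later = re.compile(r"\d+").search(text, 8)
print(later.group())  # 2

# When nothing matches, the result is None.
missing = re.search(r"\d+", "no digits here")
print(missing)  # None
```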
3. re.split(pattern, string[, maxsplit])
Splits string at every substring that matches the pattern and returns the pieces as a list. maxsplit sets the maximum number of splits; if it is not given, the string is split at every match.
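A minimal sketch of splitting on a pattern, with and without a split limit (the input string is an illustrative assumption):

```python
import re

text = "one1two22three333four"

# Split at every run of digits.
parts = re.split(r"\d+", text)
print(parts)  # ['one', 'two', 'three', 'four']

# Stop after two splits; the remainder is returned unsplit.
limited = re.split(r"\d+", text, maxsplit=2)
print(limited)  # ['one', 'two', 'three333four']
```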
4. re.findall(pattern, string[, flags])
Searches string and returns all matching substrings as a list.
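A brief sketch of re.findall() on assumed sample strings. One detail worth knowing: when the pattern contains groups, the list holds the group contents rather than the whole match.

```python
import re

# No groups: each list item is the full text of one match.
runs = re.findall(r"\d+", "a1b22c333")
print(runs)  # ['1', '22', '333']

# Two groups: each list item is a tuple of the captured groups.
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")
print(pairs)  # [('x', '1'), ('y', '22')]
```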
5. re.sub(pattern, repl, string[, count])
Replaces the substrings of string that match the pattern (the leftmost, non-overlapping matches) with the given replacement and returns the result. count limits the number of replacements.
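A small sketch of the three common ways to call re.sub(): replacing every match, limiting the number of replacements, and passing a function as the replacement. The input string and the doubling function are assumptions for illustration.

```python
import re

text = "a1b22c333"

# Replace every run of digits.
all_replaced = re.sub(r"\d+", "#", text)
print(all_replaced)  # a#b#c#

# Replace only the first match.
first_replaced = re.sub(r"\d+", "#", text, count=1)
print(first_replaced)  # a#b22c333

# repl may also be a function that receives each Match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)  # a2b44c666
```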
6. re.escape(string)
This utility function escapes every character in a string that could be interpreted as a regular expression operator. It is handy when the string is long and contains many special characters and you do not want to type a lot of backslashes by hand.
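A minimal sketch of re.escape() in use, assuming a hypothetical literal string full of special characters that should be searched for verbatim:

```python
import re

# The period is escaped so that it only matches a literal ".".
escaped_domain = re.escape("www.jianshu.com")
print(escaped_domain)  # www\.jianshu\.com

# Escaping lets a string with many operators be used as a literal pattern.
literal = "1+1=2 (really?)"
found = re.search(re.escape(literal), "we know 1+1=2 (really?) is true")
print(found.group())  # 1+1=2 (really?)
```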
That is all.
My Jianshu page: http://www.jianshu.com/u/da1677475c27
python-Regular Expressions