Regular expressions, or regex for short, are a way of describing text patterns. For example, \d is a regular expression that stands for a digit character, that is, any single numeral from 0 to 9.
Steps for use
All of Python's regular expression functions are in the re module.
The steps for using regular expressions in Python are as follows:
① Import the regular expression module with import re.
② Create a Regex object with the re.compile() function.
③ Pass the string you want to search to the Regex object's search() method. It returns a Match object.
④ Call the Match object's group() method, which returns the text that was actually matched.
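The four steps above can be sketched roughly as follows. The pattern and the sample string are illustrative assumptions, chosen to match the phone-number examples used later in this article.

```python
import re  # Step 1: import the regular expression module

# Step 2: compile the pattern into a Regex object
# (three digits, a hyphen, eight digits: an assumed phone-number format).
phone_regex = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')

# Step 3: pass the string to search(); it returns a Match object,
# or None when the pattern is not found.
mo = phone_regex.search('My number is 021-68000000')

# Step 4: group() returns the text that was actually matched.
if mo is not None:
    matched_text = mo.group()
    print(matched_text)
```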
Character classes
Shorthand class    Meaning
\d    Any digit from 0 to 9
\D    Any character that is not a digit from 0 to 9
\w    Any letter, digit, or underscore (a "word" character)
\W    Any character that is not a letter, digit, or underscore
\s    Any space, tab, or newline character (a "whitespace" character)
\S    Any character that is not a space, tab, or newline
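To make the table concrete, here is a small sketch that applies each shorthand class to one sample string. The sample text is made up for illustration, and the sketch relies on findall(), which is described later in this article.

```python
import re

sample = 'Room 42, floor_3!'  # hypothetical sample text

# Each call collects every character matched by one shorthand class.
digits = ''.join(re.findall(r'\d', sample))          # digits only
non_digits = ''.join(re.findall(r'\D', sample))      # everything except digits
word_chars = ''.join(re.findall(r'\w', sample))      # letters, digits, underscore
non_word_chars = ''.join(re.findall(r'\W', sample))  # everything else
spaces = re.findall(r'\s', sample)                   # whitespace characters
non_spaces = ''.join(re.findall(r'\S', sample))      # everything except whitespace

print(digits)       # 423
print(word_chars)   # Room42floor_3
print(non_spaces)   # Room42,floor_3!
```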
Regular expression symbols
?    Matches zero or one of the preceding group
*    Matches zero or more of the preceding group
+    Matches one or more of the preceding group
|    Matches one of several alternative expressions
()    Parentheses create a "group"
{n}    Matches exactly n of the preceding group
{n,}    Matches n or more of the preceding group
{,m}    Matches 0 to m of the preceding group
{n,m}    Matches at least n and at most m of the preceding group
{n,m}? or *? or +?    Performs a non-greedy match of the preceding group
^spam    The string must begin with spam
spam$    The string must end with spam
.    Matches any character except a newline
\d, \w, and \s    Match a digit, a word character, and a whitespace character, respectively
\D, \W, and \S    Match any character that is not a digit, not a word character, and not a whitespace character, respectively
[ABC]    Matches any one of the characters inside the square brackets (here A, B, or C)
[^ABC]    Matches any character that is not inside the square brackets
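A few of these symbols in action, as a minimal sketch. The helper first_match() and all the sample strings are hypothetical and exist only for this illustration; re.search() is the module-level counterpart of the Regex object's search() method.

```python
import re

def first_match(pattern, text):
    """Return the first matched text, or None when the pattern is not found."""
    mo = re.search(pattern, text)
    return mo.group() if mo else None

print(first_match(r'Bat(wo)?man', 'Batwoman'))      # ?  the group may appear 0 or 1 times
print(first_match(r'Bat(wo)*man', 'Batwowowoman'))  # *  the group may appear 0 or more times
print(first_match(r'Bat(wo)+man', 'Batman'))        # +  needs at least one "wo", so None
print(first_match(r'cat|dog', 'hot dog'))           # |  either alternative
print(first_match(r'(ha){3}', 'hahahaha'))          # {n} exactly three repetitions
print(first_match(r'^spam', 'eggs and spam'))       # ^  must be at the start, so None
print(first_match(r'spam$', 'eggs and spam'))       # $  must be at the end
print(first_match(r'[^ABC]+', 'ABxyzC'))            # [^...] anything not in the brackets
print(first_match(r'.+', 'line one\nline two'))     # .  stops at the newline
```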
Regular expression methods
1. compile()
Passing a string value that represents the regular expression to re.compile() returns a Regex pattern object.
To ignore whitespace and comments inside the regular expression string, pass re.VERBOSE as the second argument.
For case-insensitive matching, pass re.IGNORECASE or re.I.
To make the period character match newlines as well, pass re.DOTALL.
The re.compile() function accepts only a single value as its second argument; you can get around this limitation by combining the flags with the pipe character (|).
>>> import re
>>> phonenum = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')
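The flags described above might be used like this. It is only a sketch: both patterns and both sample strings are assumptions made for illustration.

```python
import re

# re.VERBOSE lets the pattern span several lines and carry comments;
# whitespace inside the pattern is ignored.
phone_regex = re.compile(r'''
    (\d\d\d)    # area code
    -           # separator
    (\d{8})     # eight-digit number
    ''', re.VERBOSE)

# Several flags are combined into one argument with the pipe character.
greeting_regex = re.compile(r'hello.*world', re.IGNORECASE | re.DOTALL)

mo1 = phone_regex.search('My number is 021-68000000')
mo2 = greeting_regex.search('HELLO\nWorld')

print(mo1.group())        # 021-68000000
print(repr(mo2.group()))  # 'HELLO\nWorld'
```

Without re.DOTALL the second pattern would not match here, because the period would stop at the newline between the two words.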
2. group()
The Match object has a group() method that returns the text actually matched in the searched string.
Adding parentheses creates "groups" in the regular expression. The first pair of parentheses in the regular expression string is group 1, and the second pair is group 2. Passing the integer 1 or 2 to the Match object's group() method returns the corresponding part of the matched text. Passing 0, or no argument at all, returns the entire matched text. To get all the groups at once, use the groups() method.
>>> import re
>>> phonenum = re.compile(r'(\d\d\d)-(\d\d\d\d\d\d\d\d)')
>>> mo = phonenum.search('My number is 021-68000000')
>>> print(mo.group(0))
021-68000000
>>> print(mo.group(1))
021
>>> print(mo.group(2))
68000000
>>> print(mo.groups())
('021', '68000000')
3. search()
The search() method of a Regex object searches the string it is passed for a match to the regular expression. If the pattern is not found in the string, search() returns None. If the pattern is found, search() returns a Match object for the first match.
>>> import re
>>> phonenum = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')
>>> mo = phonenum.search('My number is 021-68000000')
>>> print(mo.group())
021-68000000
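Because search() returns None when nothing matches, calling group() directly on its result can raise an AttributeError. One way to guard against that is sketched below; the helper name find_phone() is hypothetical.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d\d\d\d\d\d')

def find_phone(text):
    """Return the first phone number found in text, or None if there is none."""
    mo = phone_regex.search(text)
    if mo is None:  # the pattern was not found
        return None
    return mo.group()

print(find_phone('My number is 021-68000000'))     # 021-68000000
print(find_phone('No number here'))                # None
print(find_phone('021-68000000 or 010-12345678'))  # only the first match is returned
```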
4. findall()
search() returns a Match object for the first matching text in the searched string, whereas findall() returns a list containing every match in the searched string.
There are two points to note about the return value of findall():
① If it is called on a regular expression with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, it returns a list of the matched strings, such as ['123-456-7890', '000-000-0000'].
② If it is called on a regular expression with groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), it returns a list of tuples of strings, such as [('123', '456', '7890'), ('000', '000', '0000')]. When the pattern contains exactly one group, the result is instead a list of the strings matched by that group.
>>> import re
>>> phonenum = re.compile(r'(\d\d\d)')
>>> phonenum.search('68000000')
<_sre.SRE_Match object; span=(0, 3), match='680'>
>>> phonenum.findall('68000000')
['680', '000']
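The two cases noted above can be compared side by side. This is a sketch only; the sample text is made up, and the single-group pattern is added to show how it differs from the multi-group one.

```python
import re

text = 'Cell: 123-456-7890 Work: 000-000-0000'  # hypothetical sample text

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
one_group = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')

print(no_groups.findall(text))    # list of whole matches
print(with_groups.findall(text))  # list of tuples, one item per group
print(one_group.findall(text))    # list of the single group's strings
```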
5. sub()
The sub() method takes two arguments. The first is the string that replaces each match found. The second is the string to which the regular expression is applied. sub() returns the string with the replacements made.
>>> import re
>>> phonenum = re.compile(r'021-6800')
>>> phonenum.sub('8800', 'My number is 021-68000000.')
'My number is 88000000.'
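As a further sketch of the argument order (replacement first, searched string second), the following masks an eight-digit number. The patterns and strings are assumptions for illustration. It also shows that sub() replaces every match and returns the string unchanged when nothing matches.

```python
import re

digits_regex = re.compile(r'\d{8}')

# First argument: the replacement. Second argument: the string to search.
masked = digits_regex.sub('********', 'My number is 021-68000000.')
print(masked)  # My number is 021-********.

# Every match is replaced, not just the first one.
all_replaced = re.compile(r'\d').sub('#', 'a1b22')
print(all_replaced)  # a#b##

# With no match, the original string comes back unchanged.
unchanged = digits_regex.sub('********', 'No digits here.')
print(unchanged)  # No digits here.
```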
Greedy and non-greedy matching
Python's regular expressions are "greedy" by default, which means that in ambiguous situations they match the longest string possible. The "non-greedy" version of the curly braces, written with a question mark after the closing brace, matches the shortest string possible.
The question mark can have two meanings in a regular expression: declaring a non-greedy match or marking an optional group. The two meanings are entirely unrelated.
>>> import re
>>> phonenum01 = re.compile(r'(\d\d\d){1,3}')
>>> phonenum02 = re.compile(r'(\d\d\d){1,3}?')
>>> mo01 = phonenum01.search('68000000')
>>> mo02 = phonenum02.search('68000000')
>>> mo01.group()
'680000'
>>> mo02.group()
'680'
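The same question mark also works after * and +, and its other meaning (an optional item) can appear right next to it. Below is a small sketch with made-up sample strings that contrasts the two uses.

```python
import re

text = '<b>bold</b> and <i>italic</i>'  # hypothetical sample text

# Greedy: .+ grabs as much as possible, up to the last '>'.
greedy = re.search(r'<.+>', text).group()

# Non-greedy: .+? stops at the first '>' that completes a match.
non_greedy = re.search(r'<.+?>', text).group()

# Unrelated meaning: here ? makes the preceding 'u' optional.
optional = re.search(r'colou?r', 'color').group()

print(greedy)      # <b>bold</b> and <i>italic</i>
print(non_greedy)  # <b>
print(optional)    # color
```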
This article is from the "garbled Age" blog, please be sure to keep this source http://juispan.blog.51cto.com/943137/1949567
[Python 3 Series] Regular expression