Using regular expressions in Python
1. Matching characters
The metacharacters in regular expressions are: . ^ $ * + ? { } [ ] \ | ( )
The following special sequences match classes of characters:
\d matches any digit
\D matches any non-digit character
\s matches any whitespace character
\S matches any non-whitespace character
\w matches any letter, digit, or underscore
\W matches any character that is not a letter, digit, or underscore
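As a quick check of the classes above, the following sketch applies each one with re.findall to a short sample string. The string and the variable names are assumptions chosen for illustration.

```python
import re

# Sample text chosen for illustration; not from the original article.
text = "ab 12_c!"

digits = re.findall(r'\d', text)       # digits only
non_digits = re.findall(r'\D', text)   # everything except digits
spaces = re.findall(r'\s', text)       # whitespace characters
word_chars = re.findall(r'\w', text)   # letters, digits, and underscore
non_word = re.findall(r'\W', text)     # everything that \w does not match

print(digits)
print(non_digits)
print(spaces)
print(word_chars)
print(non_word)
```

Each uppercase class is the complement of its lowercase counterpart, so every character of the string ends up in exactly one list of each pair.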
2. Regular Expression
In Python, re.compile is used to compile a regular expression into a pattern object, for example:
import re
p = re.compile('[a-c]')
p.match(s)
Here s is the string to be matched, and match is the matching method. Similar methods include:
match() checks for a match starting at the beginning of the string
search() looks for a match at any location in the string
findall() finds all matching substrings and returns them as a list
finditer() finds all matching substrings and returns them as an iterator of match objects
The match object returned by match() and search() also has several methods, such as:
group() returns the string matched by the regular expression
start() returns the start position of the match
end() returns the end position of the match
span() returns the (start, end) positions of the match as a tuple
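To see these methods together, here is a minimal sketch that walks over the match objects produced by finditer(). The pattern and the sample string are illustrative assumptions.

```python
import re

p = re.compile(r'\d+')
# Sample string chosen for illustration.
s = 'a1 b22 c333'

results = []
for m in p.finditer(s):
    # Each m is a match object exposing group(), start(), end() and span().
    results.append((m.group(), m.start(), m.end(), m.span()))

for r in results:
    print(r)
```

span() is simply the pair (start(), end()), and the end position is exclusive, so s[m.start():m.end()] equals m.group().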
Example 1:
>>> import re
>>> p = re.compile('^[a-c]')
>>> q = p.match("abcd")
>>> print(q.group())
a
>>> q.span()
(0, 1)
Example 2:
>>> import re
>>> p = re.compile(r'\d+')
>>> q = p.findall('1 and 10 and 20')
>>> print(q)
['1', '10', '20']
Matching can also be done in another form, by calling the module-level function directly without compiling first:
re.match(r'\d+', 'd23r')
Example 3:
>>> p = re.match(r'\d+', 'd23r')
>>> print(p)
None
Other matching flags:
re.compile('[a-c]', re.I): re.I makes the match case-insensitive.
re.compile('^AB$', re.M): re.M makes ^ and $ match at the beginning and end of each line as well as at the beginning and end of the string. Without this flag, they match only at the beginning and end of the string.
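The effect of both flags can be sketched as follows. The input strings are assumptions chosen for illustration.

```python
import re

# Illustrative inputs; not taken from the original article.
case_insensitive = re.compile('[a-c]', re.I).findall('aBcD')

text = 'xx\nAB\nyy'
no_flag = re.search('^AB$', text)          # ^ and $ anchor only at the string boundaries
with_flag = re.search('^AB$', text, re.M)  # ^ and $ also anchor at each line

print(case_insensitive)
print(no_flag)
print(with_flag.group())
```

Without re.M the line AB in the middle of the string is not found, because ^ can only match at position 0.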
Example 4:
re.compile(r"""
[1-3] # 1-3
[a-c] # a-c
""", re.VERBOSE)
re.VERBOSE allows the regular expression to span multiple lines, and a comment can be added to each line. Whitespace inside the pattern is ignored.
The pattern above is equivalent to re.compile('[1-3][a-c]')
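A small sketch can confirm that the verbose and compact forms behave the same. The list of sample strings is an assumption made for illustration.

```python
import re

verbose = re.compile(r"""
    [1-3]   # one digit from 1 to 3
    [a-c]   # one letter from a to c
""", re.VERBOSE)
compact = re.compile('[1-3][a-c]')

# Hypothetical test inputs.
samples = ['1a', '3c', '4a', '1d']
verbose_hits = [bool(verbose.match(x)) for x in samples]
compact_hits = [bool(compact.match(x)) for x in samples]

print(verbose_hits)
print(verbose_hits == compact_hits)
```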
3. Group
Use () for grouping.
Example 5:
>>> p = re.compile('(12)+')
>>> m = p.match('121212')
>>> print(m.group())
121212
The pattern above matches 12 repeated one or more times.
You can also print the group information. When a group is repeated, it holds the last repetition that matched:
>>> print(m.group(1))
12
Python automatically captures group information. If you do not want a group to be captured, use (?:...):
Example 6:
>>> import re
>>> s = "hello ab1cd"
>>> p = re.search('(?:h.*)(a.*)(c.*)', s)
>>> print("a*{0}".format(p.group(1)))
a*ab1
>>> print("c*{0}".format(p.group(2)))
c*cd
p.group(0) holds the match for the entire expression, p.group(1) holds what (a.*) matched, and p.group(2) holds what (c.*) matched; h.* is not captured because of the ?: prefix.
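To make the difference visible, the sketch below compares the same pattern with and without the ?: prefix, using the groups() method of the match object. The variable names are assumptions.

```python
import re

s = "hello ab1cd"
capturing = re.search('(h.*)(a.*)(c.*)', s)
non_capturing = re.search('(?:h.*)(a.*)(c.*)', s)

# groups() returns every captured group as a tuple.
print(capturing.groups())
print(non_capturing.groups())
```

Both searches match the same text; only the numbering of the captured groups changes.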
If there are many groups, referring to them by number becomes awkward. In this case, you can name the groups with (?P<name>...) and refer to them by name.
Example 7:
>>> import re
>>> s = "hello ab1cd"
>>> p = re.search('(?P<a>a.*)(?P<c>c.*)', s)
>>> print("a*{0}".format(p.group('a')))
a*ab1
>>> print("c*{0}".format(p.group('c')))
c*cd
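Named groups remain accessible by number, and the groupdict() method returns all of them at once. A minimal sketch, assuming the pattern (?P<a>a.*)(?P<c>c.*) and the same sample string:

```python
import re

s = "hello ab1cd"
m = re.search('(?P<a>a.*)(?P<c>c.*)', s)

by_name = m.group('a')      # access by name
by_number = m.group(1)      # the same group, accessed by position
all_named = m.groupdict()   # dictionary of every named group

print(by_name)
print(by_number)
print(all_named)
```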
4. Greedy and non-greedy modes
In greedy mode, * and + match as many characters as possible, for example:
Example 8:
>>> import re
>>> p = re.compile('
>>> m = p.findall('
>>> print(m)
['<H1>
Sometimes you want the quantifier to match as few characters as possible, so that two results are returned. This is non-greedy mode, written by adding ? after the quantifier (*? or +?):
Example 9:
>>> import re
>>> p = re.compile('
>>> m = p.findall('
>>> print(m)
['<H1>
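A minimal sketch of the difference between the two modes, assuming an HTML-like input string and the patterns '<.*>' and '<.*?>' (all three are chosen here for illustration):

```python
import re

# Input string and patterns are assumptions chosen for illustration.
html = '<H1>title</H1>'

greedy = re.findall('<.*>', html)       # .* grabs as much as possible
non_greedy = re.findall('<.*?>', html)  # .*? stops at the first possible '>'

print(greedy)
print(non_greedy)
```

The greedy pattern returns one result spanning the whole string, while the non-greedy pattern returns each tag separately.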
5. Lookahead and lookbehind assertions
If pattern A should match only when it is followed by pattern B, use A(?=B). If pattern A should match only when it is not followed by B, use A(?!B). In both cases B itself is not part of the match.
Example 10:
>>> import re
>>> s = "ab2cd"
>>> m = re.search("ab2(?=cd)", s)
>>> print(m.group())
ab2
Example 11:
>>> import re
>>> s = 'ab2cd'
>>> m = re.search('ab2(?!cd)', s)
>>> print(m)
None
Similarly, if pattern B should match only when pattern A appears immediately before it, use the form (?<=A)B.
If pattern B should match only when pattern A does not appear immediately before it, use the form (?<!A)B.
Example 12:
>>> import re
>>> s = "ab2cd"
>>> m = re.search('(?<=ab2)cd', s)
>>> print(m.group())
cd
Example 13:
>>> import re
>>> string = "ab2cd"
>>> pattern = re.search(r'(?<!ab2)cd', string)
>>> print(pattern)
None
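The four assertions are easier to compare on a string where cd occurs in more than one context. The following sketch assumes such a string; the input and the variable names are illustrative.

```python
import re

# Illustrative input: 'cd' appears once after 'ab2' and once after 'xy9'.
s = 'ab2cd xy9cd'

# Lookahead: digits that are immediately followed by 'cd'.
digits_before_cd = re.findall(r'\d(?=cd)', s)

# Lookbehind: positions of 'cd' that are preceded by 'ab2'.
after_ab2 = [m.start() for m in re.finditer(r'(?<=ab2)cd', s)]

# Negative lookbehind: positions of 'cd' that are not preceded by 'ab2'.
not_after_ab2 = [m.start() for m in re.finditer(r'(?<!ab2)cd', s)]

print(digits_before_cd)
print(after_ab2)
print(not_after_ab2)
```

The assertions only test the surrounding text, so the returned matches contain the digit or cd alone, never the context that was checked.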