Definition
In essence, a regular expression (or RE) is a small, highly specialized programming language.
In Python it is embedded in the language and made available through the re module. A regular expression pattern is
compiled into a sequence of bytecodes, which is then executed by a matching engine written in C.
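As a minimal sketch of the compile-then-match flow described above, the snippet below builds a pattern object once with re.compile and reuses it for two searches. The pattern and the sample strings are illustrative assumptions, not taken from the original text.

```python
import re

# Compile the pattern once; the resulting object can be reused for many searches.
digits = re.compile(r"\d+")

first = digits.search("order 66 shipped")    # match object for "66"
nothing = digits.search("no numbers here")   # None, because nothing matches

print(first.group())   # 66
print(nothing)         # None
```

Compiling up front is mostly a convenience when the same pattern is used repeatedly; the module-level functions such as re.search accept the pattern string directly.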
Metacharacters
. ^ $ * + ? { } [ ] | ( ) \
Examples
+ Match the preceding content 1 or more times
? Match the preceding content 0 or 1 times
{m} Match the preceding content exactly m times
{m,n} Match the preceding content m to n times
*?, +?, ??, {m,n}? The quantifiers *, +, ? and {m,n} are greedy by default, that is, they match as much as possible; appending ? turns them into lazy (non-greedy) matches
From the description above you can see that '*', '+' and '?' are greedy, but this may not be what we want,
so you can append a question mark to switch to the non-greedy strategy, which matches as little as possible. Example:
    import re

    # non-greedy mode: \d+? stops after the first digit
    print(re.findall(r"a(\d+?)", "a23b"))  # ['2']
    # greedy mode: \d+ takes every digit it can
    print(re.findall(r"a(\d+)", "a23b"))   # ['23']
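The quantifiers listed above can be compared side by side. The following is a small sketch on a made-up sample string (the string and variable names are assumptions chosen for illustration), showing how +, ?, {m} and {m,n} differ and what the lazy form of {m,n} does.

```python
import re

text = "ab abb abbb abbbb"

one_or_more = re.findall(r"ab+", text)        # 'a' followed by 1 or more 'b'
zero_or_one = re.findall(r"ab?", text)        # 'a' followed by 0 or 1 'b'
exactly_two = re.findall(r"ab{2}", text)      # 'a' followed by exactly 2 'b'
two_to_three = re.findall(r"ab{2,3}", text)   # greedy: takes 3 'b' when available
lazy_two_to_three = re.findall(r"ab{2,3}?", text)  # lazy: stops at 2 'b'

print(one_or_more)        # ['ab', 'abb', 'abbb', 'abbbb']
print(zero_or_one)        # ['ab', 'ab', 'ab', 'ab']
print(exactly_two)        # ['abb', 'abb', 'abb']
print(two_to_three)       # ['abb', 'abbb', 'abbb']
print(lazy_two_to_three)  # ['abb', 'abb', 'abb']
```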
\:
A backslash followed by a metacharacter removes its special meaning.
A backslash followed by certain ordinary characters gives them a special meaning.
A backslash followed by a number refers to the string matched by the group with that number.
    # \2 must repeat whatever the second group matched ("eric")
    n = re.search(r"(alex)(eric)com\2", "alexericcomeric")
    print(n.span())  # (0, 15)
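The three uses of the backslash can be illustrated with a short sketch. The sample strings below are assumptions made up for this example; only standard re calls are used.

```python
import re

# Escaping: "\." matches a literal dot instead of "any character".
literal_dot = re.findall(r"a\.c", "a.c abc")
any_char = re.findall(r"a.c", "a.c abc")

# Backreference: "\1" must repeat exactly what group 1 matched.
doubled = re.search(r"(\w+) \1", "say hello hello world")

print(literal_dot)       # ['a.c']
print(any_char)          # ['a.c', 'abc']
print(doubled.group())   # hello hello
print(doubled.group(1))  # hello
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.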
\d matches any decimal digit; it is equivalent to the class [0-9].
\D matches any non-digit character; it is equivalent to the class [^0-9].
\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].
\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].
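A quick sketch of these classes on a made-up sample string (the string is an assumption for illustration). Note that for str patterns in Python 3 these classes also match Unicode digits, whitespace, and letters by default; the bracketed equivalents above hold exactly for ASCII text or when the re.ASCII flag is passed.

```python
import re

sample = "Room 42, floor_3!"

digits = re.findall(r"\d", sample)          # single digits
spaces = re.findall(r"\s", sample)          # whitespace characters
non_space_runs = re.findall(r"\S+", sample) # runs of non-whitespace
words = re.findall(r"\w+", sample)          # runs of letters, digits, underscore
non_word = re.findall(r"\W", sample)        # everything \w does not match

print(digits)          # ['4', '2', '3']
print(spaces)          # [' ', ' ']
print(non_space_runs)  # ['Room', '42,', 'floor_3!']
print(words)           # ['Room', '42', 'floor_3']
print(non_word)        # [' ', ',', ' ', '!']
```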
\b: Matches a word boundary, that is, the position between a word and a non-word character such as a space.
It also matches at the start or end of the string when a word begins or ends there, where a "word" is a run of
consecutive letters, digits, and underscores. Formally, \b is defined as the junction between \w and \W.
It is a zero-width assertion: it consumes no characters and matches only at the beginning or end of a word.
Because a word is defined as a sequence of alphanumeric characters, the end of a word is marked by whitespace or a non-alphanumeric character.
For example:
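    # boundary required only after "abc": "&abc " qualifies, "abca..." does not
    print(re.findall(r"abc\b", "dzx &abc sdsadasabcasdsadasdabcasdsa"))    # ['abc']
    # boundary required on both sides: "&" before and " " after
    print(re.findall(r"\babc\b", "dzx &abc sdsadasabcasdsadasdabcasdsa"))  # ['abc']
    # "sabc" has a word character before "abc", so there is no leading boundary
    print(re.findall(r"\babc\b", "dzx sabc sdsadasabcasdsadasdabcasdsa"))  # []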
Note:
    re.match('com', 'comwww.runcomoob')
    re.search(r'\dcom', 'www.4comrunoob.5com')
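The difference between the two calls above can be sketched as follows: re.match only tries the pattern at the start of the string, while re.search scans forward for the first position where it matches. The second re.match call, with a string that does not start with "com", is an added assumption to show the failing case.

```python
import re

# re.match succeeds only if the pattern matches at position 0.
at_start = re.match(r"com", "comwww.runcomoob")
not_at_start = re.match(r"com", "www.runcomoob")   # None: "com" is not at the start

# re.search returns the first match found anywhere in the string.
found = re.search(r"\dcom", "www.4comrunoob.5com")

print(at_start.group())  # com
print(not_at_start)      # None
print(found.group())     # 4com
```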
When a match succeeds, the call returns a match object, which has the following methods:
group() returns the string matched by the RE
start() returns the position where the match starts
end() returns the position where the match ends
span() returns a tuple containing the (start, end) positions of the match
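A short sketch of these four methods on a single match object. The pattern and input string are assumptions chosen so the positions are easy to check by eye.

```python
import re

m = re.search(r"\d+", "abc123def")

matched = m.group()   # the matched text
start = m.start()     # index of the first matched character
end = m.end()         # index just past the last matched character
span = m.span()       # (start, end) as a tuple

print(matched, start, end, span)  # 123 3 6 (3, 6)
```

Since re.search and re.match return None when nothing matches, it is usually worth checking the result before calling these methods.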
group() returns the string matched by the whole RE. It can also take several group numbers at once, in which case it returns the strings matched by those groups.
1. group() returns the whole matched string.
2. group(n, m) returns the strings matched by groups n and m, and raises an IndexError exception if a group number does not exist.
3. groups() returns a tuple containing the strings matched by all groups in the regular expression, from group 1 up to
the last group. groups() usually takes no arguments; each element of the returned tuple corresponds to one group
defined in the expression.
    import re

    a = "123abc456"
    pattern = "([0-9]*)([a-z]*)([0-9]*)"
    re.search(pattern, a).group(0)  # '123abc456', the whole match
    re.search(pattern, a).group(1)  # '123'
    re.search(pattern, a).group(2)  # 'abc'
    re.search(pattern, a).group(3)  # '456'
group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second pair, and group(3)
the part matched by the third pair.
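The remaining cases described above, group() with several numbers, groups(), and the IndexError for a missing group, can be sketched with the same pattern and input. The variable names are assumptions for this example.

```python
import re

m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", "123abc456")

whole = m.group()         # no argument: same as group(0)
pair = m.group(1, 3)      # several group numbers: returns a tuple
all_groups = m.groups()   # every group, starting from group 1

# Asking for a group that the pattern does not define raises IndexError.
try:
    m.group(4)
    missing_group_raised = False
except IndexError:
    missing_group_raised = True

print(whole)                 # 123abc456
print(pair)                  # ('123', '456')
print(all_groups)            # ('123', 'abc', '456')
print(missing_group_raised)  # True
```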
    # re.findall returns all matching strings as a list.
    # For example, get every run of digits in the string:
    p = re.compile(r'\d+')
    print(p.findall('one1two2three3four4'))  # ['1', '2', '3', '4']

    # get all words in the string that contain 'oo'
    text = 'djks#dooljsdj('
    print(re.findall(r'\w*oo\w*', text))     # ['dooljsdj']
    # split a string on runs of digits
    p = re.compile(r'\d+')
    print(p.split('one1two2three3four4'))           # ['one', 'two', 'three', 'four', '']
    # equivalent to
    print(re.split(r'\d+', 'one1two2three3four4'))  # ['one', 'two', 'three', 'four', '']
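    a = 'abc123abv23456'
    # capturing group: findall returns only what the group matched
    b = re.findall(r'23(a)?', a)
    print(b)  # ['a', '']
    # non-capturing group (?:...): findall returns the whole match, 23 optionally followed by 'a'
    b = re.findall(r'23(?:a)?', a)
    print(b)  # ['23a', '23']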
python-day6-Regular Expressions