Regular expressions
I. Getting to know regular expressions
What is a regular expression?
Regular expressions are a small language of their own. They are not specific to Python: many programming languages can use them, and they form a self-contained system.
What's the use of regular expressions?
First, let's look at some of the built-in methods of Python strings.
str1 = "abcdefg"
result1 = str1.find("b")
result2 = str1.find("bc")
result3 = str1.split("b")
result4 = str1.replace("ab", "a b")
print("Position of b: %s" % result1)
print("Position of bc: %s" % result2)
print("String split into: %s" % result3)
print("Result of string substitution: %s" % result4)
Notice that searching, splitting, and replacing all require the exact text to be specified in advance.
A regular expression instead describes a pattern. Many different strings can satisfy the pattern, so the match is not limited to one fixed piece of text.
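The contrast between a fixed substring and a pattern can be sketched as follows. This is a minimal illustration, not part of the original example; the sample string and the variable names are made up for the purpose.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "order 12, order 345, order 6789"

# str.find needs one exact substring and reports only its first position.
first_exact = text.find("345")

# A pattern describes a whole family of strings: here, any run of digits.
all_numbers = re.findall(r"\d+", text)

# The same idea carries over to splitting and substitution.
pieces = re.split(r",\s*", text)
masked = re.sub(r"\d+", "#", text)

print(first_exact)
print(all_numbers)
print(pieces)
print(masked)
```

One pattern, `\d+`, covers every number in the text, whereas the string methods would need a separate call for each literal value.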
II. The metacharacters of regular expressions and their meanings
.: matches any single character except a line break
^: anchors the match at the beginning of the line
$: anchors the match at the end of the line
*: matches 0 or more times; greedy; equivalent to {0,}
+: matches 1 or more times; greedy; equivalent to {1,}
?: matches 0 or 1 times; greedy; equivalent to {0,1}
{}: matches a specified number of times
{3}: exactly 3 times
{3,5}: 3 to 5 times
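The anchors and quantifiers listed so far can be tried out with a short sketch like the one below. The test strings and variable names are invented for illustration only.

```python
import re

# "." matches any character except a line break.
dot = re.findall(r"a.c", "abc a-c a\nc")

# "^" and "$" anchor the pattern to the start and the end of the string.
starts = re.findall(r"^ab", "abab")
ends = re.findall(r"ab$", "abab")

# "*", "+" and "?" differ in the minimum and maximum repeat count.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")
optional = re.findall(r"ab?", "a ab abbb")

# "{n}" and "{m,n}" give explicit counts.
exactly3 = re.findall(r"\d{3}", "12 345 67890")
three_to_five = re.findall(r"\d{3,5}", "12 345 67890 1234567")

print(dot)            # ['abc', 'a-c']
print(star)           # ['a', 'ab', 'abbb']
print(plus)           # ['ab', 'abbb']
print(optional)       # ['a', 'ab', 'ab']
print(exactly3)       # ['345', '678']
print(three_to_five)  # ['345', '67890', '12345']
```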
\:
1. A backslash followed by a metacharacter removes that character's special meaning
2. A backslash followed by certain ordinary characters gives them a special meaning
\d: matches any single decimal digit; equivalent to the class [0-9]
\D: matches any single non-digit character; equivalent to [^0-9]
\s: matches any single whitespace character; equivalent to the class [ \t\n\r\f\v]
\S: matches any single non-whitespace character; equivalent to [^ \t\n\r\f\v]
\w: matches any single letter, digit, or underscore; for ASCII text it is equivalent to the class [a-zA-Z0-9_]
\W: matches any single character that is not a letter, digit, or underscore; for ASCII text it is equivalent to the class [^a-zA-Z0-9_]
\b: matches a word boundary, that is, the position between a word character and a non-word character such as a space
3. A backslash followed by a number matches the same text that the group with that number matched (a backreference)
re.search(r"(alex)(eric)com\2", "alexericcomeric").group()
'alexericcomeric'
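The escaping rule and the shorthand classes described under the backslash can be sketched as below. The sample strings and variable names are hypothetical, picked only to exercise each class.

```python
import re

sample = "id_7: 42 apples\tand 3 pears"

digits = re.findall(r"\d", sample)               # single decimal digits
non_digit_runs = re.findall(r"\D+", "a1b22c")    # runs of non-digits
spaces = re.findall(r"\s", sample)               # spaces and the tab
words = re.findall(r"\w+", sample)               # letters, digits, underscore
non_word = re.findall(r"\W", "a-b c")            # everything \w excludes

# \b only matches at the edge of a word, so "concat" is skipped.
whole_word = re.findall(r"\bcat\b", "cat concat cat!")

# Escaping a metacharacter makes it literal: "\." matches a real dot.
literal_dots = re.findall(r"\.", "a.b.c")

print(digits)      # ['7', '4', '2', '3']
print(words)       # ['id_7', '42', 'apples', 'and', '3', 'pears']
print(whole_word)  # ['cat', 'cat']
```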
[]: matches any one character from the set; most metacharacters lose their special meaning inside a set, so they need no escaping
^: at the start of a character set, ^ means negation
re.findall('[^1-9]', 'a1b2c3')
['a', 'b', 'c']
\: escape sequences such as \d, \w, and \s keep their meaning inside a set
re.findall(r'[\d]', 'ww3 wa8.d0')
['3', '8', '0']
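The character-set rules above can be checked with a small sketch like this one. The inputs and variable names are illustrative assumptions, not taken from the original.

```python
import re

# Any one character from the set, or from a range inside the set.
in_set = re.findall(r"[abc]", "a1b2c3d4")
in_range = re.findall(r"[a-c1-2]", "a1b2c3d4")

# ".", "*" and "+" are ordinary characters inside a set.
literal_meta = re.findall(r"[.*+]", "1.5*2+3")

# "^" negates only when it comes first; "\d" keeps its meaning in a set.
negated = re.findall(r"[^\d.]", "1.5*2+3")
caret_literal = re.findall(r"[a^]", "a^b")

print(in_set)         # ['a', 'b', 'c']
print(in_range)       # ['a', '1', 'b', '2', 'c']
print(literal_meta)   # ['.', '*', '+']
print(negated)        # ['*', '+']
print(caret_literal)  # ['a', '^']
```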
(): grouping; treats a sequence of characters as a single unit
Greedy mode:
re.search(r"a(\d+)", "a12345678b").group()
'a12345678'
Non-greedy mode:
re.search(r"a(\d+?)", "a12345678b").group()
'a1'
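How a group behaves as a unit, and how the whole match differs from the group's own content, can be sketched as follows. The variable names are hypothetical; the second half reuses the strings from the greedy and non-greedy examples.

```python
import re

# Without a group, "+" repeats only the "b"; with a group it repeats "ab".
no_group = re.search(r"ab+", "ababab").group()
with_group = re.search(r"(ab)+", "ababab").group()

# group() is the whole match; group(1) is what the first parentheses captured.
greedy = re.search(r"a(\d+)", "a12345678b")
lazy = re.search(r"a(\d+?)", "a12345678b")

print(no_group)                         # ab
print(with_group)                       # ababab
print(greedy.group(), greedy.group(1))  # a12345678 12345678
print(lazy.group(), lazy.group(1))      # a1 1
```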
III. When greedy and non-greedy modes make a difference
re.findall(r"a(\d+?)", "a23b")  # non-greedy mode
['2']  # findall returns what the parentheses matched
re.findall(r"a(\d+)b", "a23b")  # greedy mode, bounded on both sides
['23']
re.findall(r"a(\d+?)b", "a23b")  # bounded on both sides, so non-greedy mode has no visible effect
['23']
re.findall(r"a(\d+)", "a23b")  # greedy mode
['23']
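The reason the bounded patterns agree is that the digits must extend all the way to the single "b" for the match to succeed at all, so a lazy quantifier is forced to keep growing. The two modes diverge only when more than one stopping point is possible. A minimal sketch, with made-up inputs and variable names:

```python
import re

# Only one "b" can close the match, so both modes capture the same digits.
bounded_greedy = re.findall(r"a(\d+)b", "a23b")
bounded_lazy = re.findall(r"a(\d+?)b", "a23b")

# Two candidate "b" endings: greedy runs to the last, lazy stops at the first.
far = re.findall(r"a(.+)b", "a1b2b")
near = re.findall(r"a(.+?)b", "a1b2b")

# The same effect on tag-like text (hypothetical sample).
html = "<b>bold</b> and <i>italic</i>"
greedy_tags = re.findall(r"<.+>", html)
lazy_tags = re.findall(r"<.+?>", html)

print(bounded_greedy, bounded_lazy)  # ['23'] ['23']
print(far, near)                     # ['1b2'] ['1']
print(greedy_tags)                   # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)                     # ['<b>', '</b>', '<i>', '</i>']
```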