1. Introduction to re
The Python re module cannot handle every complex matching task, but in most cases it is sufficient to parse complex strings and extract the relevant information. Python compiles a regular expression into bytecode and runs it on a matching engine written in C, which performs depth-first matching.
import re
print(re.__doc__)
This prints the documentation of the re module, including a summary of its functions. The following sections illustrate the module with several examples.
2. re regular expression syntax
The regular expression syntax table is as follows:
Syntax | Meaning | Example
"." | Any character (by default, any character except a line break) |
"^" | Start of string | '^hello' matches 'helloworld' but does not match 'aaaahellobbb'
"$" | End of string | Analogous to "^"
"*" | 0 or more repetitions of the preceding expression (greedy match) | '<.*>' matches all of '<title>chinaunix</title>'
"+" | 1 or more repetitions of the preceding expression (greedy match) | Analogous to "*"
"?" | 0 or 1 repetition of the preceding expression (greedy match) | Analogous to "*"
*?, +?, ?? | Non-greedy versions of the three above: stop at the first possible match | '<.*?>' matches '<title>'
{m,n} | Repeat the preceding expression m to n times; {m} (exactly m times) is also allowed | a{6} matches six a's; a{2,4} matches 2 to 4 a's
{m,n}? | Repeat the preceding expression m to n times, taking as few as possible | a{2,4}? in 'aaaaaa' matches only 2 a's
"\" | Escapes a special character or introduces a special sequence |
[] | Represents a character set | [0-9], [a-z], [A-Z], [^0]
"|" | Or | A|B matches either A or B
(...) | Matches the expression inside the parentheses as a group |
(?#...) | Comment; ignored |
(?=...) | Matches if ... matches next, but doesn't consume the string | 'hello(?=test)' matches 'hello' in 'hellotest'
(?!...) | Matches if ... doesn't match next | 'hello(?!test)' matches 'hello' only if it is not followed by 'test'
(?<=...) | Matches if preceded by ... (must be fixed length) | '(?<=hello)test' matches 'test' in 'hellotest'
(?<!...) | Matches if not preceded by ... (must be fixed length) | '(?<!hello)test' does not match 'test' in 'hellotest'
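The greedy, non-greedy, and lookaround entries are easier to see in action. The following is a minimal sketch written for Python 3; the sample strings come from the table, and the variable names are only illustrative.

```python
import re

html = "<title>chinaunix</title>"

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", html).group()
# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.*?>", html).group()
print(greedy)  # <title>chinaunix</title>
print(lazy)    # <title>

# Lookahead: match 'hello' only when 'test' follows; 'test' is not consumed.
ahead = re.search(r"hello(?=test)", "hellotest")
print(ahead.group())  # hello

# Lookbehind: match 'test' only when it is preceded by 'hello'.
behind = re.search(r"(?<=hello)test", "hellotest")
print(behind.span())  # (5, 9)

# Negative lookbehind: the only 'test' follows 'hello', so nothing matches.
rejected = re.search(r"(?<!hello)test", "hellotest")
print(rejected)  # None
```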
The regular expression special sequence table is as follows:
Special sequence | Meaning
\A | Matches only at the start of the string
\Z | Matches only at the end of the string
\b | Matches the empty string at the beginning or end of a word
\B | Matches the empty string that is not at the beginning or end of a word
\d | Equivalent to [0-9]
\D | Equivalent to [^0-9]
\s | Matches any whitespace character: [ \t\n\r\f\v]
\S | Matches any non-whitespace character: [^ \t\n\r\f\v]
\w | Matches any letter, digit, or underscore: [a-zA-Z0-9_]
\W | Matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_]
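As a quick illustration of the special sequences, here is a small sketch for Python 3. The sample text and variable names are made up for this example.

```python
import re

text = "cat concat cat_2 42"

# \b anchors at a word boundary, so only the standalone word 'cat' matches.
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat']

# \d+ picks out runs of digits.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['2', '42']

# \w+ matches runs of letters, digits, and the underscore.
words = re.findall(r"\w+", text)
print(words)  # ['cat', 'concat', 'cat_2', '42']

# \s+ matches runs of whitespace, so splitting on it yields the fields.
fields = re.split(r"\s+", "a \t b\nc")
print(fields)  # ['a', 'b', 'c']

# \A and \Z anchor at the very start and very end of the string.
starts_with_cat = bool(re.search(r"\Acat", text))
ends_with_42 = bool(re.search(r"42\Z", text))
print(starts_with_cat, ends_with_42)  # True True
```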
3. Main functions of re
Commonly used functions include: compile, search, match, split, findall (finditer), sub (subn)
compile
re.compile(pattern[, flags])
Function: Compiles a regular expression pattern into a regular expression object
The flags include:
re.I: Ignore case
re.L: Make the special sequences \w, \W, \b, \B, \s, \S dependent on the current locale
re.M: Multi-line mode
re.S: '.' matches any character, including line breaks (note: by default '.' does not match line breaks)
re.U: Make the special sequences \w, \W, \b, \B, \d, \D, \s, \S dependent on the Unicode character properties database
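The effect of the most common flags can be sketched as follows (Python 3; the sample text is invented for the illustration). Flags are combined with the bitwise OR operator.

```python
import re

text = "Hello\nworld"

# re.I: case-insensitive matching.
plain = re.search(r"hello", text)
ignore_case = re.search(r"hello", text, re.I)
print(plain, ignore_case.group())  # None Hello

# re.M: ^ and $ also match at the start and end of each line.
first_only = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, re.M)
print(first_only)  # ['Hello']
print(every_line)  # ['Hello', 'world']

# re.S: '.' also matches the line break between the two words.
no_dotall = re.search(r"Hello.world", text)
dotall = re.search(r"Hello.world", text, re.S)
print(no_dotall is None, dotall is not None)  # True True

# Combining flags: case-insensitive and multi-line at once.
combined = re.compile(r"^WORLD$", re.I | re.M)
print(combined.search(text).group())  # world
```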
search
re.search(pattern, string[, flags])
search(string[, pos[, endpos]])
Function: Scans the string for a position where the regular expression pattern matches and returns a MatchObject instance, or None if no position matches. The first form is the module-level function; the second is the method of a compiled pattern object.
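A short sketch of working with the returned match object, written for Python 3. The pattern and input strings are invented for the illustration; group(), groups() and span() are standard match object methods.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")
m = pattern.search("range: 10-25 units")

# search returns None when nothing matches, so test before using the result.
if m is not None:
    print(m.group())   # 10-25  (the whole match)
    print(m.group(1))  # 10     (the first parenthesized group)
    print(m.groups())  # ('10', '25')
    print(m.span())    # (7, 12)

missing = pattern.search("no numbers here")
print(missing)  # None

# The pos argument of the compiled method sets where scanning begins.
later = pattern.search("1-2 3-4", 1)
print(later.group())  # 3-4
```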
match
re.match(pattern, string[, flags])
match(string[, pos[, endpos]])
Function: match() tries to match the regular expression only at the beginning of the string, so it reports a match only if one starts at position 0, whereas search() scans the entire string for a match. To find a match anywhere in the string, use search().
Here are a few examples:
Example: the most basic usage, calling the methods of a re.RegexObject object
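#!/usr/bin/env python
import re

# Compile the pattern once and reuse the resulting object
r1 = re.compile(r'world')
# match() only tries at the start of the string, so this fails
if r1.match('helloworld'):
    print('match succeeds')
else:
    print('match fails')
# search() scans the whole string, so this succeeds
if r1.search('helloworld'):
    print('search succeeds')
else:
    print('search fails')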
Explanation: the r prefix stands for raw. Ordinary string literals contain escape sequences, such as '\n' for a line break, so a literal backslash has to be written as '\\'. To represent a backslash followed by 'n' without the r prefix, you have to write '\\n'. With the r prefix it is simply r'\n', which is much clearer.
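The difference between raw and ordinary string literals can be checked directly. This is a small sketch for Python 3; the path value is only an illustrative sample.

```python
import re

# '\n' is one character (a line break); r'\n' is two: a backslash and 'n'.
lengths = (len("\n"), len(r"\n"))
print(lengths)  # (1, 2)

# Without the r prefix the backslash must be doubled to get the same string.
same = "\\n" == r"\n"
print(same)  # True

# To match a literal backslash the regex itself needs \\,
# which is r'\\' as a raw string or '\\\\' as an ordinary string.
path = r"C:\temp"
parts = re.split(r"\\", path)
print(parts)  # ['C:', 'temp']
print(re.split("\\\\", path) == parts)  # True
```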
Example: setting a flag
import re

#r2 = re.compile(r'n$', re.S)
#r2 = re.compile('\n$', re.S)
# re.I makes the match case-insensitive; $ also matches just before a trailing line break
r2 = re.compile('world$', re.I)
if r2.search('helloworld\n'):
    print('search succeeds')
else:
    print('search fails')
Example: direct call
import re

# Call the module-level function directly, without compiling first
if re.search(r'abc', 'helloaaabcdworldn'):
    print('search succeeds')
else:
    print('search fails')
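split
re.split(pattern, string[, maxsplit=0, flags=0])
split(string[, maxsplit=0])
Function: Splits the string at every part that matches the regular expression and returns the pieces as a list. If the pattern contains a capturing group, the matched separators are included in the list as well.
Example: simple parsing of an IP address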
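#!/usr/bin/env python
import re

# \W+ matches runs of non-word characters, here the dots
r1 = re.compile(r'\W+')
print(r1.split('192.168.1.1'))
# A capturing group keeps the separators in the result
print(re.split(r'(\W+)', '192.168.1.1'))
# maxsplit=1 splits only once
print(re.split(r'(\W+)', '192.168.1.1', maxsplit=1))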
The results are as follows:
['192', '168', '1', '1']
['192', '.', '168', '.', '1', '.', '1']
['192', '.', '168.1.1']
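Splitting is not limited to a single separator. The sketch below (Python 3, with an invented sample record) splits on any mix of commas, semicolons, and whitespace, and shows maxsplit leaving the remainder untouched.

```python
import re

record = "alice, bob;carol  dave"

# A character class lets one pattern cover commas, semicolons, and whitespace.
names = re.split(r"[,;\s]+", record)
print(names)  # ['alice', 'bob', 'carol', 'dave']

# maxsplit stops after the given number of splits; the rest stays intact.
first, rest = re.split(r"[,;\s]+", record, maxsplit=1)
print(first)  # alice
print(rest)   # bob;carol  dave
```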
findall
re.findall(pattern, string[, flags])
findall(string[, pos[, endpos]])
Function: Finds all substrings in the string that match the regular expression and returns them as a list
Example: find the content enclosed in [] (greedy and non-greedy matching)
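#!/usr/bin/env python
import re

# Greedy: matches from the first '[' to the last ']'
r1 = re.compile(r'(\[.*\])')
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
# Non-greedy: matches each bracketed part separately
r1 = re.compile(r'(\[.*?\])')
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
# Pairs of digits, then a digit followed by a lowercase letter
print(re.findall(r'[0-9]{2}', 'fdskfj1323jfkdj'))
print(re.findall(r'([0-9][a-z])', "fdskfj1323jfkdj"))
# Lookahead and lookbehind match positions, so the results are empty strings
print(re.findall(r'(?=www)', "afdsfwwwfkdjfsdfsdwww"))
print(re.findall(r'(?<=www)', "afdsfwwwfkdjfsdfsdwww"))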
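What findall returns depends on the groups in the pattern. The following sketch for Python 3 assumes the bracket patterns are meant to be \[.*\] and \[.*?\]; the last example with key=value pairs is an invented illustration.

```python
import re

text = "hello[hi]heldfsdsf[iwonder]lo"

# Greedy: one match from the first '[' to the last ']'.
greedy = re.findall(r"(\[.*\])", text)
print(greedy)  # ['[hi]heldfsdsf[iwonder]']

# Non-greedy: each bracketed part is matched separately.
lazy = re.findall(r"(\[.*?\])", text)
print(lazy)  # ['[hi]', '[iwonder]']

# No group: the list holds the whole matches.
pairs = re.findall(r"[0-9]{2}", "fdskfj1323jfkdj")
print(pairs)  # ['13', '23']

# One group: the list holds the text of that group.
digit_letter = re.findall(r"([0-9][a-z])", "fdskfj1323jfkdj")
print(digit_letter)  # ['3j']

# Several groups: the list holds one tuple per match.
settings = re.findall(r"(\w+)=(\d+)", "a=1 b=22")
print(settings)  # [('a', '1'), ('b', '22')]
```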
finditer
re.finditer(pattern, string[, flags])
finditer(string[, pos[, endpos]])
Description: Similar to findall: finds all substrings in the string that match the regular expression, but returns them as an iterator of match objects. As with the other functions, RegexObject has a method of the same name.
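Because finditer yields match objects rather than strings, the position of each match is available. A minimal sketch for Python 3, reusing the sample string from the findall example:

```python
import re

text = "afdsfwwwfkdjfsdfsdwww"

# Each item produced by finditer is a match object.
spans = []
for m in re.finditer(r"www", text):
    print(m.group(), m.span())
    spans.append(m.span())
print(spans)  # [(5, 8), (18, 21)]

# The compiled method accepts pos, so scanning can start further in.
pattern = re.compile(r"www")
later_spans = [m.span() for m in pattern.finditer(text, 6)]
print(later_spans)  # [(18, 21)]
```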
sub
re.sub(pattern, repl, string[, count, flags])
sub(repl, string[, count=0])
Description: Finds all substrings of string that match the regular expression pattern and replaces them with repl. If no match is found, the string is returned unmodified. repl can be either a string or a function.
Example:
#!/usr/bin/env python
import re

p = re.compile('(one|two|three)')
# Replace at most 2 matches with 'num'
print(p.sub('num', 'one word two words three words apple', 2))
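Since repl can be a function as well as a string, the replacement can be computed from each match. The sketch below is for Python 3; the helper to_digit and the digits mapping are hypothetical names introduced only for this illustration.

```python
import re

p = re.compile(r"(one|two|three)")
sentence = "one word two words three words apple"

# String replacement, limited to the first two matches.
limited = p.sub("num", sentence, count=2)
print(limited)  # num word num words three words apple

# Function replacement: the function receives the match object
# and returns the replacement text. (to_digit is a hypothetical helper.)
digits = {"one": "1", "two": "2", "three": "3"}

def to_digit(match):
    return digits[match.group(1)]

converted = p.sub(to_digit, sentence)
print(converted)  # 1 word 2 words 3 words apple

# In a string replacement, \1 and \2 refer back to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")
print(swapped)  # host at user
```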
subn
re.subn(pattern, repl, string[, count, flags])
subn(repl, string[, count=0])
Description: Does the same as sub(), but returns a tuple containing the new string and the number of substitutions made. As with the other functions, RegexObject has a method of the same name.
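A brief sketch of subn for Python 3, reusing the sentence from the sub example, shows the returned tuple and how it behaves when count is given or nothing matches.

```python
import re

sentence = "one word two words three words apple"

# subn returns (new_string, number_of_substitutions).
new_text, n = re.subn(r"(one|two|three)", "num", sentence)
print(new_text)  # num word num words num words apple
print(n)         # 3

# With count, the reported number never exceeds the limit.
limited = re.subn(r"(one|two|three)", "num", sentence, count=2)
print(limited)  # ('num word num words three words apple', 2)

# No match: the string comes back unchanged with a count of 0.
unchanged = re.subn(r"(one|two|three)", "num", "apple")
print(unchanged)  # ('apple', 0)
```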