1. Introduction to re
Python's re module does not cover every complex matching situation, but in most cases it is enough to analyze complex strings and extract the relevant information. Python compiles the regular expression into bytecode and runs it on a matching engine written in C, which performs a depth-first match.
The code is as follows:

    import re
    print(re.__doc__)

This prints the documentation of the re module, including a summary of its functions. Several of them are illustrated in the examples below.
2. Regular expression syntax in re
The regular expression syntax is as follows:
Syntax | Meaning | Example
"." | Any character (except a newline, unless re.S is set) |
"^" | Start of string | '^hello' matches 'helloworld' but does not match 'aaaahellobbb'
"$" | End of string | Same idea as "^"
"*" | 0 or more repetitions of the preceding element (greedy match) | '<.*>' matches all of '<title>chinaunix</title>'
"+" | 1 or more repetitions of the preceding element (greedy match) | Same idea as "*"
"?" | 0 or 1 repetition of the preceding element (greedy match) | Same idea as "*"
*?, +?, ?? | Non-greedy versions of the three above: match as little as possible | '<.*?>' matches only '<title>'
{m,n} | Repeat the preceding element m to n times; {m} is also allowed | a{6} matches six a's, a{2,4} matches 2 to 4 a's
{m,n}? | Repeat the preceding element m to n times, taking as few as possible | Against 'aaaaaa', a{2,4}? matches only 2 a's
"\" | Escapes a special character or introduces a special sequence |
[] | Represents a character set | [0-9], [a-z], [A-Z], [^0]
"|" | Or | a|B matches a or B
(...) | Matches the expression inside the parentheses as a group |
(?#...) | Comment; ignored |
(?=...) | Matches if ... matches next, but doesn't consume the string | 'hello(?=test)' matches hello in 'hellotest'
(?!...) | Matches if ... doesn't match next | 'hello(?!test)' matches hello only if it is not followed by test
(?<=...) | Matches if preceded by ... (must be fixed length) | '(?<=hello)test' matches test in 'hellotest'
(?<!...) | Matches if not preceded by ... (must be fixed length) | '(?<!hello)test' does not match test in 'hellotest'
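The difference between greedy and non-greedy quantifiers, and the fact that lookahead and lookbehind assert context without consuming it, can be checked with a short sketch. The sample strings are the ones used in the table; the variable names are only illustrative.

```python
import re

html = '<title>chinaunix</title>'

# Greedy: .* runs on to the last '>' in the string.
greedy = re.search(r'<.*>', html).group()
# Non-greedy: .*? stops at the first '>'.
lazy = re.search(r'<.*?>', html).group()

# {2,4} takes as many as it can, {2,4}? as few as it can.
most = re.search(r'a{2,4}', 'aaaaaa').group()
fewest = re.search(r'a{2,4}?', 'aaaaaa').group()

# Lookahead and lookbehind check the context but are not part of the match.
ahead = re.search(r'hello(?=test)', 'hellotest').group()
not_ahead = re.search(r'hello(?!test)', 'hellotest')      # None: test does follow
behind = re.search(r'(?<=hello)test', 'hellotest').group()
not_behind = re.search(r'(?<!hello)test', 'hellotest')    # None: hello does precede

print(greedy, lazy, most, fewest, ahead, not_ahead, behind, not_behind)
```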
The regular expression special sequence list is as follows:
Special sequence | Meaning
\A | Matches only at the beginning of the string
\Z | Matches only at the end of the string
\b | Matches the empty string at the beginning or end of a word
\B | Matches the empty string that is not at the beginning or end of a word
\d | Equivalent to [0-9]
\D | Equivalent to [^0-9]
\s | Matches any whitespace character: [ \t\n\r\f\v]
\S | Matches any non-whitespace character: [^ \t\n\r\f\v]
\w | Matches any letter, digit, or underscore: [a-zA-Z0-9_]
\W | Matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_]
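As a rough illustration of the special sequences, the sketch below applies a few of them to made-up sample strings (the text and variable names are assumptions for the example). The bracket equivalents in the list describe ASCII input, which is all that is used here.

```python
import re

text = 'order 66 shipped to unit_7 on 2024-01-15'

digits = re.findall(r'\d+', text)        # runs of digits
words = re.findall(r'\w+', text)         # runs of letters, digits, underscore
parts = re.split(r'\s+', 'a \t b\nc')    # split on any run of whitespace

# \b is a zero-width word boundary: 'cat' as a whole word only.
whole = re.findall(r'\bcat\b', 'cat concat cat.')
# \B requires NOT being at a boundary: 'cat' at the end of a longer word.
inside = re.findall(r'\Bcat\b', 'cat concat cat.')

# \A anchors at the very start of the string, \Z at the very end.
starts = bool(re.search(r'\Aorder', text))
ends = bool(re.search(r'15\Z', text))

print(digits, words, parts, whole, inside, starts, ends)
```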
3. Main functions of re
Commonly used functions include compile, search, match, split, findall (finditer), and sub (subn).
compile
re.compile(pattern[, flags])
Function: compiles a regular expression pattern into a regular expression object.
The flags include:
re.I: ignore case
re.L: make the special sequences \w, \W, \b, \B, \s, \S depend on the current locale
re.M: multi-line mode
re.S: make '.' match any character, including a newline (note: without this flag, '.' does not match a newline)
re.U: make the special sequences \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
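The effect of the most common flags can be sketched as follows. The sample text and variable names are assumptions made for the example; flags are combined with the | operator.

```python
import re

text = 'Hello\nWORLD'

# re.I: case-insensitive matching.
ci = re.compile(r'world', re.I)

# re.M: ^ and $ also match at the start and end of each line.
single_line = re.compile(r'^world$', re.I)
multi_line = re.compile(r'^world$', re.I | re.M)

# Without re.S, '.' does not match a newline; with re.S it does.
dot_default = re.compile(r'Hello.WORLD')
dot_all = re.compile(r'Hello.WORLD', re.S)

results = {
    'ignore case': bool(ci.search(text)),
    'anchors, single line': bool(single_line.search(text)),
    'anchors, multi-line': bool(multi_line.search(text)),
    'dot, default': bool(dot_default.search(text)),
    'dot, re.S': bool(dot_all.search(text)),
}
print(results)
```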
search
re.search(pattern, string[, flags])
search(string[, pos[, endpos]])
Function: scans the string for the first position where the regular expression pattern matches and returns a MatchObject instance. If no matching position is found, None is returned.
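What the returned match object offers, and how the optional pos argument of the compiled form narrows the scan, can be sketched like this. The pattern and sample text are made up for illustration.

```python
import re

pattern = re.compile(r'(\d+)\.(\d+)')
text = 'version 3.11 released'

m = pattern.search(text)
if m is not None:
    whole = m.group(0)           # the entire matched text
    major, minor = m.groups()    # the two captured groups
    span = m.span()              # (start, end) indexes of the match

# pos tells the compiled pattern where to start scanning;
# from index 10 onward there is no 'digits.digits' left.
late = pattern.search(text, 10)

print(whole, major, minor, span, late)
```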
match
re.match(pattern, string[, flags])
match(string[, pos[, endpos]])
Function: match() tries to match the regular expression only at the beginning of the string, that is, it reports a match only if one starts at position 0, whereas search() scans the entire string to find a match. If you want to look for a match anywhere in the string, use search().
Here are a few examples. The most basic usage is to call the methods on a compiled RegexObject:
The code is as follows:

    #!/usr/bin/env python
    import re

    r1 = re.compile(r'World')
    # match() only tries position 0, so it fails here
    if r1.match('HelloWorld'):
        print('match succeeds')
    else:
        print('match fails')
    # search() scans the whole string, so it succeeds
    if r1.search('HelloWorld'):
        print('search succeeds')
    else:
        print('search fails')

Note: the r prefix stands for raw. Ordinary string literals contain escape sequences, such as the newline '\n', so a literal backslash has to be written as '\\'. If you need a backslash followed by the letter n and do not use the r prefix, you must write '\\n'. With the r prefix this becomes r'\n', which is much clearer.
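A quick way to convince yourself of this is to compare the lengths of the literals. The sketch below uses made-up variable names and a sample path.

```python
import re

newline = '\n'     # one character: a line feed
escaped = '\\n'    # two characters: a backslash and the letter n
raw = r'\n'        # the same two characters, written as a raw string

lengths = (len(newline), len(escaped), len(raw))
same = (escaped == raw)

# A regex needs \\ to match one literal backslash, so the raw form r'\\'
# is easier to read than the equivalent ordinary literal '\\\\'.
path = 'C:\\temp'                      # the text C:\temp
backslashes = re.findall(r'\\', path)

print(lengths, same, backslashes)
```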
Example: setting a flag
The code is as follows:

    #r2 = re.compile(r'n$', re.S)
    #r2 = re.compile('\n$', re.S)
    r2 = re.compile('world$', re.I)
    if r2.search('helloworld\n'):
        print('search succeeds')
    else:
        print('search fails')

Example: calling the module-level function directly
The code is as follows:

    if re.search(r'abc', 'helloaaabcdworldn'):
        print('search succeeds')
    else:
        print('search fails')

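split
re.split(pattern, string[, maxsplit=0, flags=0])
split(string[, maxsplit=0])
Function: splits the string at every place the regular expression matches and returns the pieces as a list. If the pattern contains a capturing group, the matched separators are included in the list.
Example: simple parsing of an IP address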
The code is as follows:

    #!/usr/bin/env python
    import re

    r1 = re.compile(r'\W+')
    print(r1.split('192.168.1.1'))
    # with a capturing group the separators are kept in the result
    print(re.split(r'(\W+)', '192.168.1.1'))
    # maxsplit=1 stops after the first split
    print(re.split(r'(\W+)', '192.168.1.1', 1))

The results are as follows:

    ['192', '168', '1', '1']
    ['192', '.', '168', '.', '1', '.', '1']
    ['192', '.', '168.1.1']

findall
re.findall(pattern, string[, flags])
findall(string[, pos[, endpos]])
Function: finds all substrings of the string that match the regular expression and returns them as a list.
Example: find what is enclosed in [] (greedy and non-greedy lookups)
The code is as follows:

    #!/usr/bin/env python
    import re

    # greedy: from the first '[' to the last ']'
    r1 = re.compile(r'(\[.*\])')
    print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
    # non-greedy: each bracketed part separately
    r1 = re.compile(r'(\[.*?\])')
    print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
    print(re.findall('[0-9]{2}', "fdskfj1323jfkdj"))
    print(re.findall('([0-9][a-z])', "fdskfj1323jfkdj"))
    # zero-width assertions match the empty string at each position
    print(re.findall('(?=www)', "afdsfwwwfkdjfsdfsdwww"))
    print(re.findall('(?<=www)', "afdsfwwwfkdjfsdfsdwww"))

finditer
re.finditer(pattern, string[, flags])
finditer(string[, pos[, endpos]])
Description: similar to findall, it finds all substrings of the string that match the regular expression, but returns them through an iterator. A compiled RegexObject provides the same method.
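A minimal sketch of finditer, reusing the bracketed sample string from the findall example. Each item produced by the iterator is a match object, so positions are available as well as the text; the list name found is only illustrative.

```python
import re

pattern = re.compile(r'\[(.*?)\]')
text = 'hello[hi]heldfsdsf[iwonder]lo'

found = []
for m in pattern.finditer(text):
    # group(1) is the text inside the brackets; start()/end() locate the match
    found.append((m.group(1), m.start(), m.end()))

print(found)
```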
sub
re.sub(pattern, repl, string[, count, flags])
sub(repl, string[, count=0])
Description: finds all substrings of string that match the regular expression pattern and replaces them with repl. If nothing in the string matches the pattern, the string is returned unmodified. repl can be either a string or a function.
Example:
The code is as follows:

    #!/usr/bin/env python
    import re

    p = re.compile('(one|two|three)')
    # count=2: only the first two matches are replaced
    print(p.sub('num', 'one word two words three words apple', 2))

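Since repl can also be a function, here is a minimal sketch of that form. The function receives the match object and returns the replacement text; the mapping WORD_TO_DIGIT and the helper to_digit are hypothetical names introduced only for this example.

```python
import re

# Hypothetical lookup table used only for this illustration.
WORD_TO_DIGIT = {'one': '1', 'two': '2', 'three': '3'}

def to_digit(match):
    """Return the digit for the number word captured in group 1."""
    return WORD_TO_DIGIT[match.group(1)]

p = re.compile(r'(one|two|three)')
text = 'one word two words three words apple'

all_replaced = p.sub(to_digit, text)
first_two = p.sub(to_digit, text, count=2)

print(all_replaced)
print(first_two)
```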
subn
re.subn(pattern, repl, string[, count, flags])
subn(repl, string[, count=0])
Description: this function does the same as sub(), but returns a tuple containing the new string and the number of substitutions made. A compiled RegexObject provides the same method.
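A short sketch of subn follows, using a made-up sample sentence. The second element of the returned tuple makes it easy to tell whether anything was replaced.

```python
import re

p = re.compile(r'(one|two|three)')
text = 'one word two words three words apple'

# Replace every match and report how many were replaced.
new_text, n = p.subn('num', text)

# With count=2 only the first two matches are replaced.
limited_text, limited_n = p.subn('num', text, count=2)

# No match: the string comes back unchanged with a count of 0.
unchanged, zero = p.subn('num', 'apple pie')

print(new_text, n)
print(limited_text, limited_n)
print(unchanged, zero)
```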