Preface
Regular expressions are arguably the best way to find what you want in a large body of text, and they are an indispensable skill for writing crawlers. So don't put it off!
This tutorial follows http://www.runoob.com/python3/python3-reg-expressions.html; thanks to Runoob (the "rookie tutorial" site).
One. The re module in Python3
import re
Two. The re.match function
re.match tries to match a pattern at the starting position of the string. If the match is not successful, match() returns None. Syntax:
re.match(pattern, string, flags=0)
pattern --> the regular expression to match.
string --> the string to match against.
flags --> flag bits that control how matching is done, such as case sensitivity and multi-line matching.
If the match succeeds, re.match returns a match object; otherwise it returns None.
You can use the group(num) or groups() methods of the match object to get the matched text.
group(num=0) --> the string matched by the entire expression. group() can also take several group numbers at once, in which case it returns a tuple containing the corresponding values for those groups.
groups() --> returns a tuple containing the strings of all groups, from group 1 up to however many groups the pattern has.
Example:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
print(re.match('www', 'www.runoob.com').span())  # matches at the starting position --> (0, 3)
print(re.match('com', 'www.runoob.com'))  # does not match at the starting position --> None
Example 2:
import re
line = 'Cats is smarter than dogs'
matchObj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)
if matchObj:
    print('matchObj.group(): ', matchObj.group())
    print('matchObj.group(1): ', matchObj.group(1))
    print('matchObj.group(2): ', matchObj.group(2))
else:
    print('No match!!')
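The group() and groups() behaviour described in this section can be checked with a short sketch. It reuses the sentence and pattern from Example 2; the variable names (match_obj, first_two, all_groups) are only illustrative.

```python
import re

line = 'Cats is smarter than dogs'
match_obj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)

if match_obj:
    # group() with several group numbers returns a tuple of those groups
    first_two = match_obj.group(1, 2)
    # groups() returns every captured group, starting from group 1
    all_groups = match_obj.groups()
    print(first_two)
    print(all_groups)
```

Here both calls give the same tuple, because the pattern has exactly two groups.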
Three. The re.search method
re.search scans the entire string and returns the first successful match. Syntax:
re.search(pattern, string, flags=0) --> the parameters are the same as above
If the match succeeds, a match object is returned; otherwise None is returned.
group(num=0) and groups() are used the same way as above.
Example:
import re
print(re.search('www', 'www.baidu.com').span())  # matches at the starting position --> (0, 3)
print(re.search('com', 'www.baidu.com').span())  # matches, but not at the starting position --> (10, 13)
Example 2:
import re
line = 'Cats is smarter than dogs'
searchObj = re.search(r'(.*) is (.*?) .*', line, re.M | re.I)
if searchObj:
    print('searchObj.group(): ', searchObj.group())
    print('searchObj.group(1): ', searchObj.group(1))
    print('searchObj.group(2): ', searchObj.group(2))
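Because re.search returns None when nothing matches, calling .span() or .group() directly on the result fails for a non-matching pattern. A minimal sketch of a guard follows; find_span is a hypothetical helper name, not part of the re module.

```python
import re

def find_span(pattern, text):
    """Return the (start, end) span of the first match, or None if there is none."""
    found = re.search(pattern, text)
    # re.search returns None on failure, so check before calling methods on it
    return found.span() if found else None

print(find_span('com', 'www.baidu.com'))
print(find_span('xyz', 'www.baidu.com'))
```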
Four. The difference between re.match and re.search
re.match matches only at the beginning of the string: if the start of the string does not fit the regular expression, the match fails and None is returned. re.search scans the whole string until it finds a match.
Example:
import re
line = 'Cats is smarter than dogs'
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print('match --> matchObj.group(): ', matchObj.group())
else:
    print('No match')
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print('search --> matchObj.group(): ', matchObj.group())
else:
    print('No match')
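One way to see the difference is to note that anchoring a re.search pattern with ^ (without the re.M flag) restricts it to the start of the string, much like re.match. The sketch below assumes the same sample sentence; the variable names are illustrative.

```python
import re

line = 'Cats is smarter than dogs'

# re.match only tries position 0, so 'dogs' is not found
at_start = re.match(r'dogs', line, re.I)
# re.search scans forward and finds 'dogs' at the end of the sentence
anywhere = re.search(r'dogs', line, re.I)
# anchoring the pattern with ^ makes re.search fail here as well
anchored = re.search(r'^dogs', line, re.I)

print(at_start)
print(anywhere.group())
print(anchored)
```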
Five. Search and replace
re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
pattern --> the regular expression to match (as above).
repl --> the replacement string, or a function.
string --> the original string in which to search and replace.
count --> the maximum number of substitutions to make; the default of 0 means that all matches are replaced.
flags --> the same optional flags as for re.match and re.search.
Example:
import re
phone = '2004-959-559 # This is a phone number'
# Delete the comment
num = re.sub(r'#.*$', "", phone)
print('Phone number: ', num)
# Remove everything that is not a digit
num = re.sub(r'\D', "", phone)
print('Phone number: ', num)
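The effect of the count parameter can be seen in a small sketch. The phone string is the one from the example above with the comment already removed; the variable names are illustrative.

```python
import re

phone = '2004-959-559'

# count=0 (the default) replaces every match
all_replaced = re.sub(r'-', '', phone)
# count=1 replaces only the first match
first_only = re.sub(r'-', '', phone, count=1)

print(all_replaced)
print(first_only)
```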
Six. The repl parameter is a function
import re
# Multiply the matched number by 2
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)
s = 'a23g4hfd567'
print(re.sub(r'(?P<value>\d+)', double, s))
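As a variation on the example above, the replacement function can also be a lambda, and a replacement string can refer to a named group with \g<name>. This is only a sketch on the same input string; the names doubled and bracketed are illustrative.

```python
import re

s = 'a23g4hfd567'

# a lambda works as repl too; group(0) is the whole match
doubled = re.sub(r'\d+', lambda m: str(int(m.group(0)) * 2), s)

# in a replacement string, \g<value> inserts the text of the group named "value"
bracketed = re.sub(r'(?P<value>\d+)', r'[\g<value>]', s)

print(doubled)
print(bracketed)
```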
Seven. Regular expression modifiers: optional flags
Regular expressions can take optional flag modifiers that control the matching mode. Multiple flags are combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:
re.I --> makes matching case-insensitive.
re.L --> locale-aware matching.
re.M --> multi-line matching; affects ^ and $.
re.S --> makes . match every character, including newlines.
re.U --> interprets characters according to the Unicode character set; affects \w, \W, \b, \B. (In Python3 this is already the default for str patterns.)
re.X --> verbose mode: whitespace and comments are allowed inside the pattern, so that regular expressions are easier to write and read.
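The flags listed above can be tried out on a two-line string. The following is a minimal sketch; the sample text and variable names are made up for illustration, and re.findall (which returns all matches as a list) is used only to keep the output short.

```python
import re

text = 'First line\nsecond LINE'

# re.I: ignore case, so both 'line' and 'LINE' are found
ignore_case = re.findall(r'line', text, re.I)
# re.M: ^ matches at the start of every line, not just of the string
line_starts = re.findall(r'^\w+', text, re.M)
# re.S: . also matches the newline, so the match can span both lines
across_lines = re.search(r'First.*LINE', text, re.S)
# re.X: whitespace and comments inside the pattern are ignored
verbose = re.search(r'''
    (\w+)   # a word
    \s+     # whitespace
    line    # the literal text "line"
''', text, re.X | re.I)

print(ignore_case)
print(line_starts)
print(across_lines.group() == text)
print(verbose.group(1))
```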
Eight. Regular expression patterns
A pattern string uses a special syntax to represent a regular expression:
Letters and digits stand for themselves: a letter or digit in a pattern matches the same character in the string.
Most letters and digits take on a different meaning when preceded by a backslash.
Punctuation characters match themselves only when escaped; otherwise they have special meanings.
A backslash itself must be escaped with another backslash.
Since regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character.
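The points about backslashes and raw strings can be sketched as follows. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# a raw string keeps the backslash, so these two spellings are the same pattern
same_pattern = ('\\d+' == r'\d+')

# 'C:\\temp' is the text C:\temp, which contains a single backslash
path = 'C:\\temp'
# to match one literal backslash, the pattern needs an escaped backslash
literal_backslash = re.search(r'\\', path)

# r'\t' is two characters (backslash, t); the regex engine reads them as a tab
tab_match = re.search(r'\t', 'a\tb')

print(same_pattern)
print(literal_backslash.span())
print(tab_match.span())
```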
The following lists the special elements of the regular expression pattern syntax. If you pass optional flags along with a pattern, the meaning of some pattern elements changes.
^ --> matches the beginning of the string.
$ --> matches the end of the string.
. --> matches any character except the newline; when the re.DOTALL flag is specified, it matches any character, including the newline.
[...] --> represents a set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...] --> matches characters that are not in the set: [^abc] matches any character other than a, b, c.
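re* --> matches 0 or more occurrences of the preceding expression.
re+ --> matches 1 or more occurrences of the preceding expression.
re? --> matches 0 or 1 occurrence of the fragment defined by the preceding expression. Placed after another quantifier (as in *? or +?), ? makes that quantifier non-greedy.
re{n} --> matches exactly n occurrences of the preceding expression.
re{n,} --> matches n or more occurrences of the preceding expression.
re{n,m} --> matches n to m occurrences of the fragment defined by the preceding expression, greedily.
a|b --> matches a or b.
(re) --> matches the expression in parentheses and also captures it as a group.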
(?imx) --> turns on the optional flags i, m, or x from inside the pattern. In Python's re module, flags set this way apply to the entire regular expression.
(?-imx) --> turns off the i, m, or x optional flags. Python's re module accepts this only in the scoped form (?-imx:re).
(?:re) --> like (...), but does not create a group.
(?imx:re) --> uses the i, m, or x optional flags inside the parentheses only.
(?-imx:re) --> does not use the i, m, or x optional flags inside the parentheses.
(?#...) --> a comment.
(?=re) --> positive lookahead assertion. It succeeds if the contained expression matches at the current position and fails otherwise. The assertion consumes no characters: once the contained expression has been tried, the matching engine does not advance, and the remainder of the pattern is tried immediately to the right of the assertion.
(?!re) --> negative lookahead assertion. The opposite of the positive assertion: it succeeds when the contained expression does not match at the current position of the string.
(?>re) --> an independent (atomic) group, which eliminates backtracking.
\w --> matches letters, digits, and the underscore.
\W --> matches any character that \w does not match.
\s --> matches any whitespace character, equivalent to [ \t\n\r\f\v].
\S --> matches any non-whitespace character.
\d --> matches any digit, equivalent to [0-9].
\D --> matches any non-digit.
\A --> matches the start of the string.
\Z --> matches the end of the string. In Perl-style engines it also matches just before a trailing newline; in Python's re module it matches only at the very end.
\z --> matches the end of the string.
\G --> matches the position where the last match finished. (Python's re module does not support \G.)
\b --> matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
\B --> matches a non-word boundary; the opposite of \b. For example, 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.
\n, \t, etc. --> match a newline, a tab, and so on.
\1...\9 --> matches the contents of the nth group.
\10 --> matches the contents of the nth group if that group has been matched; otherwise it refers to the character with that octal code.
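A few of the elements in the list above are easier to grasp by running them. The sketch below covers greedy and non-greedy quantifiers, {n,m}, lookahead, a backreference, and word boundaries; all sample strings and variable names are assumptions made up for illustration.

```python
import re

html = '<b>bold</b><i>italic</i>'
# greedy: .* takes as much as possible, so the match runs to the last '>'
greedy = re.search(r'<.*>', html).group()
# non-greedy: .*? stops at the first '>'
lazy = re.search(r'<.*?>', html).group()

# {2,3}: runs of two or three digits
runs = re.findall(r'\d{2,3}', '1 22 333 4444')

# positive lookahead: a word that is followed by '.com' (not part of the match)
before_com = re.findall(r'\w+(?=\.com)', 'www.runoob.com')
# negative lookahead: 'foo' that is not followed by 'bar'
not_bar = re.search(r'foo(?!bar)', 'foobar foobaz').span()

# backreference: \1 must repeat whatever group 1 matched
doubled_letter = re.search(r'(\w)\1', 'hello').group()

# word boundary: 'er' at the end of a word only
in_never = re.search(r'er\b', 'never')
in_verb = re.search(r'er\b', 'verb')

print(greedy, lazy, runs, before_com, not_bar, doubled_letter)
```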
Nine. Regular expression examples
python --> matches "python"
Character classes
[Pp]ython --> matches 'Python' or 'python'
rub[ye] --> matches 'ruby' or 'rube'
[aeiou] --> matches any one of the letters within the brackets
[0-9] --> matches any digit; the same as [0123456789]
[a-z] --> matches any lowercase letter
[A-Z] --> matches any uppercase letter
[a-zA-Z0-9] --> matches any letter or digit
[^aeiou] --> matches any character other than the letters a, e, i, o, u
[^0-9] --> matches any character other than a digit
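Special character classes
. --> matches any single character except "\n". To match any character including "\n", use a pattern such as [\s\S], or pass the re.S flag.
\d, \D, \s, \S, \w, \W --> digit, non-digit, whitespace, non-whitespace, word character, and non-word character, as listed in section Eight.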
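The character classes in this section can be tried out with re.findall, which returns every match as a list. The sample sentence below is made up for illustration, as are the variable names.

```python
import re

text = 'Python 3 and python 2, ruby or rube'

pythons = re.findall(r'[Pp]ython', text)
rubies = re.findall(r'rub[ye]', text)
digits = re.findall(r'[0-9]', text)
# anything that is not a letter, a digit, or a space
others = re.findall(r'[^a-zA-Z0-9 ]', text)

# . skips the newline unless the re.S flag is given
without_flag = re.findall(r'.', 'a\nb')
with_flag = re.findall(r'.', 'a\nb', re.S)

print(pythons, rubies, digits, others)
print(without_flag, with_flag)
```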