Having only just started with Python's re module, I found it hard going, because I had never studied regular expressions systematically. So I am writing this post to collect and consolidate what I learn for later use.
Regular expressions are used very widely: whether on Linux or in programming, we keep running into them. Learning Python gave me the opportunity to study them a little more systematically.
I read the regular expression documentation and some related material online, of which there is plenty.
Here are my learning notes:
The '*' symbol is most familiar from shell wildcards, where it matches any run of characters. In re, '*' means something narrower: match 0 or more repetitions of the preceding character.
print(re.match(r'ab*', 'abb').group())
In the above example, b* matches any number of b characters after the a, so the whole string abb is matched.
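To see what * accepts, the short sketch below runs the same pattern against a few inputs (the test strings are made up for illustration). A None result means re.match() could not match at the start of the string.

```python
import re

# 'ab*' means: an 'a' followed by zero or more 'b' characters.
pattern = re.compile(r'ab*')

for text in ['a', 'ab', 'abbb', 'b']:
    m = pattern.match(text)
    # match() returns None when the pattern cannot match at position 0
    print(repr(text), '->', m.group() if m else None)
```

The first three inputs match (a, ab, abbb); 'b' fails because the pattern must start with a.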
'.' is the dot character, which matches any single character except a newline.
For example:
print(re.match(r'.*', 'abc\ndef').group())
matches only the first line, abc, because the dot does not match the newline. Adding the flag re.DOTALL makes the dot match newlines too, so the whole multi-line string is matched:
print(re.match(r'.*', 'abc\ndef', re.DOTALL).group())
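A minimal sketch comparing the two calls above side by side; printing with repr() makes the newline visible.

```python
import re

text = 'abc\ndef'

# Without flags, '.' stops at the newline, so '.*' only covers the first line.
without_flag = re.match(r'.*', text).group()

# With re.DOTALL, '.' also matches '\n', so '.*' runs to the end of the string.
with_flag = re.match(r'.*', text, re.DOTALL).group()

print(repr(without_flag))  # 'abc'
print(repr(with_flag))     # 'abc\ndef'
```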
'+' means: match one or more repetitions of the preceding character. For example:
print(re.match(r'ab+', 'abbbb').group())
matches the a followed by one or more b characters, here abbbb.
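The sketch below contrasts + with * on an input that has no b at all (the inputs are illustrative): + fails where * still succeeds.

```python
import re

# '+' requires at least one 'b' after the 'a'.
plus_hit = re.match(r'ab+', 'abbbb')
plus_miss = re.match(r'ab+', 'a')
# '*' is satisfied by zero 'b' characters, so it still matches 'a'.
star_hit = re.match(r'ab*', 'a')

print(plus_hit.group())  # abbbb
print(plus_miss)         # None
print(star_hit.group())  # a
```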
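'?' means: match 0 or 1 repetition of the preceding character. For example:
print(re.match(r'ab?', 'abbb').group())
also succeeds and returns ab: the ? takes at most one b, and re.match() only requires the pattern to match at the beginning of the string, so the remaining bb is simply ignored.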
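A short sketch of ? in action. The colour/color spelling pattern is only an illustration, and it uses re.fullmatch(), which assumes Python 3.4 or later, to require the whole string to match.

```python
import re

# '?' consumes at most one 'b'; match() ignores whatever follows the match.
print(re.match(r'ab?', 'abbb').group())  # ab
print(re.match(r'ab?', 'a').group())     # a

# A common use: an optional letter. fullmatch() requires the pattern
# to cover the entire string.
spelling = re.compile(r'colou?r')
print(bool(spelling.fullmatch('color')))    # True
print(bool(spelling.fullmatch('colour')))   # True
print(bool(spelling.fullmatch('colouur')))  # False: at most one 'u'
```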
The '^' symbol, the caret, matches at the start of the string (and, in MULTILINE mode, at the start of each line).
For example:
print(re.findall(r'^abc', 'abc\nabc'))
matches only the first string and returns a single abc. With the MULTILINE flag, however:
print(re.findall(r'^abc', 'abc\nabc', re.MULTILINE))
both abc strings are matched. As the name suggests, re.MULTILINE makes ^ match at the start of every line, so two abc are returned.
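A small sketch of the behaviour above. The inline (?m) form is another way to turn on re.MULTILINE, written inside the pattern itself.

```python
import re

text = 'abc\nabc'

# Default: '^' anchors only at the very start of the string.
default_hits = re.findall(r'^abc', text)
# re.MULTILINE: '^' also anchors right after every '\n'.
multiline_hits = re.findall(r'^abc', text, re.MULTILINE)
# The same flag written inline at the start of the pattern: (?m).
inline_hits = re.findall(r'(?m)^abc', text)

print(default_hits)    # ['abc']
print(multiline_hits)  # ['abc', 'abc']
print(inline_hits)     # ['abc', 'abc']
```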
The '$' symbol matches at the end of the string (and, in MULTILINE mode, at the end of each line).
For example:
print(re.findall(r'abc\d$', 'abc1\nabc3', re.MULTILINE))
With re.MULTILINE, $ matches at the end of every line, so both abc1 and abc3 are returned.
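For contrast, the sketch below runs the same pattern with and without re.MULTILINE. Without the flag, only abc3 sits at the end of the string, so only it is returned.

```python
import re

text = 'abc1\nabc3'

# Default: '$' matches only at the end of the string (or just before a final '\n').
default_hits = re.findall(r'abc\d$', text)
# re.MULTILINE: '$' also matches just before every '\n'.
multiline_hits = re.findall(r'abc\d$', text, re.MULTILINE)

print(default_hits)    # ['abc3']
print(multiline_hits)  # ['abc1', 'abc3']
```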
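'\' is the escape character, just as in many other languages and environments. Escaping a special character, for example writing \. or \*, makes it match literally, so the pattern is no longer ambiguous. Writing patterns as raw strings (r'...') stops Python from interpreting the backslashes before re sees them.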
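A sketch of escaping in practice, assuming Python 3. It shows a literal dot, what happens when the r prefix is forgotten (Python turns \b into a backspace character before re sees it), and re.escape(), which escapes every special character in a literal string. The sample strings are made up.

```python
import re

# An unescaped '.' matches any character, so 'a.c' also accepts 'abc'.
print(bool(re.match(r'a.c', 'abc')))   # True
# '\.' matches only a literal dot.
print(bool(re.match(r'a\.c', 'abc')))  # False
print(bool(re.match(r'a\.c', 'a.c')))  # True

# Without the r prefix, '\b' becomes a backspace, so the word boundary is lost.
print(re.findall('\bcat\b', 'cat'))    # []
print(re.findall(r'\bcat\b', 'cat'))   # ['cat']

# re.escape() escapes the dots in a literal string such as a domain name.
domain = 'www.runoob.com'
print(re.findall(re.escape(domain), 'www.runoob.com wwwxrunoobxcom'))  # ['www.runoob.com']
```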
'[]' defines a character set and matches any one of the characters inside the brackets. For example:
print(re.search(r'0[xX]([0-9a-fA-F]{6})', 'the hex value is 0x2378ad').group())
This pattern matches a hexadecimal number: 0x or 0X followed by six hex digits.
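To see what the parentheses in the hex pattern add, this sketch pulls out the captured group and converts it to an integer. The last line shows a negated set, [^...], on an illustrative input.

```python
import re

m = re.search(r'0[xX]([0-9a-fA-F]{6})', 'the hex value is 0x2378ad')
print(m.group())            # 0x2378ad (the whole match)
print(m.group(1))           # 2378ad (the part captured by the parentheses)
print(int(m.group(1), 16))  # 2324653

# A leading '^' inside [] negates the set: any character that is not a digit.
print(re.findall(r'[^0-9]', 'a1b2'))  # ['a', 'b']
```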
'{m}' means: match exactly m repetitions of the preceding character, and '{m,n}' means between m and n repetitions. For example:
print(re.match(r'ab{3,5}', 'abbbbb').group())
The pattern accepts 3 to 5 b characters. Given the choice, Python matches as many as possible, here 5 (greedy mode).
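A sketch of the greedy behaviour and its opposite: adding ? after the braces makes the quantifier lazy, so it stops at the minimum count.

```python
import re

text = 'abbbbb'
greedy = re.match(r'ab{3,5}', text).group()  # takes as many b's as allowed
lazy = re.match(r'ab{3,5}?', text).group()   # a trailing '?' takes as few as possible
exact = re.match(r'ab{3}', text).group()     # {3} means exactly three

print(greedy)  # abbbbb
print(lazy)    # abbb
print(exact)   # abbb
```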
The difference between re.match() and re.search()
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
print(re.match('www', 'www.runoob.com').span())  # matches at the start position
print(re.match('com', 'www.runoob.com'))  # does not match at the start position
Result:
(0, 3)
None
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
print(re.search('www', 'www.runoob.com').span())  # matches at the start position
print(re.search('com', 'www.runoob.com').span())  # found later in the string
Result:
(0, 3)
(11, 14)
Comparing the two, the difference is where matching is attempted: re.match() only matches at the beginning of the string, while re.search() scans the whole string for the first position where the pattern matches. (The English names already hint at this: one matches, the other searches.)
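One more detail, sketched below with a made-up two-line string: re.match() ignores re.MULTILINE and still only tries position 0, so to find a pattern at the start of any line you need re.search() with ^ and re.MULTILINE.

```python
import re

text = 'first line\ncom on the second line'

# match() only ever tries position 0, even with re.MULTILINE.
line_match = re.match('com', text, re.MULTILINE)
# search() with '^' and re.MULTILINE finds 'com' at the start of line two.
line_search = re.search('^com', text, re.MULTILINE)

print(line_match)          # None
print(line_search.span())  # (11, 14)
```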
Special escape sequences:
\A
Matches only at the start of the string.
\b
Matches the empty string (that is, a position rather than a character), but only at the beginning or end of a word, so it can also serve as a word separator. A word is a sequence of alphanumeric or underscore characters, so the end of a word is marked by whitespace or by a character that is neither alphanumeric nor an underscore. Formally, \b is the boundary between a \w and a \W character, so the exact set of characters depends on the UNICODE and LOCALE flags. Inside a character set, \b means backspace, for compatibility with Python string literals.
\B
Matches the empty string, but only where it is not at the beginning or end of a word. This is the opposite of \b and is likewise affected by the LOCALE and UNICODE settings.
\d
When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to [0-9]. With the UNICODE flag, it matches any character classified as a decimal digit in the Unicode database.
\D
When the UNICODE flag is not specified, matches any non-digit character; this is equivalent to [^0-9]. With the UNICODE flag, it matches any character not classified as a digit in the Unicode database.
\s
When the LOCALE and UNICODE flags are not specified, matches any whitespace character; this is equivalent to [ \t\n\r\f\v]. With the LOCALE flag, it also matches whatever the current locale defines as whitespace. With the UNICODE flag, it matches any character classified as whitespace in the Unicode database.
\S
When the LOCALE and UNICODE flags are not specified, matches any non-whitespace character; this is equivalent to [^ \t\n\r\f\v]. With the LOCALE flag, it matches any character the current locale does not define as whitespace. With the UNICODE flag, it matches any character not classified as whitespace in the Unicode database.
\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character or underscore; this is equivalent to [a-zA-Z0-9_]. With the LOCALE flag, it matches [0-9_] plus whatever characters the current locale defines as letters. With the UNICODE flag, it matches [0-9_] plus any character classified as alphanumeric in the Unicode database.
\W
When the LOCALE and UNICODE flags are not specified, matches any character that is neither alphanumeric nor an underscore; this is equivalent to [^a-zA-Z0-9_]. With the LOCALE flag, it matches any character that is not in [0-9_] and not defined as a letter by the current locale. With the UNICODE flag, it matches any character that is not in [0-9_] and not classified as alphanumeric in the Unicode database.
\Z
Matches only at the end of the string.
These rules are fairly easy to remember, because they come in corresponding pairs: \A/\Z, \b/\B, \d/\D, \s/\S and \w/\W.
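The sketch below exercises each pair, assuming Python 3, where str patterns are Unicode-aware by default (the UNICODE behaviour described above) and the re.ASCII flag restricts them to the ASCII definitions. The sample strings are made up.

```python
import re

lines = 'one\ntwo'

# ^ and $ honour re.MULTILINE; \A and \Z never do.
print(re.findall(r'^\w+', lines, re.MULTILINE))   # ['one', 'two']
print(re.findall(r'\A\w+', lines, re.MULTILINE))  # ['one']
print(re.findall(r'\w+$', lines, re.MULTILINE))   # ['one', 'two']
print(re.findall(r'\w+\Z', lines, re.MULTILINE))  # ['two']

# \b is a word boundary; \B is any position that is not one.
print(re.findall(r'\bat\b', 'at cat at'))  # ['at', 'at']
print(re.findall(r'\Bat\b', 'at cat at'))  # ['at'] (the one inside 'cat')

# \d / \D, \s / \S, \w / \W are complementary pairs.
text = 'Order 42 shipped_to Bob at 9:30!'
print(re.findall(r'\d+', text))  # ['42', '9', '30']
print(re.findall(r'\S+', text))  # ['Order', '42', 'shipped_to', 'Bob', 'at', '9:30!']
print(re.findall(r'\w+', text))  # ['Order', '42', 'shipped_to', 'Bob', 'at', '9', '30']
print(re.findall(r'\W', text))   # [' ', ' ', ' ', ' ', ' ', ':', '!']

# re.ASCII restricts \d to [0-9]. '\u0663' is ARABIC-INDIC DIGIT THREE.
print(len(re.findall(r'\d', '7\u0663')))            # 2
print(len(re.findall(r'\d', '7\u0663', re.ASCII)))  # 1
```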
This article is from the "9651854" blog; please be sure to keep this source attribution: http://9661854.blog.51cto.com/9651854/1784290
Python Regular Expressions Learning