Learning Python Regular Expressions

Source: Internet
Author: User

I have only just started using Python's re module and found it hard going, because I had never studied regular expressions systematically. I am writing this post to collect and consolidate what I learn for later use.

Regular expressions are used very widely; whether on Linux or in programming, we run into them all the time. Learning Python is a good opportunity to study them a little more systematically.

I read the regular expression section of the help documentation and also looked up related material online, where there is plenty written on the subject.

Here are some of my notes:

The '*' symbol is best known as a wildcard, where we use it to match any characters. In re it means something different: match 0 or more repetitions of the preceding character.

print(re.match(r'ab*', 'abb').group())

In the example above, b* matches zero or more b characters after the a, so the whole string abb is matched.
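To make the "zero or more" behaviour concrete, here is a small sketch that applies the same pattern to a few input strings. The sample strings and the variable name star_results are illustrative and not from the original post.

```python
import re

# b* matches zero or more 'b' characters that follow the 'a'.
# Each sample starts with 'a', so re.match always succeeds here.
star_results = [re.match(r'ab*', s).group() for s in ('a', 'ab', 'abbb', 'abc')]
print(star_results)  # ['a', 'ab', 'abbb', 'ab']
```

Note that 'a' on its own still matches, because zero repetitions of b are allowed.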

The '.' symbol is the dot character, which matches any single character except a newline.

For example:

print(re.match(r'.*', 'abc\ndef').group())

matches only the first line, abc. Adding the re.DOTALL flag lets the dot match newlines as well, so the whole multi-line string is matched:

print(re.match(r'.*', 'abc\ndef', re.DOTALL).group())
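The two calls can be compared side by side. This is a minimal sketch; the variable names are illustrative.

```python
import re

text = 'abc\ndef'

# Without a flag, '.' stops at the newline.
without_flag = re.match(r'.*', text).group()
# With re.DOTALL, '.' also matches the newline character.
with_flag = re.match(r'.*', text, re.DOTALL).group()

print(repr(without_flag))  # 'abc'
print(repr(with_flag))     # 'abc\ndef'
```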

'+' means: match 1 or more repetitions of the preceding character. For example:

print(re.match(r'ab+', 'abbbb'))

matches an a followed by one or more b characters.

'?' means: match 0 or 1 repetition of the preceding character. For example:

print(re.match(r'ab?', 'abbb'))

also matches, because abbb begins with ab. Only ab is matched, since ? allows at most one b.
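The difference between '+' and '?' is easiest to see by running both on the same inputs. The following sketch uses illustrative sample strings and variable names.

```python
import re

samples = ['a', 'ab', 'abbb']

# ab+ needs at least one 'b'; ab? accepts zero or one 'b'.
plus_results = []
question_results = []
for s in samples:
    plus = re.match(r'ab+', s)
    question = re.match(r'ab?', s)
    plus_results.append(plus.group() if plus else None)
    question_results.append(question.group() if question else None)

print(plus_results)      # [None, 'ab', 'abbb']
print(question_results)  # ['a', 'ab', 'ab']
```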

The '^' symbol is the caret, which matches at the start of the string.

For example:

print(re.findall(r'^abc', 'abc\nabc'))

matches only at the start of the string and returns a single abc, whereas:

print(re.findall(r'^abc', 'abc\nabc', re.MULTILINE))

returns both abc strings. As the name suggests, the re.MULTILINE flag makes ^ match at the start of every line, so both occurrences of abc are found.

The '$' symbol matches at the end of the string.

For example:

print(re.findall(r'abc\d$', 'abc1\nabc3', re.MULTILINE))

When re.MULTILINE is given, $ matches at the end of every line, so both abc1 and abc3 are returned.
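As a rough illustration of how '^' and '$' work together, the sketch below uses both anchors to pick out whole lines of a given shape. The sample text and variable names are made up for this example.

```python
import re

lines = 'abc1\nabc2x\nabc3'

# Without re.MULTILINE, ^ and $ anchor to the whole string,
# and the whole string is not "abc" plus one digit.
whole_string = re.findall(r'^abc\d$', lines)
# With re.MULTILINE, ^ and $ anchor to each line in turn.
per_line = re.findall(r'^abc\d$', lines, re.MULTILINE)

print(whole_string)  # []
print(per_line)      # ['abc1', 'abc3']
```

The middle line abc2x is skipped because the x sits between the digit and the end of the line.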

The '\' symbol is the escape character, as in many other languages and environments. Placing it before a special character makes that character match literally, which removes the ambiguity.
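A short sketch of escaping, using '+' as the special character. The sample text is illustrative; re.escape is the standard library helper that adds the backslashes for you.

```python
import re

text = '1+1=2 and 11'

# Unescaped, '+' is a quantifier: "one or more 1s, then a 1".
unescaped = re.findall(r'1+1', text)
# Escaped, '\+' matches a literal plus sign.
escaped = re.findall(r'1\+1', text)
# re.escape builds the escaped pattern from a literal string.
auto_escaped = re.findall(re.escape('1+1'), text)

print(unescaped)     # ['11']
print(escaped)       # ['1+1']
print(auto_escaped)  # ['1+1']
```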

'[]' denotes a character set and matches any one of the characters listed inside the brackets. For example:

print(re.search(r'0[xX]([0-9a-fA-F]{6})', 'the hex value is 0x2378ad'))

This statement matches a hexadecimal number.
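The parentheses in that pattern form a group, so the digits can be pulled out separately from the 0x prefix. A minimal sketch, with illustrative variable names:

```python
import re

hex_match = re.search(r'0[xX]([0-9a-fA-F]{6})', 'the hex value is 0x2378ad')

whole = hex_match.group()    # the entire match, including the 0x prefix
digits = hex_match.group(1)  # only the part captured by the parentheses
value = int(digits, 16)      # convert the hex digits to an integer

print(whole)   # 0x2378ad
print(digits)  # 2378ad
```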

'{m}' means: match exactly m repetitions of the preceding character, and '{m,n}' means: match between m and n repetitions. For example:

print(re.match(r'ab{3,5}', 'abbbbb').group())

This expression matches an a followed by 3 to 5 b characters. By default Python matches as many as possible, here 5 (greedy mode).
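Greedy mode can be switched off by appending '?' to the quantifier, which makes it match as few repetitions as possible. The sketch below compares the two modes; the variable names are illustrative.

```python
import re

text = 'abbbbb'  # an 'a' followed by five 'b's

greedy = re.match(r'ab{3,5}', text).group()   # takes as many b's as allowed
lazy = re.match(r'ab{3,5}?', text).group()    # takes as few b's as allowed
exact = re.match(r'ab{3}', text).group()      # exactly three b's

print(greedy)  # abbbbb
print(lazy)    # abbb
print(exact)   # abbb
```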


The difference between re.match() and re.search()

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
print(re.match('www', 'www.runoob.com').span())  # matches at the starting position
print(re.match('com', 'www.runoob.com'))         # does not match at the starting position

Result:

(0, 3)
None

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
print(re.search('www', 'www.runoob.com').span())  # matches at the starting position
print(re.search('com', 'www.runoob.com').span())  # matches even though it is not at the starting position

Result:

(0, 3)
(11, 14)

As the comparison shows, the difference is where matching begins: match only matches at the starting position of the string, while search scans the whole string for the first match (the English meanings of the two names, "match" and "search", suggest the same thing).
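One way to picture the relationship: re.match behaves like re.search with the pattern anchored to the start of the string. The sketch below shows this with \A; the variable names are illustrative.

```python
import re

url = 'www.runoob.com'

match_result = re.match('com', url)         # only tries position 0
search_result = re.search('com', url)       # scans forward through the string
anchored_result = re.search(r'\Acom', url)  # search anchored to the start

print(match_result)          # None
print(search_result.span())  # (11, 14)
print(anchored_result)       # None
```

Because both functions return None on failure, it is safer to check the result before calling .span() or .group() on it.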

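Description of the special escape sequences (the flag behaviour described below is that of Python 2; in Python 3, str patterns are Unicode-aware by default and the re.ASCII flag restores the ASCII-only behaviour):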


\A

Matches only at the start of the string.

\b

Matches the empty string (it is easiest to think of it as matching a position), but only at the beginning or end of a word (so it can also be used to split strings). A word is a sequence of alphanumeric or underscore characters, so the end of a word is marked by whitespace or by a character that is neither alphanumeric nor an underscore. Note that \b is defined as the boundary between \w and \W, so the exact set of word characters depends on the UNICODE and LOCALE compilation flags. Inside a character range, \b represents the backspace character, for compatibility with Python string literals.

\B

Matches the empty string (again, a position), but only when it is not at the beginning or end of a word. This is the opposite of \b and is likewise affected by the LOCALE and UNICODE settings.
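Word boundaries are easier to follow with the match positions printed out. This is a small sketch with a made-up sample sentence; the variable names are illustrative.

```python
import re

text = 'cat concat category cat_1 cat.'

# \bcat\b: 'cat' as a whole word. The underscore counts as a word
# character, so 'cat_1' does not qualify.
whole_word_starts = [m.start() for m in re.finditer(r'\bcat\b', text)]
# \Bcat: 'cat' that does NOT begin at a word boundary (inside 'concat').
inner_starts = [m.start() for m in re.finditer(r'\Bcat', text)]
# Inside a character set, \b means the backspace character instead.
backspaces = re.findall(r'[\b]', 'a\x08b')

print(whole_word_starts)  # [0, 26]
print(inner_starts)       # [7]
print(backspaces)         # ['\x08']
```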

\d

When the UNICODE flag is not specified, matches any decimal digit, which is equivalent to [0-9]. With the UNICODE flag, it matches any character classified as a digit in the Unicode character set.

\D

When the UNICODE flag is not specified, matches any non-digit character, which is equivalent to [^0-9]. With the UNICODE flag, it matches any character that is not classified as a digit in the Unicode character set.
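The effect of Unicode awareness on \d can be checked directly. The sketch below assumes Python 3, where str patterns are Unicode-aware unless re.ASCII is passed; the sample string is illustrative.

```python
import re

# '\u0663' is ARABIC-INDIC DIGIT THREE, a decimal digit outside ASCII.
sample = 'a1\u0663'

default_digits = re.findall(r'\d', sample)          # Unicode-aware (Python 3 default)
ascii_digits = re.findall(r'\d', sample, re.ASCII)  # restricted to [0-9]
non_digits = re.findall(r'\D', sample)              # everything \d rejects

print(len(default_digits))  # 2
print(ascii_digits)         # ['1']
print(non_digits)           # ['a']
```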

\s

When the LOCALE and UNICODE flags are not specified, matches any whitespace character, which is equivalent to [ \t\n\r\f\v]. With the LOCALE flag, it also matches whatever the current locale defines as whitespace. With the UNICODE flag, it matches any character classified as whitespace in Unicode.

\S

When the LOCALE and UNICODE flags are not specified, matches any non-whitespace character, which is equivalent to [^ \t\n\r\f\v]. With the LOCALE flag, it matches any character that the current locale does not define as whitespace. With the UNICODE flag, it matches any character not classified as whitespace in Unicode.

\w

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character or the underscore, which is equivalent to [a-zA-Z0-9_]. With the LOCALE flag, it matches [0-9_] plus whatever the current locale defines as alphanumeric. With the UNICODE flag, it matches [0-9_] plus whatever is classified as alphanumeric in the Unicode character set.

\W

When the LOCALE and UNICODE flags are not specified, matches any character that is neither alphanumeric nor the underscore, which is equivalent to [^a-zA-Z0-9_]. With the LOCALE flag, it matches any character that is not in [0-9_] and that the current locale does not define as alphanumeric. With the UNICODE flag, it matches any character that is not in [0-9_] and is not classified as alphanumeric in the Unicode character set.
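A quick sketch of the whitespace and word classes on an ASCII-only sample, so the result does not depend on any flags. The sample text and variable names are made up for this example.

```python
import re

text = 'order_42 shipped\ton 2016-06-01'

words = re.findall(r'\w+', text)     # runs of letters, digits and underscores
fields = re.split(r'\s+', text)      # split on runs of whitespace
non_word = re.findall(r'\W', text)   # single characters that \w rejects

print(words)     # ['order_42', 'shipped', 'on', '2016', '06', '01']
print(fields)    # ['order_42', 'shipped', 'on', '2016-06-01']
print(non_word)  # [' ', '\t', ' ', '-', '-']
```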

\Z

Matches only at the end of the string.
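\A and \Z differ from '^' and '$' in that they ignore line structure: they always refer to the very start and very end of the string. A minimal sketch, with illustrative sample strings:

```python
import re

two_lines = 'abc\nabc'

# With re.MULTILINE, ^ matches at every line start, \A only at the string start.
caret_hits = re.findall(r'^abc', two_lines, re.MULTILINE)
start_hits = re.findall(r'\Aabc', two_lines, re.MULTILINE)

# $ also matches just before a trailing newline; \Z does not.
dollar_hits = re.findall(r'abc$', 'abc\n')
end_hits = re.findall(r'abc\Z', 'abc\n')

print(caret_hits)   # ['abc', 'abc']
print(start_hits)   # ['abc']
print(dollar_hits)  # ['abc']
print(end_hits)     # []
```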

The rules above are fairly easy to remember, because they come in pairs, such as \b and \B or \d and \D.

This article is from the "9651854" blog, please be sure to keep this source http://9661854.blog.51cto.com/9651854/1784290
