Python Regular Expressions: Learning Notes

Last Update:2016-05-29 Source: Internet

Author: User


I have only just started using Python's re module, and it was hard going because I had never studied regular expressions systematically, so I am writing this post to collect and consolidate what I learned for later reference.

Regular expressions are used everywhere: whether on Linux or in programming, we run into them all the time, so I took the opportunity of learning Python to study them a little more systematically.

I read the documentation on regular expressions and also looked at related material online; there is plenty of introductory material available.

Here is what I learned:

The '*' symbol is best known from wildcards, where it matches any sequence of characters. In re, however, '*' means: match the preceding element 0 or more times.

print(re.match(r'ab*', 'abb').group())

In this example, '*' applies only to the 'b', so the pattern matches an 'a' followed by any number of 'b' characters (including none); the output is abb.
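A small sketch (assuming Python 3; the sample strings are illustrative, not from the original post) showing that '*' repeats only the 'b', that zero 'b' characters is acceptable, and that re.match() still needs the 'a' at the very start:

```python
import re

# Illustrative inputs; '*' repeats the preceding 'b' zero or more times.
results = {}
for text in ["a", "ab", "abbb", "xab"]:
    m = re.match(r"ab*", text)
    # re.match() returns None when the pattern fails at position 0.
    results[text] = m.group() if m else None
    print(text, "->", results[text])
```

This prints a -> a, ab -> ab, abbb -> abbb and xab -> None.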

The '.' symbol (dot) matches any single character except a newline. For example:

print(re.match(r'.*', 'abc\ndef').group())

matches only the first line, abc, because '.' does not match the newline. Adding the re.DOTALL flag makes '.' match newlines too, so the entire string, across both lines, is matched:

print(re.match(r'.*', 'abc\ndef', re.DOTALL).group())
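To make the effect of re.DOTALL visible, a minimal sketch (assuming Python 3) can print repr() of both results so the newline shows up explicitly:

```python
import re

text = "abc\ndef"
# Without DOTALL, '.' stops at the newline.
without_flag = re.match(r".*", text).group()
# With DOTALL, '.' also consumes the newline.
with_flag = re.match(r".*", text, re.DOTALL).group()
print(repr(without_flag))  # 'abc'
print(repr(with_flag))     # 'abc\ndef'
```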

'+' means: match the preceding element one or more times. For example:

print(re.match(r'ab+', 'abbbb').group())

matches an 'a' followed by one or more 'b' characters (here abbbb); unlike 'ab*', it would not match a lone 'a'.
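The difference between '+' and '*' shows up on an input with no 'b' at all. A short sketch, assuming Python 3 and illustrative inputs:

```python
import re

# '+' needs at least one 'b', whereas '*' also accepts none.
plus_on_a = re.match(r"ab+", "a")
star_on_a = re.match(r"ab*", "a")
plus_on_many = re.match(r"ab+", "abbbb").group()
print(plus_on_a)          # None
print(star_on_a.group())  # a
print(plus_on_many)       # abbbb
```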

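'?' means: match the preceding element zero or one time. For example:

print(re.match(r'ab?', 'abbb').group())

also matches, returning ab: the '?' allows at most one 'b', and because re.match() only requires the pattern to match at the beginning of the string, the remaining 'bb' is simply left out of the match.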
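A sketch of the same idea (assuming Python 3.4 or later, where re.fullmatch() exists; inputs are illustrative): '?' makes the 'b' optional, and re.fullmatch(), unlike re.match(), refuses a match that leaves characters over.

```python
import re

# '?' makes the preceding 'b' optional: zero or one occurrence.
prefixes = [re.match(r"ab?", t).group() for t in ["a", "ab", "abbb"]]
print(prefixes)  # ['a', 'ab', 'ab']

# re.fullmatch() requires the whole string to match, so 'abbb' fails.
whole = re.fullmatch(r"ab?", "abbb")
print(whole)     # None
```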

The '^' symbol (caret) matches at the start of the string; with the re.MULTILINE flag it also matches at the start of every line.

For example:

print(re.findall(r'^abc', 'abc\nabc'))

matches only at the start of the string, so it returns a single 'abc' (['abc']). With the re.MULTILINE flag, however:

print(re.findall(r'^abc', 'abc\nabc', re.MULTILINE))

both occurrences are returned (['abc', 'abc']): as the name suggests, re.MULTILINE makes '^' match at the start of each line, not just at the start of the string.
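Printing the start positions of each match makes it clear where '^' is allowed to match. A minimal sketch, assuming Python 3 and an illustrative three-line string:

```python
import re

text = "abc\nxyz\nabc"
# Default: '^' only matches at position 0.
default_starts = [m.start() for m in re.finditer(r"^abc", text)]
# MULTILINE: '^' also matches right after every newline.
multiline_starts = [m.start() for m in re.finditer(r"^abc", text, re.MULTILINE)]
print(default_starts)    # [0]
print(multiline_starts)  # [0, 8]
```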

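The '$' symbol matches at the end of the string (or just before a newline at the end of the string); with re.MULTILINE it also matches at the end of every line.

For example:

print(re.findall(r'abc\d$', 'abc1\nabc3', re.MULTILINE))

returns ['abc1', 'abc3']: because re.MULTILINE is set, '$' matches before each newline as well as at the end of the string.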
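Running the same pattern with and without the flag side by side (a sketch assuming Python 3) shows that only the last line is found by default:

```python
import re

text = "abc1\nabc3"
# Default: '$' matches only at the end of the string (or before a final newline).
default_hits = re.findall(r"abc\d$", text)
# MULTILINE: '$' also matches just before each newline.
multiline_hits = re.findall(r"abc\d$", text, re.MULTILINE)
print(default_hits)    # ['abc3']
print(multiline_hits)  # ['abc1', 'abc3']
```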

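The '\' escape character works much as it does in other languages and tools: placing it before a special character such as '.', '*' or '?' makes that character match literally, which removes any ambiguity. Patterns are usually written as raw strings (r'...') so that Python does not process the backslashes before the re module sees them.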
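A small sketch (assuming Python 3; the inputs are illustrative) contrasting an unescaped and an escaped dot, plus re.escape(), which adds the backslashes to a literal string for you:

```python
import re

text = "abc a.c"
# Unescaped '.' matches any character, so 'abc' matches too.
loose = re.findall(r"a.c", text)
# '\.' matches only a literal dot.
strict = re.findall(r"a\.c", text)
print(loose)   # ['abc', 'a.c']
print(strict)  # ['a.c']

# re.escape() backslash-escapes the special characters in a literal string.
pattern = re.escape("1+1=2?")
found = re.search(pattern, "Is 1+1=2? Yes.").group()
print(found)   # 1+1=2?
```

The exact set of characters re.escape() escapes has changed between Python versions, so the sketch only checks that the escaped pattern finds the literal text.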

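'[]' defines a character set: it matches any single character listed inside the brackets. For example:

print(re.search(r'0[xX]([0-9a-fA-F]{6})', 'The hex value is 0x2378ad').group(1))

This finds a six-digit hexadecimal number and prints 2378ad: [xX] accepts either a lowercase or an uppercase x, and [0-9a-fA-F] accepts any hexadecimal digit.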
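A sketch extending the hex example (assuming Python 3; the uppercase sample value and the second input are illustrative): group(0) is the whole match, group(1) is the bracketed capture, and a '^' right after '[' negates a set.

```python
import re

m = re.search(r"0[xX]([0-9a-fA-F]{6})", "The hex value is 0x2378AD")
whole = m.group(0)   # the whole match
digits = m.group(1)  # just the captured digits
print(whole)   # 0x2378AD
print(digits)  # 2378AD

# A '^' right after '[' negates the set: anything but digits and spaces.
words = re.findall(r"[^0-9 ]+", "ab 12 cd")
print(words)   # ['ab', 'cd']
```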

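'{m}' means: repeat the preceding element exactly m times; '{m,n}' means at least m and at most n times. For example:

print(re.match(r'ab{3,5}', 'abbbbb').group())

The pattern matches an 'a' followed by 3 to 5 'b' characters. The string offers 5, and quantifiers are greedy by default (they match as many repetitions as possible), so the result is abbbbb. (Greedy mode.)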
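Greedy matching can be switched off by adding '?' after the quantifier. A sketch (assuming Python 3) comparing greedy, non-greedy and exact repetition on the same input:

```python
import re

text = "abbbbb"
greedy = re.match(r"ab{3,5}", text).group()   # as many b's as allowed
lazy = re.match(r"ab{3,5}?", text).group()    # '?' after the braces: as few as allowed
exact = re.match(r"ab{3}", text).group()      # exactly three b's
print(greedy)  # abbbbb
print(lazy)    # abbb
print(exact)   # abbb
```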

Explain the difference between re.match () and Re.search ()

#!/usr/bin/python#-*-coding:utf-8-*-Import reprint (Re.match (' www ', ' www.runoob.com '). span ()) # matches print at the starting position ( Re.match (' com ', ' www.runoob.com ') # does not match at start position

return Result:

(0, 3) None

#!/usr/bin/python#-*-coding:utf-8-*-Import reprint (re.search (' www ', ' www.runoob.com '). span ()) # matches print at the starting position ( Re.search (' com ', ' www.runoob.com '). span ()) # does not match at the start position

return Result:

(0, 3) (11, 14)

As you can see from the comparison, the difference between the two is whether the match is started, match is matched from the starting position, and search does not match from the beginning (in fact, understanding both English meaning can also be understood, one is a match, one is search)
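One consequence worth checking: re.MULTILINE does not change re.match(), which still only tries position 0; to find a pattern at the start of a later line you need re.search() with '^'. A sketch assuming Python 3, with an illustrative two-line string:

```python
import re

text = "www.runoob.com\ncom.example"
# re.match() only tries position 0, even with re.MULTILINE.
match_result = re.match("com", text, re.MULTILINE)
# re.search() with '^' and re.MULTILINE finds 'com' at the start of line 2.
search_span = re.search("^com", text, re.MULTILINE).span()
print(match_result)  # None
print(search_span)   # (15, 18)
```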

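Special escape sequences (the UNICODE and LOCALE flag behaviour described below is that of Python 2; in Python 3, str patterns are Unicode-aware by default and the re.ASCII flag restricts these escapes to their ASCII sets):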

\A: matches only at the start of the string.

\b: matches the empty string (it is easiest to think of it as matching a position), but only at the beginning or end of a word. A word is a sequence of alphanumeric or underscore characters, so the end of a word is marked by whitespace or by a character that is neither alphanumeric nor an underscore. Formally, \b is the boundary between a \w and a \W character (or the start or end of the string), so exactly which characters count depends on the UNICODE and LOCALE flags. Inside a character range, \b stands for the backspace character, for compatibility with Python's string literals.

\B: matches the empty string, but only when it is not at the beginning or end of a word. It is the opposite of \b and is likewise affected by the LOCALE and UNICODE settings.
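A sketch of both word-boundary escapes (assuming Python 3; the sentence is illustrative). Start positions are printed because both \b and \B match positions, not characters:

```python
import re

text = "cat concat cat_food cat."
# \bcat\b: 'cat' as a whole word ('_' counts as a word character).
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
# \Bcat: 'cat' that does NOT start at a word boundary.
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]
# Inside [...], \b means the backspace character (\x08).
backspaces = re.findall(r"[\b]", "a\bb")
print(whole_word)   # [0, 20]
print(inside_word)  # [7]
print(backspaces)   # ['\x08']
```

Note that 'cat_food' is not matched by \bcat\b, because the underscore is a word character.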

\d: when the UNICODE flag is not specified, matches any decimal digit; this is equivalent to [0-9]. With the UNICODE flag, it matches any character classified as a decimal digit in the Unicode character database.

\D: when the UNICODE flag is not specified, matches any non-digit character; this is equivalent to [^0-9]. With the UNICODE flag, it matches any character not classified as a decimal digit in the Unicode character database.
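In Python 3 the flag dependence is easiest to see the other way round: Unicode digits match by default and re.ASCII narrows \d to [0-9]. A sketch assuming Python 3, using U+0662 (ARABIC-INDIC DIGIT TWO) as an illustrative non-ASCII digit:

```python
import re

text = "1\u06623"  # '\u0662' is ARABIC-INDIC DIGIT TWO
# Python 3 str patterns are Unicode-aware, so \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d", text)
# re.ASCII restricts \d to [0-9].
ascii_digits = re.findall(r"\d", text, re.ASCII)
# \D+ collects runs of non-digits.
non_digits = re.findall(r"\D+", "a1b22c")
print(len(unicode_digits))  # 3
print(ascii_digits)         # ['1', '3']
print(non_digits)           # ['a', 'b', 'c']
```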

\s: when the LOCALE and UNICODE flags are not specified, matches any whitespace character; this is equivalent to [ \t\n\r\f\v]. With the LOCALE flag, it also matches whatever the current locale defines as whitespace. With the UNICODE flag, it also matches any character classified as whitespace in the Unicode character database.

\S: when the LOCALE and UNICODE flags are not specified, matches any non-whitespace character; this is equivalent to [^ \t\n\r\f\v]. With the LOCALE flag, it matches any character the current locale does not define as whitespace. With the UNICODE flag, it matches any character not classified as whitespace in the Unicode character database.
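\s and \S are often used for opposite jobs: splitting on whitespace versus collecting the non-whitespace pieces. A sketch assuming Python 3 and an illustrative padded string:

```python
import re

text = "  hello\tworld \n"
# Splitting on runs of whitespace leaves empty strings at the ends.
pieces = re.split(r"\s+", text)
# Collecting runs of non-whitespace gives just the words.
words = re.findall(r"\S+", text)
print(pieces)  # ['', 'hello', 'world', '']
print(words)   # ['hello', 'world']
```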

\w: when the LOCALE and UNICODE flags are not specified, matches any alphanumeric character or the underscore; this is equivalent to [a-zA-Z0-9_]. With the LOCALE flag, it matches [0-9_] plus whatever characters the current locale defines as letters. With the UNICODE flag, it matches [0-9_] plus any character classified as alphanumeric in the Unicode character database.

\W: when the LOCALE and UNICODE flags are not specified, matches any character that is neither alphanumeric nor the underscore; this is equivalent to [^a-zA-Z0-9_]. With the LOCALE flag, it matches any character outside [0-9_] that the current locale does not define as a letter. With the UNICODE flag, it matches any character outside [0-9_] that is not classified as alphanumeric in the Unicode character database.
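A sketch of \w and \W (assuming Python 3, where re.ASCII plays the role of "no UNICODE flag"; the sample text is illustrative). The accented 'é' counts as a word character only in Unicode mode:

```python
import re

text = "user_name-42, caf\u00e9!"  # '\u00e9' is 'é'
unicode_words = re.findall(r"\w+", text)
# re.ASCII limits \w to [a-zA-Z0-9_], so 'é' breaks the last word.
ascii_words = re.findall(r"\w+", text, re.ASCII)
# \W+ collects the runs of separators between words.
separators = re.findall(r"\W+", text)
print(unicode_words)  # ['user_name', '42', 'café']
print(ascii_words)    # ['user_name', '42', 'caf']
print(separators)     # ['-', ', ', '!']
```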

\Z: matches only at the end of the string.

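These rules are fairly easy to remember because they come in pairs: \A and \Z mark the start and end of the string, and each remaining lowercase escape has an uppercase counterpart that matches the opposite.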
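\A and \Z differ from '^' and '$' in two ways that a short sketch (assuming Python 3, illustrative inputs) can show: they ignore re.MULTILINE, and \Z does not match before a trailing newline the way '$' does.

```python
import re

text = "abc\nabc"
# \A ignores re.MULTILINE; '^' does not.
a_hits = re.findall(r"\Aabc", text, re.MULTILINE)
caret_hits = re.findall(r"^abc", text, re.MULTILINE)
# \Z matches only at the very end; '$' may also match before a final newline.
z_result = re.search(r"abc\Z", "abc\n")
dollar_span = re.search(r"abc$", "abc\n").span()
print(a_hits)       # ['abc']
print(caret_hits)   # ['abc', 'abc']
print(z_result)     # None
print(dollar_span)  # (0, 3)
```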

This article is from the "9651854" blog; please keep this source attribution: http://9661854.blog.51cto.com/9651854/1784290


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
