Explore Regular Expressions in Python

Last Update:2015-04-29 Source: Internet

Author: User

Tags valid email address

This article introduces the basic usage of regular expressions in Python, an important topic for anyone learning the language.

Strings are the data type programmers deal with most often, and the need to process them comes up almost everywhere. Take checking whether a string is a valid email address: you could extract the substrings before and after the @ and check separately that one is a word and the other a domain name, but that is tedious and the code is hard to reuse.

Regular expressions are a powerful tool for matching strings. The idea is to define a rule for strings in a descriptive language: any string that complies with the rule is considered a match; otherwise the string is invalid.

So the way to check whether a string is a valid email address is:

Create a regular expression that matches email addresses;

Use that regular expression to match the user's input and decide whether it is valid.

Because regular expressions are themselves written as strings, we first need to understand how to use characters to describe characters.

In a regular expression, a character given literally is matched exactly. \d matches one digit and \w matches one letter, digit, or underscore, so:

'00\d' can match '007', but cannot match '00A';

'\d\d\d' can match '010';

'\w\w\d' can match 'py3';

. can match any single character, so:

'py.' can match 'pyc', 'pyo', 'py!', and so on.
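The basic matches above can be checked with a short sketch. It assumes Python's re module (introduced later in this article) and Python 3.4 or later, where re.fullmatch is available to require that the whole string fits the pattern. The helper name matches_exactly is made up for this illustration.

```python
import re

def matches_exactly(pattern, text):
    """Return True if the whole of text fits pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# \d is one digit, \w is one letter, digit, or underscore, . is any single character
print(matches_exactly(r'00\d', '007'))    # True
print(matches_exactly(r'00\d', '00A'))    # False
print(matches_exactly(r'\d\d\d', '010'))  # True
print(matches_exactly(r'\w\w\d', 'py3'))  # True
print(matches_exactly(r'py.', 'py!'))     # True
```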

To match a variable number of characters, use * for any number of characters (including zero), + for at least one, ? for zero or one, {n} for exactly n, and {n,m} for between n and m.

Let's look at a more complex example: \d{3}\s+\d{3,8}.

Reading from left to right:

\d{3} matches three digits, for example '010';

\s matches one whitespace character (a space, a tab, and so on), so \s+ means at least one whitespace character, for example ' ' or '   ';

\d{3,8} matches 3 to 8 digits, for example '123'.

Taken together, the regular expression above matches a telephone number with an area code, where the two parts are separated by any amount of whitespace.
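As a rough check of the pattern just described, the sketch below runs it against a few made-up inputs. It assumes Python 3.4 or later for fullmatch; the sample strings are illustrative only.

```python
import re

# three digits, at least one whitespace character, then 3 to 8 digits
phone = re.compile(r'\d{3}\s+\d{3,8}')

samples = ['010 12345', '010\t  1234567', '010 12', '01012345']
for text in samples:
    # fullmatch succeeds only if the entire string fits the pattern
    print(repr(text), phone.fullmatch(text) is not None)
```

The first two samples should be accepted; the third has too few digits after the space and the fourth has no whitespace at all.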

What if we want to match a number like '010-12345'? Here '-' is escaped with '\', which gives \d{3}\-\d{3,8}. Strictly speaking, '-' only has a special meaning inside [], so outside a character class the escape is optional but harmless.

However, this still cannot match '010 - 12345', because that string contains spaces. For that we need more flexible ways of matching.

Advanced

For more precise matching, use [] to specify a range, for example:

[0-9a-zA-Z\_] matches one digit, letter, or underscore;

[0-9a-zA-Z\_]+ matches a string of at least one digit, letter, or underscore, such as 'a100', '0_Z', and 'Py3000';

[a-zA-Z\_][0-9a-zA-Z\_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores, that is, a valid Python variable name;

[a-zA-Z\_][0-9a-zA-Z\_]{0,19} is more precise: it limits the length of the variable name to 1 to 20 characters (1 leading character plus at most 19 following characters).
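A minimal sketch of the length-limited variable-name pattern, assuming Python 3.4 or later for fullmatch. The function name is hypothetical, and the check deliberately ignores reserved words and non-ASCII identifiers, just as the pattern in the text does.

```python
import re

# one letter or underscore, then at most 19 more letters, digits, or underscores
IDENTIFIER = re.compile(r'[a-zA-Z_][0-9a-zA-Z_]{0,19}')

def is_short_identifier(name):
    """Return True if name fits the 1-20 character pattern (hypothetical helper)."""
    return IDENTIFIER.fullmatch(name) is not None

print(is_short_identifier('py3000'))    # True
print(is_short_identifier('_private'))  # True
print(is_short_identifier('3py'))       # False, starts with a digit
print(is_short_identifier('a' * 21))    # False, too long
```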

A|B matches A or B, so (P|p)ython matches 'Python' or 'python'.

^ marks the beginning of a line, so ^\d means the line must start with a digit.

$ marks the end of a line, so \d$ means the line must end with a digit.

You may have noticed that py also matches inside 'python', but ^py$ turns it into a whole-line match, which matches only 'py'.
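The effect of the anchors is easiest to see with re.search, which looks for the pattern anywhere in the string (re.match, used later in this article, only tries at the start). This is a small sketch; the helper name found and the sample strings are made up for illustration.

```python
import re

def found(pattern, text):
    """Return True if pattern occurs somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

print(found(r'py', 'python'))                 # True, 'py' occurs inside 'python'
print(found(r'^py$', 'python'))               # False, the whole line must be 'py'
print(found(r'^py$', 'py'))                   # True
print(found(r'^\d', '3 apples'))              # True, starts with a digit
print(found(r'\d$', 'room 101'))              # True, ends with a digit
print(found(r'(P|p)ython', 'I like Python'))  # True
```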

re Module

With this background we can use regular expressions in Python. Python provides the re module, which contains all the regular expression functionality. Because Python strings themselves use \ as an escape character, pay special attention here:

s = 'ABC\\-001' # Python string
# the corresponding regular expression string becomes:
# 'ABC\-001'

For this reason we strongly recommend Python's r prefix, so that you do not have to think about escaping:

s = r'ABC\-001' # Python string
# the corresponding regular expression string is unchanged:
# 'ABC\-001'
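A quick way to convince yourself that the two spellings produce the same pattern text is to compare them directly. The variable names below are chosen for illustration.

```python
plain = 'ABC\\-001'  # ordinary string: the backslash itself must be escaped
raw = r'ABC\-001'    # raw string: backslashes are kept as written

print(plain == raw)  # True
print(len(raw))      # 8
print(raw)           # ABC\-001
```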

First, let's see how to determine whether the regular expression matches:

>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object at 0x1026e18b8>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>

The match() function checks whether the pattern matches at the start of the string. On success it returns a Match object; otherwise it returns None. A common way to test is:

test = 'user-input string'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')
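The same test can be wrapped in a small function so it is easy to reuse. This is only a sketch: the function name check_phone is hypothetical, and it reuses the telephone pattern from earlier in the article.

```python
import re

def check_phone(text):
    """Return 'ok' if text looks like 010-12345, else 'failed' (hypothetical helper)."""
    # re.match returns None on failure, and None is falsy in an if statement
    if re.match(r'^\d{3}-\d{3,8}$', text):
        return 'ok'
    return 'failed'

print(check_phone('010-12345'))  # ok
print(check_phone('010 12345'))  # failed
```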

Split string

Splitting strings with regular expressions is more flexible than splitting on fixed characters. Consider the ordinary way to split:

>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']

Consecutive spaces are not handled. Try a regular expression instead:

>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']

It splits correctly no matter how many spaces there are. Add a comma and try:

>>> re.split(r'[\s\,]+', 'a,b, c  d')
['a', 'b', 'c', 'd']

Then add a semicolon:

>>> re.split(r'[\s\,\;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']

If users enter a group of tags, remember to use a regular expression next time to turn the messy input into a clean list.
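For the tag scenario, one possible sketch is shown below. The function name parse_tags and the sample input are assumptions. The filter on empty strings is there because re.split returns an empty piece when the input starts or ends with a separator.

```python
import re

def parse_tags(raw):
    """Split on runs of whitespace, commas, or semicolons and drop empty pieces."""
    return [tag for tag in re.split(r'[\s,;]+', raw) if tag]

print(parse_tags(' python, regex;; re  module '))
# ['python', 'regex', 're', 'module']
```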

Group

Besides simply checking for a match, regular expressions can also extract substrings. Parentheses () mark a group to be extracted. For example:

^(\d{3})-(\d{3,8})$ defines two groups, so the area code and the local number can be extracted directly from the matched string:

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object at 0x1026fb3e8>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'

If the regular expression defines groups, the substrings can be extracted with the group() method of the Match object.

Note that group(0) is always the entire matched text, while group(1), group(2), and so on are the first, second, and later captured substrings.
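Because match() returns None when nothing matches, calling group() on the result without checking raises an AttributeError. A hedged sketch of a safer wrapper follows; the function name split_phone is made up for this example.

```python
import re

def split_phone(text):
    """Return (area_code, local_number), or None if text does not match."""
    m = re.match(r'^(\d{3})-(\d{3,8})$', text)
    if m is None:
        return None
    # group(1) and group(2) are the two parenthesized parts of the pattern
    return m.group(1), m.group(2)

print(split_phone('010-12345'))  # ('010', '12345')
print(split_phone('010 12345'))  # None
```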

Extracting substrings is very useful. Here is a more daunting example:

>>> t = '19:05:30'
>>> m = re.match(r'^(0[0-9]|1[0-9]|2[0-3]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$', t)
>>> m.groups()
('19', '05', '30')

This regular expression recognizes valid times directly. In some cases, however, a regular expression cannot do the full validation, for example with dates:

'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$'

Invalid dates such as '2-30' and '4-31' cannot be rejected by a regular expression, or only by one that is very hard to write. In that case the check has to be done in program code.
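One way to do that final check in code is to let the regular expression handle the format and hand the numbers to datetime.date, which raises ValueError for impossible dates. This is a sketch under stated assumptions: the function name is hypothetical, and because the pattern has no year, the year parameter (default 2015) is an assumption needed to decide about February 29.

```python
import re
from datetime import date

DATE_RE = re.compile(r'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$')

def is_valid_date(text, year=2015):
    """Check a 'month-day' string; the year is an assumed parameter."""
    m = DATE_RE.match(text)
    if m is None:
        return False
    month, day = int(m.group(1)), int(m.group(2))
    try:
        date(year, month, day)  # raises ValueError for dates like 2-30
    except ValueError:
        return False
    return True

print(is_valid_date('12-25'))  # True
print(is_valid_date('2-30'))   # False
print(is_valid_date('4-31'))   # False
```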

Greedy match

Finally, note that regular expression matching is greedy by default, that is, it matches as many characters as possible. For example, to match the zeros at the end of a number:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')

Because \d+ is greedy, it swallows all the trailing zeros, and 0* is left to match the empty string.

For 0* to match the trailing zeros, \d+ must use non-greedy matching (that is, match as little as possible). Adding ? makes \d+ non-greedy:

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
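The difference can be packaged into a small helper and compared side by side with the greedy form. The function name split_trailing_zeros and the sample numbers are assumptions made for this sketch.

```python
import re

def split_trailing_zeros(digits):
    """Return (head, zeros) using a non-greedy \\d+?, or None if digits is not a number."""
    m = re.match(r'^(\d+?)(0*)$', digits)
    return m.groups() if m else None

greedy = re.match(r'^(\d+)(0*)$', '102300').groups()
print(greedy)                          # ('102300', '')
print(split_trailing_zeros('102300'))  # ('1023', '00')
print(split_trailing_zeros('1000'))    # ('1', '000')
```

Even in the non-greedy form, \d+? must still consume at least one digit, so a string made only of zeros keeps its first zero in the first group.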

Compile

When we use a regular expression in Python, the re module does two things internally:

Compile the regular expression, raising an error if the pattern string is invalid;

Use the compiled regular expression to match the string.

If a regular expression will be reused thousands of times, we can precompile it for efficiency. Later uses then skip the compile step and match directly:

>>> import re
# compile:
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
# use:
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')

Compiling produces a regular expression object. Because the object already contains the pattern, you do not pass the pattern string when calling its methods.
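A typical shape for reusing a compiled pattern is to compile once at module level and call its methods inside a loop. The sketch below assumes a list of input lines; the names RE_TELEPHONE and extract_numbers are illustrative.

```python
import re

# compiled once, reused for every line
RE_TELEPHONE = re.compile(r'^(\d{3})-(\d{3,8})$')

def extract_numbers(lines):
    """Return (area_code, local_number) tuples for the lines that match."""
    results = []
    for line in lines:
        m = RE_TELEPHONE.match(line)
        if m:
            results.append(m.groups())
    return results

print(extract_numbers(['010-12345', 'not a number', '010-8086']))
# [('010', '12345'), ('010', '8086')]
```

The re module also keeps an internal cache of recently compiled patterns, so the gain from precompiling is usually modest; the compiled object is still convenient because the pattern is named and defined in one place.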

Summary

Regular expressions are very powerful, and they cannot be covered fully in one short section. Explaining everything about them would fill a thick book. If you work with regular expressions often, a reference book on the subject is worth having.

Try writing a regular expression to validate email addresses. Version 1 should accept addresses such as:

someone@gmail.com

bill.gates@microsoft.com

Version 2 should validate and extract an email address that comes with a name:

tom@voyager.org <Tom Paris>
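As a starting point for version 1 only, here is one deliberately simple sketch. It is an assumption about what "similar" addresses look like (letters, digits, underscores, and dots before the @, then dot-separated labels after it), not a complete validator: real email addresses allow many more forms. Version 2 is left as the exercise.

```python
import re

# local part: letters, digits, underscores, dots; domain: labels separated by dots
EMAIL_RE = re.compile(r'^[0-9a-zA-Z_.]+@[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)+$')

def is_simple_email(text):
    """Return True if text fits the simplified pattern above (hypothetical helper)."""
    return EMAIL_RE.match(text) is not None

print(is_simple_email('someone@gmail.com'))         # True
print(is_simple_email('bill.gates@microsoft.com'))  # True
print(is_simple_email('bob#example.com'))           # False
```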

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
