Regular expressions in Python

Last Update: 2015-08-25 Source: Internet

Author: User


I. Prelude:
1. Regular expressions are a powerful tool for matching strings. The idea is to describe a rule for strings in a small descriptive language: any string that follows the rule is said to "match", and any string that does not is rejected.

2. Because a regular expression is itself written as a string, we first need to know how characters are used to describe other characters. The most commonly used ones are:
\d matches one digit: '00\d' matches '001', and '\d\d\d' matches '010'.
\w matches one letter, digit or underscore: '\w\w\w' matches 'abc'.
Each of these tokens matches exactly one character, so the pattern and the string are compared character by character.
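A quick way to try these one-character tokens is re.fullmatch (Python 3.4+), which, unlike re.match, succeeds only when the pattern covers the whole string. This is a minimal sketch; the last pair is an illustrative mismatch not taken from the text.

```python
import re

# (pattern, string) pairs: the first three come from the text above
examples = [
    (r'00\d', '001'),     # \d matches exactly one digit
    (r'\d\d\d', '010'),   # three \d tokens match three digits
    (r'\w\w\w', 'abc'),   # \w matches one letter, digit or underscore
    (r'\d\d\d', '0101'),  # three tokens cannot cover four characters
]

for pattern, text in examples:
    # fullmatch returns a Match object only if the entire string matches
    matched = re.fullmatch(pattern, text) is not None
    print(pattern, text, matched)
```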

3. To match a variable number of characters, put a quantifier after the character it applies to: * means any number of times (including zero), + means at least once, ? means zero or one time, {n} means exactly n times, and {n,m} means between n and m times. Once you know what each symbol does, they are not hard to use, and a little practice makes them familiar.
Example: \d{3}\s+\w{3,8}: \d{3} matches three digits, \s+ matches at least one whitespace character (such as a space), and \w{3,8} matches 3 to 8 letters, digits or underscores.
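The example pattern can be checked against a few sample strings. This sketch assumes Python 3.4+ for re.fullmatch; the sample strings and the name PATTERN are illustrative only.

```python
import re

# \d{3}: exactly three digits
# \s+: one or more whitespace characters
# \w{3,8}: three to eight letters, digits or underscores
PATTERN = r'\d{3}\s+\w{3,8}'

print(bool(re.fullmatch(PATTERN, '010 12345')))       # True
print(bool(re.fullmatch(PATTERN, '010    abc')))      # True: \s+ absorbs several spaces
print(bool(re.fullmatch(PATTERN, '010abc')))          # False: \s+ needs at least one
print(bool(re.fullmatch(PATTERN, '010 ab')))          # False: \w{3,8} needs 3 or more
print(bool(re.fullmatch(PATTERN, '010 123456789')))   # False: 9 is more than 8

# * allows zero repeats; ? allows at most one
print(bool(re.fullmatch(r'ab*', 'a')))    # True
print(bool(re.fullmatch(r'ab?', 'abb')))  # False
```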

4. For more precise matching, use [] to define a set (or range) of allowed characters.

(1). For example, [0-9a-zA-Z\_] matches one digit, letter or underscore. The \ before the underscore is harmless but not required, because _ has no special meaning inside [].

[0-9a-zA-Z\_]+ matches a string of at least one digit, letter or underscore.

[a-zA-Z\_][0-9a-zA-Z\_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters or underscores (the shape of a typical variable name).

[a-zA-Z\_][0-9a-zA-Z\_]{0,19} further restricts the length to 1-20 characters (1 leading character plus up to 19 more). There must be no space inside {0,19}: Python treats {0, 19} as literal text rather than as a quantifier.
^ matches the beginning of the string, so ^\d means the string must begin with a digit.
$ matches the end of the string, so \d$ means the string must end with a digit.
(2). In short, [] defines which characters are allowed at one position, and a quantifier such as {} placed right after it decides how many characters from that set must appear.
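Putting [], {} and the ^/$ anchors together gives a length-limited name check. The helper is_short_identifier is a hypothetical name chosen for this sketch; note that Python's $ also matches just before a single trailing newline, so re.fullmatch is the stricter choice if that matters.

```python
import re

# Letter or underscore, then up to 19 letters, digits or underscores: 1-20 chars.
# ^ and $ tie the pattern to the start and end of the string.
# (Python's $ also accepts one trailing '\n'; use re.fullmatch to be strict.)
IDENTIFIER = r'^[a-zA-Z_][0-9a-zA-Z_]{0,19}$'

def is_short_identifier(name):
    """Hypothetical helper: True if name looks like a 1-20 char variable name."""
    return re.match(IDENTIFIER, name) is not None

print(is_short_identifier('py2015'))   # True
print(is_short_identifier('_tmp'))     # True
print(is_short_identifier('2fast'))    # False: starts with a digit
print(is_short_identifier('a' * 21))   # False: 21 characters
```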

II. Usage:
1. To match, use the match function of the re module: re.match(r'regular expression', 'string to match'). We strongly recommend Python's r prefix (raw strings), so that backslashes reach the regular expression engine unchanged and you do not have to worry about escaping them.
2. match() tells you whether the match succeeded: it returns a Match object on success and None otherwise, so it is usually used directly as the condition of an if statement. Note that re.match only anchors the pattern at the start of the string; add $ at the end of the pattern if the whole string must match.

Example:
import re, time

a = input('Please enter the landline number whose location you want to look up: ')
# 4-digit area code, a hyphen, then 7 digits; $ rejects anything after them,
# because re.match only anchors at the start of the string
if re.match(r'[0-9]{4}[\-][0-9]{7}$', a):
    print('The number you entered is valid, please wait for the lookup...')
    time.sleep(3)
    # demo only: the answer is hard-coded, no real lookup happens
    print("Hello, this is a Xinjiang landline number")
else:
    print('Invalid input, please try again')

>>> ================================ RESTART ================================
>>>
Please enter the landline number whose location you want to look up: 0993-6999720
The number you entered is valid, please wait for the lookup...
Hello, this is a Xinjiang landline number
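Because re.match only anchors at the start, a pattern without $ also accepts input with extra characters after a valid number. This sketch compares re.match with re.fullmatch (Python 3.4+) on the same phone pattern; the sample inputs and the name PHONE are illustrative.

```python
import re

PHONE = r'[0-9]{4}-[0-9]{7}'  # 4-digit area code, hyphen, 7-digit number

samples = ['0993-6999720', '0993-6999720999', '993-6999720']
for s in samples:
    loose = re.match(PHONE, s) is not None       # anchored at the start only
    strict = re.fullmatch(PHONE, s) is not None  # must cover the whole string
    print(s, loose, strict)
# 0993-6999720 True True
# 0993-6999720999 True False
# 993-6999720 False False
```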

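3. Characters that have a special meaning in regular expressions, such as . * + ? ^ $ ( ) [ ] { } | and \ itself, must be preceded by \ to be matched literally; inside [] a - also needs escaping unless it is the first or last character. Ordinary punctuation such as a comma or underscore may be escaped as well, but it does not have to be.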
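The difference escaping makes is easiest to see with '.', which unescaped matches any single character except a newline. The sketch also uses re.escape, a standard re function that backslash-escapes every special character in a literal string; the sample strings are illustrative.

```python
import re

# Unescaped, '.' matches any character (except newline)
print(bool(re.fullmatch(r'\d+.\d+', '3x14')))   # True
# Escaped, '\.' matches only a literal dot
print(bool(re.fullmatch(r'\d+\.\d+', '3x14')))  # False
print(bool(re.fullmatch(r'\d+\.\d+', '3.14')))  # True

# re.escape adds the backslashes for literal text you did not write yourself
literal = 'a+b(c)'
print(bool(re.fullmatch(re.escape(literal), 'a+b(c)')))  # True
```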

III. Splitting strings:
1. Regular expressions can also split strings: re.split(regular expression, string to split) cuts the string wherever the regular expression matches.

Example:
>>> re.split(r'[\s\;\,]+', 'as,sd we;was;,;bi')
['as', 'sd', 'we', 'was', 'bi']

This turns irregularly separated input into a clean list.
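One detail the example hides: when the input starts or ends with a separator, re.split returns an empty string at that end. A minimal sketch that filters those out; parse_tags is a hypothetical helper name, and since ; and , have no special meaning inside [], the backslashes before them are dropped here.

```python
import re

def parse_tags(raw):
    """Hypothetical helper: split on runs of whitespace, commas or semicolons."""
    # A leading or trailing separator makes re.split produce '' at that end,
    # so empty pieces are filtered out.
    return [part for part in re.split(r'[\s,;]+', raw) if part]

print(parse_tags('as,sd we;was;,;bi'))  # ['as', 'sd', 'we', 'was', 'bi']
print(parse_tags(' ;as, sd;'))          # ['as', 'sd']
```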

IV. Grouping:
1. Besides testing whether a string matches, regular expressions can extract substrings. Parentheses () mark a group to be extracted, so in addition to [] and {}, () is the third kind of bracket used in patterns.

Example:
>>> m = re.match(r'^([0-9]{4})-([0-9]{7})$', '1111-6666666')
>>> m
<_sre.SRE_Match object; span=(0, 12), match='1111-6666666'>
>>> m.group(1)
'1111'
>>> m.group(0)
'1111-6666666'
>>> m.group(2)
'6666666'
>>> m.groups()
('1111', '6666666')
2. Each pair of () in the pattern defines a group. When the match succeeds, each group holds the part of the string it matched, and group() retrieves it.
Note that group(0) is always the entire match, while the groups themselves are numbered from 1 (group(1), group(2), ...); groups() returns the values of all groups as a tuple.
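In practice the Match object is usually checked for None before reading its groups. A minimal sketch; split_phone is a hypothetical helper name, and the NNNN-NNNNNNN format is assumed from the example above.

```python
import re

def split_phone(number):
    """Hypothetical helper: return (area_code, local_number) or None."""
    m = re.match(r'^([0-9]{4})-([0-9]{7})$', number)
    if m is None:
        return None  # no match, so there are no groups to read
    # m.group(0) would be the whole number; groups() is (group(1), group(2))
    return m.groups()

print(split_phone('0993-6999720'))  # ('0993', '6999720')
print(split_phone('0993-699'))      # None
```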

V. Greedy matching:
1. Grouping makes the effect of greedy matching visible: by default, regular expression quantifiers are greedy, meaning they match as many characters as possible.

Example:
>>> m = re.match(r'^(\d+)(6*)$', '1101016666666')
>>> m.groups()
('1101016666666', '')
Here \d+ in the first group consumes the whole string, so the second group, 6*, is left with an empty match.

2. Adding a ? after the + in the first group makes that group non-greedy, so it matches as few characters as possible:

Example:
>>> m = re.match(r'^(\d+?)(6*)$', '1101016666666')
>>> m.groups()
('110101', '6666666')
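The two examples side by side, plus a second common case. The quoted-text part introduces '.' (any character except newline) and re.findall, which returns the captured group of every match; the sample sentence is illustrative.

```python
import re

digits = '1101016666666'
# Greedy: \d+ takes everything, leaving 6* an empty match
greedy = re.match(r'^(\d+)(6*)$', digits).groups()
# Non-greedy: \d+? takes as few digits as possible, growing only until 6*$ fits
lazy = re.match(r'^(\d+?)(6*)$', digits).groups()
print(greedy)  # ('1101016666666', '')
print(lazy)    # ('110101', '6666666')

# The same effect on quoted text
text = 'say "hi" and "bye"'
print(re.findall(r'"(.*)"', text))   # ['hi" and "bye']
print(re.findall(r'"(.*?)"', text))  # ['hi', 'bye']
```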

VI. Compiling:
1. When you use a regular expression in Python, the re module does two things internally: ① it compiles the regular expression, raising an error if the pattern string itself is invalid; ② it uses the compiled regular expression to match the string.

2. If a regular expression will be used thousands of times, it is more efficient to precompile it once, so that later uses skip the compilation step:
>>> m = re.compile(r'^(\d+?)(6*)$')
>>> m.match('1231313666666').groups()
('1231313', '666666')
re.compile returns a pattern object (m here). Because that object already contains the regular expression, you call its methods without passing the pattern string again.
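A short sketch of both steps from point 1: a compiled pattern reused in a loop, and an invalid pattern failing at compile time with re.error. The phone numbers are illustrative. Python's documentation also notes that the module-level functions cache recently compiled patterns, so the saving from compiling by hand is often modest.

```python
import re

# Compile once; the pattern string is parsed only here
phone = re.compile(r'^([0-9]{4})-([0-9]{7})$')

numbers = ['0993-6999720', '010-1234567', '0991-1234567']
valid = [n for n in numbers if phone.match(n)]
print(valid)  # ['0993-6999720', '0991-1234567']

# An invalid pattern is rejected when it is compiled
try:
    re.compile(r'(\d+')  # unbalanced parenthesis
except re.error as exc:
    print('invalid pattern:', exc)
```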

VII. Summary:
One way to think about regular expressions: matching means checking whether a string satisfies the rule you wrote, and a rule describes a whole set of acceptable strings. For example, in [com]*, [com] stands for one character that is c, o or m, and * means zero or more times, so [com]* accepts '', 'c', 'co', 'om', 'com', 'comcom' and any other string made up only of c, o and m.
By contrast, [com] on its own (no quantifier, so exactly once) matches a single character, c, o or m; it does not match the string 'com'. To repeat a fixed string, put it in parentheses and add {n}: (com){2} matches exactly 'comcom'. In the end, a regular expression is essentially a checking function.
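The three patterns from the summary can be compared directly. A minimal sketch using re.fullmatch (Python 3.4+) so that each pattern must cover the whole string; the test strings are illustrative.

```python
import re

# [com]* : any run (including the empty string) of the characters c, o, m
for s in ['', 'c', 'com', 'mooc', 'comcom', 'cat']:
    print(repr(s), bool(re.fullmatch(r'[com]*', s)))

# [com] : exactly one character, c or o or m
print(bool(re.fullmatch(r'[com]', 'o')))    # True
print(bool(re.fullmatch(r'[com]', 'com')))  # False

# (com){2} : the group 'com' repeated exactly twice
print(bool(re.fullmatch(r'(com){2}', 'comcom')))  # True
print(bool(re.fullmatch(r'(com){2}', 'com')))     # False
```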


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
