Regular Expressions (Python)

Source: Internet
Author: User

    • Understanding Regular Expressions

A regular expression is a logical formula for operating on strings: a "rule string", built from predefined special characters and combinations of them, that expresses a filtering logic for strings. Regular expressions are a very powerful tool for matching strings. Most programming languages support them, and Python is no exception. Here we use regular expressions to extract the content we want from a returned page.

    • The general matching process of regular expressions
    1. Take each element of the expression in turn and compare it with the characters in the text.
    2. If every character matches, the match succeeds; as soon as one character fails to match, the match fails.
    3. If the expression contains quantifiers or boundaries, the process is slightly different.
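
The first two steps above can be sketched for the simplest case, a purely literal pattern. This is only an illustration: `literal_match` is a hypothetical helper written for this example, not how the re module is implemented, and it ignores quantifiers and boundaries entirely.

```python
def literal_match(pattern, text):
    """Hypothetical helper: compare a literal pattern with the start of text."""
    if len(text) < len(pattern):
        return False
    # Take each pattern character in turn and compare it with the text.
    for p_char, t_char in zip(pattern, text):
        if p_char != t_char:
            # One mismatching character is enough for the match to fail.
            return False
    return True


print(literal_match("abc", "abcdef"))  # True
print(literal_match("abc", "abxdef"))  # False
```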
    • Syntax rules for regular expressions (Python)

    • Greedy and non-greedy quantifiers

Regular expressions are typically used to find matching strings in text. Quantifiers in Python are greedy by default (a few languages may default to non-greedy): they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example, if the regular expression "ab*" is used to search "abbbc", it finds "abbb". With the non-greedy quantifier "ab*?", it finds "a".

Note: We generally use non-greedy mode when extracting content.
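
As a minimal sketch of the difference, the snippet below runs both quantifier forms on the same input. The HTML-like string `html` is an assumed sample chosen for illustration, not taken from the original article.

```python
import re

text = "abbbc"
greedy = re.search(r"ab*", text).group()   # 'abbb': as many b's as possible
lazy = re.search(r"ab*?", text).group()    # 'a': as few b's as possible
print(greedy, lazy)

# Assumed sample input showing why non-greedy mode suits extraction.
html = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>(.*)</b>", html)   # ['one</b><b>two']
lazy_tags = re.findall(r"<b>(.*?)</b>", html)    # ['one', 'two']
print(greedy_tags, lazy_tags)
```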

    • The backslash problem

As in most programming languages, "\" is the escape character in regular expressions, which can lead to a plague of backslashes. To match a literal "\" in the text, the regular expression written in the programming language needs four backslashes, "\\\\": the first two and the last two are each escaped by the programming language into a single backslash, giving two backslashes, which the regular expression engine then interprets as one literal backslash.

Raw strings in Python solve this problem neatly: the regular expression in this example can be written as r"\\". Similarly, "\\d", which matches a digit, can be written as r"\d".
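
A short sketch of the two spellings side by side. The sample value `path` is an assumption made for this example.

```python
import re

path = "C:\\temp"  # the actual string contains a single backslash

# Ordinary string literal: four backslashes in the source code.
normal = re.search("\\\\", path)
# Raw string literal: two backslashes express the same pattern.
raw = re.search(r"\\", path)
print(normal.group() == raw.group())  # True

# "\\d" and r"\d" are the same pattern as well.
digits_normal = re.findall("\\d", "a1b22")
digits_raw = re.findall(r"\d", "a1b22")
print(digits_normal, digits_raw)
```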

    • Python re module

Python ships with the re module, which provides support for regular expressions. The main usage is as follows:

    # Returns a pattern object
    re.compile(string[, flag])
    # The following functions are used for matching

The pattern concept:

A pattern can be understood as a compiled matching rule. How do we get one? Simply call re.compile. For example:

    pattern = re.compile(r'hello')

We pass a raw string as the argument, re.compile builds a pattern object from it, and we then use that object for further matching.

You may also have noticed a second parameter, flags. Its meaning is as follows:

The flags parameter sets the matching mode. Values can be combined with the bitwise OR operator '|' so that they take effect together, for example re.I | re.M.

The optional values are:

• re.I (full name: IGNORECASE): Ignore case (full names are given in parentheses, here and below).
• re.M (full name: MULTILINE): Multiline mode; changes the behavior of '^' and '$'.
• re.S (full name: DOTALL): Dot-matches-all mode; changes the behavior of '.'.
• re.L (full name: LOCALE): Makes the predefined character classes \w \W \b \B \s \S depend on the current locale setting.
• re.U (full name: UNICODE): Makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode.
• re.X (full name: VERBOSE): Verbose mode. In this mode the regular expression can span multiple lines, whitespace is ignored, and comments can be added.
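
The effect of several of these flags can be sketched as follows. The sample text and the date pattern are assumptions chosen for illustration.

```python
import re

text = "first line\nSecond LINE"

# re.I: case is ignored.
ignore_case = re.findall(r"line", text, re.I)      # ['line', 'LINE']

# re.M: '^' also matches at the start of every line.
line_starts = re.findall(r"^\w+", text, re.M)      # ['first', 'Second']

# re.S: '.' also matches the newline character.
without_s = re.search(r"first.*LINE", text)        # None
with_s = re.search(r"first.*LINE", text, re.S)     # a match object

# Flags combined with bitwise OR take effect together.
combined = re.findall(r"^second", text, re.I | re.M)   # ['Second']

# re.X: whitespace is ignored and comments are allowed in the pattern.
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
date_parts = date_pattern.match("2024-05").groups()    # ('2024', '05')

print(ignore_case, line_starts, combined, date_parts)
```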

To use this pattern we need other methods of the re module, such as re.match, which are described below.

Note: The flags parameter of the methods below has the same meaning. If the flags were already specified when the pattern was compiled, they do not need to be passed again to those methods.
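
A minimal sketch of this note: the flag is given once to re.compile, and the resulting pattern object is then used without passing it again. The input string is an assumed sample.

```python
import re

# The flag is supplied once, when the pattern is compiled.
pattern = re.compile(r"hello", re.I)

# The pattern object's own match method needs no flags argument.
compiled_result = pattern.match("Hello world")

# Equivalent module-level call, where the flag is passed directly.
direct_result = re.match(r"hello", "Hello world", re.I)

print(compiled_result.group(), direct_result.group())
```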

    • re.match function

re.match tries to match a pattern at the beginning of the string.

Function syntax:

    re.match(pattern, string, flags=0)

Function parameters:

    Parameter   Description
    pattern     The regular expression to match.
    string      The string to match against.
    flags       Flags that control how the regular expression is matched, such as case sensitivity and multiline matching.

If the match succeeds, re.match returns a match object; otherwise it returns None.

We can use the match object methods group(num) or groups() to retrieve the matched text.

    Method          Description
    group(num=0)    Returns the string matched by the entire expression. group() can also take several group numbers at once, in which case it returns a tuple containing the values of those groups.
    groups()        Returns a tuple containing the strings of all groups, from group 1 to the last group.
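
The two methods can be compared in a short sketch. The pattern and the input string are assumptions made for this example.

```python
import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

whole = m.group()       # 'Isaac Newton': the entire match
first = m.group(1)      # 'Isaac': the first group only
pair = m.group(1, 2)    # ('Isaac', 'Newton'): several groups at once
all_groups = m.groups() # ('Isaac', 'Newton'): every group, starting at 1

print(whole, first, pair, all_groups)
```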

Example:

    #!/usr/bin/python
    import re

    line = "Cats is smarter than dogs"

    matchObj = re.match(r'(.*) is (.*?) .*', line, re.M | re.I)

    if matchObj:
        print("matchObj.group():", matchObj.group())
        print("matchObj.group(1):", matchObj.group(1))
        print("matchObj.group(2):", matchObj.group(2))
    else:
        print("No match!!")

Running the example above produces:

    matchObj.group(): Cats is smarter than dogs
    matchObj.group(1): Cats
    matchObj.group(2): smarter

    • re.search method

re.search scans the entire string and returns the first successful match.

Function syntax:

    re.search(pattern, string, flags=0)

Example:

    #!/usr/bin/python
    import re

    line = "Cats is smarter than dogs"

    matchObj = re.search(r'(.*) is (.*?) .*', line, re.M | re.I)

    if matchObj:
        print("matchObj.group():", matchObj.group())
        print("matchObj.group(1):", matchObj.group(1))
        print("matchObj.group(2):", matchObj.group(2))
    else:
        print("No match!!")

Execution result:

    matchObj.group(): Cats is smarter than dogs
    matchObj.group(1): Cats
    matchObj.group(2): smarter

    • The difference between re.match and re.search

re.match matches only at the beginning of the string: if the start of the string does not conform to the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.

Example:

    import re

    line = "Cats is smarter than dogs"

    matchObj = re.match(r'dogs', line, re.M | re.I)
    if matchObj:
        print("match --> matchObj.group():", matchObj.group())
    else:
        print("No match!!")

    searchObj = re.search(r'dogs', line, re.M | re.I)
    if searchObj:
        print("search --> searchObj.group():", searchObj.group())
    else:
        print("No match!!")

Result:

    No match!!
    search --> searchObj.group(): dogs

    • Searching and replacing

The Python re module provides re.sub to replace matches in a string.

Syntax:

    re.sub(pattern, repl, string, count=0)

Returns the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.

The optional parameter count is the maximum number of pattern occurrences to replace; it must be a non-negative integer. The default value 0 means that all matches are replaced.

Example:

    #!/usr/bin/python
    import re

    phone = "2004-959-559 # This is Phone number"

    # Delete the Python-style comment
    num = re.sub(r'#.*$', "", phone)
    print("Phone Num:", num)

    # Remove anything other than digits
    num = re.sub(r'\D', "", phone)
    print("Phone Num:", num)

Result:

    Phone Num: 2004-959-559
    Phone Num: 2004959559
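
The count parameter described earlier can be illustrated with a small sketch. The variable names are assumptions made for this example.

```python
import re

number = "2004-959-559"

# count omitted (default 0): every match is replaced.
all_removed = re.sub(r"-", "", number)            # '2004959559'

# count=1: only the leftmost match is replaced.
first_removed = re.sub(r"-", "", number, count=1) # '2004959-559'

# Pattern not found: the string comes back unchanged.
unchanged = re.sub(r"x", "", number)              # '2004-959-559'

print(all_removed, first_removed, unchanged)
```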
