Python Regular Expressions

Source: Internet
Author: User

1. Regular expression basics

1.1. A brief introduction

Regular expressions are not specific to Python. They are a powerful tool for working with strings, with their own syntax and an independent processing engine. They may be less efficient than Python's built-in str methods, but they are far more powerful. Because of this, the core regular expression syntax is largely the same in every language that provides it. Languages differ only in how much of the syntax they support, and the unsupported features are usually the less common ones. If you have already used regular expressions in another language, a quick look is enough to get started.

The matching process works roughly as follows. The engine takes the characters of the expression in turn and compares each one with the corresponding character in the text. If every character matches, the match succeeds; as soon as one character fails to match, the match fails. Quantifiers and boundaries make the process a little different, but it is still easy to follow once you have worked through a few examples.

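Python supports the standard regular expression metacharacters and syntax. These include character classes such as \d and [...], quantifiers such as * + ? {m,n}, anchors such as ^ and $, alternation with |, and groups with (...).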
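The short sketch below shows a few of the most common metacharacters at work. The sample strings are invented for illustration, and the patterns are deliberately simple (the e-mail pattern, for instance, is not a complete address validator).

```python
import re

text = "Order 66 shipped on 2024-05-17 to bob@example.com"

# \d matches one digit; {n} repeats the preceding item exactly n times
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# [...] is a character class; + means "one or more"
emails = re.findall(r"[\w.]+@[\w.]+", text)

# ^ anchors at the start of the string; (a|b) is a group with alternation;
# \b is a word boundary
starts_with_order = re.match(r"^(Order|Invoice)\b", text) is not None

# ? makes the preceding item optional
colours = re.findall(r"colou?r", "color colour colr")

# . matches any single character except a newline
dots = re.findall(r"b.b", "bob bib b\nb")

print(dates)              # ['2024-05-17']
print(emails)             # ['bob@example.com']
print(starts_with_order)  # True
print(colours)            # ['color', 'colour']
print(dots)               # ['bob', 'bib']
```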

1.2. Greedy and non-greedy quantifiers

Regular expressions are typically used to find matching strings in text. Quantifiers in Python are greedy by default (a few languages may default to non-greedy), which means they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and try to match as few characters as possible. For example, the regex "ab*" applied to "abbbc" finds "abbb", while the non-greedy "ab*?" finds only "a".
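Here is the same comparison in code. It uses the string from the example above plus a second, hypothetical HTML snippet where the difference tends to matter more in practice.

```python
import re

# Greedy: b* takes as many "b" characters as it can
greedy = re.search(r"ab*", "abbbc").group()
# Non-greedy: b*? takes as few as it can, here zero
lazy = re.search(r"ab*?", "abbbc").group()

html = "<b>bold</b> and <i>italic</i>"
# Greedy .+ runs to the LAST ">", swallowing everything in between
greedy_tags = re.findall(r"<.+>", html)
# Non-greedy .+? stops at the FIRST ">", so each tag is matched separately
lazy_tags = re.findall(r"<.+?>", html)

print(greedy)       # abbb
print(lazy)         # a
print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
```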

1.3. The backslash problem

As in most programming languages, "\" is the escape character in regular expressions, which can lead to a pile-up of backslashes. Suppose you want to match a literal "\" in the text with a regex written as an ordinary string literal. You then need four backslashes, "\\\\". The language turns the first pair and the second pair into one backslash each, and the regex engine then turns those two backslashes into one literal backslash. Python's raw strings solve this problem neatly: the regex in this example can be written r"\\". Similarly, "\\d", which matches a digit, can be written r"\d". With raw strings you no longer have to worry about missing backslashes, and expressions are easier to read.
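The sketch below checks the claim directly. The ordinary literal "\\\\" and the raw literal r"\\" produce the same two-character pattern, and both match a single literal backslash. The Windows-style path is just sample text.

```python
import re

# Ordinary literal: Python turns "\\\\" into two backslash characters,
# which the regex engine then reads as one literal backslash.
plain = "\\\\"
# Raw literal: Python leaves both backslashes alone, giving the same two characters.
raw = r"\\"

text = "C:\\temp\\new"  # this string contains two literal backslashes

found_plain = re.findall(plain, text)
found_raw = re.findall(raw, text)

# r"\d" is the raw-string spelling of "\\d"
digits = re.findall(r"\d", "a1b2")

print(plain == raw, len(raw))  # True 2
print(found_raw)               # ['\\', '\\']
print(digits)                  # ['1', '2']
```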

1.4. Matching modes

Regular expressions offer several matching modes, such as ignoring case and multiline matching. They are described below, together with re.compile(pattern[, flags]), the factory method of the Pattern class.

2. The re module

2.1. Getting started with re

Python supports regular expressions through the re module. The usual workflow has three steps. First, compile the string form of the regular expression into a Pattern object. Next, use the Pattern object to process the text and obtain a match result (a Match object). Finally, use the Match object to retrieve information and do further work.

# encoding: utf-8
import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'hello')

# Match the text against the pattern; match() returns None if there is no match
match = pattern.match('hello world!')

if match:
    # Use the Match object to get group information
    print(match.group())

### output ###
# hello

re.compile(strPattern[, flag]):

This method is the factory method of the Pattern class. It compiles a regular expression given as a string into a Pattern object. The second argument, flag, is the matching mode. Several values can be combined with the bitwise OR operator '|' so that they take effect together, e.g. re.I | re.M. Alternatively, the mode can be specified inside the regex string itself: re.compile('pattern', re.I | re.M) is equivalent to re.compile('(?im)pattern').
The possible values are:

      • re.I (re.IGNORECASE): ignore case (the full name is given in parentheses; the same applies below)
      • re.M (re.MULTILINE): multiline mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line
      • re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.' so that it also matches a newline
      • re.L (re.LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale (in Python 3 this flag is allowed only with bytes patterns)
      • re.U (re.UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode (in Python 3 this is already the default for str patterns)
      • re.X (re.VERBOSE): verbose mode. In this mode, the regular expression can span multiple lines, whitespace is ignored, and comments can be added.
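The sketch below exercises several of the flags listed above. It combines flags with |, shows the equivalent inline (?im) form, and demonstrates re.S and re.X. The two equivalent patterns in the verbose part follow the decimal-number example used in the Python documentation. The sample strings are invented.

```python
import re

text = "Hello world\nhello again"

# Flags combined with bitwise OR ...
p1 = re.compile(r"^hello", re.I | re.M)
# ... and the same flags written inline at the start of the pattern
p2 = re.compile(r"(?im)^hello")
print(p1.findall(text))  # ['Hello', 'hello']
print(p2.findall(text))  # ['Hello', 'hello']

# re.S: without it "." does not match "\n"; with it, it does
print(re.search(r"a.b", "a\nb"))                   # None
print(re.search(r"a.b", "a\nb", re.S) is not None)  # True

# re.X: whitespace and "#" comments inside the pattern are ignored,
# so these two patterns are equivalent
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
print(a.match("3.14").group(), b.match("3.14").group())  # 3.14 3.14
```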
Match

A Match object is the result of a match. It holds a lot of information about that match, which you can retrieve through the read-only attributes and methods that Match provides.

Attributes:

    1. string: the text that was matched against.
    2. re: the Pattern object used for the match.
    3. pos: the index in the text at which the regex started searching. It has the same value as the parameter of the same name of Pattern.match() and Pattern.search().
    4. endpos: the index in the text at which the regex stopped searching. It has the same value as the parameter of the same name of Pattern.match() and Pattern.search().
    5. lastindex: the index of the last captured group. It is None if no group was captured.
    6. lastgroup: the name of the last captured group. It is None if that group has no name or if no group was captured.
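A minimal sketch of these attributes on a made-up input. The pattern has one unnamed and one named group, so both lastindex and lastgroup have something to report.

```python
import re

pattern = re.compile(r"(\w+) (?P<second>\w+)")
m = pattern.search("hello world!", 0)

print(m.string)          # hello world!
print(m.re is pattern)   # True: the Pattern object that produced this match
print(m.pos, m.endpos)   # 0 12
print(m.lastindex)       # 2
print(m.lastgroup)       # second

# When the last captured group has no name, lastgroup is None
m2 = re.match(r"(a)(b)", "ab")
print(m2.lastindex, m2.lastgroup)  # 2 None
```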

Methods:

      1. group([group1, ...]):  
        Returns the substring captured by one or more groups; when several arguments are given, the results are returned as a tuple. group1 can be a group number or a group name. Number 0 stands for the whole matched substring, and calling group() with no arguments returns group(0). A group that captured nothing returns None; a group that captured several times returns the last substring it captured.
      2. groups([default]):  
        Returns the substrings captured by all groups as a tuple, equivalent to calling group(1, 2, ..., last). default is the value used for groups that captured nothing; it defaults to None.
      3. groupdict([default]):  
        Returns a dictionary whose keys are the names of the named groups and whose values are the substrings those groups captured; unnamed groups are not included. default has the same meaning as above.
      4. start([group]):  
        Returns the index in string where the substring captured by the given group starts (the index of its first character). group defaults to 0.
      5. end([group]):  
        Returns the index in string where the substring captured by the given group ends (the index of its last character + 1). group defaults to 0.
      6. span([group]):  
        Returns (start(group), end(group)).
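The methods above can be tried on a small, invented example. It also covers two edge cases from the descriptions: groups() with a default for a group that did not participate, and a repeated group that keeps only its last capture.

```python
import re

m = re.match(r"(\w+) (\w+)(?P<sign>.*)", "hello world!")

print(m.group())      # hello world!   (same as m.group(0))
print(m.group(1, 2))  # ('hello', 'world')
print(m.groups())     # ('hello', 'world', '!')
print(m.groupdict())  # {'sign': '!'}
print(m.start(2), m.end(2), m.span(2))  # 6 11 (6, 11)

# groups(default): an optional group that did not participate
opt = re.match(r"(a)(b)?", "a")
print(opt.groups())     # ('a', None)
print(opt.groups("-"))  # ('a', '-')

# A group that captures several times keeps only its last capture
last = re.match(r"(\w)+", "abc").group(1)
print(last)  # c
```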
import re

# match: matches from the start of the string; returns a Match object on success, None otherwise
# match(pattern, string, flags=0)
#   pattern: the regular expression
#   string:  the string to match
#   flags:   matching mode:
#     X  VERBOSE     Ignore whitespace and comments for nicer looking REs.
#     I  IGNORECASE  Perform case-insensitive matching.
#     M  MULTILINE   "^" matches the beginning of lines (after a newline) as well as the string.
#                    "$" matches the end of lines (before a newline) as well as the end of the string.
#     S  DOTALL      "." matches any character at all, including the newline.
#     A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D match the corresponding ASCII
#                    character categories (rather than the whole Unicode categories, which is the default).
#                    For bytes patterns, this flag is the only available behaviour and needn't be specified.
#     L  LOCALE      Make \w, \W, \b, \B dependent on the current locale.
#     U  UNICODE     For compatibility only. Ignored for string patterns (it is the default),
#                    and forbidden for bytes patterns.

origin = "hello alex bcd alex lge alex acd 19"  # sample text (assumed for illustration)

# Without groups
r = re.match(r"h\w+", origin)
print(r.group())      # the whole matched text
print(r.groups())     # tuple of all captured groups (empty here)
print(r.groupdict())  # dict of all named groups (empty here)

# With groups
# Why use groups? To extract specific parts of a successful match: the whole
# pattern is matched first, then the grouped parts are pulled out of that match.
r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole matched text
print(r.groups())     # tuple of all captured groups
print(r.groupdict())  # dict of all named groups, keyed by group name

search(string[, pos[, endpos]]) | re.search(pattern, string[, flags]):
This method looks for a substring of string that the pattern can match. It first tries to match the pattern at index pos of string. If the match succeeds there, it returns a Match object; otherwise it increments pos by 1 and tries again. If there is still no match when pos reaches endpos, it returns None.
pos and endpos default to 0 and len(string). The module-level re.search() does not accept these two parameters; instead, its flags argument specifies the matching mode used when compiling the pattern.

# search: scans the whole string and returns the first match; returns None if nothing matches
# search(pattern, string, flags=0)

import re

origin = "hello alex bcd alex lge alex acd 19"  # sample text (assumed for illustration)

# Without groups
r = re.search(r"a\w+", origin)
print(r.group())      # the whole matched text
print(r.groups())     # tuple of all captured groups (empty here)
print(r.groupdict())  # dict of all named groups (empty here)

# With groups
r = re.search(r"a(\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole matched text
print(r.groups())     # tuple of all captured groups
print(r.groupdict())  # dict of all named groups, keyed by group name
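The difference between match() and search(), and the effect of pos and endpos on a compiled Pattern, can be sketched as below (sample text invented). The last line reflects a documented detail: pos is not the same as slicing the string, so '^' still anchors at the real beginning of the string.

```python
import re

p = re.compile(r"world")
s = "hello world"

# match() only tries position pos (0 by default); search() scans forward
print(p.match(s))          # None
print(p.search(s).span())  # (6, 11)

# pos: start trying at index 6, so match() now succeeds
print(p.match(s, 6).group())  # world

# endpos: the text is treated as if it ended at index 8, so "world" is cut off
print(p.search(s, 0, 8))  # None

# pos is not the same as slicing: "^" still means the real start of the string
print(re.compile(r"^world").search(s, 6))  # None
```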

split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):
Splits string at every substring that the pattern matches and returns the pieces as a list. maxsplit sets the maximum number of splits; if it is not given, the string is split at every match.

# split: splits the string wherever the regex matches
# split(pattern, string, maxsplit=0, flags=0)
#   pattern:  the regular expression
#   string:   the string to split
#   maxsplit: maximum number of splits
#   flags:    matching mode

import re

# Without groups
origin = "hello alex bcd alex lge alex acd"
r = re.split("alex", origin, maxsplit=1)
print(r)

# With groups: the captured text is kept in the result list
r1 = re.split("(alex)", origin, maxsplit=1)
print(r1)
r2 = re.split("(al(ex))", origin, maxsplit=1)
print(r2)
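A few more split() cases, sketched on made-up strings. A character class handles several delimiters at once, a capturing group keeps the delimiters in the result, and maxsplit limits the number of splits. maxsplit is passed by keyword here because Python 3.13 deprecates passing it positionally.

```python
import re

# A character class splits on any mix of commas, semicolons and whitespace
parts = re.split(r"[,;\s]+", "a, b;c  d")

# A capturing group keeps the delimiters in the result list
with_delims = re.split(r"([,;])", "a,b;c")

# maxsplit (passed by keyword) limits the number of splits
limited = re.split(r"[,;\s]+", "a, b;c  d", maxsplit=1)

print(parts)        # ['a', 'b', 'c', 'd']
print(with_delims)  # ['a', ',', 'b', ';', 'c']
print(limited)      # ['a', 'b;c  d']
```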

findall(string[, pos[, endpos]]) | re.findall(pattern, string[, flags]):
Searches string and returns all matching substrings as a list.

# findall: returns all non-overlapping matches as a list. Without groups, each item is the
# whole match; with one group, each item is that group's string; with several groups, each
# item is a tuple. Empty matches are also included in the result.
# findall(pattern, string, flags=0)

import re

origin = "hello alex bcd abcd lge acd"

# Without groups
r = re.findall(r"a\w+", origin)
print(r)

# With groups
r = re.findall(r"a((\w*)c)(d)", origin)
print(r)
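The rule about how findall() shapes its result depending on the number of groups can be seen in this sketch, which uses an invented "key=value" string.

```python
import re

text = "a=1 b=22 c=333"

no_groups = re.findall(r"\d+", text)          # whole matches
one_group = re.findall(r"(\w)=\d+", text)     # just the group
two_groups = re.findall(r"(\w)=(\d+)", text)  # one tuple per match
empties = re.findall(r"x*", "ab")             # empty matches are kept

print(no_groups)   # ['1', '22', '333']
print(one_group)   # ['a', 'b', 'c']
print(two_groups)  # [('a', '1'), ('b', '22'), ('c', '333')]
print(empties)     # ['', '', '']
```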

sub(repl, string[, count]) | re.sub(pattern, repl, string[, count]):
Returns the string obtained by replacing every matched substring in string with repl.
When repl is a string, groups can be referenced with \id, \g<id>, or \g<name>. The number 0 cannot be used in the \id form (\0 is read as an octal escape), but \g<0> refers to the whole match.
When repl is a function, it must take a single argument (the Match object) and return the replacement string. The returned string is used literally, so group references in it are not expanded.
count sets the maximum number of replacements; if it is not given, every match is replaced.

# sub: replaces the parts of the string that match
# sub(pattern, repl, string, count=0, flags=0)
#   pattern: the regular expression
#   repl:    replacement string, or a callable
#   string:  the string to search
#   count:   maximum number of replacements
#   flags:   matching mode

import re

# Unrelated to groups
origin = "hello alex bcd alex lge alex acd"
r = re.sub(r"a\w+", "999", origin, count=2)
print(r)
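The forms of repl described above are shown in this sketch on invented inputs: numbered references, named references, \g<0>, a function, and count. count is passed by keyword because Python 3.13 deprecates passing it positionally.

```python
import re

# Numbered group references (\1, \2) in a raw replacement string
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

# Named references (\g<name>) and \g<0> for the whole match
named = re.sub(r"(?P<user>\w+)@(?P<host>\w+)", r"\g<host>/\g<user>", "user@example")
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# A function as repl: it receives the Match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22")

# count (passed by keyword) caps the number of replacements
capped = re.sub(r"\d", "#", "a1b2c3", count=2)

print(swapped)    # example at user
print(named)      # example/user
print(bracketed)  # a[1]b[22]
print(doubled)    # a2b44
print(capped)     # a#b#c3
```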

subn(repl, string[, count]) | re.subn(pattern, repl, string[, count]):
Returns a tuple (sub(repl, string[, count]), number of substitutions made).

import re

p = re.compile(r'(\w+) (\w+)')
s = 'I say, hello world!'

# Swap each pair of words using group references
print(p.subn(r'\2 \1', s))

def func(m):
    # Title-case both captured words and join them with a space
    return m.group(1).title() + ' ' + m.group(2).title()

print(p.subn(func, s))

### output ###
# ('say I, world hello!', 2)
# ('I Say, Hello World!', 2)

  
