Python Regular Expressions

Source: Internet
Author: User
Tags: readable expression engine

Python regular expression matching process:

The regular expression engine compiles the text of a regular expression into a regular expression object. That object is then matched against the target text and produces a match result, which carries information such as the matched text, the captured groups, and their indexes in the text.

Regular expression meta-characters:

Greedy and non-greedy quantifiers:

Regular expressions are typically used to find matching strings in text. Quantifiers in Python regular expressions are greedy by default (in a few languages the default may be non-greedy): they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example, the regular expression 'ab*' finds 'abbbb' in 'abbbbc', whereas 'ab*?' finds only 'a'. Appending '?' to a quantifier such as '*' or '+' switches it to non-greedy mode.
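The difference between the two modes can be checked directly. The following is a minimal sketch; the variable names and the extra tag-like sample string are illustrative assumptions, not part of the original article.

```python
import re

text = "abbbbc"

# Greedy: b* takes as many 'b' characters as it can.
greedy = re.match(r"ab*", text).group()

# Non-greedy: b*? takes as few as it can, which here is zero.
lazy = re.match(r"ab*?", text).group()

print(greedy)  # abbbb
print(lazy)    # a

# A second, hypothetical sample where the difference is easier to see.
html = "<b>bold</b>"
greedy_tag = re.match(r"<.*>", html).group()
lazy_tag = re.match(r"<.*?>", html).group()

print(greedy_tag)  # <b>bold</b>
print(lazy_tag)    # <b>
```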

The backslash plague

As in most programming languages, "\" is the escape character in regular expressions, which can lead to a pile-up of backslashes. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the string literal turns each pair into one backslash, leaving two, and the regular expression engine then reads those two as one escaped backslash. Python's raw strings solve this problem neatly: the regular expression in this example can be written as r"\\". Similarly, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expressions are easier to read.
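As a rough illustration, the ordinary and raw spellings can be compared side by side. The sample path and variable names below are made up for this sketch.

```python
import re

# The text to search contains exactly one backslash: C:\temp
path = "C:\\temp"

# Ordinary literal: four backslashes typed, two stored in the string.
normal = "\\\\"

# Raw literal: two backslashes typed, two stored in the string.
raw = r"\\"

print(normal == raw)   # True
print(len(raw))        # 2

# Both forms are the regular expression for one literal backslash.
found = re.search(raw, path).group()
print(found)           # \

# The same applies to \d: both spellings find the digits.
digits_raw = re.findall(r"\d", "a1b22")
digits_normal = re.findall("\\d", "a1b22")
print(digits_raw)      # ['1', '2', '2']
print(digits_normal)   # ['1', '2', '2']
```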

The re module

Python supports regular expressions through the re module. The usual steps are: compile the string form of the regular expression into a Pattern instance, use the Pattern instance to process the text and obtain a match result (a Match instance), and finally use the Match instance to get the information you need and carry on with other work.

    import re

    # Compile the regular expression into a Pattern object
    pattern = re.compile(r'hello')

    # Match the text with the Pattern; the result is None when there is no match
    match = pattern.match('hello world!')

    if match:
        # Use the Match object to get group information
        print(match.group())

    ### output ###
    # hello

re.compile(strPattern[, flag])

This is the factory method of the Pattern class. It compiles a regular expression given as a string into a Pattern object. The second parameter, flag, is the matching mode. Several modes can be combined with the bitwise OR operator '|' so that they take effect at the same time, for example re.I | re.M. The mode can also be written inside the expression itself: re.compile('pattern', re.I | re.M) has the same effect as re.compile('(?im)pattern').

Common matching modes are:

re.I: ignore case

re.M: multi-line mode, in which '^' and '$' also match at the start and end of each line
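A short sketch of the two equivalent spellings follows. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "first line\nSECOND line"

# Flags passed as the second argument, combined with |
with_flags = re.compile(r"^second", re.I | re.M)

# The same flags written inline at the start of the expression
inline = re.compile(r"(?im)^second")

m1 = with_flags.search(text)
m2 = inline.search(text)
print(m1.group(), m2.group())  # SECOND SECOND

# Without re.M, ^ only matches at the very start of the text,
# so the second line is not found.
no_multiline = re.search(r"^second", text, re.I)
print(no_multiline)  # None
```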

Match

The Match object is the result of one match. It holds a good deal of information about that match, which can be read through the read-only properties and methods that Match provides.

Properties:

    1. string: The text used for matching.
    2. re: The Pattern object used for matching.
    3. pos: The index in the text at which the regular expression started searching. It has the same value as the parameter of the same name passed to Pattern.match() or Pattern.search().
    4. endpos: The index in the text at which the regular expression stopped searching. It has the same value as the parameter of the same name passed to Pattern.match() or Pattern.search().
    5. lastindex: The number of the last captured group. None if no group was captured.
    6. lastgroup: The name of the last captured group. None if that group has no name or if no group was captured.

Methods:

      1. group([group1, ...]):
        Returns the string captured by one or more groups; when several arguments are given, the result is a tuple. group1 can be a group number or a group name. Number 0 stands for the entire matched substring, and group() with no arguments returns group(0). A group that did not capture anything returns None; a group that captured several times returns its last captured substring.
      2. groups([default]):
        Returns the strings captured by all groups as a tuple. Equivalent to calling group(1, 2, ..., last). default is the value used for groups that did not capture anything; it defaults to None.
      3. groupdict([default]):
        Returns a dictionary whose keys are the names of the named groups and whose values are the substrings those groups captured. Groups without a name are not included. default has the same meaning as in groups().
      4. start([group]):
        Returns the start index in string of the substring captured by the specified group (the index of the first character of the substring). group defaults to 0.
      5. end([group]):
        Returns the end index in string of the substring captured by the specified group (the index of the last character of the substring + 1). group defaults to 0.
      6. span([group]):
        Returns (start(group), end(group)).
      7. expand(template):
        Substitutes the captured groups into template and returns the result. template can refer to groups with \id, \g<id>, or \g<name>. \id and \g<id> are equivalent, with two exceptions: \0 is not a group reference (use \g<0> for the entire match), and \10 is read as group 10, so to write group 1 followed by the character '0', use \g<1>0.

    import re

    m = re.match(r'(\w+) (\w+)(?P<sign>.*)', 'hello world!')

    print(m.string)
    print(m.group())
    print(m.groups())
    print(m.groupdict())
    print(m.start(1))
    print(m.end(2))
    print(m.span(3))
    print(m.expand(r'\2 \3\g<1>0000000'))

    ### output ###
    # hello world!
    # hello world!
    # ('hello', 'world', '!')
    # {'sign': '!'}
    # 0
    # 11
    # (11, 12)
    # world !hello0000000
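The properties and the default argument described above can be seen with a group that does not take part in the match. This is only a sketch; the expression with an optional second word and the placeholder value "n/a" are assumptions made for the example.

```python
import re

# The second word is optional, so its group may capture nothing.
p = re.compile(r"(\w+)(?: (?P<second>\w+))?")

m = p.match("hello")
print(m.groups())           # ('hello', None)
print(m.groups("n/a"))      # ('hello', 'n/a')
print(m.groupdict("n/a"))   # {'second': 'n/a'}
print(m.lastindex)          # 1
print(m.lastgroup)          # None (group 1 has no name)
print(m.pos, m.endpos)      # 0 5

m2 = p.match("hello world")
print(m2.lastindex)         # 2
print(m2.lastgroup)         # second
```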
Pattern

The Pattern object is a compiled regular expression. The methods it provides are used to match and search text.

A Pattern cannot be instantiated directly; it has to be constructed with re.compile().

Pattern provides several read-only properties for getting information about the expression:

1. pattern: The expression string used at compile time.

2. flags: The matching mode used at compile time, as a number.

3. groups: The number of groups in the expression.

4. groupindex: A dictionary whose keys are the names of the named groups in the expression and whose values are the corresponding group numbers. Groups without a name are not included.

    import re

    p = re.compile(r'(\w+) (\w+)(?P<id>.*)', re.DOTALL)

    print(p.pattern)
    print(p.flags)
    print(p.groups)
    # groupindex is a read-only mapping; dict() gives a plain dictionary to print
    print(dict(p.groupindex))

    ### output (Python 3) ###
    # (\w+) (\w+)(?P<id>.*)
    # 48
    # 3
    # {'id': 3}

Methods of Pattern:

    1. match(string[, pos[, endpos]]) | re.match(pattern, string[, flags]):
      Attempts to match pattern starting at index pos of string. It returns a Match object if pattern matches all the way through, and None if pattern fails to match along the way or if endpos is reached before pattern has finished matching.
      pos and endpos default to 0 and len(string). re.match() cannot take these two parameters; its flags parameter specifies the matching mode used to compile pattern.
      Note: this method is not a full match. If string still has characters left after pattern has matched, the match is still considered successful. For a full match, add the boundary anchor '$' to the end of the expression.
      The example in the re module section above uses this method.
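The effect of pos, endpos, and a trailing '$' can be sketched as follows. The digit expression and sample strings are assumptions chosen for the example.

```python
import re

p = re.compile(r"\d+")

# Trailing characters after the match are ignored.
prefix = p.match("123abc")
print(prefix.group())        # 123

# match() only tries at pos, so digits later in the string are not found.
miss = p.match("abc123")
print(miss)                  # None

# Starting at index 3 makes the same text match.
from_pos = p.match("abc123", 3)
print(from_pos.group())      # 123

# endpos limits how far the match may extend.
limited = p.match("123456", 0, 3)
print(limited.group())       # 123

# With '$' the whole remaining text has to match.
strict = re.compile(r"\d+$")
strict_miss = strict.match("123abc")
strict_hit = strict.match("123")
print(strict_miss)           # None
print(strict_hit.group())    # 123
```

Python 3.4 and later also provide Pattern.fullmatch(), which requires the whole string to match without adding '$' to the expression.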
    2. search(string[, pos[, endpos]]) | re.search(pattern, string[, flags]):
      Looks for a substring of string that pattern can match. It attempts to match pattern starting at index pos; if that succeeds it returns a Match object, and if not it adds 1 to pos and tries again. It returns None if nothing has matched by the time pos reaches endpos.
      pos and endpos default to 0 and len(string). re.search() cannot take these two parameters; its flags parameter specifies the matching mode used to compile pattern.
        import re

        # Compile the regular expression into a Pattern object
        pattern = re.compile(r'world')

        # Use search() to find a matching substring; it returns None when there is none.
        # match() would not succeed in this example.
        match = pattern.search('hello world!')

        if match:
            # Use the Match object to get group information
            print(match.group())

        ### output ###
        # world

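        3. split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):
        Splits string wherever pattern matches, using each match as a delimiter, and returns the pieces as a list. maxsplit is the maximum number of splits; when it is not given, the whole string is split.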

        import re

        p = re.compile(r'\d+')
        print(p.split('one1two2three3four4'))

        ### output ###
        # ['one', 'two', 'three', 'four', '']
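The maxsplit parameter can be sketched on the same input. The variable names are illustrative; the last call, with a capturing group in the delimiter, is an extra detail of re.split() that the text above does not cover.

```python
import re

p = re.compile(r"\d+")
text = "one1two2three3four4"

# No limit: split at every run of digits.
all_parts = p.split(text)
print(all_parts)   # ['one', 'two', 'three', 'four', '']

# At most two splits; the rest of the string is kept as the last item.
two_splits = p.split(text, 2)
print(two_splits)  # ['one', 'two', 'three3four4']

# If the delimiter has a capturing group, the delimiters are kept in the list.
with_delims = re.split(r"(\d+)", "one1two2three")
print(with_delims)  # ['one', '1', 'two', '2', 'three']
```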

        4. findall: finds all matching substrings and returns them as a list

        import re

        p = re.compile(r'\d+')
        print(p.findall('one1two2three3four4'))

        ### output ###
        # ['1', '2', '3', '4']
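What findall() returns depends on the groups in the expression. The sketch below uses a made-up key=value sample to show the three cases: no group, one group, and several groups.

```python
import re

text = "width=10, height=20"

# No groups: a list of the whole matches.
whole = re.findall(r"\w+=\d+", text)
print(whole)   # ['width=10', 'height=20']

# One group: a list of what that group captured.
values = re.findall(r"\w+=(\d+)", text)
print(values)  # ['10', '20']

# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)   # [('width', '10'), ('height', '20')]
```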

        5. finditer: finds all matches and returns an iterator that yields the Match objects in order

        import re
        p = re.compile(r'
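A minimal sketch of finditer() follows. The expression and sample text are assumptions for illustration and are not a reconstruction of the original example.

```python
import re

p = re.compile(r"\d+")
text = "one1two22three333"

# Each item produced by the iterator is a Match object,
# so the position of every match is available as well as its text.
found = []
for m in p.finditer(text):
    found.append((m.group(), m.span()))
    print(m.group(), m.span())

# Printed lines:
# 1 (3, 4)
# 22 (7, 9)
# 333 (14, 17)
```

Compared with findall(), this is useful when the positions or groups of each match are needed, or when the text is large and the matches should be processed one at a time.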

        6. sub(repl, string[, count]) | re.sub(pattern, repl, string[, count]):
        Replaces every matched substring in string with repl and returns the resulting string.
        When repl is a string, it can refer to groups with \id, \g<id>, or \g<name>. \0 is not a group reference; use \g<0> for the entire match.
        When repl is a function, it should accept a single argument (the Match object) and return the replacement string. Group references in the returned string are not expanded.
        count specifies the maximum number of replacements; when it is not given, all matches are replaced.

        import re

        p = re.compile(r'(\w+) (\w+)')
        s = 'I say, hello world!'

        print(p.sub(r'\2 \1', s))

        def func(m):
            return m.group(1).title() + ' ' + m.group(2).title()

        print(p.sub(func, s))

        ### output ###
        # say I, world hello!
        # I Say, Hello World!
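The count parameter and the \g<id> form can be sketched on a smaller input. The sample string and the doubling function are assumptions made for this example.

```python
import re

p = re.compile(r"\d+")
s = "a1b22c333"

# Replace every match.
all_replaced = p.sub("#", s)
print(all_replaced)    # a#b#c#

# Replace only the first two matches.
first_two = p.sub("#", s, count=2)
print(first_two)       # a#b#c333

# \g<1>0 means group 1 followed by the character '0';
# \10 would be read as group 10 instead.
appended = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(appended)        # a10b20

# A function as repl: it receives the Match object and returns the replacement.
doubled = p.sub(lambda m: str(int(m.group()) * 2), s)
print(doubled)         # a2b44c666
```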

        7. subn(repl, string[, count]) | re.subn(pattern, repl, string[, count]):
        Returns the tuple (new string, number of replacements), where the new string is the one sub() would return.
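A short sketch of subn(), reusing the word-swapping expression from the sub() example; the variable names are illustrative.

```python
import re

p = re.compile(r"(\w+) (\w+)")
s = "i say, hello world!"

# subn() returns the new string together with the number of replacements made.
result, n = p.subn(r"\2 \1", s)
print(result, n)   # say i, world hello! 2

# With count=1 only the first match is replaced.
result_one, n_one = p.subn(r"\2 \1", s, count=1)
print(result_one, n_one)   # say i, hello world! 1
```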
