Python RE Module

Source: Internet
Author: User

Contents

    • Introduction
      • Uses
    • Regular expression syntax
      • Greedy and non-greedy matching
      • Ordinary characters and special characters
    • Module-level functions of re
    • Regular expression objects
    • Match objects
    • Common examples
    • Precautions

Jamie Zawinski said:

Some people, when confronted with a problem, think "I know, I'll use regular expressions."
Now they have two problems.

In other words, people who reach for a regular expression to solve a problem end up with two problems instead of one.
The joke pokes fun at how hard regular expressions are to learn.

    • Python supports regular expressions through the re module.
    • Regular expressions are also called REs, regexes, regexps, or regex patterns. All of these names refer to the same thing.
    • A regex is in essence a small, highly specialized programming language. Many high-level languages embed regex support; Python provides it through the re module.
    • For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and to write the RE in a certain way in order to produce bytecode that runs faster.
    • Regular expressions are not all-powerful; some string-processing tasks cannot be handled with them.
    • A regular expression describes a class of strings. When a string conforms to the rules of a regular expression, we say that the string matches the regular expression.
    • Once you have a match you can do many things with it. Regular expressions are commonly used for:
      • Replacing: replace the matched parts of a string (sub)
      • Extracting: slice a match out of a string using its start and end indexes
      • Match tests: branch on the result of search, match, and so on
      • Counting: iterate over finditer and count the matches; other approaches also work
      • Filtering: filter items by the truth value of the match result
      • Containment: search
      • Full match: fullmatch
      • Splitting: split
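
A few of the uses listed above can be sketched as follows. This is only an illustration: the sample data and the variable names (log, errors, word_count, sliced) are made up for the example.

```python
import re

# Hypothetical sample data for the illustration.
log = ['ok 200', 'error 500', 'ok 204', 'error 404']
text = 'one two three'

# Filtering: keep the lines for which the search succeeds.
errors = [line for line in log if re.search(r'^error', line)]

# Counting: iterate over finditer and count the matches.
word_count = sum(1 for _ in re.finditer(r'\w+', text))

# Extracting: slice the original string with the match's start and end.
m = re.search(r'two', text)
sliced = text[m.start():m.end()]

print(errors)      # ['error 500', 'error 404']
print(word_count)  # 3
print(sliced)      # two
```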

Attention:

    1. Python string literals process escape sequences, and regular expressions, which are written as Python strings, have escape sequences of their own. To keep Python from interpreting a backslash before the regex engine sees it, write the pattern as a raw string, r'pattern', in which backslashes have no special meaning to Python. For example, r'\n' is two characters (a backslash and the letter n), while '\n' is a single newline character.
    2. The pattern and the string being searched must both be Unicode strings (str) or both be 8-bit strings (bytes). The two types cannot be mixed.
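
The two points above can be checked with a short sketch. The sample strings are invented for the illustration; only standard re calls are used.

```python
import re

# A raw string keeps the backslash, so the regex engine receives \b intact.
lengths = (len('\n'), len(r'\n'))
raw_hits = re.findall(r'\bcat\b', 'cat concat cat')
# Without the r prefix, '\b' is a backspace character, not a word boundary.
plain_hits = re.findall('\bcat\b', 'cat concat cat')

# A bytes pattern works on bytes input.
bytes_hit = re.search(rb'\d+', b'id=42').group()

# Mixing a str pattern with bytes input raises TypeError.
try:
    re.search(r'\d+', b'id=42')
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(lengths)     # (1, 2)
print(raw_hits)    # ['cat', 'cat']
print(plain_hits)  # []
print(bytes_hit)   # b'42'
print(mixed_ok)    # False
```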
Regular expression syntax
    • A regular expression is written as a string.
    • Complex expressions can be built from simpler ones.
    • Greedy and non-greedy matching
      • The quantifiers '*', '+', '?' and the like are greedy by default: they match as much of the string as possible.
      • Appending '?' to any of them makes it non-greedy: it matches as little as possible.
    • Regular expressions consist of ordinary characters and special characters
      • Ordinary characters
        • Characters such as 'A', 'B', 'C', 'd' match themselves; 'last', 'next', 'hello' are patterns made by concatenating ordinary characters.
      • Special characters
        • Such as '.', '()', '|', and so on.
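
The difference between greedy and non-greedy matching is easiest to see side by side. A minimal sketch, with a made-up sample string:

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: '.+' runs to the last '>' in the string.
greedy = re.findall(r'<.+>', text)
# Non-greedy: '.+?' stops at the first '>' it can.
lazy = re.findall(r'<.+?>', text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```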
Special character | Meaning | Comments
'.' | Matches any character except a newline | With the DOTALL flag set, it also matches a newline
'^' | Matches the start of the string | In MULTILINE mode it also matches immediately after each newline
'$' | Matches the end of the string, or just before the newline at the end of the string | In MULTILINE mode it also matches before each newline
'\b' | Matches the empty string, but only at the beginning or end of a word | r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo bar', but not 'foobar'
'\B' | Matches the empty string, but only where it is not at the beginning or end of a word | The opposite of \b
'*' | Matches 0 or more repetitions of the preceding RE | Greedy
'+' | Matches 1 or more repetitions of the preceding RE | Greedy. The preceding RE is the smallest unit before the symbol; for ordinary characters that is a single character, not everything before it. ab+ matches 'ab' or 'abbb', but not 'ababab'
'?' | Matches 0 or 1 repetitions of the preceding RE | Greedy
'*?', '+?', '??' | Non-greedy versions of *, + and ?: match as little as possible | Non-greedy
'{m}' | Matches exactly m repetitions of the preceding RE | The count is fixed, so greediness does not apply
'{m,n}' | Matches from m to n repetitions of the preceding RE | Greedy. If n is omitted, at least m repetitions
'{m,n}?' | Matches from m to n repetitions of the preceding RE, but as few as possible | Non-greedy
'\' | Escapes special characters | The escaped character is matched literally
'[]' | Indicates a set of characters | Special characters lose their special meaning inside []. If ^ is the first character in the set, it does not mean the start of the string but negates the set (matches the complement); anywhere else in the set, ^ is an ordinary character
'|' | A|B matches either the RE on the left or the RE on the right | Never greedy: alternatives are tried from left to right and the first one that matches is used
'\w' | Matches any word character (letters, digits, and the underscore) |
'\W' | Matches any character that is not a word character | The opposite of \w
'\d' | Matches any decimal digit |
'\D' | Matches any character that is not a decimal digit | The opposite of \d
'\s' | Matches any whitespace character |

And so on.
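
Several rows of the table can be tried out directly. The sketch below uses invented sample strings and only standard re flags (re.DOTALL, re.MULTILINE).

```python
import re

text = 'first line\nsecond line'

# '.' stops at a newline unless DOTALL is set.
dot_default = re.search(r'first.*', text).group()
dot_all = re.search(r'first.*', text, re.DOTALL).group()

# '^' matches at the start of every line only in MULTILINE mode.
starts_default = re.findall(r'^\w+', text)
starts_multi = re.findall(r'^\w+', text, re.MULTILINE)

# '^' as the first character of a set negates the set.
consonants = re.findall(r'[^aeiou\s]', 'regex is fun')

# '+' applies to the single preceding unit; a group makes 'ab' the unit.
single_unit = re.fullmatch(r'ab+', 'ababab')      # None
grouped_unit = re.fullmatch(r'(ab)+', 'ababab')   # a match object

print(dot_default)      # first line
print(repr(dot_all))    # 'first line\nsecond line'
print(starts_default)   # ['first']
print(starts_multi)     # ['first', 'second']
print(consonants)       # ['r', 'g', 'x', 's', 'f', 'n']
```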

Methods provided by the re module
  • re.compile(pattern, flags=0) compiles a pattern and returns a regular expression object. The object has methods such as prog.search(string) and prog.match(string) that do the same job as the module-level functions of the same name.
  • re.search(pattern, string, flags=0) scans the string for the first location where the pattern matches and returns a match object, or None if no position in the string matches.
  • re.match(pattern, string, flags=0) tries to match the pattern at the beginning of the string. It returns a match object on success and None otherwise.
  • re.split(pattern, string, maxsplit=0, flags=0) splits the string at every occurrence of the pattern and returns the list of pieces. If the pattern never matches, the list contains only the original string. maxsplit sets the maximum number of splits. Note that if the pattern contains capturing groups, the text matched by the groups is also included in the resulting list, between the pieces.
  • re.findall(pattern, string, flags=0) returns a list of all non-overlapping matches of the pattern in the string.
  • re.sub(pattern, repl, string, count=0, flags=0) replaces the parts of the string that match the pattern with repl and returns the new string. count sets the maximum number of replacements (0 means replace all). repl can also be a function that takes a match object and returns the replacement string, which is very useful because it lets you add logic to the replacement.
  • re.escape(string) escapes the characters in the string that have a special meaning in regular expressions and returns the escaped string.
  • re.fullmatch(pattern, string, flags=0) returns a match object if the whole string matches the pattern, and None otherwise.
  • re.finditer(pattern, string, flags=0) returns an iterator; each call to next() yields a match object for the next match.
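
The functions above can be exercised on one sample string. This is a sketch, not a reference: the input line and the helper add_one are hypothetical names introduced for the example.

```python
import re

line = 'alice=30, bob=25; carol=41'   # made-up sample input

first = re.search(r'\d+', line).group()   # first number anywhere in the string
at_start = re.match(r'\d+', line)         # None: the string starts with a letter
pieces = re.split(r'[,;]\s*', line)
# A capturing group in the pattern keeps the separator in the result.
pieces_with_sep = re.split(r'([,;])\s*', line, maxsplit=1)
numbers = re.findall(r'\d+', line)

def add_one(match):
    # repl as a function: receives a match object, returns the replacement text.
    return str(int(match.group()) + 1)

older = re.sub(r'\d+', add_one, line)

# escape() makes '+' literal, so the escaped pattern matches the text itself.
escaped = re.escape('1+1=2')
literal_hit = re.fullmatch(escaped, '1+1=2')

whole = re.fullmatch(r'\w+=\d+', 'alice=30')
names = [m.group(1) for m in re.finditer(r'(\w+)=', line)]

print(first)            # 30
print(at_start)         # None
print(pieces)           # ['alice=30', 'bob=25', 'carol=41']
print(pieces_with_sep)  # ['alice=30', ',', 'bob=25; carol=41']
print(numbers)          # ['30', '25', '41']
print(older)            # alice=31, bob=26; carol=42
print(names)            # ['alice', 'bob', 'carol']
```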

Regular Expression Object

A regular expression object:

    • Is the object returned by re.compile(pattern).
    • Is worth creating when a regular expression is used many times in the code: compiling it once and reusing the object can make the code more efficient.
    • Has methods that correspond to the matching functions of the re module (search, match, fullmatch, split, findall, finditer, sub).
    • Has the attribute prog.groups, the number of capturing groups in the pattern.
    • Has the attribute prog.groupindex, a dictionary that maps group names to group numbers.
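
A minimal sketch of a compiled object and its attributes follows. The date pattern and the variable names are assumptions made for the illustration.

```python
import re

# Two named groups and one unnamed group (illustrative pattern).
prog = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})')

group_count = prog.groups                 # number of capturing groups
name_to_number = dict(prog.groupindex)    # group names mapped to group numbers

# The compiled object is reused through its own methods.
dates = prog.findall('2023-01-15 and 2024-12-31')

print(group_count)     # 3
print(name_to_number)  # {'year': 1, 'month': 2}
print(dates)           # [('2023', '01', '15'), ('2024', '12', '31')]
```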
Match Object

Match objects:

    • Are returned by the search, match and fullmatch functions of the re module and yielded by the finditer iterator; each successful match is represented by a match object.
    • Store the matched text. If the regular expression contains groups, the match is divided up accordingly, and the group() method gives access to each group.
    • match.group([group1, ...]) takes group numbers as arguments: the first group is 1, the second is 2, and so on. With no argument, or with 0, it returns the whole matched string. With one group number it returns that group's string. With several group numbers it returns a tuple of the corresponding strings.
    • match.group() also accepts group names if the regular expression uses named groups. Group numbers still work for named groups.
    • If a group matches several times (for example, a group inside a repetition), only its last match is kept.
    • match.groups() returns a tuple of all groups. Its parameter is a default value, used for any group that did not take part in the match.
    • match.groupdict() returns a dictionary of the named groups; it also accepts a default value for groups that did not take part in the match.
    • Note that group() returns the matched string even if the pattern has no groups. groups() returns an empty tuple when the pattern has no groups, and groupdict() returns an empty dictionary when it has no named groups.
    • match.start() returns the start position of the match in the string.
    • match.end() returns the end position of the match in the string.
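
The match object methods can be sketched on a single match. The pattern and the input text are invented for the example.

```python
import re

# Two named groups plus an optional unnamed group that will not take part.
m = re.search(r'(?P<user>\w+)@(?P<host>\w+)(\.\w+)?', 'mail to bob@example now')

whole = m.group()                    # the entire match
user_and_host = m.group(1, 'host')   # numbers and names can be mixed
all_groups = m.groups('n/a')         # default fills the group that did not match
named = m.groupdict()
span = (m.start(), m.end())

# A group inside a repetition keeps only its last match.
last_only = re.match(r'(\w)+', 'abc').group(1)

# No groups in the pattern: group() still works, groups() is empty.
plain = re.search(r'\d+', 'room 101')
no_groups = (plain.group(), plain.groups(), plain.groupdict())

print(whole)          # bob@example
print(user_and_host)  # ('bob', 'example')
print(all_groups)     # ('bob', 'example', 'n/a')
print(named)          # {'user': 'bob', 'host': 'example'}
print(span)           # (8, 19)
print(last_only)      # c
print(no_groups)      # ('101', (), {})
```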
Re Example

Refer to the examples for the re module in the official manual.

    • Phone format: 1\d{10}
    • Mailbox format: r"^\w+(.?\w+)@(\w+.)\w+$"
    • Email address format with a name
    • IPv4 address
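
As a rough illustration, the phone pattern from the list can be applied with fullmatch. The IPv4 pattern below is not from the original article; it is one possible sketch, written here as an assumption, that restricts each part to 0-255. The helper names is_phone and is_ipv4 are hypothetical.

```python
import re

# Phone pattern taken from the list above: a 1 followed by ten digits.
PHONE = re.compile(r'1\d{10}')

# Assumed IPv4 pattern (not from the article): four octets in the range 0-255.
OCTET = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
IPV4 = re.compile(rf'{OCTET}(?:\.{OCTET}){{3}}')

def is_phone(text):
    # fullmatch requires the whole string to fit the pattern.
    return PHONE.fullmatch(text) is not None

def is_ipv4(text):
    return IPV4.fullmatch(text) is not None

print(is_phone('13812345678'))  # True
print(is_phone('1381234567'))   # False
print(is_ipv4('192.168.0.1'))   # True
print(is_ipv4('256.1.1.1'))     # False
```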
Attention
    • When a regular expression contains a greedy variable-length part, remember that whatever follows it in the pattern affects how much that part can actually match.
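
The effect can be seen in a short sketch with a made-up input: the greedy part first takes everything, then gives characters back until the rest of the pattern can match.

```python
import re

# Greedy '.*' keeps as much as it can and leaves '\d+' only one digit.
greedy = re.match(r'(.*)(\d+)', 'abc123').groups()

# Non-greedy '.*?' takes as little as it can, so '\d+' gets all the digits.
lazy = re.match(r'(.*?)(\d+)', 'abc123').groups()

print(greedy)  # ('abc12', '3')
print(lazy)    # ('abc', '123')
```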
