Python Regular Expressions

Source: Internet
Author: User

Regular expressions support two basic operations: matching and substitution.

Matching searches a text string for a particular expression.

Substitution finds the parts of a string that match a particular expression and replaces them.

1. Basic elements
A regular expression is built from a set of special character elements that together describe what to match.

Basic regular expression characters

Character  Description
text       Matches the literal text string
.          Matches any single character except a line break
^          Matches the beginning of a string
$          Matches the end of a string
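
The basic characters above can be tried out with Python's re module. This is only a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Literal text matches itself anywhere in the string.
literal = re.search(r"cat", "concatenate")

# "." matches any single character except a line break.
dot_hit = re.search(r"c.t", "a cut above")
dot_miss = re.search(r"c.t", "c\nt")

# "^" anchors the pattern at the start of the string, "$" at the end.
starts = re.search(r"^Hello", "Hello world")
ends = re.search(r"world$", "Hello world")
not_at_start = re.search(r"^world", "Hello world")

print(literal.group(), dot_hit.group(), dot_miss)
print(starts.group(), ends.group(), not_at_start)
```

A failed search returns None, so real code should check the result before calling group().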

In regular expressions, we can also use match qualifiers to constrain the number of repetitions.
Match qualifiers

Maximum match  Minimum match  Description
*              *?             Repeats the preceding expression 0 or more times
+              +?             Repeats the preceding expression 1 or more times
?              ??             Repeats the preceding expression 0 or 1 times
{m}            {m}?           Repeats the preceding expression exactly m times
{m,}           {m,}?          Repeats the preceding expression at least m times
{m,n}          {m,n}?         Repeats the preceding expression at least m times and at most n times

In the table above, ".*" is a maximum (greedy) match: it matches as much of the source string as it can. ".*?" is a minimum (non-greedy) match: it stops at the first position where the rest of the pattern can match. For example, d.*g can match any string that starts with "d" and ends with "g", such as "debug", "debugging", or even "dog is walking". In contrast, d.*?g matches only "debug" in "debugging", and only "dog" in "dog is walking".
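
The difference between the two kinds of match can be checked directly. The following is a small sketch using the same sample strings; the variable names are only illustrative.

```python
import re

# Greedy: ".*" extends to the last "g" in the string.
greedy_dog = re.search(r"d.*g", "dog is walking").group()
greedy_debug = re.search(r"d.*g", "debugging").group()

# Non-greedy: ".*?" stops at the first "g" it can reach.
lazy_dog = re.search(r"d.*?g", "dog is walking").group()
lazy_debug = re.search(r"d.*?g", "debugging").group()

print(greedy_dog)    # dog is walking
print(lazy_dog)      # dog
print(greedy_debug)  # debugging
print(lazy_debug)    # debug
```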

For more complex matches, groups and operators are available.
Groups and operators

Group    Description
[...]    Matches one character from a set, such as [a-z], [1-9], or [,./;']
[^...]   Matches any character that is not in the set (the inverse of [...])
A|B      Matches expression A or B, equivalent to an OR operation
(...)    Groups an expression; each pair of parentheses is one group, as in ([a-b]+)([a-z]+)([1-9]+)
\number  Matches the same text that the group with that number matched
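
As a rough illustration of sets, alternation, groups, and numbered backreferences, consider the sketch below. The input strings are invented for the example.

```python
import re

# Character set and negated character set.
vowels = re.findall(r"[aeiou]", "regex")
non_digits = re.findall(r"[^0-9]", "a1b2")

# Alternation: match either expression.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Each pair of parentheses captures one group.
parts = re.search(r"([a-z]+)([1-9]+)", "abc123").groups()

# \1 must repeat exactly what group 1 matched.
doubled = re.search(r"(\w+) \1", "hello hello world").group()

print(vowels, non_digits, pets)
print(parts, doubled)
```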

A special set of character sequences matches specific character types or character contexts. For example, \b matches a word boundary: food\b matches "food" and "zoofood", but not "foodies".
Special character sequences

Sequence  Description
\A        Matches only at the beginning of a string
\b        Matches a word boundary
\B        Matches a position that is not a word boundary
\d        Matches any decimal digit character; equivalent to r'[0-9]' for ASCII text
\D        Matches any non-digit character; equivalent to r'[^0-9]' for ASCII text
\s        Matches any whitespace character (space, tab, line feed, carriage return, form feed, vertical tab)
\S        Matches any non-whitespace character
\w        Matches any alphanumeric character or underscore
\W        Matches any character that is not alphanumeric or an underscore
\Z        Matches only at the end of a string
\\        Matches a backslash character
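
A short sketch of a few of these sequences in use follows. The word list and sample strings are assumptions made for the example.

```python
import re

# \b: "food" has to end at a word boundary.
candidates = ["food", "zoofood", "foodies"]
ends_with_food = [w for w in candidates if re.search(r"food\b", w)]

# \d, \w and \s pick out digits, word characters and whitespace.
digits = re.findall(r"\d+", "room 42, floor 7")
words = re.findall(r"\w+", "a_b c-d")
fields = re.split(r"\s+", "one  two\tthree\nfour")

# \A and \Z anchor at the very start and very end of the string.
all_digits = re.search(r"\A\d+\Z", "2024") is not None

print(ends_with_food)
print(digits, words, fields, all_digits)
```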

A set of assertions (extension notations) declares a condition that must hold at a specific position.
Regular expression assertions

Assertion      Description
(?iLmsux)      Matches the empty string; each of the letters i, L, m, s, u, x sets the corresponding regular expression flag from the flags table below.
(?:...)        Matches the expression inside the parentheses, but does not save it as a group.
(?P<name>...)  Matches the expression inside the parentheses and also makes the matched text available as a group identified by name.
(?P=name)      Matches the same text that the earlier group called name matched.
(?#...)        Introduces a comment; the contents of the parentheses are ignored.
(?=...)        Matches if the enclosed expression matches the text that follows, without consuming that text. This allows a look ahead in an expression without affecting how the rest of the regular expression is analyzed. For example, Martin(?=Brown) matches "Martin" only if it is followed by "Brown".
(?!...)        Matches only if the enclosed expression does not match the text that follows; the inverse of (?=...).
(?<=...)       Matches if the text immediately before the current position matches the enclosed expression. For example, (?<=abc)def finds a match in "abcdef". The enclosed expression must match a fixed number of characters.
(?<!...)       Matches if the text immediately before the current position does not match the enclosed expression; the inverse of (?<=...).
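
The assertions above can be sketched as follows. The patterns reuse the "Martin"/"Brown" and "abcdef" examples; the HTML-like tag string is an assumption added for illustration.

```python
import re

# Named group and a reference back to it.
tag = re.search(r"<(?P<tag>\w+)>.*?</(?P=tag)>", "<b>bold</b>")

# Non-capturing group: only "(c)" is saved as a group.
non_capturing = re.search(r"(?:ab)+(c)", "ababc").groups()

# Lookahead and negative lookahead do not consume the following text.
ahead = re.search(r"Martin(?=Brown)", "MartinBrown").group()
ahead_miss = re.search(r"Martin(?=Brown)", "MartinSmith")
not_ahead = re.search(r"Martin(?!Brown)", "MartinSmith").group()

# Lookbehind and negative lookbehind inspect the preceding text.
behind = re.search(r"(?<=abc)def", "abcdef")
not_behind = re.search(r"(?<!abc)def", "abcdef")

# Inline flag: (?i) at the start turns on case-insensitive matching.
inline = re.search(r"(?i)python", "PYTHON").group()

print(tag.group("tag"), non_capturing, ahead, ahead_miss, not_ahead)
print(behind.group(), behind.start(), not_behind, inline)
```

Note that the lookbehind match returns only "def" (starting at index 3); the "abc" prefix is checked but is not part of the matched text.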

Regular expressions also support some processing flags, which affect how a pattern is applied.
Processing flags

Flag             Description
I or IGNORECASE  Ignores case when matching the expression against the text.
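
As a brief sketch, the flag can be passed as an extra argument to the re functions. The sample text is made up for the example.

```python
import re

text = "Python, PYTHON, python"

# Without a flag the match is case-sensitive.
case_sensitive = re.findall(r"python", text)

# re.IGNORECASE and its short form re.I are the same flag.
ignore_case = re.findall(r"python", text, re.IGNORECASE)
short_form = re.findall(r"python", text, re.I)

print(case_sensitive)
print(ignore_case)
```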

2. Operation

With the re module, we can search, extract, and replace strings in Python using regular expressions. For example, the re.search() function performs a basic search and returns a MatchObject object (or None if nothing matches). The re.findall() function returns a list of all matches.

The code is as follows:


>>> import re
>>> a = "This is my re module test"
>>> obj = re.search(r'.*is', a)
>>> print(obj)
<re.Match object; span=(0, 7), match='This is'>
>>> obj.group()
'This is'
>>> re.findall(r'.*is', a)
['This is']


MatchObject methods and attributes


Method/Attribute        Description
m.expand(template)      Returns the template string with its backslash references (such as \1 or \g<name>) replaced by the text of the corresponding groups.
m.group([group, ...])   Returns the text matched by the given group, identified by index number or name. With several groups it returns a tuple; with no argument it returns the entire match.
m.groups([default])     Returns a tuple containing the text matched by every group in the pattern. Groups that did not take part in the match get the value of default, which is None if not given.
m.groupdict([default])  Returns a dictionary of all named groups, keyed by group name. Groups that did not take part in the match get the value of default, which is None if not given.
m.start([group])        Returns the start position of the specified group, or of the entire match if no group is given.
m.end([group])          Returns the end position of the specified group, or of the entire match if no group is given.
m.span([group])         Returns the two-element tuple (m.start(group), m.end(group)) for the given group, or for the entire match if no group is given.
m.pos                   The pos value passed to match() or search().
m.endpos                The endpos value passed to match() or search().
m.lastindex             The index of the last matched group, or None if no group matched.
m.lastgroup             The name of the last matched group, or None if it has no name or no group matched.
m.re                    The regular expression object that created this MatchObject.
m.string                The string passed to match() or search().
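
The methods and attributes in the table can be explored on a single match. The sketch below assumes a made-up date-like pattern with an optional day part, so that one group stays unmatched.

```python
import re

text = "released 2017-08 in beta"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?"
m = re.search(pattern, text)

whole = m.group()              # entire match
year_month = m.group(1, 2)     # several groups give a tuple
by_name = m.group("year")      # a group can be selected by name
all_groups = m.groups()        # the unmatched "day" group is None
filled = m.groups("??")        # or the given default value
named = m.groupdict()
positions = (m.start(), m.end(), m.span("month"))
swapped = m.expand(r"\g<month>/\g<year>")

print(whole, year_month, by_name)
print(all_groups, filled, named)
print(positions, swapped)
print(m.lastindex, m.lastgroup, m.pos, m.endpos)
```

Because no pos or endpos was given, m.pos is 0 and m.endpos equals the length of the string.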

Use the sub() or subn() function to perform a substitution on a string. The basic format of the sub() function is as follows:

sub(pattern, replace, string[, count])
Example

The code is as follows:

>>> text = 'The dog on my bed'
>>> rep = re.sub('dog', 'cat', text)
>>> print(rep)
The cat on my bed

The replace parameter can also be a function, which is called with each match object and returns the replacement text. To get the number of replacements, use the subn() function, which returns a tuple containing the substituted text and the number of substitutions made.
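
A function replacement and subn() might look like the sketch below. The helper name double_number and the sample sentences are hypothetical, chosen only for illustration.

```python
import re

def double_number(match):
    """Hypothetical replacement function: doubles the matched number."""
    return str(int(match.group()) * 2)

# The function is called once per match and returns the replacement text.
doubled = re.sub(r"\d+", double_number, "3 apples and 10 pears")

# count limits how many matches are replaced.
limited = re.sub(r"\d+", "N", "1 2 3", count=1)

# subn() also reports how many substitutions were made.
new_text, how_many = re.subn(r"dog", "cat", "dog eats dog food")

print(doubled)
print(limited)
print(new_text, how_many)
```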

If we need to match the same regular expression many times, we can compile it into an internal form to improve processing speed. Compilation is done with the compile() function, whose basic format is: compile(str[, flags])
str is the regular expression string to compile, and flags is an optional set of modifier flags. The compiled regular expression is an object with several methods and properties.
Regular expression object methods/properties

Method/Property                    Description
r.search(string[, pos[, endpos]])  Same as the search() function, but lets you specify the start and end of the search
r.match(string[, pos[, endpos]])   Same as the match() function, but lets you specify the start and end of the search
r.split(string[, max])             Same as the split() function
r.findall(string)                  Same as the findall() function
r.sub(replace, string[, count])    Same as the sub() function
r.subn(replace, string[, count])   Same as the subn() function
r.flags                            The flags defined when the object was created
r.groupindex                       A dictionary mapping each group name defined with (?P<id>...) to its group number
r.pattern                          The pattern string used when the object was created
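
A compiled object can be reused across calls, roughly as in the sketch below. The pattern and the sample strings are assumptions made for the example.

```python
import re

# Compile once, then reuse the object for several operations.
pair = re.compile(r"(?P<first>[a-z]+)-(?P<second>[a-z]+)", re.IGNORECASE)

found = pair.search("x Alpha-Beta y").group()
at_start = pair.match("x Alpha-Beta")        # match() only tries index 0
from_pos = pair.match("x Alpha-Beta", 2)     # pos=2 starts at "Alpha"
pairs = pair.findall("a-b c-d")
swapped = pair.sub(r"\g<second>-\g<first>", "left-right")
pieces = re.compile(r",\s*").split("a, b,c", maxsplit=1)

print(found, at_start, from_pos.group())
print(pairs, swapped, pieces)
print(pair.pattern, dict(pair.groupindex))
```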

Use the re.escape() function to escape the special characters in a string so that it can be matched literally.
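
As a small sketch of why escaping matters, the string below (made up for the example) contains several characters that would otherwise be read as regular expression operators.

```python
import re

literal = "1+1=2 (really?)"
haystack = "proof: 1+1=2 (really?)"

# Unescaped, "+", "(", ")" and "?" act as operators, so nothing matches.
unescaped = re.search(literal, haystack)

# Escaped, every character is matched literally.
escaped = re.search(re.escape(literal), haystack)

print(unescaped)
print(escaped.group())
```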

Getting object references through getattr

The code is as follows:

>>> li = ['a', 'b']
>>> getattr(li, 'append')
<built-in method append of list object at 0xb7d4a52c>
>>> getattr(li, 'append')('c')  # equivalent to li.append('c')
>>> li
['a', 'b', 'c']
>>> handler = getattr(li, 'append', None)
>>> handler
<built-in method append of list object at 0xb7d4a52c>
>>> handler('cc')  # equivalent to li.append('cc')
>>> li
['a', 'b', 'c', 'cc']
>>> result = handler('bb')
>>> li
['a', 'b', 'c', 'cc', 'bb']
>>> print(result)
None
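
The same technique can be used to pick a function of the re module by name. The sketch below is only one possible use; the helper call_re_function is hypothetical and not part of any library.

```python
import re

def call_re_function(name, pattern, text):
    """Hypothetical helper: look up a function of the re module by name and call it."""
    # The third argument makes getattr return None instead of raising AttributeError.
    func = getattr(re, name, None)
    if not callable(func):
        raise ValueError(f"re has no function named {name!r}")
    return func(pattern, text)

numbers = call_re_function("findall", r"\d+", "a1b22c333")
first = call_re_function("search", r"\d+", "a1b22c333").group()

print(numbers)
print(first)
```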


This article is from the "Big Plum" blog, make sure to keep this source http://n1lixing.blog.51cto.com/11772222/1954242
