Getting started with Python: Regular Expressions

Source: Internet
Author: User
Tags alphanumeric characters expression engine


Regular expressions have two basic operations: matching and replacement.

Matching searches a text string for text that fits a given expression.

Replacement finds the text that fits a given expression in a string and replaces it.
 
1. Basic Elements
 
Regular Expressions define a series of special character elements for matching.

Regular Expression Basic Characters

Character Description
text  Match the literal text string
.  Match any single character except a line break
^  Match the start of a string
$  Match the end of a string

In a regular expression, we can also use quantifiers to limit how many times the preceding expression may repeat.

Quantifiers

Greedy  Non-greedy  Description
*  *?  Repeat the preceding expression zero or more times
+  +?  Repeat the preceding expression one or more times
?  ??  Repeat the preceding expression zero or one time
{m}  {m}  Repeat the preceding expression exactly m times
{m,}  {m,}?  Repeat the preceding expression at least m times
{m,n}  {m,n}?  Repeat the preceding expression at least m times and at most n times

As described above, ".*" is a greedy (maximum) match: it matches the longest possible string. ".*?" is a non-greedy (minimum) match: it stops at the first point where the rest of the expression can match. For example, d.*g matches any string that starts with d and ends with g, such as "debug" and "debugging", or even "dog is walking". d.*?g matches only "debug" in "debugging", and only "dog" in "dog is walking".
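The difference between greedy and non-greedy matching can be checked with a short sketch. The helper name first_match is only an illustration and is not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

text = "dog is walking"
greedy = first_match(r"d.*g", text)   # .* takes as much text as possible
lazy = first_match(r"d.*?g", text)    # .*? stops at the first possible g

print(greedy)  # dog is walking
print(lazy)    # dog
```

The greedy form runs to the last g in the string, while the non-greedy form stops at the first one.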
 
In some more complex matches, we can use groups and operators.
 
Groups and operators

Group Description
[...]  Match any character in the set, such as [a-z], [1-9], or [,./;']
[^...]  Match any character that is not in the set; this is the negation of [...]
A|B  Match expression A or expression B; this is an OR operation
(...)  Expression group; each pair of parentheses is a group, such as ([a-z]+)([A-Z]+)([1-9]+)
\number  Match the same text that the group with that number matched
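As a rough illustration of the entries in this table, the following sketch uses sample strings chosen only for demonstration.

```python
import re

# Three groups: lowercase letters, uppercase letters, digits 1-9.
m = re.match(r"([a-z]+)([A-Z]+)([1-9]+)", "abcXYZ123")
parts = m.groups()

# \1 must repeat whatever group 1 matched, so this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is a test").group(1)

# A|B matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# [^0-9] matches any character that is not a digit.
non_digits = re.findall(r"[^0-9]+", "a1bc22d")

print(parts)       # ('abc', 'XYZ', '123')
print(doubled)     # is
print(pets)        # ['cat', 'dog']
print(non_digits)  # ['a', 'bc', 'd']
```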

There are special character sequences that match a specific character type or position. For example, \b matches a word boundary, so food\b matches "food" and "zoofood" but does not match "foodies".
 
Special Character Sequence

Character Description
\A  Match only at the start of the string
\b  Match a word boundary
\B  Match a position that is not a word boundary
\d  Match any decimal digit, equivalent to r'[0-9]'
\D  Match any character that is not a decimal digit, equivalent to r'[^0-9]'
\s  Match any whitespace character (space, tab, line feed, carriage return, form feed, vertical tab)
\S  Match any non-whitespace character
\w  Match any letter or digit (and the underscore)
\W  Match any non-alphanumeric character
\Z  Match only at the end of the string
\\  Match a backslash character
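A few of these sequences in action, as a minimal sketch. The sample sentence is made up for the demonstration.

```python
import re

# \b requires a word boundary right after "food".
words = ["food", "zoofood", "foodies"]
ends_with_food = [w for w in words if re.search(r"food\b", w)]

sample = "Order 66 shipped on 2024-01-15"
digits = re.findall(r"\d+", sample)    # runs of decimal digits
tokens = re.split(r"\s+", sample)      # split on whitespace
starts_with_order = re.search(r"\AOrder", sample) is not None
ends_with_digit = re.search(r"\d\Z", sample) is not None

print(ends_with_food)  # ['food', 'zoofood']
print(digits)          # ['66', '2024', '01', '15']
print(tokens)          # ['Order', '66', 'shipped', 'on', '2024-01-15']
```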

A set of assertions and other extension notations, all written as (?...), is used to state conditions that must hold at a position in the text.
 
Regular Expression Assertions

Notation Description
(?iLmsux)  Matches an empty string. The letters i, L, m, s, u, and x correspond to the regular expression flags in the following table.
(?:...)  Matches the expression defined in parentheses, but does not create a numbered group.
(?P<name>...)  Matches the expression defined in parentheses, and the matched text can also be referenced through the symbolic group name "name".
(?P=name)  Matches the same text that the earlier group named "name" matched.
(?#...)  Introduces a comment; the content in parentheses is ignored.
(?=...)  Matches if the text that follows matches ..., without consuming any of that text (lookahead). This allows advanced checks in an expression without affecting how the rest of the regular expression is analyzed. For example, Martin(?= Brown) matches "Martin" only if it is followed by " Brown".
(?!...)  Matches if the text that follows does not match ...; this is the negation of (?=...).
(?<=...)  Matches if the text immediately before the current position matches ... (lookbehind). For example, (?<=abc)def matches the "def" in "abcdef". The lookbehind expression must match text of a fixed length.
(?<!...)  Matches if the text immediately before the current position does not match ...; this is the negation of (?<=...).
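The notations in this table can be tried out as follows. This is only a sketch; the sample strings and the group name quote are illustrative assumptions.

```python
import re

# Lookahead: "Martin" matches only when " Brown" follows; " Brown" is not consumed.
lookahead_text = re.search(r"Martin(?= Brown)", "Martin Brown").group()

# Negative lookahead: fails here because " Brown" does follow.
negative = re.search(r"Martin(?! Brown)", "Martin Brown")

# Lookbehind: "def" matches only when "abc" comes right before it.
lookbehind_text = re.search(r"(?<=abc)def", "abcdef").group()

# Named group plus a backreference by name: the closing quote must equal the opening one.
m = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' now")
quoted = m.group()
quote_char = m.group("quote")

# Non-capturing group: (?:ab) repeats without creating a numbered group.
only_c = re.match(r"(?:ab)+(c)", "ababc").groups()

# Inline flag: (?i) at the start turns on case-insensitive matching.
found = re.search(r"(?i)python", "PYTHON") is not None

print(lookahead_text, lookbehind_text, quoted, only_c, found)
```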

Regular expressions also support some processing flags, which affect how a regular expression is executed.
 
Processing flags

Flag Description
I or IGNORECASE  Ignore case when matching the text.

2. Operation
 

With the re module, we can use regular expressions to search, extract, and replace strings in Python. For example, the re.search() function performs a basic search and returns a match object. The re.findall() function returns a list of all matches.
 
The code is as follows:
>>> import re
>>> a = "this is my re module test"
>>> obj = re.search(r'.*is', a)
>>> print(obj)
<re.Match object; span=(0, 7), match='this is'>
>>> obj.group()
'this is'
>>> re.findall(r'.*is', a)
['this is']

Match object methods and attributes

Method/attribute Description
m.expand(template)  Returns the template string with its backslash references (such as \1 or \g<name>) replaced by the text of the corresponding groups.
m.group([group, ...])  Returns the matched text. With one argument, the result is the string matched by that group, given by index number or by name. With several arguments, the result is a tuple. If no group is specified, the entire match is returned.
m.groups([default])  Returns a tuple that contains the text matched by every group in the pattern. If the default parameter is given, it is used as the value for groups that did not take part in the match. The default value is None.
m.groupdict([default])  Returns a dictionary that contains all named groups. If the default parameter is given, it is used as the value for groups that did not take part in the match. The default value is None.
m.start([group])  Returns the start position of the specified group, or of the entire match.
m.end([group])  Returns the end position of the specified group, or of the entire match.
m.span([group])  Returns a two-element tuple equivalent to (m.start(group), m.end(group)) for the given group, or for the entire match.
m.pos  The pos value passed to the match() or search() function.
m.endpos  The endpos value passed to the match() or search() function.
m.lastindex  The index number of the last matched group, or None if no group was matched.
m.lastgroup  The name of the last matched group, or None if that group has no name or no group was matched.
m.re  The regular expression object that created this match object.
m.string  The string passed to the match() or search() function.
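The methods above can be explored with a small sketch. The date pattern and the sample text are assumptions made up for this example.

```python
import re

# Hypothetical sample pattern: a date whose day part is optional.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?"
m = re.search(pattern, "released 2024-01")

whole = m.group()                       # the entire match
year_month = m.group("year", "month")   # several groups give a tuple
all_groups = m.groups()                 # unmatched groups are None
filled = m.groups("?")                  # or the default that was given
as_dict = m.groupdict()
position = m.span()                     # same as (m.start(), m.end())
month_start = m.start("month")
last_name = m.lastgroup                 # name of the last matched group
last_index = m.lastindex                # index of the last matched group
rearranged = m.expand(r"\g<month>/\g<year>")

print(whole, all_groups, position, rearranged)
```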

Use the sub() or subn() function to perform a replacement on a string. The basic format of the sub() function is as follows:
sub(pattern, replace, string[, count])
 
Example

The code is as follows:
>>> s = 'the dog on my bed'
>>> rep = re.sub('dog', 'cat', s)
>>> print(rep)
the cat on my bed

The replace parameter can also be a function, which is called with each match object and returns the replacement text. To obtain the number of replacements, use the subn() function. The subn() function returns a tuple that contains the replaced text and the number of replacements made.
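A short sketch of a function used as the replacement, and of subn(). The callback name double and the sample sentences are illustrative only.

```python
import re

def double(match):
    """Replacement callback: return the matched number doubled, as text."""
    return str(int(match.group()) * 2)

# The callback is called once for every match of \d+.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# subn() returns the new text together with the number of replacements.
new_text, count = re.subn("dog", "cat", "the dog chased the other dog")

# The count argument limits how many matches are replaced.
limited = re.sub("dog", "cat", "dog dog dog", count=1)

print(doubled)          # 6 apples and 20 pears
print(new_text, count)  # the cat chased the other cat 2
print(limited)          # cat dog dog
```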
 
If you need to use the same regular expression for multiple matching operations, you can compile it into an internal form to improve processing speed. Compile the regular expression with the compile() function. The basic format of the compile() function is as follows:
compile(str[, flags])
 
str is the regular expression string to be compiled, and flags are the modifiers. Compiling the regular expression produces an object that has several methods and attributes.
 
Regular expression object methods/attributes

Method/attribute Description
r.search(string[, pos[, endpos]])  The same as the search() function, but it also lets you specify the start and end points of the search.
r.match(string[, pos[, endpos]])  The same as the match() function, but it also lets you specify the start and end points of the search.
r.split(string[, max])  Same as the split() function
r.findall(string)  Same as the findall() function
r.sub(replace, string[, count])  Same as the sub() function
r.subn(replace, string[, count])  Same as the subn() function
r.flags  The flags given when the object was created
r.groupindex  A dictionary that maps the symbolic group names defined by (?P<id>...) to group numbers
r.pattern  The pattern string used when the object was created
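The following sketch compiles two patterns once and reuses them. The patterns and sample strings are assumptions chosen for illustration.

```python
import re

# Compile once, reuse many times.
word = re.compile(r"[a-z]+", re.IGNORECASE)
comma = re.compile(r"\s*,\s*")

all_words = word.findall("One two THREE")
from_pos = word.search("One two THREE", 4).group()   # start searching at index 4
no_match = word.match("One two", 3)                  # index 3 is a space
pieces = comma.split("a , b,c", maxsplit=1)          # split at most once
replaced = word.sub("X", "ab 12 cd")

ignore_case_set = bool(word.flags & re.IGNORECASE)
source = word.pattern
id_number = re.compile(r"(?P<id>\d+)").groupindex["id"]

print(all_words, from_pos, pieces, replaced)
```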

Use the re.escape() function to escape a string so that its special characters are matched literally.
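A minimal sketch of why escaping matters, using a sample string that contains the special character "+":

```python
import re

literal = "2x+5y"
pattern = re.escape(literal)   # the + is escaped so it matches literally

with_escape = re.search(pattern, "solve 2x+5y=7")
# Without escaping, x+ means "one or more x", so the literal text is not found.
without_escape = re.search(literal, "solve 2x+5y=7")

print(pattern)                  # 2x\+5y
print(with_escape is not None)  # True
print(without_escape)           # None
```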
 
Get object reference through getattr
 
The code is as follows:
>>> li = ['A', 'B']
>>> getattr(li, 'append')
>>> getattr(li, 'append')('C')  # equivalent to li.append('C')
>>> li
['A', 'B', 'C']
>>> handler = getattr(li, 'append', None)
>>> handler
<built-in method append of list object at 0xb7d4a52c>
>>> handler('cc')  # equivalent to li.append('cc')
>>> li
['A', 'B', 'C', 'cc']
>>> result = handler('bb')
>>> li
['A', 'B', 'C', 'cc', 'bb']
>>> print(result)
None


Matching usage of Python Regular Expressions

1. Test whether the regular expression matches all or part of a string

regex = r""  # regular expression
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()

2. Test whether the regular expression matches the entire string

regex = r"\Z"  # end the regular expression with \Z
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()

3. Create a match object and obtain the details of the match

regex = r""  # regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

4. Get the part of a string matched by the regular expression

regex = r""  # regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""

5. Get the part of a string matched by a capturing group

regex = r""  # regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""

6. Get the part of a string matched by a named group

regex = r""  # regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""

7. Get a list of all matched substrings in a string

result = re.findall(regex, subject)

8. Iterate over all matches in a string

for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass

9. Use a regular expression string to create a regular expression object (re ...
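The templates above leave the pattern empty. As a sketch with concrete, made-up values for regex and subject, items 1, 2, and 8 might look like this:

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# Item 1: does the pattern match anywhere in the string?
has_tag = re.search(r"<\w+>", subject) is not None

# Item 2: ending the pattern with \Z makes re.match() cover the whole string.
is_number = re.match(r"\d+\Z", "12345") is not None
not_number = re.match(r"\d+\Z", "123abc") is not None

# Item 8: iterate over all matches and collect their details.
found = []
for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    found.append((match.group(1), match.start(), match.end()))

print(has_tag, is_number, not_number)
print(found)  # [('b', 0, 11), ('i', 16, 29)]
```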

Python Regular Expressions

You are wrong. r"2x\+5y" means that the "\" in the string is not processed as a string escape.
In a regular expression, "\+" escapes the "+", because "+" has a special meaning in a regular expression. This has nothing to do with string escaping.

To be clearer: whether you write "\\+" or r"\+" in the program, a "\" and a "+" are stored in memory. As long as the regular expression engine reads a consecutive "\" and "+" from memory, it understands that you want to match the character "+".

Therefore, if you do not write r before the string, the regular expression string should be written as follows:
"2x\\+5y|7y-3z"

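The point about raw strings can be verified directly. This is a minimal check; the sample text passed to findall() is made up.

```python
import re

raw = r"2x\+5y|7y-3z"        # raw string: the backslash is kept as written
escaped = "2x\\+5y|7y-3z"    # normal string: \\ produces one backslash

same = raw == escaped         # both hold the same characters in memory
two_chars = len(r"\+")        # a backslash and a plus sign

matches = re.findall(raw, "2x+5y and 7y-3z")

print(same)       # True
print(two_chars)  # 2
print(matches)    # ['2x+5y', '7y-3z']
```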