Python learns "four": Regular expressions

Last Update:2018-02-11 Source: Internet

Author: User



I. Regular expression basics

1. Introduction

Regular expressions are not part of Python itself. They are a powerful tool for working with strings, with their own syntax and an independent processing engine. They may be less efficient than the built-in str methods, but they are far more expressive. Because of this independence, the core syntax is the same in every language that provides regular expressions; what differs is how much of the syntax each language supports. The unsupported parts are usually the less common ones, so if you have already used regular expressions in another language, a quick look here is enough to get started.

Roughly, matching works by walking through the expression and the text together and comparing them character by character. If every character matches, the match succeeds; as soon as one comparison fails, the match fails. Quantifiers and boundaries in the expression change the process slightly, but it remains easy to follow.
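The matching process described above can be sketched with a literal pattern and one quantifier. The sample strings are made up for illustration:

```python
import re

# Literal pattern: each character of the pattern must match in turn.
ok = re.match(r"abc", "abcdef")    # 'a', 'b', 'c' all match
bad = re.match(r"abc", "abxdef")   # comparison fails at the third character

print(ok.group())  # abc
print(bad)         # None

# A quantifier lets one pattern element consume several characters.
many = re.match(r"ab+c", "abbbcdef")
print(many.group())  # abbbc
```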

[Figure: regular expression metacharacters and syntax supported by Python (image from the Internet)]

2. Details to note

(1) Greedy and non-greedy matching

The quantifiers *, + and ? are greedy by default, meaning they match as many characters as possible. Appending a ? to a quantifier (*?, +?, ??) makes it non-greedy, so it matches as few characters as possible. {m,n} matches between m and n repetitions; it is also greedy by default, and {m,n}? is its non-greedy form.
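A short sketch of the difference, using made-up sample text:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST '>' in the text.
greedy = re.search(r"<.*>", text).group()
# Non-greedy: .*? stops at the FIRST '>'.
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

# {m,n} is greedy as well; {m,n}? takes the minimum.
digits = "1234567"
most = re.match(r"\d{2,4}", digits).group()
least = re.match(r"\d{2,4}?", digits).group()
print(most, least)  # 1234 12
```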

(2) Matching mode

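# Matching modes
re.I  # Case-insensitive matching
re.L  # Makes the predefined character classes \w \W \b \B \s \S depend on the current locale setting (in Python 3 this flag is only allowed with bytes patterns)
re.M  # Multi-line matching; affects ^ and $
re.S  # Makes . match every character, including newline; affects the . metacharacter
re.U  # Makes the predefined character classes \w \W \b \B \s \S \d \D depend on Unicode-defined character properties (already the default for str patterns in Python 3)
re.X  # Verbose mode. In this mode the regular expression can span multiple lines, whitespace is ignored, and comments can be added.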
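The effect of the most common flags can be sketched as follows. The sample text and variable names are illustrative only:

```python
import re

text = "Hello\nworld"

# re.I: ignore case.
ci = re.findall(r"hello", text, re.I)

# re.M: ^ and $ also match at the start and end of each line.
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.M)

# re.S: "." also matches the newline character.
no_dotall = re.search(r"Hello.world", text)
dotall = re.search(r"Hello.world", text, re.S)

# re.X: whitespace in the pattern is ignored and comments are allowed.
number = re.compile(r"""
    \d+   # integer part
    \.    # decimal point
    \d*   # fractional part
""", re.X)

# Flags can be combined with the | operator.
combined = re.findall(r"^world$", "Hello\nWORLD", re.I | re.M)

print(ci)                                   # ['Hello']
print(first_only, each_line)                # ['Hello'] ['Hello', 'world']
print(no_dotall, dotall is not None)        # None True
print(number.search("pi is 3.14").group())  # 3.14
print(combined)                             # ['WORLD']
```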

(3) Raw strings

As in most programming languages, "\" is the escape character in regular expressions, which can lead to a plague of backslashes. To match a literal "\" in the text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the language first turns each pair into a single backslash, leaving two, and the regular expression engine then reads those two as one literal backslash. Python's raw strings solve this problem neatly: the regular expression in this example can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expressions are easier to read.
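A quick check of the equivalences described above; the sample path is invented for illustration:

```python
import re

text = "C:\\temp"    # ordinary literal; the actual text is C:\temp (one backslash)

plain = "\\\\"       # four backslashes in the source, two in the resulting string
raw = r"\\"          # raw string: the same two-character pattern
print(plain == raw)  # True

# Both forms match the single backslash in the text.
print(re.findall(plain, text) == re.findall(raw, text) == ["\\"])  # True

# The digit class: "\\d" and r"\d" are the same pattern.
print("\\d" == r"\d")              # True
print(re.findall(r"\d", "a1b22"))  # ['1', '2', '2']
```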

II. Using the re module

1. Functions in re

re.match(pattern, string, flags=0)      # Matches only at the beginning of string; returns a Match object. flags specifies the matching mode
re.search(pattern, string, flags=0)     # Finds the first matching substring anywhere in string; returns a Match object
re.findall(pattern, string, flags=0)    # Finds all matching substrings; returns a list
re.sub(pattern, repl, string, count=0)  # Replaces matching substrings in string; count is the maximum number of substitutions (0 means all)
re.subn(pattern, repl, string, count=0) # Like re.sub, but returns a tuple (new string, number of substitutions made)
re.compile(pattern)                     # Compiles a rule string into a reusable Pattern object; this is the factory method of the Pattern class
re.finditer(pattern, string, flags=0)   # Returns an iterator over Match objects

These functions return Match objects and Pattern objects, so let's look at these two classes.
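A minimal sketch that calls each of these functions on one made-up sample string:

```python
import re

text = "one 1 two 22 three 333"

m_start = re.match(r"\d+", text)    # None: the text does not START with a digit
m_any = re.search(r"\d+", text)     # first run of digits anywhere in the text
numbers = re.findall(r"\d+", text)  # every run of digits, as a list of strings

replaced = re.sub(r"\d+", "#", text)             # replace all matches
replaced_once = re.sub(r"\d+", "#", text, count=1)
result, n = re.subn(r"\d+", "#", text)           # also reports how many were replaced

spans = [m.span() for m in re.finditer(r"\d+", text)]  # iterate over Match objects

print(m_start)        # None
print(m_any.group())  # 1
print(numbers)        # ['1', '22', '333']
print(replaced)       # one # two # three #
print(replaced_once)  # one # two 22 three 333
print(n)              # 3
print(spans)          # [(4, 5), (10, 12), (19, 22)]
```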

Match object

# Properties
string: The text used for matching.
re: The Pattern object used for matching.
pos: The index in the text at which the search started. The value is the same as the parameter of the same name of Pattern.match() and Pattern.search().
endpos: The index in the text at which the search ended. The value is the same as the parameter of the same name of Pattern.match() and Pattern.search().
lastindex: The index of the last captured group. None if no group was captured.
lastgroup: The alias of the last captured group. None if that group has no alias or no group was captured.

# Methods
group([group1, ...]): Gets the string captured by one or more groups; returns a tuple when several arguments are given. group1 can be a number or an alias; number 0 stands for the entire matched substring; with no arguments, group(0) is returned. A group that captured nothing returns None; a group that captured several times returns the last substring it captured.
groups([default]): Returns the strings captured by all groups as a tuple. Equivalent to calling group(1, 2, ..., last). default is the value used for groups that captured nothing; it defaults to None.
groupdict([default]): Returns a dictionary whose keys are the aliases of the named groups and whose values are the substrings captured by those groups; groups without an alias are not included. default has the same meaning as in groups().
start([group]): Returns the start index in string of the substring captured by the specified group (the index of the first character of the substring). group defaults to 0.
end([group]): Returns the end index in string of the substring captured by the specified group (the index of the last character of the substring + 1). group defaults to 0.
span([group]): Returns (start(group), end(group)).
expand(template): Substitutes the captured groups into template and returns the result. The template can reference groups with \id, \g<id> or \g<name>. \0 cannot be used to reference the whole match; write \g<0> instead. \id is equivalent to \g<id>, but \10 is read as group 10, so to express \1 followed by the character '0' you must write \g<1>0.
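The properties and methods of a Match object can be tried out as follows. The pattern, the group names (first, last, sign) and the sample text are invented for this sketch:

```python
import re

# The optional third group deliberately captures nothing for this text.
m = re.match(r"(?P<first>\w+) (?P<last>\w+)(?P<sign>!)?", "hello world")

print(m.string)               # hello world
print(m.re.pattern)           # the pattern string that was compiled
print(m.pos, m.endpos)        # 0 11
print(m.lastindex)            # 2
print(m.lastgroup)            # last

print(m.group())              # hello world
print(m.group(1, 2))          # ('hello', 'world')
print(m.group("first"))       # hello
print(m.groups())             # ('hello', 'world', None)
print(m.groups("?"))          # ('hello', 'world', '?')
print(m.groupdict())          # {'first': 'hello', 'last': 'world', 'sign': None}

print(m.start(2), m.end(2))   # 6 11
print(m.span(2))              # (6, 11)

print(m.expand(r"\g<last> \g<first>"))  # world hello
print(m.expand(r"\g<1>0"))              # hello0
print(m.expand(r"[\g<0>]"))             # [hello world]
```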

Pattern object

# A Pattern object is a compiled regular expression; text is matched through the methods that Pattern provides.
# Pattern cannot be instantiated directly and must be constructed with re.compile().
# Pattern provides several read-only properties for getting information about the expression:
pattern: The expression string used at compile time.
flags: The matching modes used at compile time, as a number.
groups: The number of groups in the expression.
groupindex: A dictionary whose keys are the aliases of the named groups in the expression and whose values are the corresponding group numbers; groups without an alias are not included.
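A small sketch of these read-only properties. The pattern and the group name "second" are hypothetical:

```python
import re

p = re.compile(r"(\w+) (?P<second>\w+)", re.I)

print(p.pattern)               # (\w+) (?P<second>\w+)
print(bool(p.flags & re.I))    # True: flags is a number, test it with bitwise AND
print(p.groups)                # 2
print(dict(p.groupindex))      # {'second': 2}

# The compiled object is reusable across many texts.
words = [p.match(s).group("second") for s in ("hello world", "Good Morning")]
print(words)                   # ['world', 'Morning']
```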

2. How to use

# Import the re module
import re

# Generate a Pattern object
pa = re.compile(r'zss')

ma = pa.search('Hello zss world!')

if ma:
    print(ma.group())  # get the matched text from the Match object through group()

# Output:
# zss

Python supports regular expressions through the re module. The usual steps are: compile the string form of the regular expression into a Pattern instance, use the Pattern instance to process the text and obtain a match result (a Match instance), and finally use the Match instance to extract information and do further work.

III. Examples

1. IP address detection

#!/usr/bin/env python
import re

def iptest(ipstr):
    # Create the regular expression rule string
    regexstr = r'(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])'
    pa = re.compile(regexstr)
    # Collect the matches that conform to the rule
    # (a bare finditer() iterator is always truthy, so build a list before testing it)
    ips = list(pa.finditer(ipstr))
    if ips:
        for ip in ips:
            print(ip.group())
    else:
        print('error!')
    return 0

def main():
    ip = '192.168.1.1zss245.255.234zsssss1.1.1.1'
    iptest(ip)

main()

# Output:
# 192.168.1.1
# 1.1.1.1

Let's analyze the rule string regexstr:
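One way to read the rule string is to rewrite it in verbose mode (re.X) with one comment per part. This is a sketch of how the expression breaks down, not the original article's analysis; the name ip_pattern is chosen here for illustration:

```python
import re

ip_pattern = re.compile(r"""
    (                   # group 1: one octet followed by a dot
        (               # group 2: the octet itself
            [01]?\d?\d  #   0-199: one to three digits, a leading digit of 0 or 1
          | 2[0-4]\d    #   200-249
          | 25[0-5]     #   250-255
        )
        \.              # a literal dot
    ){3}                # the "octet + dot" unit is repeated three times
    (                   # group 3: the final octet, with no trailing dot
        [01]?\d?\d
      | 2[0-4]\d
      | 25[0-5]
    )
""", re.X)

sample = "192.168.1.1zss245.255.234zsssss1.1.1.1"
found = [m.group() for m in ip_pattern.finditer(sample)]
print(found)  # ['192.168.1.1', '1.1.1.1']

# The pattern is not anchored, so it can match inside a longer run of digits.
partial = ip_pattern.search("256.1.1.1").group()
print(partial)                             # 56.1.1.1
print(ip_pattern.fullmatch("256.1.1.1"))   # None
```

The fragment 245.255.234 is skipped because it has only three octets. When a whole string must be validated rather than scanned, fullmatch() (or explicit ^ and $ anchors) avoids the partial match shown in the last lines.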


