Python Automation Development Learning "sixth day"

Last Update: 2016-06-18 Source: Internet

Author: User


import re  # regular expressions can only match strings. Special (wildcard) characters:

. matches any single character except a newline
\ backslash, the escape character
^ matches the beginning of the string; as the first character inside [] it negates the set (e.g. [^0-9]). To match a literal ^, escape it as \^
$ matches the end of the string (or the position just before a trailing newline)
\w matches a letter, digit, or underscore (for str patterns this includes Unicode letters such as Chinese characters)
\W matches any character that \w does not match
\s matches any whitespace character; in ASCII terms, equivalent to [ \t\n\r\f\v]
\S matches any non-whitespace character
\d matches a digit
\D matches any non-digit character
\A matches only at the start of the string
\Z matches only at the very end of the string. Some other engines distinguish \Z (end, or before a final newline) from \z (absolute end); in Python's re, \Z is the absolute end, and $ is the anchor that also matches before a final newline
\G matches the position where the previous match ended (supported by some regex engines, but not by Python's re module)
\b matches the beginning or end of a word (a word boundary)
| alternation: matches either the expression on its left or the one on its right
* repeats the preceding item 0 or more times
+ repeats the preceding item 1 or more times
? the preceding item appears once or not at all
{m,n} repeats m to n times (no spaces inside the braces)
{m} repeats exactly m times
{m,} repeats m or more times
() parentheses capture the matched substring; an expression with several pairs of parentheses yields that many captured groups
(\s*) captures a run of contiguous whitespace
[] brackets define the set of characters to match: [a-zA-Z0-9] matches letters and digits, and [\s*] matches whitespace or a literal *
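The following sketch exercises several of the metacharacters listed above on made-up sample strings. It assumes Python 3.7 or later and uses only `re.findall`, `re.search`, and `re.split`.

```python
import re

text = "Order #42 shipped on 2016-06-18 to Ann_Lee"

# \d+ : one or more digits; findall returns every non-overlapping match
numbers = re.findall(r"\d+", text)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)

# {m} gives exact repetition; \b keeps the match aligned on word boundaries
date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text).group()

# ^ and $ anchor to the start and end of the whole string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_digit = re.search(r"\d$", text) is not None

# [^...] negates a set: runs of characters that are neither digits nor whitespace
non_digit_runs = re.findall(r"[^\d\s]+", "ab12 cd34")

# \s+ : split on any run of whitespace (spaces, tabs, newlines)
parts = re.split(r"\s+", "a  b\tc\nd")

# | selects between alternatives
pets = re.findall(r"cat|dog", "cat, bird, dog")

print(numbers, date, parts)
```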

(0-9) matches the literal text "0-9" (and captures it as a group)
[0-9]* matches a run of digits (note the *: the run may be empty)
[0-9]+ matches a run of digits (note the +: the run may not be empty)
{1-9} is the wrong way to write a repetition with curly braces; write {1,9}
[0-9]{0,9} matches a string of 0 to 9 digits

match

match(pattern, string, flags=0)
# pattern: the regular expression
# string: the string to match
# flags: matching flags

The match method only tries to match at the beginning of the given string. If the match fails it returns None; if it succeeds it returns a match object, whose group method gives the matched string.

>>> ret = re.match(r"abc", "ab1c123")  # matching starts at the beginning of the string
>>> print(ret)  # printing it directly shows None
None
>>> re.match(r"abc", "abc123")
<_sre.SRE_Match object; span=(0, 3), match='abc'>

span gives the positions of the matched characters in the string as (start, end); end is exclusive. (Python 3.7 and later print this object as <re.Match object; ...>.)

>>> obj = re.match(r"abc", "abc123")
>>> obj.group()  # group() returns the matched text
'abc'
>>> obj.start()
0
>>> obj.end()
3
>>> obj.span()
(0, 3)

search

search(pattern, string, flags=0)

The search method scans the string for the pattern and returns the first match. If nothing matches it returns None; otherwise it returns a match object, and group gives the matched string.

>>> obj = re.search(r"abc", "123abc456abc789")
>>> obj
<_sre.SRE_Match object; span=(3, 6), match='abc'>
>>> obj.group()  # the same group() method as with match: returns the matched string
'abc'
>>> obj.start()
3
>>> obj.end()
6
>>> obj.span()
(3, 6)

findall

findall(pattern, string, flags=0)

match and search return a single match, whereas findall returns all of them: the return value is a list of the matched strings. If nothing matches, it returns an empty list. Because the result is a plain list, findall has no group method and no start, end, or span.
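A minimal side-by-side sketch of match, search, and findall on the same string, plus the (0-9) versus [0-9] distinction from above. It also uses re.finditer, which the article does not cover, to show one way of getting positions when findall's plain list is not enough. Assumes Python 3.7 or later.

```python
import re

s = "123abc456abc789"

# match() only tries position 0, so it fails here: s starts with digits
m = re.match(r"abc", s)

# search() scans forward and returns the first match it finds
first = re.search(r"abc", s)

# findall() returns every match as a list of strings (no match objects)
every = re.findall(r"abc", s)

# finditer() yields match objects, so each match still has a span
spans = [mo.span() for mo in re.finditer(r"abc", s)]

# (0-9) is a group matching the literal text "0-9"; [0-9] is a digit set
literal = re.findall(r"(0-9)", "range 0-9 here")
digits = re.findall(r"[0-9]{1,2}", "a1b234")

print(m, first.span(), every, spans, literal, digits)
```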
>>> obj = re.findall(r"abc", "123abc456abc789")
>>> obj  # the return value is a list containing the matched strings
['abc', 'abc']
>>> obj.group()  # there is no group method
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    obj.group()
AttributeError: 'list' object has no attribute 'group'
>>> obj = re.findall(r"ABC", "123abc456abc789")  # matching is case-sensitive, so nothing matches
>>> print(obj)  # an empty list is returned when nothing matches
[]

split

split(pattern, string, maxsplit=0, flags=0)
# pattern: the regular expression
# string: the string to split
# maxsplit: the maximum number of splits
# flags: matching flags

The split method works like the split method of str: both break a string apart at particular separators. The re version, however, accepts a regular expression as the separator, which makes it far more flexible and powerful.

>>> s = "8+7*5+6/3"
>>> tmp = re.split(r"[\+\-\*\/]", s)  # the separator is any of the four arithmetic operators, so only the numbers remain
>>> tmp
['8', '7', '5', '6', '3']

The maxsplit parameter limits the number of splits:

>>> tmp = re.split(r"[\+\-\*\/]", s, maxsplit=2)
>>> tmp
['8', '7', '5+6/3']

If the pattern contains a capturing group, split also keeps the separators in the result. That matters here, because the operators are needed to evaluate the expression:

>>> tmp = re.split(r"([\+\-\*\/])", s)
>>> tmp
['8', '+', '7', '*', '5', '+', '6', '/', '3']

sub

sub(pattern, repl, string, count=0, flags=0)
# pattern: the regular expression
# repl: the replacement string, or a function called for each match
# string: the string to search
# count: the maximum number of replacements (0 means replace all)
# flags: matching flags

The sub function is similar to the replace method of str: it replaces each match with the given content, and you can limit the number of replacements.

>>> s = "Hello world, I am pythoner!"
>>> t = re.sub(r"o", "x", s)  # replace every "o" in s with "x"; s itself is unchanged
>>> t
'Hellx wxrld, I am pythxner!'

sub also supports group references: first define a group in the pattern with parentheses, then write \1 in the replacement string to insert the text that group matched.

>>> r = re.sub(r"(world)", r"<em>\1</em>", s)
>>> r
'Hello <em>world</em>, I am pythoner!'
>>> r = re.sub(r"(world)", r"%%%%\1%%%%%", s)
>>> r
'Hello %%%%world%%%%%, I am pythoner!'

Flags

The main functions of the Python re module take a flags parameter; its values are called compilation flags, such as I, M, and S. Compilation flags change how a regular expression is matched, and each can be referred to by its full name (e.g. IGNORECASE) or its abbreviation (e.g. I). Multiple flags can be combined with |: re.I | re.M sets both the I and M flags.
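A minimal sketch of combining flags with |, assuming a made-up three-line log string. It shows the same pattern with no flags, with re.I | re.M passed to findall, and with the long names passed to re.compile.

```python
import re

log = "ERROR disk full\nwarning low memory\nError retry failed"

# No flags: ^ only matches at the very start, and matching is case-sensitive
plain = re.findall(r"^error \w+", log)

# re.I | re.M: case-insensitive, and ^ matches at the start of every line
both = re.findall(r"^error \w+", log, re.I | re.M)

# The same combination, spelled with full names and baked into a compiled pattern
pat = re.compile(r"^error \w+", re.IGNORECASE | re.MULTILINE)

print(plain, both, pat.findall(log))
```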

I (ignore case) IGNORECASE: case-insensitive matching
L (locale) LOCALE: makes \w, \W, \b, \B and case-insensitive matching depend on the current locale (since Python 3.6 it can only be used with bytes patterns)
M (multiline) MULTILINE: makes ^ and $ match at the start and end of every line, not only at the start and end of the whole string

For example, the string s = "\nabc\n" contains newlines, so "abc" sits on a line of its own. In the REPL, a call that returns None prints nothing:

>>> import re
>>> s = "\nabc\n"
>>> re.search(r"^abc$", s)
>>> re.search(r"abc$", s)
<_sre.SRE_Match object; span=(1, 4), match='abc'>
>>> re.search(r"^abc", s)
>>> re.search(r"abc", s)
<_sre.SRE_Match object; span=(1, 4), match='abc'>
>>> re.search(r"^abc$", s, re.M)
<_sre.SRE_Match object; span=(1, 4), match='abc'>
>>> re.search(r"^abc", s, re.M)
<_sre.SRE_Match object; span=(1, 4), match='abc'>
>>> re.search(r"abc$", s, re.M)
<_sre.SRE_Match object; span=(1, 4), match='abc'>

Without re.M, ^abc fails because ^ only matches at position 0, which holds a newline; abc$ still succeeds because $ also matches just before the final newline.
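A small sketch contrasting $ with Python's \Z anchor, and ^ with \A, on the same string s = "\nabc\n". It assumes Python 3.7 or later; the point is that re.M changes ^ and $ but not \A and \Z.

```python
import re

s = "\nabc\n"

# $ matches at the very end OR just before a final newline, so this succeeds
dollar = re.search(r"abc$", s)

# \Z matches only at the absolute end of the string, so this fails
absolute_end = re.search(r"abc\Z", s)

# \A always means "start of string"; re.M does not change it
a_anchor = re.search(r"\Aabc", s, re.M)

# ^ with re.M matches at the start of any line, so this succeeds
caret = re.search(r"^abc", s, re.M)

print(dollar, absolute_end, a_anchor, caret)
```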

S (dot matches newline)
DOTALL: makes the special character . match any character at all, including a newline; without this flag, . matches any character except a newline
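A minimal sketch of the DOTALL flag, assuming a made-up HTML fragment whose body spans two lines.

```python
import re

html = "<p>first line\nsecond line</p>"

# Without re.S, . stops at the newline, so </p> is never reached and there is no match
no_dotall = re.search(r"<p>(.*)</p>", html)

# With re.S (re.DOTALL), . also matches \n and the whole body is captured
dotall = re.search(r"<p>(.*)</p>", html, re.S)

print(no_dotall, repr(dotall.group(1)))
```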
X (verbose mode)
VERBOSE: with this flag, whitespace (spaces, tabs, newlines) in the pattern is ignored, except inside a character class or when preceded by a backslash, so you can organize and indent the expression more clearly. It also lets you write comments that the engine ignores: a # starts a comment that runs to the end of the line, unless the # is inside a character class or escaped with a backslash. It serves two purposes: first, making complex, hard-to-read expressions more readable; second, adding comments to the expression.

The following pat is equivalent to r"\*([^\*]+)\*":

>>> pat = re.compile(r"""
... \*        # escape an asterisk
... (         # opening parenthesis marks the start of a group
... [^\*]+    # capture any non-asterisk characters
... )         # closing parenthesis marks the end of the group
... \*        # escape an asterisk
... """, re.VERBOSE)
>>> obj = pat.search("Hi, this is a *something*!")
>>> obj.group()
'*something*'

U UNICODE: kept for compatibility. It is redundant for str patterns (Unicode matching is already the default) and is forbidden for bytes patterns. Since Python 3, str and bytes are two separate data types, and the re module does not let you use a bytes pattern on a str or a str pattern on bytes: str patterns match str, and bytes patterns match bytes.

import re

s = "Halo"
b = bytes(s, encoding="utf-8")
pat = bytes("Ha", encoding="utf-8")
print("string s: %s" % s)
print("bytes b: %s" % b)
print("bytes regular expression pat: %s" % pat)
obj_s = re.search(pat, b)
print("match result: %s" % obj_s.group())

Output:

string s: Halo
bytes b: b'Halo'
bytes regular expression pat: b'Ha'
match result: b'Ha'

Combining the UNICODE flag with a bytes pattern is not allowed and raises an error as soon as you try it. For str patterns, Unicode matching is the default.
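A short sketch of the str/bytes rules above, assuming Python 3.7 or later: matching like with like works, mixing types raises TypeError, and re.UNICODE with a bytes pattern raises ValueError.

```python
import re

# str pattern on str, bytes pattern on bytes: both are fine
ok_str = re.search("Ha", "Halo").group()
ok_bytes = re.search(b"Ha", b"Halo").group()

# A str pattern applied to bytes raises TypeError
try:
    re.search("Ha", b"Halo")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# The UNICODE flag is rejected for bytes patterns with ValueError
try:
    re.compile(b"Ha", re.UNICODE)
    unicode_error = None
except ValueError as exc:
    unicode_error = exc

print(type(mixed_error).__name__, type(unicode_error).__name__)
```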

A ASCII: for str patterns, makes \w, \W, \b, \B, \d, \D, \s and \S match only ASCII characters instead of the whole Unicode character set (the default). For bytes patterns this is already the behavior, so the flag need not be given. Usually you don't need to think about this, but take care in code that processes text in many different languages.
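A minimal sketch of how re.ASCII changes \w and \d for str patterns. The sample strings are made up: "caf\u00e9_1" contains a non-ASCII letter, and "\u0663" is the Arabic-Indic digit three, which Unicode classifies as a decimal digit.

```python
import re

word = "caf\u00e9_1"

# Default for str patterns: \w is Unicode-aware, so the accented letter is a word character
unicode_words = re.findall(r"\w+", word)

# re.A (re.ASCII): \w is limited to [a-zA-Z0-9_], so the accented letter breaks the run
ascii_words = re.findall(r"\w+", word, re.A)

# \d works the same way: U+0663 counts as a digit only without re.A
unicode_digits = re.findall(r"\d", "3\u0663")
ascii_digits = re.findall(r"\d", "3\u0663", re.A)

print(unicode_words, ascii_words, unicode_digits, ascii_digits)
```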


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
