Python Regular Expression Basics

Last Update: 2016-01-27 Source: Internet

Author: User


I haven't written a blog post in a while. I've been writing pygame code for some time and neglected the blog. Today I helped a friend revise some movie subtitles, which called for regular expressions, so I learned them again. I'll need more practice to master them gradually.

Python Regular Expressions

1.1 Introduction

Regular expressions are not part of the Python language itself. They are a powerful tool for working with strings, with their own syntax and an independent matching engine; they may be less efficient than the built-in str methods, but they are far more powerful. Because of this, regular expression syntax is largely the same in every language that provides it; languages differ only in how much of the syntax they support. Don't worry, though: the unsupported features are usually the less common ones.

A regular expression is a special sequence of characters that lets you easily check whether a string matches a pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns and gives Python full regular expression functionality.

1.2 Getting to know the various usages

A pattern string uses a special syntax to represent a regular expression:

Letters and digits represent themselves: a letter or digit in a pattern matches the same character in the string. Most letters and digits take on a special meaning when preceded by a backslash. Punctuation characters that have a special meaning (such as . ^ $ * + ? { } [ ] \ | ( )) match themselves only when escaped; otherwise they keep their special meaning. A backslash itself must be escaped with another backslash.
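A quick sketch of escaping in practice (Python 3; the sample strings are made up for illustration). It contrasts an unescaped "." with an escaped one and uses re.escape(), a standard re function that escapes special characters for you:

```python
import re

# Unescaped, "." is a metacharacter: it matches any character except "\n".
dot_any = re.match("a.c", "abc")
# Escaped with a backslash, "\." matches only a literal dot.
dot_literal_miss = re.match(r"a\.c", "abc")
dot_literal_hit = re.match(r"a\.c", "a.c")

print(dot_any.group())          # abc
print(dot_literal_miss)         # None
print(dot_literal_hit.group())  # a.c

# re.escape() backslash-escapes the special characters in a string, which is
# handy when the literal text to look for is not known in advance.
literal = re.escape("1+1=2?")
print(re.match(literal, "1+1=2?").group())  # 1+1=2?
# Unescaped, "+" and "?" act as quantifiers, so the same text fails to match.
print(re.match("1+1=2?", "1+1=2?"))         # None
```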

Because regular expressions often contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character, here a tab.
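A small sketch (Python 3, illustrative strings) of what the raw-string rule means: r"\t" and "\\t" are the same two characters, and the re module is what turns them into "match a tab". Matching a single literal backslash shows why raw strings save typing:

```python
import re

# A raw string keeps backslashes as-is: r"\t" is the two characters "\" and "t",
# exactly the same text as the ordinary string "\\t".
print(r"\t" == "\\t")  # True
print(len(r"\t"))      # 2

# The re module itself turns the two-character escape \t into "match a tab".
tab_match = re.match(r"\t", "\tindented")
print(tab_match is not None)  # True

# To match one literal backslash the regex needs \\ : write it as r"\\"
# in a raw string, or as "\\\\" in an ordinary string.
path = "C:\\temp"  # the 7-character text C:\temp
raw_hit = re.match(r"C:\\temp", path)
plain_hit = re.match("C:\\\\temp", path)
print(raw_hit.group() == path, plain_hit.group() == path)  # True True
```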

The table below lists the special elements of the pattern syntax. If you pass optional flags along with the pattern, the meaning of some elements changes.

Many of these come up in everyday use; the best way to understand them is to try them out.

Patterns

Pattern	Description
^	Matches the start of the string.
$	Matches the end of the string (or just before a newline at the end of the string).
.	Matches any character except a newline; when the re.DOTALL flag is specified, it matches any character, including a newline.
[...]	Matches any one character from the set: [amk] matches 'a', 'm' or 'k'.
[^...]	Matches any character not in the set: [^abc] matches any character other than a, b or c.
re*	Matches 0 or more occurrences of the preceding expression.
re+	Matches 1 or more occurrences of the preceding expression.
re?	Matches 0 or 1 occurrence of the preceding expression, greedily; re?? is the non-greedy form.
re{n}	Matches exactly n occurrences of the preceding expression. For example, o{2} does not match the 'o' in 'Bob' but matches the two o's in 'food'.
re{n,}	Matches n or more occurrences of the preceding expression.
re{n,m}	Matches n to m occurrences of the preceding expression, greedily.
a|b	Matches a or b.
(re)	Matches the expression in parentheses and captures the matched text as a group.
(?imx)	Turns on the i, m or x flags. In Python this applies to the whole regular expression, and since Python 3.11 it must appear at the start of the pattern.
(?-imx)	Turns off the i, m or x flags. Python does not support this standalone form; use the scoped form (?-imx:re) instead.
(?:re)	Like (re), but does not capture a group.
(?imx:re)	Turns on the i, m or x flags only inside the parentheses (Python 3.6+).
(?-imx:re)	Turns off the i, m or x flags only inside the parentheses (Python 3.6+).
(?#...)	A comment; its contents are ignored.
(?=re)	Positive lookahead assertion. Succeeds if re matches at the current position, but consumes no characters; the rest of the pattern is then tried from that same position.
(?!re)	Negative lookahead assertion. The opposite of (?=re): succeeds only if re does not match at the current position.
(?>re)	Atomic group: matches re as an independent pattern and discards any backtracking positions inside it (Python 3.11+).
\w	Matches a word character (letter, digit or underscore).
\W	Matches a non-word character.
\s	Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v].
\S	Matches any non-whitespace character.
\d	Matches any digit; for ASCII text, equivalent to [0-9].
\D	Matches any non-digit.
\A	Matches only at the start of the string.
\Z	Matches only at the end of the string. (In Perl and Ruby, \Z also matches before a final newline; Python's \Z does not.)
\z	Matches only at the very end of the string (Perl/Ruby syntax; in Python, use \Z).
\G	Matches the position where the last match ended (Perl/Ruby syntax; not supported by Python's re module).
\b	Matches a word boundary, i.e. the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
\B	Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.
\n, \t, etc.	Match a newline, a tab, and so on.
\1...\9	Matches the text captured by the nth group.
\10	In Python, refers to group 10. An escape is read as an octal character code only if it starts with 0 or is three octal digits long (e.g. \012).
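A sketch that tries several rows of the table above in Python 3 (the input strings are illustrative, not from the original post). Each print is followed by the result it produces:

```python
import re

# Quantifiers
print(re.search(r"o{2}", "Bob"))               # None
print(re.search(r"o{2}", "food").group())      # oo
print(re.search(r"o{2,}", "foooood").group())  # ooooo
print(re.match(r"\d{2,3}", "12345").group())   # 123 (greedy: takes as many as allowed)

# Greedy vs. non-greedy
print(re.match(r"<.*>", "<a><b>").group())     # <a><b>
print(re.match(r"<.*?>", "<a><b>").group())    # <a>
print(re.match(r"ab?", "abc").group())         # ab
print(re.match(r"ab??", "abc").group())        # a

# Alternation, capturing and non-capturing groups
print(re.match(r"cat|dog", "dogs").group())    # dog
m = re.match(r"(\d+)-(?:\d+)-(\d+)", "2016-01-27")
print(m.groups())                              # ('2016', '27')

# Inline flag at the start of the pattern
print(re.match(r"(?i)abc", "ABC").group())     # ABC

# Lookahead assertions: they check what follows without consuming it
print(re.search(r"Python(?=3)", "Python2 Python3").start())  # 8
print(re.search(r"Python(?!3)", "Python3 Python2").start())  # 8

# Backreference and word boundaries
print(re.search(r"(\w)\1", "hello").group())   # ll
print(bool(re.search(r"er\b", "never")), bool(re.search(r"er\b", "verb")))  # True False
print(bool(re.search(r"er\B", "verb")), bool(re.search(r"er\B", "never")))  # True False
```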

Character Classes

Example	Description
[Pp]ython	Matches "Python" or "python"
rub[ye]	Matches "ruby" or "rube"
[aeiou]	Matches any one of the letters in the brackets
[0-9]	Matches any digit; same as [0123456789]
[a-z]	Matches any lowercase letter
[A-Z]	Matches any uppercase letter
[a-zA-Z0-9]	Matches any letter or digit
[^aeiou]	Matches any character except the letters a, e, i, o, u
[^0-9]	Matches any character except a digit
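A sketch (Python 3, made-up sample text) that runs the character classes above through re.match() and re.sub(), so you can see exactly which characters each class picks out. The last line shows a common pitfall with ranges like [A-z]:

```python
import re

print(re.match(r"[Pp]ython", "Python").group())       # Python
print(re.match(r"rub[ye]", "rube").group())           # rube
print(re.sub(r"[aeiou]", "*", "regular expression"))  # r*g*l*r *xpr*ss**n
print(re.sub(r"[0-9]", "#", "Room 101"))              # Room ###
print(re.sub(r"[a-z]", "", "Hello World"))            # H W
print(re.sub(r"[A-Z]", "", "Hello World"))            # ello orld
print(re.sub(r"[^a-zA-Z0-9]", "", "a-b_c 1!"))        # abc1
print(re.sub(r"[^aeiou]", "-", "python"))             # ----o-
print(re.sub(r"[^0-9]", "", "Tel: 010-1234"))         # 0101234

# Pitfall: [A-z] is not "all letters". In ASCII it also spans [ \ ] ^ _ and `
print(re.match(r"[A-z]", "_") is not None)            # True
```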

Special Character Classes

Example	Description
.	Matches any single character except "\n". To match any character including "\n", use the re.DOTALL flag or a pattern such as [\s\S]. (Inside brackets "." is a literal dot, so [.\n] only matches a dot or a newline.)
\d	Matches a digit character; equivalent to [0-9].
\D	Matches a non-digit character; equivalent to [^0-9].
\s	Matches any whitespace character, including spaces, tabs, form feeds and so on; equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v].
\w	Matches any word character, including the underscore; equivalent to [a-zA-Z0-9_].
\W	Matches any non-word character; equivalent to [^a-zA-Z0-9_].
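A sketch of the special classes above in Python 3 (illustrative strings). One caveat the table glosses over: in Python 3, str patterns are Unicode-aware, so the "equivalent to [0-9]" style descriptions hold exactly only for ASCII text or with the re.ASCII flag:

```python
import re

print(re.sub(r"\d", "D", "a1b22"))               # aDbDD
print(re.sub(r"\D", "_", "a1b22"))               # _1_22
print(re.sub(r"\s+", " ", "a \t b\n c"))         # a b c
print(re.sub(r"\S", "x", "ab cd"))               # xx xx
print(re.sub(r"\w", "", "snake_case-name!"))     # -!
print(re.sub(r"\W", "", "snake_case-name!"))     # snake_casename

# "." stops at a newline unless re.DOTALL is given; [\s\S] matches anything.
print(re.match(r"a.b", "a\nb"))                           # None
print(repr(re.match(r"a.b", "a\nb", re.DOTALL).group()))  # 'a\nb'
print(repr(re.match(r"a[\s\S]b", "a\nb").group()))        # 'a\nb'

# \d also matches non-ASCII digits such as ARABIC-INDIC DIGIT THREE (U+0663);
# re.ASCII restricts it to [0-9].
print(re.match(r"\d", "\u0663") is not None)   # True
print(re.match(r"\d", "\u0663", re.ASCII))     # None
```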

1.3 re.match function

re.match() tries to match the pattern at the start of the string; if there is no match at the start, match() returns None.

re.match(pattern, string, flags=0)

pattern: the regular expression

string: the string to match against

flags: flags that control how matching is done

Straight to the code:

import re

r1 = "abc"                # the regular expression
if re.match(r1, "abc"):   # does the pattern match at the start of the string?
    print("Done")
else:
    print("Failed")

Results:

Done

Use the tables above to practice with more patterns:

import re

# Regular expression: "." matches any character except a newline
# (or any character at all when re.DOTALL is specified).
r1 = "a.c"
m = re.match(r1, "abc")
if m:
    print(m)        # prints the match object itself
    print("Done")
else:
    print("Failed")

Results:

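<_sre.SRE_Match object at 0x01dd6158>

Done

(The first line is Python 2 output; Python 3.7+ prints something like <re.Match object; span=(0, 3), match='abc'>.)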

Note that what is printed is not the matched string: on success re.match() returns a match object, and on failure it returns None.

To get the matched text, call the match object's group(num) or groups() method.

Match object method	Description
group(num=0)	Returns the string matched by the whole expression (or by group num). group() can take several group numbers at once, in which case it returns a tuple of the corresponding values.
groups()	Returns a tuple of all the group strings, from group 1 up to the last group.
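A sketch of the two methods in the table above (Python 3). The date string reuses the "Last Update" value at the top of this post purely as sample text:

```python
import re

m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2016-01-27 update")
print(m.group())      # 2016-01-27
print(m.group(0))     # 2016-01-27 (same as group())
print(m.group(1))     # 2016
print(m.group(1, 3))  # ('2016', '27')
print(m.groups())     # ('2016', '01', '27')

# A group that did not take part in the match shows up as None in groups().
m2 = re.match(r"(a)(b)?", "ac")
print(m2.groups())    # ('a', None)

# When there is no match, re.match returns None, so check before calling group().
m3 = re.match(r"\d+", "abc")
print(m3.group() if m3 else "no match")  # no match
```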

Program:

import re

r1 = "a.c"
m = re.match(r1, "abc")
if m:
    print(m.group())   # the text matched by the whole pattern
else:
    print("Failed")

Results:
abc

1.3 re.search function

re.search() scans the entire string and returns the first successful match.

re.search(pattern, string, flags=0)

pattern: the regular expression

string: the string to search

flags: flags that control how matching is done

As with re.match(), re.search() returns a match object on success and None otherwise.

Straight to the code:

import re

r1 = "abc"
s = "aacawcabc"
m = re.search(r1, s)   # scans the whole string for the first match
if m:
    print(m.group())

Results:

abc

Note:
The difference between re.match() and re.search():

re.match() only matches at the beginning of the string; if the beginning of the string does not match the regular expression, the match fails and the function returns None. re.search(), by contrast, scans the entire string until it finds a match.
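A sketch of that difference in Python 3, using made-up strings. It also covers a case that often surprises people: re.MULTILINE does not make re.match() look at later lines, while re.search() with ^ does:

```python
import re

s = "www.example.com"
print(re.match(r"www", s).span())    # (0, 3)
print(re.match(r"com", s))           # None: "com" is not at the start
print(re.search(r"com", s).span())   # (12, 15)

# Even with re.MULTILINE, re.match only tries the start of the string,
# whereas re.search with ^ can match at the start of any line.
text = "first\nsecond"
print(re.match(r"second", text, re.M))            # None
print(re.search(r"^second", text, re.M).start())  # 6
```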

1.4 re.sub function

re.sub() replaces matches.

re.sub(pattern, repl, string, count=0, flags=0)

pattern: the regular expression

repl: the replacement

string: the string to search

count: the maximum number of replacements; the default, 0, replaces all matches

flags: flags that control how matching is done

It returns the string obtained by replacing the leftmost non-overlapping matches of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.

Program:

import re

pattern = r"\d"            # any digit
repl = "!"
s = "123456789abcdefg"

line = re.sub(pattern, repl, s)   # replace every digit with "!"
print(line)

Results:

!!!!!!!!!abcdefg
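A sketch of a few more re.sub() options in Python 3: the count parameter described above, group references in the replacement, and a function as repl (a documented feature of re.sub). The subtitle line at the end is hypothetical, in the spirit of the subtitle clean-up that motivated this post:

```python
import re

s = "123456789abcdefg"
# count limits the number of replacements (0, the default, means "all").
print(re.sub(r"\d", "!", s, count=3))  # !!!456789abcdefg

# repl may refer to groups with \1, \2 ... (or \g<1>).
print(re.sub(r"(\w+)@(\w+)", r"\2 at \1", "alice@home"))  # home at alice

# repl may also be a function: it receives each match object and returns
# the replacement text. Here every number is doubled.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# Hypothetical subtitle line: strip <i> and </i> italics tags.
print(re.sub(r"</?i>", "", "<i>Where are you going?</i>"))  # Where are you going?
```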

1.5 Regular expression modifiers: optional flags

First, what are flags?

A regular expression can take optional flags that control how the pattern is matched. Flags are passed as an optional argument, and several flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:

Modifier	Description
re.I	Makes matching case-insensitive.
re.L	Makes matching locale-aware (in Python 3, allowed only with bytes patterns).
re.M	Multiline matching: ^ and $ also match at the start and end of each line.
re.S	Makes . match any character, including a newline.
re.U	Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B, \d, \D, \s and \S. (In Python 3 this is already the default for str patterns.)
re.X	Verbose mode: whitespace and # comments in the pattern are ignored, so you can lay a regular expression out in a form that is easier to read.
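A sketch (Python 3, made-up text) of the M, S, I and X flags from the table above, including combining two flags with |:

```python
import re

text = "first line\nSecond line"

# re.I ignores case; re.M lets ^ match at the start of every line.
print(re.search(r"^second", text, re.I))                 # None (^ only at string start)
print(re.search(r"^second", text, re.I | re.M).group())  # Second

# re.S lets "." cross the newline.
print(re.search(r"line.Second", text))                      # None
print(repr(re.search(r"line.Second", text, re.S).group()))  # 'line\nSecond'

# re.X ignores whitespace and allows # comments inside the pattern.
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
print(date.match("2016-01").groups())  # ('2016', '01')
```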

Program:

import re

pattern = "[Aa][Bb][Cc][Dd]"   # each class accepts either case
s = "AbCd"
m = re.match(pattern, s)
if m:
    print(m.group())

Results:

AbCd

The program above can be written more simply with a flag:

import re

pattern = "abcd"
s = "AbCd"
m = re.match(pattern, s, re.I)   # re.I: ignore case
if m:
    print(m.group())

Results:

AbCd

1.6 re.compile function

The usual way to use re is to compile the string form of a regular expression into a Pattern object with re.compile(), use the Pattern to process text and get a match result (a Match object), and finally use the Match object to extract information and do further work.

Program:

import re

pattern = re.compile(r"\d+")   # one or more digits
s = "11223344aabbccdd"
m = pattern.match(s)
if m:
    print(m.group())

Results:

11223344
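A sketch (Python 3, illustrative strings) of reusing one compiled Pattern: its match(), search() and sub() methods mirror the module-level functions covered above, match() also accepts a start position, and flags are supplied once at compile time:

```python
import re

number = re.compile(r"\d+")  # compile once, reuse many times

print(number.match("11223344aabbccdd").group())  # 11223344
print(number.match("aabb1122"))                  # None: match() only tries position 0
print(number.match("aabb1122", 4).group())       # 1122: optional pos argument
print(number.search("aabb1122").group())         # 1122
print(number.sub("#", "a1b22c333"))              # a#b#c#

# Flags are given once, at compile time.
abcd = re.compile("abcd", re.I)
print(abcd.match("AbCd").group())                # AbCd
```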

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.