The basics of Python regular expressions

Source: Internet
Author: User
Tags: character classes, numeric modifiers

A regular expression is a powerful tool for handling strings. It is not specific to Python.

Other programming languages have regular expressions too; they differ mainly in how much of the syntax each language supports.

Regular expressions have their own syntax and a separate processing engine, so the core syntax is largely the same across the languages that provide them.

[Illustration: the process of matching a string against a regular expression]

1.1 Introduction

Regular expressions are not specific to Python. They are a powerful tool for handling strings, with their own syntax and an independent processing engine. That engine may be less efficient than the built-in str methods, but it is far more capable. Because of this independence, the core regular expression syntax is the same in every language that provides it; what differs is how much of the syntax each language supports. Don't worry: the unsupported parts are usually the less common ones.

A regular expression is a special sequence of characters that lets you easily check whether a string matches a pattern. Python has included the re module since version 1.5; it provides Perl-style regular expression patterns and gives the Python language full regular expression functionality.

1.2 Syntax you need to know

A pattern string uses a special syntax to represent a regular expression:

Letters and digits stand for themselves: a letter or digit in a pattern matches the same character in the string. Most letters and digits take on a different meaning when preceded by a backslash. Many punctuation characters have a special meaning and match themselves only when escaped. The backslash itself must be escaped with another backslash.

Because regular expressions usually contain backslashes, it is best to write them as raw strings. A pattern element such as r'\t' (equivalent to '\\t') matches the corresponding special character.
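
A minimal sketch of why raw strings help, assuming Python 3. The variable names are illustrative and do not come from the original article.

```python
import re

# A raw string keeps the backslash literal, so r'\t' is the two
# characters backslash + t, exactly like the escaped form '\\t'.
raw_pattern = r'\t'
escaped_pattern = '\\t'
same = raw_pattern == escaped_pattern

# The regex engine then interprets backslash + t as a tab character.
tab_found = re.search(raw_pattern, 'a\tb') is not None

# To match a literal backslash the regex needs \\ : a raw string
# spells that r'\\', while a normal string would need '\\\\'.
backslash_found = re.search(r'\\', 'C:\\temp') is not None

print(same, tab_found, backslash_found)
```

Without the r prefix, every backslash meant for the regex engine has to be doubled, which quickly becomes hard to read.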

The following table lists the special elements of the regular expression pattern syntax. If you supply the optional flags argument along with a pattern, the meaning of some pattern elements changes.

There are many of these elements. The most commonly used ones are listed here; try to become familiar with as many of them as you can.

Pattern

Pattern      Description
^            Matches the beginning of the string.
$            Matches the end of the string.
.            Matches any character except a newline; when the re.DOTALL flag is specified, it also matches a newline.
[...]        A set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...]       Any character not in the set: [^abc] matches any character other than a, b, or c.
re*          Matches 0 or more occurrences of the preceding expression.
re+          Matches 1 or more occurrences of the preceding expression.
re?          Matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier non-greedy.
re{n}        Matches exactly n occurrences of the preceding expression.
re{n,}       Matches n or more occurrences of the preceding expression.
re{n,m}      Matches from n to m occurrences of the preceding expression, greedily.
a|b          Matches a or b.
(re)         Matches the expression in parentheses and captures it as a group.
(?imx)       Turns on the optional i, m, or x flags from inside the pattern. In Python's re module this form belongs at the start of the pattern and applies to the whole expression.
(?-imx)      Turns off the optional i, m, or x flags (Perl-style; Python's re module supports turning flags off only in the scoped form (?-imx:re)).
(?:re)       Like (...), but does not create a group.
(?imx:re)    Uses the optional i, m, or x flags inside the parentheses only.
(?-imx:re)   Does not use the optional i, m, or x flags inside the parentheses.
(?#...)      A comment.
(?=re)       Positive lookahead. Succeeds if re matches at the current position, and fails otherwise. No characters are consumed: once the lookahead has been tried, the rest of the pattern is matched starting from the same position.
(?!re)       Negative lookahead. The opposite of positive lookahead: succeeds when re does not match at the current position.
(?>re)       Independent (atomic) group that does not backtrack. Supported by Python's re module from version 3.11.
\w           Matches a letter, digit, or underscore.
\W           Matches any character that is not a letter, digit, or underscore.
\s           Matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S           Matches any non-whitespace character.
\d           Matches any digit; equivalent to [0-9].
\D           Matches any non-digit.
\A           Matches at the start of the string.
\Z           Matches at the end of the string. In Perl-style engines it also matches just before a trailing newline; in Python's re module it matches only at the very end.
\z           Matches at the very end of the string (Perl-style; older versions of Python's re module do not accept \z).
\G           Matches at the position where the previous match ended (Perl-style; not supported by Python's re module).
\b           Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
\B           Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'.
\n, \t, etc. Match a newline, a tab, and so on.
\1...\9      Matches the contents of the nth group.
\10          Matches the contents of group 10 if that group exists. Otherwise, Perl-style engines interpret it as an octal character code.
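
A few of the table entries are easier to grasp when run. The following is a small sketch, assuming Python 3; the sample strings and variable names are made up for illustration.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* takes as much as possible, so the match runs to the last '>'.
greedy = re.search(r'<.*>', text).group()

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.search(r'<.*?>', text).group()

# {n,m}: between 2 and 3 digits, as many as possible.
digits = re.search(r'\d{2,3}', 'id 12345').group()

# Positive lookahead: 'Python' only when followed by '3'; the '3' is not consumed.
ahead = re.search(r'Python(?=3)', 'Python2 Python3').start()

# Negative lookahead: 'Python' only when NOT followed by '3'.
not_ahead = re.search(r'Python(?!3)', 'Python3 Python2').start()

# Backreference: \1 must repeat whatever group 1 matched.
doubled = re.search(r'(\w+) \1', 'it is is fine').group()

print(lazy, digits, ahead, not_ahead, doubled)
```

Here lazy is '<b>' while greedy spans the whole string, and both lookahead searches land on the second word, at index 8.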

Character classes

Example       Description
[Pp]ython     Matches "Python" or "python".
rub[ye]       Matches "ruby" or "rube".
[aeiou]       Matches any one of the letters in the brackets.
[0-9]         Matches any digit; the same as [0123456789].
[a-z]         Matches any lowercase letter.
[A-Z]         Matches any uppercase letter.
[a-zA-Z0-9]   Matches any letter or digit.
[^aeiou]      Matches any character except the letters a, e, i, o, u.
[^0-9]        Matches any character except a digit.
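
The character classes above can be tried out with re.findall, which returns every non-overlapping match as a list. This is an illustrative sketch, assuming Python 3, with made-up sample strings.

```python
import re

# [Pp]ython accepts either capitalisation of the first letter only.
names = re.findall(r'[Pp]ython', 'Python and python and PYTHON')

# [aeiou] matches one vowel; [^aeiou] matches one character that is not a vowel.
vowels = re.findall(r'[aeiou]', 'regular')
non_vowels = re.findall(r'[^aeiou]', 'regular')

# Ranges can be combined: [a-zA-Z0-9] is any ASCII letter or digit.
alnum = re.findall(r'[a-zA-Z0-9]+', 'abc-123_XYZ')

print(names)       # ['Python', 'python']
print(vowels)      # ['e', 'u', 'a']
print(non_vowels)  # ['r', 'g', 'l', 'r']
print(alnum)       # ['abc', '123', 'XYZ']
```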

Special character classes

Example   Description
.         Matches any single character except "\n". To match any character including '\n', use the re.DOTALL flag or a pattern like '[\s\S]'.
\d        Matches a digit; equivalent to [0-9].
\D        Matches a non-digit; equivalent to [^0-9].
\s        Matches any whitespace character, including spaces, tabs, form feeds, and so on; equivalent to [ \f\n\r\t\v].
\S        Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v].
\w        Matches any word character, including the underscore; equivalent to '[a-zA-Z0-9_]'.
\W        Matches any non-word character; equivalent to '[^a-zA-Z0-9_]'.
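
As a rough illustration of the special classes, the sketch below pulls apart a made-up log line and shows the effect of re.DOTALL on the dot. It assumes Python 3; the data and names are hypothetical.

```python
import re

line = 'Order 66:\tshipped_on 2024-01-05\n'

numbers = re.findall(r'\d+', line)        # runs of digits
words = re.findall(r'\w+', line)          # runs of letters, digits, underscore
fields = re.split(r'\s+', line.strip())   # split on any run of whitespace

# '.' stops at a newline unless re.DOTALL is given.
two_lines = 'first\nsecond'
without_flag = re.match(r'.+', two_lines).group()
with_flag = re.match(r'.+', two_lines, re.DOTALL).group()

print(numbers)
print(words)
print(fields)
print(repr(without_flag), repr(with_flag))
```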

1.3 re.match function

re.match attempts to match a pattern at the start of the string. If no match is found at the start, match() returns None.

re.match(pattern, string, flags=0)

pattern - the regular expression to match

string - the string to match against

flags - flag bits that control how matching is done, described below

Straight to a program:

import re
r = "abc"  # the regular expression
if re.match(r, "abc"):  # it matches
    print('Done')
else:
    print('No match')

Results:

Done

Use the table above to practice some more:

import re
r = "a.c"  # regular expression; . matches any character except a newline, and with re.DOTALL it matches a newline as well
if re.match(r, "abc"):
    print(re.match(r, "abc"))
    print('Done')
else:
    print('No match')

Results:

<_sre.SRE_Match object at 0x01dd6158>

Done

Note that what is printed is not the matched string: re.match() returns a match object on success and None on failure. The exact representation of the object varies between Python versions.

We can retrieve what was matched by using the match object methods group(num) or groups().

Match object method   Description
group(num=0)          The string matched by the entire expression. group() can take several group numbers at once, in which case it returns a tuple containing the corresponding values of those groups.
groups()              Returns a tuple containing the strings of all groups, from group 1 up to however many groups the pattern contains.

Program:

import re
r = "a.c"
if re.match(r, "abc"):
    line = re.match(r, "abc")
    print(line.group())
else:
    print('No match')

Results:

abc
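
The program above has no capture groups, so only group() is useful there. The following sketch, assuming Python 3 and using made-up data, shows how group(num) and groups() behave once the pattern contains groups.

```python
import re

# Two capture groups: a name, then an age (illustrative data).
m = re.match(r'(\w+) is (\d+)', 'Alice is 30 years old')

whole = m.group()         # same as m.group(0): the entire match
name = m.group(1)         # first group
age = m.group(2)          # second group
pair = m.group(1, 2)      # several group numbers give a tuple
everything = m.groups()   # all groups from 1 upward

print(whole)       # Alice is 30
print(pair)        # ('Alice', '30')
print(everything)  # ('Alice', '30')
```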

1.3 re.search function

re.search() scans the entire string and returns the first successful match.

re.search(pattern, string, flags=0)

pattern - the regular expression to match

string - the string to search

flags - flag bits that control how matching is done

As with re.match(), re.search() returns a match object on success; otherwise it returns None.

Straight to a program:

import re
r = "abc"
s = 'aacawcabc'
if re.search(r, s):
    line = re.search(r, s)
    print(line.group())

Results:

abc

Note:

The difference between re.match() and re.search():

re.match matches only at the beginning of the string; if the string does not start with a match for the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.
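
The difference can be seen side by side on the string used in the search example. This is a small sketch, assuming Python 3; the variable names are illustrative.

```python
import re

s = 'aacawcabc'

at_start = re.match(r'abc', s)     # None: s does not begin with 'abc'
anywhere = re.search(r'abc', s)    # finds 'abc' later in the string
position = anywhere.span()         # (6, 9)

# Anchoring the search with ^ makes it behave like match for this input.
anchored = re.search(r'^abc', s)

print(at_start, position, anchored)
```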

1.4 re.sub function

The re.sub() function replaces matches.

re.sub(pattern, repl, string, count=0)

pattern - the regular expression to match

repl - the replacement

string - the string to search

count - the maximum number of replacements; the default, 0, replaces all matches

The returned string is obtained by replacing the leftmost non-overlapping matches of the pattern in the string. If the pattern is not found, the string is returned unchanged.

Program:

import re
pattern = r'\d'
repl = "!"
s = '123456789abcdefg'  # illustrative input: nine digits followed by letters
line = re.sub(pattern, repl, s)
print(line)

Results:

!!!!!!!!!abcdefg
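
The count parameter and group references in the replacement are not exercised by the program above. The sketch below, assuming Python 3 and an illustrative date string, shows both, along with re.subn, which also reports the number of replacements.

```python
import re

s = '2024-01-05'

# count limits how many matches are replaced, starting from the left.
first_only = re.sub(r'-', '/', s, count=1)

# count=0 (the default) replaces every match.
all_of_them = re.sub(r'-', '/', s)

# The replacement may refer to groups with \1, \2, ...
swapped = re.sub(r'(\d+)-(\d+)-(\d+)', r'\3.\2.\1', s)

# subn returns the new string together with the number of replacements.
result, n = re.subn(r'\d', '#', 'a1b22')

print(first_only)   # 2024/01-05
print(all_of_them)  # 2024/01/05
print(swapped)      # 05.01.2024
print(result, n)    # a#b## 3
```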

1.5 Regular expression modifiers: optional flags

First, what a flag is:

A regular expression can take optional flag modifiers that control how matching is done. Modifiers are passed as the optional flags argument. Several flags can be combined with a bitwise OR (|); for example, re.I | re.M sets both the I and M flags:

Modifier   Description
re.I       Makes matching case-insensitive.
re.L       Makes matching locale-aware.
re.M       Multi-line matching; affects ^ and $.
re.S       Makes . match every character, including a newline.
re.U       Interprets characters according to the Unicode character set; affects \w, \W, \b, and \B.
re.X       Allows a more flexible layout, so that regular expressions can be written in a more readable way.

Program:

import re
pattern = '[Aa][Bb][Cc][Dd]'
s = 'AbCd'
if re.match(pattern, s):
    line = re.match(pattern, s)
    print(line.group())

Results:

AbCd

The same result can be achieved by passing a flag:

import re
pattern = 'ABCD'
s = 'abcd'
if re.match(pattern, s, re.I):
    line = re.match(pattern, s, re.I)
    print(line.group())

Results:

abcd
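
The programs above only use re.I. The following sketch, assuming Python 3 and made-up text, illustrates re.M, a combination of two flags with |, and the verbose layout that re.X allows.

```python
import re

text = 'first line\nSecond line\nthird LINE'

# Without re.M, ^ matches only at the very start of the string.
starts_default = re.findall(r'^\w+', text)

# With re.M, ^ also matches right after each newline.
starts_multiline = re.findall(r'^\w+', text, re.M)

# Flags combine with bitwise OR: case-insensitive and multi-line.
line_words = re.findall(r'line$', text, re.I | re.M)

# re.X lets the pattern contain whitespace and comments.
date = re.compile(r'''
    (\d{4})  # year
    -
    (\d{2})  # month
''', re.X)
year_month = date.match('2024-01-05').groups()

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'Second', 'third']
print(line_words)        # ['line', 'line', 'LINE']
print(year_month)        # ('2024', '01')
```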

1.6 re.compile function

The usual way to use re is to first compile the string form of the regular expression into a pattern object with re.compile(), then use the pattern object to process text and obtain a result (a match object), and finally use the match object to get the information you need and carry on from there.

Program:

import re
pattern = re.compile(r'\d+')
s = '11223344aabbccdd'  # illustrative input: digits followed by letters
if pattern.match(s):
    line = pattern.match(s)
    print(line.group())

Results:

11223344
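
A compiled pattern offers the same operations as the module-level functions, so it can be reused across many strings. This is a minimal sketch, assuming Python 3, with illustrative inputs.

```python
import re

# Compile once, reuse many times.
number = re.compile(r'\d+')

first = number.match('11223344aabbccdd').group()   # anchored at the start
later = number.search('aabb5566ccdd').group()      # anywhere in the string
all_numbers = number.findall('a1 b22 c333')        # every match as a list
no_match = number.match('aabbccdd')                # None: no leading digits

# Flags are given to compile(), not to the pattern's methods.
word = re.compile(r'python', re.I)
hits = word.findall('Python, PYTHON, python')

print(first, later, all_numbers, no_match)
print(hits)
```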
