Introduction and usage of Python Regular Expressions

Source: Internet
Author: User
1. Regular Expression Introduction

A regular expression (re) is a small, highly specialized programming language embedded in Python and implemented through the re module.

A regular expression specifies a rule for the set of strings you want to match.

That set might contain English sentences, email addresses, commands, or anything else you like.

You can then ask questions such as "Does this string match this pattern?"

or "Is there a part of this string that matches this pattern?"

You can also use re to modify or split strings in various ways.

A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.

The regular expression language is relatively small and restricted,

so not all string processing tasks can be done with regular expressions.
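
As a rough illustration of the questions above, the following sketch asks them of a few made-up sample strings. The digit pattern and the strings are assumptions chosen for illustration, not taken from the article.

```python
import re

# Hypothetical pattern: one or more decimal digits.
pattern = re.compile(r"\d+")

# "Does this string match this pattern (at its start)?"
starts_with_digits = pattern.match("2024 was a leap year") is not None

# "Is there a part of this string that matches this pattern?"
found = pattern.search("released in 2024")
found_text = found.group() if found else None

# Splitting a string wherever the pattern matches.
parts = pattern.split("a1b22c333d")

print(starts_with_digits, found_text, parts)
```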

Character matching:

Ordinary characters

Most letters and characters simply match themselves.

Metacharacters:

1. []

Commonly used to specify a character set: [abc]; [a-z]; [0-9]

Metacharacters lose their special meaning inside a character set: [akm$]

Complementing a set matches any character that is not in it: [^5]; [^abc]
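
A minimal sketch of the character sets described above, run against short made-up strings (the sample inputs are assumptions for illustration):

```python
import re

# A plain set: any one of a, b, or c.
in_set = re.findall(r"[abc]", "aXbYcZ")

# A range: any lowercase ASCII letter.
in_range = re.findall(r"[a-z]", "Hi There")

# Inside a set, $ is an ordinary character rather than a metacharacter.
literal_dollar = re.findall(r"[akm$]", "k$9")

# A leading ^ complements the set: anything except the character 5.
not_five = re.findall(r"[^5]", "5a55b")

print(in_set, in_range, literal_dollar, not_five)
```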


2. ^

Matches at the beginning of a line. Unless the MULTILINE flag is set, it matches only at the beginning of the string. In MULTILINE mode, it also matches immediately after each newline in the string.


3. $

Matches at the end of a line. Without the MULTILINE flag, this is the end of the string or the position just before a newline at the very end of the string; in MULTILINE mode, it is also any position followed by a newline character.
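
The effect of ^ and $ with and without the MULTILINE flag can be sketched as follows. The two-line sample text is a made-up assumption.

```python
import re

text = "alpha one\nbeta two"  # hypothetical two-line sample

# Without MULTILINE, ^ anchors only at the start of the whole string.
start_default = re.findall(r"^\w+", text)
# With MULTILINE, ^ also anchors right after each newline.
start_multi = re.findall(r"^\w+", text, re.M)

# Without MULTILINE, $ anchors only at the end of the whole string.
end_default = re.findall(r"\w+$", text)
# With MULTILINE, $ also anchors just before each newline.
end_multi = re.findall(r"\w+$", text, re.M)

print(start_default, start_multi, end_default, end_multi)
```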


4. \

A backslash can be followed by various characters to signal special sequences.

It is also used to escape metacharacters so that they match literally: \[ or \\

\d matches any decimal digit; equivalent to [0-9]

\D matches any non-digit character; equivalent to [^0-9]

\s matches any whitespace character; equivalent to [ \t\n\r\f\v]

\S matches any non-whitespace character; equivalent to [^ \t\n\r\f\v]

\w matches any alphanumeric character or underscore; equivalent to [a-zA-Z0-9_]

\W matches any character that is not alphanumeric or underscore; equivalent to [^a-zA-Z0-9_]

(These equivalences describe ASCII matching. For Python 3 str patterns without the re.ASCII flag, the classes also include the corresponding Unicode characters.)
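
To make the special sequences above concrete, here is a small sketch applied to a made-up sample string (the string is an assumption for illustration):

```python
import re

sample = "Room 42, floor_3!"  # hypothetical input

digits = re.findall(r"\d", sample)    # decimal digits
words = re.findall(r"\w+", sample)    # runs of letters, digits, underscore
non_word = re.findall(r"\W", sample)  # everything \w does not match

# \s+ matches runs of whitespace, including tabs and newlines.
chunks = re.split(r"\s+", "a \t b\nc")

# A backslash also escapes a metacharacter so it matches literally.
literal_bracket = re.findall(r"\[", "x[0]")

print(digits, words, non_word, chunks, literal_bracket)
```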

5. Repetition

The first capability of regular expressions is matching sets of characters. Another is specifying how many times part of a regular expression must be repeated.

c{8} matches the character c repeated 8 times; \d{8} matches 8 digits
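
A short sketch of a fixed repetition count, using made-up inputs:

```python
import re

eight_digits = re.compile(r"\d{8}")

# fullmatch() succeeds only if the entire string is exactly 8 digits.
ok = eight_digits.fullmatch("12345678") is not None
too_short = eight_digits.fullmatch("1234567") is not None

# c{3} matches exactly three consecutive c characters.
triples = re.findall(r"c{3}", "cc ccc cccc")

print(ok, too_short, triples)
```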

6. *

Specifies that the preceding character can be matched zero or more times, instead of exactly once. The matching engine tries to repeat it as many times as possible (up to an internal limit, about 2 billion, set by the size of a C integer).

7. +

Matches one or more times.

The difference between * and +: * matches zero or more times, while + matches one or more times.

8. ?

Matches once or zero times; you can think of it as marking something as optional.

When ? follows a repetition operator (items 6 and 7), it requests the minimal match: on its own, a*? matches a zero times, and a+? matches a once.
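
The difference between the greedy and minimal forms can be sketched like this. The input strings are made up for illustration.

```python
import re

# ? on its own makes the preceding character optional.
optional = [bool(re.fullmatch(r"colou?r", s)) for s in ("color", "colour")]

# Greedy * and + take as many characters as they can.
greedy_star = re.match(r"a*", "aaa").group()
greedy_plus = re.match(r"a+", "aaa").group()

# Adding ? makes them minimal: as few characters as possible.
lazy_star = re.match(r"a*?", "aaa").group()  # zero repetitions
lazy_plus = re.match(r"a+?", "aaa").group()  # exactly one repetition

print(optional, greedy_star, greedy_plus, repr(lazy_star), lazy_plus)
```
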
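9. *?, +?, ??

Non-greedy forms of the repetition operators: they match as few characters as possible.

For example, in a string such as '<a> b <c>', <.*?> matches only '<a>', while the greedy <.*> matches the whole string.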

10. {m,n}

Both m and n are decimal integers. This qualifier means at least m repetitions and at most n. For example, a/{1,3}b matches 'a/b', 'a//b', and 'a///b'.

If m is omitted, the lower bound is 0. If n is omitted, the upper bound is infinite (in practice about 2 billion).

{0,} is equivalent to *; {1,} is equivalent to +; {0,1} is equivalent to ?. When one of these applies, it is better to use *, +, or ?.
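
A minimal sketch of bounded repetition. The pattern a/{1,3}b and the candidate strings are illustrative assumptions.

```python
import re

bounded = re.compile(r"a/{1,3}b")  # between one and three slashes

candidates = ["ab", "a/b", "a//b", "a///b", "a////b"]
results = [bounded.fullmatch(s) is not None for s in candidates]

# {0,} behaves like *, {1,} like +, and {0,1} like ?.
same_as_star = re.findall(r"x{0,}y", "y xy xxy") == re.findall(r"x*y", "y xy xxy")
same_as_plus = re.findall(r"x{1,}y", "y xy xxy") == re.findall(r"x+y", "y xy xxy")
same_as_opt = re.findall(r"x{0,1}y", "y xy xxy") == re.findall(r"x?y", "y xy xxy")

print(results, same_as_star, same_as_plus, same_as_opt)
```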

11. |

Alternation ("or"): matches either of the expressions on its two sides.

A|B: the alternatives are tried from left to right. If A matches, B is not tried; B is tried only when A fails.
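
The left-to-right behaviour of alternation can be sketched as follows, using made-up inputs:

```python
import re

# Either alternative may match at a given position.
animals = re.findall(r"cat|dog", "cat dog bird")

# Alternatives are tried left to right, so the order matters
# when one alternative is a prefix of the other.
first_wins = re.match(r"a|ab", "ab").group()
longer_first = re.match(r"ab|a", "ab").group()

print(animals, first_wins, longer_first)
```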

12. (...)

Matches whatever regular expression is inside the parentheses and treats it as a group.
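
As a sketch of grouping, the pattern below captures two parts of a made-up phone-like string. The group layout and the input are assumptions for illustration.

```python
import re

m = re.match(r"(\d{3,4})-(\d{8})", "0561-79882523")

whole = m.group(0)   # the entire match
first = m.group(1)   # text captured by the first pair of parentheses
second = m.group(2)  # text captured by the second pair
both = m.groups()    # all captured groups as a tuple

# A group can also be repeated as a unit.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

print(whole, first, second, both, repeated)
```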

13. (?#...)

A comment; the contents of the parentheses are ignored.

14. (?=...)

Lookahead assertion: matches only if ... matches next, without consuming any of the string.

In the string 'pythonretest', pythonre(?=test) matches 'pythonre'.

15. (?!...)

Negative lookahead assertion: matches only if ... does not match next.

pythonre(?!test) matches 'pythonre' only when it is not followed by 'test'.
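
The two lookahead forms can be checked with a short sketch. The string 'pythonre123' is a made-up counterexample.

```python
import re

# Positive lookahead: 'pythonre' must be followed by 'test'.
pos = re.search(r"pythonre(?=test)", "pythonretest")
pos_text = pos.group()
pos_end = pos.end()  # the lookahead text is not part of the match
pos_miss = re.search(r"pythonre(?=test)", "pythonre123")

# Negative lookahead: 'pythonre' must NOT be followed by 'test'.
neg_text = re.search(r"pythonre(?!test)", "pythonre123").group()
neg_miss = re.search(r"pythonre(?!test)", "pythonretest")

print(pos_text, pos_end, pos_miss, neg_text, neg_miss)
```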

16. (?<=...)

Positive lookbehind assertion: matches only if the current position is preceded by a match for ...

The regular expression (?<=abc)def matches 'def' in 'abcdef'.

17. (?<!...)

Negative lookbehind assertion: matches only if the current position is not preceded by a match for ...
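
A small sketch of both lookbehind forms. The string 'xyzdef' is a made-up counterexample.

```python
import re

# Positive lookbehind: 'def' must be preceded by 'abc'.
behind = re.search(r"(?<=abc)def", "abcdef")
behind_text = behind.group()
behind_start = behind.start()  # 'abc' itself is not part of the match
behind_miss = re.search(r"(?<=abc)def", "xyzdef")

# Negative lookbehind: 'def' must NOT be preceded by 'abc'.
not_behind_text = re.search(r"(?<!abc)def", "xyzdef").group()
not_behind_miss = re.search(r"(?<!abc)def", "abcdef")

print(behind_text, behind_start, behind_miss, not_behind_text, not_behind_miss)
```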

2. How to use regular expressions

The re module provides an interface to the regular expression engine. You can compile a pattern string into a pattern object and then use that object for matching.

Compiling a regular expression:

import re
p = re.compile(rule)

Example:
>>> r1 = r"\d{3,4}-?\d{8}$"
>>> p_tel = re.compile(r1)
>>> p_tel
<_sre.SRE_Pattern object at 0x01BEFA30>
>>> p_tel.findall("056179882523")
['056179882523']

re.compile() also accepts an optional flags argument, which is used to enable various special features and syntax variations. For case-insensitive matching, pass re.I when compiling.

>>> c_rel = re.compile(r"ahnu",re.I)
>>> c_rel.findall("Ahnu")
['Ahnu']

Matching with regular expressions

match(): determines whether the RE matches at the beginning of the string.

search(): scans through the string, looking for any location where the RE matches.

findall(): finds all substrings where the RE matches and returns them as a list.

finditer(): finds all substrings where the RE matches and returns them as an iterator.

If no match is found, match() and search() return None. If they succeed, a match object is returned.
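
The four methods above can be compared side by side in a short sketch. The pattern and the sample text are made up for illustration.

```python
import re

p = re.compile(r"\d+")
text = "a1 b22 c333"  # hypothetical input

# match() only looks at the start of the string, so it fails here.
at_start = p.match(text)

# search() scans forward and returns the first match it finds.
first = p.search(text).group()

# findall() returns every matching substring as a list.
all_found = p.findall(text)

# finditer() yields match objects lazily, one per match.
via_iter = [m.group() for m in p.finditer(text)]

print(at_start, first, all_found, via_iter)
```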

Match object instance methods:

group(): returns the string matched by the RE.

start(): returns the starting position of the match.

end(): returns the ending position of the match.

span(): returns a tuple containing the (start, end) positions of the match.
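
A minimal sketch of these match object methods, applied to a made-up string:

```python
import re

p = re.compile(r"[a-z]+")
text = "123 hello 456 world"  # hypothetical input

m = p.search(text)
matched = m.group()  # the matched text
begin = m.start()    # index where the match begins
finish = m.end()     # index just past the last matched character
both = m.span()      # (start, end) as a tuple

# The same methods are available on each match from finditer().
all_spans = [(mm.group(), mm.span()) for mm in p.finditer(text)]

print(matched, begin, finish, both, all_spans)
```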

In actual programs, the most common approach is to store the match object in a variable and then check whether it is None.

p = re.compile(...)  # ... stands for your pattern
m = p.match("string goes here")
if m:
    print("Match found:", m.group())
else:
    print("match not found")

Module-level functions:

The re module also provides top-level functions such as match(), search(), sub(), subn(), split(), findall(), and so on.

sub() and subn() are the replacement functions:

re.sub(pattern, repl, string[, count, flags])

Finds all substrings of string that match the regular expression pattern and replaces them with repl. If the pattern is not found, string is returned unchanged. repl can be a string or a function. subn() does the same but returns a tuple of the new string and the number of replacements made. For a compiled pattern object:

sub(repl, string[, count=0])
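
Since repl can be a function as well as a string, here is a hedged sketch of both forms. The helper name double and the sample strings are hypothetical.

```python
import re

def double(match):
    # Hypothetical helper: receives a match object, returns the replacement text.
    return str(int(match.group()) * 2)

# repl as a function: called once for each match.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# count limits how many replacements are made.
limited = re.sub(r"\d+", "N", "1 2 3", count=2)

# The same operation through a compiled pattern object.
p = re.compile(r"\d+")
masked = p.sub("#", "a1b2")

print(doubled, limited, masked)
```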

For example:

>>> l = "ahnu ahnn ahun sshh ahhh"
>>> res = "ah.."
>>> re.sub(res,"python",l)
'python python python sshh python'
>>> re.subn(res,"python",l)
('python python python sshh python', 4)

split() splits a string wherever the pattern matches.

>>> res = r"[\+\-\*/]"
>>> l = "1+6-6*7/9"
>>> re.split(res,l)
['1', '6', '6', '7', '9']


Compilation flags:

DOTALL, S: makes . match any character, including newlines

>>> r1 = r"csvt.net"
>>> re.findall(r1,"csvt.net")
['csvt.net']
>>> re.findall(r1,"csvtonet")
['csvtonet']
>>> re.findall(r1,"csvt\nnet")
[]
>>> re.findall(r1,"csvt\nnet",re.S)
['csvt\nnet']

IGNORECASE, I: makes matching case-insensitive

MULTILINE, M: multi-line matching; affects ^ and $

>>> s = """
... ahnu hello
... hello ahnu
... ahnu i love lyou
... i love you ahnu"""
>>> r = r"^ahnu"
>>> re.findall(r,s)
[]
>>> re.findall(r,s,re.M)
['ahnu', 'ahnu']

VERBOSE, X: enables verbose REs, in which whitespace and # comments in the pattern are ignored, making the pattern clearer and easier to understand.

>>> s = r"""
... \d{3,4}
... -?
... \d{7}
... """
>>> re.findall(s,"05617988252")
[]
>>> re.findall(s,"05617988252",re.X)
['05617988252']
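
Verbose mode is most useful when the pattern carries its own comments. The sketch below assumes a hypothetical reading of the digits as an area code followed by a local number; that interpretation is an illustration, not something the article states.

```python
import re

# With re.X, whitespace and # comments inside the pattern are ignored.
tel = re.compile(r"""
    \d{3,4}   # assumed: area code, 3 or 4 digits
    -?        # optional hyphen
    \d{7}     # assumed: local number, 7 digits
""", re.X)

plain = tel.findall("05617988252")
with_hyphen = tel.fullmatch("0561-7988252") is not None

print(plain, with_hyphen)
```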

LOCALE, L: locale-aware matching; makes \w, \W, \b, and \B depend on the current locale
