Symbol Analysis of Common Python Regular Expressions

Source: Internet
Author: User

Understanding regular expressions in Python is mainly a matter of understanding their symbols. This article explains the regular expression symbols commonly used in Python. The main symbols are:

.
Matches any single character except a newline by default. If the DOTALL flag is set, it also matches a newline.
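
As a small illustration of the DOTALL behavior, the sketch below runs the same pattern with and without the flag. The sample text and variable names are made up for this example.

```python
import re

text = "first\nsecond"  # made-up sample with one newline

# By default "." stops at the newline, so only the first line is matched.
default_match = re.match(r".+", text).group()

# With re.DOTALL, "." also matches the newline, so the whole text is matched.
dotall_match = re.match(r".+", text, re.DOTALL).group()

print(repr(default_match))  # 'first'
print(repr(dotall_match))   # 'first\nsecond'
```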

^
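Matches the start of the string. In MULTILINE mode, it also matches the start of each line.

$
Matches the end of the string, or just before a newline at the end of the string. In MULTILINE mode, it also matches the end of each line.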
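
The following sketch shows how the re.MULTILINE flag changes where ^ and $ can match. The sample text is an assumption chosen only to make the difference visible.

```python
import re

text = "alpha 1\nbeta 2\ngamma 3"  # made-up three-line sample

# Without MULTILINE, ^ only matches at the very start of the string.
first_words_default = re.findall(r"^\w+", text)

# With MULTILINE, ^ and $ also match at the start and end of each line.
first_words_multiline = re.findall(r"^\w+", text, re.MULTILINE)
last_digits_multiline = re.findall(r"\d$", text, re.MULTILINE)

print(first_words_default)    # ['alpha']
print(first_words_multiline)  # ['alpha', 'beta', 'gamma']
print(last_digits_multiline)  # ['1', '2', '3']
```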

*
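Matches zero or more repetitions of the preceding expression.

+
Matches one or more repetitions of the preceding expression.

?
Matches zero or one repetition of the preceding expression.

*?, +?, ??
Non-greedy versions of *, + and ?: they match as few characters as possible.

{m}, {m,n}, {m,n}?
Match exactly m repetitions, from m to n repetitions, and from m to n repetitions in non-greedy mode.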
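
To make the difference between greedy and non-greedy quantifiers concrete, here is a minimal sketch. The sample strings are invented for illustration.

```python
import re

html = "<b>one</b><b>two</b>"  # made-up sample

# Greedy: .* grabs as much as possible, so both tags end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first possible closing tag.
lazy = re.findall(r"<b>.*?</b>", html)

digits = "1234567"
exactly_three = re.match(r"\d{3}", digits).group()     # exactly 3 repetitions
up_to_four = re.match(r"\d{2,4}", digits).group()      # greedy: takes 4
at_least_two = re.match(r"\d{2,4}?", digits).group()   # non-greedy: takes 2

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
print(exactly_three, up_to_four, at_least_two)  # 123 1234 12
```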

\
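Escapes a special character so that it is matched literally, or introduces a special sequence such as \d.

[]
Defines a character set, for example [abc], [a-z], or the negated set [^a-z].

|
Alternation: 'a|b' matches either a or b.

(...)
Capturing group: matches the enclosed expression and saves the matched text as a group.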

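(?iLmsux)
Sets the corresponding flag (I, L, M, S, U or X) for the expression.

(?:...)
Non-capturing group.

(?P<name>...)
Named group: the matched text can be retrieved by name.

>>> re.match('(?P<name>abc){2}', 'abcabc').groupdict()
{'name': 'abc'}

(?P=name)
Backreference to a named group: matches the same text that the group called name matched earlier.

>>> re.match(r'(?P<name>abc)----(?P=name)', 'abc----abc').group()
'abc----abc'

(?#...)
Comment: the content inside the parentheses is ignored.

(?=...)
Lookahead: the text following the current position must match ..., but it is not consumed.

>>> re.match(r'phone(?=\d{3})', 'phone123').group()
'phone'

(?!...)
Negative lookahead: the text following the current position must not match ....

>>> re.match(r'phone(?!\d{3})', 'phoneabc123').group()
'phone'

(?<=...)
Lookbehind: the text immediately before the current position must match ....

(?<!...)
Negative lookbehind: the text immediately before the current position must not match ....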
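
The lookahead entries come with examples, so here is a comparable sketch for the two lookbehind forms. The sample string is made up, and re.findall is used instead of re.match because a lookbehind needs text before the match position.

```python
import re

prices = "cost: $42, discount: 7, tax: $3"  # made-up sample

# (?<=\$)\d+ matches digits only when a "$" comes immediately before them.
with_dollar = re.findall(r"(?<=\$)\d+", prices)

# (?<!\$)\b\d+ matches whole numbers that are not preceded by "$".
# The \b keeps the engine from matching the "2" inside "$42".
without_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(with_dollar)     # ['42', '3']
print(without_dollar)  # ['7']
```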

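(?(id/name)yes-pattern|no-pattern)
Conditional: tries yes-pattern if the group with the given id or name matched, otherwise no-pattern.

\number
Matches the contents of the group with the same number.

\A
Matches only at the start of the string.

\b
Matches a word boundary.

\B
The opposite of \b: matches where there is no word boundary.

\d
Matches a digit; equivalent to [0-9] in ASCII mode.

\D
Matches a non-digit; equivalent to [^0-9] in ASCII mode.

\s
Matches a whitespace character; equivalent to [ \t\r\n\f\v] in ASCII mode.

\S
Matches a non-whitespace character.

\w
Matches a word character; equivalent to [a-zA-Z0-9_] in ASCII mode (note the underscore).

\W
The opposite of \w.

\Z
Matches only at the end of the string.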
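
A few of the special sequences are easier to tell apart when run side by side. The sketch below uses made-up sample strings and only standard re calls.

```python
import re

text = "snake_case and CamelCase42"  # made-up sample

# \w includes the underscore, so "snake_case" stays in one piece.
words = re.findall(r"\w+", text)
# A set without "_" splits it in two.
no_underscore = re.findall(r"[a-zA-Z0-9]+", text)

# \b requires a word boundary; \B requires that there is none.
sample = "case snake_case casework"
whole_word = re.findall(r"\bcase\b", sample)   # only the standalone "case"
inside_word = re.findall(r"\Bcase", sample)    # only the "case" after "_"

# \A matches only at the start of the string, even in MULTILINE mode.
lines = "a1\nb2"
caret_starts = re.findall(r"^\w", lines, re.MULTILINE)
string_start = re.findall(r"\A\w", lines, re.MULTILINE)

# \1 refers back to whatever group 1 matched.
repeated = re.search(r"(\w+) \1", "hello hello world").group()

print(words)          # ['snake_case', 'and', 'CamelCase42']
print(no_underscore)  # ['snake', 'case', 'and', 'CamelCase42']
print(whole_word, inside_word)     # ['case'] ['case']
print(caret_starts, string_start)  # ['a', 'b'] ['a']
print(repeated)       # hello hello
```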


Retrieving text information using Python Regular Expressions

import re

# Read the whole log first ('log.txt' stands for the name of the log file)
with open(r'log.txt') as f:
    text = f.read()

sys_bck = text.split('bck')        # split the text into the sys part and the bck part
syss = sys_bck[0].split('\n')
bcks = sys_bck[1].split('\n')

print('sys')
for line in syss:
    s = re.findall(r'[0-9]+', line)    # all runs of digits on this line
    if s:                              # skip lines that contain no numbers
        print(' '.join(s))

print('bck')
for line in bcks:
    b = re.findall(r'[0-9]+', line)
    if b:
        print(' '.join(b))

This prints the data in the following format:
sys
20 12 79
20 13 81
20 14 12
bck
20 12 164
20 13 278
20 14 128
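
The same idea can be tried without a file on disk. The sketch below is only an illustration: the layout of SAMPLE_LOG and the helper extract_numbers are assumptions, since the article does not show what log.txt actually looks like.

```python
import re

# Hypothetical log contents; the real log.txt layout is not given in the article.
SAMPLE_LOG = (
    "sys\n"
    "20:12 load=79\n"
    "20:13 load=81\n"
    "bck\n"
    "20:12 load=164\n"
    "20:13 load=278\n"
)


def extract_numbers(section):
    """Return one space-joined string of numbers for each line that has any."""
    rows = []
    for line in section.split("\n"):
        numbers = re.findall(r"[0-9]+", line)
        if numbers:  # lines such as the "sys" label contain no digits
            rows.append(" ".join(numbers))
    return rows


# Split once on the "bck" marker, as the original script does.
sys_part, bck_part = SAMPLE_LOG.split("bck", 1)
sys_rows = extract_numbers(sys_part)
bck_rows = extract_numbers(bck_part)

print("sys")
print("\n".join(sys_rows))
print("bck")
print("\n".join(bck_rows))
```

Splitting on a literal marker works only if the marker text does not appear anywhere else in the log; a real log may need a stricter pattern.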

Python Regular Expressions

You are wrong. The r in r"2x\+5y" means that the "\" in the string literal is not treated as a string escape.
Inside a regular expression, "\+" escapes the "+", because "+" has a special meaning in a regular expression. This has nothing to do with string escaping.

To put it more clearly: whether you write "\+" or r"\+" in the program, what is stored in memory is a "\" followed by a "+". As long as the regular expression engine reads a consecutive "\" and "+" from memory, it understands that you want to match the literal character "+". (Python leaves unrecognized string escapes such as "\+" unchanged, although recent versions emit a warning for them.)

Therefore, if you do not write r before the string, the regular expression string should be written as follows:
"2x\\+5y|7y-3z"

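The point about raw strings can be checked directly. This is a minimal sketch with made-up variable names; it compares the raw form with the explicitly escaped form and shows what happens when the "+" is not escaped at all.

```python
import re

raw_pattern = r"2x\+5y"       # raw string: the backslash is kept as written
escaped_pattern = "2x\\+5y"   # normal string: "\\" produces one backslash

# Both literals produce the same six characters: 2 x \ + 5 y
same_string = raw_pattern == escaped_pattern
pattern_length = len(raw_pattern)

# The regex engine sees "\+" and matches a literal plus sign.
literal_plus = re.match(raw_pattern, "2x+5y").group()

# Without the backslash, "+" is a quantifier meaning "one or more x".
unescaped_on_plus = re.match(r"2x+5y", "2x+5y")
unescaped_on_xx = re.match(r"2x+5y", "2xx5y").group()

# The full alternation from the answer, written without the r prefix.
both = re.findall("2x\\+5y|7y-3z", "2x+5y and 7y-3z")

print(same_string, pattern_length)  # True 6
print(literal_plus)                 # 2x+5y
print(unescaped_on_plus)            # None
print(unescaped_on_xx)              # 2xx5y
print(both)                         # ['2x+5y', '7y-3z']
```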