Python Basics: Regular Expressions 1 (Rules)

Source: Internet
Author: User

Regular Expressions:

* A regular expression (or RE) is a small, highly specialized programming language that is embedded in Python and made available through the re module.

-You specify rules for the set of possible strings that you want to match

-This set may contain English sentences, e-mail addresses, commands, or anything you like

-You can ask "Does this string match this pattern?"

-Or "Is there a part of this string that matches this pattern?"

-You can also use REs to modify strings or split them apart in various ways

* The regular expression pattern is compiled into a sequence of bytecode, which is then executed by a matching engine written in C

* The regular expression language is relatively small and restricted

-Not all string processing tasks can be done with regular expressions
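The questions listed above map onto functions in the re module. The following is a minimal sketch; the pattern, the sample strings and the variable names are made up for illustration.

```python
import re

# Compile the pattern once; the compiled object can be reused.
pattern = re.compile(r"cat")

# "Does this string match this pattern?" (the whole string must match)
whole = pattern.fullmatch("cat") is not None
not_whole = pattern.fullmatch("concatenate") is not None

# "Is there a part of this string that matches this pattern?"
part = pattern.search("concatenate") is not None

# Modifying a string and splitting it apart
replaced = re.sub(r"cat", "dog", "cat and cat")
pieces = re.split(r",", "a,b,c")

print(whole, not_whole, part)   # True False True
print(replaced)                 # dog and dog
print(pieces)                   # ['a', 'b', 'c']
```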

* Character Matching

-Ordinary characters

-Most letters and characters simply match themselves

-For example, the regular expression test matches the string "test" exactly

-Metacharacters

. ^ $ * + ? { } [ ] \ | ( )

=============================================================

. []

-Used to specify a character set: [abc]; [a-z]

-Metacharacters have no special meaning inside a character set: [akm$]

-A complemented set matches characters that are not in the set: [^5]

Example:

>>> import re  # to use regular expressions in Python, import the re module, which provides many functions
>>> # Defining a regular expression is really just defining a string, e.g. s = "abc"
>>> # In practice you usually add an r prefix so the pattern is a raw string: s = r"abc"
>>> s = r"abc"  # this can be thought of as the rule
>>> re.findall(s, 'aaaaaaa')  # compares rule s with 'aaaaaaa'; there is no "abc", so the result is empty
[]
>>> re.findall(s, 'abcddabc')  # the most basic matching: ordinary characters match themselves
['abc', 'abc']
>>> st = "top tip tqp twp tep"
>>> s = r"top"
>>> re.findall(s, st)
['top']
>>> s = r"t[oe]p"  # a character set: starts with t, ends with p, with an o or an e in between
>>> re.findall(s, st)
['top', 'tep']
>>> s = r"t[^oe]p"  # ^ negates the set: matches characters that are not o or e
>>> re.findall(s, st)
['tip', 'tqp', 'twp']
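As a further illustration of character sets, the sketch below runs a set with a range and its complement over the same text. The sample string (including the extra "t-p" token) is an assumption added for the demonstration.

```python
import re

words = "top tip tqp twp tep t-p"

# A set with a range: t, then any lowercase letter, then p
lower = re.findall(r"t[a-z]p", words)

# The complemented set: t, then anything that is NOT a lowercase letter, then p
non_lower = re.findall(r"t[^a-z]p", words)

print(lower)      # ['top', 'tip', 'tqp', 'twp', 'tep']
print(non_lower)  # ['t-p']
```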

.^

-Matches the beginning of a line. Unless the MULTILINE flag is set, it matches only at the beginning of the string; in MULTILINE mode it also matches immediately after each newline in the string.

>>> s = r'hello'
>>> b = 'hello world,hello boy'
>>> re.findall(s, b)
['hello', 'hello']
>>> s = r'^hello'
>>> re.findall(s, b)
['hello']
>>> b = 'world,hello boy'
>>> re.findall(s, b)  # ^ denotes the beginning of the line
[]
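The effect of the MULTILINE flag on ^ can be checked with a short sketch; the two-line sample text is invented for the purpose.

```python
import re

text = "hello world\nhello boy"

# Default: ^ anchors only at the start of the whole string
default_hits = re.findall(r"^hello", text)

# With re.MULTILINE: ^ also anchors right after each newline
multiline_hits = re.findall(r"^hello", text, re.MULTILINE)

print(default_hits)    # ['hello']
print(multiline_hits)  # ['hello', 'hello']
```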


.$

-Matches the end of a line, which is defined as either the end of the string or any position followed by a newline character. Without the MULTILINE flag, only a newline at the very end of the string counts.

>>> s = r'boy$'
>>> re.findall(s, b)
['boy']
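A small sketch of how $ behaves with and without the MULTILINE flag; the sample strings are assumptions chosen to show each case.

```python
import re

text = "good boy\nbad boy\n"

# Default: $ matches at the very end, or just before a final newline
default_hits = re.findall(r"boy$", text)

# With re.MULTILINE: $ also matches just before every newline
multiline_hits = re.findall(r"boy$", text, re.MULTILINE)

# No match when other characters follow on the same line
no_hits = re.findall(r"boy$", "boy scout")

print(default_hits)    # ['boy']
print(multiline_hits)  # ['boy', 'boy']
print(no_hits)         # []
```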

Metacharacters inside a character set: note that "$" has no special meaning there

>>> t = r't[abc$]'  # matches t followed by a, b, c, or a literal $
>>> re.findall(t, 'ta')
['ta']
>>> re.findall(t, 't$')  # $ has no special effect inside the set
['t$']
>>> t = 't[^ab]'
>>> re.findall(t, 'ab')  # negation
[]
>>> r = 't[abc^]'
>>> re.findall(r, 't^')
['t^']
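The same point can be shown for several metacharacters at once. This is a sketch with made-up sample strings: inside a set, $ ^ * + are literal, ^ negates only when it comes first, and ] or - can be included by escaping them.

```python
import re

# Metacharacters listed inside a set are matched literally
symbols = re.findall(r"[$^*+]", "a$b^c*d+e")

# ^ negates the set only when it is the first character
not_dollar = re.findall(r"[^$]", "a$b")

# ] and - can be put in a set by escaping them with a backslash
closing = re.findall(r"[\]\-]", "x]y-z")

print(symbols)     # ['$', '^', '*', '+']
print(not_dollar)  # ['a', 'b']
print(closing)     # [']', '-']
```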

More usage: matching a range, written [0-9], [a-zA-Z], or [a-zA-Z0-9]

>>> s = r'x[0123456789]x'
>>> l = 'x1x x22x xxx'
>>> re.findall(s, l)
['x1x']
>>> s = r'x[0-9]x'
>>> re.findall(s, l)
['x1x']
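The range notations can be compared side by side. In this sketch the sample text is invented; the last pattern shows a common pitfall, since the range A-z also spans the ASCII punctuation that sits between "Z" and "a" (including "_").

```python
import re

text = "x1x xAx x_x x9x"

digits = re.findall(r"x[0-9]x", text)
letters = re.findall(r"x[a-zA-Z]x", text)
alnum = re.findall(r"x[a-zA-Z0-9]x", text)

# Pitfall: [A-z] is wider than [a-zA-Z] and also accepts "_"
too_wide = re.findall(r"x[A-z]x", text)

print(digits)    # ['x1x', 'x9x']
print(letters)   # ['xAx']
print(alnum)     # ['x1x', 'xAx', 'x9x']
print(too_wide)  # ['xAx', 'x_x']
```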

. \ Escape character

-The backslash can be followed by various characters to signal different special sequences.

-It can also be used to escape any metacharacter so that it is matched literally: \[ or \\

\d matches any decimal digit: equivalent to the class [0-9]

\D matches any non-digit character: equivalent to the class [^0-9]

\s matches any whitespace character: equivalent to the class [ \t\n\r\f\v]

\S matches any non-whitespace character: equivalent to the class [^ \t\n\r\f\v]

\w matches any alphanumeric character: equivalent to the class [a-zA-Z0-9_]

\W matches any non-alphanumeric character: equivalent to the class [^a-zA-Z0-9_]

These equivalences hold for ASCII text. With Python 3 str patterns, the classes also cover Unicode digits, whitespace and letters unless the re.ASCII flag is set.

Example:

>>> s = r'x\dx'
>>> l = 'x1x x22x xxx'
>>> re.findall(s, l)
['x1x']
>>> # when something repeats many times, you can use {times}
>>> r = r"010-\d\d\d\d\d\d\d\d"
>>> re.findall(r, '010-87568745')
['010-87568745']
>>> r = r'010-\d{8}'
>>> re.findall(r, '010-87568745')
['010-87568745']
>>> re.findall(r, '010-8756874')
[]
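The predefined classes and the escaping role of the backslash can be tried out together. The sample strings below are assumptions made up for this sketch.

```python
import re

text = "Room 42, floor_3\tok"

digits = re.findall(r"\d", text)     # decimal digits
words = re.findall(r"\w+", text)     # runs of alphanumerics and underscore
spaces = re.findall(r"\s", text)     # whitespace characters
non_word = re.findall(r"\W", text)   # everything that \w rejects

# A backslash turns a metacharacter into a literal character
bracketed = re.findall(r"\[\d\]", "a[1] b[22] c[3]")
backslashes = re.findall(r"\\", r"C:\temp\x")

print(digits)       # ['4', '2', '3']
print(words)        # ['Room', '42', 'floor_3', 'ok']
print(spaces)       # [' ', ' ', '\t']
print(non_word)     # [' ', ',', ' ', '\t']
print(bracketed)    # ['[1]', '[3]']
print(backslashes)  # ['\\', '\\']
```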

. Repeat {times}

-The first capability of regular expressions is matching varying sets of characters; another is specifying that part of a regular expression must be repeated a certain number of times

>>> import re
>>> s = r'^010-\d{8}'
>>> re.findall(s, '010-36854625')
['010-36854625']
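A fixed repeat count is often used for validation. The sketch below wraps the same phone-number rule in a hypothetical helper, is_phone (a name introduced here, not part of the re module), and uses fullmatch so that extra digits are rejected.

```python
import re

# Hypothetical helper: accept only strings that are exactly "010-" plus 8 digits
PHONE = re.compile(r"010-\d{8}")

def is_phone(text):
    """Return True if the whole string is 010- followed by exactly 8 digits."""
    return PHONE.fullmatch(text) is not None

print(is_phone("010-36854625"))    # True
print(is_phone("010-3685462"))     # False (only 7 digits)
print(is_phone("010-368546250"))   # False (9 digits)

# Without an end anchor, findall still reports the first 8 digits it finds
prefix_only = re.findall(r"^010-\d{8}", "010-368546250")
print(prefix_only)                 # ['010-36854625']
```

Using fullmatch here is a design choice for the sketch; adding $ to the end of the pattern, as shown later for the ? metacharacter, has a similar effect.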

.*

-Specifies that the previous character can be matched zero or more times, instead of exactly once. The matching engine tries to repeat it as many times as possible (up to a large implementation limit, on the order of 2 billion)

>>> a = r'ab*'
>>> re.findall(a, 'abbbbbb')
['abbbbbb']
>>> re.findall(a, 'a')
['a']
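A short sketch of the "as many times as possible" behaviour of *; the input strings are invented for illustration.

```python
import re

# * is greedy: it takes every b it can before the match ends
greedy = re.match(r"ab*", "abbbc").group()

# Zero repetitions are also acceptable
zero = re.match(r"ab*", "ac").group()

# Each hit is an a followed by however many b's come next
listed = re.findall(r"ab*", "a ab abbb b")

print(greedy)  # abbb
print(zero)    # a
print(listed)  # ['a', 'ab', 'abbb']
```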

.+

-Matches the previous character one or more times

-Note the difference between * and +: * matches zero or more times, while + requires at least one occurrence

>>> a = r'ab+'
>>> re.findall(a, 'accccccccb')
[]
>>> re.findall(a, 'accccccccbab')
['ab']
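Running both quantifiers over the same text makes the difference visible. The sample text is an assumption for this sketch.

```python
import re

text = "a ab abb"

star = re.findall(r"ab*", text)   # a followed by zero or more b's
plus = re.findall(r"ab+", text)   # a followed by at least one b

print(star)  # ['a', 'ab', 'abb']
print(plus)  # ['ab', 'abb']
```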

.?

-Matches the previous character zero times or once

-It is used to mark something as optional.

>>> s = r'^010-?\d{8}$'
>>> re.findall(s, '010-12345678')
['010-12345678']
>>> re.findall(s, '01012345678')
['01012345678']
>>> re.findall(s, '01012345678abc')  # note the role of $ here
[]
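Two quick checks of ? as an "optional" marker, plus the effect of the trailing $ in the phone-number pattern. The spelling example is an illustration added here.

```python
import re

# The u is optional, so both spellings match; a doubled u does not
spellings = re.findall(r"colou?r", "color colour colouur")

# With $ the trailing "abc" prevents a match; without it the prefix matches
anchored = re.findall(r"^010-?\d{8}$", "01012345678abc")
unanchored = re.findall(r"^010-?\d{8}", "01012345678abc")

print(spellings)   # ['color', 'colour']
print(anchored)    # []
print(unanchored)  # ['01012345678']
```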

.+?

-Minimal (non-greedy) matching mode: adding ? after a quantifier makes it match as few repetitions as possible

>>> s = r'ab+?'
>>> re.findall(s, 'abbbbababb')
['ab', 'ab', 'ab']
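Minimal matching is requested by writing ? directly after a quantifier such as + or *. The sketch below contrasts the greedy and minimal forms; the tag example is an assumption added for illustration.

```python
import re

text = "abbbbababb"

greedy = re.findall(r"ab+", text)    # each match takes as many b's as possible
minimal = re.findall(r"ab+?", text)  # each match stops after the first b

# The same idea with * on markup-like text
greedy_tag = re.findall(r"<.*>", "<b>bold</b>")
minimal_tag = re.findall(r"<.*?>", "<b>bold</b>")

print(greedy)       # ['abbbb', 'ab', 'abb']
print(minimal)      # ['ab', 'ab', 'ab']
print(greedy_tag)   # ['<b>bold</b>']
print(minimal_tag)  # ['<b>', '</b>']
```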

. {m,n}

-Here m and n are decimal integers. The qualifier means at least m repetitions and at most n.

>>> s = r'ab{1,3}'
>>> re.findall(s, 'abbbbbababb')
['abbb', 'ab', 'abb']

-Omitting m means the lower bound is 0, while omitting n means the upper bound is infinity (in practice a large implementation limit, on the order of 2 billion)

-{0,} is the same as *, {1,} is the same as +, and {0,1} is the same as ?

>>> s = r'ab{1,}'
>>> d = r'ab+'
>>> re.findall(s, 'abbbbababb')
['abbbb', 'ab', 'abb']
>>> re.findall(d, 'abbbbababb')
['abbbb', 'ab', 'abb']
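The bounded form and the stated equivalences can be verified directly. This is a sketch over an invented sample string.

```python
import re

text = "a ab abb abbbb"

bounded = re.findall(r"ab{1,3}", text)    # between 1 and 3 b's
open_upper = re.findall(r"ab{2,}", text)  # n omitted: 2 or more b's
open_lower = re.findall(r"ab{,2}", text)  # m omitted: 0 to 2 b's

print(bounded)     # ['ab', 'abb', 'abbb']
print(open_upper)  # ['abb', 'abbbb']
print(open_lower)  # ['a', 'ab', 'abb', 'abb']

# The shorthand quantifiers give the same results as their {m,n} forms
same_star = re.findall(r"ab{0,}", text) == re.findall(r"ab*", text)
same_plus = re.findall(r"ab{1,}", text) == re.findall(r"ab+", text)
same_opt = re.findall(r"ab{0,1}", text) == re.findall(r"ab?", text)
print(same_star, same_plus, same_opt)     # True True True
```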


This article is from the "thinking More than technology" blog; reprinting is declined.
