jul_31 PYTHON REGULAR EXPRESSIONS

Source: Internet
Author: User

1. Special Symbols and Characters

1.1 Single Regex 1

., Match any character (except \n)

^, Match start of string

$, Match end of string

*, Match 0 or more occurrences of preceding regex

+, Match 1 or more occurrences of preceding regex

?, Match 0 or 1 occurrence of preceding regex

{N}, Match N occurrences of preceding regex

{M,N}, Match from M to N occurrences of preceding regex

[...], Match any single character from character class

[..x-y..], Match any single character in the range from x to y; for example, ["-a] in an ASCII system matches all characters that fall between '"' and 'a', that is, between ordinals 34 and 97.

[^...], Do not match any character from character class, including any ranges, if present

(*|+|?|{})?, Apply "non-greedy" versions of the above occurrence/repetition symbols; by default *, +, ?, and {} are greedy, and following any of them with '?' makes it non-greedy.

(...), Match enclosed regex and save as subgroup.
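
The symbols above can be tried out directly with re.search and re.findall. The following is a minimal sketch; the sample strings are made up for illustration and do not come from the original notes.

```python
import re

# '.' matches any character except a newline
dot = re.findall(r'b.t', 'bat bit b\nt')

# '^' and '$' anchor the pattern to the start and end of the string
anchored = re.search(r'^abc$', 'abc') is not None
unanchored = re.search(r'^abc$', 'xabc') is not None

# '*', '+', '?' and '{M,N}' repeat the preceding regex
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'colou?r', 'color colour')
counted = re.findall(r'a{2,3}', 'a aa aaaa')

# character classes, ranges, and negated classes
vowels = re.findall(r'[aeiou]', 'regex')
non_digits = re.findall(r'[^0-9]+', 'a1bc2')

# '(...)' saves what it matched as a subgroup
m = re.search(r'([a-z]+)-([0-9]+)', 'item-42')
groups = (m.group(1), m.group(2))

print(dot, star, plus, optional, counted, vowels, non_digits, groups)
```

Note that 'b\nt' is not matched by b.t, because '.' stops at the newline.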

1.2 Single Regex 2

\d, Match any decimal digit, same as [0-9] (\D is the inverse of \d: do not match any numeric digit)

\w, Match any alphanumeric character; for ASCII text, same as [A-Za-z0-9_] (\W is the inverse of \w)

\s, Match any whitespace character, same as [ \n\t\r\v\f] (\S is the inverse of \s)

\b, Match any word boundary (\B is the inverse of \b)

\N, Match saved subgroup N (see (...) above); examples: \1, \3, \16

\c, Match the special character c verbatim, that is, without its special meaning; examples: \., \\, \*

\A (\Z), Match start (end) of string (also see ^ and $ above)
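
A short sketch of these special sequences in use. The sample text is invented for illustration; only the sequences themselves come from the table above.

```python
import re

text = 'Order 66 shipped to the the depot on day 7'

# \d matches digits; \w matches letters, digits and the underscore
digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', 'snake_case x1')

# \s matches whitespace, so it can serve as a field separator
pieces = re.split(r'\s+', 'a  b\tc\nd')

# \b restricts the match to whole words
whole_word = re.findall(r'\bthe\b', 'the other then the')

# \1 refers back to whatever group 1 matched: find a doubled word
doubled = re.search(r'\b(\w+) \1\b', text).group(1)

# a backslash removes the special meaning of '.'
dots = re.findall(r'\.', 'a.b.c')

# \A and \Z anchor to the start and end of the whole string
starts = re.search(r'\AOrder', text) is not None
ends = re.search(r'7\Z', text) is not None

print(digits, words, pieces, whole_word, doubled, dots, starts, ends)
```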

1.3 Complex Regex

(?=...), positive lookahead assertion. It succeeds if the contained regular expression (represented here by ...) matches at the current position, and fails otherwise. The assertion consumes no characters: once the contained expression has been tried by the matching engine, the rest of the pattern continues from the position where the assertion began. Example: love(?=FishC) matches 'love' only when it is followed by 'FishC'.

(?!...), negative lookahead assertion. This is the opposite of the positive assertion (a mismatch means success, and a match means failure). Example: FishC(?!\.com) matches 'FishC' only when it is not followed by '.com'.

(?<=...), positive lookbehind assertion. It works like the positive lookahead, but in the opposite direction. Example: (?<=love)FishC matches 'FishC' only when it is preceded by 'love'.

(?<!...), negative lookbehind assertion. It works like the negative lookahead, but in the opposite direction. Example: (?<!FishC)\.com matches '.com' only when it is not preceded by 'FishC'.

(?:...), non-capturing group: the text matched by the subgroup is not saved and cannot be retrieved later.

(?(id/name)yes-pattern|no-pattern), 1. If the subgroup with the given number or name took part in the match, try to match yes-pattern; otherwise try to match no-pattern.

2. no-pattern is optional.

Example: (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a poor email matching pattern that matches '<user@host.com>' as well as 'user@host.com', but does not match '<user@host.com' or 'user@host.com>'.
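
The four assertions and the non-capturing group can be checked with a few one-liners. This is a minimal sketch built on the FishC examples above; the surrounding test strings are made up.

```python
import re

# (?=...): 'love' only when 'FishC' follows; the assertion consumes nothing
ahead = re.search(r'love(?=FishC)', 'loveCats loveFishC')
ahead_span = ahead.span()

# (?!...): 'FishC' only when '.com' does not follow
not_com = re.search(r'FishC(?!\.com)', 'FishC.com FishC.org')
not_com_start = not_com.start()

# (?<=...): 'FishC' only when 'love' precedes it
behind = re.findall(r'(?<=love)FishC', 'loveFishC hateFishC')

# (?<!...): '.com' only when 'FishC' does not precede it
not_behind_start = re.search(r'(?<!FishC)\.com', 'FishC.com python.com').start()

# (?:...): groups without saving, so only 'c' is captured
captured = re.search(r'(?:ab)+(c)', 'ababc').groups()

print(ahead_span, not_com_start, behind, not_behind_start, captured)
```

The span (9, 13) covers only 'love': the lookahead checked for 'FishC' but left it unconsumed.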

1.4 Example of matching email address

import re

# Sample addresses (placeholders)
data = 'user@host.com'
data1 = '<user@host.com>'
data2 = '<user@host.com'
data3 = 'user@host.com>'
p1 = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'
p2 = r'\w+@\w+\.\w+'
p3 = r'(<)?\w+@\w+\.\w+(?(1)>|$)'
m1 = re.match(p3, data3)
# data3 has a '>' without a leading '<', so p3 fails and m1 is None
print(m1.group() if m1 else None)

PS: In p1, (?:\.\w+) is a non-capturing group, so the text it matches (the "\.\w+" part) cannot be retrieved later. In p1, (?(1)>|$) means: if group 1, the first parenthesized subpattern "(<)", matched a "<", then match ">" here; otherwise match the end of string "$". p3 behaves like p1 in this respect. p2 has no such condition, so it cannot tell a balanced pair from a lone "<" or ">"; for example, re.match(p2, data3) still succeeds.
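
To see the conditional pattern at work on all four cases, the sketch below runs it against one balanced and two unbalanced inputs. The addresses are placeholders chosen for illustration.

```python
import re

# (?(1)>|$): require '>' only if group 1 captured a leading '<'
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

samples = ['user@host.com', '<user@host.com>', '<user@host.com', 'user@host.com>']
results = {s: pattern.match(s) is not None for s in samples}

for s, ok in results.items():
    print('%-18s %s' % (s, 'match' if ok else 'no match'))

# group 2 holds the bare address, with or without the angle brackets
bare = pattern.match('<user@host.com>').group(2)
```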

1.5 The re Module: Core Functions and Methods

match(pattern, string, flags=0), attempt to match pattern to string with optional flags; return a match object on success, None on failure; the match is attempted only at the start of the string.

search(pattern, string, flags=0), search for the first occurrence of pattern within string with optional flags; return a match object on success, None on failure; the match may begin anywhere in the string.

findall(pattern, string[, flags=0]), look for all occurrences of pattern in string; return a list of matches.

finditer(pattern, string[, flags=0]), same as findall(), except that it returns an iterator instead of a list; for each match, the iterator returns a match object.

split(pattern, string, maxsplit=0), split string into a list using the regex pattern as delimiter and return the list of pieces, splitting at most maxsplit times (splitting at all occurrences is the default).
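
The difference between these functions is easiest to see side by side. A minimal sketch on a made-up string:

```python
import re

text = 'cat bat rat'

# match() only tries at the start of the string, so 'bat' is not found
at_start = re.match(r'bat', text)

# search() scans forward and finds 'bat' at index 4
anywhere = re.search(r'bat', text)
found_at = anywhere.start()

# findall() returns the matched strings, finditer() yields match objects
all_found = re.findall(r'[a-z]at', text)
spans = [m.span() for m in re.finditer(r'[a-z]at', text)]

# split() with maxsplit=1 splits at the first delimiter only
parts = re.split(r'\s+', text, maxsplit=1)

print(at_start, found_at, all_found, spans, parts)
```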

1.6 The usage of "(?i)" and "(?m)"

>>> import re
>>> re.findall(r'(?i)yes', 'yes? Yes. YES!!')
['yes', 'Yes', 'YES']
>>> re.findall(r'(?i)th\w+', 'The quickest way is through this tunnel.')
['The', 'through', 'this']
>>> re.findall(r'(?im)(^th[\w ]+)', """
... This line is the first,
... another line,
... that line, it's the best
... """)
['This line is the first', 'that line']
>>> re.findall(r'(?i)(^th[\w ]+)', """
... This line is the first,
... another line,
... that line, it's the best
... """)
[]
>>> re.findall(r'(?i)(^th[\w \n,]+)', """
... this line is th,
... another line,
... that line, it.
... """)
[]

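By using "multiline" mode, (?m), ^ and $ match at the start and end of every line, so we can perform the search across multiple lines of the target string rather than treating the entire string as a single entity. Without it, the last two calls above return [] because the string begins with a newline and ^ matches only at the very start of the string.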
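
The inline (?i) and (?m) markers have equivalents that can be passed as the flags argument. The sketch below uses re.I and re.M, which are part of the standard re module but are not mentioned in these notes, and compares them with the inline form.

```python
import re

text = """
This line is the first,
another line,
that line, it's the best
"""

# inline flags: case-insensitive and multiline
inline = re.findall(r'(?im)^th[\w ]+', text)

# the same flags passed as an argument instead
with_flags = re.findall(r'^th[\w ]+', text, re.I | re.M)

# without multiline, '^' only matches at the very start of the string,
# which here is an empty first line
no_multiline = re.findall(r'(?i)^th[\w ]+', text)

print(inline, with_flags, no_multiline)
```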

1.7 The usage of split

re.split(r'\s\s+', eachline), split on runs of at least two whitespace characters.

re.split(r'\s\s+|\t', eachline.rstrip()), split on runs of at least two whitespace characters or on a single tab; rstrip() removes the trailing '\n'.
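
A small sketch of the two calls on one line of input. The line itself is invented (columns separated by two spaces or by a tab); only the two patterns come from the text above.

```python
import re

eachline = 'alice  pts/0\t2015-07-31  10:15\n'

# two or more whitespace characters: the single tab and the
# trailing newline are left inside the fields
two_spaces = re.split(r'\s\s+', eachline)

# also split on a single tab, after stripping the trailing newline
with_tab = re.split(r'\s\s+|\t', eachline.rstrip())

print(two_spaces)
print(with_tab)
```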

1.8 One example

from random import randrange, choice
from string import ascii_lowercase as lc
from time import ctime

tlds = ('com', 'org', 'net', 'gov', 'edu')

for i in range(randrange(5, 11)):
    dtint = randrange(1469880872)   # random timestamp
    dtstr = ctime(dtint)            # timestamp as a date string
    llen = randrange(4, 8)          # login is 4 to 7 letters
    login = ''.join(choice(lc) for j in range(llen))
    dlen = randrange(llen, 13)      # domain is at least as long as the login
    dom = ''.join(choice(lc) for j in range(dlen))
    print('%s::%s@%s.%s::%d-%d-%d' % (dtstr, login, dom, choice(tlds), dtint, llen, dlen))

Result

Sat Nov 7 01:09:06 1998::[email protected]::910372146-5-12
Sat Oct 09:27:56 2015::[email protected]::1445045276-7-9
Sun Nov 06:10:07 1979::[email protected]lyjej.org::311724607-7-8
Wed Jul 17:23:03 1986::[email protected]::522490983-6-9
Tue Feb 02:15:27 1998::[email protected]::888257727-5-8
Thu Jun 1 14:20:55 1989::[email protected]::612681655-6-10
Mon Mar 6 14:36:59 1978::[email protected]::258014219-7-12
Sun Apr 15:01:56 1982::[email protected]::387356516-4-12
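
Each generated line can be taken apart again with a regex. The sketch below assumes the date::login@domain.tld::timestamp-loginlength-domainlength layout produced by the script above; the login and domain in the sample line are hypothetical, since real output is random.

```python
import re

# hypothetical line in the generator's format (login and domain are made up)
line = 'Sat Nov 7 01:09:06 1998::abcde@fghijklmnopq.com::910372146-5-12'

# the three fields are separated by '::'
date_str, email, numbers = line.split('::')

# capture login, domain and top-level domain from the address
m = re.match(r'(\w+)@(\w+)\.(com|org|net|gov|edu)$', email)
login, dom, tld = m.groups()

# the last field is timestamp-loginlength-domainlength
dtint, llen, dlen = (int(n) for n in numbers.split('-'))

# the recorded lengths should agree with the captured strings
consistent = (len(login) == llen and len(dom) == dlen)
print(login, dom, tld, dtint, consistent)
```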

1.9 Matching a string

import re

data = 'Wed Jul 08:42:15 2015::[email protected]::1437525736-347-28'
# pat_old = '^Mon|^Tue|^Wed|^Thu|^Fri|^Sat|^Sun'
pat = '^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'
m = re.match(pat, data)
print(type(m))
print(m.group(0))

# any three word characters at the start of the string
pat2 = r'^(\w{3})'
m2 = re.match(pat2, data)
print(type(m2))
print(m2.group(1))

# greedy '.+' leaves as little as possible for the group
pa3 = r'.+(\d+-\d+-\d+)'
m3 = re.search(pa3, data)
print(type(m3))
print(m3.group())
m4 = re.match(pa3, data)
print(m4.group(1))

# non-greedy '.+?' stops as early as possible
pa4 = r'.+?(\d+-\d+-\d+)'
m5 = re.match(pa4, data)
print(m5.group(1))

# anchoring on '::' also yields the whole number field
pa5 = r'.+::(\d+-\d+-\d+)'
m6 = re.match(pa5, data)
print(m6.group(1))

Result

<class '_sre.SRE_Match'>
Wed
<class '_sre.SRE_Match'>
Wed
<class '_sre.SRE_Match'>
Wed Jul 08:42:15 2015::[email protected]::1437525736-347-28
6-347-28 // greedy
1437525736-347-28 // non-greedy, because of the '?' after '.+' (see 1.1 above)
1437525736-347-28

(Python 3.7 and later print <class 're.Match'> for the type.)

1.10 Greedy and Non-greedy

'.+' is greedy: it matches as many characters as possible. '.+?' is non-greedy: it matches as few characters as possible.
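
The difference shows up whenever more than one split of the string would satisfy the pattern. A minimal sketch, using a shortened made-up string in the same style as the data in 1.9 plus a small markup example:

```python
import re

s = 'id::1437525736-347-28'

# greedy: '.+' grabs everything, then gives back just enough for the group
greedy = re.match(r'.+(\d+-\d+-\d+)', s).group(1)

# non-greedy: '.+?' takes one character at a time until the group can match
lazy = re.match(r'.+?(\d+-\d+-\d+)', s).group(1)

html = '<b>bold</b> and <i>italic</i>'

# greedy runs from the first '<' to the last '>'
greedy_tags = re.findall(r'<.+>', html)

# non-greedy stops at the nearest '>'
lazy_tags = re.findall(r'<.+?>', html)

print(greedy, lazy)
print(greedy_tags, lazy_tags)
```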
