Python regular-expression learning notes

Last Update:2017-01-13 Source: Internet

Author: User



Personally, I mainly use regular expressions for complex string analysis and for extracting the information I need.
Learning principle: learn enough to get the job done, and go deeper only when needed.

The summary is as follows:

Special symbols in regular expressions:

"." matches any character (by default, any character except a newline)
"^" matches the start of the string
"$" matches the end of the string
"*", "+", "?" repeat the preceding character: 0 or more times, 1 or more times, 0 or 1 time
*?, +?, ?? match as few characters as possible (non-greedy); the plain *, +, ? are greedy
{m} matches the preceding character exactly m times
{m,n} matches it m to n times; either m or n can be omitted

For example, 'a.*b' matches any string that starts with a and ends with b.
a{5} matches 5 consecutive a's
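The quantifiers above can be tried out with a short sketch like the one below. The sample strings and variable names are made up for illustration only.

```python
import re

# Greedy: .* grabs as much as possible between the first "a" and the last "b".
greedy = re.search(r"a.*b", "axxbyyb").group()

# Non-greedy: .*? stops at the first "b" that lets the pattern succeed.
lazy = re.search(r"a.*?b", "axxbyyb").group()

# {5} requires exactly five consecutive "a" characters.
five = re.search(r"a{5}", "baaaaab").group()

# {2,} omits the upper bound: two or more digits.
two_or_more = re.findall(r"\d{2,}", "1 22 333")

print(greedy)       # axxbyyb
print(lazy)         # axxb
print(five)         # aaaaa
print(two_or_more)  # ['22', '333']
```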

[] denotes a set of characters: [abcd] matches a, b, c, or d; [^a] matches any character other than a
| A|B matches A or B, where A and B are arbitrary regular expressions; | is never greedy: if A matches, B is not tried
(...) groups part of the pattern so that the matched text can be extracted; this is best understood through the MatchObject example in section Three

\d [0-9]
\D non-digit (the opposite of \d)
\s a whitespace character
\S a non-whitespace character
\w [a-zA-Z0-9_]
\W non-word character (the opposite of \w)
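As a rough illustration of the character classes, sets, and alternation listed above, the following sketch runs them against made-up sample text. It assumes plain ASCII input; in Python 3, \d and \w also match non-ASCII digits and letters in str patterns.

```python
import re

text = "user_1 paid $25 on 2017-01-13"

digits = re.findall(r"\d+", text)      # runs of digits
words = re.findall(r"\w+", text)       # runs of letters, digits and underscore
chunks = re.findall(r"\S+", text)      # runs of non-whitespace characters
not_a = re.findall(r"[^a]", "banana")  # every character except "a"
either = re.findall(r"cat|dog", "cat, dog, cow")  # alternation

print(digits)  # ['1', '25', '2017', '01', '13']
print(words)   # ['user_1', 'paid', '25', 'on', '2017', '01', '13']
print(chunks)  # ['user_1', 'paid', '$25', 'on', '2017-01-13']
print(not_a)   # ['b', 'n', 'n']
print(either)  # ['cat', 'dog']
```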

One: Several functions of the re module

1: compile(pattern[, flags])
Generates a regular expression object (see Two) from the regular expression string pattern and the optional flags

The flags are defined as follows:
I case-insensitive matching
L makes some special character sets depend on the current locale
M multi-line mode: ^ and $ also match at the start and end of each line, in addition to the start and end of the string
S "." matches any character, including '\n'; otherwise "." does not match '\n'
U makes \w, \W, \b, \B, \d, \D, \s and \S depend on the Unicode character properties database
X verbose mode: makes regular expressions more readable by ignoring whitespace and the comments after #

Of these, S is the most commonly used.
It is applied as follows:
import re
re.compile(..., re.S)
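A minimal sketch of how some of these flags behave in practice. The patterns, sample strings, and variable names are illustrative assumptions, not part of the original notes.

```python
import re

# re.S: "." also matches the newline character.
without_s = re.match(r"a.b", "a\nb")
with_s = re.match(r"a.b", "a\nb", re.S)

# re.I: ignore case.
names = re.findall(r"python", "Python PYTHON python", re.I)

# re.X: whitespace and "#" comments inside the pattern are ignored.
number = re.compile(r"""
    (\d+)   # integer part
    \.      # a literal dot
    (\d*)   # fractional part
""", re.X)
parts = number.match("3.14").groups()

# Several flags are combined with "|".
combined = re.compile(r"^a.b$", re.I | re.S)
both = combined.match("A\nB")

print(without_s)             # None
print(repr(with_s.group()))  # 'a\nb'
print(names)                 # ['Python', 'PYTHON', 'python']
print(parts)                 # ('3', '14')
print(both is not None)      # True
```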

2: match(pattern, string[, flags])
Tries to match pattern at the beginning of string; flags are the same as for compile
Returns a MatchObject (see Three), or None if there is no match
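Because match only looks at the beginning of the string and may return None, it is usually worth checking the result before using it. A small sketch with made-up inputs:

```python
import re

hit = re.match(r"\d+", "123abc")    # digits at the very start: a match object
miss = re.match(r"\d+", "abc123")   # digits appear later in the string: None

print(hit.group())   # 123
print(miss)          # None

# Guard against None before calling methods on the result.
value = miss.group() if miss else ""
print(repr(value))   # ''
```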

3: split(pattern, string[, maxsplit=0])
Splits string wherever pattern matches
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
Parentheses '()' in the pattern have a special effect here (the captured separators are also returned in the result); see the manual for details
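The effect of parentheses and of maxsplit can be seen in a sketch like this one, which reuses the sample sentence from the example above:

```python
import re

subject = "Words, words, words."

plain = re.split(r"\W+", subject)                 # separators are dropped
with_seps = re.split(r"(\W+)", subject)           # captured separators are kept
limited = re.split(r"\W+", subject, maxsplit=1)   # split at most once

print(plain)      # ['Words', 'words', 'words', '']
print(with_seps)  # ['Words', ', ', 'words', ', ', 'words', '.', '']
print(limited)    # ['Words', 'words, words.']
```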

4: findall(pattern, string[, flags])
Commonly used.
Finds all non-overlapping matches of pattern in string and returns them as a list (a list of tuples if the pattern contains more than one group)
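What findall returns depends on how many groups the pattern contains. A short sketch, using invented sample strings:

```python
import re

no_group = re.findall(r"\d+", "a1b22c333")             # whole matches
one_group = re.findall(r"a(\d+)", "a1 a22")            # text of the single group
two_groups = re.findall(r"(\w+)=(\d+)", "x=1, y=22")   # one tuple per match

print(no_group)    # ['1', '22', '333']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```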

5: sub(pattern, repl, string[, count])
repl can be a string or a function
When repl is a string,
every substring of string that matches pattern is replaced with repl

When repl is a function, it is called as repl(matchobj) for each non-overlapping match of pattern in string,
and the match is replaced with the return value

>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
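The two forms of repl, plus the count argument, can be sketched as follows. The helper name double and the sample strings are assumptions made for illustration.

```python
import re

# String repl: \1 and \2 refer back to the captured groups.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, John")

# Function repl: receives the match object and returns the replacement text.
def double(matchobj):
    return str(int(matchobj.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")

# count limits how many matches are replaced.
first_only = re.sub(r"a", "A", "banana", count=1)

print(swapped)     # John Doe
print(doubled)     # 6 apples and 20 pears
print(first_only)  # bAnana
```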

Two: Regular Expression Objects

Created by: re.compile(pattern[, flags])

match(string[, pos[, endpos]]): matches pattern against string[pos:endpos] and returns
a MatchObject (see Three)

split(string[, maxsplit=0])
findall(string[, pos[, endpos]])
sub(repl, string[, count=0])
These methods behave the same as the functions of the same name in the re module; only the calling form differs

The re module functions and the methods of a regular expression object do the same thing, but if a program
uses the same pattern many times, the methods of the compiled regular expression object are more efficient.
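A minimal sketch of compiling a pattern once and reusing it through the object's methods. The pattern and inputs are invented for the example.

```python
import re

number = re.compile(r"\d+")   # compiled once, reused below

lines = ["a1", "b22", "no digits"]
found = [number.findall(line) for line in lines]

replaced = number.sub("#", "a1b22")
pieces = number.split("a1b22c")

# pos tells the method where in the string to start matching.
at_pos = number.match("ab12cd", 2)
at_start = number.match("ab12cd")

print(found)           # [['1'], ['22'], []]
print(replaced)        # a#b#
print(pieces)          # ['a', 'b', 'c']
print(at_pos.group())  # 12
print(at_start)        # None
```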

Three: MatchObject

Returned by re.match(...) and re.compile(...).match(...)

The object has the following methods and attributes:

Methods:
group([group1, ...])
groups([default])
groupdict([default])
start([group])
end([group])

The best way to illustrate these methods is with an example

import re

# matchobj is a compiled regular expression object; m is the MatchObject
matchobj = re.compile(r"(?P<int>\d+)\.(\d*)")
m = matchobj.match('3.14sss')
# m = re.match(r"(?P<int>\d+)\.(\d*)", '3.14sss')

print(m.group())
print(m.group(0))
print(m.group(1))
print(m.group(2))
print(m.group(1, 2))

print(m.group(0, 1, 2))
print(m.groups())
print(m.groupdict())

print(m.start(2))
print(m.string)

The output is as follows:
3.14
3.14
3
14
('3', '14')
('3.14', '3', '14')
('3', '14')
{'int': '3'}
2
3.14sss

So group() and group(0) both return the string matched by the entire expression
In addition, group(i) returns the text matched by the i-th "()" in the regular expression
('3.14', '3', '14') illustrates this best.
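The remaining methods (groupdict, start, end, and the default argument of groups) can be explored with a sketch like the following. The second named group, frac, and the second pattern are assumptions added for illustration.

```python
import re

m = re.match(r"(?P<int>\d+)\.(?P<frac>\d*)", "3.14sss")

print(m.groupdict())                     # {'int': '3', 'frac': '14'}
print(m.group("int"))                    # 3
print(m.start("frac"), m.end("frac"))    # 2 4
print(m.end())                           # 4

# A group that did not take part in the match is None,
# unless a default is passed to groups().
m2 = re.match(r"(\d+)(\.\d+)?", "42")
print(m2.groups())      # ('42', None)
print(m2.groups(""))    # ('42', '')
```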

String substitution

1. Replace all matching substrings
Replaces every substring of subject that matches the regular expression regex with newstring

result, number = re.subn(regex, newstring, subject)

2. Replace all matching substrings (using a regular expression object)
reobj = re.compile(regex)
result, number = reobj.subn(newstring, subject)

String splitting

1. String splitting
result = re.split(regex, subject)

2. String splitting (using a regular expression object)
reobj = re.compile(regex)
result = reobj.split(subject)

Matching

Several matching usages of Python regular expressions are listed below:

1. Test whether the regular expression matches all or part of the string
regex = r"..."  # regular expression
if re.search(regex, subject):
    do_something()
else:
    do_anotherthing()

2. Test whether the regular expression matches the entire string
regex = r"...\Z"  # the regular expression ends with \Z
if re.match(regex, subject):
    do_something()
else:
    do_anotherthing()
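The difference between usages 1 and 2 can be seen in a sketch like this one. The date pattern and subjects are made up; re.fullmatch is available in Python 3.4 and later and is shown only as an alternative to appending \Z.

```python
import re

regex = r"\d{4}-\d{2}-\d{2}"

# Usage 1: search succeeds if the pattern matches anywhere in the string.
partial = re.search(regex, "updated 2017-01-13") is not None

# Usage 2: match plus a trailing \Z succeeds only if the whole string matches.
whole_bad = re.match(regex + r"\Z", "2017-01-13 (Friday)") is not None
whole_ok = re.match(regex + r"\Z", "2017-01-13") is not None

# Python 3.4+ offers fullmatch for the same purpose.
full_ok = re.fullmatch(regex, "2017-01-13") is not None

print(partial)    # True
print(whole_bad)  # False
print(whole_ok)   # True
print(full_ok)    # True
```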

3. Create a match object, then get the match details through it
regex = r"..."  # regular expression
match = re.search(regex, subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

4. Get the substring that the regular expression matches
(Get the part of a string matched by the regex)

regex = r"..."  # regular expression
match = re.search(regex, subject)
if match:
    result = match.group()
else:
    result = ""

5. Get the substring that a capturing group matches
(Get the part of a string matched by a capturing group)

regex = r"..."  # regular expression
match = re.search(regex, subject)
if match:
    result = match.group(1)
else:
    result = ""

6. Get the substring that a named group matches
(Get the part of a string matched by a named group)

regex = r"..."  # regular expression
match = re.search(regex, subject)
if match:
    result = match.group("groupname")
else:
    result = ""
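Usages 4 to 6 differ only in the argument passed to group(), so they could be folded into one small helper. The function first_match, the pattern, and the sample subject below are hypothetical and serve only as a sketch.

```python
import re

def first_match(regex, subject, group=0, default=""):
    """Return the text of `group` for the first match, or `default` if none."""
    match = re.search(regex, subject)
    return match.group(group) if match else default

regex = r"(?P<key>\w+)=(?P<value>\d+)"
subject = "retries=3; timeout=30"

whole = first_match(regex, subject)              # usage 4: whole match
by_number = first_match(regex, subject, 1)       # usage 5: capturing group
by_name = first_match(regex, subject, "value")   # usage 6: named group
missing = first_match(regex, "nothing here")     # no match: default

print(whole)          # retries=3
print(by_number)      # retries
print(by_name)        # 3
print(repr(missing))  # ''
```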

7. Put all matching substrings of the string into an array
(Get an array of all regex matches in a string)

result = re.findall(regex, subject)

8. Iterate over all matching substrings
(Iterate over all matches in a string)

for match in re.finditer(r"<(.*?)\s*.*?/\1>", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass

9. Create a regular expression object from a regular expression string
(Create an object to use the same regex for many operations)

reobj = re.compile(regex)
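A concrete, runnable variant of the iteration in usage 8 might look like the sketch below. The simplified tag pattern and the sample subject are assumptions chosen so the positions are easy to follow.

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# \1 refers back to the tag name captured by the first group.
tag = re.compile(r"<(\w+)>(.*?)</\1>")

found = []
for match in tag.finditer(subject):
    # start(), end() (exclusive) and the captured groups of each match
    found.append((match.start(), match.end(), match.group(1), match.group(2)))

print(found)  # [(0, 11, 'b', 'bold'), (16, 29, 'i', 'italic')]
```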

10. The regular expression object version of usage 1
(Use a regex object for an if/else branch on whether (part of) a string can be matched)

reobj = re.compile(regex)
if reobj.search(subject):
    do_something()
else:
    do_anotherthing()

11. The regular expression object version of usage 2
(Use a regex object for an if/else branch on whether a string can be matched entirely)

reobj = re.compile(r"...\Z")  # the regular expression ends with \Z
if reobj.match(subject):
    do_something()
else:
    do_anotherthing()

12. Create a regular expression object, then get the match details through it
(Create an object with details about how the regex object matches (part of) a string)

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    do_something()
else:
    do_anotherthing()

13. Get the matching substring with a regular expression object
(Use a regex object to get the part of a string matched by the regex)

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""

14. Get the substring matched by a capturing group with a regular expression object
(Use a regex object to get the part of a string matched by a capturing group)

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""

15. Get the substring matched by a named group with a regular expression object
(Use a regex object to get the part of a string matched by a named group)

reobj = re.compile(regex)
match = reobj.search(subject)
if match:
    result = match.group("groupname")
else:
    result = ""

16. Get all matching substrings and put them in an array with a regular expression object
(Use a regex object to get an array of all regex matches in a string)

reobj = re.compile(regex)
result = reobj.findall(subject)

17. Iterate over all matching substrings with a regular expression object
(Use a regex object to iterate over all matches in a string)

reobj = re.compile(regex)
for match in reobj.finditer(subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    pass

Non-greedy and multi-line matching examples

Some regular expression tips:

1 Non-greedy flag

>>> re.findall(r"a(\d+?)", "a23b")
['2']
>>> re.findall(r"a(\d+)", "a23b")
['23']

Compare that with this case:

>>> re.findall(r"a(\d+)b", "a23b")
['23']
>>> re.findall(r"a(\d+?)b", "a23b")
['23']

Here the non-greedy version also returns '23', because the trailing b forces \d+? to keep extending until b can match.

2 To match across multiple lines, add the re.S and re.M flags
re.S: "." will match newlines; by default "." does not match newline characters

>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b")
[]
>>> re.findall(r"a(\d+)b.+a(\d+)b", "a23b\na34b", re.S)
[('23', '34')]

re.M: ^ and $ will match at every line; by default ^ and $ only match at the start and end of the whole string

>>> re.findall(r"^a(\d+)b", "a23b\na34b")
['23']
>>> re.findall(r"^a(\d+)b", "a23b\na34b", re.M)
['23', '34']

However, without the ^ anchor:

>>> re.findall(r"a(\d+)b", "a23b\na23b")
['23', '23']

As you can see, re.M is not needed in that case.
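The two flags can also be combined with "|". The sketch below uses an invented three-line sample and anchors on both ends of the line, to show what each flag contributes.

```python
import re

text = "a23b\na34b\nc56d"

# Without re.M, ^ and $ refer to the whole string, so nothing matches.
no_flag = re.findall(r"^a(\d+)b$", text)

# With re.M, ^ and $ match at the start and end of every line.
multi = re.findall(r"^a(\d+)b$", text, re.M)

# re.S lets "." cross the line break; re.M lets ^ match after it.
both = re.search(r"^a(\d+)b.^a(\d+)b$", text, re.S | re.M).groups()

print(no_flag)  # []
print(multi)    # ['23', '34']
print(both)     # ('23', '34')
```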
