Plenty of references for regular expression rules are available online, so the rules themselves are skipped here.
For beginners, using Python in VS2010 is recommended. The debugger is very useful and makes it easy to inspect everything.
The difference between greedy mode and non-greedy mode can be hard to understand.
Sample code:
import re
block = re.sub(r'(.+?)', r'hello\1_void()', '*abc.efg*')  # non-greedy mode: match as few characters as possible
# Output: block = 'hello*_void()helloa_void()hellob_void()helloc_void()hello._void()helloe_void()hellof_void()hellog_void()hello*_void()'
block = re.sub(r'(.+)', r'hello\1_void()', '*abc.efg*')  # greedy mode: match as many characters as possible
# Output: block = 'hello*abc.efg*_void()'
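The code above makes the difference plain: in non-greedy mode, (.+?) matches one character at a time, so every character is replaced separately, while in greedy mode (.+) takes the whole string in a single match.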
Code:
line = re.sub(r'\*(.+?)\*', r'<em>\1</em>', 'adfdf*i am a worker.*dfd')
This matches the characters between two * signs and wraps them in <em> tags, so line becomes 'adfdf<em>i am a worker.</em>dfd'.
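The non-greedy quantifier matters most when a string contains more than one marked span. The following is a minimal sketch of that case; the sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text with two *emphasized* spans.
text = 'one *two* three *four* five'

# Non-greedy: .+? stops at the nearest closing *, so each span is wrapped separately.
lazy = re.sub(r'\*(.+?)\*', r'<em>\1</em>', text)
# lazy == 'one <em>two</em> three <em>four</em> five'

# Greedy: .+ runs on to the last * in the string, so both spans merge into one.
greedy = re.sub(r'\*(.+)\*', r'<em>\1</em>', text)
# greedy == 'one <em>two* three *four</em> five'
```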
Looking back at this today, I found that a few examples are still very helpful. The following code gives a rough overview of what a Match object offers:
import re
m = re.match(r'(\w+) (\w+)(?P<sign>.*)', 'hello world!')
print "m.string:", m.string
print "m.re:", m.re
print "m.pos:", m.pos
print "m.endpos:", m.endpos
print "m.lastindex:", m.lastindex
print "m.lastgroup:", m.lastgroup
print "m.group(1,2):", m.group(1, 2)
print "m.groups():", m.groups()
print "m.groupdict():", m.groupdict()
print "m.start(2):", m.start(2)
print "m.end(2):", m.end(2)
print "m.span(2):", m.span(2)
print r"m.expand(r'\2 \1\3'):", m.expand(r'\2 \1\3')
Output:
m.string: hello world!
m.re: <_sre.SRE_Pattern object at 0x024d93d8>
m.pos: 0
m.endpos: 12
m.lastindex: 3
m.lastgroup: sign
m.group(1,2): ('hello', 'world')
m.groups(): ('hello', 'world', '!')
m.groupdict(): {'sign': '!'}
m.start(2): 6
m.end(2): 11
m.span(2): (6, 11)
m.expand(r'\2 \1\3'): world hello!
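Groups can also be addressed by name rather than by number. As a small sketch reusing the same pattern and string, the calls below read the whole match and the named group; the variable names are only illustrative.

```python
import re

m = re.match(r'(\w+) (\w+)(?P<sign>.*)', 'hello world!')

# Group 0 is always the entire matched text.
whole = m.group(0)           # 'hello world!'

# A named group can be read by name or by its number (here 3).
sign = m.group('sign')       # '!'
same = m.group(3)            # '!'

# start(), end() and span() accept a group name as well.
sign_span = m.span('sign')   # (11, 12)
```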
To compile a regular expression into a Pattern object, use re.compile:
import re
p = re.compile(r'(\w+) (\w+)(?P<sign>.*)', re.DOTALL)
print "p.pattern:", p.pattern
print "p.flags:", p.flags
print "p.groups:", p.groups
print "p.groupindex:", p.groupindex
Output:
p.pattern: (\w+) (\w+)(?P<sign>.*)
p.flags: 16
p.groups: 3
p.groupindex: {'sign': 3}
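To see what the re.DOTALL flag passed to re.compile actually changes, here is a rough sketch that compiles the same pattern with and without it. The two-line sample text and the second pattern are assumptions added for illustration.

```python
import re

# Hypothetical two-line input.
text = 'hello world!\nsecond line'

plain = re.compile(r'(\w+) (\w+)(?P<sign>.*)')
dotall = re.compile(r'(\w+) (\w+)(?P<sign>.*)', re.DOTALL)

# Without DOTALL, '.' does not match a newline, so the group stops at the line end.
plain_sign = plain.match(text).group('sign')    # '!'

# With DOTALL, '.' also matches '\n', so the group runs to the end of the string.
dotall_sign = dotall.match(text).group('sign')  # '!\nsecond line'

# Several flags can be combined with the | operator.
both = re.compile(r'HELLO.*line', re.DOTALL | re.IGNORECASE)
found = both.search(text) is not None           # True
```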
match(string[, pos[, endpos]]) | re.match(pattern, string[, flags]):
This method tries to match pattern starting at index pos of string. If the end of pattern is reached with the match still succeeding, a Match object is returned; if pattern fails to match along the way, or endpos is reached before the match completes, None is returned.
The default values of pos and endpos are 0 and len(string), respectively. re.match() does not accept these two parameters; its flags parameter specifies the matching mode used when compiling pattern.
Note: this method does not require a full match. If characters remain in string after pattern has been matched, the match still counts as successful. To require a full match, add the boundary anchor '$' at the end of the expression.
For an example, see section 2.1 of the article listed under Reference.
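The points above (partial matching, the '$' anchor, and the pos and endpos parameters) can be sketched as follows. The pattern \w+ and the variable names are chosen only for illustration.

```python
import re

p = re.compile(r'\w+')
s = 'hello world!'

# match() succeeds although ' world!' is left over after the pattern ends.
partial = p.match(s).group()            # 'hello'

# With '$' appended, the pattern must reach the end of the string, so this fails.
full = re.match(r'\w+$', s)             # None

# pos=6 starts matching at index 6 (only the compiled method accepts it).
at_pos = p.match(s, 6).group()          # 'world'

# endpos=9 makes the method treat the string as if it ended at index 9.
clipped = p.match(s, 6, 9).group()      # 'wor'
```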
search(string[, pos[, endpos]]) | re.search(pattern, string[, flags]):
This method looks for a substring of string that pattern can match. It tries to match pattern starting at index pos of string; if the end of pattern is reached with the match still succeeding, a Match object is returned. Otherwise pos is increased by 1 and the match is tried again; if pos reaches endpos without a match, None is returned.
The default values of pos and endpos are 0 and len(string), respectively. re.search() does not accept these two parameters; its flags parameter specifies the matching mode used when compiling pattern.
# encoding: UTF-8
import re
# Compile the regular expression into a Pattern object
pattern = re.compile(r'world')
# Use search() to look for a matching substring; None is returned if there is none.
# match() would not succeed in this example, because 'world' is not at the start of the string.
match = pattern.search('hello world!')
if match:
    print match.group()
Output:
world
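As a further sketch of how search() differs from match(), and of how the pos parameter can be used to continue after a previous hit, consider the following. The sample text is hypothetical.

```python
import re

pattern = re.compile(r'world')
text = 'hello world, big world!'

# match() only tries index 0, where 'world' does not occur.
no_match = pattern.match(text)                # None

# search() moves forward through the string until the pattern matches.
first = pattern.search(text)
first_span = first.span()                     # (6, 11)

# Passing the end of the first hit as pos resumes the search from there.
second = pattern.search(text, first.end())
second_span = second.span()                   # (17, 22)
```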
split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):
Splits string at every substring that matches and returns the pieces as a list. maxsplit specifies the maximum number of splits; if it is not given, the string is split at every match.
import re
p = re.compile(r'\d+')
print p.split('one1two2three3four4')
Output:
['one', 'two', 'three', 'four', '']
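The effect of maxsplit can be sketched with the same input. The second call, which uses a capturing group to keep the separators in the result, is an extra illustration that is not part of the original example.

```python
import re

p = re.compile(r'\d+')
s = 'one1two2three3four4'

# At most two splits are made; the rest of the string stays in one piece.
limited = p.split(s, maxsplit=2)
# limited == ['one', 'two', 'three3four4']

# If the pattern contains a capturing group, the matched separators are kept.
with_seps = re.split(r'(\d+)', s)
# with_seps == ['one', '1', 'two', '2', 'three', '3', 'four', '4', '']
```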
findall(string[, pos[, endpos]]) | re.findall(pattern, string[, flags]):
Searches string and returns all matching substrings as a list.
import re
p = re.compile(r'\d+')
print p.findall('one1two2three3four4')
Output:
['1', '2', '3', '4']
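What findall() returns depends on the capturing groups in the pattern, which the example above does not show. A short sketch, with patterns made up for illustration:

```python
import re

s = 'one1two2three3four4'

# No groups: each list item is the whole matched substring.
whole = re.findall(r'[a-z]+\d', s)
# whole == ['one1', 'two2', 'three3', 'four4']

# One group: each item is only the text captured by that group.
one_group = re.findall(r'[a-z]+(\d)', s)
# one_group == ['1', '2', '3', '4']

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r'([a-z]+)(\d)', s)
# pairs == [('one', '1'), ('two', '2'), ('three', '3'), ('four', '4')]
```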
finditer(string[, pos[, endpos]]) | re.finditer(pattern, string[, flags]):
Returns an iterator that yields each match result (a Match object) in order.
import re
p = re.compile(r'\d+')
for m in p.finditer('one1two2three3four4'):
    print m.group(),
Output:
1 2 3 4
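Because finditer() yields Match objects rather than plain strings, the position of each match is available too. A minimal sketch on the same input; the variable name is illustrative.

```python
import re

p = re.compile(r'\d+')

# Collect the matched text together with its (start, end) position.
positions = [(m.group(), m.span()) for m in p.finditer('one1two2three3four4')]
# positions == [('1', (3, 4)), ('2', (7, 8)), ('3', (13, 14)), ('4', (18, 19))]
```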
sub(repl, string[, count]) | re.sub(pattern, repl, string[, count]):
Replaces each matched substring in string with repl and returns the resulting string.
When repl is a string, it can reference groups with \id, \g<id>, or \g<name>. The plain escape \0 does not reference a group; to insert the entire match, use \g<0>.
When repl is a function, it must accept exactly one argument (the Match object) and return the replacement string (group references in the returned string are not expanded).
count specifies the maximum number of replacements. If it is not given, all matches are replaced.
import re
p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
print p.sub(r'\2 \1', s)
def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()
print p.sub(func, s)
Output:
say i, world hello!
I Say, Hello World!
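The named-group reference form and the count parameter are not exercised by the example above. The sketch below tries them on the same sentence; the group names first and second are invented for illustration, and \g<0> stands for the whole match as described in the Python re documentation.

```python
import re

s = 'i say, hello world!'
# Same pattern as above, but with (hypothetical) group names.
p = re.compile(r'(?P<first>\w+) (?P<second>\w+)')

# \g<name> references a group by name.
swapped = p.sub(r'\g<second> \g<first>', s)
# swapped == 'say i, world hello!'

# \g<0> inserts the entire matched substring.
wrapped = p.sub(r'[\g<0>]', s)
# wrapped == '[i say], [hello world]!'

# count=1 replaces only the first match.
once = p.sub(r'\2 \1', s, count=1)
# once == 'say i, hello world!'
```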
subn(repl, string[, count]) | re.subn(pattern, repl, string[, count]):
Returns the tuple (sub(repl, string[, count]), number of replacements).
import re
p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
print p.subn(r'\2 \1', s)
def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()
print p.subn(func, s)
Output:
('say i, world hello!', 2)
('I Say, Hello World!', 2)
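One way the returned count might be used is to check whether anything was replaced at all. This is only a sketch; the pattern and the sample strings are assumptions.

```python
import re

p = re.compile(r'\d+')

# The tuple unpacks into the new string and the number of replacements made.
result, n = p.subn('#', 'one1two2three3four4')
# result == 'one#two#three#four#', n == 4

# With no match, the string comes back unchanged and the count is 0.
unchanged, zero = p.subn('#', 'no digits here')
# unchanged == 'no digits here', zero == 0
```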
Reference: http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html