Introduction to the Python regular expression module
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns. Versions before Python 1.5 provided Emacs-style patterns through the regex module. Emacs-style patterns are less readable and less capable, so avoid the regex module when writing new code, although you may occasionally come across it in old code.
In essence, a regular expression (or RE) is a small, highly specialized programming language embedded in Python and made available through the re module. With this small language, you specify rules for the set of strings you want to match; that set might contain English sentences, e-mail addresses, TeX commands, or anything else you like. You can then ask questions such as "Does this string match the pattern?" or "Does any part of this string match the pattern?" You can also use REs to modify strings or to split them apart in various ways.
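As a rough illustration of those two questions, the sketch below uses re.fullmatch for "does the whole string match?" and re.search for "does any part match?", then modifies and splits the same text. The e-mail pattern and the sample strings are invented for this example and are far too loose for real validation.

```python
import re

# Hypothetical pattern for illustration only: a very loose e-mail shape.
email = re.compile(r"[\w.]+@[\w.]+")

text = "contact: alice@example.com or bob@example.org"

# "Does this string match the pattern?" (the whole string must match)
whole = email.fullmatch("alice@example.com") is not None

# "Does any part of this string match the pattern?"
part = email.search(text)

# Modify or split strings with regular expressions.
masked = email.sub("<hidden>", text)
pieces = re.split(r"\s+or\s+", text)

print(whole)          # True
print(part.group())   # alice@example.com
print(masked)         # contact: <hidden> or <hidden>
print(pieces)         # ['contact: alice@example.com', 'bob@example.org']
```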
Regular expression patterns are compiled into a series of bytecodes, which are then executed by a matching engine written in C. For advanced use, you may need to pay careful attention to how the engine executes a given RE, and write the RE in a particular way so that the bytecode runs faster. This article does not cover optimization, because that requires a good understanding of the matching engine's internals.
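The compilation step can be made explicit with re.compile. This is only a small sketch with a made-up sample string; it shows that a compiled pattern object and the equivalent module-level call give the same result.

```python
import re

# Compile once; the pattern object can then be reused as often as needed.
digits = re.compile(r"\d+")

sample = "opt=999&page=12"   # made-up sample string

found = digits.findall(sample)

# The module-level function compiles the pattern for you behind the scenes.
same = re.findall(r"\d+", sample)

print(found)   # ['999', '12']
print(same)    # ['999', '12']
```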
The regular expression language is relatively small and restricted, so not every string-processing task can be done with regular expressions. Some tasks can be done with regular expressions, but the resulting expressions turn out to be very complicated. In those cases you may be better off writing Python code to do the processing; the Python code will usually be slower than an elaborate regular expression, but it will probably be easier to understand.
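For a very simple task the trade-off can look like the sketch below, where a plain string method does the same job as a regular expression. The host name is a hypothetical sample value chosen for illustration.

```python
import re

host = "i.cnblogs.com"   # hypothetical sample value

# Regular-expression version: the dot must be escaped, because an
# unescaped dot matches any character.
with_re = re.sub(r"\.", "-", host)

# Plain string method: no pattern syntax to get wrong.
with_str = host.replace(".", "-")

print(with_re)    # i-cnblogs-com
print(with_str)   # i-cnblogs-com
```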
Another common use of regular expressions is to find every match of a pattern and replace it with a different string. The sub method takes a replacement value, which can be either a string or a function, and the string to be processed.
Syntax:
re.sub(pattern, repl, string[, count])
Returns the string obtained by replacing each substring of string that matches pattern with repl.
When repl is a string, you can refer to groups with \id, \g<id>, or \g<name>. The form \0 cannot be used for the whole match; write \g<0> instead.
When repl is a function, it must accept a single argument (the match object) and return the replacement string; group references in the returned string are not expanded.
count specifies the maximum number of replacements; if it is omitted, all matches are replaced.
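The rules for repl and count can be sketched as follows. The date pattern, the sample text, and the helper name next_year are all made up for this illustration.

```python
import re

dates = "2024-01-31 and 2025-12-01"   # made-up sample text
pattern = re.compile(r"(?P<year>\d{4})-(\d{2})-(\d{2})")

# String repl: \2 and \3 are numbered groups, \g<year> is a named group.
swapped = pattern.sub(r"\3/\2/\g<year>", dates)

# \g<0> stands for the whole match.
bracketed = pattern.sub(r"[\g<0>]", dates)

# Function repl: receives the match object and returns the replacement text.
def next_year(m):
    return str(int(m.group("year")) + 1) + m.group(0)[4:]

bumped = pattern.sub(next_year, dates)

# A group reference returned by a function is not expanded; it stays literal.
literal = re.sub(r"(a)", lambda m: r"\1", "a")

# count=1 limits the substitution to the first match.
first_only = pattern.sub("DATE", dates, count=1)

print(swapped)      # 31/01/2024 and 01/12/2025
print(bracketed)    # [2024-01-31] and [2025-12-01]
print(bumped)       # 2025-01-31 and 2026-12-01
print(literal)      # \1
print(first_only)   # DATE and 2025-12-01
```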
re.subn(pattern, repl, string[, count])
Returns the tuple (sub(repl, string[, count]), number of replacements made).
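A minimal sketch of the returned tuple, using an invented sample string:

```python
import re

sample = "a1b22c333"   # invented sample string

# subn returns the new string together with the number of replacements.
result, n = re.subn(r"\d", "#", sample)

# With count, at most that many replacements are made and reported.
limited = re.subn(r"\d", "#", sample, count=2)

print(result, n)   # a#b##c### 6
print(limited)     # ('a#b#2c333', 2)
```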
Example:
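# coding=utf-8
import re

url = "https://i.cnb1logs.co2m/Edi3tPosts.asp4x?opt=999"

# Replace every literal dot with a hyphen.
pattern = re.compile(r'(\.)')
print(r'\. :', re.sub(pattern, '-', url))

# Replace the span from the first slash to the last slash,
# keeping what lies between them inside <em> markers.
pattern = re.compile(r'\/([^*]+)\/')
print(r'\/([^*]+)\/ :', re.sub(pattern, r'<em>\1<em>', url))

pattern = re.compile(r'(\w+)(\w+)(\d+)')
# Try split first to see how the groups divide the string.
print(re.split(pattern, url))
print(re.sub(pattern, r'\3 \1', url))
# subn also reports how many replacements sub made.
print(re.subn(pattern, r'\3 \1', url))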
Output
\. : https://i-cnb1logs-co2m/Edi3tPosts-asp4x?opt=999
\/([^*]+)\/ : https:<em>/i.cnb1logs.co2m<em>Edi3tPosts.asp4x?opt=999
['https://i.', 'cn', 'b', '1', 'logs.', 'c', 'o', '2', 'm/', 'Ed', 'i', '3', 'tPosts.', 'as', 'p', '4', 'x?opt=', '9', '9', '9', '']
https://i.1 cnlogs.2 cm/3 EdtPosts.4 asx?opt=9 9
('https://i.1 cnlogs.2 cm/3 EdtPosts.4 asx?opt=9 9', 5)
More Python-related technical articles: http://www.cnblogs.com/yoyonow/