Python -- re module and regular expressions
I. re Module
- Common Methods in re Module
- Additional:
- Almost every function in the re module accepts a flags parameter. flags sets the matching mode and may include:
- re.I (re.IGNORECASE): case-insensitive matching
- re.M (re.MULTILINE): multi-line mode; '^' and '$' also match at the start and end of each line
- re.S (re.DOTALL): dot-matches-all mode; '.' also matches a newline
- re.L (re.LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale
- re.U (re.UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on Unicode character properties
- re.X (re.VERBOSE): verbose mode; the pattern can span multiple lines, whitespace is ignored, and comments can be added
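The flags listed above can be tried out with a short sketch like the following. The sample text and the phone-like pattern are made up for illustration and are not part of the original article.

```python
import re

text = "Hello\nhello world"

# re.I: case-insensitive matching
ignore_case = re.findall("hello", text, re.I)

# re.M: '^' also matches at the start of each line
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: '.' also matches the newline character
dot_default = re.search("Hello.hello", text)
dot_all = re.search("Hello.hello", text, re.S)

# re.X: whitespace and comments inside the pattern are ignored
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
""", re.X)
phone_match = phone.search("call 555-1234").group()

# Flags can be combined with |
combined = re.findall(r"^hello", text, re.I | re.M)

print(ignore_case)      # ['Hello', 'hello']
print(line_starts)      # ['Hello', 'hello']
print(dot_default)      # None
print(dot_all.group())  # prints both lines: '.' matched the newline
print(phone_match)      # 555-1234
print(combined)         # ['Hello', 'hello']
```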
- re.findall(pattern, string, flags=0) finds every match of pattern in string. The return value is a list.
    import re

    # Normal matching
    s = "zhangsan lisi wanger mazi"
    ret = re.findall('an', s)  # returns every matched substring
    print(ret)  # ['an', 'an', 'an']

    # With flags
    s = "zhAngsan lisi wAnger mazi"
    ret = re.findall('an', s, re.I)
    print(ret)  # ['An', 'an', 'An']
findall group priority:
    import re

    ret = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
    print(ret)  # ['oldboy']
    # findall gives priority to capturing groups: when the pattern contains a
    # group, only the group's content is returned. To get the whole match,
    # make the group non-capturing with (?:...).
    ret = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com')
    print(ret)  # ['www.oldboy.com']
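How findall treats groups also depends on how many groups the pattern has. The sketch below, using a made-up input string with two addresses, compares zero, one, two, and non-capturing groups.

```python
import re

url = "www.oldboy.com www.baidu.com"

# No groups: each whole match is returned
no_group = re.findall(r"www\.\w+\.com", url)

# One capturing group: only the group's content is returned
one_group = re.findall(r"www\.(\w+)\.com", url)

# Two capturing groups: one tuple per match
two_groups = re.findall(r"(www)\.(\w+)\.com", url)

# Non-capturing group: whole matches again
non_capturing = re.findall(r"www\.(?:baidu|oldboy)\.com", url)

print(no_group)       # ['www.oldboy.com', 'www.baidu.com']
print(one_group)      # ['oldboy', 'baidu']
print(two_groups)     # [('www', 'oldboy'), ('www', 'baidu')]
print(non_capturing)  # ['www.oldboy.com', 'www.baidu.com']
```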
- re.match(pattern, string, flags=0) works like search, but it only matches at the start of the string, similar to str.startswith. The return value is a match object; call .group() on it to get the matched text. NOTE: if nothing matches, None is returned.
    import re

    s = "zhangsan lisi wAnger mazi"
    ret = re.match('zh', s)
    # ret is <re.Match object; span=(0, 2), match='zh'>
    # (older versions print _sre.SRE_Match instead of re.Match)
    print(ret.group())  # zh

    s = "zhangsan lisi wAnger mazi"
    ret = re.match('an', s)
    print(ret)  # None
    # print(ret.group())  # error: None has no attribute 'group'
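Because match returns None when the pattern does not match at the start, it is usually worth checking the result before calling .group(). A minimal sketch follows; `first_match` is a hypothetical helper name introduced only for this example.

```python
import re

s = "zhangsan lisi wAnger mazi"

m1 = re.match("an", s)   # None: 'an' does not occur at position 0
m2 = re.search("an", s)  # search scans the whole string instead

def first_match(pattern, text):
    """Hypothetical helper: return the text matched at the start, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

print(m1)                    # None
print(m2.span())             # (2, 4)
print(first_match("zh", s))  # zh
print(first_match("an", s))  # None
```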
- re.subn(pattern, repl, string, count=0, flags=0) replaces the matched content. The return value is a tuple (new string, number of replacements).
    import re

    s = "zhAngsan lisi wAnger mazi"
    ret = re.subn('A', 'a', s, count=1)  # count limits the number of replacements
    print(ret)  # ('zhangsan lisi wAnger mazi', 1)

    s = "zhAngsan lisi wAnger mazi"
    def demo():
        return 'a'
    ret = re.subn('A', demo(), s)  # repl can be the return value of a function call
    print(ret)  # ('zhangsan lisi wanger mazi', 2)
- re.sub(pattern, repl, string, count=0, flags=0) works the same way as subn, but the return value differs: sub returns only the new string.
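As a rough illustration of sub, the sketch below reuses the sample string from the subn example and also shows two standard forms of repl: a function that receives the match object, and a backreference such as \2. The helper name `capitalize_word` is made up for this example.

```python
import re

s = "zhAngsan lisi wAnger mazi"

# Replace every match; only the new string is returned
lowered = re.sub("A", "a", s)

# count limits how many matches are replaced
first_only = re.sub("A", "a", s, count=1)

# repl may be a function that takes the match object and returns a string
def capitalize_word(match):
    return match.group().capitalize()

capitalized = re.sub(r"\b[a-z]\w*", capitalize_word, "zhangsan lisi")

# repl may refer to groups with \1, \2, ...
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "zhangsan lisi")

print(lowered)      # zhangsan lisi wanger mazi
print(first_only)   # zhangsan lisi wAnger mazi
print(capitalized)  # Zhangsan Lisi
print(swapped)      # lisi zhangsan
```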
- re.compile(pattern, flags=0) is similar in spirit to the built-in function compile: it compiles a regular expression into a regular expression object.
    import re

    obj = re.compile(r'\d{3}')  # compile the pattern into a regular expression object; it matches three digits
    ret = obj.search('abc123eeee')  # call search on the object; the argument is the string to search
    print(ret.group())  # 123
- re.finditer(pattern, string, flags=0) finds every match of pattern in string. It is similar to findall, but the return value differs: finditer returns an iterator of match objects.
    import re

    s = "zhangsan lisi wanger mazi"
    ret = re.finditer('an', s)
    print(ret)  # e.g. <callable_iterator object at 0x028A85B0>
    # print(next(ret))  # <re.Match object; span=(2, 4), match='an'>
    print(next(ret).group())  # an (each next() yields one match object)
    for itr in ret:  # iterate over the remaining matches
        print(itr.group())  # an, an
- re.search(pattern, string, flags=0) finds the first match of pattern in string and returns it. If a match is found, the return value is a match object; if no match is found, the return value is None.
    import re

    s = "zhangsan lisi wanger mazi"
    print(re.search('an', s).group())  # an
- re.split(pattern, string, maxsplit=0, flags=0) splits string wherever pattern matches. The return value is a list.
    import re

    s = "zhangsan|lisi|wanger|mazi"
    ret = re.split(r'\|', s)
    print(ret)  # ['zhangsan', 'lisi', 'wanger', 'mazi']
    ret = re.split(r'\|', s, maxsplit=1)  # maxsplit=1 limits the number of splits
    print(ret)  # ['zhangsan', 'lisi|wanger|mazi']
re.split() group priority:
    import re

    s = "zhangsan|lisi|wanger|mazi"
    ret1 = re.split(r'\|', s)
    print(ret1)  # ['zhangsan', 'lisi', 'wanger', 'mazi'] -- the separators are not kept
    ret2 = re.split(r'(\|)', s)
    print(ret2)  # ['zhangsan', '|', 'lisi', '|', 'wanger', '|', 'mazi']
    # When the pattern is wrapped in parentheses, the separators are kept in the result.
II. Regular Expression
- Metacharacters
    .       matches any character except a newline
    \w      matches a letter, digit, or underscore
    \s      matches any whitespace character
    \d      matches a digit
    \W      matches anything that is not a letter, digit, or underscore
    \S      matches any non-whitespace character
    \D      matches any non-digit
    \n      matches a newline
    \t      matches a tab
    \b      matches a word boundary (the start or end of a word)
    ^       matches the start of the string
    $       matches the end of the string
    a|b     matches a or b
    (...)   matches the expression inside the parentheses and forms a group
    [...]   matches any character in the character set
    [^...]  matches any character not in the character set
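The metacharacters above can be checked one at a time with findall and search. The sample strings in this sketch are invented for illustration.

```python
import re

text = "tel: 010-1234\tname_1\nend"

digits = re.findall(r"\d+", text)    # runs of digits
words = re.findall(r"\w+", text)     # runs of letters, digits, underscores
spaces = re.findall(r"\s", text)     # space, tab, newline
dot = re.findall(r"a.c", "abc a\nc a-c")   # '.' does not match '\n'
in_set = re.findall(r"[a-c]", "abcdef")    # characters in the set
not_in_set = re.findall(r"[^a-c]", "abcdef")  # characters not in the set
either = re.findall(r"cat|dog", "cat dog bird")
whole_word = re.findall(r"\bend\b", "end endless bend")  # word boundaries
starts = bool(re.search(r"^tel", text))  # anchored at the start
ends = bool(re.search(r"end$", text))    # anchored at the end

print(digits)      # ['010', '1234', '1']
print(words)       # ['tel', '010', '1234', 'name_1', 'end']
print(spaces)      # [' ', '\t', '\n']
print(dot)         # ['abc', 'a-c']
print(in_set)      # ['a', 'b', 'c']
print(not_in_set)  # ['d', 'e', 'f']
print(either)      # ['cat', 'dog']
print(whole_word)  # ['end']
print(starts, ends)  # True True
```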
- Quantifiers
- Greedy match, non-Greedy match
- The above quantifiers are greedy by default.
- Non-greedy match: appending ? to a quantifier switches it to non-greedy mode.
    *?      repeat any number of times, but as few as possible
    +?      repeat one or more times, but as few as possible
    ??      repeat zero or one time, but as few as possible
    {n,m}?  repeat n to m times, but as few as possible
    {n,}?   repeat n or more times, but as few as possible
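To see the difference between greedy and non-greedy quantifiers, one can match each pair against the same input. This is only a sketch; the input "aaaa" is an arbitrary choice.

```python
import re

s = "aaaa"

# Each pair: (greedy pattern, non-greedy pattern)
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
    (r"a{2,}", r"a{2,}?"),
]

results = {}
for greedy, lazy in pairs:
    results[greedy] = re.match(greedy, s).group()
    results[lazy] = re.match(lazy, s).group()
    print(greedy, repr(results[greedy]), "|", lazy, repr(results[lazy]))

# Expected matches at the start of "aaaa":
#   a*      'aaaa'   a*?      ''
#   a+      'aaaa'   a+?      'a'
#   a?      'a'      a??      ''
#   a{2,3}  'aaa'    a{2,3}?  'aa'
#   a{2,}   'aaaa'   a{2,}?   'aa'
```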
- .*? Usage
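A common reading of `.*?x` is "take as few characters as possible, up to the first x". The sketch below contrasts it with the greedy `.*x`; the tag-like sample string is made up, and real HTML is better handled with a parser than with regular expressions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: '.*' runs to the last '>' in the string
greedy_tags = re.findall(r"<.*>", html)

# Non-greedy: '.*?' stops at the first '>' after each '<'
lazy_tags = re.findall(r"<.*?>", html)

# '.*?x' stops at the first x, '.*x' at the last x
up_to_first_x = re.match(r".*?x", "abxcdx").group()
up_to_last_x = re.match(r".*x", "abxcdx").group()

print(greedy_tags)    # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)      # ['<b>', '</b>', '<i>', '</i>']
print(up_to_first_x)  # abx
print(up_to_last_x)   # abxcdx
```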
- Escape Character
- \ escapes special characters, so that a metacharacter is matched literally
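Escaping can be sketched as follows. The sample strings are made up; re.escape is the standard-library function that escapes the special characters in a string for you, and raw strings (r"...") keep Python from interpreting the backslashes first.

```python
import re

price = "cost: $3.50 (approx.)"

# Unescaped '.' matches any character; '\.' matches only a literal dot
loose = re.findall(r"3.50", "3.50 3x50")
strict = re.findall(r"3\.50", "3.50 3x50")

# '$' is a metacharacter, so it must be escaped to match a dollar sign
dollar = re.search(r"\$\d+\.\d+", price).group()

# re.escape escapes the special characters in a literal string
literal = re.escape("(approx.)")
found = re.search(literal, price).group()

# Matching a literal backslash: the pattern needs an escaped backslash
path = "C:\\temp"  # the string C:\temp
backslashes = re.findall(r"\\", path)

print(loose)             # ['3.50', '3x50']
print(strict)            # ['3.50']
print(dollar)            # $3.50
print(found)             # (approx.)
print(len(backslashes))  # 1
```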
- ? Usage