Regular Expression Learning Notes (3)
Python re module: core functions and methods
1. Use the compile() function to compile regular expressions
After importing the re module, compile the regular expression with compile(), for example pattern = re.compile('regular expression', re.S). The resulting pattern object can then be used for matching.
compile() also accepts the module's flag attributes, such as re.S, re.I, re.L, re.M, and re.X, as an optional second argument.
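As a minimal sketch of compiling with flags, the snippet below combines two flags with the bitwise OR operator. The pattern and the sample text are made up for illustration and do not come from the notes above.

```python
import re

# Compile once with two flags combined via bitwise OR:
# re.I ignores case, re.S lets '.' match a newline as well.
pattern = re.compile(r'hello.world', re.I | re.S)

# Hypothetical sample text; the newline sits where '.' has to match.
m = pattern.search('He said: HELLO\nWORLD!')
matched = m.group() if m is not None else None
print(repr(matched))
```

Compiling is mainly worthwhile when the same pattern is reused many times; for a one-off match, the module-level functions take the same flags.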
2. Match objects and the group() and groups() methods
Match objects have two main methods: group() and groups(). The object returned by a successful call to match() or search() is a match object. group() returns either the entire match or a specific subgroup on request.
groups() returns a tuple containing the single subgroup or all subgroups. If the pattern has no subgroups, group() still returns the entire match, while groups() returns an empty tuple.
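The difference between the two methods can be sketched as follows. The patterns and input strings here are illustrative assumptions, chosen so that one pattern has no subgroups and the other has two.

```python
import re

# No parentheses in the pattern, so there are no subgroups.
m1 = re.match(r'\d+', '2024 was a year')
whole1 = m1.group()    # the entire match
subs1 = m1.groups()    # an empty tuple

# Two parenthesized subgroups.
m2 = re.match(r'(\d+)-(\d+)', '10-20')
whole2 = m2.group()    # the entire match
first2 = m2.group(1)   # subgroup 1 only
subs2 = m2.groups()    # all subgroups as a tuple

print(whole1, subs1, whole2, first2, subs2)
```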
3. Use the match() method to match strings
The match() function tries to match the pattern at the start of the string. If the match succeeds, a match object is returned; if it fails, None is returned. The group() method of the match object can be used to display the matched text.
>>> re.match('foo', 'food on the table').group()
'foo'
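Because match() returns None on failure, calling group() directly on the result raises AttributeError when nothing matches. A minimal sketch of guarding against that, using a hypothetical helper named leading_match:

```python
import re

def leading_match(pattern, text):
    """Return the text matched at the start of `text`, or None.

    Hypothetical helper for illustration; not part of the re module.
    """
    m = re.match(pattern, text)
    return m.group() if m is not None else None

hit = leading_match('foo', 'food on the table')   # pattern is at the start
miss = leading_match('foo', 'seafood')            # pattern is not at the start
print(hit, miss)
```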
4. Use search() to search within a string
The search() method works exactly like match(); the difference is that search() looks for the first occurrence of the given regular expression pattern at any position in its string argument.
If a match is found, a match object is returned. Otherwise, None is returned.
In other words, unlike match(), search() can find a match in the middle of a string.
>>> m = re.match('foo', 'seafood')    # match fails, m is None
>>> m = re.search('foo', 'seafood')   # use search() instead
>>> if m is not None: m.group()
...
'foo'
search() succeeds where match() failed because it finds foo inside seafood.
5. Repetition, special characters, and grouping
Take a regular expression that matches email addresses as an example: \w+@\w+\.com. This regular expression matches only simple addresses.
To support an optional host name in front of the domain name, such as www.xxx.com, use the ? operator: \w+@(\w+\.)?\w+\.com, which makes (\w+\.) optional.
>>> pattern = r'\w+@(\w+\.)?\w+\.com'
>>> re.match(pattern, 'nobody@xxx.com').group()
'nobody@xxx.com'
>>> re.match(pattern, 'nobody@www.xxx.com').group()
'nobody@www.xxx.com'
The example can be extended further to allow any number of intermediate subdomains by changing ? to *: \w+@(\w+\.)*\w+\.com
>>> patt = r'\w+@(\w+\.)*\w+\.com'
>>> re.match(patt, 'nobody@www.xxx.yyy.zzz.com').group()
'nobody@www.xxx.yyy.zzz.com'
Use parentheses to match and save sub-groups for later processing.
>>> m = re.match(r'(\w\w\w)-(\d\d\d)', 'abc-123')
>>> m.group()     # complete match
'abc-123'
>>> m.group(1)    # subgroup 1
'abc'
>>> m.group(2)    # subgroup 2
'123'
>>> m.groups()    # all subgroups
('abc', '123')
group() is usually used to display the whole match, but it can also retrieve individual matched subgroups. Use the groups() method to get a tuple containing all matched substrings.
6. Search and replace using sub() and subn()
There are two functions/methods for searching and replacing: sub() and subn(). The two are almost identical. Both replace every occurrence of the regular expression pattern in a string with a replacement.
The replacement is usually a string, but it may also be a function that returns the replacement string.
The difference is that subn() also reports the total number of replacements made: it returns a two-element tuple containing the resulting string and the replacement count.
>>> re.sub('X', 'Mr. Smith', 'attn: X\n\nDear X,\n')
'attn: Mr. Smith\n\nDear Mr. Smith,\n'
>>>
>>> re.subn('X', 'Mr. Smith', 'attn: X\n\nDear X,\n')
('attn: Mr. Smith\n\nDear Mr. Smith,\n', 2)
>>>
>>> print(re.sub('X', 'Mr. Smith', 'attn: X\n\nDear X,\n'))
attn: Mr. Smith

Dear Mr. Smith,

>>> re.sub('[ae]', 'X', 'abcdef')
'XbcdXf'
>>> re.subn('[ae]', 'X', 'abcdef')
('XbcdXf', 2)
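The examples above all pass a string as the replacement. As a small sketch of the other option, passing a function, the snippet below doubles every number in a string. The function name double and the sample text are assumptions made for this illustration.

```python
import re

def double(match):
    """Receive a match object and return the replacement string."""
    return str(int(match.group()) * 2)

text = '3 apples and 12 pears'

# sub() calls double() once for every non-overlapping match of \d+.
result = re.sub(r'\d+', double, text)

# subn() does the same and also reports how many replacements were made.
result_n = re.subn(r'\d+', double, text)

print(result)
print(result_n)
```

The function must return a string, which is why the doubled value is converted with str() before being returned.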
7. Extension notations
With the (?iLmsux) family of options, you can specify one or more flags directly inside a regular expression instead of passing them to compile() or other re module functions.
The following examples use re.I/IGNORECASE. The last example also mixes in re.M/MULTILINE:
>>> re.findall(r'(?i)yes', 'yes? Yes. YES!!')    # (?i) makes the match case-insensitive
['yes', 'Yes', 'YES']
>>> re.findall(r'(?i)th\w+', 'The quickest way is through this tunnel.')
['The', 'through', 'this']
>>> re.findall(r'(?im)(^th[\w ]+)', """
... This line is the first,
... another line,
... that line, it's the best
... """)
['This line is the first', 'that line']
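To illustrate that an inline option and a flag argument are interchangeable, here is a short sketch that runs the same case-insensitive search three ways. The variable names are chosen for this example only.

```python
import re

text = 'The quickest way is through this tunnel.'

# 1. Inline option at the start of the pattern.
inline = re.findall(r'(?i)th\w+', text)

# 2. Flag passed to a module-level function.
with_flag = re.findall(r'th\w+', text, re.I)

# 3. Flag passed to compile(), using the long flag name.
compiled = re.compile(r'th\w+', re.IGNORECASE).findall(text)

print(inline, with_flag, compiled)
```

All three calls return the same list. The inline form is convenient when the pattern travels as a plain string, for example when it is read from a configuration file.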