Python Study Notes 4: Regular Expressions
import re -- all functions related to regular expressions are in the re module.
re.sub() -- string replacement
>>> import re
>>> s = "100 NORTH BROAD ROAD"
>>> re.sub('ROAD$', 'RD.', s)
'100 NORTH BROAD RD.'
>>> s = "100 BROAD"
>>> re.sub('\\bROAD$', 'RD.', s)
'100 BROAD'
>>> s = '100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD$', 'RD.', s)
'100 BROAD ROAD APT. 3'
>>> re.sub(r'\bROAD\b', 'RD.', s)
'100 BROAD RD. APT. 3'
Note:
1) \b matches a word boundary. Placed before ROAD, it requires that ROAD start at the beginning of a word, so the ROAD inside BROAD is not replaced.
2) The 'r' prefix in front of the pattern marks a raw string: it tells Python that no characters in the string are to be treated as escape sequences. E.g., '\t' is a tab, while r'\t' is the character '\' followed by the character 't'.
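The two notes above can be checked directly. The following is a minimal sketch; the helper name abbreviate_road is hypothetical and used only for illustration.

```python
import re

# '\t' is a single tab character; r'\t' is two characters: a backslash and 't'.
assert len('\t') == 1
assert len(r'\t') == 2

# Without the r prefix, '\b' is a backspace character, not a word boundary,
# so the backslash has to be doubled. Both spellings give the same pattern.
assert '\b' == '\x08'
assert '\\b' == r'\b'


def abbreviate_road(address):
    """Replace the whole word ROAD with RD. (hypothetical helper)."""
    return re.sub(r'\bROAD\b', 'RD.', address)


print(abbreviate_road('100 BROAD ROAD APT. 3'))  # 100 BROAD RD. APT. 3
print(abbreviate_road('100 BROAD'))              # 100 BROAD
```

Writing patterns as raw strings avoids having to double every backslash.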
re.search() -- searches a string for a match to a regular expression. If a match is found, a match object is returned; otherwise None is returned.
>>> import re
>>> pattern = '^M?M?M?$'
>>> re.search(pattern, 'M')
<_sre.SRE_Match object; span=(0, 1), match='M'>
>>> re.search(pattern, 'MM')
<_sre.SRE_Match object; span=(0, 2), match='MM'>
>>> re.search(pattern, 'MMM')
<_sre.SRE_Match object; span=(0, 3), match='MMM'>
>>> re.search(pattern, 'MMMMM')
>>> re.search(pattern, '')
<_sre.SRE_Match object; span=(0, 0), match=''>
Note:
1) ? -- the preceding character is optional (it matches zero times or once).
2) M{0,3} -- matches M zero to three times; '^M{0,3}$' is equivalent to the '^M?M?M?$' pattern used above.
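As a quick check that the two spellings behave the same way, here is a small sketch; the variable names are chosen only for this example.

```python
import re

optional_form = re.compile('^M?M?M?$')   # three optional Ms
counted_form = re.compile('^M{0,3}$')    # zero to three Ms

samples = ['', 'M', 'MM', 'MMM', 'MMMM']

# For each sample, record whether each form of the pattern matches.
results = [
    (s, optional_form.search(s) is not None, counted_form.search(s) is not None)
    for s in samples
]

for s, matched_optional, matched_counted in results:
    print(repr(s), matched_optional, matched_counted)
```

Both columns agree for every sample: up to three Ms match, four do not.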
Verbose (loose) regular expressions:
1. Whitespace is ignored. Spaces, tabs, and carriage returns in the pattern are not matched as literal characters. To match one of these characters, escape it with a backslash '\'.
2. Comments (from # to the end of the line) are ignored.
3. When using a verbose regular expression, you must pass the re.VERBOSE flag.
>>> pattern = '''^      # beginning of string
    M{0,3}              # thousands - 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #            or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #        or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #        or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
    '''
>>> re.search(pattern, 'M', re.VERBOSE)
<_sre.SRE_Match object; span=(0, 1), match='M'>
>>> re.search(pattern, 'MCMLXXXIX', re.VERBOSE)
<_sre.SRE_Match object; span=(0, 9), match='MCMLXXXIX'>
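The sketch below illustrates the three rules for verbose patterns. It assumes the same Roman numeral pattern as above, compiled once with re.VERBOSE; the names ROMAN, STREET, and is_roman are hypothetical and exist only for this example.

```python
import re

ROMAN_SOURCE = r'''
    ^
    M{0,3}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $
    '''

# Rule 3: the flag is required for whitespace and comments to be ignored.
ROMAN = re.compile(ROMAN_SOURCE, re.VERBOSE)


def is_roman(text):
    """Return True if text matches the verbose pattern (hypothetical helper)."""
    return ROMAN.search(text) is not None


print(is_roman('MCMLXXXIX'))                 # True
print(ROMAN.search('MCMLXXXIX').groups())    # ('CM', 'LXXX', 'IX')

# Without re.VERBOSE the spaces and comments are treated as literal text.
print(re.search(ROMAN_SOURCE, 'M'))          # None

# Rule 1: an unescaped space is ignored, an escaped one is matched literally.
STREET = re.compile(r'''
    NORTH \  BROAD      # the backslash-escaped space must appear in the input
    ''', re.VERBOSE)
print(STREET.search('100 NORTH BROAD ROAD') is not None)                    # True
print(re.search(r'NORTH BROAD', 'NORTH BROAD', re.VERBOSE) is not None)     # False
```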
Case study: matching phone numbers
\d -- matches any digit (0-9)
\D -- matches any character except a digit
+ -- matches one or more times
* -- matches zero or more times
>>> phonePattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
>>> phonePattern.search('work 1-(800) 555.1212 #1234').groups()
('800', '555', '1212', '1234')
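Because search() returns None when nothing matches, calling groups() directly on the result can raise an AttributeError. One possible way to wrap the pattern above is sketched here; the function name parse_phone and the dictionary keys are assumptions made for this example, not part of the original notes.

```python
import re

# Same pattern as above: area code, trunk, rest of number, optional extension.
PHONE_PATTERN = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')


def parse_phone(text):
    """Return the parts of a phone number as a dict, or None if no match."""
    match = PHONE_PATTERN.search(text)
    if match is None:
        return None
    area, trunk, rest, extension = match.groups()
    return {'area': area, 'trunk': trunk, 'rest': rest, 'extension': extension}


print(parse_phone('work 1-(800) 555.1212 #1234'))
print(parse_phone('800-555-1212'))   # extension is an empty string
print(parse_phone('no number here')) # None
```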
Regular expression symbols and their meanings:
$ -- end of the string
^ -- start of the string
x? -- matches the character x zero times or once
x+ -- matches the character x one or more times
x* -- matches the character x zero or more times
x{m,n} -- matches the character x at least m and at most n times
x{n} -- matches the character x exactly n times
(a|b|c) -- matches a, b, or c
(x) -- a group. The matched text is remembered and can be retrieved with the groups() method of the match object returned by re.search().
\d -- matches any digit (0-9)
\D -- matches any character except a digit
\b -- matches a word boundary
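The table can be exercised with a few small patterns. This is only an illustrative sketch; the sample patterns and strings are made up for the example.

```python
import re

# (pattern, test string, whether a match is expected)
checks = [
    (r'^ab?$',     'a',      True),   # x?     : b is optional
    (r'^ab+$',     'a',      False),  # x+     : at least one b is required
    (r'^ab*$',     'abbb',   True),   # x*     : any number of b, including none
    (r'^ab{2}$',   'abb',    True),   # x{n}   : exactly two b
    (r'^ab{1,2}$', 'abbb',   False),  # x{m,n} : one or two b, not three
    (r'^(a|b|c)$', 'b',      True),   # (a|b|c): one of the alternatives
    (r'^\d\D$',    '7x',     True),   # \d then \D: a digit, then a non-digit
    (r'\bcat\b',   'concat', False),  # \b     : cat must be a whole word
]

# True for every row where the actual result agrees with the expectation.
outcomes = [(re.search(p, s) is not None) == expected for p, s, expected in checks]
print(all(outcomes))

# (x): each parenthesized group is remembered and returned by groups().
page_range = re.search(r'(\d+)-(\d+)', 'pages 10-20').groups()
print(page_range)  # ('10', '20')
```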