Python regular expression operations: the re module

Source: Internet
Author: User

re module

Preface:

The re module is Python's interface for working with regular expressions.

'.'       Matches any character except \n by default; with the DOTALL flag it matches any character, including a newline
'^'       Matches the start of the string; with the MULTILINE flag it also matches at the start of each line: re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'       Matches the end of the string; with MULTILINE it also matches at the end of each line: re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()
'*'       Matches the preceding character 0 or more times: re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'       Matches the preceding character 1 or more times: re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'       Matches the preceding character 1 or 0 times
'{m}'     Matches the preceding character m times
'{n,m}'   Matches the preceding character n to m times: re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'       Matches the expression on the left or on the right of |: re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)'   Group match: re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'[a-z]'   Matches any one character from a to z
'[^()]'   Matches any one character except ( and )
r''       Raw string: a backslash inside the quotes is not treated as an escape; see the supplementary section on r'' escaping below
'\A'      Matches only at the start of the string: re.search("\Aabc", "alexabc") finds no match
'\Z'      Matches only at the end of the string; similar to $, except that $ also matches just before a trailing newline
'\d'      Matches a digit 0-9
'\D'      Matches a non-digit
'\w'      Matches a word character, [A-Za-z0-9_]
'\W'      Matches a non-word character, [^A-Za-z0-9_]
'\s'      Matches whitespace such as \t, \n, \r: re.search("\s+", "ab\tc1\n3").group() gives '\t'
'(?P<name>...)'   Named group match: re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict("city") gives {'province': '3714', 'city': '81', 'birthday': '1993'}
re.IGNORECASE     Ignores case: re.search('(\A|\s)red(\s+|$)', i, re.IGNORECASE), where i is the string being searched
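A few of the entries above can be tried out directly. The following is a minimal sketch rather than a complete tour; the sample strings are made up for illustration, apart from the ID-style number reused from the table.

```python
import re

# MULTILINE lets ^ match at the start of every line, not only at the start of the string.
text = "first line\nsecond line\nthird line"
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(line_starts)  # ['first', 'second', 'third']

# \w also matches the underscore, so 'a_b' stays in one piece.
words = re.findall(r"\w+", "a_b c")
print(words)  # ['a_b', 'c']

# Named groups can be read back by name through groupdict() or group(name).
id_pattern = r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})"
match = re.search(id_pattern, "371481199306143242")
fields = match.groupdict()
print(fields["city"])           # 81
print(match.group("birthday"))  # 1993
```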

Flags are pattern modifiers: without changing the regular expression itself, a flag changes how the expression is interpreted, which adjusts the matching result:

# flags
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE  # ignore case: matching is case-insensitive
L = LOCALE = sre_compile.SRE_FLAG_LOCALE          # assume current 8-bit locale: locale-aware matching
U = UNICODE = sre_compile.SRE_FLAG_UNICODE        # assume unicode locale: interpret characters according to Unicode
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE    # make anchors look for newline: multi-line matching
S = DOTALL = sre_compile.SRE_FLAG_DOTALL          # make dot match newline: "." matches any character, including line breaks
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE        # ignore whitespace and comments
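To see what the flags change in practice, here is a small sketch using a made-up three-line string. It only illustrates IGNORECASE, DOTALL, MULTILINE and VERBOSE; the other flags are left out.

```python
import re

text = "Red apple\nred\nsky"

# IGNORECASE: 'red' matches both 'Red' and 'red'.
colors = re.findall(r"red", text, flags=re.IGNORECASE)
print(colors)  # ['Red', 'red']

# Without DOTALL the dot stops at a newline; with DOTALL it can cross lines.
no_dotall = re.search(r"apple.red", text)
with_dotall = re.search(r"apple.red", text, flags=re.DOTALL)
print(no_dotall)  # None
print(repr(with_dotall.group()))  # 'apple\nred'

# Flags are combined with |. VERBOSE allows whitespace and comments inside the pattern.
pattern = re.compile(r"""
    ^red$      # a line that consists only of 'red'
""", flags=re.VERBOSE | re.MULTILINE | re.IGNORECASE)
whole_lines = pattern.findall(text)
print(whole_lines)  # ['red']
```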

Greedy mode, lazy mode:

import re
result1 = re.search("p.*y", "abcdfphp435pythony_py")    # greedy mode
print(result1)  # <_sre.SRE_Match object; span=(5, 21), match='php435pythony_py'>
result2 = re.search("p.*?y", "abcdfphp435pythony_py")   # lazy mode
print(result2)  # <_sre.SRE_Match object; span=(5, 13), match='php435py'>
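The same difference shows up clearly with findall. The snippet below is an illustrative sketch on a made-up HTML-like string: the greedy quantifier runs to the last '>', the lazy one stops at the first.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.*>", html)
# Lazy: .*? takes as little as possible, so each tag is matched separately.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```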

  

match

Matches the pattern against the string starting from the first position:

# match
import re
obj = re.match(r'\d+', '123uua123sf')    # match one or more digits, starting at the first character
print(obj)                               # <_sre.SRE_Match object; span=(0, 3), match='123'>
if obj:                                  # runs only if something matched; skipped when the result is None
    print(obj.group())                   # print the matched text
    # 123
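Because match only looks at the start of the string, the same pattern fails when the digits come later. A minimal sketch follows; the helper name leading_number is hypothetical and not part of the re module.

```python
import re

print(re.match(r"\d+", "123uua123sf"))  # a match object for '123'
print(re.match(r"\d+", "uua123sf"))     # None: the string does not start with a digit


def leading_number(text):
    """Return the leading run of digits as an int, or None if there is none."""
    m = re.match(r"\d+", text)
    return int(m.group()) if m else None


print(leading_number("123uua123sf"))  # 123
print(leading_number("uua123sf"))     # None
```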

Match an IP address:

import re
ip = '255.255.255.253'
result = re.match(r'^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.'
                  r'([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$', ip)
print(result)  # <_sre.SRE_Match object; span=(0, 15), match='255.255.255.253'>

search

Scans the string and returns the first match of the pattern, which need not be at the start of the string:

# search
import re
obj = re.search(r'\d+', 'a123uu234asf')    # match one or more digits, starting at the first digit found
print(obj)                                 # <_sre.SRE_Match object; span=(1, 4), match='123'>
if obj:                                    # runs only if something matched; skipped when the result is None
    print(obj.group())                     # print the matched text
    # 123

import re
obj = re.search(r'\([^()]+\)', 'sdds(a1fwewe2(3uusfdsf2)34as)f')    # match the contents of the innermost ()
print(obj)                                 # <_sre.SRE_Match object; span=(13, 24), match='(3uusfdsf2)'>
if obj:                                    # runs only if something matched; skipped when the result is None
    print(obj.group())                     # print the matched text
    # (3uusfdsf2)
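A match object returned by search carries more than the matched text. The sketch below reuses the innermost-parentheses example and adds a capturing group (an addition for illustration) to show group(1) and span().

```python
import re

m = re.search(r"\(([^()]+)\)", "sdds(a1fwewe2(3uusfdsf2)34as)f")
print(m.group())   # (3uusfdsf2)   the whole match, parentheses included
print(m.group(1))  # 3uusfdsf2     only the captured contents
print(m.span())    # (13, 24)      start and end offsets in the string

# search returns None when nothing matches, so test the result before calling group().
missing = re.search(r"\d+", "no digits here")
print(missing)     # None
```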

The difference between group and groups:

# the difference between group and groups
import re
a = "123abc456"
b = re.search("([0-9]*)([a-z]*)([0-9]*)", a)
print(b)            # <_sre.SRE_Match object; span=(0, 9), match='123abc456'>
print(b.group())    # 123abc456
print(b.group(0))   # 123abc456
print(b.group(1))   # 123
print(b.group(2))   # abc
print(b.group(3))   # 456
print(b.groups())   # ('123', 'abc', '456')
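Two further details of group and groups may be useful: group accepts several indexes at once, and groups reports None (or a chosen default) for a group that did not take part in the match. A brief sketch, with the second pattern made up for illustration:

```python
import re

b = re.search(r"([0-9]*)([a-z]*)([0-9]*)", "123abc456")
first_and_last = b.group(1, 3)   # several groups at once, returned as a tuple
print(first_and_last)            # ('123', '456')

# The optional second group does not participate when the string is only digits.
m = re.search(r"(\d+)([a-z]+)?", "123")
print(m.groups())                # ('123', None)
print(m.groups(default=""))      # ('123', '')
print(m.start(1), m.end(1))      # 0 3
```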

findall

Both of the methods above (match and search) return a single result, that is, only one match in the string. To get every match in a string, use findall. findall returns a list, so it has no group() method.

# findall
import re
obj = re.findall(r'\d+', 'a123uu234asf')    # match every run of digits
if obj:                                     # runs only if something matched; skipped when the list is empty
    print(obj)                              # the result is a list
    # ['123', '234']
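When the pattern contains groups, findall returns the groups instead of the whole match, and finditer is the alternative when match objects are needed. A short sketch on the same sample string:

```python
import re

text = "a123uu234asf"

# No groups: a list of whole matches.
print(re.findall(r"\d+", text))        # ['123', '234']

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r"([a-z]+)(\d+)", text)
print(pairs)                           # [('a', '123'), ('uu', '234')]

# finditer yields match objects, so positions are available as well.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions)                       # [('123', 1), ('234', 6)]
```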

sub

Replaces the matched substrings: re.sub(pattern, repl, string, count=0, flags=0)

# sub
import re
content = "123abc456"
new_content = re.sub(r'\d+', 'ABC', content)
print(new_content)
# ABCabcABC
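The count and repl parameters in the signature above allow more than a plain replacement. The following sketch shows a limited replacement, a function as repl, backreferences, and re.subn; the variable names are illustrative.

```python
import re

content = "123abc456"

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "ABC", content, count=1)
print(first_only)  # ABCabc456

# repl may be a function that receives the match object and returns the replacement.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), content)
print(doubled)     # 246abc912

# Backreferences such as \1 refer to captured groups.
swapped = re.sub(r"(\d+)([a-z]+)(\d+)", r"\3\2\1", content)
print(swapped)     # 456abc123

# subn also reports how many replacements were made.
new_text, n = re.subn(r"\d+", "ABC", content)
print(new_text, n)  # ABCabcABC 2
```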

split

Splits the string at each match of the pattern: re.split(pattern, string, maxsplit=0, flags=0)

# split
import re
content = "1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )"
new_content = re.split(r'\*', content)       # split on *, producing a list
print(new_content)
# ['1 - 2 ', ' ((60-30+1', '(9-2', '5/3+7/3', '99/4', '2998+10', '568/14))-(-4', '3)/(16-3', '2) )']

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'[\+\-\*\/]+', content)
# new_content = re.split(r'\*', content, 1)
print(new_content)
# ["'1 ", ' 2 ', ' ((60', '30', '1', '(9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14))',
#  '(', '4', '3)', '(16', '3', "2) )'"]

inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
inpp = re.sub(r'\s+', '', inpp)              # remove whitespace characters
print(inpp)
new_content = re.split(r'\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, 1)
print(new_content)
# ['1-2*((60-30+', '-40-5', '*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))']
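The last call above keeps '-40-5' in the result because the pattern contains a capturing group. A smaller sketch, on a made-up expression, isolates that behaviour together with maxsplit:

```python
import re

expr = "60-30+1*9"

# Plain split: the operators are discarded.
operands = re.split(r"[+\-*/]", expr)
print(operands)   # ['60', '30', '1', '9']

# With a capturing group the separators are kept in the result list.
tokens = re.split(r"([+\-*/])", expr)
print(tokens)     # ['60', '-', '30', '+', '1', '*', '9']

# maxsplit=1 stops after the first split.
head_tail = re.split(r"[+\-*/]", expr, maxsplit=1)
print(head_tail)  # ['60', '30+1*9']
```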

Supplemental R ' Escape:

fdfdsfds\fdssfdsfds& @$
lzl.py

The first thing to know is that when a program reads a \ character from a file and stores the line in a list, printing the list shows it as \\ (the list displays each string's repr; the string itself still holds a single backslash):

import re, sys
li = []
with open('lzl.txt', 'r', encoding="utf-8") as file:
    for line in file:
        li.append(line)
print(li)                   # note: the single backslash in the file is shown as a double backslash
# ['fdfdsfds\\fds\n', 'sfdsfds& @$']
print(li[0])                # printing the string itself still shows a single backslash
# fdfdsfds\fds

The meaning of the r prefix: the backslash is not processed as an escape by Python, so \ appears in the string only as a literal character:

import re, sys
li = []
with open('lzl.txt', 'r', encoding="utf-8") as file:
    for line in file:
        print(re.findall(r's\\f', line))    # first way to match
        # print(re.findall('\\\\', line))   # second way to match
        li.append(line)
print(li)                   # note: the single backslash in the file is shown as a double backslash
# ['s\\f']
# []
# ['fdfdsfds\\fds\n', 'sfdsfds& @$']

Addendum: the following code may leave you even more confused

import re
line = 'fdfdsfds\\fds\n'   # a line as read from lzl.txt
re.findall(r'\\', line)    # must be written this way; r'\' is not allowed
print(r'\\')               # must be written this way, not as r'\'; a raw string cannot end in an odd number of backslashes
# \\        result
# to print a single \, write it as follows
print('\\')                # without r, the backslashes must also come in pairs
# \         result

Summary: a single backslash \ in the file is displayed as a double backslash \\ when the list read by the program is printed, while print shows it as a single backslash \. To match a backslash from the file with a regular expression, use r'\\' (two backslashes with the r prefix), or drop the r prefix and use '\\\\' (four backslashes).
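The summary can be checked without a file. In this sketch the string line stands in for a line read from lzl.txt (an assumption for illustration); it shows that the string holds one backslash and that both pattern spellings are the same pattern.

```python
import re

line = "fdfdsfds\\fds"    # written with an escape, but the string holds ONE backslash

print(len("\\"))          # 1
print(repr(line))         # 'fdfdsfds\\fds'   (repr doubles the backslash)
print(line)               # fdfdsfds\fds      (print shows the real content)

raw_pattern = r"s\\f"     # raw string: two backslashes reach the regex engine
plain_pattern = "s\\\\f"  # normal string: four typed, two reach the regex engine
print(raw_pattern == plain_pattern)  # True

found = re.findall(raw_pattern, line)
print(found)              # ['s\\f']
print(len(found[0]))      # 3
```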

compile function:

Description

Python supports regular expressions through the re module. The usual way to use re is to call re.compile() to compile the string form of a regular expression into a Pattern instance, use the Pattern instance to process text and obtain a result (a Match instance), and then use the Match instance to read the match information or carry out other operations.

To give a simple example, find all the English letters in a string:

import re
pattern = re.compile('[a-zA-Z]')
result = pattern.findall('as3SiOPdj#@23awe')
print(result)
# ['a', 's', 'S', 'i', 'O', 'P', 'd', 'j', 'a', 'w', 'e']
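A compiled Pattern offers the same operations as the module-level functions, which makes it convenient to reuse one pattern in several places. A minimal sketch; the name LETTERS is illustrative.

```python
import re

LETTERS = re.compile(r"[a-zA-Z]+")   # compile once, reuse below

runs = LETTERS.findall("as3SiOPdj#@23awe")
print(runs)                               # ['as', 'SiOPdj', 'awe']
print(LETTERS.search("123abc").group())   # abc
print(LETTERS.match("123abc"))            # None: match starts at position 0
print(LETTERS.sub("_", "a1b2"))           # _1_2
print(LETTERS.pattern)                    # [a-zA-Z]+
```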

Match an IP address (255.255.255.255):

import re
pattern = re.compile(r'^(([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$')
result = pattern.match('255.255.255.255')
print(result)  # <_sre.SRE_Match object; span=(0, 15), match='255.255.255.255'>
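One possible way to package the pattern above as a reusable check is sketched here. The names OCTET, IPV4 and is_ipv4 are hypothetical, and the sketch swaps in fullmatch and the re.ASCII flag (so that \d means 0-9 only) in place of the ^...$ anchors used above.

```python
import re

# One octet: 0-255 without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Four octets separated by dots; {{3}} is a literal {3} inside the f-string.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}", flags=re.ASCII)


def is_ipv4(text):
    """Return True if text is a dotted-quad IPv4 address, else False."""
    return IPV4.fullmatch(text) is not None


print(is_ipv4("255.255.255.253"))  # True
print(is_ipv4("256.1.1.1"))        # False
print(is_ipv4("1.2.3"))            # False
```

fullmatch requires the entire string to match, so no explicit anchors are needed and a trailing newline is rejected as well.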
