How to extract strings using regular expressions in Python

Source: Internet
Author: User
Preface

The basics of regular expressions are not covered here. This article looks at two cases of string extraction in Python: extracting a string from a single position in the text, and extracting strings from multiple consecutive positions. Both cases come up in log analysis. The corresponding methods are explained in turn below.

1. string extraction at a single position

In this case, we can use the regular expression (.+?) for extraction. For example, to extract the value 123 between a and b in the string "a123b", we can use findall with this regular expression. It returns a list containing the matched values.

The code is as follows:

    import re

    text = "a123b"
    # The parentheses capture whatever lies between a and b
    print(re.findall(r"a(.+?)b", text))  # output ['123']
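When only the first occurrence is needed, re.search can be used instead of findall. It returns a match object, or None when nothing matches. The following is a minimal sketch in Python 3; the helper name extract_between and the sample strings are illustrative assumptions, not part of the original example.

```python
import re

def extract_between(text, left, right):
    """Return the first non-empty text found between left and right, or None."""
    # re.escape makes the delimiters literal, so characters such as "." or "["
    # in them are not treated as regex syntax.
    pattern = re.escape(left) + r"(.+?)" + re.escape(right)
    match = re.search(pattern, text)
    return match.group(1) if match else None

first = extract_between("a123b", "a", "b")
missing = extract_between("xyz", "a", "b")
bracketed = extract_between("id=[42]", "[", "]")
print(first, missing, bracketed)
```

Checking for None before calling group avoids an AttributeError on input that does not contain the delimiters.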



1.1 greedy and non-greedy matching

Suppose we have the string "a123b456b" and want to match everything between a and the last b, rather than the value between a and the first b. We can use ? to switch the regular expression between greedy and non-greedy matching.

The code is as follows:

    import re

    text = "a123b456b"
    # ? after a quantifier makes it non-greedy, so the match stops at the nearest b
    print(re.findall(r"a(.+?)b", text))  # output ['123']
    # Without ?, the quantifier is greedy and the match extends to the last b
    print(re.findall(r"a(.+)b", text))   # output ['123b456']
    print(re.findall(r"a(.*)b", text))   # output ['123b456']
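The difference matters most when a string contains several delimited values. As a rough illustration (the sample string below is an assumption made for this sketch), the non-greedy form returns each value separately, while the greedy form returns one long match:

```python
import re

text = "a1b a22b a333b"

# Non-greedy: each match stops at the nearest following b.
lazy = re.findall(r"a(.+?)b", text)

# Greedy: .+ runs to the end of the string and backtracks only to the last b.
greedy = re.findall(r"a(.+)b", text)

print(lazy)    # ['1', '22', '333']
print(greedy)  # ['1b a22b a333']
```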



1.2 multi-line matching

To match across multiple lines, use the re.S and re.M flags. With re.S, the . character also matches a newline. By default, . does not match a newline.

The code is as follows:

    import re

    text = "a23b\na34b"
    # Without re.S, . cannot match the newline between the two lines
    print(re.findall(r"a(\d+)b.+a(\d+)b", text))        # output []
    # With re.S, . also matches the newline
    print(re.findall(r"a(\d+)b.+a(\d+)b", text, re.S))  # output [('23', '34')]
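re.S is an alias of re.DOTALL, and combining it with a non-greedy group is a common way to pull out a block that spans several lines. A small sketch, assuming hypothetical BEGIN and END markers in the text:

```python
import re

text = "BEGIN\nline one\nline two\nEND\ntrailing text"

# Without re.S, . stops at the first newline, so nothing matches.
no_flag = re.findall(r"BEGIN\n(.+?)\nEND", text)

# With re.S (re.DOTALL), . also matches newlines and the block is captured.
with_flag = re.findall(r"BEGIN\n(.+?)\nEND", text, re.S)

print(no_flag)    # []
print(with_flag)  # ['line one\nline two']
```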



With re.M, ^ and $ match at the start and end of every line. By default, ^ matches only at the start of the string and $ only at the end of the string (or just before a trailing newline).

The code is as follows:

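    import re

    text = "a23b\na34b"
    # By default, ^ matches only at the start of the string
    print(re.findall(r"^a(\d+)b", text))        # output ['23']
    # With re.M, ^ matches at the start of every line
    print(re.findall(r"^a(\d+)b", text, re.M))  # output ['23', '34']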
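The same flag makes it easy to pick one value out of every line of a multi-line string. A minimal sketch, assuming a made-up key=value log format:

```python
import re

log = "user=alice status=200\nuser=bob status=404\nuser=carol status=200"

# With re.M, ^ anchors at the start of every line and $ at the end of every line.
users = re.findall(r"^user=(\w+)", log, re.M)
statuses = re.findall(r"status=(\d+)$", log, re.M)

# Without re.M, ^ matches only at the very start of the string.
first_only = re.findall(r"^user=(\w+)", log)

print(users)       # ['alice', 'bob', 'carol']
print(statuses)    # ['200', '404', '200']
print(first_only)  # ['alice']
```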



2. string extraction at multiple consecutive positions

In this case, we can use named groups of the form (?P<name>…) for extraction. For example, suppose we have a line from a web server access log:

'192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"'

We can write several (?P<name>expr) groups in one pattern to extract the fields. Replace name with the variable name for the string at that position, and expr with the regular expression that matches it.

The code is as follows:

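    import re

    line = '192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"'

    # One named group per field; unquoted fields stop at a space, quoted fields at the closing quote
    reg = re.compile(
        r'^(?P<remote_ip>[^ ]*) (?P<date>[^ ]*) "(?P<request>[^"]*)" '
        r'(?P<status>[^ ]*) (?P<size>[^ ]*) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
    )
    regMatch = reg.match(line)
    # groupdict() returns a dictionary that maps each group name to its matched string
    linebits = regMatch.groupdict()
    print(linebits)
    for k, v in linebits.items():
        print(k + ": " + v)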



The loop prints the following (the order of the lines may vary between Python versions):

    remote_ip: 192.168.0.1
    date: 25/Oct/2012:14:46:34
    request: GET /api HTTP/1.1
    status: 200
    size: 44
    referrer: http://abc.com/search
    user_agent: Mozilla/5.0
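In log analysis the pattern is usually applied to many lines, some of which may not match. The following sketch shows one way to handle that. The helper name parse_line, the second and third sample lines, and the integer conversion are assumptions added for illustration.

```python
import re

LOG_PATTERN = re.compile(
    r'^(?P<remote_ip>[^ ]*) (?P<date>[^ ]*) "(?P<request>[^"]*)" '
    r'(?P<status>[^ ]*) (?P<size>[^ ]*) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields when they consist only of digits.
    for key in ("status", "size"):
        if fields[key].isdigit():
            fields[key] = int(fields[key])
    return fields

lines = [
    '192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"',
    '192.168.0.2 25/Oct/2012:14:46:35 "GET /missing HTTP/1.1" 404 0 "" "Mozilla/5.0"',
    'not an access log line',
]

records = [parse_line(line) for line in lines]
parsed = [r for r in records if r is not None]
print(len(parsed))
print(parsed[0]["remote_ip"], parsed[0]["status"])
```

Compiling the pattern once and reusing it keeps the per-line work small, and returning None for lines that do not match lets the caller decide whether to skip or report them.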



Summary

That is all for this article. I hope it helps you in your study or work. If you have any questions, please leave a message.
