Preface
This article does not cover the basics of regular expressions. It deals with two kinds of string extraction: extracting a string from a single position in the text, and extracting strings at multiple consecutive positions. Both come up often in log analysis. The corresponding methods are explained separately below.
1. String extraction at a single position
In this case, we can use the regular expression (.+?) for extraction. For example, to extract the value 123 between a and b in the string "a123b", we can use findall with this regular expression; it returns a list containing the matching values.
The code is as follows:
import re
text = "a123b"
print(re.findall(r"a(.+?)b", text))  # output ['123']
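As a small sketch of how findall behaves when the pattern occurs more than once, the example below uses a made-up sample string (not from the article) and also shows re.search, which is one way to get only the first match. The variable names are illustrative assumptions.

```python
import re

# Hypothetical sample text with two a...b sections
text = "a123b, a456b"

# findall returns the captured group of every non-overlapping match
all_values = re.findall(r"a(.+?)b", text)

# search stops at the first match; group(1) holds the captured text
first = re.search(r"a(.+?)b", text)
first_value = first.group(1) if first else None

print(all_values)   # ['123', '456']
print(first_value)  # 123
```

Checking the result of search for None before calling group avoids an AttributeError when nothing matches.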
1.1 Greedy and non-greedy matching
Suppose we have the string "a123b456b" and want everything between a and the last b rather than between a and the first b. We can use ? to switch the regular expression between greedy and non-greedy matching.
The code is as follows:
import re
text = "a123b456b"
# ? after a quantifier makes it non-greedy, so the match stops at the nearest b
print(re.findall(r"a(.+?)b", text))  # output ['123']
# without ?, the quantifier is greedy and the match runs to the last b
print(re.findall(r"a(.+)b", text))  # output ['123b456']
print(re.findall(r"a(.*)b", text))  # output ['123b456']
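The difference can be placed side by side in a short sketch. The third pattern, a negated character class, is not from the article; it is a commonly used alternative that stops at the first b without relying on a non-greedy quantifier.

```python
import re

text = "a123b456b"

greedy = re.findall(r"a(.+)b", text)      # runs to the last b
lazy = re.findall(r"a(.+?)b", text)       # stops at the first b
negated = re.findall(r"a([^b]+)b", text)  # can never cross a b

print(greedy)   # ['123b456']
print(lazy)     # ['123']
print(negated)  # ['123']
```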
1.2 Multi-line matching
To match across multiple lines, you need the re.S and re.M flags. With re.S, the dot (.) also matches a line break; by default it does not.
The code is as follows:
import re
text = "a23b\na34b"
# without re.S the dot cannot cross the line break, so nothing matches
re.findall(r"a(\d+)b.+a(\d+)b", text)  # output []
# with re.S the dot also matches the line break
re.findall(r"a(\d+)b.+a(\d+)b", text, re.S)  # output [('23', '34')]
With re.M, ^ and $ match at the start and end of every line. By default, ^ matches only at the start of the whole string and $ only at its end.
The code is as follows:
import re
text = "a23b\na34b"
re.findall(r"^a(\d+)b", text)  # output ['23']
re.findall(r"^a(\d+)b", text, re.M)  # output ['23', '34']
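The two flags can also be combined with the | operator. The sketch below reuses the same two-line string; the patterns are illustrative variations, not taken from the article. re.S and re.M are the short names for re.DOTALL and re.MULTILINE.

```python
import re

text = "a23b\na34b"

# $ with re.M matches at the end of every line, not only at the end of the string
line_ends = re.findall(r"a(\d+)b$", text, re.M)

# Without re.M, $ matches only at the end of the whole string
string_end = re.findall(r"a(\d+)b$", text)

# Both flags together: the dot crosses the line break (re.S)
# and ^ / $ anchor on each line (re.M)
combined = re.findall(r"^a(\d+)b.^a(\d+)b$", text, re.S | re.M)

print(line_ends)   # ['23', '34']
print(string_end)  # ['34']
print(combined)    # [('23', '34')]
```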
2. String extraction at multiple consecutive positions
In this case, we can use the regular expression (?P<name>…) for extraction. For example, suppose we have a line from a web server access log:
'192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"'
You can write several (?P<name>expr) groups in a row. Replace name with the variable name you want for the string at that position, and expr with the regular expression that matches that position.
The code is as follows:
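import re
line = '192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"'
# one named group per field, in the order the fields appear in the line
reg = re.compile(r'^(?P<remote_ip>[^ ]*) (?P<date>[^ ]*) "(?P<request>[^"]*)" (?P<status>[^ ]*) (?P<size>[^ ]*) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"')
regMatch = reg.match(line)
# groupdict() maps each group name to the text it matched
linebits = regMatch.groupdict()
print(linebits)
for k, v in linebits.items():
    print(k + ": " + v)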
The output of the for loop is:
remote_ip: 192.168.0.1
date: 25/Oct/2012:14:46:34
request: GET /api HTTP/1.1
status: 200
size: 44
referrer: http://abc.com/search
user_agent: Mozilla/5.0
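In log analysis the same pattern is usually applied to many lines, some of which may not fit the expected format. The following is a minimal sketch under that assumption: the function name parse_log_lines and the second sample line are hypothetical, and the group names are the field names shown in the output above.

```python
import re

# Same field layout as the access-log example; group names follow its output
LOG_PATTERN = re.compile(
    r'^(?P<remote_ip>[^ ]*) (?P<date>[^ ]*) "(?P<request>[^"]*)" '
    r'(?P<status>[^ ]*) (?P<size>[^ ]*) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_lines(lines):
    """Return one dict per matching line; skip lines that do not match."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:  # match() returns None when the line does not fit
            continue
        records.append(m.groupdict())
    return records


sample = [
    '192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"',
    'this line is not in the expected format',  # hypothetical malformed line
]
records = parse_log_lines(sample)
print(len(records))          # 1
print(records[0]["status"])  # 200
```

Testing for None before calling groupdict keeps one malformed line from stopping the whole run with an AttributeError.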
Summary
That is all for this article. I hope it helps you in your study or work. If you have any questions, please leave a message.