Preface
I won't go over the basics of regular expressions here. Extraction generally falls into two cases: extracting the string at a single position in the text, and extracting strings at several consecutive positions. Both come up in log analysis, and I will explain the method for each below.
1. Extracting a string at a single position
In this case we can use the pattern (.+?) to extract the value. For example, given the string "a123b", to extract the value 123 between a and b we can call re.findall with this pattern. It returns a list of all matches; when the pattern contains one group, each list element is the text captured by that group.
The code is as follows:
import re
s = "a123b"
print(re.findall(r"a(.+?)b", s))  # output: ['123']
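findall returns every match. When only the first occurrence is needed, re.search followed by group(1) is a common alternative. The sketch below uses the same sample string and also shows that re.search returns None when nothing matches, so the result should be checked before calling group().

```python
import re

s = "a123b"

# findall returns a list of the captured group for every match
all_values = re.findall(r"a(.+?)b", s)
print(all_values)  # ['123']

# search stops at the first match and returns a Match object (or None)
m = re.search(r"a(.+?)b", s)
first_value = m.group(1) if m else None
print(first_value)  # 123

# No match: search returns None instead of raising an error
print(re.search(r"a(.+?)b", "xyz"))  # None
```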
1.1 Greedy and non-greedy matching
Suppose we have the string "a123b456b" and want everything between a and the last b, rather than the value between a and the first b. We can control this with ?: by default the quantifiers + and * are greedy and match as much as possible, while appending ? (as in +? or *?) makes them non-greedy, so they match as little as possible.
The code is as follows:
import re
s = "a123b456b"
print(re.findall(r"a(.+?)b", s))
# output: ['123']  The ? after + makes it non-greedy, so the match stops at the nearest b
print(re.findall(r"a(.+)b", s))
# output: ['123b456']
print(re.findall(r"a(.*)b", s))
# output: ['123b456']
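The difference is easier to see when a string contains several a...b pairs. In this sketch (the sample string is made up for illustration), the non-greedy pattern returns one value per pair, while the greedy pattern swallows everything from the first a to the last b in a single match.

```python
import re

s = "a1b a22b"  # hypothetical sample with two a...b pairs

lazy = re.findall(r"a(.+?)b", s)   # stops at the nearest b each time
greedy = re.findall(r"a(.+)b", s)  # runs to the last b in the string

print(lazy)    # ['1', '22']
print(greedy)  # ['1b a22']
```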
1.2 Multiline matching
To match across multiple lines, add the re.S and re.M flags. With re.S, . also matches the newline character; by default, . does not match line breaks.
The code is as follows:
import re
s = "a23b\na34b"
print(re.findall(r"a(\d+)b.+a(\d+)b", s))
# output: []
# because . cannot match the \n between the two parts of s
print(re.findall(r"a(\d+)b.+a(\d+)b", s, re.S))
# output: [('23', '34')]
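re.S is a short alias for re.DOTALL. The same behaviour can also be switched on inside the pattern itself with the inline flag (?s), which is handy when the pattern is stored as a plain string elsewhere. A short sketch using the same input:

```python
import re

s = "a23b\na34b"

# re.DOTALL is the long name of re.S
with_flag = re.findall(r"a(\d+)b.+a(\d+)b", s, re.DOTALL)

# (?s) at the start of the pattern enables DOTALL inline
inline = re.findall(r"(?s)a(\d+)b.+a(\d+)b", s)

print(with_flag)  # [('23', '34')]
print(inline)     # [('23', '34')]
```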
With re.M, ^ and $ match at the start and end of each line; by default, ^ matches only at the start of the whole string and $ only at its end.
The code is as follows:
import re
s = "a23b\na34b"
print(re.findall(r"^a(\d+)b", s))
# output: ['23']
print(re.findall(r"^a(\d+)b", s, re.M))
# output: ['23', '34']
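re.M changes $ in the same way. In the sketch below, without the flag $ matches only at the end of the string, so only the last line qualifies; with re.M, every line does. Flags can be combined with |, for example re.S | re.M.

```python
import re

s = "a23b\na34b"

# Without re.M, $ only matches at the very end of the string
end_only = re.findall(r"a(\d+)b$", s)

# With re.M, $ also matches just before each newline
per_line = re.findall(r"a(\d+)b$", s, re.M)

# Anchor both ends of each line; re.S | re.M shows how flags combine
both = re.findall(r"^a(\d+)b$", s, re.S | re.M)

print(end_only)  # ['34']
print(per_line)  # ['23', '34']
print(both)      # ['23', '34']
```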
2. Extracting strings at multiple consecutive positions
In this case we can use named groups of the form (?P<name>…) to extract the values. For example, suppose we have the following line from a web server access log:
'192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"'
To extract every field in this line, we write one (?P<name>expr) group per field, where name is the name you choose for the string at that position and expr is a pattern that matches it.
The code is as follows:
import re
line = '192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"'
reg = re.compile(r'^(?P<remote_ip>[^ ]*) (?P<date>[^ ]*) "(?P<request>[^"]*)" (?P<status>[^ ]*) (?P<size>[^ ]*) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"')
regmatch = reg.match(line)
linebits = regmatch.groupdict()  # dict mapping each group name to the text it matched
print(linebits)
for k, v in linebits.items():
    print(k + ":" + v)
The output is:
{'remote_ip': '192.168.0.1', 'date': '25/Oct/2012:14:46:34', 'request': 'GET /api HTTP/1.1', 'status': '200', 'size': '44', 'referrer': 'http://abc.com/search', 'user_agent': 'Mozilla/5.0'}
remote_ip:192.168.0.1
date:25/Oct/2012:14:46:34
request:GET /api HTTP/1.1
status:200
size:44
referrer:http://abc.com/search
user_agent:Mozilla/5.0
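In real log analysis the pattern is applied to many lines, and some may not match; match() then returns None, and calling groupdict() on it would raise AttributeError. The sketch below is one possible approach, assuming the log lines are already in a list: the first line is the one from this article, the other two are invented for illustration, and LOG_PATTERN and parse_lines are hypothetical names. It skips non-matching lines and writes the same pattern with re.VERBOSE so each field sits on its own line.

```python
import re

# The same fields as the single-line pattern above, written with re.VERBOSE.
# In verbose mode whitespace outside a character class is ignored,
# so each literal space between fields is written as [ ].
LOG_PATTERN = re.compile(r'''
    ^(?P<remote_ip>[^ ]*)
    [ ](?P<date>[^ ]*)
    [ ]"(?P<request>[^"]*)"
    [ ](?P<status>[^ ]*)
    [ ](?P<size>[^ ]*)
    [ ]"(?P<referrer>[^"]*)"
    [ ]"(?P<user_agent>[^"]*)"
''', re.VERBOSE)


def parse_lines(lines):
    """Yield a dict of fields for each line that matches; skip the rest."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # malformed or unrelated line
        yield m.groupdict()


# Hypothetical sample: first line from the article, the others invented
logs = [
    '192.168.0.1 25/Oct/2012:14:46:34 "GET /api HTTP/1.1" 200 44 "http://abc.com/search" "Mozilla/5.0"',
    'this line is not an access log entry',
    '10.0.0.2 25/Oct/2012:14:47:01 "POST /login HTTP/1.1" 302 0 "-" "curl/7.29.0"',
]

records = list(parse_lines(logs))
for r in records:
    print(r["remote_ip"], r["status"], r["request"])
```

Using a generator keeps memory use flat when the input is a large log file opened line by line instead of a list.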
Summary
That is all for this article. I hope it helps with your study or work; if you have any questions, feel free to leave a comment.