Python Log Processing (II): Using Regular Expressions to Handle Nginx Logs

Using regular expressions to handle Nginx logs

One

Match a single log line against a regular expression with named groups, and return the matched fields as a dictionary:

    from datetime import datetime
    import re

    # A single log line
    logline = '''183.60.212.153 - - [19/Feb/2013:10:23:29 +0800] "GET /o2o/media.html?menu=3 HTTP/1.1" 200 16691 "-" "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"'''

    # Match one line against the regex and return the named groups as a dictionary
    def extract(line):
        pattern = r'''(?P<remote_addr>[\d\.]{7,}) - - (?:\[(?P<datetime>[^\[\]]+)\]) "(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) "(?:[^"]+)" "(?P<user_agent>[^"]+)"'''
        regex = re.compile(pattern)
        matcher = regex.match(line)
        return matcher.groupdict()

    print(extract(logline))

Output Result:

    {'request': 'GET /o2o/media.html?menu=3 HTTP/1.1', 'size': '16691', 'remote_addr': '183.60.212.153', 'status': '200', 'datetime': '19/Feb/2013:10:23:29 +0800', 'user_agent': 'Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)'}
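To see what the `(?P<name>...)` groups and `groupdict()` contribute on their own, here is a minimal sketch on a shortened, hypothetical line that contains only an address and a status code. It also shows that `match()` returns `None` when the line does not fit the pattern, so the direct `matcher.groupdict()` call in `extract()` above would raise `AttributeError` on a malformed line.

```python
import re

# Hypothetical, shortened pattern: only an IPv4-style address and a status code.
pattern = re.compile(r'(?P<remote_addr>[\d.]{7,}) (?P<status>\d+)')

m = pattern.match('183.60.212.153 200')
print(m.groupdict())  # {'remote_addr': '183.60.212.153', 'status': '200'}
print(m.group('status'))  # 200

# match() anchors at the start of the string and returns None on failure,
# so the result should be checked before calling groupdict().
bad = pattern.match('not a log line')
print(bad)  # None
```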

  

Two

Several fields in this result can be broken down further, for example these four:

'request': 'GET /o2o/media.html?menu=3 HTTP/1.1'
'size': '16691'
'status': '200'
'datetime': '19/Feb/2013:10:23:29 +0800'

request can be split into the request method, the request address (URL), and the protocol version (protocol).
size can be converted to an integer instead of being kept as a string.
status can likewise be converted to an integer.
datetime can be parsed into a datetime object and then rendered in other formats, e.g. 2013-02-19 10:23:29+08:00.
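The four conversions listed above can be tried on their own before they are wired into a parser. This is a minimal sketch that assumes the raw string values from the previous output. It uses tuple unpacking for the request instead of the `dict(zip(...))` form used later in this article, so a request that does not have exactly three parts raises `ValueError` instead of silently producing a shorter dictionary.

```python
from datetime import datetime

# Raw string values as captured by the regex in the previous step.
fields = {
    'request': 'GET /o2o/media.html?menu=3 HTTP/1.1',
    'size': '16691',
    'status': '200',
    'datetime': '19/Feb/2013:10:23:29 +0800',
}

# request: split on whitespace and name the three parts.
method, url, protocol = fields['request'].split()
request = {'method': method, 'url': url, 'protocol': protocol}

# size and status: plain integer conversion.
size = int(fields['size'])
status = int(fields['status'])

# datetime: parse into a timezone-aware datetime object.
ts = datetime.strptime(fields['datetime'], '%d/%b/%Y:%H:%M:%S %z')

print(request)  # {'method': 'GET', 'url': '/o2o/media.html?menu=3', 'protocol': 'HTTP/1.1'}
print(size, status)  # 16691 200
print(ts)  # 2013-02-19 10:23:29+08:00
```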

Time format directives used to parse the string (strptime):

%a  Abbreviated weekday name: Sun, Mon, ..., Sat
%A  Full weekday name: Sunday, Monday, ..., Saturday
%w  Weekday as a decimal number, where 0 is Sunday, 1 is Monday, ..., 6 is Saturday
%d  Day of the month as a zero-padded decimal: 01, 02, ..., 31
%b  Abbreviated month name: Jan, Feb, ..., Dec
%Y  Year with century as a 4-digit decimal: 0001, 0002, ..., 2013, 2014, ..., 9998, 9999
%H  Hour (24-hour clock) as a zero-padded decimal: 00, 01, ..., 23
%I  Hour (12-hour clock) as a zero-padded decimal: 01, 02, ..., 12
%M  Minute as a zero-padded decimal: 00, 01, ..., 59
%S  Second as a zero-padded decimal: 00, 01, ..., 59
%z  UTC offset in the form +HHMM or -HHMM: (empty), +0000, -0400, +1030
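As a worked example of these directives, the Nginx timestamp 19/Feb/2013:10:23:29 +0800 is described by `%d/%b/%Y:%H:%M:%S %z`; note the literal space before `%z`. The sketch below parses it and then uses `strftime()` with the same directives to produce other formats. `%m` (zero-padded month number) is a standard directive that is not listed in the table. Month and weekday names depend on the locale, so this assumes the default C/English locale.

```python
from datetime import datetime, timezone

raw = '19/Feb/2013:10:23:29 +0800'
fmt = '%d/%b/%Y:%H:%M:%S %z'  # literal '/' and ':' separators, a space before %z

ts = datetime.strptime(raw, fmt)
print(ts)              # 2013-02-19 10:23:29+08:00
print(ts.utcoffset())  # 8:00:00

# The same directives work in reverse with strftime().
print(ts.strftime('%Y-%m-%d %H:%M:%S %z'))  # 2013-02-19 10:23:29 +0800
print(ts.strftime('%a, %d %b %Y'))          # Tue, 19 Feb 2013

# Converting to UTC shifts the clock time back by the +08:00 offset.
print(ts.astimezone(timezone.utc))  # 2013-02-19 02:23:29+00:00
```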

    from datetime import datetime
    import re

    # A single log line
    logline = '''183.60.212.153 - - [19/Feb/2013:10:23:29 +0800] "GET /o2o/media.html?menu=3 HTTP/1.1" 200 16691 "-" "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"'''

    # Match one line against the regex and return the named groups as a dictionary
    def extract(line):
        pattern = r'''(?P<remote_addr>[\d\.]{7,}) - - (?:\[(?P<datetime>[^\[\]]+)\]) "(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) "(?:[^"]+)" "(?P<user_agent>[^"]+)"'''
        regex = re.compile(pattern)
        matcher = regex.match(line)
        return matcher.groupdict()

    # Split the request into method, request address (URL), and protocol version
    def convert_request(request):
        return dict(zip(('method', 'url', 'protocol'), request.split()))

    # Parse a timestamp such as '19/Feb/2013:10:23:29 +0800' into an aware datetime
    def convert_time(timestr):
        formatstr = '%d/%b/%Y:%H:%M:%S %z'
        ts = datetime.strptime(timestr, formatstr)
        return ts

    # Map each log-format key to its handler function; the keys must match the group names,
    # e.g. 'request': 'GET /o2o/media.html?menu=3 HTTP/1.1' is passed to convert_request
    log_format_func = {
        'request': convert_request,
        'size': int,
        'status': int,
        'datetime': convert_time,
    }

    # Write the converted values into a new dictionary; keys without a handler are kept as-is
    d = {}
    for k, v in extract(logline).items():
        # print(k, v)
        d[k] = log_format_func.get(k, lambda x: x)(v)
    print(d)

Output Result:

    {'request': {'method': 'GET', 'protocol': 'HTTP/1.1', 'url': '/o2o/media.html?menu=3'}, 'remote_addr': '183.60.212.153', 'datetime': datetime.datetime(2013, 2, 19, 10, 23, 29, tzinfo=datetime.timezone(datetime.timedelta(0, 28800))), 'size': 16691, 'status': 200, 'user_agent': 'Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)'}

  

Three

The request and datetime handler functions are shortened to lambda expressions, and the conversion step moves into extract() itself:

    from datetime import datetime
    import re

    logline = '''183.60.212.153 - - [19/Feb/2013:10:23:29 +0800] "GET /o2o/media.html?menu=3 HTTP/1.1" 200 16691 "-" "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"'''

    def extract(line):
        pattern = r'''(?P<remote_addr>[\d\.]{7,}) - - (?:\[(?P<datetime>[^\[\]]+)\]) "(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) "[^"]+" "(?P<user_agent>[^"]+)"'''
        regex = re.compile(pattern)
        matcher = regex.match(line)
        if matcher:
            # Convert each field with its handler from ops; other fields pass through unchanged
            return {k: ops.get(k, lambda x: x)(v) for k, v in matcher.groupdict().items()}
        else:
            raise Exception('No match')

    # Handlers for the fields that need conversion
    ops = {
        'datetime': lambda timestr: datetime.strptime(timestr, '%d/%b/%Y:%H:%M:%S %z'),
        'request': lambda request: dict(zip(('method', 'url', 'protocol'), request.split())),
        'status': int,
        'size': int,
    }

    if __name__ == '__main__':
        log_pro = extract(logline)
        print(log_pro)
        for k, v in log_pro.items():
            print('{}: {}'.format(k, v))

Output Result:

    {'remote_addr': '183.60.212.153', 'request': {'url': '/o2o/media.html?menu=3', 'method': 'GET', 'protocol': 'HTTP/1.1'}, 'status': 200, 'size': 16691, 'datetime': datetime.datetime(2013, 2, 19, 10, 23, 29, tzinfo=datetime.timezone(datetime.timedelta(0, 28800))), 'user_agent': 'Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)'}
    remote_addr: 183.60.212.153
    request: {'url': '/o2o/media.html?menu=3', 'method': 'GET', 'protocol': 'HTTP/1.1'}
    status: 200
    size: 16691
    datetime: 2013-02-19 10:23:29+08:00
    user_agent: Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
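In practice the same conversion is applied to every line of an access log, and raising on the first malformed line (as extract() does) may be too strict. The following is a minimal sketch that assumes the input is any iterable of strings, such as an open file object. It reuses the pattern and handler table from above, compiles the pattern only once, and counts non-matching lines instead of raising. The names `PATTERN`, `OPS`, and `parse_lines` are hypothetical, and the second sample line is made up for illustration. As in the original pattern, the literal ` - - ` assumes the remote-user field is always `-`.

```python
import re
from datetime import datetime

# Same pattern as extract() above, compiled once instead of on every call.
PATTERN = re.compile(
    r'(?P<remote_addr>[\d\.]{7,}) - - (?:\[(?P<datetime>[^\[\]]+)\]) '
    r'"(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) '
    r'"[^"]+" "(?P<user_agent>[^"]+)"'
)

# Same handlers as the ops table above.
OPS = {
    'datetime': lambda s: datetime.strptime(s, '%d/%b/%Y:%H:%M:%S %z'),
    'request': lambda s: dict(zip(('method', 'url', 'protocol'), s.split())),
    'status': int,
    'size': int,
}

def parse_lines(lines):
    """Return (records, skipped): converted dicts for matching lines and a count of the rest."""
    records, skipped = [], 0
    for line in lines:
        m = PATTERN.match(line.strip())
        if m is None:
            skipped += 1  # count the malformed line and move on
            continue
        records.append({k: OPS.get(k, lambda x: x)(v) for k, v in m.groupdict().items()})
    return records, skipped

sample = [
    ('183.60.212.153 - - [19/Feb/2013:10:23:29 +0800] "GET /o2o/media.html?menu=3 HTTP/1.1" '
     '200 16691 "-" "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"'),
    'this line is not in the expected format',
]
records, skipped = parse_lines(sample)
print(len(records), skipped)  # 1 1
print(records[0]['status'], records[0]['request']['method'])  # 200 GET
```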

  
