Sharing a small script for real-time log analysis in Python

Source: Internet
Author: User

Preface

As we all know, web operations teams constantly watch real-time indicators for their domains: 2xx/s, 4xx/s, 5xx/s, response time, bandwidth, and so on. Previously the logs were split every five minutes, so a simple awk pass over each file was enough. Now that we want to push logs to ELK, keeping the five-minute split would cause problems, so the logs are split once a day instead. With a one-day split, continuing to use shell is clearly not appropriate, so I wrote the script in Python.

The method is as follows:

The script mainly relies on the seek and tell methods of the file object. The principle is as follows:

1. Add a crontab entry that runs the script every 5 minutes.

2. On each run, analyze only the part of the log between the end position recorded by the previous run and the current end of the file.

You can then use zabbix_sender to send the result to the Zabbix server, or let the Zabbix agent read the data directly from the status file. The code is as follows:

#!/usr/bin/env python
# coding: utf-8
from __future__ import division
import os

LOG_FILE = '/data0/logs/nginx/xxxx-access_log'
POSITION_FILE = '/tmp/position.log'
STATUS_FILE = '/tmp/http_status'
# crontab execution interval, in seconds
CRON_TIME = 300


def get_position():
    # First run: POSITION_FILE does not exist yet, so record the current
    # end of the log and exit without analyzing anything
    if not os.path.exists(POSITION_FILE):
        start_position = str(0)
        end_position = str(os.path.getsize(LOG_FILE))
        fh = open(POSITION_FILE, 'w')
        fh.write('start_position: %s\n' % start_position)
        fh.write('end_position: %s\n' % end_position)
        fh.close()
        os._exit(1)
    else:
        fh = open(POSITION_FILE)
        se = fh.readlines()
        fh.close()
        # POSITION_FILE does not contain exactly two lines because of
        # some unexpected condition
        if len(se) != 2:
            os.remove(POSITION_FILE)
            os._exit(1)
        last_start_position, last_end_position = [item.split(':')[1].strip() for item in se]
        start_position = last_end_position
        end_position = str(os.path.getsize(LOG_FILE))
        # start_position > end_position
        # print start_position, end_position
        if start_position > end_position:
            start_position = 0
        # when the log stops rolling
        elif start_position == end_position:
            os._exit(1)
        # print start_position, end_position
        fh = open(POSITION_FILE, 'w')
        fh.write('start_position: %s\n' % start_position)
        fh.write('end_position: %s\n' % end_position)
        fh.close()
        return map(int, [start_position, end_position])


def write_status(content):
    fh = open(STATUS_FILE, 'w')
    fh.write(content)
    fh.close()


def handle_log(start_position, end_position):
    log = open(LOG_FILE)
    log.seek(start_position, 0)
    status_2xx, status_403, status_404, status_500, status_502, status_503, status_504, status_all, rt, bandwidth = 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    while True:
        current_position = log.tell()
        if current_position >= end_position:
            break
        line = log.readline()
        line = line.split(' ')
        host, request_time, time_local, status, bytes_sent = line[1], line[3], line[5], line[10], line[11]
        # print host, request_time, time_local, status, bytes_sent
        status_all += 1
        try:
            rt += float(request_time.strip('s'))
            bandwidth += int(bytes_sent)
        except ValueError:
            # skip fields that are not numeric
            pass
        # count any 2xx status
        if status.startswith('2'):
            status_2xx += 1
        elif status == '403':
            status_403 += 1
        elif status == '404':
            status_404 += 1
        elif status == '500':
            status_500 += 1
        elif status == '502':
            status_502 += 1
        elif status == '503':
            status_503 += 1
        elif status == '504':
            status_504 += 1
    log.close()
    # Counters are divided by CRON_TIME to get per-second rates;
    # rt is the average response time per request
    write_status("status_2xx: %s\nstatus_403: %s\nstatus_404: %s\nstatus_500: %s\nstatus_502: %s\nstatus_503: %s\nstatus_504: %s\nstatus_all: %s\nrt: %s\nbandwidth: %s\n" % (
        status_2xx / CRON_TIME, status_403 / CRON_TIME, status_404 / CRON_TIME,
        status_500 / CRON_TIME, status_502 / CRON_TIME, status_503 / CRON_TIME,
        status_504 / CRON_TIME, status_all / CRON_TIME, rt / status_all,
        bandwidth / CRON_TIME))


if __name__ == '__main__':
    start_position, end_position = get_position()
    handle_log(start_position, end_position)
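To isolate the seek/tell idea from the rest of the script, here is a minimal sketch written for Python 3. It is not the author's code: the helper name read_new_lines and the temporary demo log are assumptions made for illustration, and the file is opened in binary mode so that tell() and seek() work in plain byte offsets that match os.path.getsize.

```python
import os
import tempfile


def read_new_lines(log_path, start, end):
    """Return the lines stored between byte offsets start and end."""
    lines = []
    # Binary mode keeps tell() and seek() in plain byte offsets.
    with open(log_path, "rb") as log:
        log.seek(start, 0)
        while log.tell() < end:
            line = log.readline()
            if not line:  # reached end of file earlier than expected
                break
            lines.append(line.decode("utf-8", "replace").rstrip("\n"))
    return lines


if __name__ == "__main__":
    # Hypothetical demo log; the real script reads an nginx access log.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("first run line\n")
        demo_path = tmp.name

    first_end = os.path.getsize(demo_path)   # offset a first run would save
    with open(demo_path, "a") as log:
        log.write("second run line A\n")
        log.write("second run line B\n")
    second_end = os.path.getsize(demo_path)  # offset a second run would see

    # Only the two lines appended after first_end are returned.
    print(read_new_lines(demo_path, first_end, second_end))
    os.remove(demo_path)
```

Saving the end offset after each run, as the script does in its position file, is what lets the next run skip everything it has already counted.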

View the analysis result:

cat /tmp/http_status
status_2xx: 17.3333333333
status_403: 0.0
status_404: 1.0
status_500: 0.0
status_502: 0.0
status_503: 0.0
status_504: 0.0
status_all: 20.0
rt: 0.0782833333333
bandwidth: 204032.0
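If another program needs these values, for example a wrapper that forwards them to a monitoring system, the "key: value" lines are easy to load. The following is only a sketch under that assumption; parse_status is a hypothetical helper and not part of the original script.

```python
def parse_status(text):
    """Turn 'key: value' lines into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        metrics[key.strip()] = float(value)
    return metrics


if __name__ == "__main__":
    # Sample taken from the output shown above.
    sample = "status_2xx: 17.3333333333\nstatus_404: 1.0\nrt: 0.0782833333333\n"
    for name, value in sorted(parse_status(sample).items()):
        print(name, value)
```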

Later, I found a problem: start_position and end_position are compared as strings, and string comparison is lexicographic rather than numeric, so the result can be wrong, as shown below:

In [5]: '99772400' > '100227572'
Out[5]: True

In [6]: int('99772400') > int('100227572')
Out[6]: False

Therefore, the comparison is corrected to convert both values to integers first:

# start_position > end_position
# print start_position, end_position
if int(start_position) > int(end_position):
    start_position = 0
# when the log stops rolling
elif int(start_position) == int(end_position):
    os._exit(1)
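The corrected decision can also be written as a small pure function, which makes it easy to test separately from the file handling. This is a sketch for illustration only; the name next_range and the choice to return None instead of exiting are assumptions, not the author's code.

```python
def next_range(last_end, current_size):
    """Return (start, end) byte offsets to analyze, or None if nothing is new.

    Both arguments may be strings, as read from the position file.
    """
    start, end = int(last_end), int(current_size)
    if start > end:
        # The file is shorter than the saved offset, so start from the top.
        start = 0
    elif start == end:
        # No new bytes since the previous run.
        return None
    return start, end


if __name__ == "__main__":
    print("99772400" > "100227572")              # string comparison
    print(next_range("99772400", "100227572"))   # numeric comparison
```

Converting to int once at the top means every later comparison is numeric, so the lexicographic pitfall cannot reappear in a branch that was overlooked.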

Summary

That is all for this article. I hope it helps you in your study or work. If you have any questions, please leave a message. Thank you for your support.
