Processing an Nginx log with Python for statistical analysis: my script is not very fast, so if there is a better way, please correct me

Source: Internet
Author: User
Tags: python, script


At work I recently needed to process an Nginx log and do a simple analysis of it:


Introduction:

The development team already has a log analysis platform and tools, but to investigate one particular problem I needed to analyze the raw log directly.


Requirements:

Among the lines of the raw log whose second-to-last field is neither empty nor '-', count the distinct values of the fourth-from-last field, ignoring values that are empty or '-'.
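To make the requirement concrete, here is a minimal sketch that applies it to a few made-up lines. The function name wanted_uid and the sample lines are hypothetical and are not taken from the real log. It assumes that fields are separated by single spaces and that lines with fewer than four fields can be skipped.

```python
def wanted_uid(line):
    """Return the fourth-from-last field if the line qualifies, else None."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) < 4:
        return None  # assumption: lines that are too short are skipped
    flag, uid = fields[-2], fields[-4]
    if flag in ("", "-") or uid in ("", "-"):
        return None
    return uid


# Hypothetical sample lines, only to illustrate the rule.
sample = [
    "GET /a 200 u1 x ok z\n",  # qualifies, uid u1
    "GET /b 200 u1 x ok z\n",  # qualifies, uid u1 again (duplicate)
    "GET /c 200 u2 x - z\n",   # second-to-last field is '-', skipped
    "GET /d 200 - x ok z\n",   # fourth-from-last field is '-', skipped
    "GET /e 200 u3 x ok z\n",  # qualifies, uid u3
]

unique = {uid for uid in map(wanted_uid, sample) if uid is not None}
print(len(unique))  # 2 for these sample lines (u1 and u3)
```

Negative indexes such as fields[-2] and fields[-4] count from the end of the list, which matches "second-to-last" and "fourth-from-last" without computing the length first.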


The Python script is as follows:


#!/usr/bin/env python
#encoding=utf-8
# nginx_log_analysis.py

# Read the whole log into memory.
filehd = open('aaa.com_access.log-20160506', 'r')
fileText = filehd.readlines()
fileTextTemp = []
fileTextTempSplit = []
aaa_uid = []
filehd.close()

# Split every line into fields on single spaces.
for i in range(len(fileText)):
    fileTextTemp.append(fileText[i])
    fileTextTempSplit.append(fileTextTemp[i].split(' '))

# Collect the fourth-from-last field of every qualifying line.
# Note: the inner loop repeats the same check once per field, so each
# value is appended many times; set() below removes the duplicates.
for i in range(len(fileTextTempSplit)):
    for j in range(len(fileTextTempSplit[i])):
        length = len(fileTextTempSplit[i])
        if (fileTextTempSplit[i][length-2] != '-'
                and len(fileTextTempSplit[i][length-2]) != 0
                and fileTextTempSplit[i][length-4] != '-'
                and len(fileTextTempSplit[i][length-4]) != 0):
            aaa_uid.append(fileTextTempSplit[i][length-4])

# Statistics for aaa_uid without deduplication (disabled):
# stats_fd = open('stats.txt', 'w')
# for uid in aaa_uid:
#     stats_fd.writelines(uid + '\n')
# stats_fd.close()

# Statistics for aaa_uid after deduplication.
count = 0
stats_fd = open('stats_uniq.txt', 'w')
aaa_uid_uniq = list(set(aaa_uid))
for uid in aaa_uid_uniq:
    stats_fd.writelines(uid + '\n')
    count += 1
stats_fd.close()
print(count)


I ran the script under time against a log file of just under 280 MB:

time nginx_log_analysis.py

It takes more than 14 seconds (on a virtual machine with 2 cores and 4 GB of memory).
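One possible way to reduce the running time, offered as a sketch rather than a measured result: read the file line by line instead of loading it with readlines(), check each line only once, and add the values straight into a set. The function name count_unique_uids is hypothetical. The sketch assumes single-space separators, as in the script above, and skips lines with fewer than four fields. It has not been timed against the original 280 MB log.

```python
def count_unique_uids(log_path, out_path):
    """Write the distinct qualifying uids to out_path and return their count."""
    seen = set()
    with open(log_path) as log:
        for line in log:  # stream the file instead of holding every line
            fields = line.rstrip("\n").split(" ")
            if len(fields) < 4:
                continue  # assumption: skip lines that are too short
            if fields[-2] in ("", "-") or fields[-4] in ("", "-"):
                continue
            seen.add(fields[-4])  # a set keeps one copy of each value
    with open(out_path, "w") as out:
        for uid in seen:
            out.write(uid + "\n")
    return len(seen)
```

Calling count_unique_uids('aaa.com_access.log-20160506', 'stats_uniq.txt') and printing the result should reproduce the deduplicated output of the script above. The two likely sources of savings are that each line is examined once rather than once per field, and that no intermediate lists of all lines and all split fields are kept in memory.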

This article is from the "Linux and networking" blog; please keep this source link when reposting: http://khaozi.blog.51cto.com/952782/1771183
