At work I simply needed to process an nginx log and do a quick analysis:
Introduction:
The development team already has a log analysis platform and tooling, but to investigate one particular problem I needed to analyze the raw log myself.
Requirements:
For every line of the raw log where the second-to-last field is non-empty and not '-', and the fourth-to-last field is also non-empty and not '-', count the distinct values of that fourth-to-last field.
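The log format itself is not shown in the post, so the line below is purely hypothetical; it only illustrates which fields the check looks at once a line is split on single spaces:

# Hypothetical log line; only the field positions matter here
line = '1.2.3.4 - - [06/May/2016] "GET /" 200 612 uid-12345 0.01 token-abc 0.02'
fields = line.split(' ')
# fields[-2] == 'token-abc', fields[-4] == 'uid-12345'
if fields[-2] not in ('-', '') and fields[-4] not in ('-', ''):
    print(fields[-4])   # this is the value to count (after deduplication)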
The Python script is as follows:
#!/usr/bin/env python
# encoding=utf-8
# nginx_log_analysis.py

filehd = open('aaa.com_access.log-20160506', 'r')
filetext = filehd.readlines()          # read the whole log into memory
filetexttemp = []
filetexttempsplit = []
aaa_uid = []
filehd.close()

# copy every line and split it on spaces
for i in range(len(filetext)):
    filetexttemp.append(filetext[i])
    filetexttempsplit.append(filetexttemp[i].split(' '))

# keep the fourth-to-last field when both it and the second-to-last
# field are non-empty and not '-'
# (note: the inner loop repeats the check and append once per field,
# so only the deduplicated count below is meaningful)
for i in range(len(filetexttempsplit)):
    for j in range(len(filetexttempsplit[i])):
        length = len(filetexttempsplit[i])
        if filetexttempsplit[i][length-2] != '-' and len(filetexttempsplit[i][length-2]) != 0 \
                and filetexttempsplit[i][length-4] != '-' and len(filetexttempsplit[i][length-4]) != 0:
            aaa_uid.append(filetexttempsplit[i][length-4])

'''This writes the aaa_uid values without deduplication
stats_fd = open('stats.txt', 'w')
for uid in aaa_uid:
    stats_fd.writelines(uid + '\n')
stats_fd.close()
'''

# This writes the deduplicated aaa_uid values and counts them
count = 0
stats_fd = open('stats_uniq.txt', 'w')
aaa_uid_uniq = list(set(aaa_uid))
for uid in aaa_uid_uniq:
    stats_fd.writelines(uid + '\n')
    count += 1
stats_fd.close()
print count
The log being processed is just under 280 MB; timing the script with time:
time ./nginx_log_analysis.py
It takes more than 14 seconds (running on a virtual machine with 2 cores and 4 GB of memory).
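For comparison, here is a minimal single-pass sketch; I have not benchmarked it against the same log, so any speedup is an assumption. It streams the file line by line instead of building several intermediate lists, and lets a set handle the deduplication. The file names are taken from the script above.

#!/usr/bin/env python
# nginx_log_analysis_set.py - single-pass variant (a sketch, not benchmarked)
aaa_uid_uniq = set()
with open('aaa.com_access.log-20160506', 'r') as filehd:
    for line in filehd:                      # stream the file, no intermediate lists
        fields = line.split(' ')
        if len(fields) < 4:
            continue                         # skip malformed lines
        if fields[-2] not in ('-', '') and fields[-4] not in ('-', ''):
            aaa_uid_uniq.add(fields[-4])     # the set deduplicates as we go

with open('stats_uniq.txt', 'w') as stats_fd:
    for uid in aaa_uid_uniq:
        stats_fd.write(uid + '\n')

print(len(aaa_uid_uniq))

The main difference is that nothing except the set of unique values is kept in memory, and each line is split only once.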
This article is from the "Linux and Networking" blog; please keep this source when reposting: http://khaozi.blog.51cto.com/952782/1771183
Processing an nginx log with Python for statistical analysis: my script's processing time is not great, so if you have a better approach, please correct me.