An Example of Analyzing Nginx Logs with Python and pandas

This article shares an example of using Python and pandas to analyze an Nginx log. It has good reference value, and I hope it is helpful to everyone.

Requirement

By analyzing the Nginx access log, obtain the maximum, minimum, and average response time, as well as the number of accesses, for each interface.

Implementation principle

Store the $uri and $upstream_response_time fields from the Nginx log in a pandas DataFrame, then compute the statistics with grouping and aggregation functions.
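
As a minimal sketch of this principle (using made-up toy data; the real field extraction is shown below), grouping by interface and aggregating yields all four statistics at once:

import pandas as pd

# Toy data standing in for the extracted log fields (hypothetical values)
df = pd.DataFrame({"interface": ["/api/a", "/api/a", "/api/b"],
                   "reponse_time": [0.12, 0.30, 0.05]})

grouped = df.groupby("interface")["reponse_time"]
# Maximum, minimum, and average response time plus access count per interface
print(grouped.agg(["max", "min", "mean", "count"]))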

Implementation

1. Preparatory work


# Create a directory for storing the logs
mkdir /home/test/python/log/log
# Create a file for storing the $uri and $upstream_response_time fields extracted from the Nginx log
touch /home/test/python/log/log.txt
# Install the related modules
conda create -n science numpy scipy matplotlib pandas
# Install the module used to generate the Excel spreadsheet
pip install xlwt
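
For orientation, here is a quick check of which whitespace-split fields the script in the next section relies on. The sample line is hypothetical: an access-log entry in the common combined format with $upstream_response_time appended at the end (your log_format may differ, in which case the indices change):

# Hypothetical log line: combined format plus $upstream_response_time at the end
line = ('1.2.3.4 - - [10/Oct/2017:13:55:36 +0800] "GET /api/a HTTP/1.1" '
        '200 612 "-" "curl/7.29.0" 0.123')
spline = line.split()
print(spline[6])    # /api/a -- the request URI
print(spline[-1])   # 0.123  -- the upstream response time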


2. Code implementation


#!/usr/local/miniconda2/envs/science/bin/python
# -*- coding: utf-8 -*-
# Compute response-time statistics for each interface
# Create log.txt and set logdir in advance
import os
import pandas as pd

mulu = os.path.dirname(__file__)
# Directory where the log files are stored
logdir = "/home/test/python/log/log"
# File that stores the log fields needed for the statistics
logfile_format = os.path.join(mulu, "log.txt")

print "read from logfile \n"
for eachfile in os.listdir(logdir):
    logfile = os.path.join(logdir, eachfile)
    with open(logfile, 'r') as fo:
        for line in fo:
            spline = line.split()
            # Filter out lines with malformed fields
            if spline[6] == "-":
                pass
            elif spline[6] == "GET":
                pass
            elif spline[-1] == "-":
                pass
            else:
                with open(logfile_format, 'a') as fw:
                    fw.write(spline[6])
                    fw.write('\t')
                    fw.write(spline[-1])
                    fw.write('\n')

print "output panda"
# Read the extracted fields into a DataFrame
reader = pd.read_table(logfile_format, sep='\t', engine='python',
                       names=["interface", "reponse_time"],
                       header=None, iterator=True)
loop = True
chunksize = 10000000
chunks = []
while loop:
    try:
        chunk = reader.get_chunk(chunksize)
        chunks.append(chunk)
    except StopIteration:
        loop = False
        print "Iteration is stopped."
df = pd.concat(chunks)
# df = df.set_index("interface")
# df = df.drop(["GET", "-"])
df_groupd = df.groupby('interface')
df_groupd_max = df_groupd.max()
df_groupd_min = df_groupd.min()
df_groupd_mean = df_groupd.mean()
df_groupd_size = df_groupd.size()
df_ana = pd.concat([df_groupd_max, df_groupd_min, df_groupd_mean, df_groupd_size],
                   axis=1, keys=["max", "min", "average", "count"])
print "Output Excel"
df_ana.to_excel("test.xls")
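
For comparison, the grouping step can be condensed into a single agg() call. This is only a sketch, not the original script: it assumes log.txt has already been produced by the extraction loop above and that an Excel writer module such as xlwt is installed:

import pandas as pd

# Read the tab-separated fields extracted above
df = pd.read_csv("log.txt", sep='\t', names=["interface", "reponse_time"])
# One pass over the groups instead of four separate calls plus concat
df_ana = df.groupby("interface")["reponse_time"].agg(["max", "min", "mean", "count"])
df_ana.to_excel("test.xls")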


3. The generated spreadsheet lists, for each interface, the maximum, minimum, and average response time and the access count.

Key points

1. If the log file is large, do not read it with readlines() or readline(); both load the entire log into memory and can exhaust it. Instead, iterate over the file object with "for line in fo", which uses almost no memory.

2. To read the Nginx log you could use pd.read_table(log_file, sep=' ', iterator=True), but the separator set here cannot reliably match the log's segmentation, so we first split each line of the Nginx log with split() and then load the fields into pandas.

3. pandas provides I/O tools for reading a large file in chunks: read it with a suitable chunk size, then call pandas.concat to join the chunks into a single DataFrame, as shown in the sketch below.
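
A minimal sketch of point 3 (the file name and chunk size are illustrative):

import pandas as pd

# Each iteration yields a DataFrame of at most chunksize rows
chunks = pd.read_csv("log.txt", sep='\t', names=["interface", "reponse_time"],
                     chunksize=1000000)
df = pd.concat(chunks)  # join the chunks into a single DataFrame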
