This post shares an example of analyzing Nginx access logs with Python and pandas. It should be a useful reference for anyone doing similar log analysis; let's walk through it together.
Requirement
By analyzing the Nginx access log, we obtain the maximum, minimum, and average response time, as well as the number of accesses, for each interface.
Implementation principle
The $uri and $upstream_response_time fields from the Nginx log are stored in a pandas DataFrame, and the statistics are then computed with pandas' grouping and aggregation functions.
Implementation
1. Preparatory work
# Create a directory for storing the logs
mkdir /home/test/python/log/log

# Create the file that will hold the $uri and $upstream_response_time fields extracted from the Nginx log
touch /home/test/python/log/log.txt

# Install the required modules
conda create -n science numpy scipy matplotlib pandas

# Install the module used to generate the Excel report
pip install xlwt
2. Code implementation
#!/usr/local/miniconda2/envs/science/bin/python
# -*- coding: utf-8 -*-
# Collect response-time statistics for each interface
# Please create log.txt in advance and set logdir
import os
import sys
import pandas as pd

mulu = os.path.dirname(__file__)
# path where the log files are stored
logdir = "/home/test/python/log/log"
# file that stores the log fields needed for the statistics
logfile_format = os.path.join(mulu, "log.txt")

print "read from logfile\n"
for eachfile in os.listdir(logdir):
    logfile = os.path.join(logdir, eachfile)
    with open(logfile, 'r') as fo:
        for line in fo:
            spline = line.split()
            # filter out abnormal values in the fields
            if spline[6] == "-":
                pass
            elif spline[6] == "GET":
                pass
            elif spline[-1] == "-":
                pass
            else:
                with open(logfile_format, 'a') as fw:
                    fw.write(spline[6])
                    fw.write('\t')
                    fw.write(spline[-1])
                    fw.write('\n')

print "output pandas"
# read the extracted fields into a DataFrame
reader = pd.read_table(logfile_format, sep='\t', engine='python',
                       names=["interface", "response_time"],
                       header=None, iterator=True)
loop = True
chunksize = 10000000
chunks = []
while loop:
    try:
        chunk = reader.get_chunk(chunksize)
        chunks.append(chunk)
    except StopIteration:
        loop = False
        print "Iteration is stopped."
df = pd.concat(chunks)
#df = df.set_index("interface")
#df = df.drop(["GET", "-"])
df_groupd = df.groupby('interface')
df_groupd_max = df_groupd.max()
df_groupd_min = df_groupd.min()
df_groupd_mean = df_groupd.mean()
df_groupd_size = df_groupd.size()
#print df_groupd_max
#print df_groupd_min
#print df_groupd_mean
df_ana = pd.concat([df_groupd_max, df_groupd_min, df_groupd_mean, df_groupd_size],
                   axis=1, keys=["max", "min", "average", "count"])
print "output excel"
df_ana.to_excel("test.xls")
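For reference, the four statistics above can also be computed in a single pass with groupby().agg(), which any reasonably recent pandas version supports. This is a minimal alternative sketch, not the original script; the file and column names are assumptions:

# A minimal alternative sketch: one-pass aggregation with groupby().agg().
# Assumes log.txt already holds the tab-separated interface/response-time pairs.
import pandas as pd

df = pd.read_table("log.txt", sep='\t', names=["interface", "response_time"], header=None)
df_ana = df.groupby("interface")["response_time"].agg(["max", "min", "mean", "count"])
df_ana.to_excel("test.xls")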
3. The generated report (test.xls) lists the max, min, average, and count of response times for each interface.
Key points
1. If the log file is large, do not read it with readlines() or readline(): both load the entire log into memory and can exhaust it. Instead, iterate over the file object with "for line in fo", which uses almost no memory, as the sketch below shows.
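A minimal sketch of this memory-friendly pattern; "access.log" is a placeholder path:

# Iterate the file object line by line instead of loading everything
# with readlines(); the file yields one line at a time, so memory use stays flat.
count = 0
with open("access.log") as fo:
    for line in fo:
        count += 1
print "total lines:", count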
2. The Nginx log could be read directly with pd.read_table(log_file, sep=' ', iterator=True), but no single separator cleanly matches the Nginx log format. So each line is first split with split(), and only the needed fields are written out for pandas; see the sketch after this list for what the split produces.
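To illustrate why split() is used first: the field positions depend entirely on your nginx log_format directive. A sketch with a hypothetical combined-style line that ends in $upstream_response_time:

# A hypothetical access-log line; field positions depend on your nginx log_format.
line = '10.0.0.1 - - [01/Jan/2024:00:00:00 +0800] "GET /api/user HTTP/1.1" 200 512 "-" "curl/7.29.0" 0.023'
spline = line.split()
print spline[6]   # "/api/user" -> the request URI (index 6 under this assumed format)
print spline[-1]  # "0.023"     -> $upstream_response_time (last field under this assumed format)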
3. pandas provides IO tools for reading large files in blocks: read the file chunk by chunk with an appropriate chunk size, then join the chunks into a single DataFrame with pandas.concat, as sketched below.
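The same chunked read can also be written more compactly by passing chunksize directly, which makes read_table return an iterable of DataFrames. A minimal sketch under the same file-layout assumptions as above:

import pandas as pd

# Passing chunksize makes read_table return a reader that yields DataFrames
# of at most chunksize rows each; concat joins them back into one DataFrame.
reader = pd.read_table("log.txt", sep='\t', names=["interface", "response_time"],
                       header=None, chunksize=1000000)
df = pd.concat(reader, ignore_index=True)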