The logs are formatted as follows:
113.221.56.131-[05/Feb/2015:18:31:19 +0800] "ab.baidu.com GET /media/game/a_.jpg HTTP/1.1" 169334 . ybgj01.com/"" mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Qqwubi 133) "" 113.120.80.216,
113.21.213.35-[05/Feb/2015:18:33:22 +0800] "ab.baidu.net GET /media/game/a_.jpg HTTP/1.1" 169334 http://a155622.ybgj7.net/"" mozilla/5.0 (Linux; U Android 4.1.2; ZH-CN; gt-p3100 build/jzo54k) applewebkit/533.1 (khtml, like Gecko) version/4.0 mqqbrowser/5.3 Mobile safari/533.1 V1_AND_SQ_ 5.0.0_146_yyb2_d qq/5.0.0.2215 ""
At first I wanted to do this with a shell one-liner:
awk '{arr[$8]+=$11} END {for (i in arr) print i "\t" arr[i]}' access.log
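For readers less familiar with awk: the one-liner above splits each line on whitespace, uses field 8 (the request path) as the key, and adds up field 11 (the bytes sent). A rough Python equivalent might look like the sketch below. The function name `sum_by_field` and its parameters are made up for illustration, and unlike awk it skips lines that are too short or whose size field is not a number.

```python
from collections import defaultdict


def sum_by_field(lines, key_idx=7, val_idx=10):
    """Sum the integer in column val_idx, grouped by column key_idx.

    Indexes are 0-based, so the defaults correspond to awk's $8 and $11.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()  # whitespace split, like awk's default
        if len(fields) <= max(key_idx, val_idx):
            continue  # line too short to contain both columns
        try:
            totals[fields[key_idx]] += int(fields[val_idx])
        except ValueError:
            continue  # size column is not a plain integer
    return dict(totals)
```

This only works when every line has exactly the same number of whitespace-separated fields before the size column, which is the weak point of the field-counting approach.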
But the fields are not easy to match that way, and I also wanted to practice my Python, so I wrote a small Python script instead. Please go easy on me.
#!/usr/bin/env python
#coding=utf-8
# Scan the Nginx log for URL paths and total the size of the responses for each one
import os,re,sys,datetime
reload(sys)
sys.setdefaultencoding('utf-8')

lastday = datetime.date.today() - datetime.timedelta(days=1)
yesterday = lastday.strftime('%Y-%m-%d')
nginx_log_path = "/usr/local/nginx/logs/access.log" + yesterday
pattern_path = re.compile(r'GET\s*(.*)\s*HTTP')
pattern_size = re.compile(r'HTTP/1.1"\s\?*\d{3}\s\?*(\d*)')

def path_size(log_path):
    dic = {}
    f = file(log_path)
    for line in f:
        m_size = pattern_size.search(line)
        m_path = pattern_path.search(line)
        if m_path and m_size:
            size = int(m_size.group(1))
            path = m_path.group(1)
            if path in dic:  # if this path was matched before, start from its previous total
                size_init = int(dic[path])
            else:
                size_init = 0
            size = size + size_init
            dic[path] = size
    f.close()
    return dic

def run():
    pa_si = path_size(nginx_log_path)
    # sort the URLs by total size, largest first
    sor_l = sorted(pa_si.iteritems(), key=lambda x: x[1], reverse=True)
    filename = '/tmp/nginx_log_check.log' + yesterday
    f = open(filename, 'a+')
    for k,v in sor_l:
        a = '%s\t\t\t%s' % (k,v)
        print >>f,a
    f.close()

if __name__ == '__main__':
    run()
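The script above is Python 2 (`reload(sys)`, `file()`, `iteritems()`, `print >>f`). For anyone on Python 3, here is a minimal sketch of the same idea, not a drop-in replacement. It assumes each log line contains `GET <path> HTTP/x.y"` followed by a three-digit status code and then the byte count; the names `REQUEST_RE`, `path_sizes`, and `write_report` are my own, and the single combined pattern is a substitute for the two patterns in the original. It uses `\S+` for the path so that the capture stops at the first space rather than greedily running on to the last `HTTP` in the line.

```python
import re
from collections import Counter

# Assumed layout: ... "host GET /path HTTP/1.1" <status> <bytes> ...
REQUEST_RE = re.compile(r'GET\s+(\S+)\s+HTTP/\d\.\d"\s+\d{3}\s+(\d+)')


def path_sizes(lines):
    """Return a Counter mapping each request path to its total bytes sent."""
    totals = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return totals


def write_report(totals, out_path):
    """Append 'path<TAB><TAB><TAB>size' lines to out_path, largest total first."""
    with open(out_path, "a", encoding="utf-8") as out:
        for path, size in totals.most_common():
            out.write("%s\t\t\t%s\n" % (path, size))
```

To use it, open the log with `with open(log_path, encoding="utf-8") as f:` and pass `f` to `path_sizes`, then hand the result to `write_report`. `Counter.most_common()` already returns the entries sorted from largest to smallest, so no separate `sorted` call is needed.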
This article is from the "Road to Learning Prospects" blog. Please contact the author before reproducing it!
A simple Python script that tallies the URLs and their sizes in an Nginx log