Web log file statistical analysis and netstat command-line statistical analysis
Comparing awk and Python processing
1. Web log content --- file form
    [email protected]:/var/log/nginx# cat access.log
    192.168.1.3 - - [04/Feb/2018:19:59:42 +0800] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"
    192.168.1.3 - - [04/Feb/2018:19:59:43 +0800] "GET /favicon.ico HTTP/1.1" 404 208 "http://192.168.1.111/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"
    192.168.1.3 - - [04/Feb/2018:19:59:54 +0800] "GET /png HTTP/1.1" 404 208 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"
    192.168.1.3 - - [04/Feb/2018:19:59:58 +0800] "GET /a.jpg HTTP/1.1" 404 208 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"
    192.168.1.3 - - [04/Feb/2018:20:06:56 +0800] "GET /a.jpg HTTP/1.1" 404 208 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"
1.1 Processing with awk
Here $7 is the file name and $10 is the size. a[$7]+=$10 accumulates the size of all requests for the same file name. An awk array can be thought of as a Python dictionary.
++b[$7] counts the number of requests for the same file name.
    [email protected]:/var/log/nginx# awk '{a[$7]+=$10;++b[$7];total+=$10}END{for(x in a)print b[x],x,a[x]}' access.log
    1 /png 208
    2 /a.jpg 416
    1 / 396
    1 /favicon.ico 208
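To make the "awk array is a Python dictionary" idea concrete, here is a minimal sketch that mirrors the one-liner with two plain dicts. The names `SAMPLE_LOG` and `summarize` are made up for illustration, and the user-agent strings are shortened; only the field positions matter.

```python
# Sample lines shaped like the access.log entries above (user agent shortened).
SAMPLE_LOG = [
    '192.168.1.3 - - [04/Feb/2018:19:59:42 +0800] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0"',
    '192.168.1.3 - - [04/Feb/2018:19:59:43 +0800] "GET /favicon.ico HTTP/1.1" 404 208 "http://192.168.1.111/" "Mozilla/5.0"',
    '192.168.1.3 - - [04/Feb/2018:19:59:54 +0800] "GET /png HTTP/1.1" 404 208 "-" "Mozilla/5.0"',
    '192.168.1.3 - - [04/Feb/2018:19:59:58 +0800] "GET /a.jpg HTTP/1.1" 404 208 "-" "Mozilla/5.0"',
    '192.168.1.3 - - [04/Feb/2018:20:06:56 +0800] "GET /a.jpg HTTP/1.1" 404 208 "-" "Mozilla/5.0"',
]


def summarize(lines):
    """Return (hits, size): request count and total bytes per path."""
    size = {}   # plays the role of awk's a[$7]
    hits = {}   # plays the role of awk's b[$7]
    for line in lines:
        fields = line.split()      # awk also splits on whitespace by default
        path = fields[6]           # $7 in awk (awk fields are 1-based)
        nbytes = int(fields[9])    # $10 in awk
        size[path] = size.get(path, 0) + nbytes
        hits[path] = hits.get(path, 0) + 1
    return hits, size


hits, size = summarize(SAMPLE_LOG)
for path in hits:
    print(hits[path], path, size[path])
```

The totals match the awk output above; only the order of the lines may differ, since awk does not guarantee any particular order for `for (x in a)`.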
1.2 Processing with Python
Counter is used here: a key that has not been seen yet is treated as 0, so values can be added directly without initialising them first. The accumulation logic is basically the same as in awk.
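As a small illustration of that default-zero behaviour, the sketch below contrasts `Counter` with a plain dict. The key names are arbitrary examples.

```python
from collections import Counter

c = Counter()
c["/a.jpg"] += 1        # missing key is treated as 0, so this works directly
c["/a.jpg"] += 1

d = {}
try:
    d["/a.jpg"] += 1    # a plain dict has no default value
    plain_dict_ok = True
except KeyError:
    plain_dict_ok = False

print(c["/a.jpg"], c["/missing"], plain_dict_ok)
```

Reading a missing key from a `Counter` returns 0 without storing the key, which is why no initialisation step is needed.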
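    #!/usr/bin/env python3
    # author infaaf
    from collections import Counter

    c = Counter()   # number of requests per file name
    s = Counter()   # total size per file name
    with open('access.log') as f:
        for line in f:
            fields = line.split()
            key = fields[6]      # file name, $7 in awk
            value = fields[9]    # size, $10 in awk
            c[key] += 1
            s[key] += int(value)
    print("Count: %s" % c)
    print("Size: %s" % s)
    for i in c:
        print(i.center(30), c[i], s[i])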
Results
2. TCP status statistics --- command and pipe form
tcp6 and tcp connections are counted together.
2.1 awk Processing
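NF is the number of fields. Because fields are numbered up to the last one, $NF refers to the last column.
(In awk variable names, N stands for number, F for field, R for record (row) and S for separator. For example, RS is the record separator and NR is the number of records read so far.)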
    [email protected]:~# netstat -an | awk '/^tcp/{++s[$NF]}END{for(i in s){print i,s[i]}}'
    LISTEN 6
    ESTABLISHED 1
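For readers more used to Python, NF and $NF map onto a split line roughly as follows. This is only a sketch; the sample line is an illustrative value shaped like `netstat -an` output, not taken from the server above.

```python
# A line shaped like `netstat -an` output for a TCP socket (illustrative values).
line = "tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN"

fields = line.split()
nf = len(fields)      # awk: NF, the number of fields
first = fields[0]     # awk: $1
last = fields[-1]     # awk: $NF, the last field

print(nf, first, last)
```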
2.2 Python Processing
Counter is not used here, so the code has to check whether the key has already appeared; if it has not, the key must be initialised to 1 manually.
The script receives the output of a Linux command through a pipe (standard input), using the fileinput library.
File net.py
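    #!/usr/bin/env python3
    import fileinput

    d = {}
    # fileinput.input() reads standard input when no file arguments are given
    for line in fileinput.input():
        fields = line.split()
        if fields and fields[0].startswith('tcp'):
            key = fields[5]      # state column
            if key in d:
                d[key] += 1
            else:
                d[key] = 1       # first occurrence: initialise manually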
Calling net.py through a pipe on the server
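One plausible way to call it, assuming the script sits in the current directory, is `netstat -an | python3 net.py`. The counting logic can also be tried without a server. The sketch below wraps the same if/else pattern in a hypothetical function `count_tcp_states` and feeds it made-up sample lines. It takes the state from the last column (like `$NF`) rather than from index 5, which is an assumption that the two coincide for TCP lines.

```python
def count_tcp_states(lines):
    """Count the state column of every line whose first field starts with 'tcp'."""
    states = {}
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].startswith("tcp"):
            continue                 # skip headers, udp lines and blank lines
        key = fields[-1]             # state is the last column, like $NF in awk
        if key in states:
            states[key] += 1
        else:
            states[key] = 1          # first occurrence: initialise manually
    return states


# Illustrative input; real data would come from `netstat -an`.
SAMPLE = [
    "Active Internet connections (servers and established)",
    "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
    "tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN",
    "tcp        0      0 192.168.1.111:22        192.168.1.3:50312       ESTABLISHED",
    "tcp6       0      0 :::80                   :::*                    LISTEN",
    "udp        0      0 0.0.0.0:68              0.0.0.0:*",
]

result = count_tcp_states(SAMPLE)
for state, count in result.items():
    print(state, count)
```

Because the function accepts any iterable of lines, passing `sys.stdin` or `fileinput.input()` instead of `SAMPLE` would give the pipe behaviour described above.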
Summary: the two approaches follow the same idea, both relying on a dictionary (an awk array or a Python dict).
Number of occurrences: each time the same element is found, add 1.
Size: each time the same element is found, add its size.
awk wins on brevity. In complex cases, however, the statements become complicated, and awk syntax is intricate and easy to forget after a long time without use ...
Python wins on simplicity and clarity. The code is slightly longer, but it is easier to maintain.