Nginx Log Example
172.16.1.1 - - [04/Feb/2015:23:40:01 +0800] "POST /api/message/query HTTP/1.1" 200 52 "-" "Apache-HttpClient/4.2 (java 1.5)" "-" "message.test.com" "172.16.3.159" "-" "0.116" "-" "0.116" "-" remote_addr_ac_logon
Extract the time ($4), URL ($7), and size ($10) fields. Use sub(/\[/,"",$4) to remove the opening bracket from the time field, and sub(/Feb/,"2",$4) to replace Feb with 2.
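The sub(/Feb/,"2",$4) call only handles February, which is enough for this sample. As a sketch, assuming the logs may span several months, the month abbreviation can instead be looked up in a table built in the BEGIN block. The function name normalize_time is hypothetical; the output keeps the day/month/year:time layout that the R strptime format "%d/%m/%Y:%H:%M:%S" used later expects.

```shell
# Hypothetical helper: turn the Nginx time field ($4), e.g. "[04/Feb/2015:23:40:01",
# into "04/2/2015:23:40:01" for any month, not just February.
normalize_time() {
  awk '
    BEGIN {
      # Build a month-name -> month-number lookup table
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) mon[names[i]] = i
    }
    {
      t = $4
      sub(/^\[/, "", t)          # drop the leading "["
      split(t, parts, "/")       # parts: day, month name, year:HH:MM:SS
      print parts[1] "/" mon[parts[2]] "/" parts[3]
    }
  '
}

# Example: the first fields of the sample log line above
echo '172.16.1.1 - - [04/Feb/2015:23:40:01 +0800] "POST /api/message/query HTTP/1.1" 200 52' | normalize_time
# prints: 04/2/2015:23:40:01
```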
The awk sub function replaces the first match of a regular expression in a string, modifying the target in place. It is usually called as a standalone statement, but it also returns the number of replacements made: after a=sub(/Feb/,"2",$4), a is 1 (or 0 if there was no match).
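A small check of the return value of sub, using a hypothetical sub_demo function: one call matches and one does not.

```shell
# Hypothetical demo: apply two sub() calls and print their return values
sub_demo() {
  echo '[04/Feb/2015:23:40:01' | awk '{
    a = sub(/Feb/, "2", $1)   # one match: a = 1, $1 becomes [04/2/2015:23:40:01
    b = sub(/Mar/, "3", $1)   # no match: b = 0, $1 is left unchanged
    print a, b, $1
  }'
}

sub_demo
# prints: 1 0 [04/2/2015:23:40:01
```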
cat message-access.log | awk 'BEGIN {print "time,url,size"} {sub(/\[/,"",$4);sub(/Feb/,"2",$4);print $4","$7","$10}' > message-time.log
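To see what the extraction produces, this sketch wraps the awk program from the command above in a hypothetical extract function that reads standard input, and feeds it the sample log line from the top of this post.

```shell
# Same field extraction as the command above, reading the log from stdin:
# $4 = time, $7 = URL, $10 = size
extract() {
  awk 'BEGIN {print "time,url,size"} {sub(/\[/,"",$4); sub(/Feb/,"2",$4); print $4","$7","$10}'
}

# Feed it the sample line (quoted heredoc, so nothing is expanded by the shell)
extract <<'EOF'
172.16.1.1 - - [04/Feb/2015:23:40:01 +0800] "POST /api/message/query HTTP/1.1" 200 52 "-" "Apache-HttpClient/4.2 (java 1.5)" "-" "message.test.com" "172.16.3.159" "-" "0.116" "-" "0.116" "-" remote_addr_ac_logon
EOF
# prints:
# time,url,size
# 04/2/2015:23:40:01,/api/message/query,52
```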
Then import this file into R and plot it with ggplot2. In R, grouped statistics can be computed with the ddply function from the plyr package (see: ddply usage).
However, grouping in R is inefficient: once the file grows beyond about 1 GB, memory pressure becomes severe, so it is more practical to do the grouping with uniq or awk.
Grouping statistics with sort and uniq
Count the number of requests per second (NR>1 skips the time,url,size header line), then use sed to strip the leading spaces that uniq -c puts before each count.
cat message-time.log | awk -F',' 'NR>1 {print $1}' | sort -rn | uniq -c | sed 's/^[][ ]*//g' > message-time-count.log
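A sketch of the per-second count on a few hypothetical rows in the message-time.log layout. The function name count_per_second is made up; NR > 1 skips the header, plain sort is used so the output comes out in time order, and sed 's/^ *//' removes the padding that uniq -c puts before each count.

```shell
# Requests per second: pick the time column, group identical timestamps, count them
count_per_second() {
  awk -F',' 'NR > 1 {print $1}' | sort | uniq -c | sed 's/^ *//'
}

# Hypothetical input rows (header + three requests in two distinct seconds)
printf '%s\n' \
  'time,url,size' \
  '04/2/2015:23:40:01,/api/message/query,52' \
  '04/2/2015:23:40:01,/api/message/query,60' \
  '04/2/2015:23:40:02,/api/message/query,48' | count_per_second
# prints:
# 2 04/2/2015:23:40:01
# 1 04/2/2015:23:40:02
```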
However, uniq can only count occurrences; it cannot sum a value such as the size.
Grouping statistics with awk
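Sum the size per minute, the equivalent of GROUP BY time with SUM(size). The grouping key is the first 16 characters of the time field, substr($1,1,16) (awk string positions start at 1), which drops the seconds, for example 04/2/2015:23:40:. The END block then prints each key with 0 appended as the seconds value, a comma, and the total, so the result can be read as CSV in R. NR>1 skips the header line.
cat message-time.log | awk -F',' 'NR>1 {a[substr($1,1,16)]+=$3} END {for(i in a) print i"0,"a[i]}' > message-time-size.log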
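A variant of the per-minute sum, shown on hypothetical input rows. Instead of cutting a fixed-length prefix, it drops the seconds with a regex, so it also works if the month has two digits (for example after a more general month conversion), where a 16-character prefix would no longer end at the colon before the seconds. The function name sum_per_minute is hypothetical, and the output is sorted because the iteration order of for (t in a) in awk is unspecified.

```shell
# Per-minute traffic: key = time with the seconds replaced by 0, value = sum of size
sum_per_minute() {
  awk -F',' '
    NR > 1 {                       # skip the time,url,size header
      t = $1
      sub(/:[0-9]+$/, ":0", t)     # 04/2/2015:23:40:01 -> 04/2/2015:23:40:0
      a[t] += $3                   # GROUP BY minute, SUM(size)
    }
    END { for (t in a) print t "," a[t] }
  ' | LC_ALL=C sort
}

printf '%s\n' \
  'time,url,size' \
  '04/2/2015:23:40:01,/api/message/query,52' \
  '04/2/2015:23:40:59,/api/message/query,60' \
  '04/2/2015:23:41:02,/api/message/query,48' | sum_per_minute
# prints:
# 04/2/2015:23:40:0,112
# 04/2/2015:23:41:0,48
```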
Plotting
Load the packages:
library(ggplot2)
library(scales)
Read the file into R; as.is=TRUE keeps character columns as character instead of converting them to factors.
message = read.csv('e:/R/message-time-size.log', as.is=TRUE, header=FALSE, sep=",", col.names=c('time','size'))
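Because read.csv is called with header=FALSE and two column names, every line of message-time-size.log should contain exactly two comma-separated fields. A hypothetical check_two_columns helper can report offending lines before the file is loaded into R; for example, a line printed with awk's default space separator would be caught here.

```shell
# Hypothetical pre-flight check: print lines that do not have exactly two
# comma-separated fields (time,size); exit status is 1 if any were found
check_two_columns() {
  awk -F',' 'NF != 2 {print "line " NR ": " $0; bad = 1} END {exit bad}'
}

# Example: the second line uses a space instead of a comma
printf '%s\n' '04/2/2015:23:40:0,112' '04/2/2015:23:41:0 48' | check_two_columns \
  || echo "fix the file before calling read.csv"
# prints:
# line 2: 04/2/2015:23:41:0 48
# fix the file before calling read.csv

# Usage on the real file:
# check_two_columns < message-time-size.log && echo "ready for read.csv"
```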
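Convert the time strings to POSIXct date-times, the class that ggplot2's datetime scales expect (strptime itself returns POSIXlt):
message$time = as.POSIXct(strptime(message$time, "%d/%m/%Y:%H:%M:%S"))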
Convert bytes to KB:
message$size <- message$size / 1024
Draw the time-traffic chart, with an x-axis break every hour and labels showing only the hour:
ggplot(message, aes(x=time, y=size)) + geom_line() + labs(title="Time-Traffic Chart", y='size (KB)') + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H"))
Save the plot:
ggsave(filename='e:/R/time-traffic-per-minute.jpg', width=15, height=8)
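Chart of access counts per URL. This chart and the pie chart need a url column, so here message is assumed to have been read from message-time.log (header=TRUE, columns time, url, size) rather than from message-time-size.log.
ggplot(message) + geom_bar(aes(x=url)) + coord_flip() + labs(x='url', y='count')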
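In the spirit of doing the grouping outside R, the per-URL counts can also be computed in awk, so R only needs to load one row per URL. A sketch with a hypothetical count_per_url function and made-up rows in the message-time.log layout:

```shell
# Requests per URL: count column 2, skip the header, sort by URL for stable output
count_per_url() {
  awk -F',' 'NR > 1 {c[$2]++} END {for (u in c) print u "," c[u]}' | LC_ALL=C sort
}

printf '%s\n' \
  'time,url,size' \
  '04/2/2015:23:40:01,/api/message/query,52' \
  '04/2/2015:23:40:02,/api/message/query,60' | count_per_url
# prints: /api/message/query,2
```

Read back into R with header=FALSE and col.names=c('url','count'), such a file could be plotted with geom_bar(aes(x=url, y=count), stat="identity") instead of having geom_bar count the raw rows.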
Pie chart
ggplot(message) + geom_bar(aes(x=factor(1), fill=url)) + coord_polar(theta='y') + labs(x='', y='')
ggsave(filename='e:/R/url-pie-chart.jpg')
Figure: traffic chart (R analysis of the Nginx log)