Analyzing Nginx Logs with R

Source: Internet
Author: User
Tags: ggplot

Nginx Log Example
172.16.1.1 - - [04/Feb/2015:23:40:01 +0800] "POST /api/message/query HTTP/1.1" 200 52 "-" "Apache-HttpClient/4.2    (java 1.5)" "-" "message.test.com" "172.16.3.159" "-" "0.116" "-" "0.116" "-" remote_addr_ac_logon

Extract the time, URL, and size fields. In the time field, sub(/\[/,"",$4) removes the opening bracket and sub(/Feb/,"2",$4) replaces Feb with 2

awk's sub function replaces a string in place and is normally used as a standalone statement. If it is used in an assignment, for example a=sub(/Feb/,"2",$4), then a is 1, because sub returns the number of replacements made
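A minimal sketch of that return-value behavior, using a hypothetical sample value in place of a real log field:

```shell
# Hypothetical sample value standing in for field $4 after the bracket is removed.
field='04/Feb/2015:23:40:01'

# sub() edits its third argument in place and returns the number of
# replacements it made (0 or 1, because sub replaces only the first match).
result=$(echo "$field" | awk '{ n = sub(/Feb/, "2", $1); print n, $1 }')
echo "$result"   # prints: 1 04/2/2015:23:40:01
```

The first number is the return value of sub, the second is the modified field.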

cat message-access.log | awk 'BEGIN {print "time,url,size"} {sub(/\[/,"",$4);sub(/Feb/,"2",$4);print $4","$7","$10}' > message-time.log
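To see what this extraction produces, the same awk program can be run on a single line. The sketch below feeds it the sample line from the log example (shortened after the size field) and leaves out the header row:

```shell
# The sample log line from the example above, shortened after the size field.
sample='172.16.1.1 - - [04/Feb/2015:23:40:01 +0800] "POST /api/message/query HTTP/1.1" 200 52 "-"'

# $4 is the time (with a leading bracket), $7 the URL, $10 the size.
extracted=$(printf '%s\n' "$sample" |
  awk '{ sub(/\[/, "", $4); sub(/Feb/, "2", $4); print $4 "," $7 "," $10 }')
echo "$extracted"   # prints: 04/2/2015:23:40:01,/api/message/query,52
```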

Then import this file into R and plot it with ggplot2. In R, grouped statistics can be computed with the ddply function (from the plyr package); for details see: ddply usage

However, grouped statistics in R are inefficient: once the file is larger than 1 GB, memory pressure becomes very high, so grouping with uniq or awk is more appropriate

Grouping with sort and uniq

Count the number of requests per second, then use sed to strip the leading spaces that uniq -c puts at the start of each line of the final result. NR>1 skips the header line so that it is not counted as a timestamp

cat message-time.log | awk -F',' 'NR>1 {print $1}' | sort -rn | uniq -c | sed 's/^ *//' > message-time-count.log
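As a small illustration of the counting step, here is a sketch on three made-up records in the time,url,size layout (the data is hypothetical and has no header line; a plain sort is used because uniq only needs identical lines to be adjacent):

```shell
# Hypothetical three-request sample in the time,url,size layout.
sample='04/2/2015:23:40:01,/api/message/query,52
04/2/2015:23:40:01,/api/message/query,100
04/2/2015:23:40:02,/api/message/query,30'

# Keep the time column, sort so equal timestamps are adjacent,
# count each run with uniq -c, then strip the padding uniq adds.
counts=$(printf '%s\n' "$sample" |
  awk -F',' '{ print $1 }' | sort | uniq -c | sed 's/^ *//')
echo "$counts"
# prints:
# 2 04/2/2015:23:40:01
# 1 04/2/2015:23:40:02
```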

But uniq can only count occurrences; it cannot accumulate a sum of values

Grouping with awk

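Sum the size per minute, the equivalent of group by time with sum(size). substr($1,1,16) takes the first 16 characters of the time field (for a value such as 04/2/2015:23:40:01 this drops the seconds), and NR>1 skips the header line. The END block prints the results, appending 0 as the seconds value and separating the two columns with a comma so that R can read the file as CSV

cat message-time.log | awk -F"," 'NR>1 {a[substr($1,1,16)]+=$3} END {for(i in a) print i"0,"a[i]}' > message-time-size.log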
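The per-minute sum can be sketched on a few made-up records. The data below is hypothetical and has no header line; the sketch assumes a substr start index of 1 (awk implementations disagree on what a start of 0 means) and a comma between the two output columns:

```shell
# Hypothetical sample spanning two minutes, in the time,url,size layout.
sample='04/2/2015:23:40:01,/api/message/query,52
04/2/2015:23:40:59,/api/message/query,100
04/2/2015:23:41:05,/api/message/query,30'

# Key each record by its first 16 characters (date, hour, minute and the
# trailing colon), add up the sizes per key, and print "key0,sum".
# for (i in a) visits keys in no fixed order, so sort the output.
sums=$(printf '%s\n' "$sample" |
  awk -F',' '{ a[substr($1, 1, 16)] += $3 }
             END { for (i in a) print i "0," a[i] }' | sort)
echo "$sums"
# prints:
# 04/2/2015:23:40:0,152
# 04/2/2015:23:41:0,30
```

Because the key length is fixed at 16, this only lines up when the day has two digits and the month one, as in the sample data.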
Drawing

Load the packages

library(ggplot2)
library(scales)

Read the file in R; as.is=TRUE keeps character columns from being converted to factors

message = read.csv('e:/R/message-time-size.log',
                   as.is=TRUE,
                   header=FALSE,
                   sep=",",
                   col.names=c('time','size'))

Convert the time column to a date-time type (POSIXct, which scale_x_datetime works with)

message$time = as.POSIXct(strptime(message$time,"%d/%m/%Y:%H:%M:%S"))

Convert bytes to KB

message$size <- message$size / 1024

Draw the time-traffic chart, with one x-axis tick per hour, labeled with the hour only

ggplot(message,aes(x=time,y=size)) +
  geom_line() +
  labs(title="Time-traffic chart",y='size(KB)') +
  scale_x_datetime(breaks=date_breaks("1 hour"),labels=date_format("%H"))

Save the picture

ggsave(filename='e:/R/time-traffic-chart-minute.jpg',width=15,height=8)

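Plot the number of requests per URL (this needs the url column, so here message must be loaded from message-time.log rather than from message-time-size.log)

ggplot(message) +
  geom_bar(aes(x=url)) +
  coord_flip() +
  labs(x='url',y='count')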

Pie chart

ggplot(message) +
  geom_bar(aes(x=factor(1),fill=url)) +
  coord_polar(theta='y') +
  labs(x='',y='')
ggsave(filename='e:/R/url-pie-chart.jpg')

Attached: the resulting traffic chart


