Folks who analyze logs every day have it rough: they are constantly asked for PV, UV, unique-IP counts and similar statistics. You can write this in C/C++ or Java, and the process is always the same: read the file, scan it line by line, put the value of the relevant field into a data structure, and deduplicate to arrive at the final result. In fact, Linux itself has very powerful text-processing capabilities, and the shell plus a few small text utilities is enough to produce these results.
The access log file written by Nginx looks like this:
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /poweredby.png HTTP/1.1" 3034 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.177 - - 1007071650 [05/Nov/2011:16:06:59 +0800] "GET /favicon.ico HTTP/1.1" 404 3650 "-" "Chrome/15.0.874.106" "-"
192.168.1.178 - - 58565468 [05/Nov/2011:16:17:40 +0800] "GET / HTTP/1.1" 3700 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" "-"
192.168.1.166 - - 119272312 [05/Nov/2011:16:17:40 +0800] "GET /nginx-logo.png HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
PV is simple: it is roughly the number of visits to a URL. For example, to count the visits to /index.html:
grep -c "/index.html" /var/log/nginx/access.log
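One caveat worth illustrating: a plain substring match counts every line that contains /index.html anywhere, including in the referrer field. The sketch below is only an illustration on a hypothetical two-line sample log (written to a temporary file, not the real access log), and it assumes the request field has the usual "GET /path HTTP/1.1" shape. The names log, pv_loose and pv_strict are made up for the example.

```shell
# Hypothetical sample log in a temporary file (not /var/log/nginx/access.log).
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.177 - - 1007071650 [05/Nov/2011:16:07:10 +0800] "GET /about.html HTTP/1.1" 512 "http://192.168.1.201/index.html" "Chrome/15.0.874.106" "-"
EOF

# Plain substring match: also counts the line whose referrer contains /index.html.
pv_loose=$(grep -c "/index.html" "$log")

# Match anchored on the request field: counts only GET requests for /index.html.
pv_strict=$(grep -c '"GET /index.html ' "$log")

echo "loose=$pv_loose strict=$pv_strict"   # prints: loose=2 strict=1
rm -f "$log"
```

Whether the stricter pattern is needed depends on how the referrer field is populated in your own logs.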
For UV, we go by the user identifier (the fourth column). First extract the field with the cut command: split on the space character with -d " " and take the fourth column with -f 4. Next, deduplicate. The uniq tool is fast, but it only compares adjacent lines: a line identical to the one just before it is dropped, while identical lines separated by other lines are not. So the identifiers must first be sorted with sort and then passed to uniq. The commands are connected with pipes, and wc -l at the end prints the count.
For example, the UV of the /index.html page:
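To make the field numbering concrete, here is a small sketch that runs cut on a single line. The line is a hypothetical, shortened sample in the assumed format (IP, "-", "-", user identifier, timestamp, request), and the variable names are made up for the example.

```shell
# One hypothetical log line: IP, "-", "-", user identifier, timestamp, request.
line='192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1"'

# Split on single spaces and pick a column by position.
ip=$(printf '%s\n' "$line" | cut -d " " -f 1)
user_id=$(printf '%s\n' "$line" | cut -d " " -f 4)

echo "ip=$ip user_id=$user_id"   # prints: ip=192.168.1.166 user_id=119272312
```

Note that cut treats every single space as a delimiter, so two consecutive spaces in a line would produce an empty field and shift the column numbers.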
grep "/index.html" /var/log/nginx/access.log | cut -d " " -f 4 | sort | uniq | wc -l
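Why sort has to come before uniq can be seen on a tiny input. This is a minimal sketch using three hypothetical identifiers, where the first and the last belong to the same visitor. The variable names are made up for the example.

```shell
# Three hypothetical identifiers; the first and the last are the same visitor.
ids=$(printf '%s\n' 119272312 1007071650 119272312)

# uniq alone: the duplicates are not adjacent, so nothing is collapsed (count is 3).
uniq_only=$(printf '%s\n' "$ids" | uniq | wc -l)

# sort first: the duplicates become adjacent and uniq collapses them (count is 2).
sort_then_uniq=$(printf '%s\n' "$ids" | sort | uniq | wc -l)

echo "uniq only: $uniq_only, sort then uniq: $sort_then_uniq"
```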
Unique IPs:
Suppose we want to count the unique IPs for the whole site. Then there is no need to use grep to match a specific page; just output the whole file with cat:
cat /var/log/nginx/access.log | cut -d " " -f 1 | sort | uniq | wc -l
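As a possible shortening, cut can read the file directly and sort -u combines the sort and uniq steps. The sketch below compares both forms on a hypothetical three-line sample log in a temporary file. The lines are abbreviated, and the names log, with_uniq and with_sort_u are made up for the example.

```shell
# Hypothetical sample log: three requests from two distinct IPs.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1"
192.168.1.177 - - 1007071650 [05/Nov/2011:16:06:59 +0800] "GET /favicon.ico HTTP/1.1"
192.168.1.166 - - 119272312 [05/Nov/2011:16:17:40 +0800] "GET /nginx-logo.png HTTP/1.1"
EOF

# Same pipeline as above, with cut reading the file itself instead of cat.
with_uniq=$(cut -d " " -f 1 "$log" | sort | uniq | wc -l)

# sort -u sorts and drops duplicate lines in one step.
with_sort_u=$(cut -d " " -f 1 "$log" | sort -u | wc -l)

echo "sort | uniq: $with_uniq, sort -u: $with_sort_u"   # both counts are 2
rm -f "$log"
```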
Notice that none of this even needed the powerful awk; these basic statistics were completed without it.