Counting PV, UV, and unique IPs with the shell (Linux shell)

Source: Internet
Author: User

Those of us who analyze logs every day are often asked for PV, UV, unique IP counts, and similar statistics. This can be written in C/C++ or Java, and the process is always the same: read the file, scan it line by line, put the field values into a data structure, deduplicate, and arrive at the final result. In fact, Linux itself has very powerful text-processing capabilities, so the shell plus a few small text tools is enough to produce the results.

The access log file written by Nginx looks like this:

192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /poweredby.png HTTP/1.1" 3034 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.177 - - 1007071650 [05/Nov/2011:16:06:59 +0800] "GET /favicon.ico HTTP/1.1" 404 3650 "-" "Chrome/15.0.874.106" "-"
192.168.1.178 - - 58565468 [05/Nov/2011:16:17:40 +0800] "GET / HTTP/1.1" 3700 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" "-"
192.168.1.166 - - 119272312 [05/Nov/2011:16:17:40 +0800] "GET /nginx-logo.png HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"

PV (page views) is very simple: it is roughly the number of requests for a URL. For example, to count the visits to /index.html:

grep -c "/index.html" /var/log/nginx/access.log
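One thing to keep in mind is that grep -c counts every line containing the string, including lines where /index.html only appears in the referrer. The following is a minimal sketch of the difference, not part of the original article; the two log lines are hypothetical (the second one has a made-up referrer), and the stricter pattern assumes the request is logged as "GET /path HTTP/1.1":

```shell
#!/bin/sh
# Hypothetical two-line log modelled on the excerpt above.
# The referrer on the second line is invented for this illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.177 - - 1007071650 [05/Nov/2011:16:06:59 +0800] "GET /favicon.ico HTTP/1.1" 404 3650 "http://192.168.1.201/index.html" "Chrome/15.0.874.106" "-"
EOF

# Plain substring match: the second line is counted too, because of its referrer.
pv_loose=$(grep -cF '/index.html' "$log")

# Match against the request field only: counts requests for the page itself.
pv_strict=$(grep -cF '"GET /index.html ' "$log")

echo "loose=$pv_loose strict=$pv_strict"
rm -f "$log"
```

On this input the loose count is 2 and the strict count is 1. Whether the stricter pattern is needed depends on how the log is formatted and on what should count as a visit.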

UV (unique visitors) is counted by the user identifier, which is the fourth column. First the field has to be extracted with the cut command: split on the space character with -d " " and take the fourth column with -f 4. Then the values have to be deduplicated with the uniq tool. uniq is fast, but it only collapses adjacent lines: a line identical to the one just before it is dropped, while identical lines separated by other lines are kept. The identifiers therefore have to be sorted with sort first and only then passed to uniq. The commands are connected with pipes, and wc -l prints the final count.

For example, the UV of the /index.html page:
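The adjacency behaviour of uniq is easy to see on a tiny input. This is an illustrative sketch with made-up identifiers, not data from the log:

```shell
#!/bin/sh
# Made-up identifiers: 111 appears twice, but not on adjacent lines.
ids=$(printf '%s\n' 111 222 111)

# uniq alone only collapses adjacent duplicates, so nothing is removed here.
without_sort=$(printf '%s\n' "$ids" | uniq | wc -l | tr -d ' ')

# After sort the two 111 lines are adjacent and collapse into one.
with_sort=$(printf '%s\n' "$ids" | sort | uniq | wc -l | tr -d ' ')

echo "uniq only: $without_sort, sort | uniq: $with_sort"
```

The first count is 3 and the second is 2. The tr -d ' ' step only strips the padding that some wc implementations add. sort -u gives the same result as sort | uniq in a single step.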

grep "/index.html" /var/log/nginx/access.log | cut -d " " -f 4 | sort | uniq | wc -l


Unique IPs:

Suppose we want to count the unique IPs for the whole site. Then there is no need for grep to match a specific page; it is enough to output the file with cat:

cat /var/log/nginx/access.log | cut -d " " -f 1 | sort | uniq | wc -l
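To try the three statistics without touching a real server log, the commands can be run against a temporary copy of the sample lines. This is a sketch under stated assumptions: the temporary file stands in for /var/log/nginx/access.log, and the five lines are the excerpt from the top of the article with the field spacing repaired:

```shell
#!/bin/sh
# Temporary stand-in for /var/log/nginx/access.log (assumption for this demo).
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /index.html HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.166 - - 119272312 [05/Nov/2011:16:06:59 +0800] "GET /poweredby.png HTTP/1.1" 3034 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
192.168.1.177 - - 1007071650 [05/Nov/2011:16:06:59 +0800] "GET /favicon.ico HTTP/1.1" 404 3650 "-" "Chrome/15.0.874.106" "-"
192.168.1.178 - - 58565468 [05/Nov/2011:16:17:40 +0800] "GET / HTTP/1.1" 3700 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" "-"
192.168.1.166 - - 119272312 [05/Nov/2011:16:17:40 +0800] "GET /nginx-logo.png HTTP/1.1" 370 "http://192.168.1.201/" "Chrome/15.0.874.106" "-"
EOF

# PV of /index.html: number of matching lines.
pv=$(grep -c '/index.html' "$log")

# UV of /index.html: distinct user identifiers (column 4) among the matching lines.
uv=$(grep '/index.html' "$log" | cut -d ' ' -f 4 | sort | uniq | wc -l | tr -d ' ')

# Unique IPs for the whole site: distinct values in column 1.
ips=$(cut -d ' ' -f 1 "$log" | sort | uniq | wc -l | tr -d ' ')

echo "PV=$pv UV=$uv IPs=$ips"
rm -f "$log"
```

On these five lines the script prints PV=1 UV=1 IPs=3. Here cut reads the file directly; piping it through cat as in the article gives the same result.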


And with that, the basic statistics are done without even reaching for the powerful awk.
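Since awk is mentioned, here is a rough awk version of the UV count for comparison. It is only a sketch and not something the article itself provides; the three input lines are made up and keep just the first four columns (IP, -, -, user identifier):

```shell
#!/bin/sh
# Made-up input in the same column layout: IP, -, -, user identifier.
lines=$(printf '%s\n' \
  '192.168.1.166 - - 119272312' \
  '192.168.1.177 - - 1007071650' \
  '192.168.1.166 - - 119272312')

# seen[$4]++ is 0 (false) the first time an identifier appears,
# so n ends up as the number of distinct identifiers.
uv_awk=$(printf '%s\n' "$lines" | awk '!seen[$4]++ { n++ } END { print n + 0 }')

# The cut | sort | uniq pipeline from the article, on the same input.
uv_cut=$(printf '%s\n' "$lines" | cut -d ' ' -f 4 | sort | uniq | wc -l | tr -d ' ')

echo "awk=$uv_awk cut=$uv_cut"
```

Both counts are 2 here. The awk version avoids the sort step by keeping the identifiers in an in-memory array, which may or may not be preferable depending on the size of the log.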
