Using the Linux shell and awk to count visits per IP in a log


Counting visits and distinct IPs with awk

I have a file with more than 2 million records and want to compute statistics on it with awk in the shell. The file format is as follows:
#keyword#url#IP address#
Test|123|1
Test|123|1
Test|123|2
Test2|12|1
Test2|123|1
Test2|123|2
The goal is to count, for each combination of keyword and URL, the total number of visits and the number of distinct IPs, and to write the result to a file.
In SQL this is simple: select keyword, url, count(1), count(distinct ip) group by keyword, url (the FROM clause is omitted here). But the data set is so large that the report never finishes, so I would like to do it in the shell. I am not proficient with the shell, though, and do not know how to implement it quickly, especially the distinct part.
The desired result is:
#keyword#url#distinct IPs#searches
Test 123 2 3
Test2 123 2 2
Test2 12 1 1

awk -F"|" '{a[$1" "$2]++; b[$1" "$2" "$3]++} (b[$1" "$2" "$3]==1){++c[$1" "$2]} END{for (i in a) print i, c[i], a[i]}' file
Test2 123 2 2
Test2 12 1 1
Test 123 2 3
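
The one-liner above is dense, so here is a minimal sketch of the same counting logic with named arrays. The function name `count_visits` and the array names are made up for illustration; the input is assumed to be `keyword|url|ip` rows as in the sample data.

```shell
# count_visits: read "keyword|url|ip" rows and print
# "keyword url distinct_ips total_visits" for each keyword/url pair.
count_visits() {
    awk -F'|' '
    {
        key = $1 " " $2              # group by keyword and URL
        total[key]++                 # every row is one visit
        if (!seen[key, $3]++)        # true only the first time this IP shows up for the key
            distinct[key]++
    }
    END {
        for (k in total)
            print k, distinct[k], total[k]
    }' "$@"
}

# Usage with the sample rows from above; sort only makes the order stable.
printf '%s\n' 'Test|123|1' 'Test|123|1' 'Test|123|2' \
              'Test2|12|1' 'Test2|123|1' 'Test2|123|2' |
    count_visits | LC_ALL=C sort
```

The `!seen[...]++` test is a common awk idiom for "distinct": the expression is true only the first time a key is seen, so each IP is counted once per keyword/URL pair. Because `for (k in total)` visits keys in no guaranteed order, the sketch pipes the output through `sort`.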


Counting accesses per IP per hour in one day's Apache log

The log format is as follows:

127.0.0.1 - - [03/Feb/2013:14:18:10 +0800] "GET /ucenterrvicecenter/scenterrequest.php HTTP/1.0" 302 242
127.0.0.1 - - [03/Feb/2013:14:18:10 +0800] "GET /ucenterrvicecenter/scenterrequest.php HTTP/1.0" 200 -
111.111.111.35 - - [03/Feb/2013:14:18:32 +0800] "GET /myadmin/ HTTP/1.1" 401 933
111.111.111.35 - root [03/Feb/2013:14:18:33 +0800] "GET /myadmin/ HTTP/1.1" 200 1826
111.111.111.35 - root [03/Feb/2013:14:18:34 +0800] "GET /myadmin/main.php?token=67b1c9d29f9ac9107627bb991c8d2ca6 HTTP/1.1" 200 7633
111.111.111.35 - - [03/Feb/2013:14:18:34 +0800] "GET /myadmin/css/print.css?token=67b1c9d29f9ac9107627bb991c8d2ca6 HTTP/1.1" 200 1063
111.111.111.35 - root [03/Feb/2013:14:18:34 +0800] "GET /myadmin/css/phpmyadmin.css.php?token=67b1c9d29f9ac9107627bb991c8d2ca6&js_frame=right&nocache=1359872314 HTTP/1.1" 200 20322
111.111.111.35 - root [03/Feb/2013:14:18:34 +0800] "GET /myadmin/navigation.php?token=67b1c9d29f9ac9107627bb991c8d2ca6 HTTP/1.1" 200 1362
111.111.111.35 - root [03/Feb/2013:14:18:36 +0800] "GET /myadmin/css/phpmyadmin.css.php?token=67b1c9d29f9ac9107627bb991c8d2ca6&js_frame=left&nocache=1359872314 HTTP/1.1" 200 3618
111.111.111.35 - root [03/Feb/2013:14:18:38 +0800] "GET /myadmin/navigation.php?server=1&db=ucenter&table=&lang=zh-utf-8&collation_connection=utf8_unicode_ci HTTP/1.1" 200 9631

The code is as follows:

[root@localhost sampdb]# awk -v FS="[:]" '{gsub(/ *-.*/, "", $1); num[$2" "$1]++} END{for (i in num) print i, num[i]}' data1
14 127.0.0.1 2
14 111.111.111.35 8
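
As an alternative sketch, the hour can also be taken from the timestamp field instead of splitting the whole line on colons. This assumes the standard space-separated Apache log layout, where field 1 is the client IP and field 4 is the opening part of the timestamp; the function name `hits_per_hour` is made up for illustration.

```shell
# hits_per_hour: print "<hour> <ip> <count>" for each hour/IP pair.
# Assumes space-separated Apache log lines: $1 = client IP,
# $4 = "[dd/Mon/yyyy:HH:MM:SS".
hits_per_hour() {
    awk '{
        split($4, t, ":")            # t[1] = "[dd/Mon/yyyy", t[2] = hour
        count[t[2] " " $1]++         # key: hour followed by client IP
    }
    END {
        for (k in count)
            print k, count[k]
    }' "$@" | LC_ALL=C sort
}

# Usage (data1 is the log file name used above):
#   hits_per_hour data1
```

Leaving the field separator at its default keeps `$1` clean, so no `gsub` call is needed to strip the rest of the line from the IP.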

Counting the number of accesses per IP in a log with awk

Given the log below, we need to count how many times each IP accessed the site.

180.153.114.199 - - [03/Jul/2013:14:44:43 +0800] GET /wp-login.php?redirect_to=http%3a%2f%2fdemo.catjia.com%2fwp-admin%2fplugin-install.php%3ftab%3dsearch%26s%3dvasiliki%26plugin-search-input%3d%25e6%2590%259c%25e7%25b4%25a2%25e6%258f%2592%25e4%25bb%25b6&reauth=1 HTTP/1.1 2355 - Mozilla/4.0 -
101.226.33.200 - - [03/Jul/2013:14:45:52 +0800] GET /wp-admin/plugin-install.php?tab=search&type=term&s=Photogram&plugin-search-input=%e6%90%9c%e7%b4%a2%e6%8f%92%e4%bb%b6 HTTP/1.1 302 0 - Mozilla/4.0 -
101.226.33.200 - - [03/Jul/2013:14:45:52 +0800] GET /wp-login.php?redirect_to=http%3a%2f%2fdemo.catjia.com%2fwp-admin%2fplugin-install.php%3ftab%3dsearch%26type%3dterm%26s%3dphotogram%26plugin-search-input%3d%25e6%2590%259c%25e7%25b4%25a2%25e6%258f%2592%25e4%25bb%25b6&reauth=1 HTTP/1.1 2370 - Mozilla/4.0 -
113.110.176.131 - - [03/Jul/2013:15:03:57 +0800] GET /wp-content/themes/catjia-lio/images/menu_hover_bg.png HTTP/1.1 304 0 http://demo.catjia.com/wp-content/themes/catjia-lio/style.css Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0 -
180.153.205.103 - - [03/Jul/2013:15:13:59 +0800] GET /wp-admin/options-general.php HTTP/1.1 302 0 - Mozilla/4.0 -
180.153.205.103 - - [03/Jul/2013:15:13:59 +0800] GET /wp-login.php?redirect_to=http%3a%2f%2fdemo.catjia.com%2fwp-admin%2foptions-general.php&reauth=1 HTTP/1.1 2269 - Mozilla/4.0 -
101.226.51.227 - - [03/Jul/2013:15:14:07 +0800] GET /wp-admin/options-general.php?settings-updated=true HTTP/1.1 302 0 - Mozilla/4.0 -
101.226.51.227 - - [03/Jul/2013:15:14:07 +0800] GET /wp-login.php?redirect_to=http%3a%2f%2fdemo.catjia.com%2fwp-admin%2foptions-general.php%3fsettings-updated%3dtrue&reauth=1 HTTP/1.1 2291 - Mozilla/4.0 -

At first glance there are a lot of log records. Where do we start?

Many people know that awk can extract the first column, which here is the IP address.

But what comes after extracting it? How do you count how many times each IP appears?

It sounds complicated, but in practice it is simple.

# awk '{a[$1]+=1;} END{for (i in a){print a[i]" "i;}}' demo.catjia.com_access.log
2 180.153.206.26
120 113.110.176.131
2 101.226.33.200
2 101.226.66.175
2 112.65.193.16
2 101.226.51.227
2 112.64.235.86
2 101.226.33.223
1 101.227.252.23
2 180.153.205.103
2 101.226.33.216
2 112.64.235.89
4 180.153.114.199
2 112.64.235.254
2 180.153.206.34

If you want to save the results, you can redirect the output to a text file.
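
A minimal sketch of saving the counts with redirection follows. The function name `count_ips` and the report file name `ip_counts.txt` are hypothetical; only the log file name comes from the commands above.

```shell
# count_ips: print "<count> <ip>" for every distinct first field.
count_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$@"
}

# Hypothetical report name; adjust both paths to your own setup.
log=demo.catjia.com_access.log
report=ip_counts.txt

if [ -r "$log" ]; then
    count_ips "$log" > "$report"     # ">" overwrites the file, ">>" would append
fi
```

Using `>` replaces the report on every run, which suits a daily recount. Use `>>` instead if you want to keep earlier results in the same file.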

Now the count for each IP is known, but as the data grows the output becomes hard to read. For example, how do you find the IP with the most visits?

The answer is to sort the output.

# awk '{a[$1]+=1;} END{for (i in a){print a[i]" "i;}}' demo.catjia.com_access.log | sort
1 101.227.252.23
120 113.110.176.131
2 101.226.33.200
2 101.226.33.216
2 101.226.33.223
2 101.226.51.227
2 101.226.66.175
2 112.64.235.254
2 112.64.235.86
2 112.64.235.89
2 112.65.193.16
2 180.153.205.103
2 180.153.206.26
2 180.153.206.34
4 180.153.114.199

At first glance this looks sorted, but look closely: why is the IP with 120 visits in second place? Shouldn't it be at the end?

In fact you also need the -g option here. Without it, sort compares the lines as text, character by character, which produces the order shown above.
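
The difference is easy to reproduce on a few numbers. This is a small sketch using made-up input values; `-n` is the POSIX numeric option, while `-g` is the GNU general-numeric variant that also understands floating-point notation.

```shell
# Text comparison: "120" sorts before "2" because "1" < "2".
printf '%s\n' 2 120 4 1 | LC_ALL=C sort

# Numeric comparison: values are ordered by magnitude.
printf '%s\n' 2 120 4 1 | sort -n
```

For plain integer counts like these, `sort -n` and `sort -g` give the same order.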

Here is the result after adding the -g option:

# awk '{a[$1]+=1;} END{for (i in a){print a[i]" "i;}}' demo.catjia.com_access.log | sort -g
1 101.227.252.23
2 101.226.33.200
2 101.226.33.216
2 101.226.33.223
2 101.226.51.227
2 101.226.66.175
2 112.64.235.254
2 112.64.235.86
2 112.64.235.89
2 112.65.193.16
2 180.153.205.103
2 180.153.206.26
2 180.153.206.34
4 180.153.114.199
120 113.110.176.131

That is the result we wanted.
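
To answer the earlier question of which IP visits most often, the sorted list can be reversed and cut off after the first few lines. The following is a sketch under the same assumptions as above (the IP is the first field); the function name `top_ips` is made up for illustration.

```shell
# top_ips N [file...]: print the N most frequent first fields as "<count> <ip>".
top_ips() {
    n=$1
    shift
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$@" |
        sort -k1,1nr |               # numeric, descending, on the count column only
        head -n "$n"
}
```

With the log used in this article, `top_ips 1 demo.catjia.com_access.log` would print the line for 113.110.176.131, the IP with 120 visits.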
