A few commands
The commands below assume the preceding log format. If your log format is different, adjust the field references in the awk programs accordingly.
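Because the field offsets depend on the log format, it can help to print how awk splits a single line before choosing an offset. The sketch below does this for a made-up sample line; the line, its format (user agent followed by one more quoted field), and the `show_fields` helper name are all assumptions for illustration, not taken from the original logs.

```shell
#!/bin/bash
# Hypothetical helper: print every field of each input line, numbered,
# when the line is split on double quotes (as awk -F '"' does).
show_fields() {
    awk -F '"' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

# Made-up sample line in an assumed format (not from the original log).
sample='1.2.3.4 - - [04/Jul/2013:00:00:01 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0" "-"'

printf '%s\n' "$sample" | show_fields
```

With this sample, NF is 9 and the user agent is field 6, that is $(NF-3). A format with more or fewer quoted fields shifts the offset.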
Analyze the UserAgent in the log
The code is as follows:
cat access_20130704.log | awk -F '"' '{print $(NF-3)}' | sort | uniq -c | sort -nr | head -20
The command above lists the 20 most frequent user agents in the log file.
Find the IP addresses with the most requests
The code is as follows:
cat access_20130704.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -20
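The same count can also be kept inside awk with an associative array, so the whole file does not have to be sorted before counting. This is a minimal sketch, assuming the client IP is the first whitespace-separated field as in the command above; the `top_ips` name and the sample input are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count requests per client IP (field 1) and print
# the N most frequent as "count ip", highest count first.
top_ips() {
    local n="${1:-20}"
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' |
        sort -k1,1nr -k2,2 | head -n "$n"
}

# Made-up input: only the first field matters here.
printf '%s\n' \
    '10.0.0.1 - - "GET / HTTP/1.1"' \
    '10.0.0.2 - - "GET /a HTTP/1.1"' \
    '10.0.0.1 - - "GET /b HTTP/1.1"' | top_ips 2
```

On a real log this would be called as `top_ips 20 < access_20130704.log`.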
Find the most requested URLs
The code is as follows:
cat access_20130704.log | awk -F '"' '{print $(NF-5)}' | sort | uniq -c | sort -nr | head -20
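If only the request path is wanted rather than a whole quoted field, one option is to take the request line and split it on spaces. The sketch below assumes the request line (for example "GET /path HTTP/1.1") is the first quoted field of each entry, which may not hold for every log format; the `request_paths` name and the sample lines are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the request path of each line, assuming the
# first quoted field is the request line, e.g. "GET /path HTTP/1.1".
request_paths() {
    awk -F '"' '{ split($2, req, " "); if (req[2] != "") print req[2] }'
}

# Made-up sample lines (assumed format, not from the original log).
printf '%s\n' \
    '1.2.3.4 - - [04/Jul/2013:00:00:01 +0800] "GET /a.mp3 HTTP/1.1" 200 612 "-" "UA"' \
    '1.2.3.5 - - [04/Jul/2013:00:00:02 +0800] "GET /a.mp3 HTTP/1.1" 200 612 "-" "UA"' \
    '1.2.3.6 - - [04/Jul/2013:00:00:03 +0800] "GET /b.jpg HTTP/1.1" 404 0 "-" "UA"' |
    request_paths | sort | uniq -c | sort -nr | head -20
```

The counting pipeline after `request_paths` is the same one used in the commands above.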
Now for the main topic. grep is powerful in many ways; here we use grep regular expressions to analyze nginx logs. To make repeated use easier, the commands are wrapped in a script, provisionally named nla.sh.
You can modify the grep items as needed to get the desired results.
The code is as follows:
#!/bin/bash
#################################################
#
# This is a default nginx log analysis script
# It mainly uses grep to do the work
#
# Many people split logs by date and gzip them,
# so a simple gz format check is included
# The files are compressed again after the analysis is complete
# ccshaowei#gmail.com
# 2012/05/13
# http://shaowei.info/
#################################################

# Modify the following line to set the log location
log_dir='access.log-*'

##########
# key[] holds the grep patterns and word[] the labels;
# they must correspond one-to-one and be equal in number
##########

# HTTP status codes
key[0]='" 200 [0-9]\{3\}'; word[0]='http 200'
key[1]='" 206 [0-9]\{3\}'; word[1]='http 206'
key[2]='" 404 [0-9]\{3\}'; word[2]='http 404'
key[3]='" 503 [0-9]\{3\}'; word[3]='http 503'

##########
# Search engine crawlers
key[4]='Googlebot.*google.com/bot.html'; word[4]='from Google crawlers'
key[5]='Baiduspider.*baidu.com/search/spider.html'; word[5]='from Baidu Spider'
key[6]='bingbot.*bing.com/bingbot.htm'; word[6]='from Bing crawlers'
# Soso 'Sosospider.*soso.com/webspider.htm'
# Youdao 'YoudaoBot.*youdao.com/help/webmaster/spider/'
# Yahoo China 'Yahoo! Slurp China'

##########
# Browsers
key[7]='MSIE'; word[7]='MSIE'
key[8]='Gecko/.*Firefox'; word[8]='Firefox'
key[9]='AppleWebKit.*like Gecko'; word[9]='Webkit'
key[10]='Opera.*Presto'; word[10]='Opera'
# 360 Secure Browser 'MSIE.*360SE', or by IE kernel version
#   'MSIE 6.0.*360SE' 'MSIE 7.0.*360SE' 'MSIE 8.0.*360SE' 'MSIE 9.0.*360SE'
# 360 Extreme Browser 'AppleWebKit.*QIHU 360EE'

##########
# Operating systems
key[11]='Windows NT 6.1'; word[11]='Windows 7'
key[12]='Macintosh; Intel Mac OS X'; word[12]='Mac OS X'
key[13]='X11.*Linux'; word[13]='Linux with X11'
key[14]='Android;'; word[14]='Android'
# Windows series: win2000 'Windows NT 5.0', winxp 'Windows NT 5.1',
#   winvista 'Windows NT 6.0', win7 'Windows NT 6.1'
# SymbianOS 'SymbianOS'

##########
# Devices
key[15]='iPad.*like Mac OS X'; word[15]='iPad'
key[16]='Nokia'; word[16]='Nokia Series'
key[17]='Nokia5800'; word[17]='Nokia5800 XpressMusic'
# iPhone 'iPhone.*like Mac OS X'

##########
# Others
key[18]='GET /.*\.mp3 HTTP'; word[18]='access mp3 files'
key[19]='GET /.*\.jpg HTTP'; word[19]='access jpg files'

# End of configuration
##############################################################################

log_num=$(ls ${log_dir} | wc -l)
fileid=0
isgz=0

# gz check: decompress .gz logs and remember which ones were compressed
for file in $(ls ${log_dir})
do
    if [ "${file##*.}" = "gz" ]; then
        isgz[$fileid]=1
        gzip -dvf "$file"
        logfile[$fileid]=$(echo "$file" | sed 's/\.gz$//')
        ((fileid++))
    else
        isgz[$fileid]=0
        logfile[$fileid]=$file
        ((fileid++))
    fi
done

# Check whether the number of keys and words is consistent
if [ ${#word[@]} -ne ${#key[@]} ]
then
    echo "Configuration error: the number of keys and words is inconsistent"
else
    checkid=0
    while [ $checkid -lt $log_num ]
    do
        filename=${logfile[$checkid]}
        total=$(cat "$filename" | wc -l)
        echo "Log ${filename} contains ${total} lines and ${#key[@]} items need to be processed"
        echo "Number of source IP addresses: $(awk '{print $1}' "$filename" | sort | uniq | wc -l)"
        i=0
        while [ $i -lt ${#key[@]} ]
        do
            s1=${word[$i]}
            # Number of lines matching this keyword
            s2=$(grep -c "${key[$i]}" "$filename")
            # Share of matching lines as a percentage of the whole file
            s3=$(awk -v n="$s2" -v t="$total" 'BEGIN{printf "%.2f%%\n", (n/t)*100}')
            echo "${s3} ${s1}: ${s2}"
            ((i++))
        done
        ((checkid++))
        echo "-----------------"
    done
fi

# Compress again the files that were originally gzipped
gzid=0
while [ $gzid -lt $log_num ]
do
    if [ "${isgz[$gzid]}" = "1" ]
    then
        gzip -v "${logfile[$gzid]}"
    fi
    ((gzid++))
done
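The core of the script is a count-and-percentage step repeated for every keyword. It can be tried in isolation as sketched below; `pattern_share` is a hypothetical helper name, the sample log lines are invented, and the pattern is written with grep's basic-regex interval syntax `\{3\}`.

```shell
#!/bin/bash
# Hypothetical helper: print "<percent>% <label>: <count>" for the lines
# of a file that match a grep (basic regex) pattern.
pattern_share() {
    local pattern="$1" label="$2" file="$3"
    local total count
    total=$(wc -l < "$file")
    # grep -c prints 0 but exits non-zero when nothing matches
    count=$(grep -c -- "$pattern" "$file" || true)
    awk -v c="$count" -v t="$total" -v l="$label" \
        'BEGIN { printf "%.2f%% %s: %d\n", (t ? c / t * 100 : 0), l, c }'
}

# Made-up sample lines written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
    '1.1.1.1 - - [t] "GET / HTTP/1.1" 200 1234 "-" "MSIE"' \
    '1.1.1.2 - - [t] "GET /x HTTP/1.1" 404 5678 "-" "Firefox"' \
    '1.1.1.3 - - [t] "GET /y HTTP/1.1" 200 91011 "-" "MSIE"' \
    '1.1.1.4 - - [t] "GET /z HTTP/1.1" 200 121 "-" "Opera"' > "$tmp"

pattern_share '" 200 [0-9]\{3\}' 'http 200' "$tmp"
rm -f "$tmp"
```

Passing the numbers to awk with `-v` instead of splicing them into the program text is a design choice made here to keep the quoting simple.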
The running result is as follows:
[root@hostname temp]# ls -lh
total 299M
-rw-r----- 1 root 11M May 14 13:25 access.log-20120508.gz
-rw-r----- 1 root 158M May 14 13:25 access.log-20120509
-rw-r----- 1 root 2.2M May 14 13:25 access.log-20120510.gz
-rw-r----- 1 root 129M May 14 13:25 access.log-20120511
-rwxr-xr-x 1 root 3.4K May 14 13:10 nla.sh
[root@hostname temp]# sh nla.sh
access.log-20120508.gz: 93.5% -- replaced with access.log-20120508
access.log-20120510.gz: 93.9% -- replaced with access.log-20120510
Log access.log-20120508 contains 643281 lines and 20 items need to be processed
Number of source IP addresses: 7483
44.52% http 200: 286400
3.55% http 206: 22824
20.23% http 404: 130128
14.31% http 503: 92029
1.94% from Google crawlers: 12491
2.01% from Baidu Spider: 12943
0.90% from Bing crawlers: 5780
76.53% MSIE: 492291
2.21% Firefox: 14209
7.03% Webkit: 45215
0.27% Opera: 1736
25.17% Windows 7: 161935
1.37% Mac OS X: 8830
0.03% Linux with X11: 202
0.03% Android: 190
0.11% iPad: 677
0.50% Nokia Series: 3207
0.02% Nokia5800 XpressMusic: 102
36.06% access mp3 files: 231959
23.10% access jpg files: 148600
-----------------
Log access.log-20120509 contains 608316 lines and 20 items need to be processed
Number of source IP addresses: 7429
45.15% http 200: 274651
1.79% http 206: 10884
15.59% http 404: 94854
19.95% http 503: 121376
2.83% from Google crawlers: 17245
1.80% from Baidu Spider: 10970
0.23% from Bing crawlers: 1410
78.96% MSIE: 480324
1.28% Firefox: 7783
7.85% Webkit: 47774
0.43% Opera: 2597
22.85% Windows 7: 139022
0.63% Mac OS X: 3827
0.06% Linux with X11: 389
0.06% Android: 372
0.06% iPad: 351
0.19% Nokia Series: 1158
0.00% Nokia5800 XpressMusic: 4
34.94% access mp3 files: 212555
23.46% access jpg files: 142702
-----------------
Log access.log-20120510 contains 141224 lines and 20 items need to be processed
Number of source IP addresses: 2040
50.15% http 200: 70823
1.67% http 206: 2354
14.15% http 404: 19987
17.37% http 503: 24534
4.53% from Google crawlers: 6399
2.66% from Baidu Spider: 3754
0.44% from Bing crawlers: 622
69.34% MSIE: 97921
1.19% Firefox: 1682
9.54% Webkit: 13470
0.53% Opera: 742
19.37% Windows 7: 27351
1.23% Mac OS X: 1737
0.03% Linux with X11: 45
0.00% Android: 0
0.09% iPad: 130
0.86% Nokia Series: 1220
0.00% Nokia5800 XpressMusic: 0
30.29% access mp3 files: 42777
23.91% access jpg files: 33768
-----------------
Log access.log-20120511 contains 473259 lines and 20 items need to be processed
Number of source IP addresses: 5093
44.91% http 200: 212551
1.96% http 206: 9286
15.14% http 404: 71671
21.20% http 503: 100322
2.44% from Google crawlers: 11548
1.40% from Baidu Spider: 6616
3.40% from Bing crawlers: 16068
76.75% MSIE: 363224
0.93% Firefox: 4388
6.75% Webkit: 31937
0.31% Opera: 1444
28.62% Windows 7: 135444
0.43% Mac OS X: 2057
0.02% Linux with X11: 116
0.00% Android: 0
0.09% iPad: 419
0.23% Nokia Series: 1094
0.00% Nokia5800 XpressMusic: 0
35.77% access mp3 files: 169274
22.46% access jpg files: 106299
-----------------
access.log-20120508: 93.5% -- replaced with access.log-20120508.gz
access.log-20120510: 93.9% -- replaced with access.log-20120510.gz
[root@hostname temp]# ls -lh
total 299M
-rw-r----- 1 root 11M May 14 13:25 access.log-20120508.gz
-rw-r----- 1 root 158M May 14 13:25 access.log-20120509
-rw-r----- 1 root 2.2M May 14 13:25 access.log-20120510.gz
-rw-r----- 1 root 129M May 14 13:25 access.log-20120511
-rwxr-xr-x 1 root 3.4K May 14 13:10 nla.sh
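As the listing shows, the script decompresses the .gz logs in place and compresses them again afterwards. An alternative sketch that never modifies the files on disk is to stream compressed logs through `gzip -dc`; the `read_log` name and the throwaway demo files are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write a log file to stdout, decompressing on the
# fly when the name ends in .gz, so the file on disk is never modified.
read_log() {
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac
}

# Demonstration with throwaway files in a temporary directory.
dir=$(mktemp -d)
printf 'line one\nline two\n' > "$dir/access.log-1"
printf 'line three\n' > "$dir/access.log-2"
gzip "$dir/access.log-2"            # becomes access.log-2.gz

for f in "$dir"/access.log-*; do
    echo "$(basename "$f"): $(read_log "$f" | wc -l | tr -d ' ') lines"
done
rm -rf "$dir"
```

The output of `read_log` can be piped into the same grep and awk commands used elsewhere in this article.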