Knowledge Points:
1) Arrays
An array is a variable that stores a series of values, each of which can be accessed by its index.
Arrays in awk are called associative arrays because the subscript (index) can be a number or a string.
Subscripts are often called keys. The keys and values of array elements are stored in a table inside awk that uses a hash algorithm, so array elements come out in no particular order.
Array format: array[index]=value
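As a minimal, self-contained illustration of string-keyed (associative) subscripts, the sketch below counts words in a throwaway file (the file name and data are invented for this demo, not taken from the article's log):

```shell
# Create synthetic sample data (not from the article).
printf 'apple\nbanana\napple\n' > /tmp/awk_demo_words.txt
# count[$1]++ uses the word itself -- a string -- as the array subscript.
# Iteration order of "for (k in count)" is unspecified, hence the sort.
word_counts=$(awk '{count[$1]++}END{for(k in count)print k,count[k]}' /tmp/awk_demo_words.txt | sort)
echo "$word_counts"
```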
1. Nginx log analysis
Log format: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
Sample entry: 27.189.231.39 - - [09/Apr/2016:17:21:23 +0800] "GET /Public/index/images/icon_pre.png HTTP/1.1" 200 44668 "http://www.test.com/Public/index/css/global.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" "-"
1) Top 10 IPs by number of requests
Idea: deduplicate on the first column and count occurrences.
Method 1: $ awk '{a[$1]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log
Method 2: $ awk '{print $1}' access.log |sort |uniq -c |sort -k1 -nr |head -n10
Explanation: a[$1]++ builds array a with the first column (the IP) as the subscript and ++ as the operation on the element, whose initial value is 0. Each time an IP is processed, the element for that subscript is incremented by 1; if the same IP appears again, its element becomes 2, and so on. This both deduplicates and counts occurrences.
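The counting idiom can be verified end to end on a few synthetic log lines (the IPs and trailing fields are invented; only the first column matters here):

```shell
# Three fake requests: 1.1.1.1 appears twice, 2.2.2.2 once.
printf '1.1.1.1 x\n2.2.2.2 y\n1.1.1.1 z\n' > /tmp/mini_access.log
# a[$1]++ counts per IP; sort -nr then head picks the busiest one.
top_ip=$(awk '{a[$1]++}END{for(i in a)print a[i],i}' /tmp/mini_access.log | sort -k1 -nr | head -n1)
echo "$top_ip"
```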
2) IPs with more than 100 requests
Method 1: $ awk '{a[$1]++}END{for(i in a){if(a[i]>100)print i,a[i]}}' access.log
Method 2: $ awk '{a[$1]++;if(a[$1]>100){b[$1]++}}END{for(i in b){print i,a[i]}}' access.log
Explanation: Method 1 stores all counts in array a and filters the qualifying IPs when printing in the END block. Method 2 also counts in a, but additionally records IPs that cross the threshold in array b, then prints the IPs found in b together with their counts from a.
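The same filtering pattern can be tried on tiny synthetic data by scaling the threshold down (here >2 instead of the article's >100; the data is invented):

```shell
# 'a' occurs three times, 'b' once.
printf 'a\na\na\nb\n' > /tmp/ips_demo.txt
# Same shape as Method 1, with the threshold lowered from 100 to 2.
hot=$(awk '{a[$1]++}END{for(i in a){if(a[i]>2)print i,a[i]}}' /tmp/ips_demo.txt)
echo "$hot"
```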
3) Top 10 IPs on April 9, 2016
Idea: first filter the log down to that time range, then deduplicate and count occurrences.
Method 1: $ awk '$4>="[9/Apr/2016:00:00:01" && $4<="[9/Apr/2016:23:59:59"{a[$1]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log
Method 2: $ sed -n '/\[9\/Apr\/2016:00:00:01/,/\[9\/Apr\/2016:23:59:59/p' access.log |awk '{print $1}' |sort |uniq -c |sort -k1 -nr |head -n10  # precondition: log entries with both the start and end timestamps must exist
4) Count requests in the minute before the current time
Idea: first compute the timestamp one minute ago in the log's date format, then match and count.
$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -vdate=$date '$0~date{c++}END{print c}' access.log
$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -vdate=$date '$4>="["date":00" && $4<="["date":59"{c++}END{print c}' access.log
$ grep -c $(date -d '-1 minute' +%d/%b/%Y:%H:%M) access.log
Explanation: date +%d/%b/%Y:%H:%M prints, for example, 09/Apr/2016:01:55
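To make the pattern testable without depending on the current clock, the sketch below pins the minute to a fixed string in place of the `date -d` call (log lines are invented):

```shell
# Two fake entries: one in minute 01:55, one in 01:56.
printf 'h - - [09/Apr/2016:01:55:10 +0800] "GET / HTTP/1.1"\nh - - [09/Apr/2016:01:56:00 +0800] "GET / HTTP/1.1"\n' > /tmp/ts_demo.log
d='09/Apr/2016:01:55'   # stands in for $(date -d '-1 minute' +%d/%b/%Y:%H:%M)
# $0~date matches any record containing the minute-precision timestamp.
hits=$(awk -vdate="$d" '$0~date{c++}END{print c}' /tmp/ts_demo.log)
echo "$hits"
```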
5) Top 10 requested pages ($request)
$ awk '{a[$7]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log
6) Total bytes served per URL ($body_bytes_sent)
$ awk '{a[$7]++;size[$7]+=$10}END{for(i in a)print a[i],size[i],i}' access.log
7) Count status codes per IP ($status)
$ awk '{a[$1" "$9]++}END{for(i in a)print i,a[i]}' access.log
8) IPs that received status code 404, with counts
$ awk '{if($9~/404/)a[$1" "$9]++}END{for(i in a)print i,a[i]}' access.log
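Composite subscripts built by concatenation (the `$1" "$9` idea) can be checked on a two-field stand-in, where the second field plays the role of $status (the data is invented):

```shell
# Two 404s from one IP, one 200 from another.
printf '1.1.1.1 404\n1.1.1.1 404\n2.2.2.2 200\n' > /tmp/status_demo.log
# Concatenating IP and status into one subscript counts per (IP, status) pair.
notfound=$(awk '$2~/404/{a[$1" "$2]++}END{for(i in a)print i,a[i]}' /tmp/status_demo.log)
echo "$notfound"
```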
2. Comparing two files
The contents of the files are as follows:
$ cat a
1
2
3
4
5
6
$ cat b
3
4
5
6
7
8
1) Find common records
Method 1: $ awk 'FNR==NR{a[$0];next}($0 in a)' a b
3
4
5
6
Before explaining, look at the difference between FNR and NR:
$ awk '{print NR,$0}' a b
1 1
2 2
3 3
4 4
5 5
6 6
7 3
8 4
9 5
10 6
11 7
12 8
$ awk '{print FNR,$0}' a b
1 1
2 2
3 3
4 4
5 5
6 6
1 3
2 4
3 5
4 6
5 7
6 8
You can see that NR increments by 1 for every record processed, so awk effectively treats the two files as one merged stream.
FNR also increments by 1 per record, but it restarts from 1 when the second file begins.
Explanation: FNR and NR are built-in variables. FNR==NR is a common idiom for processing two files; as shown above, awk otherwise treats the two files as one.
While the first file is being processed, FNR equals NR, so the condition is true and a[$0];next runs: each record is stored as a subscript of array a (with no element value), and next skips the remaining expressions, much like continue.
This continues until awk reaches file b, where FNR no longer equals NR (FNR restarts at 1 while NR continues at 7). The condition is then false, a[$0];next is skipped, and ($0 in a) runs instead: for each record of b it tests whether the record exists as a subscript in a, and if so prints the record, and so on.
This may be easier to understand:
$ awk 'FNR==NR{a[$0]}NR>FNR{if($0 in a)print $0}' a b
Method 2:
$ awk 'FNR==NR{a[$0]=1;next}(a[$0])' a b   # the parentheses are optional
$ awk 'FNR==NR{a[$0]=1;next}(a[$0]==1)' a b
$ awk 'FNR==NR{a[$0]=1;next}{if(a[$0]==1)print}' a b
$ awk 'FNR==NR{a[$0]=1}FNR!=NR&&a[$0]==1' a b
Note: the later a[$0] is not assigning an array element; it uses each record of b as a subscript to look up array a. If a[record of b] yields the element 1, the condition is true (equal to 1) and the record is printed; otherwise no element is found and the condition is false.
Method 3:
$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]==1' a b
$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]==1' a b
Explanation: ARGIND is a gawk built-in variable identifying which file is being processed: 1 for the first file, 2 for the second. FILENAME is also a built-in variable, holding the name of the current input file.
Method 4: $ sort a b |uniq -d
Method 5: $ grep -f a b
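The intersection idiom can be checked end to end on the article's two files:

```shell
# Recreate files a and b from the article under /tmp.
printf '1\n2\n3\n4\n5\n6\n' > /tmp/cmp_a
printf '3\n4\n5\n6\n7\n8\n' > /tmp/cmp_b
# Records of b whose text exists as a subscript of array a are printed.
common=$(awk 'FNR==NR{a[$0];next}($0 in a)' /tmp/cmp_a /tmp/cmp_b)
echo "$common"
```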
2) Find differing records (same as above, negated)
$ awk 'FNR==NR{a[$0];next}!($0 in a)' a b
$ awk 'FNR==NR{a[$0]=1;next}!a[$0]' a b
$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]!=1' a b
$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]!=1' a b
7
8
Method 2: $ sort a b |uniq -u  # note: this prints lines unique to either file (1 2 7 8), not only those unique to b
Method 3: $ grep -vf a b
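Negating the membership test gives the records of b that are absent from a, which can likewise be verified:

```shell
# Recreate files a and b from the article under /tmp.
printf '1\n2\n3\n4\n5\n6\n' > /tmp/cmp_a
printf '3\n4\n5\n6\n7\n8\n' > /tmp/cmp_b
# !($0 in a) keeps only b's records that never appeared in a.
only_b=$(awk 'FNR==NR{a[$0];next}!($0 in a)' /tmp/cmp_a /tmp/cmp_b)
echo "$only_b"
```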
3. Merging two files
1) Merge the gender column of file d into file c
$ cat c
zhangsan 100
lisi 200
wangwu 300
$ cat d
zhangsan man
lisi woman
Method 1: $ awk 'FNR==NR{a[$1]=$2;next}{print $0,a[$1]}' d c
zhangsan 100 man
lisi 200 woman
wangwu 300
Method 2: $ awk 'FNR==NR{a[$1]=$2}NR>FNR{print $0,a[$1]}' d c
Explanation: NR==FNR matches the first file (d) and builds a lookup table keyed by the name in column one; NR>FNR matches the second file (c), whose lines are printed with the gender looked up from the table (empty for wangwu, who has no entry in d).
Method 3: $ awk 'ARGIND==1{a[$1]=$2}ARGIND==2{print $0,a[$1]}' d c
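A runnable check of the merge, processing d first so the gender lookup table exists before c is read (file names under /tmp are invented for the demo):

```shell
printf 'zhangsan 100\nlisi 200\nwangwu 300\n' > /tmp/merge_c
printf 'zhangsan man\nlisi woman\n' > /tmp/merge_d
# a[name]=gender from d; then each line of c is printed with its gender appended.
merged=$(awk 'FNR==NR{a[$1]=$2;next}{print $0,a[$1]}' /tmp/merge_d /tmp/merge_c)
merged_first=$(printf '%s\n' "$merged" | head -n1)
echo "$merged_first"
```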
2) Merge the service names in a.txt onto one line per IP
$ cat a.txt
192.168.2.100:httpd
192.168.2.100:tomcat
192.168.2.101:httpd
192.168.2.101:postfix
192.168.2.102:mysqld
192.168.2.102:httpd
$ awk -F: -vOFS=":" '{a[$1]=a[$1]" "$2}END{for(i in a)print i,a[i]}' a.txt
$ awk -F: -vOFS=":" '{a[$1]=$2" "a[$1]}END{for(i in a)print i,a[i]}' a.txt
192.168.2.100: httpd tomcat
192.168.2.101: httpd postfix
192.168.2.102: mysqld httpd
Explanation: a[$1]=a[$1]" "$2 uses the first column (the IP) as the subscript and appends the second column (the service name) to whatever the element already holds, so each IP accumulates all of its service names; the second variant prepends instead of appending.
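The group-and-concatenate pattern can be verified on synthetic data (the IPs and service names below are made up):

```shell
printf '10.0.0.1:httpd\n10.0.0.1:tomcat\n10.0.0.2:mysqld\n' > /tmp/svc_demo.txt
# Append each service name to the string accumulated under its IP;
# sort makes the unordered for-in output deterministic.
grouped=$(awk -F: -vOFS=":" '{a[$1]=a[$1]" "$2}END{for(i in a)print i,a[i]}' /tmp/svc_demo.txt | sort)
grouped_first=$(printf '%s\n' "$grouped" | head -n1)
echo "$grouped_first"
```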
3) Prepend the first line to each following line
$ cat a.txt
xiaoli
a 100
b 110
c 120
$ awk 'NF==1{a=$0;next}{print a,$0}' a.txt
$ awk 'NF==1{a=$0}NF!=1{print a,$0}' a.txt
xiaoli a 100
xiaoli b 110
xiaoli c 120
4. Print columns in reverse order
$ cat a.txt
xiaoli a 100
xiaoli b 110
xiaoli c 120
$ awk '{for(i=NF;i>=1;i--)printf "%s ",$i;print s}' a.txt
100 a xiaoli
110 b xiaoli
120 c xiaoli
$ awk '{for(i=NF;i>=1;i--)if(i==1)printf $i"\n";else printf $i" "}' a.txt
Explanation: loop from NF down to 1 so the last field is printed first. print s (s is unset) or print "" merely emits the newline at the end of each record.
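The reverse-print loop, written with an explicit %s format (safer than using $i itself as the printf format string), on a one-line example:

```shell
# if(i==1) ends the line; otherwise fields are separated by a space.
reversed=$(echo 'xiaoli a 100' | awk '{for(i=NF;i>=1;i--)if(i==1)printf "%s\n",$i;else printf "%s ",$i}')
echo "$reversed"
```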
5. Print from the second column to the last
Method 1: $ awk '{for(i=2;i<=NF;i++)if(i==NF)printf $i"\n";else printf $i" "}' a.txt
Method 2: $ awk '{$1=""}{print}' a.txt
a 100
b 110
c 120
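A compact variant of Method 1 that folds the separator choice into one printf (my rewrite, not the article's exact command):

```shell
# Print fields 2..NF, ending with a newline on the last field.
from2=$(echo 'xiaoli a 100' | awk '{for(i=2;i<=NF;i++)printf "%s%s",$i,(i==NF?"\n":" ")}')
echo "$from2"
```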
6. Put the first column of file c into the third column of file d
$ cat c
a
b
c
$ cat d
1 one
2 two
3 three
Method 1: $ awk 'FNR==NR{a[NR]=$0;next}{$3=a[FNR]}1' c d
Note: NR numbers the lines of c and is used as the array subscript; when processing d, the third column is set to the element whose subscript is FNR (which recounts from 1 to 3 for d).
Method 2: $ awk '{getline f<"c";print $0,f}' d
1 one a
2 two b
3 three c
1) Replace the second column
$ awk '{getline f<"c";gsub($2,f,$2)}1' d
1 a
2 b
3 c
2) Replace only "two" in the second column
$ awk '{getline f<"c";gsub("two",f,$2)}1' d
1 one
2 b
3 three
7. Sum of Numbers
Method 1:$ seq 1 |awk ' {sum+=$0}end{print sum} '
Method 2:$ awk ' Begin{sum=0;i=1;while (i<=100) {sum+=i;i++}print sum} '
Method 3:$ awk ' Begin{for (i=1;i<=100;i++) sum+=i}end{print sum} '/dev/null
Method 4:$ Seq-s + 1 |BC
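Two of the methods, checked against the known result 1+2+...+100 = 5050 (the loop version prints directly from BEGIN, a small variation on Method 3 that needs no input file):

```shell
# Method 1: pipe seq into an accumulating awk program.
sum_pipe=$(seq 100 | awk '{sum+=$0}END{print sum}')
# BEGIN-only loop: no input records are read at all.
sum_loop=$(awk 'BEGIN{for(i=1;i<=100;i++)sum+=i;print sum}')
echo "$sum_pipe $sum_loop"
```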
8. Insert a blank line (or other content) every three lines
Method 1: $ awk '1;NR%3==0{printf "\n"}' a
Method 2: $ awk '{print NR%3?$0:$0"\n"}' a
Method 3: $ sed '4~3s/^/\n/' a
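A quick structural check of Method 1 on four synthetic lines: a blank line should appear after line 3, giving five lines of output in total.

```shell
printf '1\n2\n3\n4\n' > /tmp/every3_demo.txt
# '1' prints every record; after every third record an empty line is added.
spaced=$(awk '1;NR%3==0{print ""}' /tmp/every3_demo.txt)
# Command substitution strips trailing newlines, so the blank line after
# record 3 survives but a trailing blank at end-of-output would not.
line_count=$(printf '%s\n' "$spaced" | wc -l)
echo "$line_count"
```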
9. String splitting
Method 1:
$ echo "Hello" |awk-f "{for (i=1;i<=nf;i++) print $i} '
$ echo "Hello" |awk-f "{i=1;while (I<=NF) {print $i; i++}} '
H
E
L
L
O
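Empty-FS splitting is a gawk extension and is not portable to every awk. As an alternative (my suggestion, not from the article), `fold -w1` achieves the same one-character-per-line split:

```shell
# fold -w1 wraps the input at a width of one character per line.
chars=$(printf 'hi' | fold -w1)
echo "$chars"
```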
Linux awk use case summary