Knowledge Points:
1) Arrays
An array is a variable that stores a series of values. Each value is accessed through its index (subscript).
Arrays in awk are called associative arrays because the subscript can be a number or a string.
Subscripts are often called keys. awk typically keeps the keys and values in an internal hash table, so a for (key in array) loop visits the elements in an unspecified order, not in insertion order.
Array syntax: array[index]=value
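A minimal sketch of the array operations described above, assuming any POSIX awk. array_demo is a hypothetical function name and the input words are made-up sample data. It shows a counter that starts from an unset element, a numeric key stored as a string, an `in` membership test and `delete`:

```shell
#!/bin/sh
# Sketch: basic associative-array operations in POSIX awk.
# array_demo is a hypothetical helper name; input comes from standard input.
array_demo() {
    awk '
    { count[$0]++ }                  # an unset element starts as 0, ++ makes it 1, 2, ...
    END {
        count[42] = "numeric key"    # the number 42 is stored under the string key "42"
        if ("apple" in count) print "apple:", count["apple"]
        delete count["banana"]       # remove one element
        if (!("banana" in count)) print "banana deleted"
        print "key 42:", count["42"]
    }'
}

printf 'apple\nbanana\napple\n' | array_demo
```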
1. Nginx log analysis
Log format: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
Sample log line: 27.189.231.39 - - [09/Apr/2016:17:21:23 +0800] "GET /Public/index/images/icon_pre.png HTTP/1.1" 200 44668 "http://www.test.com/Public/index/css/global.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" "-"
With awk's default whitespace splitting, $1 is the client IP, $4 is the timestamp (with a leading [), $7 is the requested path, $9 is the status code and $10 is the body size in bytes.
1) Find the 10 IPs that made the most requests
Idea: deduplicate the first column (the client IP) and output how many times each IP occurs.
Method 1: $ awk '{a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log
Method 2: $ awk '{print $1}' access.log | sort | uniq -c | sort -k1 -nr | head -n10
Description: a[$1]++ creates an array a that uses the first column (the IP) as its subscript. An element that has not been set yet counts as 0, and ++ adds 1 every time that IP is seen: the first occurrence sets it to 1, the second to 2, and so on. Each IP is therefore stored only once as a key (deduplication), and its value is the number of occurrences.
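The counting in Method 1 can be wrapped in a small function. This is a sketch under stated assumptions: top_ips is a hypothetical helper name, the log arrives on standard input, and the sort runs outside awk. That ordering is equivalent to Method 1 and easier to reuse:

```shell
#!/bin/sh
# Sketch: count requests per client IP ($1) and keep the N most frequent.
# top_ips is a hypothetical helper name; the log is read from standard input.
top_ips() {
    n=${1:-10}
    awk '{ a[$1]++ }                          # one key per IP, value = occurrences
         END { for (ip in a) print a[ip], ip }' |
        sort -k1,1nr | head -n "$n"           # highest count first
}

# Made-up sample: only the first field matters here.
printf '%s\n' '10.0.0.1 -' '10.0.0.2 -' '10.0.0.1 -' '10.0.0.3 -' '10.0.0.1 -' '10.0.0.2 -' |
    top_ips 2
```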
2) Find the IPs that made more than 100 requests
Method 1: $ awk '{a[$1]++}END{for(i in a){if(a[i]>100) print i,a[i]}}' access.log
Method 2: $ awk '{a[$1]++; if(a[$1]>100){b[$1]++}}END{for(i in b){print i,a[i]}}' access.log
Description: Method 1 stores every count in array a and checks the threshold in the END block while printing. Method 2 checks while counting. As soon as an IP's count exceeds 100, the IP becomes a key of array b, and the END block prints only the keys of b together with their final counts from a.
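A minimal sketch of Method 2 with the threshold passed in as a variable. ips_over is a hypothetical helper name. The limit is forced to a number in BEGIN so the comparison is always numeric:

```shell
#!/bin/sh
# Sketch of Method 2: mark an IP in array b once its running count passes the limit,
# so the END block only loops over qualifying IPs.
# ips_over is a hypothetical helper name; the limit is passed with -v.
ips_over() {
    awk -v limit="${1:-100}" '
    BEGIN { limit += 0 }                       # force a numeric comparison
    { a[$1]++; if (a[$1] > limit) b[$1] = 1 }
    END { for (ip in b) print ip, a[ip] }'
}

# With a limit of 2, only "x" (3 requests) is reported.
printf '%s\n' x y x x y | ips_over 2
```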
3) Find the 10 IPs that made the most requests on April 9, 2016
Idea: first filter the log down to that time range, then deduplicate and count occurrences.
Method 1: $ awk '$4>="[09/Apr/2016:00:00:01" && $4<="[09/Apr/2016:23:59:59" {a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log
Method 2: $ sed -n '/\[09\/Apr\/2016:00:00:01/,/\[09\/Apr\/2016:23:59:59/p' access.log | awk '{print $1}' | sort | uniq -c | sort -k1 -nr | head -n10  # a log line with exactly the start time must exist; without an exact end match, sed prints through to the end of the file
Note: Method 1 compares $4 as strings. This is correct here because both bounds fall on the same day, and nginx writes the day of the month zero-padded (09).
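Another way to filter by day is to match the day prefix of $4 rather than comparing timestamps. The sketch below assumes the day is passed exactly as it appears in $time_local, zero-padded. top_ips_on_day is a hypothetical helper name and the log lines are made-up samples. Unlike a range comparison, a prefix match stays correct across month and year boundaries, where dd/Mon/yyyy strings do not sort chronologically:

```shell
#!/bin/sh
# Sketch: top IPs for one day, selected by the day prefix of $4 (e.g. "[09/Apr/2016:").
# top_ips_on_day is a hypothetical helper; $1 is the day, $2 the number of IPs to keep.
top_ips_on_day() {
    awk -v day="[$1:" 'index($4, day) == 1 { a[$1]++ }
        END { for (ip in a) print a[ip], ip }' |
        sort -k1,1nr | head -n "${2:-10}"
}

printf '%s\n' \
    '1.1.1.1 - - [09/Apr/2016:17:21:23 +0800] "GET / HTTP/1.1" 200 5' \
    '1.1.1.1 - - [09/Apr/2016:18:00:00 +0800] "GET / HTTP/1.1" 200 5' \
    '2.2.2.2 - - [10/Apr/2016:00:00:01 +0800] "GET / HTTP/1.1" 200 5' |
    top_ips_on_day 09/Apr/2016
```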
4) Count the requests made in the minute before the current time
Idea: first format the time one minute ago the way the log writes timestamps, then count the lines that match.
$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -v date="$date" '$0~date{c++}END{print c}' access.log
$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -v date="$date" '$4>="["date":00" && $4<="["date":59"{c++}END{print c}' access.log
$ grep -c $(date -d '-1 minute' +%d/%b/%Y:%H:%M) access.log
Description: date +%d/%b/%Y:%H:%M prints a timestamp such as 09/Apr/2016:01:55. The -d option is specific to GNU date.
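A sketch that makes the minute an optional argument, so the count can be tested without depending on the clock. Assumptions: count_minute is a hypothetical helper name; the default uses GNU date -d, which BSD/macOS date does not support; and LC_ALL=C is set so %b produces English month abbreviations. Printing c + 0 gives 0 rather than an empty line when nothing matches:

```shell
#!/bin/sh
# Sketch: count requests logged in one minute (dd/Mon/YYYY:HH:MM).
# Without an argument, the minute defaults to one minute ago via GNU date -d.
# count_minute is a hypothetical helper name; the log is read from standard input.
count_minute() {
    minute=${1:-$(LC_ALL=C date -d '-1 minute' '+%d/%b/%Y:%H:%M')}
    awk -v m="[$minute:" 'index($4, m) == 1 { c++ } END { print c + 0 }'
}

printf '%s\n' \
    '1.1.1.1 - - [09/Apr/2016:17:21:23 +0800] "GET / HTTP/1.1" 200 5' \
    '1.1.1.1 - - [09/Apr/2016:17:21:59 +0800] "GET / HTTP/1.1" 200 5' \
    '2.2.2.2 - - [09/Apr/2016:17:22:00 +0800] "GET / HTTP/1.1" 200 5' |
    count_minute 09/Apr/2016:17:21
```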
5) Find the 10 most requested pages (the path in $request, field $7)
$ awk '{a[$7]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log
6) Count the requests and the total bytes sent for each URL ($body_bytes_sent)
$ awk '{a[$7]++; size[$7]+=$10}END{for(i in a) print a[i],size[i],i}' access.log
7) Count each status code per IP ($status)
$ awk '{a[$1" "$9]++}END{for(i in a) print i,a[i]}' access.log
8) Find the IPs that received status code 404 and how many times
$ awk '{if($9==404) a[$1" "$9]++}END{for(i in a) print i,a[i]}' access.log
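Instead of concatenating $1 and $9 with a space, awk can also use a two-part subscript, which joins the keys with SUBSEP so that split() can recover them. This is a sketch: status_by_ip is a hypothetical helper name and the log lines are made-up samples. Passing a status code keeps only that code, and the output is sorted because for (k in n) has no fixed order:

```shell
#!/bin/sh
# Sketch: count (IP, status) pairs with the subscript n[$1, $9].
# status_by_ip is a hypothetical helper; an optional argument filters on one status code.
status_by_ip() {
    awk -v want="$1" '
    want == "" || $9 == want { n[$1, $9]++ }   # keys joined with SUBSEP
    END {
        for (k in n) {
            split(k, part, SUBSEP)             # recover the IP and the status code
            print part[1], part[2], n[k]
        }
    }' | LC_ALL=C sort
}

printf '%s\n' \
    '1.1.1.1 - - [09/Apr/2016:17:21:23 +0800] "GET /x HTTP/1.1" 404 0' \
    '1.1.1.1 - - [09/Apr/2016:17:21:24 +0800] "GET /y HTTP/1.1" 404 0' \
    '1.1.1.1 - - [09/Apr/2016:17:21:25 +0800] "GET / HTTP/1.1" 200 5' \
    '2.2.2.2 - - [09/Apr/2016:17:21:26 +0800] "GET /x HTTP/1.1" 404 0' |
    status_by_ip 404
```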
2. Comparing two files
The files contain the following:
$ cat a
1
2
3
4
5
6
$ cat b
3
4
5
6
7
8
1) Find the records that appear in both files
Method 1: $ awk 'FNR==NR{a[$0];next}($0 in a)' a b
3
4
5
6
Before explaining, look at the difference between FNR and NR:
$ awk '{print NR,$0}' a b
1 1
2 2
3 3
4 4
5 5
6 6
7 3
8 4
9 5
10 6
11 7
12 8
$ awk '{print FNR,$0}' a b
1 1
2 2
3 3
4 4
5 5
6 6
1 3
2 4
3 5
4 6
5 7
6 8
As you can see, NR increases by 1 for every record and keeps counting across files, which shows that awk treats the two files as one merged input.
FNR also increases by 1 for every record, but it starts again from 1 when awk begins reading the second file.
Description: FNR and NR are built-in variables. The condition FNR==NR is commonly used when processing two files, precisely because awk reads the two files as a single stream.
While the first file (a) is being read, FNR equals NR, so the condition is true and a[$0];next runs. Each record is stored as a subscript of array a (with no value), and next skips the rest of the program for the current record, much like continue in a loop.
When awk reaches file b, FNR no longer equals NR (FNR restarts at 1 while NR continues with 7). The condition is false, so a[$0];next does not run. awk goes straight to the ($0 in a) pattern, which checks whether the current record of b is a key of array a and prints the record if it is. The remaining records of b are handled the same way.
This version may be easier to understand:
$ awk 'FNR==NR{a[$0]}NR>FNR{if($0 in a) print $0}' a b
Method 2:
$ awk 'FNR==NR{a[$0]=1;next}(a[$0])' a b  # the parentheses are optional
$ awk 'FNR==NR{a[$0]=1;next}(a[$0]==1)' a b
$ awk 'FNR==NR{a[$0]=1;next}{if(a[$0]==1) print}' a b
$ awk 'FNR==NR{a[$0]=1}FNR!=NR&&a[$0]==1' a b
Note: the a[$0] that follows next does not build the array. It looks up an element of a, using the current record of b as the subscript. If the element is 1, the condition is true and the record is printed; otherwise the element is empty and the condition is false. (Looking up a[$0] this way also creates an empty element, which does no harm here.)
Method 3:
$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]==1' a b
$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]==1' a b
Description: ARGIND is a built-in variable (gawk only) that identifies the file being processed: 1 for the first file, 2 for the second. FILENAME is also a built-in variable; it holds the name of the current input file.
Method 4: $ sort a b | uniq -d
Method 5: $ grep -Fxf a b  # -F: fixed strings, -x: match whole lines
2) Find the records that differ (the reverse of the above: lines of b that are not in a)
$ awk 'FNR==NR{a[$0];next}!($0 in a)' a b
$ awk 'FNR==NR{a[$0]=1;next}!a[$0]' a b
$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]!=1' a b
$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]!=1' a b
7
8
Method 2: $ sort a b | uniq -u  # lines found in only one of the files: 1 2 7 8
Method 3: $ grep -vFxf a b
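The two FNR==NR one-liners above can be wrapped as reusable helpers. This is a minimal sketch; in_both and only_in_second are hypothetical names and the files are temporary copies of a and b. One caveat worth knowing: if the first file is empty, FNR==NR also holds for every line of the second file, so both helpers print nothing:

```shell
#!/bin/sh
# Sketch: the FNR==NR idiom as two hypothetical helpers taking (file1, file2).
in_both()        { awk 'FNR==NR { seen[$0]; next } ($0 in seen)'  "$1" "$2"; }
only_in_second() { awk 'FNR==NR { seen[$0]; next } !($0 in seen)' "$1" "$2"; }

tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 > "$tmp/a"
printf '%s\n' 3 4 5 6 7 8 > "$tmp/b"
in_both "$tmp/a" "$tmp/b"           # prints 3, 4, 5, 6 (one per line)
only_in_second "$tmp/a" "$tmp/b"    # prints 7, 8
rm -r "$tmp"
```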
3. Merging two files
1) Merge the gender column of file d into file c
$ cat c
zhangsan 100
lisi 200
wangwu 300
$ cat d
zhangsan man
lisi woman
Method 1: $ awk 'FNR==NR{a[$1]=$0;next}{print a[$1],$2}' c d
zhangsan 100 man
lisi 200 woman
Method 2: $ awk 'FNR==NR{a[$1]=$0}NR>FNR{print a[$1],$2}' c d
Description: NR==FNR matches only the first file and NR>FNR matches only the second. The first column (the name) is the array subscript that links the two files, so only names listed in d are printed; wangwu has no entry in d and is skipped.
Method 3: $ awk 'ARGIND==1{a[$1]=$0}ARGIND==2{print a[$1],$2}' c d
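If every line of c should be kept, including names missing from d, the file order can be swapped. This is a variant, not one of the methods above. add_gender is a hypothetical helper name whose arguments are c then d, and "-" is a hypothetical placeholder for names that d does not list:

```shell
#!/bin/sh
# Sketch: read d first (name -> gender), then print every line of c with the gender
# appended, or "-" when d has no entry. Arguments: $1 = c, $2 = d.
add_gender() {
    awk 'FNR==NR { sex[$1] = $2; next }
         { print $0, (($1 in sex) ? sex[$1] : "-") }' "$2" "$1"
}

tmp=$(mktemp -d)
printf '%s\n' 'zhangsan 100' 'lisi 200' 'wangwu 300' > "$tmp/c"
printf '%s\n' 'zhangsan man' 'lisi woman' > "$tmp/d"
add_gender "$tmp/c" "$tmp/d"
rm -r "$tmp"
```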
2) Merge the service names in a.txt so that each IP appears on one line
$ cat a.txt
192.168.2.100:httpd
192.168.2.100:tomcat
192.168.2.101:httpd
192.168.2.101:postfix
192.168.2.102:mysqld
192.168.2.102:httpd
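$ awk -F: -v OFS=":" '{a[$1]=(a[$1]=="" ? $2 : a[$1]" "$2)}END{for(i in a) print i,a[i]}' a.txt
$ awk -F: -v OFS=":" '{a[$1]=(a[$1]=="" ? $2 : $2" "a[$1])}END{for(i in a) print i,a[i]}' a.txt  # same, but lists the services in reverse order
192.168.2.100:httpd tomcat
192.168.2.101:httpd postfix
192.168.2.102:mysqld httpd
Description: a[$1] uses the first column (the IP) as the subscript. Each time the same IP appears, its service name $2 is joined to the value already stored in a[$1]. The first command appends the name and the second prepends it. Each element therefore ends up holding all the services for that IP. The output shown is from the first command; the order of the IPs may vary because for (i in a) has no fixed order.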
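A sketch of the same grouping, with the output piped through sort so the IP order is stable. group_services is a hypothetical helper name and the input is the a.txt content shown above:

```shell
#!/bin/sh
# Sketch: collect every service per IP, separated by single spaces without a leading
# space, then sort because for (ip in svc) has no fixed order.
# group_services is a hypothetical helper name; input is "ip:service" lines on stdin.
group_services() {
    awk -F: -v OFS=":" '
    { svc[$1] = (svc[$1] == "" ? $2 : svc[$1] " " $2) }
    END { for (ip in svc) print ip, svc[ip] }' | LC_ALL=C sort
}

printf '%s\n' 192.168.2.100:httpd 192.168.2.100:tomcat 192.168.2.101:httpd \
    192.168.2.101:postfix 192.168.2.102:mysqld 192.168.2.102:httpd | group_services
```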
3) Prepend the first line to each of the following lines
$ cat a.txt
xiaoli
a 100
b 110
c 120
$ awk 'NF==1{a=$0;next}{print a,$0}' a.txt
$ awk 'NF==1{a=$0}NF!=1{print a,$0}' a.txt
xiaoli a 100
xiaoli b 110
xiaoli c 120
4. Print the columns of each line in reverse order
$ cat a.txt
xiaoli a 100
xiaoli b 110
xiaoli c 120
$ awk '{for(i=NF;i>=1;i--){printf "%s ", $i}print s}' a.txt
100 a xiaoli
110 b xiaoli
120 c xiaoli
$ awk '{for(i=NF;i>=1;i--) if(i==1) printf "%s\n", $i; else printf "%s ", $i}' a.txt
Description: loop over the fields from NF down to 1, so the last field is printed first. print s (s is an unset variable, i.e. an empty string) or print "" ends the line with a newline. The first command leaves a trailing space on each line; the second does not.
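A sketch that builds the reversed line as a string and joins the fields with OFS, so no trailing separator is needed. reverse_fields is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: reverse the fields of each line, joined with OFS.
# reverse_fields is a hypothetical helper name; input is read from stdin.
reverse_fields() {
    awk '{
        line = $NF                             # start with the last field
        for (i = NF - 1; i >= 1; i--) line = line OFS $i
        print line
    }'
}

printf '%s\n' 'xiaoli a 100' 'xiaoli b 110' | reverse_fields
```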
5. Print from the second column to the last
Method 1: $ awk '{for(i=2;i<=NF;i++) if(i==NF) printf "%s\n", $i; else printf "%s ", $i}' a.txt
Method 2: $ awk '{$1=""}{print}' a.txt  # leaves a leading space, because the emptied $1 is still joined with OFS
a 100
b 110
c 120
6. Insert the first column of file c as the third column of file d
$ cat c
a
b
c
$ cat d
1 One
2
3 Three
Method 1: $ awk 'FNR==NR{a[NR]=$0;next}{$3=a[FNR]}1' c d
Note: the line number NR is the subscript and each line of c is the element. When d is processed, the third column is set to the element whose subscript is FNR, which restarts at 1 and runs to 3 for the second file.
Method 2: $ awk '{getline f < "c"; print $0,f}' d
1 One a
2 b
3 Three c
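Method 2 does not check the return value of getline. getline var < file returns 1 when it reads a line, 0 at end of file and -1 on error, and at end of file the variable keeps its previous value. The sketch below checks that return value so that lines of d past the end of c are printed unchanged. append_from is a hypothetical helper name: $1 is the file supplying the extra column and standard input is the main file:

```shell
#!/bin/sh
# Sketch: append one line of the side file to each input line, while it has lines left.
# append_from is a hypothetical helper name.
append_from() {
    awk -v src="$1" '{
        if ((getline extra < src) > 0) print $0, extra
        else print $0                  # side file exhausted or unreadable
    }'
}

tmp=$(mktemp -d)
printf '%s\n' a b > "$tmp/c"
printf '%s\n' '1 One' 2 '3 Three' | append_from "$tmp/c"
rm -r "$tmp"
```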
1) Replace the second column
$ awk '{getline f < "c"; gsub($2,f,$2)}1' d
1 a
2 b
3 c
2) Fill in the second column only where it is empty
$ awk '{getline f < "c"; gsub(/^$/,f,$2)}1' d
1 One
2 b
3 Three
7. Summing numbers
Method 1: $ seq 1 100 | awk '{sum+=$0}END{print sum}'
Method 2: $ awk 'BEGIN{sum=0;i=1;while(i<=100){sum+=i;i++}print sum}'
Method 3: $ awk 'BEGIN{for(i=1;i<=100;i++) sum+=i}END{print sum}' /dev/null
Method 4: $ seq -s + 1 100 | bc
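The summing pattern from Method 1 generalizes to any column. This is a sketch; sum_col is a hypothetical helper name. Printing s + 0 makes empty input produce 0 instead of an empty line:

```shell
#!/bin/sh
# Sketch: sum one column (default: the first) of standard input.
# sum_col is a hypothetical helper name.
sum_col() {
    awk -v col="${1:-1}" '{ s += $col } END { print s + 0 }'
}

seq 1 100 | sum_col
```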
8. Insert a blank line (or other content) after every three lines
Method 1: $ awk '1; NR%3==0{printf "\n"}' a
Method 2: $ awk '{print NR%3 ? $0 : $0"\n"}' a
Method 3: $ sed '4~3s/^/\n/' a
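9. Splitting a string into characters
Method 1:
$ echo "hello" | awk -F "" '{for(i=1;i<=NF;i++) print $i}'
$ echo "hello" | awk -F "" '{i=1;while(i<=NF){print $i;i++}}'
h
e
l
l
o
Method 2:
$ echo "hello" | awk '{split($0,a,""); for(i in a) print a[i]}'  # unordered
l
o
h
e
l
Note: an empty field separator, which splits a string into single characters, is supported by gawk and mawk; POSIX leaves it unspecified.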
10. Count how many times each letter occurs in a string
$ echo "A,B.C.A,B.A" | tr ",." "\n\n" | awk -F "" '{for(i=1;i<=NF;i++) a[$i]++}END{for(i in a) print i,a[i] | "sort -k2 -rn"}'
A 3
B 2
C 1
11. Sort the first column
$ awk '{a[NR]=$1}END{s=asort(a,b); for(i=1;i<=s;i++){print i,b[i]}}' a.txt
Description: each line number is a subscript of array a, and the first column is its value. asort(a, b) (a gawk extension) copies the values of a into array b, sorts them, re-indexes b from 1 and returns the number of elements, which is assigned to s. The for loop then prints b[1] through b[s], i.e. the sorted values.
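Because asort() is not part of POSIX awk, a portable alternative is to sort inside the END block yourself. The sketch below uses an insertion sort, which is fine for small inputs but O(n²) in general. Assumptions: sort_first_col is a hypothetical helper name, and the first column is numeric, since values are compared with + 0:

```shell
#!/bin/sh
# Sketch: portable replacement for asort() using an insertion sort in END.
# sort_first_col is a hypothetical helper name; values are compared numerically.
sort_first_col() {
    awk '{ v[NR] = $1 }
    END {
        for (i = 2; i <= NR; i++) {
            x = v[i]
            j = i - 1
            while (j >= 1 && v[j] + 0 > x + 0) {   # shift larger values one slot right
                v[j + 1] = v[j]
                j--
            }
            v[j + 1] = x
        }
        for (i = 1; i <= NR; i++) print i, v[i]
    }'
}

printf '%s\n' 30 10 20 | sort_first_col
```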
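12. Delete duplicate lines while keeping the original order
$ awk '!a[$0]++' file
Description: a[$0]++ returns the old count, which is 0 the first time a line is seen. !a[$0]++ is therefore true, and the line is printed, only on its first occurrence.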
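The same idiom works with any key, not only the whole line. In this sketch, dedup_by_field is a hypothetical helper name. Field 0 (the default) keys on the whole line, exactly like !a[$0]++, while field 1 keeps only the first line for each value of $1:

```shell
#!/bin/sh
# Sketch: !seen[key]++ with a configurable key field (0 = whole line).
# dedup_by_field is a hypothetical helper name.
dedup_by_field() {
    awk -v f="${1:-0}" '!seen[$f]++'
}

printf '%s\n' 'a 1' 'b 2' 'a 3' | dedup_by_field 1
```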
Blog Address: http://lizhenliang.blog.51cto.com
13. Delete a specific line
Delete the first line:
$ awk 'NR==1{next}{print $0}' file  # $0 can be omitted
$ awk 'NR!=1{print}' file
$ sed '1d' file
$ sed -n '1!p' file
14. Add a line before or after a specific line
Add a line containing txt before line 2:
$ awk 'NR==2{sub(/.*/,"txt\n&")}{print}' a.txt
$ sed '2s/.*/txt\n&/' a.txt
Add a line containing txt after line 2:
$ awk 'NR==2{sub(/.*/,"&\ntxt")}{print}' a.txt
$ sed '2s/.*/&\ntxt/' a.txt
15. Get the NIC name from an IP address
$ ifconfig | awk -F'[: ]' '/^eth/{nic=$1}/192.168.18.15/{print nic}'
16. Floating-point arithmetic (divide 46 by 100, keeping the decimal places)
$ awk 'BEGIN{print 46/100}'
$ awk 'BEGIN{printf "%.2f\n", 46/100}'
$ echo 46 | awk '{print $0/100}'
$ echo 'scale=2;46/100' | bc | sed 's/^/0/'  # bc prints .46, so sed adds the leading 0
$ printf "%.2f\n" $(echo "scale=2;46/100" | bc)
Result: 0.46
17. Replace newlines with commas
$ cat a.txt
1
2
3
After replacement: 1,2,3
Method 1:
$ awk '{s=(s ? s","$0 : $0)}END{print s}' a.txt
Description: this uses the ternary operator (a ? b : c) on the variable s. When line 1 is read, s has not been assigned and is empty, which is false, so s becomes 1. When line 2 is read, s is 1, which is true, so s becomes 1,2, and so on. The parentheses are optional.
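One subtlety of Method 1 is that its test uses the truth value of s. If the first line is 0 or empty, s is false when line 2 arrives, and that first value is silently dropped. The sketch below tests NR==1 instead and takes the separator as an argument. join_lines is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: join all input lines with a separator (default ","), keying on NR==1
# rather than on whether s is "true". join_lines is a hypothetical helper name.
join_lines() {
    awk -v sep="${1:-,}" '{ s = (NR == 1) ? $0 : s sep $0 } END { print s }'
}

printf '%s\n' 1 2 3 | join_lines
```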
Method 2:
$ tr '\n' ',' < a.txt  # the output ends with a trailing comma and no newline
Method 3:
$ sed ':a;N;s/\n/,/;$!ba' a.txt
Description: :a defines a label named a. sed has already read line 1 into the pattern space, and N appends the next line, separated by a newline (\n), so the pattern space now holds 1\n2. s/\n/,/ replaces that newline with a comma. $!ba then jumps back to label a to append the following line (b is the branch command, and $! means "on every line except the last"). On the last line the branch is not taken, and sed prints the pattern space.
Method 4:
$ sed ':a;$!N;s/\n/,/;ta' a.txt
Description: similar to the above. t is the test command, which branches to the label only if the preceding s command made a substitution.
Method 5:
$ awk '{if($0!=3) printf "%s,", $0; else print $0}' a.txt
Description: 3 is the last number in the file, so this only works when the last line is known in advance.
Method 6:
while read line; do
    a+=($line)
done < a.txt
echo ${a[*]} | sed 's/ /,/g'
Description: put each line into a bash array, then replace the spaces between the elements with commas.
18. Remove the newline after each odd-numbered line
$ cat b.txt
string
number
a
1
b
2
$ awk 'ORS=NR%2?"\t":"\n"' b.txt  # odd-numbered lines end with a tab instead of a newline
$ xargs -n2 < b.txt  # print two fields per line
string number
a 1
b 2
19. Expense statistics
$ cat a.txt
name cost count
zhangsan 8000 1
zhangsan 5000 1
lisi 1000 1
lisi 2000 1
wangwu 1500 1
zhaoliu 6000 1
zhaoliu 2000 1
zhaoliu 3000 1
Total count and total cost per person (NR>1 skips the header line):
$ awk 'NR>1{name[$1]++;number[$1]+=$3;money[$1]+=$2}END{for(i in name) print i,number[i],money[i]}' a.txt
zhaoliu 3 11000
zhangsan 2 13000
wangwu 1 1500
lisi 2 3000
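A sketch of the same totals with the output sorted by name, so repeated runs print in the same order. Assumptions: cost_summary is a hypothetical helper name, the columns are name, cost, count, and the first line is a header:

```shell
#!/bin/sh
# Sketch: per-person total count and total cost, skipping the header (NR > 1).
# cost_summary is a hypothetical helper name; output is "name count cost", sorted.
cost_summary() {
    awk 'NR > 1 { cost[$1] += $2; cnt[$1] += $3 }
         END { for (n in cost) print n, cnt[n], cost[n] }' | LC_ALL=C sort
}

printf '%s\n' 'name cost count' 'zhangsan 8000 1' 'zhangsan 5000 1' 'lisi 1000 1' \
    'lisi 2000 1' 'wangwu 1500 1' 'zhaoliu 6000 1' 'zhaoliu 2000 1' 'zhaoliu 3000 1' |
    cost_summary
```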
20. Print the multiplication table
Method 1:
$ awk 'BEGIN{for(n=0;n++<9;){for(i=0;i++<n;) printf i"x"n"="i*n" ";print ""}}'
1x1=1
1x2=2 2x2=4
1x3=3 2x3=6 3x3=9
1x4=4 2x4=8 3x4=12 4x4=16
1x5=5 2x5=10 3x5=15 4x5=20 5x5=25
1x6=6 2x6=12 3x6=18 4x6=24 5x6=30 6x6=36
1x7=7 2x7=14 3x7=21 4x7=28 5x7=35 6x7=42 7x7=49
1x8=8 2x8=16 3x8=24 4x8=32 5x8=40 6x8=48 7x8=56 8x8=64
1x9=9 2x9=18 3x9=27 4x9=36 5x9=45 6x9=54 7x9=63 8x9=72 9x9=81
Method 2:
#!/bin/bash
for ((i=1;i<=9;i++)); do
    for ((j=1;j<=i;j++)); do
        result=$(($i * $j))
        #let "result=i*j"
        echo -n "$i * $j = $result  "
    done
    echo
done
21. Print only odd or even lines
Print odd lines:
Method 1:
$ seq 1 5 | awk 'i=!i'
Description: first, note that an unset variable starts as 0 in numeric context and as the empty string in string context.
When the first record is read, awk evaluates the pattern i=!i. Since i is unset, this is i=!0, where ! is logical negation: 0 and the empty string count as false, and any other value counts as true, so !0 is 1. i becomes 1, the pattern is true, and the first record is printed.
Why is the record printed without print? When a pattern has no action, awk prints the whole record by default.
For the second record, i is 1, so i=!1 sets i to 0. The pattern is false and nothing is printed.
For the third record, i is 0 again, so it becomes 1 and the record is printed, and so on.
So the pattern never looks at the content of the record; it simply alternates between true and false.
Method 2:
$ seq 1 5 | awk 'NR%2!=0'
Method 3:
$ seq 1 5 | sed -n '1~2p'
Description: first~step (a GNU sed extension) starts at line 1 and prints every second line.
Method 4:
$ seq 1 5 | sed -n 'p;n'
Note: p prints the current line (1), then n reads the next line (2) into the pattern space. Because -n disables automatic printing and no p follows, line 2 is not printed. The next cycle starts with line 3, and so on.
1
3
5
Print even lines:
$ seq 1 5 | awk '!(i=!i)'
$ seq 1 5 | awk 'NR%2==0'
$ seq 1 5 | sed -n '0~2p'
$ seq 1 5 | sed -n 'n;p'
Description: n reads the next line (2) into the pattern space and p prints it, so 2 is output; the same happens for 4.