awk: Counting Occurrences (repost)

Source: Internet
Author: User

Knowledge Points:

1) array

An array is a variable that stores a series of values, each of which is accessed through an index.

Arrays in awk are called associative arrays because the subscript (index) can be either a number or a string.

Subscripts are often called keys. The keys and values of an array are stored in a hash table inside awk, so the order in which for (i in array) visits the elements is unspecified.

Array format: array[index]=value

1. nginx log analysis

Log format: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

Log record: 27.189.231.39 - - [09/Apr/2016:17:21:23 +0800] "GET /Public/index/images/icon_pre.png HTTP/1.1" 200 44668 "http://www.test.com/Public/index/css/global.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" "-"

1) Find the 10 IPs with the most requests in the log

Idea: deduplicate the first column and output the number of times each value occurs.

Method 1: $ awk '{a[$1]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log

Method 2: $ awk '{print $1}' access.log |sort |uniq -c |sort -k1 -nr |head -n10

Description: a[$1]++ creates an array a that uses the first column (the IP) as the subscript and applies the ++ operator to the element, whose initial value is 0. The first time an IP is seen, its element becomes 1; each time the same IP appears again, the element is incremented by 1, so an IP that appears twice has the value 2, and so on. Because each IP is stored under a single subscript, this both deduplicates the IPs and counts their occurrences.
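
The counting idiom described above can be tried without a real log file. The following is a minimal sketch, assuming a handful of made-up IP addresses as input (the function name top_ips and the sample data are hypothetical, not from the original post). It sorts outside awk rather than through a pipe inside the END block, which is only a stylistic choice.

```shell
# Hypothetical sample input: one client IP per line stands in for column 1 of access.log.
top_ips() {
  printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2 |
  # count[$1]++ creates the element on first sight and increments it afterwards
  awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |
  # for-in order is unspecified, so sort by count (descending), then by IP
  sort -k1,1nr -k2,2 | head -n 10
}
top_ips
```

Each output line holds a count followed by the IP it belongs to, with the most frequent IP first.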

2) Find the IPs with more than 100 requests in the log

Method 1: $ awk '{a[$1]++}END{for(i in a){if(a[i]>100)print i,a[i]}}' access.log

Method 2: $ awk '{a[$1]++;if(a[$1]>100){b[$1]++}}END{for(i in b){print i,a[i]}}' access.log

Description: Method 1 stores the counts in array a and checks which IPs meet the condition when printing in the END block. Method 2 also stores the counts in array a, but records each IP that exceeds the threshold in array b while reading the log, and finally prints the IPs held in array b.
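
As a small illustration of the END-block filter used in Method 1, the sketch below passes the threshold in with -v instead of hard-coding 100. The function name over_threshold, the variable min, and the sample IPs are assumptions made for this example only.

```shell
# Hypothetical sample input; min=1 plays the role of the 100 used in the text.
over_threshold() {
  printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2 |
  awk -v min=1 '
    { count[$1]++ }                       # count every IP while reading
    END {
      for (ip in count)                   # filter only once all counts are final
        if (count[ip] > min) print ip, count[ip]
    }' |
  sort
}
over_threshold
```

Filtering in END guarantees that the printed count is the final one for each IP.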

3) Find the 10 IPs with the most requests on one day, April 9, 2016

Idea: first filter out the log entries for that time period, then deduplicate and count the occurrences.

Method 1: $ awk '$4>="[09/Apr/2016:00:00:01" && $4<="[09/Apr/2016:23:59:59" {a[$1]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log

Method 2: $ sed -n '/\[09\/Apr\/2016:00:00:01/,/\[09\/Apr\/2016:23:59:59/p' access.log |awk '{print $1}' |sort |uniq -c |sort -k1 -nr |head -n10 # the start and end timestamps must both exist in the log

Note: Method 1 compares $4 as a string, so the day must be zero-padded exactly as it is in the log.

4) Count the requests made during the minute before the current time

Idea: first format the time one minute ago to match the log format, then match it and count.

$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -vdate=$date '$0~date{c++}END{print c}' access.log

$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -vdate=$date '$4>="["date":00" && $4<="["date":59"{c++}END{print c}' access.log

$ grep -c $(date -d '-1 minute' +%d/%b/%Y:%H:%M) access.log

Description: date +%d/%b/%Y:%H:%M --> 09/Apr/2016:01:55

5) Find the 10 most requested pages ($request)

$ awk '{a[$7]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log

6) Total the response size for each URL ($body_bytes_sent)

$ awk '{a[$7]++;size[$7]+=$10}END{for(i in a)print a[i],size[i],i}' access.log

7) Count the status codes returned to each IP ($status)

$ awk '{a[$1" "$9]++}END{for(i in a)print i,a[i]}' access.log

8) Find the IPs that received status code 404 and count the occurrences

$ awk '{if($9~/404/)a[$1" "$9]++}END{for(i in a)print i,a[i]}' access.log

2. Comparing two files

The contents of the files are as follows:

$ cat a

1

2

3

4

5

6

$ cat b

3

4

5

6

7

8

1) Find the records common to both files

Method 1: $ awk 'FNR==NR{a[$0];next}($0 in a)' a b

3

4

5

6

Before the explanation, look at the difference between FNR and NR:

$ awk '{print NR,$0}' a b

1 1

2 2

3 3

4 4

5 5

6 6

7 3

8 4

9 5

10 6

11 7

12 8

$ awk '{print FNR,$0}' a b

1 1

2 2

3 3

4 4

5 5

6 6

1 3

2 4

3 5

4 6

5 7

6 8

As you can see, NR increases by 1 for every record processed, and awk treats the two files as one merged input.

FNR also increases by 1 for every record processed, but it restarts from 1 when awk begins the second file.

Description: FNR and NR are built-in variables. FNR==NR is commonly used when processing two files: NR keeps counting across both files, whereas FNR counts per file.

While awk processes file a, FNR equals NR, so the condition is true and a[$0];next is executed: each record is stored in array a as a subscript (with no value), and next skips the remaining rules for that record, similar to continue.

This repeats until awk reaches file b. There FNR is no longer equal to NR (FNR restarts at 1 while NR continues to 7), so the condition is false, a[$0];next is not executed, and the ($0 in a) expression runs instead: for each record of b it checks whether the record is a subscript in array a and, if so, prints it.
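
The FNR==NR walk-through above can be reproduced with two throwaway files. This is a minimal sketch that assumes mktemp is available; the directory variable tmp and the array name seen are illustrative choices. It first prints NR and FNR side by side, then performs the intersection.

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 > "$tmp/a"
printf '%s\n' 3 4 5 6 7 8 > "$tmp/b"

# NR keeps counting across both files; FNR restarts at 1 for the second file.
awk '{print NR, FNR, $0}' "$tmp/a" "$tmp/b"

# First file: remember each record as a subscript. Second file: print records already seen.
common=$(awk 'FNR==NR {seen[$0]; next} $0 in seen' "$tmp/a" "$tmp/b")
echo "$common"

rm -rf "$tmp"
```

One caveat worth knowing: if the first file is empty, FNR==NR is also true for every record of the second file, so the idiom silently treats the second file as the first.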

This form may be easier to understand:

$ awk 'FNR==NR{a[$0]}NR>FNR{if($0 in a)print $0}' a b

Method 2:

$ awk 'FNR==NR{a[$0]=1;next}(a[$0])' a b # the parentheses are optional

$ awk 'FNR==NR{a[$0]=1;next}(a[$0]==1)' a b

$ awk 'FNR==NR{a[$0]=1;next}{if(a[$0]==1)print}' a b

$ awk 'FNR==NR{a[$0]=1}FNR!=NR&&a[$0]==1' a b

Note: the second a[$0] is not an assignment; it looks up the element of array a whose subscript is the current record of file b. If that element is 1, the condition is true and the record is printed; otherwise the lookup yields an empty value, which is false.

Method 3:

$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]==1' a b

$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]==1' a b

Description: ARGIND is a built-in variable (a gawk extension) that holds the index of the file being processed: 1 for the first file, 2 for the second. FILENAME is also a built-in variable and holds the name of the current input file.

Method 4: $ sort a b |uniq -d

Method 5: $ grep -f a b

2) Find the records that differ (same as above, with the condition inverted)

$ awk 'FNR==NR{a[$0];next}!($0 in a)' a b

$ awk 'FNR==NR{a[$0]=1;next}!a[$0]' a b

$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]!=1' a b

$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]!=1' a b

7

8

Method 2: $ sort a b |uniq -u # prints the records unique to either file (1, 2, 7, 8)

Method 3: $ grep -vf a b

3. Merge two files

1) Merge the sex column of file d into file c

$ cat c

zhangsan 100

lisi 200

wangwu 300

$ cat d

zhangsan man

lisi woman

Method 1: $ awk 'FNR==NR{a[$1]=$0;next}{print a[$1],$2}' c d

zhangsan 100 man

lisi 200 woman

Method 2: $ awk 'FNR==NR{a[$1]=$0}NR>FNR{print a[$1],$2}' c d

Description: NR==FNR matches the first file and NR>FNR matches the second; the first column is used as the array subscript.

Method 3: $ awk 'ARGIND==1{a[$1]=$0}ARGIND==2{print a[$1],$2}' c d
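
The key-based merge can be sketched end to end as follows. This example assumes temporary files created with mktemp and uses the portable FNR==NR form rather than the gawk-only ARGIND; the array name row and the extra ($1 in row) guard are additions for illustration, so that names missing from the first file are skipped instead of printed with an empty prefix.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'zhangsan 100' 'lisi 200' 'wangwu 300' > "$tmp/c"
printf '%s\n' 'zhangsan man' 'lisi woman' > "$tmp/d"

# Pass 1 (file c): index whole lines by name. Pass 2 (file d): append column 2 to the stored line.
merged=$(awk 'FNR==NR {row[$1]=$0; next} ($1 in row) {print row[$1], $2}' "$tmp/c" "$tmp/d")
echo "$merged"

rm -rf "$tmp"
```

Only names present in both files appear in the result, because output is driven by the second file.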

2) Merge the service names in a.txt by IP

$ cat a.txt

192.168.2.100:httpd

192.168.2.100:tomcat

192.168.2.101:httpd

192.168.2.101:postfix

192.168.2.102:mysqld

192.168.2.102:httpd

$ awk -F: -vOFS=":" '{a[$1]=(a[$1]?a[$1]" ":"")$2}END{for(i in a)print i,a[i]}' a.txt

$ awk -F: -vOFS=":" '{a[$1]=$2 (a[$1]?" "a[$1]:"")}END{for(i in a)print i,a[i]}' a.txt # prepends, so the services are listed in reverse order

192.168.2.100:httpd tomcat

192.168.2.101:httpd postfix

192.168.2.102:mysqld httpd

Description: the first column is the subscript and the second column is appended to the element. On the first line for an IP, a[$1] is empty, so the element becomes $2; on later lines a[$1] already holds the earlier service names, so the new name is joined to them with a space and stored back in the same element.
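
A self-contained sketch of this group-by-key concatenation is shown below. It uses an explicit if/else instead of a ternary and pipes through sort so the output order is stable; the function name group_services and the array name svc are hypothetical.

```shell
group_services() {
  printf '%s\n' 192.168.2.100:httpd 192.168.2.100:tomcat \
                192.168.2.101:httpd 192.168.2.101:postfix \
                192.168.2.102:mysqld 192.168.2.102:httpd |
  awk -F: -v OFS=: '
    {
      if ($1 in svc) svc[$1] = svc[$1] " " $2   # later lines: append with a space
      else           svc[$1] = $2               # first line for this IP
    }
    END { for (ip in svc) print ip, svc[ip] }' |
  sort
}
group_services
```

Testing membership with in avoids relying on the truthiness of the stored string.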

3) Prepend the first line to each of the following lines

$ cat a.txt

xiaoli

a 100

b 110

c 120

$ awk 'NF==1{a=$0;next}{print a,$0}' a.txt

$ awk 'NF==1{a=$0}NF!=1{print a,$0}' a.txt

xiaoli a 100

xiaoli b 110

xiaoli c 120

4. Print the columns in reverse order

$ cat a.txt

xiaoli a 100

xiaoli b 110

xiaoli c 120

$ awk '{for(i=NF;i>=1;i--){printf "%s ",$i}print s}' a.txt

100 a xiaoli

110 b xiaoli

120 c xiaoli

$ awk '{for(i=NF;i>=1;i--)if(i==1)printf $i"\n";else printf $i" "}' a.txt

Description: the loop starts at NF, so the last field is printed first, and i is then decremented. print s (where s is an unset variable) or print "" prints a newline.
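
The descending loop over NF can be sketched like this. The function name reverse_fields is hypothetical, and the separator is chosen inside a single printf so that no trailing space is left at the end of the line.

```shell
reverse_fields() {
  printf '%s\n' 'xiaoli a 100' 'xiaoli b 110' 'xiaoli c 120' |
  awk '{
    for (i = NF; i >= 1; i--)
      # print the field, then a space, or a newline after the last one (i == 1)
      printf "%s%s", $i, (i > 1 ? " " : "\n")
  }'
}
reverse_fields
```

Passing the field through "%s" rather than using it as the format string keeps fields that contain a % character from being misread as format directives.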

5. Print from the second column to the last

Method 1: $ awk '{for(i=2;i<=NF;i++)if(i==NF)printf $i"\n";else printf $i" "}' a.txt

Method 2: $ awk '{$1=""}{print}' a.txt # leaves a leading space on each line

a 100

b 110

c 120

6. Put the first column of file c into the third column of file d

$ cat c

a

b

c

$ cat d

1 one

2 two

3 three

Method 1: $ awk 'FNR==NR{a[NR]=$0;next}{$3=a[FNR]}1' c d

Note: while reading c, each line is stored in array a with NR as the subscript. While reading d, FNR restarts at 1 and counts to 3, so a[FNR] fetches the matching line of c, which is assigned to the third column.
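
The line-number pairing described in the note can be checked with temporary files. This is a sketch under the assumption that both files have the same number of lines; the names tmp, col and pasted are illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' a b c > "$tmp/c"
printf '%s\n' '1 one' '2 two' '3 three' > "$tmp/d"

# File c: store column 1 under its line number. File d: FNR restarts at 1,
# so col[FNR] is the line of c with the same position; assign it to field 3.
pasted=$(awk 'FNR==NR {col[FNR]=$1; next} {$3=col[FNR]} 1' "$tmp/c" "$tmp/d")
echo "$pasted"

rm -rf "$tmp"
```

Assigning to $3 makes awk rebuild the record with OFS (a single space by default), which is why the trailing 1 prints the extended line.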

Method 2: $ awk '{getline f<"c";print $0,f}' d

1 one a

2 two b

3 three c

1) Replace the second column

$ awk '{getline f<"c";gsub($2,f,$2)}1' d

1 a

2 b

3 c

2) Replace "two" in the second column

$ awk '{getline f<"c";gsub("two",f,$2)}1' d

1 one

2 b

3 three

7. Sum of numbers

Method 1: $ seq 1 100 |awk '{sum+=$0}END{print sum}'

Method 2: $ awk 'BEGIN{sum=0;i=1;while(i<=100){sum+=i;i++}print sum}'

Method 3: $ awk 'BEGIN{for(i=1;i<=100;i++)sum+=i}END{print sum}' /dev/null

Method 4: $ seq -s + 1 100 |bc

8. Add a blank line or other content after every three lines

Method 1: $ awk '$0;NR%3==0{printf "\n"}' a

Method 2: $ awk '{print NR%3?$0:$0"\n"}' a

Method 3: $ sed '4~3s/^/\n/' a

9. String splitting

Method 1:

$ echo "hello" |awk -F '' '{for(i=1;i<=NF;i++)print $i}'

$ echo "hello" |awk -F '' '{i=1;while(i<=NF){print $i;i++}}'

h

e

l

l

o

Method 2:

$ echo "hello" |awk '{split($0,a,"");for(i in a)print a[i]}' # unordered

l

o

h

e

l

10. Count the occurrences of each letter in a string

$ echo a,b.c.a,b.a |tr "[,.]" "\n" |awk -F '' '{for(i=1;i<=NF;i++)a[$i]++}END{for(i in a)print i,a[i]|"sort -k2 -rn"}'

a 3

b 2

c 1

11. Sort the first column

$ awk '{a[NR]=$1}END{s=asort(a,b);for(i=1;i<=s;i++){print i,b[i]}}' a.txt

Description: each line's first column is stored in array a with the line number as the subscript. asort (a gawk function) sorts the values of a into array b and returns the number of elements, which is assigned to s. The for loop then runs from 1 to s and prints the sorted array.

12. Delete duplicate lines, preserving order

$ awk '!a[$0]++' file

Blog Address: http://lizhenliang.blog.51cto.com

13. Delete a specified line

Delete the first line:

$ awk 'NR==1{next}{print $0}' file # $0 can be omitted

$ awk 'NR!=1{print}' file

$ sed '1d' file

$ sed -n '1!p' file

14. Add a line before or after a specified line

Add a line containing txt before the second line:

$ awk 'NR==2{sub(/.*/,"txt\n&")}{print}' a.txt

$ sed '2s/.*/txt\n&/' a.txt

Add a line containing txt after the second line:

$ awk 'NR==2{sub(/.*/,"&\ntxt")}{print}' a.txt

$ sed '2s/.*/&\ntxt/' a.txt

15. Get the NIC name from an IP address

$ ifconfig |awk -F'[: ]' '/^eth/{nic=$1}/192.168.18.15/{print nic}'

16. Floating-point arithmetic (divide 46 by 100 and keep the decimals)

$ awk 'BEGIN{print 46/100}'

$ awk 'BEGIN{printf "%.2f\n",46/100}'

$ echo 46|awk '{print $0/100}'

$ echo 'scale=2;46/100' |bc|sed 's/^/0/'

$ printf "%.2f\n" $(echo "scale=2;46/100" |bc)

Result: 0.46

17. Replace newlines with commas

$ cat a.txt

1

2

3

After replacement: 1,2,3

Method 1:

$ awk '{s=(s?s","$0:$0)}END{print s}' a.txt

Description: this uses the ternary operator (a?b:c). s is a variable. When the first line (1) is processed, s has not been assigned, so its value is empty, which is false, and s is set to 1. When the second line (2) is processed, s is 1, which is true, so s becomes 1,2, and so on. The parentheses are optional.
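
The ternary join can be sketched as a small reusable function. This variant tests NR == 1 instead of the truthiness of s; that is a deliberate substitution, not the method from the text, made because a first line such as 0 may be treated as false by the s?...:... test and then be overwritten. The function name join_lines is hypothetical.

```shell
# Reads lines on stdin and prints them joined by commas.
join_lines() {
  awk '{s = (NR == 1 ? $0 : s "," $0)} END {print s}'
}

printf '%s\n' 1 2 3 | join_lines
printf '%s\n' 0 1 2 | join_lines
```

Unlike tr, this leaves no trailing comma and ends the output with a newline.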

Method 2:

$ tr '\n' ',' < a.txt # leaves a trailing comma and no final newline

Method 3:

$ sed ':a;N;s/\n/,/;$!b a' a.txt

Description: :a defines label a. sed reads the first line (1) into the pattern space; the N command then appends the next line (2), separated by a newline (\n), so the pattern space contains 1\n2. The s command replaces the newline with a comma, and $!b a ($! means "not the last line", b is the branch command) jumps back to label a to append the next line, and so on until the last line.

Method 4:

$ sed ':a;$!N;s/\n/,/;t a' a.txt

Description: similar to the above, except that t is a test command that jumps only when the previous substitution succeeded.

Method 5:

$ awk '{if($0!=3)printf "%s,",$0;else print $0}' a.txt

Description: 3 is the last line of the text.

Method 6:

while read line; do

a+=($line)

done < a.txt

echo ${a[*]} |sed 's/ /,/g'

Description: put each line into an array and then replace the spaces with commas.

18. Remove the newline from odd-numbered lines

$ cat b.txt

string

number

a

1

b

2

$ awk 'ORS=NR%2?"\t":"\n"' b.txt # remove the newline from odd-numbered lines

$ xargs -n2 < b.txt # put every two fields on one line

string number

a 1

b 2

19. Cost statistics

$ cat a.txt

name fee quantity

zhangsan 8000 1

zhangsan 5000 1

lisi 1000 1

lisi 2000 1

wangwu 1500 1

zhaoliu 6000 1

zhaoliu 2000 1

zhaoliu 3000 1

Total cost and total quantity per person:

$ awk 'NR>1{name[$1]++;number[$1]+=$3;money[$1]+=$2}END{for(i in name)print i,number[i],money[i]}' a.txt

zhaoliu 3 11000

zhangsan 2 13000

wangwu 1 1500

lisi 2 3000

20. Print the multiplication table

Method 1:

$ awk 'BEGIN{for(n=0;n++<9;){for(i=0;i++<n;)printf i"x"n"="i*n" ";print ""}}'

1x1=1

1x2=2 2x2=4

1x3=3 2x3=6 3x3=9

1x4=4 2x4=8 3x4=12 4x4=16

1x5=5 2x5=10 3x5=15 4x5=20 5x5=25

1x6=6 2x6=12 3x6=18 4x6=24 5x6=30 6x6=36

1x7=7 2x7=14 3x7=21 4x7=28 5x7=35 6x7=42 7x7=49

1x8=8 2x8=16 3x8=24 4x8=32 5x8=40 6x8=48 7x8=56 8x8=64

1x9=9 2x9=18 3x9=27 4x9=36 5x9=45 6x9=54 7x9=63 8x9=72 9x9=81

Method 2:

#!/bin/bash

for ((i=1;i<=9;i++)); do

for ((j=1;j<=i;j++)); do

result=$(($i*$j))

#let "result=i*j"

echo -n "$i*$j=$result "

done

echo

done

21. Print only odd or even lines

Print odd lines:

Method 1:

$ seq 1 5 |awk ' i=!i '

Description: first, note that in numeric operations an undefined variable has the initial value 0, and in string operations it has the initial value of an empty string.

awk reads the first record and evaluates the pattern. i is undefined, so the pattern is i=!0, where ! means negation. The operand of ! is treated as a Boolean: 0 or the empty string is false, and a non-zero number or non-empty string is true. !0 is true, so i=1, the pattern is true, and the first record is printed.

Why is the record printed without a print statement? Because the pattern has no action, the default action, printing the whole record, is used.

awk reads the second record and evaluates the pattern. i was changed from 0 to 1 by the previous record, so this time the pattern is i=!1, which is false, and the record is not printed.

awk reads the third record. Because i was set back to 0 by the previous record, the pattern is true again and the record is printed, and so on.

As you can see, the decision does not depend on the content of the record; it relies only on i toggling between true and false.
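
To make the toggling visible, the sketch below prints the value of i after each assignment together with what the bare pattern i=!i would do for that record. The function name trace_toggle and the print/skip labels are additions for illustration.

```shell
trace_toggle() {
  printf '%s\n' 1 2 3 4 5 |
  # i starts unset (false); !i flips it between 1 and 0 on every record
  awk '{ i = !i; print NR, i, (i ? "print" : "skip") }'
}
trace_toggle
```

Each output line shows the record number, the new value of i, and whether the record would be printed.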

Method 2:

$ seq 1 5 |awk 'NR%2!=0'

Method 3:

$ seq 1 5 |sed -n '1~2p'

Description: first~step address syntax (GNU sed): starting at line 1, print every second line.

Method 4:

$ seq 1 5 |sed -n 'p;n'

Note: p prints the first line; the n command then reads the next line (2) into the pattern space, but no print command follows it, so line 2 is not printed. The cycle then restarts with the third line, which is printed in the same way.

1

3

5

Print even lines:

$ seq 1 5 |awk '!(i=!i)'

$ seq 1 5 |awk 'NR%2==0'

$ seq 1 5 |sed -n '0~2p'

$ seq 1 5 |sed -n 'n;p'

Description: n reads the line after the current one (line 2) into the pattern space, and the p command then prints the pattern space, so 2 is output.
