Linux awk use case summary


Knowledge Points:

1) array

An array is a variable that stores a series of values; individual values are accessed through an index.

An array in awk is called an associative array because its subscript (index) can be a number or a string.

Subscripts are often called keys. The keys and values of the array elements are kept in an internal table that uses a hash algorithm, so array elements are stored in no particular order.

Array format: Array[index]=value
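
For example (a minimal illustration, not one of the original cases), the following stores elements under a numeric and a string subscript and then iterates over them; the output order is not guaranteed:

$ awk 'BEGIN{a[1]="one";a["ip"]="192.168.1.1";for(i in a) print i,a[i]}'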

1. Nginx log analysis

Log format: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

Sample log entry: 27.189.231.39 - - [09/Apr/2016:17:21:23 +0800] "GET /Public/index/images/icon_pre.png HTTP/1.1" 200 44668 "http://www.test.com/Public/index/css/global.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" "-"

1) Find the 10 IPs with the most requests in the log

Idea: deduplicate on the first column (the client IP), count how many times each IP appears, and output the top entries.

Method 1: $ awk '{a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log

Method 2: $ awk '{print $1}' access.log | sort | uniq -c | sort -k1 -nr | head -n10

Description: a[$1]++ creates array a with the first column (the IP) as the subscript and uses the ++ operator to increment that element, which starts at 0. Every time a line with a given IP is processed, the element for that IP is incremented by 1; if the same IP appears again it is incremented again, so an IP seen twice ends up with the value 2, and so on. This both deduplicates the IPs and counts how often each one occurs.
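
The counting can be seen in isolation with a toy input (the IP addresses below are made up); it prints 2 1.1.1.1 and 1 2.2.2.2, in arbitrary order:

$ printf '1.1.1.1\n2.2.2.2\n1.1.1.1\n' | awk '{a[$1]++}END{for(i in a) print a[i],i}'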

2) Find IPs that appear more than 100 times in the log

Method 1: $ awk '{a[$1]++}END{for(i in a){if(a[i]>100) print i,a[i]}}' access.log

Method 2: $ awk '{a[$1]++;if(a[$1]>100){b[$1]++}}END{for(i in b){print i,a[i]}}' access.log

Description: Method 1 stores the counts in array a and filters for qualifying IPs while printing in the END block. Method 2 also counts in array a, but as soon as an IP's count exceeds 100 it records that IP in array b; the END block then prints the IPs collected in b together with their counts from a.
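
If the threshold changes often, it can be passed in from the shell with -v rather than hard-coded (a sketch along the lines of Method 1; the variable name limit is arbitrary):

$ awk -v limit=100 '{a[$1]++}END{for(i in a) if(a[i]>limit) print i,a[i]}' access.log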

3) Find the top 10 IPs for April 9, 2016

Idea: first filter the log entries for that time range, then deduplicate and count occurrences.

Method 1: $ awk '$4>="[09/Apr/2016:00:00:01" && $4<="[09/Apr/2016:23:59:59" {a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log

Method 2: $ sed -n '/\[09\/Apr\/2016:00:00:01/,/\[09\/Apr\/2016:23:59:59/p' access.log | awk '{print $1}' | sort | uniq -c | sort -k1 -nr | head -n10   # prerequisite: log lines matching both the start time and the end time must exist
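
If every entry for the day carries the same date string, a simpler filter (a sketch assuming the 09/Apr/2016 format of the sample entry) just matches the date inside $4:

$ awk '$4 ~ /09\/Apr\/2016/{a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log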

4) Count the number of requests in the minute before the current time

Idea: first build the timestamp for one minute ago in the log's time format, then match it against the log and count.

$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -v date=$date '$0~date{c++}END{print c}' access.log

$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -v date=$date '$4>="["date":00" && $4<="["date":59"{c++}END{print c}' access.log

$ grep -c $(date -d '-1 minute' +%d/%b/%Y:%H:%M) access.log


Description: date +%d/%b/%Y:%H:%M produces output such as 09/Apr/2016:01:55

5) Find the top 10 most-visited pages ($request)

$ awk '{a[$7]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log

6) Count the total response size per URL ($body_bytes_sent)

$ awk '{a[$7]++;size[$7]+=$10}END{for(i in a) print a[i],size[i],i}' access.log
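
The same pattern extends to a grand total; the extra total variable below is an addition for illustration, not part of the original example:

$ awk '{a[$7]++;size[$7]+=$10;total+=$10}END{for(i in a) print a[i],size[i],i;print "total bytes:",total}' access.log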

7) Count the status codes returned to each IP ($status)

$ awk '{a[$1" "$9]++}END{for(i in a) print i,a[i]}' access.log

8) Find the IPs that received status code 404 and how many times

$ awk '{if($9~/404/) a[$1" "$9]++}END{for(i in a) print i,a[i]}' access.log
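
Because $status is a separate field, an exact comparison on $9 works too and cannot accidentally match 404 occurring elsewhere in the field (a minor variant, assuming the field layout of the sample entry):

$ awk '$9=="404"{a[$1]++}END{for(i in a) print i,a[i]}' access.log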

2. Comparing two files

The contents of the files are as follows:

$ cat a

1

2

3

4

5

6

$ cat b

3

4

5

6

7

8


1) Find the same records

Method 1: $ awk 'FNR==NR{a[$0];next}($0 in a)' a b

3

4

5

6

Before explaining, look at the difference between FNR and NR:

$ awk '{print NR,$0}' a b

1 1

2 2

3 3

4 4

5 5

6 6

7 3

8 4

9 5

10 6

11 7

12 8

$ awk '{print FNR,$0}' a b

1 1

2 2

3 3

4 4

5 5

6 6

1 3

2 4

3 5

4 6

5 7

6 8

You can see that NR increments by 1 for every record processed, and that awk treats the two files as if they were one merged input.

FNR also increments by 1 for every record, but it starts counting again from 1 when the second file is processed.

Description: FNR and NR are built-in variables. FNR==NR is commonly used when processing two files; as the example shows, awk reads the two files as a single stream.

While the first file is being processed, FNR equals NR, so the condition is true and a[$0];next runs: each record is stored as a subscript of array a (no value is assigned), and next skips the rest of the program for that record, similar to continue.

This continues until awk starts on file b. Then FNR no longer equals NR (FNR restarts at 1 while NR continues from 7), the condition is false, the a[$0];next block is skipped, and the ($0 in a) expression runs instead: for every record of b it checks whether the record exists as a subscript of a, and if it does, the record is printed (the default action), and so on.

This equivalent form may be easier to understand:

$ awk 'FNR==NR{a[$0]}NR>FNR{if($0 in a) print $0}' a b

Method 2:

$ awk 'FNR==NR{a[$0]=1;next}(a[$0])' a b   # the parentheses are optional

$ awk 'FNR==NR{a[$0]=1;next}(a[$0]==1)' a b

$ awk 'FNR==NR{a[$0]=1;next}{if(a[$0]==1) print}' a b

$ awk 'FNR==NR{a[$0]=1}FNR!=NR&&a[$0]==1' a b

Note: the second a[$0] is not building a count; it looks up an element of array a using each record of file b as the subscript. If a[record of b] returns 1, the record exists in a, the condition is true and the record is printed; otherwise the lookup yields an empty value, which is false, and nothing is printed.

Method 3:


$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]==1' a b

$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]==1' a b


Description: ARGIND is a built-in variable (a gawk extension) that identifies the file currently being processed: 1 for the first file, 2 for the second. FILENAME is also a built-in variable and holds the name of the current input file.

Method 4: $ sort a b | uniq -d

Method 5: $ grep -f a b
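
Another common approach outside awk (not in the original list) is comm, which needs sorted input; a sketch using process substitution:

$ comm -12 <(sort a) <(sort b)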

2) Find the differing records (same as above, with the conditions inverted)

$ awk 'FNR==NR{a[$0];next}!($0 in a)' a b

$ awk 'FNR==NR{a[$0]=1;next}!a[$0]' a b

$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]!=1' a b

$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]!=1' a b

7

8

Method 2: $ sort a b | uniq -u   # note: this lists lines unique to either file, so records found only in a are printed as well

Method 3: $ grep -vf a b
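
comm covers this case as well (again assuming sorted input); -13 keeps only the lines that appear exclusively in b:

$ comm -13 <(sort a) <(sort b)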


3. Merge two files

1) Merge the sex column of file d into file c

$ cat c

zhangsan 100

lisi 200

wangwu 300

$ cat d

zhangsan man

lisi woman

Method 1: $ awk 'FNR==NR{a[$1]=$0;next}{print a[$1],$2}' c d

zhangsan 100 man

lisi 200 woman


Method 2: $ awk 'FNR==NR{a[$1]=$0}NR>FNR{print a[$1],$2}' c d

Description: NR==FNR matches the first file and NR>FNR matches the second; the first file's records are stored in array a keyed by the first column, and for each record of the second file the stored record is printed together with the second column.

Method 3: $ awk 'ARGIND==1{a[$1]=$0}ARGIND==2{print a[$1],$2}' c d
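
Note that the methods above iterate over the records of d, so wangwu, who has no entry in d, does not appear in the output. A variant that keeps every record of c and appends the sex only when d provides one (a sketch that reads d first) is:

$ awk 'FNR==NR{a[$1]=$2;next}{print $0,a[$1]}' d c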

2) Merge the service names in a.txt so that each IP appears on one line

$ cat A.txt

192.168.2.100:httpd

192.168.2.100:tomcat

192.168.2.101:httpd

192.168.2.101:postfix

192.168.2.102:mysqld

192.168.2.102:httpd

$ awk -F: -vOFS=":" '{a[$1]=a[$1]" "$2}END{for(i in a) print i,a[i]}' a.txt

$ awk -F: -vOFS=":" '{a[$1]=$2" "a[$1]}END{for(i in a) print i,a[i]}' a.txt   # same idea, but prepends, so the service names come out in reverse order

192.168.2.100: httpd tomcat

192.168.2.101: httpd postfix

192.168.2.102: mysqld httpd


Description: a[$1]=a[$1]" "$2 uses the first column (the IP) as the subscript and appends the second column (the service name) to whatever is already stored for that IP, so after all lines have been read each IP's element holds all of its service names.
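
Method 1 above leaves a space right after the colon, before the first service name. A small refinement (an assumption, not part of the original) adds the separator only from the second service onward:

$ awk -F: -vOFS=":" '{a[$1]=a[$1]?a[$1]" "$2:$2}END{for(i in a) print i,a[i]}' a.txt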

3) Prepend the first line to the beginning of every following line

$ cat a.txt

xiaoli

a 100

b 110

c 120

$ awk 'NF==1{a=$0;next}{print a,$0}' a.txt

$ awk 'NF==1{a=$0}NF!=1{print a,$0}' a.txt

xiaoli a 100

xiaoli b 110

xiaoli c 120
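
An equivalent that keys on the line number instead of the field count (a sketch) is:

$ awk 'NR==1{a=$0;next}{print a,$0}' a.txt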


4. Print columns in reverse order

$ cat a.txt

xiaoli a 100

xiaoli b 110

xiaoli c 120

$ awk '{for(i=NF;i>=1;i--){printf "%s ",$i}print s}' a.txt

100 a xiaoli

110 b xiaoli

120 c xiaoli

$ awk '{for(i=NF;i>=1;i--) if(i==1) printf $i"\n"; else printf $i" "}' a.txt


Description: the loop runs from NF down to 1, so the last field is printed first as the counter decrements; print s (s is never set) or print "" simply emits the line break at the end.
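
If the trailing space left by printf "%s " matters, a variant (a sketch; line is just an arbitrary variable name) assembles the reversed record first and prints it in one go:

$ awk '{line="";for(i=NF;i>=1;i--) line=line $i (i>1?" ":"");print line}' a.txt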

5. Print from the second column to the last

Method 1: $ awk '{for(i=2;i<=NF;i++) if(i==NF) printf $i"\n"; else printf $i" "}' a.txt

Method 2: $ awk '{$1=""}{print}' a.txt

a 100

b 110

c 120
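
Method 2 actually leaves a leading separator where $1 used to be. One way around that (a sketch assuming single-space separation) is to strip the first field and its trailing space with sub:

$ awk '{sub(/^[^ ]+ /,"")}1' a.txt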


6. Put the first column of file c into the third column of file d

$ cat c

a

b

c

$ cat d

1 one

2 two

3 three


Method 1: $ awk 'FNR==NR{a[NR]=$0;next}{$3=a[FNR]}1' c d

Note: while reading c, NR is used as the subscript and each whole line is the element; while processing d, the third column is set to the element whose subscript is FNR (which re-counts 1-3 for the second file). The trailing 1 is an always-true pattern whose default action prints the modified record.

Method 2: $ awk '{getline f<"c"; print $0,f}' d

1 one a

2 two b

3 three c


1) Replace the second column

$ awk '{getline f<"c"; gsub($2,f,$2)}1' d

1 a

2 b

3 c

2) Replace "two" in the second column


$ awk '{getline f<"c"; gsub("two",f,$2)}1' d

1 one

2 b

3 three


7. Sum of Numbers

Method 1: $ seq 100 | awk '{sum+=$0}END{print sum}'

Method 2: $ awk 'BEGIN{sum=0;i=1;while(i<=100){sum+=i;i++}print sum}'

Method 3: $ awk 'BEGIN{for(i=1;i<=100;i++)sum+=i}END{print sum}' /dev/null

Method 4: $ seq -s + 100 | bc
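
The same accumulation idiom applies to any column of a file; for example (using the a.txt from section 4, whose third column is 100, 110 and 120, so this prints 330):

$ awk '{sum+=$3}END{print sum}' a.txt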

8. Add a line break or content every three lines

Method 1: $ awk '$0;NR%3==0{printf "\n"}' a

Method 2: $ awk '{print NR%3?$0:$0"\n"}' a

Method 3: $ sed '4~3s/^/\n/' a
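
To insert actual content instead of an empty line, the same NR%3 test can print any separator (the dashes here are an arbitrary example):

$ awk '{print} NR%3==0{print "-----"}' a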

9. String splitting

Method 1:

$ echo "hello" | awk -F "" '{for(i=1;i<=NF;i++) print $i}'

$ echo "hello" | awk -F "" '{i=1;while(i<=NF){print $i;i++}}'

h

e

l

l

o
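
split() can do the same job; note that splitting on the empty string is a gawk extension, so treat this as a gawk-only sketch:

$ echo "hello" | awk '{n=split($0,c,"");for(i=1;i<=n;i++) print c[i]}'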

