Linux awk use case summary


Knowledge Points:

1) array

An array is a variable that stores a series of values; individual values are accessed through an index.

An array in awk is called an associative array because its subscript (index) can be a number or a string.

Subscripts are often called keys. The keys and values of the array elements are kept in an internal table that uses a hash algorithm, so array elements are stored in no particular order.

Array format: Array[index]=value
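
For example (a minimal illustration, not one of the original cases), the following stores elements under a numeric and a string subscript and then iterates over them; the output order is not guaranteed:

$ awk 'BEGIN{a[1]="one";a["ip"]="192.168.1.1";for(i in a) print i,a[i]}'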

1. Nginx log analysis

Log format: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

Sample log entry: 27.189.231.39 - - [09/Apr/2016:17:21:23 +0800] "GET /Public/index/images/icon_pre.png HTTP/1.1" 200 44668 "http://www.test.com/Public/index/css/global.css" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36" "-"

1) Find the 10 IPs with the most requests in the log

Idea: deduplicate on the first column (the client IP), count how many times each IP appears, and output the top entries.

Method 1: $ awk '{a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log

Method 2: $ awk '{print $1}' access.log | sort | uniq -c | sort -k1 -nr | head -n10

Description: a[$1]++ creates array a with the first column (the IP) as the subscript and uses the ++ operator to increment that element, which starts at 0. Every time a line with a given IP is processed, the element for that IP is incremented by 1; if the same IP appears again it is incremented again, so an IP seen twice ends up with the value 2, and so on. This both deduplicates the IPs and counts how often each one occurs.
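
The counting can be seen in isolation with a toy input (the IP addresses below are made up); it prints 2 1.1.1.1 and 1 2.2.2.2, in arbitrary order:

$ printf '1.1.1.1\n2.2.2.2\n1.1.1.1\n' | awk '{a[$1]++}END{for(i in a) print a[i],i}'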

2) Find IPs that appear more than 100 times in the log

Method 1: $ awk '{a[$1]++}END{for(i in a){if(a[i]>100) print i,a[i]}}' access.log

Method 2: $ awk '{a[$1]++;if(a[$1]>100){b[$1]++}}END{for(i in b){print i,a[i]}}' access.log

Description: Method 1 stores the counts in array a and filters for qualifying IPs while printing in the END block. Method 2 also counts in array a, but as soon as an IP's count exceeds 100 it records that IP in array b; the END block then prints the IPs collected in b together with their counts from a.
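
If the threshold changes often, it can be passed in from the shell with -v rather than hard-coded (a sketch along the lines of Method 1; the variable name limit is arbitrary):

$ awk -v limit=100 '{a[$1]++}END{for(i in a) if(a[i]>limit) print i,a[i]}' access.log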

3) Find the top 10 IPs for April 9, 2016

Idea: first filter the log entries for that time range, then deduplicate and count occurrences.

Method 1: $ awk '$4>="[09/Apr/2016:00:00:01" && $4<="[09/Apr/2016:23:59:59" {a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log

Method 2: $ sed -n '/\[09\/Apr\/2016:00:00:01/,/\[09\/Apr\/2016:23:59:59/p' access.log | awk '{print $1}' | sort | uniq -c | sort -k1 -nr | head -n10   # prerequisite: log lines matching both the start time and the end time must exist
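
If every entry for the day carries the same date string, a simpler filter (a sketch assuming the 09/Apr/2016 format of the sample entry) just matches the date inside $4:

$ awk '$4 ~ /09\/Apr\/2016/{a[$1]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log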

4) Count the number of requests in the minute before the current time

Idea: first build the timestamp for one minute ago in the log's time format, then match it against the log and count.

$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -v date=$date '$0~date{c++}END{print c}' access.log

$ date=$(date -d '-1 minute' +%d/%b/%Y:%H:%M); awk -v date=$date '$4>="["date":00" && $4<="["date":59"{c++}END{print c}' access.log

$ grep -c $(date -d '-1 minute' +%d/%b/%Y:%H:%M) access.log


Description: date +%d/%b/%Y:%H:%M produces output such as 09/Apr/2016:01:55

5) Find the top 10 most-visited pages ($request)

$ awk '{a[$7]++}END{for(i in a) print a[i],i | "sort -k1 -nr | head -n10"}' access.log

6) Count the total response size per URL ($body_bytes_sent)

$ awk '{a[$7]++;size[$7]+=$10}END{for(i in a) print a[i],size[i],i}' access.log
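
The same pattern extends to a grand total; the extra total variable below is an addition for illustration, not part of the original example:

$ awk '{a[$7]++;size[$7]+=$10;total+=$10}END{for(i in a) print a[i],size[i],i;print "total bytes:",total}' access.log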

7) Count the status codes returned to each IP ($status)

$ awk '{a[$1" "$9]++}END{for(i in a) print i,a[i]}' access.log

8) Find the IPs that received status code 404 and how many times

$ awk '{if($9~/404/) a[$1" "$9]++}END{for(i in a) print i,a[i]}' access.log
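
Because $status is a separate field, an exact comparison on $9 works too and cannot accidentally match 404 occurring elsewhere in the field (a minor variant, assuming the field layout of the sample entry):

$ awk '$9=="404"{a[$1]++}END{for(i in a) print i,a[i]}' access.log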

2. Comparing two files

The contents of the files are as follows:

$ cat a

1

2

3

4

5

6

$ cat b

3

4

5

6

7

8


1) Find the same records

Method 1: $ awk 'FNR==NR{a[$0];next}($0 in a)' a b

3

4

5

6

Before explaining, look at the difference between FNR and NR:

$ awk '{print NR,$0}' a b

1 1

2 2

3 3

4 4

5 5

6 6

7 3

8 4

9 5

10 6

11 7

12 8

$ awk '{print FNR,$0}' a b

1 1

2 2

3 3

4 4

5 5

6 6

1 3

2 4

3 5

4 6

5 7

6 8

You can see that NR increments by 1 for every record processed, and that awk treats the two files as if they were one merged input.

FNR also increments by 1 for every record, but it starts counting again from 1 when the second file is processed.

Description: FNR and NR are built-in variables. FNR==NR is commonly used when processing two files; as the example shows, awk reads the two files as a single stream.

While the first file is being processed, FNR equals NR, so the condition is true and a[$0];next runs: each record is stored as a subscript of array a (no value is assigned), and next skips the rest of the program for that record, similar to continue.

This continues until awk starts on file b. Then FNR no longer equals NR (FNR restarts at 1 while NR continues from 7), the condition is false, the a[$0];next block is skipped, and the ($0 in a) expression runs instead: for every record of b it checks whether the record exists as a subscript of a, and if it does, the record is printed (the default action), and so on.

This equivalent form may be easier to understand:

$ awk 'FNR==NR{a[$0]}NR>FNR{if($0 in a) print $0}' a b

Method 2:

$ awk 'FNR==NR{a[$0]=1;next}(a[$0])' a b   # the parentheses are optional

$ awk 'FNR==NR{a[$0]=1;next}(a[$0]==1)' a b

$ awk 'FNR==NR{a[$0]=1;next}{if(a[$0]==1) print}' a b

$ awk 'FNR==NR{a[$0]=1}FNR!=NR&&a[$0]==1' a b

Note: the second a[$0] is not building a count; it looks up an element of array a using each record of file b as the subscript. If a[record of b] returns 1, the record exists in a, the condition is true and the record is printed; otherwise the lookup yields an empty value, which is false, and nothing is printed.

Method 3:


$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]==1' a b

$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]==1' a b


Description: ARGIND is a built-in variable (a gawk extension) that identifies the file currently being processed: 1 for the first file, 2 for the second. FILENAME is also a built-in variable and holds the name of the current input file.

Method 4: $ sort a b | uniq -d

Method 5: $ grep -f a b
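
Another common approach outside awk (not in the original list) is comm, which needs sorted input; a sketch using process substitution:

$ comm -12 <(sort a) <(sort b)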

2) Find the differing records (same as above, with the conditions inverted)

$ awk 'FNR==NR{a[$0];next}!($0 in a)' a b

$ awk 'FNR==NR{a[$0]=1;next}!a[$0]' a b

$ awk 'ARGIND==1{a[$0]=1}ARGIND==2&&a[$0]!=1' a b

$ awk 'FILENAME=="a"{a[$0]=1}FILENAME=="b"&&a[$0]!=1' a b

7

8

Method 2: $ sort a b | uniq -u   # note: this lists lines unique to either file, so records found only in a are printed as well

Method 3: $ grep -vf a b
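
comm covers this case as well (again assuming sorted input); -13 keeps only the lines that appear exclusively in b:

$ comm -13 <(sort a) <(sort b)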


3. Merge two files

1) Merge the sex column of file d into file c

$ cat c

zhangsan 100

lisi 200

wangwu 300

$ cat d

zhangsan man

lisi woman

Method 1: $ awk 'FNR==NR{a[$1]=$0;next}{print a[$1],$2}' c d

zhangsan 100 man

lisi 200 woman


Method 2: $ awk 'FNR==NR{a[$1]=$0}NR>FNR{print a[$1],$2}' c d

Description: NR==FNR matches the first file and NR>FNR matches the second; the first file's records are stored in array a keyed by the first column, and for each record of the second file the stored record is printed together with the second column.

Method 3: $ awk 'ARGIND==1{a[$1]=$0}ARGIND==2{print a[$1],$2}' c d
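
Note that the methods above iterate over the records of d, so wangwu, who has no entry in d, does not appear in the output. A variant that keeps every record of c and appends the sex only when d provides one (a sketch that reads d first) is:

$ awk 'FNR==NR{a[$1]=$2;next}{print $0,a[$1]}' d c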

2) Merge the service names in a.txt so that each IP appears on one line

$ cat A.txt

192.168.2.100:httpd

192.168.2.100:tomcat

192.168.2.101:httpd

192.168.2.101:postfix

192.168.2.102:mysqld

192.168.2.102:httpd

$ awk -F: -vOFS=":" '{a[$1]=a[$1]" "$2}END{for(i in a) print i,a[i]}' a.txt

$ awk -F: -vOFS=":" '{a[$1]=$2" "a[$1]}END{for(i in a) print i,a[i]}' a.txt   # same idea, but prepends, so the service names come out in reverse order

192.168.2.100: httpd tomcat

192.168.2.101: httpd postfix

192.168.2.102: mysqld httpd


Description: a[$1]=a[$1]" "$2 uses the first column (the IP) as the subscript and appends the second column (the service name) to whatever is already stored for that IP, so after all lines have been read each IP's element holds all of its service names.
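
Method 1 above leaves a space right after the colon, before the first service name. A small refinement (an assumption, not part of the original) adds the separator only from the second service onward:

$ awk -F: -vOFS=":" '{a[$1]=a[$1]?a[$1]" "$2:$2}END{for(i in a) print i,a[i]}' a.txt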

3) Prepend the first line to the beginning of every following line

$ cat a.txt

xiaoli

a 100

b 110

c 120

$ awk 'NF==1{a=$0;next}{print a,$0}' a.txt

$ awk 'NF==1{a=$0}NF!=1{print a,$0}' a.txt

xiaoli a 100

xiaoli b 110

xiaoli c 120
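
An equivalent that keys on the line number instead of the field count (a sketch) is:

$ awk 'NR==1{a=$0;next}{print a,$0}' a.txt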


4. Print columns in reverse order

$ cat a.txt

xiaoli a 100

xiaoli b 110

xiaoli c 120

$ awk '{for(i=NF;i>=1;i--){printf "%s ",$i}print s}' a.txt

100 a xiaoli

110 b xiaoli

120 c xiaoli

$ awk '{for(i=NF;i>=1;i--) if(i==1) printf $i"\n"; else printf $i" "}' a.txt


Description: the loop runs from NF down to 1, so the last field is printed first as the counter decrements; print s (s is never set) or print "" simply emits the line break at the end.
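
If the trailing space left by printf "%s " matters, a variant (a sketch; line is just an arbitrary variable name) assembles the reversed record first and prints it in one go:

$ awk '{line="";for(i=NF;i>=1;i--) line=line $i (i>1?" ":"");print line}' a.txt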

5. Print from the second column to the last

Method 1: $ awk '{for(i=2;i<=NF;i++) if(i==NF) printf $i"\n"; else printf $i" "}' a.txt

Method 2: $ awk '{$1=""}{print}' a.txt

a 100

b 110

c 120
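
Method 2 actually leaves a leading separator where $1 used to be. One way around that (a sketch assuming single-space separation) is to strip the first field and its trailing space with sub:

$ awk '{sub(/^[^ ]+ /,"")}1' a.txt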


6. Put the first column of file c into the third column of file d

$ cat c

a

b

c

$ cat d

1 one

2 two

3 three


Method 1: $ awk 'FNR==NR{a[NR]=$0;next}{$3=a[FNR]}1' c d

Note: while reading c, NR is used as the subscript and each whole line is the element; while processing d, the third column is set to the element whose subscript is FNR (which re-counts 1-3 for the second file). The trailing 1 is an always-true pattern whose default action prints the modified record.

Method 2: $ awk '{getline f<"c"; print $0,f}' d

1 one a

2 two b

3 three c


1) Replace the second column

$ awk '{getline f<"c"; gsub($2,f,$2)}1' d

1 a

2 b

3 c

2) Replace "two" in the second column


$ awk '{getline f<"c"; gsub("two",f,$2)}1' d

1 one

2 b

3 three


7. Sum of Numbers

Method 1: $ seq 100 | awk '{sum+=$0}END{print sum}'

Method 2: $ awk 'BEGIN{sum=0;i=1;while(i<=100){sum+=i;i++}print sum}'

Method 3: $ awk 'BEGIN{for(i=1;i<=100;i++)sum+=i}END{print sum}' /dev/null

Method 4: $ seq -s + 100 | bc
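
The same accumulation idiom applies to any column of a file; for example (using the a.txt from section 4, whose third column is 100, 110 and 120, so this prints 330):

$ awk '{sum+=$3}END{print sum}' a.txt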

8. Add a line break or content every three lines

Method 1: $ awk '$0;NR%3==0{printf "\n"}' a

Method 2: $ awk '{print NR%3?$0:$0"\n"}' a

Method 3: $ sed '4~3s/^/\n/' a
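
To insert actual content instead of an empty line, the same NR%3 test can print any separator (the dashes here are an arbitrary example):

$ awk '{print} NR%3==0{print "-----"}' a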

9. String splitting

Method 1:

$ echo "hello" | awk -F "" '{for(i=1;i<=NF;i++) print $i}'

$ echo "hello" | awk -F "" '{i=1;while(i<=NF){print $i;i++}}'

h

e

l

l

o
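
split() can do the same job; note that splitting on the empty string is a gawk extension, so treat this as a gawk-only sketch:

$ echo "hello" | awk '{n=split($0,c,"");for(i=1;i<=n;i++) print c[i]}'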

