Awk implements left, join query, remove duplicate values, and local variables explain examples _linux shell

Source: Internet
Author: User

Recently saw a few good small examples in the forum, for learning awk is still helpful, here in detail

A. A LEFT join query in a similar database

Copy Code code as follows:

[Root@krlcgcms01 mytest]# cat A.txt//a.txt
aaa
222 BBB
333 CCCC
444 DDD
[Root@krlcgcms01 mytest]# cat B.txt//b.txt
111 123 456
2 ABC CBD
444 RTS 786


Required output result is
111,aaa,123,456
444,ddd,rts,786

Implementation method:

Copy Code code as follows:

[Root@krlcgcms01 mytest]# awk ' nr==fnr{a[$1]=$2;} NR!=FNR && A[$1]{print $ "," a[$1] "," $ "," $} ' a.txt B.txt
111,aaa,123,456
444,ddd,rts,786

Explanation: When NR and Fnr are the same, this means that the first file is manipulated, A[$1]=$2 says, to create an array with the first field as subscript and the second field as the value. When NR!=FNR, the description is in the second file, note: This time the $ and the previous $ is not the same thing, the preceding is represented by the first field of A.txt, and the following is the b.txt of the first field. A[$1] Represents the value of the first field in B.txt, and if a[$1] has a value, the description also exists in the A.txt file, so that the data print out.

Implementation Method 2:

Copy Code code as follows:

[Root@krlcgcms01 mytest]# awk-v ofs= "," ' nr==fnr{a[$1]=$2;} NR!=FNR && $ in a {print $1,a[$1],$2,$3} ' a.txt b.txt
111,aaa,123,456
444,ddd,rts,786

Explanation:-v ofs= "," This is the column delimiter when the output is set, and in a This is the value of the first column in the B.txt file is not in the key of array A, which is well understood by the program, which is used in all languages, or functions. For example, there are in_array functions in PHP. Compare the print in methods 1 and 2, Method 1 I added double quotes, Method 2 I did not add, but the output is the same effect.

Second, remove the duplicate value

Copy Code code as follows:

[Root@krlcgcms01 mytest]# Cat Repea//File Repea
A b
C D
E F
b d
b A
F E
1 2
2 1

If there are a,b and b,a such situation, delete the B,a, of course, the same number;

Implementation Method 1:

Copy Code code as follows:

awk ' {for (i=1;i<=nf;i++) a[i]= $i; Asort (a), for (I=1;i<=length (a); i++) printf a[i] "\ t";p rintf "\ n"} ' repea|sort| Uniq
1 2
A b
b d
C D
E F

Explanation: for (i=1;i<=nf;i++) a[i]= $i; Place two fields in each column in an array, Asort (a), the array is sorted, the following code outputs the array data, the sort command sorts the input data, the same data is sorted together, The same column is removed by Uniq. This method is more versatile, not only for two columns, three columns, four columns. But the efficiency is a bit poor.

Implementation Method 2:

Copy Code code as follows:

[Root@krlcgcms01 mytest]# awk ' {a[$0]=$0;if ( $ OFS $ in a) print a[$0]} ' Repea
A b
C D
E F
b d
1 2
[Root@krlcgcms01 mytest]# awk ' {a[$0];if ( $ OFS $ (a)) print} ' Repea
A b
C D
E F
b d
1 2

Explanation: Method 2 of the two kinds of writing, the result is the same, a[$0]; no assignment and no error, why? Awk gives it an initial value when it encounters a variable that is not defined. if (!) ( $ OFS in a) indicates that the backward field is not in array a, which is said to indicate whether the key exists, not a value. Print does not write by default is a row.

Implementation Method 3:

Copy Code code as follows:

[Root@krlcgcms01 mytest]# awk '!a[$1_$2]++&&!a[$2_$1]++ ' Repea
A b
C D
E F
b d
1 2
[Root@krlcgcms01 mytest]# awk ' {if (!a[$1_$2]++&&!a[$2_$1]++) print $} ' Repea
A b
C D
E F
b d
1 2

Explanation:!a[$1_$2]++&&!a[$2_$1]++ equals if (!a[$1_$2]++&&!a[$2_$1]++), the value of a[$2_$1] is undefined for the first occurrence of the record, due to the following + + is a mathematical calculation, so a[$2_$1] will be assigned to the number 0, but also because of the + + operator, will first take the value, and then calculate, from left to right + + operator precedence greater than! operator, so the first line of records is actually if (! 0) Print $! is to take the counter, 0 is false,! 0 is true, then you will perform the following print $ for subsequent duplicate records, a[$0] After the calculation of + + has become 1, 2, 3 ... and! 1! 2! 3.. are false and will not print.

Third, the local variables of awk

This example illustrates the bizarre local variables of awk

Copy Code code as follows:

[Root@krlcgcms01 mytest]# cat sum
1 2
2 3
A b
3 2
4 1
3 R

The number of rows, the largest number added up, the first line is 2, the second row is 3, every four rows is 3, the fifth line is 4, the sum is 12

Copy Code code as follows:

function Max (one,two) {
if (one > two) {
sum = sum + one;
}else{
sum = sum + two;
}
}

{if ($1~ "[0-9]" && $2~ "[0-9]") max ($1,$2);}
End{print "sum=" sum}


In the Max method, the sum of variables will affect the outside, and the sum here is global.
[root@krlcgcms01 mytest]# awk-f add.sh sum
Sum=12

Copy Code code as follows:

Sum local variable in function max (one,two,sum) {//method
if (one > two) {
sum = sum + one;
}else{
sum = sum + two;
}
}

{if ($1~ "[0-9]" && $2~ "[0-9]") max ($1,$2,sum);}

End{print "sum=" sum}//So empty

[root@krlcgcms01 mytest]# awk-f add.sh sum
sum=

Copy Code code as follows:

function Max (one,two,sum) {
if (one > two) {
sum = sum + one;
}else{
sum = sum + two;

}

return sum//plus return is OK.
}

{if ($1~ "[0-9]" && $2~ "[0-9]") sum = max ($1,$2,sum);}

End{print "sum=" sum}

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.