I recently came across a few good small examples on a forum. They are still helpful for learning awk, so here they are in detail.
First, a query similar to a LEFT JOIN in a database
The code is as follows:
[root@krlcgcms01 mytest]# cat a.txt
111 aaa
222 bbb
333 cccc
444 ddd
[root@krlcgcms01 mytest]# cat b.txt
111 123 456
2 abc cbd
444 rts 786
The required output, which keeps only the keys found in both files, is:
111,aaa,123,456
444,ddd,rts,786
Implementation method 1:
The code is as follows:
[root@krlcgcms01 mytest]# awk 'NR==FNR{a[$1]=$2;} NR!=FNR && a[$1]{print $1","a[$1]","$2","$3}' a.txt b.txt
111,aaa,123,456
444,ddd,rts,786
Explanation: when NR and FNR are equal, awk is reading the first file, and a[$1]=$2 builds an array with the first field as the subscript and the second field as the value. When NR!=FNR, awk is reading the second file. Note that this $1 is not the same thing as the earlier $1: the earlier one is the first field of a.txt, and this one is the first field of b.txt. a[$1] looks up the first field of b.txt in the array; if a[$1] has a value, that key also exists in a.txt, so the row is printed.
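To make the NR==FNR test easier to picture, here is a small sketch that prints both counters for every record. The file names first.txt and second.txt are made up for this demonstration and are not part of the original example.

```shell
# Hypothetical scratch files, created only for this demonstration.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/first.txt"
printf 'p\nq\nr\n' > "$dir/second.txt"

# NR counts records across all files; FNR restarts at 1 for each file.
result=$(cd "$dir" && awk '{ print FILENAME ": NR=" NR " FNR=" FNR " " $0 }' first.txt second.txt)
echo "$result"
rm -r "$dir"
```

The two counters agree only while the first file is being read; on the first line of second.txt, NR is 3 and FNR is back to 1. One thing to keep in mind: if the first file is empty, NR==FNR is also true for the whole second file, so the idiom assumes a non-empty first file.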
Implementation Method 2:
The code is as follows:
[root@krlcgcms01 mytest]# awk -v OFS="," 'NR==FNR{a[$1]=$2;} NR!=FNR && $1 in a {print $1,a[$1],$2,$3}' a.txt b.txt
111,aaa,123,456
444,ddd,rts,786
Explanation: -v OFS="," sets the column separator used for output. $1 in a tests whether the value of the first column in b.txt is among the keys of array a. Anyone who writes programs will find this familiar, since most languages have such an operator or function. PHP, for example, has in_array (which searches values; array_key_exists is the closer match for keys). Compare the print statements in methods 1 and 2: in method 1 I concatenated the commas as double-quoted strings, in method 2 I separated the arguments with commas and let OFS supply the separator, and the output is the same.
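The two methods are not quite interchangeable. A small sketch of the difference, assuming made-up files left.txt and right.txt in which one key maps to the value 0:

```shell
dir=$(mktemp -d)
# Hypothetical data: the key 222 maps to the value 0 in the lookup file.
printf '111 aaa\n222 0\n' > "$dir/left.txt"
printf '111 x\n222 y\n333 z\n' > "$dir/right.txt"

# Test by value: a[$1] is false for 0 (and for an empty string), so key 222 is lost.
by_value=$(awk 'NR==FNR{a[$1]=$2; next} a[$1]{print $1","a[$1]","$2}' "$dir/left.txt" "$dir/right.txt")

# Test by key: "$1 in a" only asks whether the subscript exists.
by_key=$(awk 'NR==FNR{a[$1]=$2; next} $1 in a{print $1","a[$1]","$2}' "$dir/left.txt" "$dir/right.txt")

echo "by value:"; echo "$by_value"
echo "by key:";   echo "$by_key"
rm -r "$dir"
```

With this data the value test prints only the 111 row, while the key test prints the 111 and 222 rows. For data like the a.txt above, where every value is a non-empty word, both behave the same.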
Second, remove duplicate values
The code is as follows:
[root@krlcgcms01 mytest]# cat repea
a b
c d
e f
b d
b a
f e
1 2
2 1
If both "a b" and "b a" occur, delete "b a"; the same goes for the numbers.
Implementation method 1:
The code is as follows:
awk '{for(i=1;i<=NF;i++)a[i]=$i;asort(a);for(i=1;i<=length(a);i++)printf a[i]"\t";printf "\n"}' repea|sort|uniq
1 2
a b
b d
c d
e f
Explanation: for(i=1;i<=NF;i++)a[i]=$i puts the fields of each row into an array, and asort(a) sorts that array (asort is a gawk extension). The next loop prints the sorted array, so "b a" comes out as "a b". The sort command then sorts the rows so that identical rows end up next to each other, and uniq removes the repeats. This method is quite general: it works for two columns, three columns, four columns and so on. Its efficiency, however, is a bit poor.
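For comparison, here is a sketch of the same idea done in a single awk pass, without asort, sort or uniq. It is a variant and not the forum's method: it sorts the fields with a hand-written insertion sort, and it prints the first row of each group unchanged and in input order, where method 1 prints the sorted fields in sorted order. The array names f and seen and the sample rows are made up for the illustration.

```shell
result=$(printf 'a b\nc d\nb a\n3 1 2\n2 3 1\n' | awk '
NF {
    # Copy the fields of this row into positions 1..n of a work array.
    n = NF
    for (i = 1; i <= n; i++) f[i] = $i
    # Insertion sort; appending "" forces a string comparison.
    for (i = 2; i <= n; i++) {
        v = f[i]
        for (j = i - 1; j >= 1 && (f[j] "") > (v ""); j--) f[j + 1] = f[j]
        f[j + 1] = v
    }
    # Join the sorted fields into one key and print the row the first time the key is seen.
    key = f[1]
    for (i = 2; i <= n; i++) key = key SUBSEP f[i]
    if (!(key in seen)) { seen[key]; print }
}')
echo "$result"
```

Because only positions 1 to n are read back, rows with different numbers of fields do not pick up stale entries from earlier rows.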
Implementation method 2:
The code is as follows:
[root@krlcgcms01 mytest]# awk '{a[$0]=$0;if(!($2 OFS $1 in a))print a[$0]}' repea
a b
c d
e f
b d
1 2
[root@krlcgcms01 mytest]# awk '{a[$0];if(!($2 OFS $1 in a))print}' repea
a b
c d
e f
b d
1 2
Explanation: the two forms of method 2 give the same result. Why does a[$0]; work without an assignment and without an error? When awk meets a variable or array element that has not been defined, it creates it with an empty initial value, so merely referencing a[$0] creates that key. if(!($2 OFS $1 in a)) tests that the row with its two fields swapped is not yet in array a; as said above, in checks whether a key exists, not a value. Because the key is $0 while the lookup is $2 OFS $1, this assumes the fields are separated by a single space, which is the default OFS. print with no arguments prints the whole row.
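The point about a bare reference creating the key can be checked directly. A minimal sketch, using a throwaway array a in a BEGIN block:

```shell
result=$(awk 'BEGIN {
    # Before any reference, the subscript does not exist.
    print (("x" in a) ? "yes" : "no")
    # A bare reference creates a["x"] with an empty value.
    a["x"]
    print (("x" in a) ? "yes" : "no")
    # The stored value is empty, so it counts as false in a condition.
    print (a["x"] ? "true" : "false")
    # The membership test itself never creates an element.
    print (("y" in a) ? "yes" : "no")
}')
echo "$result"
```

The four lines printed are no, yes, false, no. This is why the second form of method 2 must test with in: the element exists but its value is empty.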
Implementation method 3:
The code is as follows:
[root@krlcgcms01 mytest]# awk '!a[$1_$2]++&&!a[$2_$1]++' repea
a b
c d
e f
b d
1 2
[root@krlcgcms01 mytest]# awk '{if(!a[$1_$2]++&&!a[$2_$1]++)print $0}' repea
a b
c d
e f
b d
1 2
Explanation: the bare pattern !a[$1_$2]++&&!a[$2_$1]++ is equivalent to if(!a[$1_$2]++&&!a[$2_$1]++) print $0. The first time a record appears, a[$1_$2] is undefined; because ++ is an arithmetic operation, the element is treated as the number 0. Since this is the postfix ++, the value is taken first and incremented afterwards, and ++ has higher precedence than !. So for the first record the test is effectively if(!0) print $0: ! negates, 0 is false, !0 is true, and print $0 runs. The same happens for a[$2_$1], which marks the swapped form as seen. For later duplicates or swapped rows, the element has already become 1, 2, 3 ... after the ++, and !1, !2, !3 ... are all false, so nothing is printed.
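One detail worth knowing: in $1_$2 the _ is just an uninitialized awk variable, so the subscript is the two fields glued together with nothing in between. The sketch below is a variant, not the forum's code. It builds one order-independent key with SUBSEP between the fields; the names key and seen and the sample rows are assumptions for the illustration.

```shell
result=$(printf 'a b\nb a\nx x\nab c\na bc\n' | awk '
{
    # Compare as strings (appending "" forces it) and put the smaller field first,
    # so that "a b" and "b a" produce the same key.
    if (($1 "") <= ($2 "")) { key = $1 SUBSEP $2 } else { key = $2 SUBSEP $1 }
    # Post-increment yields the old count, which is 0 only on the first sighting.
    if (!seen[key]++) print
}')
echo "$result"
```

On this input the sketch keeps "a b", "x x", "ab c" and "a bc". As far as I can tell, the one-liner above would drop "x x" (both subscripts are the same element, so the second test sees 1) and "a bc" (it collides with "ab c" once the fields are glued together), which does not matter for the repea file but may for other data.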
Third, local variables in awk
This example illustrates awk's peculiar local variables.
The code is as follows:
[root@krlcgcms01 mytest]# cat sum
1 2
2 3
a b
3 2
4 1
3 r
For the rows that contain only numbers, take the larger number of each row and add them up: row 1 gives 2, row 2 gives 3, row 4 gives 3, row 5 gives 4, and the sum is 12.
The code is as follows:
function max(one,two){
    if(one > two){
        sum = sum + one;
    }else{
        sum = sum + two;
    }
}
{if($1~"[0-9]" && $2~"[0-9]") max($1,$2);}
END{print "sum="sum}
In the max function, the assignment to sum affects the outside, because sum here is global.
[root@krlcgcms01 mytest]# awk -f add.sh sum
sum=12
The code is as follows:
function max(one,two,sum){    # sum is now a local variable inside the function
    if(one > two){
        sum = sum + one;
    }else{
        sum = sum + two;
    }
}
{if($1~"[0-9]" && $2~"[0-9]") max($1,$2,sum);}
END{print "sum="sum}    # so this prints an empty value
[root@krlcgcms01 mytest]# awk -f add.sh sum
sum=
The code is as follows:
function max(one,two,sum){
    if(one > two){
        sum = sum + one;
    }else{
        sum = sum + two;
    }
    return sum    # adding return fixes it
}
{if($1~"[0-9]" && $2~"[0-9]") sum = max($1,$2,sum);}
END{print "sum="sum}
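Another way to arrange the same calculation, offered as a sketch and not as the forum's solution: let max only return the larger value and keep the running total in the main rule, so the function touches no outside state. The stricter pattern /^[0-9]+$/ is my own assumption about what "a number" should mean here; the original "[0-9]" matches any field that merely contains a digit.

```shell
result=$(printf '1 2\n2 3\na b\n3 2\n4 1\n3 r\n' | awk '
# Return the larger of two values; adding 0 forces a numeric comparison.
function max(one, two) {
    return (one + 0 > two + 0) ? one : two
}
# Count only rows whose two fields consist entirely of digits.
$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { sum += max($1, $2) }
END { print "sum=" sum }')
echo "$result"
```

With the sum file from above as input this prints sum=12, the same as the first and third versions.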