We often need to merge two related text files: take the required columns from each file and print them together. When awk processes multiple files, two problems usually come up. The first is how to combine multiple files into a single input stream. The second is how to merge multiple rows into one row for display. Below I describe two methods and how each one is implemented.
Sample text:
[chengmo@centos5 shell]$ awk 'FNR==1{print "\r\n" FILENAME}{print $0}' a.txt b.txt
a.txt
100 wang man
200 wangsan woman
300 wangming man
400 wangzheng man
b.txt
100 90 80
200 80 70
300 60 50
400 70 20
The merged result we want:
100 wang man 90 80
200 wangsan woman 80 70
300 wangming man 60 50
400 wangzheng man 70 20
Awk multi-file operation method 1
Implementation idea:
Concatenate the files with an external command, sort the combined rows, and then merge them with awk.
First:
[chengmo@centos5 shell]$ cat a.txt b.txt | sort -n -k1 | awk '{print}'
100 90 80
100 wang man
200 80 70
200 wangsan woman
300 60 50
300 wangming man
400 70 20
400 wangzheng man
Now we need to merge the rows that share the same first column into one row. This is where the "next" statement comes in. For more information, see [next usage] (Common Application 4).
Continue:
[chengmo@centos5 shell]$ cat a.txt b.txt | sort -n -k1 | awk 'NR%2==1{fd1=$2"\t"$3;next}{print $0"\t"fd1}'
100 wang man 90 80
200 wangsan woman 80 70
300 wangming man 60 50
400 wangzheng man 70 20
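When you need to merge several rows into one, the common pattern is: test NR % num, save the values from the current row, use next to move on to the following row, and print the combined output on the last row of each group.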
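The NR % num pattern is not limited to pairs of rows. As a minimal sketch, assuming every key has exactly n consecutive rows after sorting (the input data and the names n and buf are made up for illustration), the following joins every three rows into one:

```shell
# Hypothetical input: three consecutive rows per key.
rows='100 a
100 b
100 c
200 d
200 e
200 f'

# Buffer rows until the last one of each group, then print the merged row.
merged=$(printf '%s\n' "$rows" | awk -v n=3 '
NR % n == 1 { buf = $0; next }          # first row of a group: start the buffer
NR % n != 0 { buf = buf "\t" $2; next } # middle rows: append the second field
{ print buf "\t" $2 }                   # last row of a group: emit the merged row
')
printf '%s\n' "$merged"
```

This only works when the groups are complete and adjacent; if a key is missing a row, every following group is shifted.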
Awk multi-file operation method 2
Implementation idea:
Open multiple files directly in awk without using any third-party tool. You can then use FILENAME to get the name of the file currently being processed, NR for the total number of records read so far, FNR for the record number within the current file, and ARGC for the number of command-line arguments. ARGV is an array that holds the value of each argument.
Take a look at this example:
[chengmo@centos5 shell]$ awk 'BEGIN{print ARGC,ARGV[0],ARGV[1],ARGV[2]}{print FILENAME,NR,FNR,$0}' a.txt b.txt
3 awk a.txt b.txt
a.txt 1 1 100 wang man
a.txt 2 2 200 wangsan woman
a.txt 3 3 300 wangming man
a.txt 4 4 400 wangzheng man
b.txt 5 1 100 90 80
b.txt 6 2 200 80 70
b.txt 7 3 300 60 50
b.txt 8 4 400 70 20
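The output above shows that NR keeps counting across files while FNR restarts at 1 for each file, so NR == FNR holds only while the first file is being read. A minimal sketch of how that observation can drive a two-file join, assuming the first column is a unique key in both files (the array name score and the scratch directory are illustrative choices, not part of the original example):

```shell
# Recreate the sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '100 wang man' '200 wangsan woman' '300 wangming man' '400 wangzheng man' > "$dir/a.txt"
printf '%s\n' '100 90 80' '200 80 70' '300 60 50' '400 70 20' > "$dir/b.txt"

# b.txt is read first and cached by key; a.txt rows are then printed with the cached scores.
joined=$(awk '
NR == FNR { score[$1] = $2 "\t" $3; next }  # first file only: remember the scores
$1 in score { print $0 "\t" score[$1] }     # second file: append scores when the key matches
' "$dir/b.txt" "$dir/a.txt")
printf '%s\n' "$joined"
rm -r "$dir"
```

Note that the test misfires if the first file is empty, because NR == FNR is then also true for the second file.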
Program code:
[chengmo@centos5 shell]$ awk '
BEGIN {
    # Two input files are required.
    if (ARGC < 3)
    {
        exit 1;
    }
}
{
    # Keep the whole row for the first file, only columns 2 and 3 for the second.
    aData[FILENAME, $1] = (ARGV[1] == FILENAME) ? $0 : $2 "\t" $3;
}
END {
    for (k in aData)
    {
        split(k, idx, SUBSEP);
        # Print only keys from the first file that also exist in the second file.
        if (idx[1] == ARGV[1] && (ARGV[2], idx[2]) in aData)
        {
            print aData[ARGV[1], idx[2]], aData[ARGV[2], idx[2]] | "sort -n -k1";
        }
    }
}' a.txt b.txt
100 wang man 90 80
200 wangsan woman 80 70
300 wangming man 60 50
400 wangzheng man 70 20
Code description:
Here we use a two-dimensional array, aData[file name, key column value]. This keeps the contents of the different files separate while storing them in a single two-dimensional array. Then we loop over the array and use if ((i, j) in array) to look up the matching key value and check whether it also exists in the other file.
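awk does not have true two-dimensional arrays: a subscript such as aData[file, key] is stored as one string, with the parts joined by SUBSEP, which is why the program above can recover them with split. A small sketch of that behavior, using made-up keys ("x", "y") and an illustrative array name cell rather than the sample files:

```shell
result=$(awk 'BEGIN {
    cell["x", 100] = "first"                    # stored under the key "x" SUBSEP "100"
    if (("x", 100) in cell) print "found"       # membership test with a composite key
    if (!(("y", 100) in cell)) print "missing"  # the test does not create the element
    for (k in cell) {                           # k is the joined string
        split(k, part, SUBSEP)                  # recover the two subscripts
        print part[1], part[2]
    }
}')
printf '%s\n' "$result"
```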
Those are the two implementation methods. The first is simple and easy to understand; the second is more complicated. If you know a better way, please share it with me.