Linux awk multi-file operations: two implementation methods

Source: Internet
Author: User

We often need to merge two related text files: take the required columns from each file and output them together. When awk processes multiple files, two problems usually come up. The first is how to combine multiple files into one stream. The second is how to merge multiple rows into a single row for display. Below I describe two methods and how to implement them.

Sample text:

The code is as follows:
[chengmo@centos5 shell]$ awk 'FNR==1{print "\r\n"FILENAME}{print $0}' a.txt b.txt

a.txt
100 wang man
200 wangsan woman
300 wangming man
400 wangzheng man

b.txt
100 90 80
200 80 70
300 60 50
400 70 20

The merged result we want:

100 wang man 90 80
200 wangsan woman 80 70
300 wangming man 60 50
400 wangzheng man 70 20

Awk multi-file operation method 1:

Implementation idea:

Concatenate the files with an external command, sort the combined lines, and then merge them with awk.

First:


The code is as follows:
[chengmo@centos5 shell]$ cat a.txt b.txt | sort -n -k1 | awk '{print}'
100 90 80
100 wang man
200 80 70
200 wangsan woman
300 60 50
300 wangming man
400 70 20
400 wangzheng man

Next, rows that share the same first column must be merged into a single row. This requires the "next" statement. For more information, see [next usage] (Common Application 4).

Continue:


The code is as follows:
[chengmo@centos5 shell]$ cat a.txt b.txt | sort -n -k1 | awk 'NR%2==1{fd1=$2"\t"$3;next}{print $0"\t"fd1}'
100 wang man 90 80
200 wangsan woman 80 70
300 wangming man 60 50
400 wangzheng man 70 20

When several rows need to be merged, the common approach is to test NR % num, save the row's values, and use next to skip to the following row. The last row of each group then prints the combined output.
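The NR % num pattern can be sketched for an arbitrary group size as follows. This is a minimal illustration, not the article's original code: the helper name merge_rows and the variables num and buf are made up for the example, and unlike the one-liner above it keeps the rows in input order.

```shell
#!/bin/sh
# merge_rows NUM: join every NUM consecutive input lines into one
# tab-separated line (hypothetical helper, not from the article).
merge_rows() {
    awk -v num="$1" '
        # Not the last row of a group: save it and skip to the next record.
        NR % num != 0 { buf = buf $0 "\t"; next }
        # Last row of a group: print the saved rows plus this one, then reset.
        { print buf $0; buf = "" }
        # Flush an incomplete final group, dropping its trailing tab.
        END { if (buf != "") { sub(/\t$/, "", buf); print buf } }
    '
}

# Pairs of rows, as in the sorted output of method 1.
pairs=$(printf '%s\n' '100 90 80' '100 wang man' '200 80 70' '200 wangsan woman' | merge_rows 2)
# Groups of three, with one leftover line.
triples=$(printf '%s\n' a b c d e f g | merge_rows 3)
printf '%s\n' "$pairs" "$triples"
```

Note that this approach, like the one-liner, assumes every key contributes exactly num adjacent rows after sorting; a missing row in one file shifts all later groups.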

Awk multi-file operation method 2:

Implementation idea:

Open multiple files directly in awk, without any third-party tool. FILENAME gives the name of the file currently being processed. NR is the total number of records read so far, and FNR is the record number within the current file. ARGC is the number of command-line arguments, and ARGV is an array holding each argument value.

Take a look at these instances:


The code is as follows:
[chengmo@centos5 shell]$ awk 'BEGIN{print ARGC,ARGV[0],ARGV[1],ARGV[2]}{print FILENAME,NR,FNR,$0}' a.txt b.txt
3 awk a.txt b.txt
a.txt 1 1 100 wang man
a.txt 2 2 200 wangsan woman
a.txt 3 3 300 wangming man
a.txt 4 4 400 wangzheng man
b.txt 5 1 100 90 80
b.txt 6 2 200 80 70
b.txt 7 3 300 60 50
b.txt 8 4 400 70 20
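The output above shows that NR keeps counting across files while FNR restarts at 1, so NR == FNR holds only while the first file is being read. A widely used awk idiom builds on that fact. The sketch below is not the article's method: the helper name merge_by_key and the array name row are chosen for illustration, and it assumes the first file is not empty.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clobber anything.
dir=$(mktemp -d) && cd "$dir" || exit 1
trap 'rm -rf "$dir"' EXIT

# Sample data copied from the article.
printf '%s\n' '100 wang man' '200 wangsan woman' '300 wangming man' '400 wangzheng man' > a.txt
printf '%s\n' '100 90 80' '200 80 70' '300 60 50' '400 70 20' > b.txt

# merge_by_key FILE1 FILE2: hypothetical helper that joins two files on column 1.
merge_by_key() {
    awk '
        # First file only: remember each whole line under its key.
        NR == FNR { row[$1] = $0; next }
        # Second file: print only keys that were seen in the first file.
        $1 in row { print row[$1], $2, $3 }
    ' "$1" "$2"
}

merged=$(merge_by_key a.txt b.txt)
printf '%s\n' "$merged"
```

Because the second file is read in its own order, no sort step is needed as long as that order is acceptable for the output.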

Program code:


The code is as follows:
[chengmo@centos5 shell]$ awk '
BEGIN{
    # Two input files are required.
    if(ARGC<3)
    {
        exit 1;
    }

    file="";
}
{
    # First file: keep the whole line. Other files: keep columns 2 and 3.
    aData[FILENAME,$1]=(ARGV[1]==FILENAME)?$0:$2"\t"$3;
}
END{
    for(k in aData)
    {
        # Split the combined key back into file name and column value.
        split(k,idx,SUBSEP);
        if(idx[1]==ARGV[1] && ((ARGV[2],idx[2]) in aData))
        {
            print aData[ARGV[1],idx[2]],aData[ARGV[2],idx[2]] | "sort -n -k1";
        }
    }
}' a.txt b.txt

100 wang man 90 80
200 wangsan woman 80 70
300 wangming man 60 50
400 wangzheng man 70 20

Code description:

Here we use a two-dimensional array, aData[file name, associated column value]. This keeps the contents of the files separate while storing them in a single two-dimensional array. We then loop over the array and use the test (i, j) in array to check whether the corresponding column value also exists in the other file.
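The way awk stores such two-dimensional subscripts can be seen in a small standalone sketch. It assumes nothing beyond POSIX awk; the function name subsep_demo and the sample values are illustrative only.

```shell
#!/bin/sh
# subsep_demo: hypothetical helper showing how (i, j) subscripts behave.
subsep_demo() {
    awk 'BEGIN {
        # A subscript (i, j) is stored as the single string i SUBSEP j.
        aData["a.txt", "100"] = "100 wang man"
        aData["b.txt", "100"] = "90\t80"

        # split() on SUBSEP recovers both parts of a combined key.
        for (k in aData) {
            split(k, idx, SUBSEP)
            if (idx[1] == "a.txt") print idx[1], idx[2]
        }

        # (i, j) in array tests membership without creating the element.
        print ((("b.txt", "100") in aData) ? "present" : "absent")
        print ((("b.txt", "999") in aData) ? "present" : "absent")
    }'
}

subsep_demo
```

The loop prints only the a.txt key because the iteration order of for (k in array) is unspecified, which is also why the program above pipes its output through sort.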

Those are the two implementation methods. The first is simple and easy to understand; the second is more complicated. If you know a better way, please share it with me.
