Examples of awk array processing with two files

Source: Internet
Author: User

Task: if the second field of a record in file a appears as a line in file b, print that record of file a and write the output to file c.

File a:
10/05766798607,11/20050325191329,29/0.05766798607/
10/05767158557,11/20050325191329,29/0.05767158557/

File b:
05766798607
05766798608
05766798609
Comparing file a against file b should produce the following file c:
10/05766798607,11/20050325191329,29/0.05766798607/

Many of the answers I found online contain errors. Correct versions are:

Method 1 (ARGIND is a gawk extension, so this one needs GNU awk):
awk -F'[/,]' 'ARGIND==1 {a[$0]} ARGIND>1 {if ($2 in a) print $0}' b a > c

Method 2 (works in any awk):
awk -F'[/,]' 'NR==FNR {a[$0]} NR>FNR {if ($2 in a) print $0}' b a > c

Both methods keep the keys of b in an array, so they are fast: processing 90,000 lines takes only about 4 seconds.

Another method uses a while loop to read one line at a time from b, compare it with $2 of every record in a, and write the matching records to c:
[root@testas4 zlwt]# more for3.sh
#!/bin/bash
while read -r line; do
    awk -F'[/,]' -v key="$line" '$2 == key {print $0}' a
done < b > c

This method is easy to understand, but it is very slow because awk rereads all of file a once for every line of b: processing 90,000 lines took 5 hours.
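To compare with the slow loop, Method 2 can be tried end to end with the short script below. It is a sketch under a few assumptions: the files are created in a throwaway directory from `mktemp -d`, and the rule uses `next` instead of a separate `NR>FNR` test, which has the same effect. The data are the sample lines shown above.

```shell
#!/bin/sh
# Sketch of Method 2: look up $2 of each record in a among the keys read from b.
dir=$(mktemp -d)   # throwaway directory (demo assumption)

# File a: fields are separated by "/" or ",", so $2 is the phone number.
printf '%s\n' \
  '10/05766798607,11/20050325191329,29/0.05766798607/' \
  '10/05767158557,11/20050325191329,29/0.05767158557/' > "$dir/a"

# File b: one key per line.
printf '%s\n' 05766798607 05766798608 05766798609 > "$dir/b"

# While reading b (NR==FNR): store each line as an array subscript, skip to next line.
# While reading a: the bare pattern "$2 in keys" prints records whose $2 was stored.
awk -F'[/,]' 'NR==FNR {keys[$0]; next} $2 in keys' "$dir/b" "$dir/a" > "$dir/c"

result=$(cat "$dir/c")
rm -rf "$dir"
echo "$result"
```

File b is read once and file a is read once, which is why this scales so much better than starting a new awk for every key.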
Example 2: awk arrays that index across two files (alternative method)
[root@testas4 zlwt]# more a
depta
deptb
deptc
deptd
[root@testas4 zlwt]# more b
aaa 0
bbb 1
ccc 2
ddd 0
eee 2
fff 2
[root@testas4 zlwt]# awk 'NR==FNR {k[i++]=$1} NR>FNR {print $1, k[$2]}' a b

aaa depta
bbb deptb
ccc deptc
ddd depta
eee deptc
fff deptc

NR==FNR {k[i++]=$1}        # while reading file a, store its first field in array k; the subscript i counts up automatically (0, 1, 2, ...)
NR>FNR {print $1, k[$2]}   # while reading file b, $1 and $2 are b's first and second fields, and k[$2] is the matching value from a
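A self-contained sketch of Example 2 follows, assuming a throwaway `mktemp -d` directory and the sample data above. It shows the key point of the technique: `i` starts out empty (numeric 0), so file a is stored as k[0], k[1], ..., and the numbers in the second column of b are used directly as those subscripts.

```shell
#!/bin/sh
# Sketch of Example 2: line positions of file a serve as indexes referenced by file b.
dir=$(mktemp -d)   # throwaway directory (demo assumption)

printf '%s\n' depta deptb deptc deptd > "$dir/a"
printf '%s\n' 'aaa 0' 'bbb 1' 'ccc 2' 'ddd 0' 'eee 2' 'fff 2' > "$dir/b"

# Reading a: k[0]=depta, k[1]=deptb, k[2]=deptc, k[3]=deptd.
# Reading b: $2 selects the stored value.
result=$(awk 'NR==FNR {k[i++] = $1; next} {print $1, k[$2]}' "$dir/a" "$dir/b")
rm -rf "$dir"
echo "$result"
```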
The following method, from moderator r2007, is equivalent:
[root@testas4 zlwt]# awk '{if (NR==FNR) k[i++]=$0; else print $1, k[$2]}' a b
aaa depta
bbb deptb
ccc deptc
ddd depta
eee deptc
fff deptc

Another awk example:


awk 'BEGIN {FS="[|]"; OFS="|"}
FNR==NR {a[$1]=$2}
FNR<NR {if (!a[$1]) {$1="13"; print}
        else {$1=a[$1]; print}}
' wj wj1 > wj2

File 1 (wj)
1|name1
2|name2
3|name3
5|name5
6|name6

File 2 (wj1)
1|name11
2|name22
3|name33
4|name44
5|name55
6|name66
7|name77
8|name88

Output (wj2)
name1|name11
name2|name22
name3|name33
13|name44
name5|name55
name6|name66
13|name77
13|name88

The script processes two |-separated files.
For example:
File 1 (wj) format:
id1|desc1
File 2 (wj1) format:
id2|desc2

FNR==NR {a[$1]=$2} means that, while the first file is being processed, desc1 is stored in array element a[id1].
FNR<NR is true only while the second file is being processed. For each line of the second file:
{if (!a[$1]) {$1="13"; print}
else {$1=a[$1]; print}}
If a[$1] is empty (the id does not occur in file 1), the first field of the file 2 line is replaced with 13, giving 13|desc2.
If a[$1] is not empty, it was assigned while reading file 1, so $1 is replaced with a[$1], the $2 of the matching file 1 line, giving desc1|desc2.
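The lookup described above can be run end to end with the sketch below, which uses the sample wj and wj1 data and a throwaway `mktemp -d` directory (a demo assumption). One deliberate variant: it tests membership with `($1 in a)` rather than `!a[$1]`, so an id whose desc1 is empty or "0" counts as found instead of falling back to 13.

```shell
#!/bin/sh
# Sketch of the wj/wj1 merge: replace id2 with desc1 from file 1, or with 13 if absent.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' '1|name1' '2|name2' '3|name3' '5|name5' '6|name6' > "$dir/wj"
printf '%s\n' '1|name11' '2|name22' '3|name33' '4|name44' \
  '5|name55' '6|name66' '7|name77' '8|name88' > "$dir/wj1"

awk 'BEGIN {FS = "[|]"; OFS = "|"}
     FNR == NR {a[$1] = $2; next}              # file 1: a[id1] = desc1
     {$1 = ($1 in a) ? a[$1] : "13"; print}    # file 2: swap id2 for desc1, or 13
    ' "$dir/wj" "$dir/wj1" > "$dir/wj2"

result=$(cat "$dir/wj2")
rm -rf "$dir"
echo "$result"
```

Assigning to $1 makes awk rebuild the record with OFS, which is why the output keeps the | separator.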

In summary: for each line of file 2, look up the line of file 1 whose id1 equals id2.
If it is found, output desc1|desc2.
If it is not found, output 13|desc2.

Next example: add the prefix 86 to a batch of numbers such as 1331131*****, then remove it again.

# cat a.txt

13994623***
13394660***
13394660***
13394671***
13394672***
13394690***
13394692***
15304863***

# awk '{print "86" $1}' a.txt > b.txt
8613994623***
8613394660***
8613394660***
8613394671***
8613394672***
8613394690***
8613394692***
8615304863***

# awk '{print substr($1, 3)}' b.txt     (removes the leading 86)

13994623***
13394660***
13394660***
13394671***
13394672***
13394690***
13394692***
15304863***
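Both steps can be sketched together as below, assuming a throwaway `mktemp -d` directory and a few of the masked sample numbers. For the removal it swaps in `sub(/^86/, "")`, which only strips 86 where a line actually starts with it, whereas `substr($1, 3)` drops the first two characters unconditionally.

```shell
#!/bin/sh
# Sketch: add the 86 prefix to every number, then remove it again.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' '13994623***' '13394660***' '15304863***' > "$dir/a.txt"

# Add the prefix.
awk '{print "86" $1}' "$dir/a.txt" > "$dir/b.txt"
added=$(cat "$dir/b.txt")

# Remove it, but only where a line really begins with 86.
removed=$(awk '{sub(/^86/, ""); print}' "$dir/b.txt")

rm -rf "$dir"
printf '%s\n%s\n' "$added" "$removed"
```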
------------------------------------------------------------------------------
Joining two files
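Count the records per city (uniq -c only merges adjacent duplicate lines, so the file must already be grouped by city; otherwise use sort | uniq -c):
[root@testas4 cwm]# awk '{print $1}' 153mdn.txt | uniq -c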
4 Qitaihe
5 Yichun
18 Jiamusi
13 Shuangyashan
66 Harbin
1 Daxinganling
32 Daqing
20 Mudanjiang
19 Suihua
16 Jixi
15 Hegang
10 Heihe
19 Qiqihar
[root@testas4 cwm]# awk '{print $1, substr($1, 1, 7)}' hlj_jifei > hlj_temp
[root@testas4 mdn]# more hlj_temp
13009700055 1300970
13009700495 1300970
13009701075 1300970
13009701282 1300970

[root@testas4 mdn]# ls
2 3 awk_script cwm hlj_jifei hlj_temp newmdn_table.txt temp test1
[root@testas4 mdn]# more test1
1300019 510 Guangzhou
1300101 110 010 Beijing
1300103 110 010 Beijing
1300104 110 010 Beijing
1300106 110 010 Beijing

[root@testas4 mdn]# awk 'NR==FNR {a[substr($1, 1, 7)]=$4} NR>FNR && a[b=substr($1, 1, 7)] {print $1, a[b]}' test1 hlj_temp | more
or
[root@testas4 mdn]# awk 'NR==FNR {a[$1]=$4} NR>FNR && a[b=substr($1, 1, 7)] {print $1, a[b]}' test1 hlj_temp
13009700055 Harbin
13009700495 Harbin
13009701075 Harbin
13009701282 Harbin
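The prefix join can be reproduced with the sketch below. It assumes a throwaway `mktemp -d` directory; the `1300970 000 000 Harbin` line of test1 and the unmatched number `13012345678` are hypothetical, because the excerpts above do not show a complete matching pair. Unlike `a[b=...]`, the `in` test does not create empty array entries for numbers that have no match.

```shell
#!/bin/sh
# Sketch: map each number to a city through its 7-digit prefix.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
# test1: prefix, two code columns, city. The 1300970 line is hypothetical.
printf '%s\n' '1300101 110 010 Beijing' '1300970 000 000 Harbin' > "$dir/test1"
# hlj_temp: number, prefix. The 13012345678 line is hypothetical and has no match.
printf '%s\n' '13009700055 1300970' '13009700495 1300970' '13012345678 1301234' > "$dir/hlj_temp"

result=$(awk '
NR == FNR {city[$1] = $4; next}        # test1: prefix -> city
{
    b = substr($1, 1, 7)               # first 7 digits of the number
    if (b in city) print $1, city[b]   # unmatched numbers are skipped
}' "$dir/test1" "$dir/hlj_temp")
rm -rf "$dir"
echo "$result"
```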

--------------------------------------------------------------------------------------
[root@testas4 mdn]# more temp
1300970 13009700055
1300970 13009700495
1300970 13009701075
1300970 13009701282

--------------------------------------------------------------------------------

[root@testas4 mdn]# more awk_script
BEGIN {while ((getline < "test1") > 0) {lines[$1] = $4}; OFS = " "}
{
    if ($1 in lines) {
        $1 = lines[$1]   # replace $1 of temp with $4 of test1
        print $0
    }
}
# Put the fourth field of test1 into the first field of the matching line of temp.
# getline reads test1 in BEGIN and stores its fourth field in an array indexed by its first field.

[root@testas4 mdn]# ls
2 3 awk_script cwm hlj_jifei hlj_temp newmdn_table.txt temp test1
[root@testas4 mdn]# awk -f awk_script temp | wc -l
63440
[root@testas4 mdn]# awk -f awk_script temp | more
Harbin 13009700055
Harbin 13009700495
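A self-contained variant of awk_script is sketched below. It assumes a throwaway `mktemp -d` directory, passes the lookup file name with `-v lookup=...` (a hypothetical variable name), and reuses the hypothetical `1300970 000 000 Harbin` line for test1. Reading into a variable with `getline line < lookup` leaves $0 and NR untouched, and `close()` releases the file once it has been read.

```shell
#!/bin/sh
# Sketch: load test1 in BEGIN with getline, then rewrite matching lines of temp.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' '1300101 110 010 Beijing' '1300970 000 000 Harbin' > "$dir/test1"
printf '%s\n' '1300970 13009700055' '1300970 13009700495' '1309999 13099990000' > "$dir/temp"

result=$(awk -v lookup="$dir/test1" '
BEGIN {
    # Read the whole lookup file before the first line of temp is processed.
    while ((getline line < lookup) > 0) {
        split(line, f, " ")    # f[1] = prefix, f[4] = city
        city[f[1]] = f[4]
    }
    close(lookup)
}
$1 in city {$1 = city[$1]; print}   # default OFS is a single space
' "$dir/temp")
rm -rf "$dir"
echo "$result"
```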

Another awk example: summing all values in a column.

Sum all the values in the second column.

[root@testas4 ~]# more cwm.txt
cwm 123
zbll 124
yhh 2
cj 1
[root@testas4 ~]# awk '{a[x++]=$2}; END {for (i=1; i<=NR; i++) b=b+a[i-1]; print b}' cwm.txt
250
[root@testas4 ~]# awk '{a[NR]=$2; b=0}; END {for (i=1; i<=NR; i++) b=b+a[i]; print b}' cwm.txt
250
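Both commands above store the column in an array and add it up in END. A simpler alternative, swapped in here rather than taken from the original, keeps a running total while reading, so no array or loop is needed. The sketch assumes a throwaway `mktemp -d` directory and the cwm.txt sample data.

```shell
#!/bin/sh
# Sketch: sum the second column with a running total.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' 'cwm 123' 'zbll 124' 'yhh 2' 'cj 1' > "$dir/cwm.txt"

# sum + 0 prints 0 instead of an empty line when the file is empty.
result=$(awk '{sum += $2} END {print sum + 0}' "$dir/cwm.txt")
rm -rf "$dir"
echo "$result"
```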

Display lines M through N of a file (here lines 2 to 10).

[root@testas4 ~]# sed -n '2,10p' mdn.txt

[root@testas4 ~]# awk 'NR==2, NR==10 {print $0}' mdn.txt
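The sketch below checks that the sed and awk forms agree, passing M and N in as variables with `-v`. It assumes a hypothetical 12-line file standing in for mdn.txt. The third form adds `exit` once line N has passed, so awk stops reading instead of scanning the rest of a large file.

```shell
#!/bin/sh
# Sketch: print lines m..n with sed and with awk.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
i=1
while [ "$i" -le 12 ]; do echo "line$i"; i=$((i + 1)); done > "$dir/mdn.txt"   # hypothetical data

m=2 n=10
via_sed=$(sed -n "${m},${n}p" "$dir/mdn.txt")
via_awk=$(awk -v m="$m" -v n="$n" 'NR >= m && NR <= n' "$dir/mdn.txt")
# Stop reading as soon as line n has been printed.
via_awk_exit=$(awk -v m="$m" -v n="$n" 'NR > n {exit} NR >= m' "$dir/mdn.txt")
rm -rf "$dir"
echo "$via_awk"
```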

Split mobile phone numbers by network.

1. C network (numbers starting with 133 or 153)

awk '$1 ~ /^133/ || $1 ~ /^153/' file.txt > cnet.txt

2. G network (mostly numbers that start with neither 133 nor 153)

awk '$1 !~ /^133/ && $1 !~ /^153/' file.txt > gwang.txt
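The two commands above read file.txt twice. The sketch below writes both output files in a single pass, using the regex alternation `/^(133|153)/` in place of the two separate tests. The sample numbers are hypothetical, and the output paths are passed in with `-v` from a throwaway `mktemp -d` directory.

```shell
#!/bin/sh
# Sketch: split numbers into cnet.txt and gwang.txt in one pass.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' 13312345678 15312345678 13912345678 18612345678 > "$dir/file.txt"   # hypothetical numbers

awk -v c="$dir/cnet.txt" -v g="$dir/gwang.txt" '
$1 ~ /^(133|153)/ {print > c; next}   # C network
                  {print > g}         # everything else
' "$dir/file.txt"

cnet=$(cat "$dir/cnet.txt")
gnet=$(cat "$dir/gwang.txt")
rm -rf "$dir"
```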

Join two files line by line

[root@testas4 cwm]# more tep_01.txt
cwm 13911320988
zbl 13931095233
chen 12333333333
cwm 12233333333
cwm 45555555555
[root@testas4 cwm]# more tep_02.txt
cwm1 111320988
zbl1 131095233
chen1 133333333
cwm1 133333333
cwm1 455555555

awk 'NR==FNR {a[FNR]=$0} NR>FNR {print $0, a[FNR]}' tep_01.txt tep_02.txt

cwm1 111320988 cwm 13911320988
zbl1 131095233 zbl 13931095233
chen1 133333333 chen 12333333333
cwm1 133333333 cwm 12233333333
cwm1 455555555 cwm 45555555555

There is also the paste command (it separates the columns with a tab by default):
[root@testas4 cwm]# paste tep_01.txt tep_02.txt
cwm 13911320988 cwm1 111320988
zbl 13931095233 zbl1 131095233
chen 12333333333 chen1 133333333
cwm 12233333333 cwm1 133333333
cwm 45555555555 cwm1 455555555
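The sketch below shows that the awk line-number join and paste produce the same result once paste is told to use a space (`-d ' '`) and is given the files in the same order as the awk output. It assumes a throwaway `mktemp -d` directory and the first three sample lines of each file.

```shell
#!/bin/sh
# Sketch: join two files line by line with awk and with paste.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' 'cwm 13911320988' 'zbl 13931095233' 'chen 12333333333' > "$dir/tep_01.txt"
printf '%s\n' 'cwm1 111320988' 'zbl1 131095233' 'chen1 133333333' > "$dir/tep_02.txt"

# awk: remember file 1 by line number, append it to the same line of file 2.
via_awk=$(awk 'NR==FNR {a[FNR] = $0; next} {print $0, a[FNR]}' "$dir/tep_01.txt" "$dir/tep_02.txt")
# paste: file 2 first, single space as the delimiter.
via_paste=$(paste -d ' ' "$dir/tep_02.txt" "$dir/tep_01.txt")
rm -rf "$dir"
echo "$via_awk"
```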

Use awk to split a file into segments: each segment starts at a "han" line and runs up to the line before the next "han" line, and is written to its own file (han1, han2, ...).
[root@testas4 cwm]# more file1.txt
han 1
12 23 34 45
23 45 56
han 2
12 23 34 45
23 45 56
12 23 34 45
han 3
12 23 34 45
23 45 56 44
12 23 34 45
23 45 56
han 4
12 23 34 45
23 45 56
han n
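awk '{if ($1 == "han" && NF == 2) fn = $2; print $0 > ("han" fn)}' file1.txt
awk '{fn = $2; print $1 > (fn "HB")}' hbuse.txt
The second command sorts all records into files by $2: each record's first field is written to a file named after its second field plus the suffix HB.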
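A runnable sketch of the split follows, assuming a throwaway `mktemp -d` directory and a shortened copy of file1.txt. It makes two choices that differ from the one-liner: it closes each finished segment file (useful when there are many segments, since awk can only keep a limited number of files open), and it skips any lines that come before the first "han" header. Because a reopened file is truncated, it also assumes each "han N" header appears only once.

```shell
#!/bin/sh
# Sketch: write each "han N" segment of file1.txt to its own file hanN.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' 'han 1' '12 23 34 45' '23 45 56' 'han 2' '12 23 34 45' > "$dir/file1.txt"

awk -v out="$dir/" '
$1 == "han" && NF == 2 {if (fn != "") close(fn); fn = out "han" $2}   # start a new segment
fn != "" {print > fn}                                                  # copy line into it
' "$dir/file1.txt"

han1=$(cat "$dir/han1")
han2=$(cat "$dir/han2")
rm -rf "$dir"
```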

----------------------- Finding the lines two files share, and the lines that differ
----------------------------------
awk 'NR==FNR {a[$0]++} NR>FNR && !a[$0]' file1 file2     find the lines of file2 that are not in file1
awk 'NR==FNR {a[$0]++} NR>FNR && a[$0]' file1 file2      find the lines that occur in both files
or
awk 'NR==FNR {a[$0]} NR>FNR {if (!($0 in a)) print $0}' file1 file2     find the lines of file2 that are not in file1
awk 'NR==FNR {a[$0]} NR>FNR {if ($0 in a) print $0}' file1 file2        find the lines that occur in both files
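The set difference and intersection can be checked with the sketch below, which uses hypothetical word lists in a throwaway `mktemp -d` directory. One known caveat of the NR==FNR idiom applies to all four commands: if file1 is empty, NR equals FNR throughout file2, so file2 is treated as the lookup file and nothing is printed.

```shell
#!/bin/sh
# Sketch: lines only in file2, and lines in both files.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' apple banana cherry > "$dir/file1"      # hypothetical data
printf '%s\n' banana date cherry fig > "$dir/file2"   # hypothetical data

# Lines of file2 that are not in file1.
only2=$(awk 'NR==FNR {seen[$0]; next} !($0 in seen)' "$dir/file1" "$dir/file2")
# Lines of file2 that are also in file1.
both=$(awk 'NR==FNR {seen[$0]; next} $0 in seen' "$dir/file1" "$dir/file2")
rm -rf "$dir"
```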

------------------------ Counting records by category with awk
----------------------------------------
1300018 Guangdong
1300019 Guangdong
1300100 Beijing
1300101 Beijing
1300126 Beijing
1300127 Beijing
1300128 Beijing
1300129 Beijing
1300130 Tianjin
1300131 Tianjin
1300132 Tianjin
1300133 Tianjin

The goal is to get three files:
Guangdong2.txt
1300018
1300019

Beijing6.txt
1300100
1300101
1300126
1300127
1300128
1300129

Tianjin4.txt
1300130
1300131
1300132
1300133

awk '{a[$2]++; print $1 > $2} END {for (i in a) print "mv " i " " i a[i] ".txt"}' ufile | sh
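The command above generates mv commands and pipes them to sh. An alternative, swapped in here rather than taken from the original, is to read the input twice: the first pass counts each province and the second writes every number straight into its final "<province><count>.txt" file, so no shell commands are generated from the data. The sketch assumes a throwaway `mktemp -d` directory and a subset of the sample lines.

```shell
#!/bin/sh
# Sketch: group numbers by province into files named <province><count>.txt.
dir=$(mktemp -d)   # throwaway directory (demo assumption)
printf '%s\n' '1300018 Guangdong' '1300019 Guangdong' '1300100 Beijing' \
  '1300130 Tianjin' '1300131 Tianjin' > "$dir/ufile"

awk -v out="$dir/" '
NR == FNR {n[$2]++; next}                # pass 1: count records per province
{print $1 > (out $2 n[$2] ".txt")}       # pass 2: write to the final file name
' "$dir/ufile" "$dir/ufile"

gd=$(cat "$dir/Guangdong2.txt")
bj=$(cat "$dir/Beijing1.txt")
tj=$(cat "$dir/Tianjin2.txt")
rm -rf "$dir"
```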
