Example of Processing Two Files with awk Arrays

Last Update: 2018-12-04 Source: Internet

Author: User


Example 1: if a record in file A contains a number listed in file B, print that record of file A to file C.

File A:
10/05766798607, 11/20050325191329, 29/0. 05766798607/
10/05767158557, 11/20050325191329, 29/0. 05767158557/

File B:
05766798607
05766798608
05766798609
Comparing file A with file B should produce this file C:
10/05766798607, 11/20050325191329, 29/0. 05766798607/

Many answers found online contain buggy code. The correct versions are:

Method 1 (ARGIND is a gawk extension, so this needs gawk):
awk -F'[/,]' 'ARGIND==1 {a[$0]} ARGIND>1 {if ($2 in a) print $0}' B A > C

Method 2 (works in any awk):
awk -F'[/,]' 'NR==FNR {a[$0]} NR>FNR {if ($2 in a) print $0}' B A > C

Both methods load file B into an array and then look up $2 of each record of A in it, so they are fast: processing 90,000 rows takes only 4 seconds.

Another method is a while loop that reads one record at a time from B, compares it with $2 of every record in A, and writes the matching records to C:
[root@testas4 zlwt]# more for3.sh
#!/bin/bash
while read -r line; do
    awk -F'[/,]' -v key="$line" '$2 == key {print $0}' A
done < B > C
This method is easy to understand but very slow, because it rescans all of A once for every record of B: processing 90,000 rows takes 5 hours.
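A minimal, self-contained way to try Method 2 is sketched below, assuming a POSIX shell and any awk. It recreates files A and B from this example in a scratch directory created with mktemp (an assumption, so no real files are touched), and it uses the common `next` idiom instead of the NR>FNR test, which has the same effect here.

```shell
#!/bin/sh
# Sketch: reproduce Method 2 on the sample files of Example 1
cd "$(mktemp -d)" || exit 1

# File A: records whose second "/"- or ","-separated field is a phone number
printf '%s\n' \
  '10/05766798607, 11/20050325191329, 29/0. 05766798607/' \
  '10/05767158557, 11/20050325191329, 29/0. 05767158557/' > A

# File B: the numbers to look for
printf '%s\n' 05766798607 05766798608 05766798609 > B

# Pass 1 (NR==FNR, file B): remember every number as an array key.
# Pass 2 (file A): print the record when its $2 is one of those keys.
awk -F'[/,]' 'NR==FNR {a[$0]; next} $2 in a' B A > C

cat C
```

One caveat of the NR==FNR idiom: if the first file (B) is empty, NR still equals FNR while A is read, so A is loaded into the array and nothing is printed.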
Example 2: processing two files with awk arrays by line index (alternative method)
[root@testas4 zlwt]# more a
depta
deptb
deptc
deptd
[root@testas4 zlwt]# more B
aaa 0
bbb 1
ccc 2
ddd 0
eee 2
fff 2
[root@testas4 zlwt]# awk 'NR==FNR {k[i++]=$1} NR>FNR {print $1, k[$2]}' a B

aaa depta
bbb deptb
ccc deptc
ddd depta
eee deptc
fff deptc

NR==FNR {k[i++]=$1}       # first store the lines of file a in array k; the subscript i increments automatically from 0
NR>FNR {print $1, k[$2]}  # $1 and $2 are the first and second fields of B; k[$2] is the line of a at index $2

The following method, posted by the moderator r2007, does the same thing:
[root@testas4 zlwt]# awk '{if (NR==FNR) k[i++]=$0; else print $1, k[$2]}' a B
aaa depta
bbb deptb
ccc deptc
ddd depta
eee deptc
fff deptc
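The index lookup from Example 2 can be reproduced with the sketch below, assuming a POSIX shell, any awk, and a mktemp scratch directory. Because uninitialised awk variables start at 0, k[i++] numbers the lines of a from 0, which is why the 0 in B maps to depta.

```shell
#!/bin/sh
# Sketch: look up department names by line index, as in Example 2
cd "$(mktemp -d)" || exit 1

printf '%s\n' depta deptb deptc deptd > a                     # line i (from 0) = department i
printf '%s\n' 'aaa 0' 'bbb 1' 'ccc 2' 'ddd 0' 'eee 2' 'fff 2' > B

# File a: k[0]="depta", k[1]="deptb", ...
# File B: $2 is the index into k
awk 'NR==FNR {k[i++]=$1; next} {print $1, k[$2]}' a B > out
cat out
```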

Another awk example:

awk 'BEGIN {FS="[|]"; OFS="|"}
FNR==NR {a[$1]=$2}
FNR<NR {if (!a[$1]) {$1="13"; print}
else {$1=a[$1]; print}}
' wj wj1 > wj2

FNR==NR {a[$1]=$2} means that, while the first file (wj) is being processed, desc1 is stored in element a[id1] of array a.
The condition FNR<NR is true while the second file (wj1) is being processed. So for each line of the second file,
{if (!a[$1]) {$1="13"; print}
else {$1=a[$1]; print}}
does the following:
If a[$1] is empty, replace the first column of the current line with 13, giving, for example, 13|desc2.
If a[$1] is not empty, it was assigned a value from the first file. Replace $1 with a[$1], that is, the $2 of the matching line in file 1, giving desc1|desc2.

In summary, for each id2 in file 2, look up the desc1 whose id1 equals id2 in file 1:
If found, output desc1|desc2.
If not found, output 13|desc2.
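A sketch of the wj/wj1 lookup on made-up input follows; the contents of wj and wj1 are hypothetical, since the original files are not shown. It tests membership with `($1 in a)` rather than `!a[$1]`, an assumed variation: `!a[$1]` also treats an empty or "0" desc1 as missing and adds an empty element for every unknown id.

```shell
#!/bin/sh
# Sketch of the wj/wj1 lookup; wj holds "id1|desc1", wj1 holds "id2|desc2" (hypothetical data)
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1|alpha' '2|beta' > wj
printf '%s\n' '1|x' '3|y' '2|z' > wj1

awk 'BEGIN {FS = "[|]"; OFS = "|"}
     FNR == NR {a[$1] = $2; next}          # first file: a[id1] = desc1
     {
         # id2 not found: use 13; found: substitute desc1
         if (!($1 in a)) { $1 = "13" } else { $1 = a[$1] }
         print
     }' wj wj1 > wj2
cat wj2
```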
Add the prefix 86 in batches to numbers such as 1331131 *****, and strip it again:

# cat a.txt

13994623 ***
13394660 ***
13394660 ***
13394671 ***
13394672 ***
13394690 ***
13394692 ***
15304863 ***

# awk '{print "86" $0}' a.txt > b.txt
8613994623 ***
8613394660 ***
8613394660 ***
8613394671 ***
8613394672 ***
8613394690 ***
8613394692 ***
8615304863 ***

# awk '{print substr($0, 3)}' b.txt     (removes the leading 86)

13994623 ***
13394660 ***
13394660 ***
13394671 ***
13394672 ***
13394690 ***
13394692 ***
15304863 ***
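The round trip can be checked with the sketch below, assuming a POSIX shell and any awk. The sample lines are written without a space before the mask, which is an assumption about the original data; printing $0 keeps the whole line either way.

```shell
#!/bin/sh
# Sketch: add the 86 prefix to every line, then strip it again
cd "$(mktemp -d)" || exit 1
printf '%s\n' '13994623***' '15304863***' > a.txt    # sample lines (format assumed)

awk '{print "86" $0}' a.txt > b.txt          # prepend "86" to the whole line
awk '{print substr($0, 3)}' b.txt > c.txt    # drop the first two characters

cat b.txt
cmp a.txt c.txt && echo "round trip OK"
```

If some lines might not carry the prefix, `awk '{sub(/^86/, ""); print}'` removes 86 only where it actually starts the line, whereas substr($0, 3) always drops two characters.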
------------------------------------------------------------------------------
Joining two files
[root@testas4 CWM]# awk '{print $1}' 153mdn.txt | uniq -c
4 Qitaihe
5 Yichun
18 Jiamusi
13 Shuangyashan
66 Harbin
1 Daxinganling
32 Daqing
20 Mudanjiang
19 Suihua
16 Jixi
15 Hegang
10 Heihe
19 Qiqihar
[root@testas4 CWM]# awk '{print $1, substr($1, 1, 7)}' hlj_jifei > hlj_temp
[root@testas4 MDN]# more hlj_temp
13009700055 1300970
13009700495 1300970
13009701075 1300970
13009701282 1300970

[root@testas4 MDN]# ls
2 3 awk_script CWM hlj_jifei hlj_temp newmdn_table.txt temp test1
[root@testas4 MDN]# more test1
1300019 510 Guangzhou
1300101 110 010 Beijing
1300103 110 010 Beijing
1300104 110 010 Beijing
1300106 110 010 Beijing

[root@testas4 MDN]# awk 'NR==FNR {a[substr($1, 1, 7)]=$4} NR>FNR && a[b=substr($1, 1, 7)] {print $1, a[b]}' test1 hlj_temp | more
or
[root@testas4 MDN]# awk 'NR==FNR {a[$1]=$4} NR>FNR && a[b=substr($1, 1, 7)] {print $1, a[b]}' test1 hlj_temp
13009700055 Harbin
13009700495 Harbin
13009701075 Harbin
13009701282 Harbin
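A runnable sketch of the prefix join is below, assuming a POSIX shell and any awk. The test1 row for 1300970 and the extra number 13100000000 are hypothetical sample data, added so that there is one prefix hit and one miss. It tests `b in a` instead of `a[b]`, a deliberate variation that avoids creating an empty array element for every number whose prefix is not in test1.

```shell
#!/bin/sh
# Sketch of the prefix join; test1 rows: prefix, two area-code fields, city
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1300101 110 010 Beijing' '1300970 451 0451 Harbin' > test1   # Harbin row is hypothetical
printf '%s\n' '13009700055 1300970' '13009700495 1300970' '13100000000 1310000' > hlj_temp

# Pass 1 (test1): a[prefix] = city
# Pass 2 (hlj_temp): look up the first 7 digits of each number; print only hits
awk 'NR == FNR {a[$1] = $4; next}
     {b = substr($1, 1, 7); if (b in a) print $1, a[b]}' test1 hlj_temp > joined
cat joined
```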

--------------------------------------------------------------------------------------
[root@testas4 MDN]# more temp
1300970 13009700055
1300970 13009700495
1300970 13009701075
1300970 13009701282

--------------------------------------------------------------------------------

[root@testas4 MDN]# more awk_script
BEGIN {while ((getline < "test1") > 0) {lines[$1] = $4}; OFS = " "}
{
    if ($1 in lines) {
        $1 = lines[$1]   # replace $1 of temp with $4 of test1
        print $0
    }
}
# Puts the fourth field of test1 into the first field of the matching line of temp.
# getline first reads the fourth field of every line of test1 into an array.

[root@testas4 MDN]# ls
2 3 awk_script CWM hlj_jifei hlj_temp newmdn_table.txt temp test1
[root@testas4 MDN]# awk -f awk_script temp | wc -l
63440
[root@testas4 MDN]# awk -f awk_script temp | more
Harbin 13009700055
Harbin 13009700495
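The getline approach can be tried the same way, assuming a POSIX shell and any awk; the sample rows (including the Harbin row for 1300970) are hypothetical. This version keeps the default OFS, so the city and the number are separated by a space as in the output above, and it closes test1 once it has been read.

```shell
#!/bin/sh
# Sketch of awk_script: load test1 with getline in BEGIN, then rewrite temp
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1300101 110 010 Beijing' '1300970 451 0451 Harbin' > test1   # Harbin row is hypothetical
printf '%s\n' '1300970 13009700055' '1300970 13009700495' '1310000 13100000000' > temp

cat > awk_script <<'EOF'
BEGIN {
    # getline returns 1 per line read, 0 at end of file, -1 on error
    while ((getline < "test1") > 0)
        lines[$1] = $4
    close("test1")
}
$1 in lines {
    $1 = lines[$1]     # replace the prefix in temp with the city from test1
    print
}
EOF

awk -f awk_script temp > result
cat result
```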

Another awk example: calculating the sum of all values in a column.

Sum all the values in the second column.

[root@testas4 ~]# more cwm.txt
cwm 123
zbll 124
yhh 2
cj 1
[root@testas4 ~]# awk '{a[x++]=$2} END {for (i=1; i<=NR; i++) b=b+a[i-1]; print b}' cwm.txt
250
[root@testas4 ~]# awk '{a[NR]=$2; b=0} END {for (i=1; i<=NR; i++) b=b+a[i]; print b}' cwm.txt
250
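For a plain column sum an array is not actually needed; the sketch below, assuming a POSIX shell and any awk, keeps a running total instead. It uses the cwm.txt values shown above, so the expected total is 250 (123 + 124 + 2 + 1).

```shell
#!/bin/sh
# Sketch: sum column 2 with a running total instead of an array
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'cwm 123' 'zbll 124' 'yhh 2' 'cj 1' > cwm.txt

# s starts out empty/0; "+ 0" makes an empty file print 0 instead of a blank line
awk '{s += $2} END {print s + 0}' cwm.txt
```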

Display lines M through N of a file (here, lines 2 to 10).

[root@testas4 ~]# sed -n '2,10p' mdn.txt

[root@testas4 ~]# awk 'NR==2, NR==10 {print $0}' mdn.txt
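The awk range can also take its bounds from the shell. In this sketch, m and n are example shell variables and the input is generated with printf so the example is self-contained; the second command exits as soon as it passes line n, which avoids reading the rest of a large file.

```shell
#!/bin/sh
# Sketch: print lines m through n (m and n are example values)
m=2
n=4

# Ten sample lines: line1 ... line10
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10 |
    awk -v m="$m" -v n="$n" 'NR >= m && NR <= n'

# Same lines, but stop reading once line n has been passed
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 10 |
    awk -v m="$m" -v n="$n" 'NR > n {exit} NR >= m'
```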

Split mobile phone numbers by network.

1. C network (numbers starting with 133 or 153)

awk '$1 ~ /^133/ || $1 ~ /^153/' file.txt > cnet.txt

2. G network (G network numbers are, for the most part, those that start with neither 133 nor 153)

awk '$1 !~ /^133/ && $1 !~ /^153/' file.txt > gwang.txt
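Both files can also be produced in a single pass over file.txt, as sketched below, assuming a POSIX shell and any awk. The phone numbers are made up, and /^(133|153)/ is an equivalent way to write the two prefix tests using an extended regular expression alternation.

```shell
#!/bin/sh
# Sketch: split numbers by prefix in one pass instead of reading file.txt twice
cd "$(mktemp -d)" || exit 1
printf '%s\n' 13312345678 15398765432 13912345678 > file.txt   # hypothetical numbers

# C network prefixes go to cnet.txt; everything else goes to gwang.txt
awk '$1 ~ /^(133|153)/ {print > "cnet.txt"; next}
     {print > "gwang.txt"}' file.txt
cat cnet.txt gwang.txt
```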

Join two files line by line

[root@testas4 CWM]# more tep_01.txt
cwm 13911320988
zbl 13931095233
chen 12333333333
cwm 12233333333
cwm 45555555555
[root@testas4 CWM]# more tep_02.txt
cwm1 111320988
zbl1 131095233
chen1 133333333
cwm1 133333333
cwm1 455555555

awk 'NR==FNR {a[FNR]=$0} NR>FNR {print $0, a[FNR]}' tep_01.txt tep_02.txt

cwm1 111320988 cwm 13911320988
zbl1 131095233 zbl 13931095233
chen1 133333333 chen 12333333333
cwm1 133333333 cwm 12233333333
cwm1 455555555 cwm 45555555555
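The sketch below, assuming a POSIX shell, any awk, and the standard paste utility, checks the line-by-line join on the first two lines of the sample files. `paste -d ' '` with the files in the same order as the awk output is assumed to be the equivalent command; the two agree only when both files have the same number of lines.

```shell
#!/bin/sh
# Sketch: join two files line by line, keyed on the per-file line number FNR
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'cwm 13911320988' 'zbl 13931095233' > tep_01.txt
printf '%s\n' 'cwm1 111320988' 'zbl1 131095233' > tep_02.txt

awk 'NR==FNR {a[FNR] = $0; next} {print $0, a[FNR]}' tep_01.txt tep_02.txt > joined
# paste with a space delimiter gives the same result for files of equal length
paste -d ' ' tep_02.txt tep_01.txt > pasted
cmp joined pasted && cat joined
```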

The paste command can also join files line by line (by default it separates the columns with a tab):
[root@testas4 CWM]# paste tep_01.txt tep_02.txt
cwm 13911320988 cwm1 111320988
zbl 13931095233 zbl1 131095233
chen 12333333333 chen1 133333333
cwm 12233333333 cwm1 133333333
cwm 45555555555 cwm1 455555555

Use awk to split a file into segments: each segment starts at a "Han N" line and ends with the last line of numbers before the next "Han" line, and is written to its own file, such as Han1.
[root@testas4 CWM]# more file1.txt
Han 1
12 23 34 45
23 45 56
Han 2
12 23 34 45
23 45 56
12 23 34 45
Han 3
12 23 34 45
23 45 56 44
12 23 34 45
23 45 56
Han 4
12 23 34 45
23 45 56
Han n
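awk '{if ($1 == "Han" && NF == 2) fn = $2; print $0 > ("Han" fn)}' file1.txt
awk '{fn = $2; print $1 > (fn "HB")}' hbuse.txt
The second command writes $1 of each record to a file named after its $2 field plus "HB", that is, it classifies all records by $2.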
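A variation of the Han splitter that closes each segment file before starting the next is sketched here, assuming a POSIX shell and any awk, and using the first few lines of file1.txt. Closing matters because some awk implementations limit how many output files can be open at once. The sketch assumes each Han number appears only once (reopening a closed file with > truncates it) and skips any lines before the first Han header.

```shell
#!/bin/sh
# Sketch: write each "Han N" segment of file1.txt to its own file HanN
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Han 1' '12 23 34 45' '23 45 56' 'Han 2' '12 23 34 45' > file1.txt

awk '$1 == "Han" && NF == 2 {
         if (out != "") close(out)   # finish the previous segment file
         out = "Han" $2              # e.g. Han1, Han2, ...
     }
     out != "" {print > out}' file1.txt
cat Han1
```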

----------------------- Finding common and different lines in two files ----------------------------------
awk 'NR==FNR {a[$0]++} NR>FNR && !a[$0]' file1 file2     finds lines in file2 that are not in file1
awk 'NR==FNR {a[$0]++} NR>FNR && a[$0]' file1 file2      finds lines that appear in both files
or
awk 'NR==FNR {a[$0]} NR>FNR {if (!($0 in a)) print $0}' file1 file2     finds lines in file2 that are not in file1
awk 'NR==FNR {a[$0]} NR>FNR {if ($0 in a) print $0}' file1 file2        finds lines that appear in both files
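The membership tests can be tried on small made-up files, as below, assuming a POSIX shell and any awk. The sketch stores each line of file1 with `seen[$0]` and uses `next`, which behaves like the `a[$0]` plus NR>FNR versions above and compares whole lines.

```shell
#!/bin/sh
# Sketch: lines of file2 that are / are not in file1 (hypothetical data)
cd "$(mktemp -d)" || exit 1
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana date apple fig > file2

awk 'NR==FNR {seen[$0]; next} !($0 in seen)' file1 file2 > only_in_file2
awk 'NR==FNR {seen[$0]; next}   $0 in seen'  file1 file2 > in_both
cat only_in_file2 in_both
```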

------------------------ Counting and splitting records by field with awk ----------------------------------------
1300018 Guangdong
1300019 Guangdong
1300100 Beijing
1300101 Beijing
1300126 Beijing
1300127 Beijing
1300128 Beijing
1300129 Beijing
1300130 Tianjin
1300131 Tianjin
1300132 Tianjin
1300133 Tianjin

The goal is to get three files:
Guangdong2.txt
1300018
1300019

Beijing6.txt
1300100
1300101
1300126
1300127
1300128
1300129

Tianjin4.txt
1300130
1300131
1300132
1300133

awk '{a[$2]++; print $1 > $2} END {for (i in a) {close(i); print "mv " i " " i a[i] ".txt"}}' ufile | sh
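The same approach can be tried on a subset of the sample data, as sketched below, assuming a POSIX shell and any awk. Because only part of the data is used, the counts differ from the full example (Beijing has one line here). Quoting the names in the generated mv commands and calling close() before renaming are additions for safety, not part of the original command.

```shell
#!/bin/sh
# Sketch: one file per province, then append the line count to each file name
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1300018 Guangdong' '1300019 Guangdong' '1300100 Beijing' \
  '1300130 Tianjin' '1300131 Tianjin' > ufile

awk '{count[$2]++; print $1 > $2}
     END {
       for (p in count) {
         close(p)                                  # flush the per-province file
         printf "mv \"%s\" \"%s%d.txt\"\n", p, p, count[p]
       }
     }' ufile | sh
ls
```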

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
