First, the requirement
The original file a is about 1.7 GB. There are also two auxiliary files, b and c, each with only two columns. Column b1 matches a1 and column c1 matches a2; the output for each row of a is a1, b2, c2, a3, a4, a5.
Second, the idea
2.1 Before turning to awk, I considered processing the data with a while read ... done < a.txt loop, but it ran too slowly and did not meet expectations.
2.2 I then switched to awk, but the problem to solve was how to pass an external array into awk, or how to feed the data from all three files into awk.
Third, the solution
3.1 Using a while read ... done < a.txt loop to process the data
declare -A b_array c_array   # associative arrays need bash 4+
while read B1 B2
do
b_array[$B1]=$B2
done < b.txt
while read C1 C2
do
c_array[$C1]=$C2
done < c.txt   # file c; the name c.txt is assumed by analogy with a.txt and b.txt
while read A1 A2 A3 A4 A5
do
b2_value=${b_array[$A1]}
c2_value=${c_array[$A2]}
echo "$A1,$b2_value,$c2_value,$A3,$A4,$A5" >> d.txt
done < a.txt
# This first method is acceptable when the amount of data is small.
3.2 Passing an external array into awk
With awk's multi-file processing I ran into a problem I could not resolve: after reading the three files into awk I could only print them directly and could not perform any other operation. So I gave up on that route and passed external arrays into awk instead, using the following code, which I found:
awk -v s1="${time[*]}" -v s2="${!time[*]}" '
BEGIN { split(s1, s3, " "); split(s2, s4, " ");
for (i = 1; i <= length(s4); i++)
res[s4[i]] = s3[i]; }'
(Reference blog: http://sunlujing.iteye.com/blog/1918907)
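The trick relies on "${time[*]}" and "${!time[*]}" listing values and keys in the same order, so that position i in one string corresponds to position i in the other. A minimal sketch with a made-up array (the name price and its contents are hypothetical), assuming bash 4+ and keys and values that contain no whitespace:

```shell
#!/usr/bin/env bash
# Hypothetical lookup table; keys and values must not contain whitespace,
# because the array is flattened into one space-separated string.
declare -A price=([apple]=3 [pear]=5 [plum]=7)

result=$(awk -v vals="${price[*]}" -v keys="${!price[*]}" '
BEGIN {
    n = split(vals, v, " ")      # values, in bash iteration order
    split(keys, k, " ")          # keys, in the same order
    for (i = 1; i <= n; i++)
        res[k[i]] = v[i]         # rebuild the map inside awk
    print res["apple"], res["pear"], res["plum"]
}')
echo "$result"
```

Because the awk program has only a BEGIN block, it exits without reading any input, which makes it convenient for checking that the array arrived intact before running it against a large file.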
The code that eventually processes the data becomes the following:
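declare -A b_array c_array   # associative arrays need bash 4+
while read B1 B2
do
b_array[$B1]=$B2
done < b.txt
while read C1 C2
do
c_array[$C1]=$C2
done < c.txt   # file c; the name c.txt is assumed
# "${arr[*]}" and "${!arr[*]}" join values and keys with spaces, in the same order.
# a.txt is assumed to be whitespace-separated with no commas, so with -F ","
# the whole line arrives as $1 and is split by hand below.
awk -F "," -v s1="${b_array[*]}" -v s2="${!b_array[*]}" -v w1="${c_array[*]}" -v w2="${!c_array[*]}" '
BEGIN {
split(s1, s3, " ");
split(s2, s4, " ");
for (i = 1; i <= length(s4); i++)
b_new_array[s4[i]] = s3[i];
split(w1, w3, " ");
split(w2, w4, " ");
for (i = 1; i <= length(w4); i++)
c_new_array[w4[i]] = w3[i];
}
{
len = split($1, a_array, " ")
a1 = a_array[1];
a2 = a_array[2];
a3 = a_array[3];
a4 = a_array[4];
a5 = a_array[5];
b2_value = b_new_array[a1];
c2_value = c_new_array[a2];
print a1, b2_value, c2_value, a3, a4, a5
}' a.txt >> d.txt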
# Execution time is approximately 4 minutes.
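For comparison, awk can also load the lookup files itself by checking which input file the current record comes from. This is not the method this article settled on; it is a minimal sketch of the multi-file route, assuming whitespace-separated files laid out as described in the requirement. The sample contents and the temporary directory are hypothetical.

```shell
# Hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
printf 'k1 B-one\nk2 B-two\n' > "$dir/b.txt"
printf 'm1 C-one\nm2 C-two\n' > "$dir/c.txt"
printf 'k1 m2 10 20 30\nk2 m1 1 2 3\n' > "$dir/a.txt"

joined=$(awk -v OFS=',' '
FILENAME == ARGV[1] { b[$1] = $2; next }   # first file: b1 -> b2 lookup
FILENAME == ARGV[2] { c[$1] = $2; next }   # second file: c1 -> c2 lookup
{ print $1, b[$1], c[$2], $3, $4, $5 }     # third file: the large file a
' "$dir/b.txt" "$dir/c.txt" "$dir/a.txt")
echo "$joined"
rm -rf "$dir"
```

The order of the file arguments matters: both lookup files must come before the large file so that the arrays are fully populated when its first row is read.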
3.3 Then a new requirement came up
The new requirement is to group by a1, b2, and c2 and to sum a3, a4, and a5 for each group. In SQL this could be done directly with GROUP BY.
Reference blog: http://linuxguest.blog.51cto.com/195664/424496 (SQL-like data processing with awk)
The code then becomes the following:
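declare -A b_array c_array   # associative arrays need bash 4+
while read B1 B2
do
b_array[$B1]=$B2
done < b.txt
while read C1 C2
do
c_array[$C1]=$C2
done < c.txt   # file c; the name c.txt is assumed
# "${arr[*]}" and "${!arr[*]}" join values and keys with spaces, in the same order.
# a.txt is assumed to be whitespace-separated with no commas, so with -F ","
# the whole line arrives as $1 and is split by hand below.
awk -F "," -v s1="${b_array[*]}" -v s2="${!b_array[*]}" -v w1="${c_array[*]}" -v w2="${!c_array[*]}" '
BEGIN {
split(s1, s3, " ");
split(s2, s4, " ");
for (i = 1; i <= length(s4); i++)
b_new_array[s4[i]] = s3[i];
split(w1, w3, " ");
split(w2, w4, " ");
for (i = 1; i <= length(w4); i++)
c_new_array[w4[i]] = w3[i];
}
{
len = split($1, a_array, " ")
a1 = a_array[1];
a2 = a_array[2];
a3 = a_array[3];
a4 = a_array[4];
a5 = a_array[5];
b2_value = b_new_array[a1];
c2_value = c_new_array[a2];
# Accumulate the three sums under the composite key a1,b2,c2.
a3_array[a1 "," b2_value "," c2_value] += a3;
a4_array[a1 "," b2_value "," c2_value] += a4;
a5_array[a1 "," b2_value "," c2_value] += a5;
}
END {
for (i in a3_array)
{
print i "," a3_array[i] "," a4_array[i] "," a5_array[i]
}
}' a.txt >> e.txt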
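The grouping step on its own can be tried on a few rows. The sketch below uses hypothetical sample rows already in the form a1,b2,c2,a3,a4,a5 and plays the role of SELECT a1, b2, c2, SUM(a3), SUM(a4), SUM(a5) ... GROUP BY a1, b2, c2. The array names s3, s4, and s5 are illustrative only.

```shell
# Hypothetical rows in the form a1,b2,c2,a3,a4,a5.
rows='x,p,q,1,10,100
x,p,q,2,20,200
y,p,q,5,50,500'

totals=$(printf '%s\n' "$rows" | awk -F',' '
{
    key = $1 "," $2 "," $3        # composite key, same role as GROUP BY
    s3[key] += $4; s4[key] += $5; s5[key] += $6
}
END {
    for (key in s3)
        print key "," s3[key] "," s4[key] "," s5[key]
}' | sort)
echo "$totals"
```

awk does not define the order in which for (key in array) visits the keys, so the output is piped through sort here to make it repeatable.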
This article is from the "three countries Cold jokes" blog, please be sure to keep this source http://hwj91.blog.51cto.com/9763975/1698470
Using awk to process large volumes of data in shell scripts