Improve Work Efficiency with awk's Numerical Computing Features


From IBM developerWorks

Awk is an excellent pattern scanning and text processing tool. This article focuses on awk's use in numerical computation and shows, through several practical examples, how its computing features can improve our work efficiency.

Awk is somewhat similar to sed and grep, but it is considerably more powerful than either. Its features include pattern matching, flow control, mathematical operators, process control, and many built-in variables and functions. With these features, we can easily use awk to process many kinds of files, such as data files produced by tests or database files.

 

Basic operators, mathematical functions, and simple operation examples of awk

Awk supports many common operators, such as + (addition), - (subtraction), * (multiplication), / (division), ^ or ** (exponentiation), and % (modulo). Of the two exponentiation forms, ^ is the one defined by POSIX; ** is an extension that is not guaranteed to work in every awk implementation. Awk also provides common mathematical functions, such as sin(x), cos(x), exp(x), log(x), sqrt(x), atan2(y, x), and rand(). With these operators and functions, simple calculations can be performed directly:

Listing 1. Using awk for simple numerical calculation

echo | awk '{print 19+7}'          ==> 26
echo | awk '{print 19-7}'          ==> 12
echo | awk '{print 19*7}'          ==> 133
echo | awk '{print 19/7}'          ==> 2.71429
echo | awk '{print 19**7}'         ==> 893871739
echo | awk '{print 19%7}'          ==> 5
echo | awk '{print atan2(19, 7)}'  ==> 1.21781
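As a side note, the same kind of calculation can be run without the echo pipe. A BEGIN block executes before awk reads any input, so no input line is needed. The short sketch below assumes a POSIX-style awk and uses ^ for the power, since ** may not be accepted everywhere; the printf line is added here only to show how to control the number of digits.

```shell
# A BEGIN block runs before any input is read, so "echo |" is not needed.
awk 'BEGIN { print 19 + 7 }'                # prints 26
awk 'BEGIN { print 19 ^ 7 }'                # prints 893871739
awk 'BEGIN { printf "%.10f\n", 19 / 7 }'    # prints 2.7142857143
```

By default print shows about six significant digits (hence 2.71429 in Listing 1); printf with an explicit format is one way to see more.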

The same calculations can also be performed with a script file, calc.awk:

Listing 2. Script file calc.awk

{
  print $1 " + " $2 " = " ($1 + $2)
  print $1 " - " $2 " = " ($1 - $2)
  print $1 " x " $2 " = " ($1 * $2)
  print $1 " / " $2 " = " ($1 / $2)
  print $1 " ^ " $2 " = " ($1 ^ $2)
  print $1 " mod " $2 " = " ($1 % $2)
  print " atan2( " $1 " , " $2 " ) " " = " atan2($1, $2)
}

Run echo 19 7 | awk -f calc.awk; the results agree with those in Listing 1. The -f option tells awk to read and execute the program file calc.awk. The numbers 19 and 7 arrive as one input line and correspond to $1 and $2. (Writing them directly after the script name would not work, because awk treats such arguments as input file names.)
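If typing the operands directly on the command line is more convenient than feeding them in as an input line, awk's standard -v option assigns variables before the program starts. The following is a minimal sketch; the variable names a and b are chosen here for illustration and are not part of calc.awk.

```shell
# -v assigns awk variables before BEGIN runs, so no input line is required.
awk -v a=19 -v b=7 'BEGIN {
    print a " + " b " = " (a + b)      # prints 19 + 7 = 26
    print a " mod " b " = " (a % b)    # prints 19 mod 7 = 5
}'
```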

 

Complex numeric calculation

Now let's use awk for some slightly more complex calculations. We start with the Fibonacci sequence. The corresponding awk program, fib.awk, is shown in Listing 3:

Listing 3. Program file fib.awk for computing the Fibonacci sequence

function fibo(n) {
  if (n <= 1) return 1;
  return (fibo(n-2) + fibo(n-1));
}

BEGIN {
  n = (ARGV[1] < 1) ? 1 : ARGV[1];
  printf("%d\n", fibo(n));
  exit;
}

To run the calculation, use the command awk -f fib.awk n, where n is an integer. With a slight modification, fibo(n) can also be used to calculate factorials. The modified code is as follows:

Listing 4. awk script for calculating factorials

function factorial(n) {
  if (n <= 1) return 1;
  return (n * factorial(n-1));
}

BEGIN {
  n = (ARGV[1] < 1) ? 1 : ARGV[1];
  printf("%d\n", factorial(n));
  exit;
}
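The recursive fibo(n) in Listing 3 calls itself twice at every level, so its running time grows very quickly with n. A loop gives the same values with far less work. The sketch below is an alternative written for illustration, not the article's program; the function name fib_iter is made up, and it keeps the same convention as Listing 3, where the values for n = 0 and n = 1 are both 1.

```shell
# fib_iter N: print the N-th Fibonacci number using a loop instead of recursion.
# Same convention as the recursive version: the values for 0 and 1 are both 1.
fib_iter() {
    awk -v n="$1" 'BEGIN {
        if (n < 1) n = 1
        a = 1; b = 1                  # values for indices 0 and 1
        for (i = 2; i <= n; i++) {    # advance the pair (a, b) one step
            t = a + b; a = b; b = t
        }
        printf("%d\n", b)
    }'
}

fib_iter 10    # prints 89
```

Because awk stores numbers as double-precision floating point, results from either version are exact only while they stay below roughly 2^53.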

Next, let's look at computing a square root. Although awk provides a built-in function for this, sqrt(x), we can also implement it ourselves. The algorithm is shown in Listing 5, and Listing 6 gives a concrete example: calculating the square root of 3.7.

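Listing 5. Square root algorithm

[The algorithm figure is not preserved here. As implemented in Listing 6, it is the iteration x = (x + a/x)/2, starting from x = a and repeated while (x^2 - a)^2 > 1e-12.]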

Listing 6. Example of square root calculation

BEGIN {
  a = 3.7;
  x = a;
  # Repeat x = (x + a/x)/2 until x^2 is within 1e-6 of a
  while ((x^2 - a)^2 > 1e-12) {
    x = (x + a/x) / 2;
  }
  print x
}
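To check the iteration in Listing 6 against the built-in function, it can be wrapped in an awk function and printed next to sqrt(). This is a sketch under the assumption that the input is positive and of moderate size (for very large inputs the fixed tolerance 1e-12 may never be reached); the names newton_sqrt and nsqrt are made up for this illustration.

```shell
# newton_sqrt A: approximate sqrt(A) for A > 0 with the iteration x = (x + a/x) / 2
# and print it next to the result of awk's built-in sqrt().
newton_sqrt() {
    awk -v a="$1" '
    function nsqrt(a,    x) {             # x is a local variable
        x = a
        while ((x * x - a) ^ 2 > 1e-12)   # same stopping rule as Listing 6
            x = (x + a / x) / 2
        return x
    }
    BEGIN {
        printf("%.6f %.6f\n", nsqrt(a), sqrt(a))
    }'
}

newton_sqrt 3.7
```

The two printed numbers should agree to about six decimal places, since the loop stops once x^2 is within 1e-6 of a.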

Example 1: quickly calculate the time difference between two files

If all we need is numerical calculation, awk is probably not the best choice; after all, it was designed for text processing. However, when the calculation is closely tied to text, for example when data must first be searched for and extracted from a file, awk's advantages show clearly. This situation comes up often at work.

Let's look at a practical example. Suppose we want to compare the efficiency of several parallel programs running on a Linux cluster. A feasible approach is to estimate how long the programs take to run; they usually run for a long time, from 10 hours to more than a week. The programs continuously write data files while running, and Linux records the time at which each file was created (if it did not exist before) or modified (if it did). The efficiency of the parallel programs can therefore be estimated from the time difference between two files. The stat command provided by Linux reports various attributes of a file. For example, running stat simu_space_1.dat on the data file simu_space_1.dat produces the following output:

Listing 7. Output of the command stat simu_space_1.dat

  File: "simu_space_1.dat"
  Size: 237928    Blocks: 480        IO Block: 4096   regular Datei
Device: 801h/2049d      Inode: 2768915     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     nst)   Gid: ( 1000/     nst)
Access: 2008-11-14 10:56:05.000000000 +0100
Modify: 2008-11-13 23:26:44.000000000 +0100
Change: 2008-11-13 23:26:44.000000000 +0100

The output above contains the keyword Modify, which records the time the file was last modified. In principle, then, the time difference between two files can be calculated by using stat to obtain their modification times. For a handful of calculations this can be done by hand, but doing it frequently takes a long time and makes errors more likely. In that case we can let awk do the calculation automatically. For this purpose we wrote the following script, time_df.awk:

Listing 8. awk program for calculating the time difference

BEGIN {
  n = 0;
  d1 = 0;
  s1 = 0;
  FS = ":|-| +";    # split on colons, hyphens, and runs of spaces
}

{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /Modify/) {
      n = n + 1;
      d = $(i+4);    # day of the month
      h = $(i+5);    # hours
      m = $(i+6);    # minutes
      s = $(i+7);    # seconds
      # (-1)^n subtracts the first file's time and adds the second file's
      d1 = d1 + ((-1)^n) * d * 24 * 3600;
      s1 = s1 + ((-1)^n) * (3600*h + 60*m + s);
    }
  }
}

END {
  s1 = s1 + d1;
  D = int(s1 / (24*3600));
  H = int((s1 - D*24*3600) / 3600);
  M = int((s1 - D*24*3600 - H*3600) / 60);
  S = s1 % 60;
  printf("The total time required %d days, %d hours, %d minutes and %d seconds\n",
         D, H, M, S);
}

The code works as follows. It first uses awk to find the line containing the keyword Modify and then extracts the date and time fields. Because dates and times cannot conveniently be subtracted directly, each one is converted into a number of seconds counted from the start of the month. The difference between the two numbers is therefore also in seconds; for readability it is displayed as days, hours, minutes, and seconds. To calculate the time difference between the files simu_space_1.dat and simu_space_100.dat, run the following command:

Listing 9. Command for calculating the file time difference

stat simu_space_1.dat simu_space_100.dat | awk -f time_df.awk

List the earlier file, simu_space_1.dat, first and the later file, simu_space_100.dat, second. To calculate the time difference between two other files, just change the file names. With this awk code we can quickly and accurately obtain the time interval between any two data files. Note that the script does not handle intervals that cross a month boundary: if the first data file is generated at the end of one month and the second at the beginning of the next, the script cannot be used, because the result is a meaningless negative value.
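One possible way around the month-boundary limitation, assuming GNU coreutils is available, is to ask stat for the modification time as seconds since the epoch (its -c %Y format) and let awk subtract the two numbers. This is a sketch of an alternative, not the script from Listing 8; the function name fmt_diff is made up, and the two timestamps in the demonstration line are invented values.

```shell
# fmt_diff: read two timestamps in epoch seconds (earlier first, one per line)
# and print the difference as days, hours, minutes and seconds.
fmt_diff() {
    awk 'NR == 1 { t1 = $1 }
         NR == 2 { t2 = $1 }
         END {
             s = t2 - t1
             D = int(s / 86400)
             H = int((s % 86400) / 3600)
             M = int((s % 3600) / 60)
             S = s % 60
             printf("%d days, %d hours, %d minutes and %d seconds\n", D, H, M, S)
         }'
}

# Assumed usage with GNU stat, which prints one epoch timestamp per file:
#   stat -c %Y simu_space_1.dat simu_space_100.dat | fmt_diff
# Demonstration with two invented timestamps that are 90061 seconds apart:
printf '%s\n' 1000000000 1000090061 | fmt_diff
# prints: 1 days, 1 hours, 1 minutes and 1 seconds
```

Because epoch seconds increase monotonically, the subtraction works the same way across month and year boundaries.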

 

Example 2: Verify the flux: extract data from multiple files and calculate

In this example, we calculate the fluid flux at different locations to verify that the values are the same. The flux here can be regarded as the product of three quantities: the concentration of particles passing through a given cross section, the fluid velocity, and the cross-sectional area. The difficulty is that the concentration, velocity, and other parameters are spread across different data files, where they are mixed with other data. For example, the file simu_space_1.dat, which contains the concentration, has the following format:

Listing 10. Data File simu_space_1.dat format

{0.436737, 0.429223, 3.000000, 1.000000, 43300806482080792.000000, 243231808.137785 },

{1.296425, 0.429223, 3.000000, 1.000000, 107468809895964656.000000, 584622938.047805 },

{2.128973, 0.429223, 3.000000, 1.000000, 102324821165926400.000000, 539067822.351442 },

......

{19.358569, 4.875000, 3.000000, 1.000000, 257544788738191712.000000, 1460324590.999991 },

{19.620925, 4.875000, 3.000000, 1.000000, 266676357086157504.000000, 1464352706.940682 },

{19.875000, 4.875000, 3.000000, 1.000000, 260249342336872224.000000, 1383971975.659338 },

The first step is to extract from this file the concentration data (the fifth number from the left in each row) at a given location. The following awk code extracts the data at the location x = 0.429223 and saves it to a temporary file, number.txt:

Listing 11. Extract and save data at a specified location

awk -F'{|,\t|},' '{for(i=1; i<NF; i++) {if($i~/0\.429223/) print $(i+3)}}' \
    simu_space_1.dat > number.txt
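To make the field logic easier to check, here is a small self-contained variant run on two rows taken from Listing 10. It is a sketch under the assumption that the values are separated by a comma followed by spaces or tabs; it strips the braces and whitespace first and then splits on commas, rather than relying on the field separator used in Listing 11. The function name extract_conc is made up for this illustration.

```shell
# extract_conc X: print the 5th value of every row whose 2nd value equals X.
extract_conc() {
    awk -v x="$1" -F',' '{
        gsub(/[{} \t]/, "")          # drop braces and whitespace; awk re-splits $0
        if ($2 == x) print $5        # numeric comparison of the location column
    }'
}

extract_conc 0.429223 <<'EOF'
{0.436737, 0.429223, 3.000000, 1.000000, 43300806482080792.000000, 243231808.137785 },
{19.358569, 4.875000, 3.000000, 1.000000, 257544788738191712.000000, 1460324590.999991 },
EOF
# prints: 43300806482080792.000000
```

Comparing the whole field with == avoids accidental matches that a regular expression such as /0.429223/ could produce inside a longer number.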

The file number.txt now contains a single column of concentration data. In the same way, we extract the velocity and area data at the same location from the other files and save them to the temporary files velocity.txt and area.txt. We then combine the three temporary files into one file, flux.txt, so that awk can do the calculation. The paste tool handles this merge easily, as shown in Listing 12:

Listing 12. Merge data from different files into one file

paste number.txt velocity.txt area.txt > flux.txt

The file flux.txt contains three columns of data: concentration, velocity, and area. Following the flux calculation described above, the three values in each row of flux.txt are multiplied together, and the products are then summed to give the flux through that cross section. See Listing 13 for the code:

Listing 13. awk code for calculating the flux

awk '{x=x+($1*$2*$3)} END {print x}' flux.txt

The code uses a variable x. When the first row is processed, x holds the product of the three values in the first row of flux.txt. When the second row is processed, x keeps that value and adds the product of the three values in the second row, and so on until the total is reached. The END block ensures that only the final result is printed, not the intermediate sums. For comparison, we previously used other software (such as Excel or OpenOffice Calc) to calculate the flux, which meant importing the data, choosing calculation functions, and a series of other steps, whereas awk needs only one line of code. Considering that the extraction of data from the different files is also done by awk (in just a few lines of code), using awk in this example saves a considerable amount of time.
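The accumulation can be tried on a tiny input to see that only the final sum is printed. The rows below are invented numbers standing in for concentration, velocity, and area, separated by tabs as paste would produce; the function name flux_sum is made up for this sketch.

```shell
# flux_sum: sum the products of the first three columns of each input row.
flux_sum() {
    awk '{ x += $1 * $2 * $3 }    # running sum of the row products
         END { print x }'         # printed once, after the last row
}

# Two invented rows: 2*3*4 + 1*5*2 = 34
printf '2\t3\t4\n1\t5\t2\n' | flux_sum    # prints 34
```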




Summary

The numerical computing capabilities of awk should not be overlooked: they cover everything from simple to fairly complex calculations. Awk is especially convenient when the calculation involves processing data files, because its strong text processing features make it easy to extract data from text and then compute with it. The examples in this article show that flexible use of these features can significantly improve our work efficiency.
