Increase productivity with AWK's numerical computing capabilities (reprint)

Source: Internet
Author: User

Awk is an excellent tool for pattern scanning and text processing. This article focuses on the use of awk in numerical computing and, through several practical examples, explains how to use awk's computational capabilities to improve productivity. Reprinted from IBM Bluemix, link: http://www.ibm.com/developerworks/cn/linux/l-cn-awkinwork/

Awk is an excellent tool for pattern scanning and text processing. It is somewhat similar to sed and grep, but much more powerful than either. Awk provides pattern matching, flow control, mathematical operators, process control, and many built-in variables and functions. With these features, we can easily use awk to process the various files (data files, database files, and so on) produced by experiments. This article introduces the use of awk in numerical computation and, through several practical examples, shows how awk's computational features can improve our work efficiency.

Basic awk operators, mathematical functions, and simple calculations

Awk supports the common arithmetic operators: + (plus), - (minus), * (multiply), / (divide), ^ or ** (exponentiation), % (modulo), and so on. In addition, awk provides common mathematical functions such as sin(x), cos(x), exp(x), log(x), sqrt(x), atan2(y, x), and rand(). You can use these operators and functions to perform simple calculations directly:

Listing 1. Use awk to do simple numerical calculations
echo | awk '{print 19+7}'         ==> 26
echo | awk '{print 19-7}'         ==> 12
echo | awk '{print 19*7}'         ==> 133
echo | awk '{print 19/7}'         ==> 2.71429
echo | awk '{print 19**7}'        ==> 893871739
echo | awk '{print 19%7}'         ==> 5
echo | awk '{print atan2(19,7)}'  ==> 1.21781
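The empty line piped in from echo is only there to trigger the main rule once. As a minimal sketch, the same kind of calculation can be placed in a BEGIN block, which runs before any input is read. The variable name out and the chosen output format are illustrative, not part of the original listings.

```shell
# BEGIN runs before any input is read, so no echo pipe is needed.
# printf with an explicit format avoids the default 6-significant-digit output of print.
out=$(awk 'BEGIN {
    printf "%d %.4f %.4f %.4f\n", int(19 / 7), sqrt(2), exp(1), log(10)
}')
echo "$out"   # prints: 2 1.4142 2.7183 2.3026
```

Here int() truncates toward zero and log() is the natural logarithm.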

The above calculation can also be done with a script file Calc.awk:

Listing 2. Script File Calc.awk
# Calc.awk: apply the basic operators to the first two fields of each input line
{
  print $1 " + " $2 " = " ($1 + $2)
  print $1 " - " $2 " = " ($1 - $2)
  print $1 " x " $2 " = " ($1 * $2)
  print $1 " / " $2 " = " ($1 / $2)
  print $1 " ^ " $2 " = " ($1 ** $2)
  print $1 " mod " $2 " = " ($1 % $2)
  print "atan2(" $1 "," $2 ") = " atan2($1, $2)
}

Executing echo 19 7 | awk -f Calc.awk gives the same results as in Listing 1. Here the option -f tells awk to read and execute the program file Calc.awk; the values 19 and 7 form the input line, and they correspond to the fields $1 and $2.
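As an alternative to feeding the operands on standard input, they can be passed as awk variables with the standard -v option. This is only a sketch; the variable names a and b are chosen for illustration.

```shell
# -v assigns awk variables before the program starts (a and b are illustrative names).
out=$(awk -v a=19 -v b=7 'BEGIN { print a + b, a - b, a * b, a % b }')
echo "$out"   # prints: 26 12 133 5
```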

Some of the more complex numerical calculations

Now let's use awk for some slightly more complex calculations. We first use awk to compute the Fibonacci sequence; the corresponding awk program, Fib.awk, is shown in Listing 3:

Listing 3. Program files for calculating the Fibonacci sequence
# Recursive definition with Fibo(0) = Fibo(1) = 1
function Fibo(n) {
  if (n <= 1) return 1;
  return (Fibo(n-2) + Fibo(n-1));
}
BEGIN {
  n = (ARGV[1] < 1) ? 1 : ARGV[1];   # the command-line argument, at least 1
  printf("%d\n", Fibo(n));
  exit                               # stop before awk tries to open n as a file
}

To run it, use the command awk -f Fib.awk n, where the input n is an integer. With a small change to the function Fibo(n), the same program can compute factorials; the modified code is as follows:

Listing 4. awk script for calculating factorials
function factorial(n) {
  if (n <= 1) return 1;
  return (n * factorial(n-1));
}
BEGIN {
  n = (ARGV[1] < 1) ? 1 : ARGV[1];
  printf("%d\n", factorial(n));
  exit
}
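The recursive Fibo in Listing 3 recomputes the same terms many times, so its running time grows quickly with n. A minimal iterative sketch that keeps the same convention (Fibo(0) = Fibo(1) = 1) is shown below; the wrapper name fib_iter is hypothetical.

```shell
# fib_iter: hypothetical wrapper, same convention as Listing 3 (Fibo(0) = Fibo(1) = 1).
fib_iter() {
    awk -v n="$1" 'BEGIN {
        a = 1; b = 1                  # Fibo(0) and Fibo(1)
        for (i = 2; i <= n; i++) {    # each pass advances the pair by one term
            t = a + b; a = b; b = t
        }
        printf "%d\n", b
    }'
}
fib_iter 10   # prints: 89
```

Because awk stores numbers as double-precision floating point, results are exact only while they stay below 2^53.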

Let's look at computing a square root. Although awk provides a built-in function for this (sqrt), we can also write our own program. The corresponding algorithm is shown in Listing 5, and Listing 6 gives a concrete example: finding the square root of 3.7:

Listing 5. Algorithm for finding square root
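Judging from the update step and the stopping test in Listing 6, the algorithm is consistent with Newton's method applied to f(x) = x^2 - a, starting from x_0 = a. Under that assumption, the iteration can be written as:

```latex
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
        = x_k - \frac{x_k^2 - a}{2 x_k}
        = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right),
\qquad x_0 = a,
\qquad \text{stop when } \left(x_k^2 - a\right)^2 \le 10^{-12}.
```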

Listing 6. Example of square root calculation
BEGIN {
  a = 3.7;
  x = a;                            # initial guess
  while ((x**2 - a)**2 > 1e-12) {   # repeat until x*x is close enough to a
    x = (x + a/x) / 2;
  }
  print x
}
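To check the iteration, it can be wrapped in a function and printed next to the built-in sqrt. This is a sketch: the function name my_sqrt is hypothetical, and ^ is used in place of ** because ^ is the exponentiation operator defined by POSIX awk.

```shell
# my_sqrt is a hypothetical name; the loop is the iteration from Listing 6.
out=$(awk '
function my_sqrt(a,    x) {           # x after the extra spaces is a local variable
    x = a
    while ((x * x - a) ^ 2 > 1e-12)
        x = (x + a / x) / 2
    return x
}
BEGIN { printf "%.6f %.6f\n", my_sqrt(3.7), sqrt(3.7) }')
echo "$out"
```

The two printed values should agree to within the tolerance implied by the stopping test.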

Example 1: Quickly calculate the time difference between two files

If we only need simple numerical calculations, awk is probably not the best choice; after all, awk was designed for text processing. However, when the calculation is closely tied to text, for example when the data must first be found in and extracted from a text file, the advantages of awk become clear. This situation comes up often in practice.

Let's look at a practical example. Suppose we want to compare the efficiency of some parallel programs running on a Linux cluster. One possible way is to estimate how long these programs take to run. They usually run for a long time, anywhere from 10 hours to one week. The programs continuously generate data files while running, and the Linux system records the time at which each data file was created (if it did not exist before) or modified (if it already existed), so the efficiency of a parallel program can be estimated from the difference between the modification times of two files. The stat command provided by Linux reports various properties of a file. For example, for the data file Simu_space_1.dat, the command stat Simu_space_1.dat produces the following output:

Listing 7. Output of the command stat Simu_space_1.dat
  File: "Simu_space_1.dat"
  Size: 237928     Blocks: 480     IO Block: 4096   regular file
Device: 801h/2049d     Inode: 2768915     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     NST)   Gid: ( 1000/     NST)
Access: 2008-11-14 10:56:05.000000000 +0100
Modify: 2008-11-13 23:26:44.000000000 +0100
Change: 2008-11-13 23:26:44.000000000 +0100

The line containing the keyword 'Modify' records the time at which the file was last modified. So in principle we only need to run stat on the two files, read off their modification times, and compute the difference. If this is needed only a few times, it can of course be done by hand, but doing it frequently takes time and makes mistakes more likely. Instead we can let awk do the computation automatically, using the following script, Time_df.awk:

Listing 8. awk program to calculate the time difference
BEGIN {
  n = 0;
  d1 = 0;
  s1 = 0;
  FS = ":|-| +";    # split on colons, hyphens, and runs of spaces
}
{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /Modify/) {
      n = n + 1;
      d = $(i+4);   # day of the month
      h = $(i+5);   # hours
      m = $(i+6);   # minutes
      s = $(i+7);   # seconds
      # (-1)**n is -1 for the first file and +1 for the second
      d1 = d1 + ((-1)**n) * d*24*3600;
      s1 = s1 + ((-1)**n) * (3600*h + 60*m + s);
    }
  }
}
END {
  s1 = s1 + d1;     # total difference in seconds
  d = int(s1 / (24*3600));
  h = int((s1 - d*24*3600) / 3600);
  m = int((s1 - d*24*3600 - h*3600) / 60);
  s = s1 % 60;
  printf("The total time required %d days, %d hours, %d minutes and %d seconds\n", d, h, m, s);
}

The code works as follows. First, awk finds the line containing the keyword 'Modify' and extracts the date and time fields from it. Since dates and times cannot conveniently be subtracted directly, each one is converted to a number of seconds counted from the start of the month. The factor (-1)**n gives the first file's time a negative sign and the second file's time a positive sign, so the accumulated sum is the second time minus the first, again in seconds. For readability, this difference is printed as days, hours, minutes, and seconds. To calculate the time difference between the two files Simu_space_1.dat and Simu_space_100.dat, use the following command:

Listing 9. command to calculate the file time difference
stat Simu_space_1.dat Simu_space_100.dat | awk -f Time_df.awk

The file generated first (Simu_space_1.dat, the earlier one) goes first, and the file generated later (Simu_space_100.dat) goes second. To calculate the time difference between two other files, just change the file names. With the awk code above, we can quickly and accurately get the time interval between any two data files. Note that the program does not handle intervals that cross a month boundary. If the first data file is generated at the end of one month and the second at the beginning of the next, the result cannot be used, because the computed time is a meaningless negative number.
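One possible way around the month-boundary limitation, assuming GNU coreutils stat is available, is to ask stat for the modification time as seconds since the epoch (format %Y) and subtract those values instead. The sketch below uses a hypothetical helper named fmt_diff, and the two sample timestamps are made-up values for illustration only.

```shell
# fmt_diff: hypothetical helper. Reads two epoch timestamps (seconds), one per line,
# earlier file first, and prints the difference in the style of Listing 8.
fmt_diff() {
    awk '{ t[NR] = $1 }
    END {
        s1 = t[2] - t[1]                          # total difference in seconds
        d = int(s1 / 86400)
        h = int((s1 - d * 86400) / 3600)
        m = int((s1 - d * 86400 - h * 3600) / 60)
        s = s1 % 60
        printf "%d days, %d hours, %d minutes and %d seconds\n", d, h, m, s
    }'
}
# With GNU stat: stat -c %Y Simu_space_1.dat Simu_space_100.dat | fmt_diff
printf '%s\n' 1000000000 1000093784 | fmt_diff
# prints: 1 days, 2 hours, 3 minutes and 4 seconds
```

Because epoch seconds increase monotonically, the subtraction stays valid across month and year boundaries.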

Example 2: Verify flux: extract data from multiple files and calculate

This example calculates the flux of a fluid at different locations in order to verify that the values are the same. The flux here can be taken as the product of the particle concentration, the fluid velocity, and the cross-sectional area. The difficulty is that the concentration, velocity, and other parameters are spread across different data files in which numbers are mixed with other characters. For example, the file Simu_space_1.dat, which contains the concentration, has the following format:

Listing 10. Format of the data file Simu_space_1.dat

{0.436737, 0.429223, 3.000000, 1.000000, 43300806482080792.000000, 243231808.137785},
{1.296425, 0.429223, 3.000000, 1.000000, 107468809895964656.000000, 584622938.047805},
{2.128973, 0.429223, 3.000000, 1.000000, 102324821165926400.000000, 539067822.351442},
......
{19.358569, 4.875000, 3.000000, 1.000000, 257544788738191712.000000, 1460324590.999991},
{19.620925, 4.875000, 3.000000, 1.000000, 266676357086157504.000000, 1464352706.940682},
{19.875000, 4.875000, 3.000000, 1.000000, 260249342336872224.000000, 1383971975.659338},

The first step is to extract the concentration data (the fifth number from the left on each line) at a given location from the above file. The following awk command extracts the concentration at position x = 0.429223 and saves it to a temporary file Number.txt:

Listing 11. Extracts the data at the specified location and saves
awk -F '[{]|,[ \t]+|[}],' '{for (i=1; i<NF; i++) {if ($i ~ /0\.429223/) print $(i+3)}}' Simu_space_1.dat > Number.txt

The file Number.txt now contains one column of concentration data. We then extract the velocity and area data at the same location from the other files in the same way and save them to the temporary files velocity.txt and area.txt respectively. The data in the three temporary files is then merged into another file, Flux.txt, so that awk can process it easily. This merge can be done with the tool paste, as shown in Listing 12:
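To see how the field separator splits a record, the sketch below applies it to the first line of Listing 10. It assumes the numbers are separated by a comma followed by spaces or tabs. Since each line starts with an opening brace, $1 is empty, the position is $3, and the concentration is $6. The variable names line and conc are illustrative.

```shell
# First record from Listing 10, used here as sample input.
line='{0.436737, 0.429223, 3.000000, 1.000000, 43300806482080792.000000, 243231808.137785},'
# The leading brace makes $1 empty, so the second number is $3 and the fifth is $6.
conc=$(printf '%s\n' "$line" |
    awk -F '[{]|,[ \t]+|[}],' '$3 == 0.429223 { print $6 }')
echo "$conc"   # prints: 43300806482080792.000000
```

Comparing $3 directly is stricter than scanning every field with a regular expression, because it cannot match the same digits appearing in another column.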

Listing 12. Merging data from different files into a single file
paste Number.txt velocity.txt area.txt > Flux.txt

The file Flux.txt now contains three columns of data: concentration, velocity, and area. Following the flux calculation described earlier, the three values on each line of Flux.txt are multiplied together, and then all the products are added up to give the flux through that section. The code is shown in Listing 13:

Listing 13. The awk code that calculates the flux
awk '{x = x + ($1*$2*$3)} END {print x}' Flux.txt

The code uses a variable x, which starts out as zero. When the rule runs for the first line of Flux.txt, x becomes the product of the three values on that line. For the second line, the product of its three values is added to the previous value, and so on until every line has been summed. The END block prints only the final result, not the intermediate sums. For comparison, we previously used other software (such as Excel or OpenOffice Calc) to calculate the flux. That inevitably involves importing the data, selecting the appropriate functions, and so on, whereas awk needs only one line of code. Considering that the work of extracting the data from the different files is also done by awk (again in only a few lines of code), using awk saves a considerable amount of time in this example.
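The accumulation can be tried on a small input. The two rows below are made-up values for illustration only, not data from the article. Because print shows non-integer results with only six significant digits by default, this sketch uses printf with an explicit format, which matters for values as large as the concentrations in Listing 10.

```shell
# Two made-up rows of "concentration velocity area" (illustrative values only).
total=$(printf '%s\n' '2 3 4' '1 5 2' |
    awk '{ x += $1 * $2 * $3 } END { printf "%.3e\n", x }')
echo "$total"   # prints: 3.400e+01  (2*3*4 + 1*5*2 = 34)
```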

Summary

Awk's numerical capabilities should not be overlooked: it can handle calculations ranging from simple to fairly complex. Awk is especially handy when a calculation involves processing data files. Because awk has strong text-processing capabilities, it can easily separate the data from the surrounding text and then perform the calculation. The examples in this article show that flexible use of these features can significantly improve our productivity.
