Variable scope issues in awk (awk modularization, with supplementary information added by the reposter)


http://www.ibm.com/developerworks/cn/linux/l-cn-awkf/

Wen Quan (saphires@sohu.com), software engineer, Jie Si Rui Technology (Beijing) Co., Ltd.

Introduction: Starting from two faulty programs, this article introduces global variable pollution in awk and analyzes its causes. Then, based on how variable scope works in awk, it presents two common ways to avoid global variable pollution, explains how to define local variables in awk, and gives the corrected code. Next, using awk's variable dump feature for debugging, it points out a remaining shortcoming in the corrected code and notes what to watch for when writing general-purpose functions. Finally, it briefly describes how awk scripts can include other files, and advocates a more disciplined and effective use of awk as a text-processing tool. Some common awk reference documents are provided at the end of this article for your reference.

In most programming languages, such as C and PHP, variables declared inside a function are automatically local: they live only while the function executes and are destroyed when it returns. In awk, by contrast, we mostly use global variables without thinking about scope. In small scripts of tens or hundreds of lines, and certainly in one-liners, this is understandable: no function calls cross-reference variables, so global variable pollution does not occur. As scripts grow, however, awk code gets written in a structured, modular way, and user-defined functions become commonplace. Then one day a small feature is added and the results are wrong, yet the new code is so clear that nobody doubts its correctness. The conclusion is that the problem must lie in the older code, but tracking it down is time-consuming and laborious. I once spent nearly half a day on such an issue, and only after that painful debugging did I notice the variable scope problem in awk. The lessons are summarized here for your reference.

Variable pollution

Let's look at variable pollution through two examples:

Listing 1. fac1.awk: print the factorials of 1 to 10

    #
    # fac1.awk
    # original version of fac1.awk
    #

    function factorial(n)
    {
        s = 1;

        for (i = 1; i <= n; i++)
        {
            s *= i;
        }

        return s;
    }

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

Run and view the result:

    [Robert@saphires awk_var]$ echo "" | awk -f fac1.awk
    fac(2) = 1
    fac(4) = 6
    fac(6) = 120
    fac(8) = 5040
    fac(10) = 362880

 

Only the lines for 2, 4, 6, 8, and 10 are printed, and even those are wrong: each shows the factorial of the preceding odd number. Clearly something is wrong with the program's flow. Why? The program is simple enough that a quick analysis finds the cause: the user-defined function factorial() overwrites the global variable i (leaving it at n + 1 on return), which disturbs the main loop and produces the anomaly.

In the example above it is a loop variable, i, that gets overwritten, which is the most typical case of global variable pollution. Variable pollution can also occur in recursive functions. Consider the following example:


Listing 2. fac2.awk: print the factorials of 1 to 10 (recursive implementation)

    #
    # fac2.awk
    # original version of fac2.awk
    #

    function factorial(n)
    {
        if (n == 1)
        {
            i = 1;
            return i;
        }
        else
        {
            i = factorial(n - 1) * n;
            return i;
        }
    }

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

Run and view the result:

    [Robert@saphires awk_var]$ echo "" | awk -f fac2.awk
    fac(1) = 1
    fac(2) = 2
    fac(6) = 6
    fac(5040) = 5040

 

The results are again strange, and again the cause is that the global variable i is polluted. The program is admittedly somewhat contrived, since one would not normally use i to hold a function's return value, but the problem it exposes is the same: global variables get polluted in awk.

Both problems above come from global variable pollution, but what causes the pollution itself? According to the gawk user manual, the original awk did not support functions: a program was interpreted and executed in order, and a variable was initialized on first appearance and could be referenced by all later code. What we use on Linux today is gawk, an extended implementation of awk. While keeping the original awk's approach to variables, gawk introduces user-defined functions, which break the original sequential flow of execution; that is, the code can jump. In an awk with no real local variables, this is how variable pollution arises. Next, we look at how to avoid this historically rooted global variable pollution.

 

How to avoid variable pollution

The two faulty examples above illustrate common situations in which variable pollution easily arises in awk. How can it be prevented? The crudest way is to make sure that variables used inside functions never share a name with a global variable. I used this crude method for a long time: every "local variable" in a function began with an abbreviation of the function's name, to prevent clashes with global names. To some extent this works. Logically, however, it is by no means a panacea.
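A minimal sketch of this naming-prefix workaround follows. The names fac_i and fac_s, the shell variable out, and the use of a BEGIN block (so that no input is needed) are illustrative assumptions, not taken from the original scripts. The last line of output shows why the method is not a real fix: the prefixed variables are still global, just less likely to clash.

```shell
# Sketch: prefix "local" variables with an abbreviation of the function name.
out=$(awk '
function factorial(n)
{
    fac_s = 1
    for (fac_i = 1; fac_i <= n; fac_i++)
        fac_s *= fac_i
    return fac_s
}

BEGIN {
    for (i = 1; i <= 5; i++)
        printf("fac(%d) = %d\n", i, factorial(i))
    # fac_i and fac_s are still globals; they are merely unlikely to clash
    printf("leaked: fac_i = %d, fac_s = %d\n", fac_i, fac_s)
}')
echo "$out"
```

The main loop is no longer disturbed because it uses i while the function uses fac_i, but any other code that happens to use fac_i or fac_s would still collide with it.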

The ideal solution is to define local variables inside the function, which is also the first idea that comes to mind. In C, variables defined in a function are automatically local and are destroyed when the function ends. In bash, global variable pollution exists much as in awk, but the local keyword can be used to declare local variables in a function and avoid it. How to define local variables in awk is less obvious; at least the "variables" section of the gawk manual gives no hint about local variables. Interestingly, although most people think of this solution first, they end up falling back on the first one. While rereading sed & awk in my spare time, I found this: "awk provides a poor way to define local variables, that is, through the function parameter list." Soon afterwards I found a similar passage in the "user-defined functions" section of the gawk manual: "because the original awk does not support functions, the implementation of local variables in awk is rather clumsy. It is implemented by defining additional parameters for the function. By convention, add a few spaces after the real parameters to separate the real parameters from the local variable declarations."
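As a minimal sketch of the parameter-list convention quoted above, the recursive factorial from Listing 2 can be made safe by declaring i after some extra spaces in the parameter list. The shell wrapper, the variable name out, the BEGIN block, and the n <= 1 guard are illustrative assumptions rather than part of the original script.

```shell
# Sketch: an extra parameter acts as a local variable in awk.
out=$(awk '
# i follows extra spaces in the parameter list: callers never pass it,
# so each invocation (including each recursive one) gets its own i
function factorial(n,    i)
{
    if (n <= 1)
        i = 1
    else
        i = factorial(n - 1) * n
    return i
}

BEGIN {
    # this i is the global loop variable; the function no longer touches it
    for (i = 1; i <= 5; i++)
        printf("fac(%d) = %d\n", i, factorial(i))
}')
echo "$out"
```

With the local i in place, all five lines are printed in order, whereas the version in Listing 2 skips values because the function overwrites the caller's loop variable.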

Now that the question of defining local variables is settled, let's return to the faulty programs. Taking fac1.awk as an example, here is the modified code:


Listing 3. fac1-2.awk: the traditional way of defining local variables

    #
    # fac1-2.awk
    # version 0.2 of fac1.awk
    #

    function factorial(n,    i)
    {
        s = 1;

        for (i = 1; i <= n; i++)
        {
            s *= i;
        }

        return s;
    }

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

Run and view the result:

    [Robert@saphires awk_var]$ echo "" | awk -f fac1-2.awk
    fac(1) = 1
    fac(2) = 2
    fac(3) = 6
    fac(4) = 24
    fac(5) = 120
    fac(6) = 720
    fac(7) = 5040
    fac(8) = 40320
    fac(9) = 362880
    fac(10) = 3628800

 

Yes, it's that simple, and the problem is solved! As the manual says, this is a historical issue: defining local variables through formal parameters is only a compromise. To someone unaware of the convention, the function now looks like one that takes a variable number of arguments. In the example above, both factorial(n) and factorial(n, i) are valid calls, which may mislead readers or maintainers unfamiliar with the code. Although the function definition has two parameters, the second is never meant to be passed in a call.

Compared with the extra-spaces method described in the books, I prefer to add a parameter named _argvend_ after the normal parameters. It signals that the parameters needed for a normal call end here, and that everything after the marker is a "fake" parameter that is really just a local variable definition. Sooner or later someone will see a function with several space-separated parameters, complain that "this declaration was written by a lame programmer", remove the spaces, and even pass those parameters when calling the function. With the marker in place, the real "lame programmer" will at least stop to wonder why there is an _argvend_ here. You can of course use any marker you like, but keep one thing in mind: tradition is tradition. Until the method proposed here becomes a tradition, you must still know what the parameters after the extra spaces mean, so that you do not become a real "lame programmer" yourself. The modified code is as follows:


Listing 4. fac1-3.awk: the author's way of defining local variables

    #
    # fac1-3.awk
    # version 0.3 of fac1.awk
    #

    function factorial(n, _argvend_, i)
    {
        s = 1;

        for (i = 1; i <= n; i++)
        {
            s *= i;
        }

        return s;
    }

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

Run and view the result:

    [Robert@saphires awk_var]$ echo "" | awk -f fac1-3.awk
    fac(1) = 1
    fac(2) = 2
    fac(3) = 6
    fac(4) = 24
    fac(5) = 120
    fac(6) = 720
    fac(7) = 5040
    fac(8) = 40320
    fac(9) = 362880
    fac(10) = 3628800

 


Debug global variables

We have discussed when variable pollution occurs and how to resolve it. Here is a simple way to debug related problems: gawk's --dump-variables option, which writes all global variables to a text file after the program finishes. Let's look at the following example, the output of running fac1-3.awk:

    [Robert@saphires awk_var]$ echo "" | awk -f fac1-3.awk --dump-variables=/tmp/var.dump
    fac(1) = 1
    fac(2) = 2
    fac(3) = 6
    fac(4) = 24
    fac(5) = 120
    fac(6) = 720
    fac(7) = 5040
    fac(8) = 40320
    fac(9) = 362880
    fac(10) = 3628800
    [Robert@saphires awk_var]$ cat /tmp/var.dump
    ARGC: number (1)
    ARGIND: number (0)
    ARGV: array, 1 elements
    BINMODE: number (0)
    CONVFMT: string ("%.6g")
    ERRNO: number (0)
    FIELDWIDTHS: string ("")
    FILENAME: string ("-")
    FNR: number (1)
    FS: string (" ")
    IGNORECASE: number (0)
    LINT: number (0)
    NF: number (0)
    NR: number (1)
    OFMT: string ("%.6g")
    OFS: string (" ")
    ORS: string ("\n")
    RLENGTH: number (0)
    RS: string ("\n")
    RSTART: number (0)
    RT: string ("")
    SUBSEP: string ("\034")
    TEXTDOMAIN: string ("messages")
    i: number (11)
    s: number (3628800)
    value: number (3628800)

 

The dump shows that the variable s used inside factorial() is in fact a global variable. The output of fac1-3.awk already meets our functional needs, but suppose we port factorial() into a script we did not write: how do we know it will not collide with some other global variable named s and cause variable pollution? If that day comes, it may take hours to track down the source of the pollution. How do we head off such a disaster? By defining s as a local variable too, of course. For example:


Listing 5. fac1-4.awk: final version

    #
    # fac1-4.awk
    # version 0.4 of fac1.awk
    #

    function factorial(n, _argvend_, i, s)
    {
        s = 1;

        for (i = 1; i <= n; i++)
        {
            s *= i;
        }

        return s;
    }

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

 

Now let's look at the variable dump for fac1-4.awk:

    [Robert@saphires awk_var]$ echo "" | awk -f fac1-4.awk --dump-variables=/tmp/var.dump
    fac(1) = 1
    fac(2) = 2
    fac(3) = 6
    fac(4) = 24
    fac(5) = 120
    fac(6) = 720
    fac(7) = 5040
    fac(8) = 40320
    fac(9) = 362880
    fac(10) = 3628800
    [Robert@saphires awk_var]$ cat /tmp/var.dump
    ...... (omitted)
    i: number (11)
    value: number (3628800)

 

The results show that after the program runs, the only user-defined global variables left are i and value, exactly the two we expect. The factorial() function can now be moved into other awk scripts as a general-purpose function. The dump of awk's global variables also gives us a principle for writing awk functions: every local variable should be declared in the parameter list. Only then is global variable pollution completely avoided.
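The effect of this principle can be sketched without the dump file by letting the caller keep its own variable named s and checking it after each call. The function names fac_leaky and fac_safe, the string "caller data", and the shell variable out are hypothetical, chosen only for this illustration.

```shell
# Sketch: a variable missing from the parameter list clobbers the caller.
out=$(awk '
# s is not in the parameter list, so assigning to it writes the global s
function fac_leaky(n,    i)
{
    s = 1
    for (i = 1; i <= n; i++)
        s *= i
    return s
}

# both i and s are declared as extra parameters, so both are local
function fac_safe(n,    i, s)
{
    s = 1
    for (i = 1; i <= n; i++)
        s *= i
    return s
}

BEGIN {
    s = "caller data"
    fac_safe(5)
    print "after fac_safe: s = " s
    fac_leaky(5)
    print "after fac_leaky: s = " s
}')
echo "$out"
```

The first line should still show the caller's string, while the second shows it replaced by the function's intermediate result. This is the kind of silent overwrite that the variable dump makes visible.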

Use included files

The factorial() function in fac1-4.awk above can now be considered a safe awk function. Can awk programs include other source files, the way #include does in C or source does in bash? The answer is yes. Let's first save the factorial() function to fac-lib.awk:


Listing 6. fac-lib.awk: awk function library

    #
    # library for awk
    #

    function factorial(n, _argvend_, i, s)
    {
        s = 1;

        for (i = 1; i <= n; i++)
        {
            s *= i;
        }

        return s;
    }

 

    1. One method is to pass several awk scripts on the command line. With this method the main script needs no marker referring to the library file:

Listing 7. fac3.awk: main program without an include of the awk function library

    #
    # fac3.awk
    # original version of fac3.awk
    #

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

 

Run and view the result:

    [Robert@saphires awk_var]$ echo "" | awk -f fac-lib.awk -f fac3.awk
    fac(1) = 1
    fac(2) = 2
    fac(3) = 6
    fac(4) = 24
    fac(5) = 120
    fac(6) = 720
    fac(7) = 5040
    fac(8) = 40320
    fac(9) = 362880
    fac(10) = 3628800
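The multiple -f approach can be tried end to end with the following sketch, which writes a small library and a main program into a temporary directory and runs them together. The file name main.awk, the temporary directory, the shell variable out, and the use of a BEGIN block (so no input has to be piped in) are assumptions made for this illustration.

```shell
# Sketch: combine a function library and a main program with two -f options.
dir=$(mktemp -d)

# the library: only function definitions, all locals in the parameter list
cat > "$dir/fac-lib.awk" <<'EOF'
function factorial(n, _argvend_, i, s)
{
    s = 1
    for (i = 1; i <= n; i++)
        s *= i
    return s
}
EOF

# the main program: calls factorial() without mentioning the library file
cat > "$dir/main.awk" <<'EOF'
BEGIN {
    for (i = 1; i <= 3; i++)
        printf("fac(%d) = %d\n", i, factorial(i))
}
EOF

# awk reads both files as one program
out=$(awk -f "$dir/fac-lib.awk" -f "$dir/main.awk")
echo "$out"
rm -rf "$dir"
```

Because awk treats all -f files as a single program, the order of the two options does not matter for function definitions; the library only has to be named somewhere on the command line.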

 

    2. The second method is closer to a source-code include, but awk scripts that use it must be run with igawk. igawk is really just a script: at run time it looks for @include markers in the awk script, merges each included file into the script at the line where its @include appears, and then has the result interpreted and executed. On the one hand, awk's so-called include feature is hardly a native one; on the other hand, the extension that provides it is itself implemented with awk, which is rather interesting. The following shows how to use igawk to run an awk script that includes the function library.

Listing 8. fac3-2.awk: main program that includes the awk function library

    #
    # fac3-2.awk
    # original version of fac3-2.awk
    #

    @include fac-lib.awk

    {
        for (i = 1; i <= 10; i++)
        {
            value = factorial(i);
            printf("fac(%d) = %d\n", i, value);
        }
    }

Run and view the result:

    [Robert@saphires awk_var]$ echo "" | igawk -f fac3-2.awk
    fac(1) = 1
    fac(2) = 2
    fac(3) = 6
    fac(4) = 24
    fac(5) = 120
    fac(6) = 720
    fac(7) = 5040
    fac(8) = 40320
    fac(9) = 362880
    fac(10) = 3628800
