You may be familiar with Unix yet unfamiliar with awk, which is not surprising: awk is far less popular than its capabilities deserve. What is awk? Unlike most other Unix commands, its name says nothing about what it does: it is neither an English word nor an abbreviation of related words. In fact, awk is formed from the initials of three people: (Alfred) Aho, (Peter) Weinberger, and (Brian) Kernighan. Together they created awk, an excellent pattern scanning and processing tool.
What does awk do? Like sed and grep, awk is a pattern scanning and processing tool, but it is much more powerful than either. awk can do almost everything that grep and sed can do, and it also supports pattern loading, flow control, arithmetic operators, process control statements, and even built-in variables and functions. It has almost all the features of a complete language. In fact, awk does have its own language, the awk programming language, and its three creators formally define it as a pattern scanning and processing language.
2. Why use awk?
Even so, you may still ask: why should I use awk?
The first reason is that text-based pattern scanning and processing is something we do all the time. What awk does is a little like what a database does, except that awk works on plain text files. These files have no special storage format, so anyone can edit, read, understand, and process them. Database files usually have a special storage format and must be handled by a database program. Since we meet this kind of database-like work so often, we should have a simple and easy way to do it. Unix has many tools for this, such as sed, grep, sort, and find, and awk is an excellent one among them.
The second reason for using awk is that it is a simple tool, at least relative to how powerful it is. Unix certainly has many excellent tools: C, the native Unix development language, and its successor C++ are both very good. Compared with them, however, awk makes the same task much easier. First, awk offers solutions for a range of needs, from awk command lines that solve simple problems to full programs in the awk programming language, so you do not have to use complicated methods for simple problems. You can solve a simple problem with one command line, which C cannot do: even the simplest C program must go through the whole cycle of writing and compiling. Second, awk is interpreted, so awk programs need no compilation and fit well with shell scripts. Finally, awk is simpler than C. awk borrows many good features from C, so knowing C is a great help in learning awk, but awk does not require you to know C, a development tool that is powerful but takes a long time to master.
The third reason is that awk is easy to obtain. Unlike C and C++, awk is a single file (/bin/awk), and almost every version of Unix ships its own awk, so you never have to think about how to get it. That is not the case with C. Although C is the native Unix development tool, it is distributed separately: you must buy a C development tool for your Unix version (unless you use a pirated copy), obtain it, and install it before you can use it.
For these reasons, together with the power of awk, we can fairly say that if your work involves text pattern scanning, awk should be your first choice. A general rule of thumb: if ordinary shell tools or shell scripts are not enough, try awk. If awk cannot solve the problem, use C; if C still fails, move to C++.
3. Ways to invoke awk
As mentioned earlier, awk provides different solutions to meet various needs. They are:
1. awk command line: you can use awk like any ordinary Unix command, and you can also use the awk programming language on the command line. Although awk supports multi-line input, typing a long command line correctly is a headache, so this method is generally used only for simple problems. You can of course also use awk command lines, or even awk program scripts, inside shell scripts.
2. Use the -f option to call an awk program. awk allows an awk program to be written to a text file and then run with the -f option on the awk command line. The details are described under awk syntax.
3. Use the command interpreter to call an awk program: using the command interpreter mechanism supported by Unix, we can write an awk program into a text file, add the line #!/bin/awk -f as its first line, and give the file execute permission. After that, the program can be run from the command line like this:
awk_script_name file_to_process
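As a minimal sketch of the first two invocation methods, the following runs the same one-rule program inline and from a program file. The file names data.txt and first.awk are hypothetical and exist only for this illustration.

```shell
tmp=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmp/data.txt"    # hypothetical sample input

# 1. Program text given directly on the command line.
inline=$(awk '{ print $1 }' "$tmp/data.txt")

# 2. The same program stored in a file and loaded with -f.
printf '{ print $1 }\n' > "$tmp/first.awk"      # hypothetical program file
from_file=$(awk -f "$tmp/first.awk" "$tmp/data.txt")

echo "$inline"
echo "$from_file"
rm -rf "$tmp"
```

Both commands print the first field of every line, so the two results should be identical. For the third method the program file would additionally start with a #! line, whose path to awk varies between systems.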
4. awk Syntax:
Like other Unix commands, awk has its own syntax:
awk [-F re] [parameter...] ['prog'] [-f progfile] [in_file...]
Parameter description:
-F re: lets awk change its field separator to re.
parameter: assigns values to variables.
'prog': the program text of awk. It must be enclosed in single quotes (' and ') to keep the shell from interpreting it. The standard form of the program text is:
'pattern { action }'
The pattern can be any egrep-style regular expression, written as /re/ and combined with other matching techniques. As in sed, you can separate two patterns with "," to select a range. For details of matching, see the appendix; if that is still unclear, find a Unix book and study grep and sed (I learned pattern matching while learning ed). The action is always enclosed in braces and consists of a series of awk statements separated by ";". awk interprets them and applies them to the records that match the pattern. As in the shell, "#" starts a comment that runs to the end of the line and is ignored during interpretation. You can omit either the pattern or the action, but not both. If the pattern is omitted, no matching is done and the action is applied to every line (record). If the action is omitted, the default action is performed, which is to print the matching record to standard output.
-f progfile: lets awk read and execute the program file progfile. progfile is a text file that must follow awk syntax.
in_file: the input file of awk. awk can process several input files. Note that awk does not modify its input files. If no input file is given, awk reads standard input and writes its results to standard output. awk supports input and output redirection.
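The pieces of the syntax can be tried out as below. This is only a sketch: the file stock.txt, its contents, and the variable name limit are assumptions made up for the example.

```shell
tmp=$(mktemp -d)
printf 'apple:3\nbanana:12\ncherry:7\n' > "$tmp/stock.txt"   # hypothetical data

# -F sets the field separator; limit=5 is a variable assignment parameter.
with_param=$(awk -F: '$2 > limit { print $1 }' limit=5 "$tmp/stock.txt")

# Pattern only: the default action prints the whole matching record.
pattern_only=$(awk '/an/' "$tmp/stock.txt")

# Action only: the action runs for every record.
action_only=$(awk -F: '{ print $2 }' "$tmp/stock.txt")

# Range pattern: from the first line matching /apple/ to the next matching /banana/.
range=$(awk '/apple/,/banana/' "$tmp/stock.txt")

printf '%s\n' "$with_param" "$pattern_only" "$action_only" "$range"
rm -rf "$tmp"
```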
5. awk records, fields, and built-in variables:
As mentioned above, awk handles data in a way that resembles a database. One similarity is that awk works with records and fields. grep and sed cannot process fields, which is one of the reasons awk is better than both. By default, awk treats each line of a text file as a record and each part of a line as a field of that record. To operate on the fields, awk borrows the shell's notation and refers to the fields of a line (record) in order as $1, $2, $3, and so on. In particular, $0 stands for the entire line (record). Fields are separated by characters called delimiters; the default delimiter is whitespace (spaces and tabs). awk lets you change the delimiter with -F re on the command line. Internally, awk remembers this separator in the built-in variable FS. awk has several such built-in variables, for example the record separator RS and the number of the current record NR. The appendix of this article lists all the built-in variables. They can be read or modified in an awk program: you can use NR in a pattern to restrict the range of records processed, or change the record separator RS so that a special character rather than a newline separates records.
For example, to display the first, third, and seventh fields of lines 7 through 15 of the text file myfile, whose fields are separated by the character %:
awk -F% 'NR==7,NR==15 { print $1, $3, $7 }' myfile
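The following sketch shows the built-in variables at work on made-up data: NR and NF with a % field separator, and RS changed so that ";" ends a record. The file name people.txt and its contents are hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical %-separated file; %% in the printf format produces one literal %.
printf 'id%%name%%city\n1%%Ann%%Oslo\n2%%Bob%%Rome\n' > "$tmp/people.txt"

# NR is the record number, NF the number of fields, $2 the second field.
fields=$(awk -F% 'NR >= 2 { print NR, NF, $2 }' "$tmp/people.txt")

# Changing RS makes ";" the record separator instead of a newline.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

echo "$fields"
echo "$records"
rm -rf "$tmp"
```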
6. Built-in functions of awk
One reason awk is a good programming language is that it has absorbed many strengths of other excellent languages (such as C). One of these is built-in functions: awk defines and supports a set of them, and they make awk more complete and more powerful. For example, awk has a group of built-in string functions that look and behave much like the string functions in C, and these are what make awk so strong at string processing. The appendix lists the built-in functions that awk generally provides. They may differ in your version of awk, so it is best to check your system's online help before using them.
As an example of a built-in function, we introduce printf here. It gives awk the same formatted output as C. In fact, much of awk is borrowed from C. If you know C, you will remember its printf function and the convenience of its powerful formatted output. Happily, we meet it again in awk. printf in awk is almost the same as in C, so if you know C you can use it in awk just as you would there. For that reason we give only one example; if you are not familiar with printf, consult an introductory C book.
For example, to display the line number and the first field of each line of the file myfile:
awk '{ printf "%03d %s\n", NR, $1 }' myfile
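A few more printf conversions can be sketched as follows, on invented input: %-8s left-justifies a string in 8 columns, %6.2f prints a number in 6 columns with two decimals, and %03d pads an integer with zeros.

```shell
# Input lines are made up for illustration.
formatted=$(printf 'widget 4.5\ngadget 12\n' |
  awk '{ printf "%-8s|%6.2f|%03d\n", $1, $2, NR }')
echo "$formatted"
```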
7. Use awk in the command line
By rights we should now move on to awk programming. Before that, however, we review what we have covered with some examples, all of which run on the command line, so that we can see how convenient awk is there. These examples pave the way for what follows and show some ways of solving simple problems. There is no need to solve a simple problem in a complicated way when awk offers a simpler one.
For example, to display all lines of the text file mydoc that match (contain) the string "Sun":
awk '/Sun/ { print }' mydoc
Because printing the whole record (the full line) is the default action of awk, the action can be omitted:
awk '/Sun/' mydoc
The following is a more complex matching example:
awk '/[Ss]un/,/[Mm]oon/ { print }' myfile
It prints to standard output the lines from the first line that matches Sun or sun through the first line that matches Moon or moon.
The following example uses a built-in variable and the built-in function length:
awk 'length($0) > 80 { print NR }' myfile
This command prints the line numbers of all lines in the text file myfile that are longer than 80 characters. Here $0 stands for the entire record (line), and the built-in variable NR is written without the $ sign.
As a more practical example, suppose we want to run a security check on Unix users by examining the passwd file in /etc: if the password field (the second field) is empty, the user has not set a password, and the user name (the first field) is displayed. We can use the following statement:
awk -F: '$2 == "" { printf("%s no password!\n", $1) }' /etc/passwd
In this example the field separator of the passwd file is ":", so -F: is needed to change the default field separator. The example also uses the built-in function printf.
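To try the check without touching the real /etc/passwd, a sketch along these lines can be run against a sample file. The file passwd.sample and the accounts in it are invented for the illustration.

```shell
tmp=$(mktemp -d)
# Hypothetical passwd-format data; only "guest" has an empty password field.
cat > "$tmp/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/sh
guest::1001:1001:Guest:/home/guest:/bin/sh
ann:*:1002:1002:Ann:/home/ann:/bin/sh
EOF

no_password=$(awk -F: '$2 == "" { printf("%s no password!\n", $1) }' "$tmp/passwd.sample")
echo "$no_password"
rm -rf "$tmp"
```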
8. awk Variables
Like other programming languages, awk lets you use variables in a program. Providing variables is a basic requirement of a programming language; I have never seen one that does not.
awk provides two kinds of variables. The first is the built-in variables, which we have already met. Note that built-in variables are referenced in an awk program by name, without the $ sign (recall the use of NR above). The second is user-defined variables: awk lets users define and use their own variables in program statements. Such a variable cannot have the same name as a built-in variable or another awk reserved word. User-defined variables are also referenced by name alone. The $ sign is the field operator, so $n means the field whose number is the value of n, not the variable n itself. Unlike C, awk does not require variables to be initialized: awk decides how to treat a variable from the form and context in which it appears. A variable that has never been assigned is the empty string, which counts as 0 when used as a number. Here is a tip: if you want your awk program to be explicit about the type of a variable, assign it an initial value in the program. This technique is used in later examples.
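A small sketch of these rules, on made-up input: n, total, and count are user-defined variables written without $, $n selects the field numbered by n, and the hypothetical variable nothing is never assigned.

```shell
summary=$(printf '10 20 30\n1 2 3\n' | awk '
{
  n = 2               # user-defined variable, referenced as plain n
  total += $n         # $n is field number n, here the second field
  count++             # count starts out as 0 without initialization
}
END {
  # nothing was never assigned: empty as a string, 0 as a number
  print count, total, "[" nothing "]", nothing + 0
}')
echo "$summary"
```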
Operations and comparisons:
As a programming language, awk supports a variety of arithmetic operations, which are basically the same as those of C: +, -, *, /, %, and so on. awk also supports C-like operators such as ++, --, +=, and -=, which makes writing awk programs very convenient for users who know C. To extend its arithmetic, awk also provides a set of built-in numeric functions (such as log, sqrt, cos, and sin) and functions for operating on strings (such as length and substr). These functions greatly increase what awk can compute.
Relational tests, as part of conditional branching, are a feature of every programming language, and awk is no exception. awk allows many kinds of test, such as the common == (equal), != (not equal), > (greater than), < (less than), >= (greater than or equal to), and <= (less than or equal to). For pattern matching it also provides the tests ~ (matches) and !~ (does not match).
To extend these tests, awk supports the logical operators ! (not), && (and), and || (or), together with parentheses () for grouping, which greatly strengthens awk. The appendix of this article lists the operations, tests, and operator precedence that awk allows.
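The operators can be combined as in this sketch, which uses invented three-column input. The first rule mixes arithmetic, a regular expression match, && and !; the second uses !~ and ||.

```shell
checked=$(printf 'Sun 7 2\nmoon 9 4\nsunset 3 5\n' | awk '
$1 ~ /^[Ss]un/ && !($2 < 5) { print $1, $2 + $3, $2 % $3 }   # sum and remainder
$1 !~ /un/ || $3 >= 5        { print $1, "second rule" }
')
echo "$checked"
```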
9. awk flow control
Flow control statements are indispensable in any programming language, and every good language has them. awk provides a complete set of flow control statements similar to those of C, which makes programming much more convenient.
1. BEGIN and END:
awk has two special expressions, BEGIN and END, both of which can be used as a pattern (see the awk syntax above). They give the program an initial state and let it do some final work after scanning has finished. Any actions listed after BEGIN (inside {}) are executed before awk starts scanning the input, and actions listed after END are executed after all the input has been scanned. BEGIN is therefore usually used to print headings and preset (initialize) variables, and END to output the final result.
For example, to total the sales amounts in the sales file xs (assuming the amount is the third field of each record):
awk 'BEGIN { FS = ":"; print "sales amount statistics"; total = 0 }
> { print $3; total = total + $3 }
> END { printf "total sales amount: %.2f\n", total }' xs
(Note: > is the secondary prompt that the shell shows while the quoted program is still open. Outside quotes, a shell line is continued by ending it with a backslash \.)
Here BEGIN presets the built-in variable FS (the field separator) and the user-defined variable total, and prints a heading before scanning starts. END prints the grand total after scanning is complete.
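A runnable variant of the same idea is sketched below. The sales records are invented, and the file name xs is reused only to match the text.

```shell
tmp=$(mktemp -d)
printf 'pen:north:2.50\nink:south:4.25\n' > "$tmp/xs"   # hypothetical sales records

report=$(awk '
BEGIN { FS = ":"; total = 0 }            # runs before any input is read
      { total += $3 }                    # runs once per record
END   { printf "records: %d, total: %.2f\n", NR, total }   # runs after the last record
' "$tmp/xs")
echo "$report"
rm -rf "$tmp"
```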
2. Flow control statements: awk provides a complete set of flow control statements, similar to those of C. We explain them one by one:
2.1 if...else statement:
Format: if (expression) statement 1 else statement 2
In this format, "statement 1" can be several statements. If it is, enclose them in braces {}, which helps both awk and the human reader. awk branch structures can be nested, in the following format:
if (expression 1) {
    if (expression 2)
        statement 1
    else
        statement 2
    statement 3
} else {
    if (expression 3)
        statement 4
    else
        statement 5
}
statement 6
Of course, you may never need such a complicated branch structure in practice. It is shown here only to illustrate the form.
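A nested branch might look like the following sketch, which grades some invented scores. The thresholds and the variable name grade are assumptions for the example.

```shell
grades=$(printf '45\n72\n90\n' | awk '
{
  if ($1 >= 60) {
    if ($1 >= 85)
      grade = "high"
    else
      grade = "pass"
  } else {
    grade = "fail"
  }
  print $1, grade
}')
echo "$grades"
```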
2.2 while statement
Format:
while (expression) statement
2.3 do-while statement
Format:
do { statement } while (condition)
2.4 for statement
Format:
for (initial expression; termination condition; step expression) { statement }
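The three loop forms can be compared side by side in a sketch like this one. Each loop builds a string by concatenation; the variable names are arbitrary.

```shell
loops=$(awk 'BEGIN {
  i = 1
  while (i <= 3) { w = w i; i++ }        # test first: appends 1, 2, 3

  j = 5
  do { d = d j; j++ } while (j < 5)      # body runs once before the test fails

  for (k = 3; k >= 1; k--) f = f k       # counts down: appends 3, 2, 1

  print w, d, f
}')
echo "$loops"
```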
In the while, do-while, and for statements of awk you can use break and continue to control the loop, and exit to leave the program. break ends the current loop and continues with the statement that follows it. continue skips the rest of the loop body and begins the next iteration. exit behaves in one of two ways. Outside END, an exit in any action acts as if the end of the input had been reached: all pattern and action processing stops and the END actions are executed. An exit inside END terminates the program.
For example,
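One way to see all three statements together is the sketch below, which is an illustration written for this passage and uses invented input. The main rule exits at the third record, and END still runs.

```shell
flow=$(printf '1\n2\n3\n4\n' | awk '
{
  if ($1 == 3) exit            # stop reading input; END is still executed
  seen = seen $1
}
END {
  for (i = 1; i <= 6; i++) {
    if (i % 2 == 0) continue   # skip even numbers
    if (i > 4) break           # leave the loop when i reaches 5
    odd = odd i
  }
  print "seen=" seen, "odd=" odd
}')
echo "$flow"
```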
Custom functions in awk
Defining and calling your own functions is a feature of almost every high-level language, and awk is no exception. The original awk, however, did not provide user-defined functions; they are available only in nawk and newer versions of awk.
Using a function involves two parts: the function definition and the function call. The definition contains the code to be executed (the function itself) and the parameters passed to it from the main program code.
An awk function is defined as follows:
function name(parameter list) { function body }
In gawk, function may be abbreviated to func, but other versions of awk do not allow this. The function name must be a valid identifier. The parameter list may be empty (the parentheses after the function name are still required when calling the function) or may contain one or more parameters. As in C, awk parameters are passed by value.
Calling a function in awk is similar to calling one in C, but awk is more flexible and does not check the parameters for validity. In other words, a call may list a different number of arguments than the function definition expects. Missing parameters are set to a default of 0 or the empty string, depending on how the parameter is used. Extra arguments are less portable: some versions of awk ignore them, while others (gawk, for example) report an error.
An awk function can return in two ways: implicitly or explicitly. When execution reaches the end of the function, awk automatically returns to the caller; this is the implicit return. To leave a function before its end, use the return statement, written in the form: return return-value.
The following example shows how to use a function. It defines a function named print_header that takes two parameters, filename and pagenum. The filename parameter carries the name of the file currently in use, and pagenum is the number of the current page. The function prints (displays) the name of the current file and the current page number, and then returns the number of the next page.
nawk 'NR == 1 { pageno = 1; file = FILENAME  # FILENAME is not yet set in BEGIN
> pageno = print_header(file, pageno);  # call the print_header function
> printf("the current page number is: %d\n", pageno);
> }
> # define the function print_header
> function print_header(filename, pagenum) {
> printf("%s %d\n", filename, pagenum);
> pagenum++; return pagenum;
> }' myfile
Executing this program will display the following content:
myfile 1
the current page number is: 2
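The default value of a missing parameter and the explicit return can be sketched as follows. The function repeat is hypothetical; its extra parameters out and i are never passed by the caller and so act as local variables, a common awk idiom rather than anything specific to this article.

```shell
calls=$(awk '
# repeat is a made-up helper: returns str concatenated times times.
function repeat(str, times,    out, i) {
  for (i = 0; i < times; i++) out = out str
  return out                      # explicit return with a value
}
BEGIN {
  print repeat("ab", 3)
  print "[" repeat("ab") "]"      # times is missing, so it counts as 0
}')
echo "$calls"
```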
Advanced input and output in awk
1. Read the next record:
The next statement makes awk stop processing the current record, read the next record, and start pattern matching again from the first rule. It is normally used inside an action that a pattern has selected. next causes any remaining patterns for the current record to be skipped.
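For instance, next is often used to discard lines early so that later rules never see them. The sketch below, on invented input, skips comment lines and blank lines before numbering the rest.

```shell
kept=$(printf '# comment\nalpha\n\nbeta\n' | awk '
/^#/ { next }                 # comment line: skip the remaining rules
/^$/ { next }                 # blank line: skip the remaining rules
     { n++; print n ": " $0 }
')
echo "$kept"
```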
2. Simply read a record
The getline statement of awk reads a single record. It is particularly useful when one logical data record is spread over two physical records. getline performs the normal field splitting (it sets $0, NF, FNR, and NR). It returns 1 on success and 0 when it reaches the end of the file. To simply read through a file, you can write:
Example: use of getline
{
    while (getline == 1) {
        # process the input fields
    }
}
You can also have getline store the input in a variable instead of splitting it into fields, by using the form getline variable. In this form the fields and NF are left unchanged, while FNR and NR are incremented.
You can also use getline < "filename" to read from a given file rather than from the files listed on the command line. In this case getline does the normal field splitting (it sets $0 and NF). It returns -1 if the file does not exist, 1 on success, and 0 at the end of the file. You can read from the given file into a variable, and in place of "filename" you can use standard input or a variable that contains the file name. Note that FNR and NR are not modified when this form is used.
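The file forms of getline and their return values can be observed with a sketch like this. The files settings.txt and missing.txt are hypothetical, and -v is used only to hand their paths to awk.

```shell
tmp=$(mktemp -d)
printf 'x=1\ny=2\n' > "$tmp/settings.txt"       # hypothetical input file

status_line=$(awk -v file="$tmp/settings.txt" -v missing="$tmp/missing.txt" 'BEGIN {
  FS = "="
  while ((getline < file) > 0)      # sets $0 and NF, leaves NR alone
    keys = keys $1
  close(file)
  status = (getline line < missing) # the file does not exist
  print keys, NR, status
}')
echo "$status_line"
rm -rf "$tmp"
```

The parentheses around each getline expression keep the < redirection from being read as a comparison.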
Another way to use getline is to take its input from a Unix command, as in the following example:
Example: reading input from a Unix command
{
    while (("who -u" | getline) > 0) {
        # process each line from the who command
    }
}
You can also use the following form:
"command" | getline variable
3. Close a file:
awk lets you close an input or output file from within a program by using the close statement:
close("filename")
filename can be a file opened by getline (it can also be stdin, a variable containing the file name, or the exact command used with getline), or an output file (it can be stdout, a variable containing the file name, or the exact command used with a pipe).
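The effect of close can be sketched as follows: after a file is closed, the next getline starts again at its first line, and a command pipe is closed by passing the exact command string. The file letters.txt and the echo command are stand-ins chosen for a predictable result.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/letters.txt"            # hypothetical input file

reread=$(awk -v file="$tmp/letters.txt" 'BEGIN {
  getline first < file
  close(file)                   # without this, the next read would return "b"
  getline again < file

  cmd = "echo one; echo two"    # stand-in for a real command such as who
  while ((cmd | getline line) > 0) out = out "[" line "]"
  close(cmd)                    # close takes the same command string

  print first, again, out
}')
echo "$reread"
rm -rf "$tmp"
```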
4. Output to a file:
awk lets you write results to a file in the following ways:
printf("Hello word!\n") > "datafile" or printf("Hello word!\n") >> "datafile"
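The difference between the two redirections can be sketched as below: > creates or empties the file when it is first opened in a run and then keeps writing to it, while >> opens it for appending. The output path is a temporary file made up for the example.

```shell
tmp=$(mktemp -d)
out="$tmp/datafile"

awk -v out="$out" 'BEGIN {
  printf("first\n")  > out      # the file is created or emptied on first open
  printf("second\n") > out      # same open stream, so this line follows "first"
  close(out)
  printf("third\n") >> out      # reopened in append mode
  close(out)
}'
contents=$(cat "$out")
echo "$contents"
rm -rf "$tmp"
```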
5. Output to a command
awk lets you send results to a command in the following way:
printf("Hello word!\n") | "sort -t','"
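A minimal sketch of piping output to a command: every first field is sent to a single sort process, which prints its result once the pipe is closed. The input words are invented.

```shell
sorted=$(printf 'pear\napple\nfig\n' | awk '
    { print $1 | "sort" }       # all records go to the same sort process
END { close("sort") }           # closing the pipe lets sort finish and print
')
echo "$sorted"
```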
Mixed programming with awk and shell scripts
Because awk can be used as a shell command, it integrates well with shell scripts, which makes mixed programming with awk and the shell possible. The key to mixed programming is the dialogue between awk and the shell script, that is, the exchange of information between them: awk obtains the information it needs (usually the values of variables) from the shell script, shell command lines are executed from within awk, and the shell script passes the results of commands to awk for processing and shell scri