Awk is a very nice language with a very strange name. In this series of articles, Daniel Robbins will quickly bring your awk programming skills up to speed. As the series progresses, more advanced topics will be covered, culminating in a demonstration of a real-world, advanced awk application.
In defense of awk
In this series of articles, I'm going to turn you into a proficient awk coder. I'll admit that awk doesn't have a very pretty or particularly "hip" name, and the GNU version of awk, called gawk, sounds downright weird. People unfamiliar with the language may hear "awk" and picture a mess of code so backward and antiquated that it could drive even the most knowledgeable UNIX guru to the brink of insanity (leaving him yelping "kill -9!" as he runs for the coffee machine).
Sure, awk doesn't have a catchy name. But it is a great language. Awk is geared toward text processing and report generation, yet it has many well-designed features that allow for serious programming. And unlike some languages, awk's syntax is familiar: it borrows some of the best parts of languages such as C, Python, and bash (although, technically, awk was created before both Python and bash). Awk is one of those languages that, once learned, becomes a key part of your strategic coding arsenal.
The first awk
Let's go ahead and start playing with awk to see how it works. At the command line, enter the following command:
$ awk '{ print }' /etc/passwd
You should see the contents of your /etc/passwd file appear on screen. Now, here is what awk did. When we invoked awk, we specified /etc/passwd as the input file. Awk then executed the print command for each line in /etc/passwd, in order. All output is sent to stdout, and the result is identical to running cat /etc/passwd.
Now, an explanation of the { print } code block. In awk, curly braces are used to group blocks of code together, similar to C. Inside our block there is a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed.
Here is another awk example that does exactly the same thing as the previous one:
$ awk '{ print $0 }' /etc/passwd
In awk, the $0 variable represents the entire current line, so print and print $0 do exactly the same thing.
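To check that print, print $0, and cat really produce the same text, here is a minimal POSIX sh sketch. It feeds awk a couple of made-up records in /etc/passwd format instead of your real /etc/passwd, so the result does not depend on your system; the variable names are purely illustrative.

```shell
# Made-up records in /etc/passwd format, so the result does not
# depend on the contents of your own /etc/passwd.
sample='root:x:0:0:root:/root:/bin/bash
halt:x:7:0:halt:/sbin:/sbin/halt'

via_cat=$(printf '%s\n' "$sample" | cat)
via_print=$(printf '%s\n' "$sample" | awk '{ print }')
via_print0=$(printf '%s\n' "$sample" | awk '{ print $0 }')

# print with no arguments prints $0, the whole current line,
# so all three pipelines should produce identical text.
if [ "$via_cat" = "$via_print" ] && [ "$via_print" = "$via_print0" ]; then
    echo "identical"
else
    echo "different"
fi
```

Running it should print "identical".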
If you like, you can create an awk program that outputs data totally unrelated to the input data. Here's an example:
$ awk '{ print "" }' /etc/passwd
Whenever you pass the "" string to the print command, it prints a blank line. If you test this script, you'll find that awk outputs one blank line for every line in your /etc/passwd file. Again, this is because awk executes your script for every line in the input file. Here's another example:
$ awk '{ print "hiya" }' /etc/passwd
Running this script will fill your screen with hiya's. :)
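The "one output line per input line" behavior is easy to confirm by counting. The sketch below, which assumes a POSIX sh and uses three invented /etc/passwd-style records, counts the blank lines produced by print "" and captures the output of print "hiya".

```shell
# Three invented records in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync'

# print "" emits one empty line per input line; count them with wc -l
# (tr strips the padding some wc implementations add).
blank=$(printf '%s\n' "$sample" | awk '{ print "" }' | wc -l | tr -d ' ')

# print "hiya" ignores the text of each line but still runs once per line.
hiya=$(printf '%s\n' "$sample" | awk '{ print "hiya" }')

echo "blank lines: $blank"
echo "$hiya"
```

With three input lines you should see a count of 3 followed by three hiya's.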
Multiple fields
Awk is really good at handling text that has been broken into multiple logical fields, and it lets you effortlessly reference each individual field from inside your awk script. The following script prints out a list of all user accounts on your system:
$ awk -F ":" '{ print $1 }' /etc/passwd
Above, when we called awk, we used the -F option to specify ":" as the field separator. When awk processes the print $1 command, it prints out the first field that appears on each line in the input file. Here's another example:
$ awk -F ":" '{ print $1 $3 }' /etc/passwd
Here's an excerpt from the script's output:
halt7
operator11
root0
shutdown6
sync5
bin1
....etc.
As you can see, awk prints out the first and third fields of the /etc/passwd file, which happen to be the username and uid fields, respectively. Now, while the script did work, it's not ideal: there are no spaces between the two output fields! If you're used to programming in bash or Python, you may have expected the print $1 $3 command to insert a space between the two fields. However, when two strings appear next to each other in an awk program, awk concatenates them without adding a space in between. The following command inserts a space between the two fields:
$ awk -F ":" '{ print $1 " " $3 }' /etc/passwd
When you call print this way, it concatenates $1, " ", and $3, creating readable output. Of course, we can also insert some text labels if needed:
$ awk -F ":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd
This produces the following output:
username: halt          uid:7
username: operator              uid:11
username: root          uid:0
username: shutdown              uid:6
username: sync          uid:5
username: bin           uid:1
....etc.
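The field and concatenation rules above can be tried side by side on a small sample. This sketch assumes a POSIX sh and awk and uses two made-up /etc/passwd-style records; it runs the three print forms from this section and keeps each result in a variable.

```shell
# Two made-up records in /etc/passwd format.
sample='halt:x:7:0:halt:/sbin:/sbin/halt
root:x:0:0:root:/root:/bin/bash'

# Adjacent expressions are concatenated with nothing in between.
joined=$(printf '%s\n' "$sample" | awk -F ":" '{ print $1 $3 }')

# An explicit " " string literal supplies the separator.
spaced=$(printf '%s\n' "$sample" | awk -F ":" '{ print $1 " " $3 }')

# Text labels and escape sequences such as \t are concatenated the same way.
labeled=$(printf '%s\n' "$sample" | awk -F ":" '{ print "username: " $1 "\t\tuid:" $3 }')

printf '%s\n%s\n%s\n' "$joined" "$spaced" "$labeled"
```

The first form yields halt7 and root0, the second halt 7 and root 0, and the third the labeled, tab-separated lines.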
External scripts
Passing your script to awk as a command-line argument is very handy for small one-liners, but for complex, multi-line programs you'll definitely want to compose your script in an external file. You can then tell awk to read the script from that file by passing it the -f option:
$ awk -f myscript.awk myfile.in
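If you want to try -f without touching any real files, here is a minimal POSIX sh sketch that writes a throwaway script and input file into a temporary directory and runs awk on them. The names myscript.awk and myfile.in are placeholders carried over from the command above, the file contents are invented, and the field separator is passed with -F alongside -f (POSIX awk accepts both options together). It assumes mktemp -d is available, as it is on Linux and macOS.

```shell
# Work in a throwaway directory so nothing real is overwritten.
dir=$(mktemp -d)

# Hypothetical script file: print the first field of every line.
# The field separator is supplied on the command line with -F.
cat > "$dir/myscript.awk" <<'EOF'
{ print $1 }
EOF

# Hypothetical input file with two /etc/passwd-style records.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'bin:x:1:1:bin:/bin:/sbin/nologin' > "$dir/myfile.in"

result=$(awk -F ":" -f "$dir/myscript.awk" "$dir/myfile.in")
echo "$result"

rm -rf "$dir"
```

This should print root and bin, one per line.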
Putting your scripts in their own text files also lets you take advantage of additional awk features. For example, the following multi-line script does the same thing as one of our earlier one-liners, printing out the first field of each line in /etc/passwd:
BEGIN {
    FS = ":"
}
{ print $1 }
The difference between these two methods is how the field separator is set. In this script, the field separator is specified in the code itself (by setting the FS variable), whereas the earlier one-liner set FS by passing the -F ":" option to awk on the command line. It's generally best to set the field separator inside the script itself, simply because it means one less command-line argument to remember to type. We'll take a closer look at the FS variable later in this article.
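To see that the two ways of setting the field separator are interchangeable, this sketch runs both against the same two invented /etc/passwd-style records and compares the results. It assumes a POSIX sh and awk.

```shell
# Two invented records in /etc/passwd format.
sample='sync:x:5:0:sync:/sbin:/bin/sync
bin:x:1:1:bin:/bin:/sbin/nologin'

# Field separator given on the command line.
from_flag=$(printf '%s\n' "$sample" | awk -F ":" '{ print $1 }')

# Field separator assigned inside the program, before any input is read.
from_begin=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ":" } { print $1 }')

if [ "$from_flag" = "$from_begin" ]; then
    echo "same output:"
    echo "$from_begin"
fi
```

Both versions should print sync and bin.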
The BEGIN and END blocks
Normally, awk executes each block of your script's code once for each input line. However, there are many programming situations where you need to execute initialization code before awk begins processing the text from the input file. For such situations, awk lets you define a BEGIN block. We used a BEGIN block in the multi-line script in the previous section. Because awk executes the BEGIN block before it starts processing the input file, it's an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you'll reference later in the program.
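As a small illustration of a BEGIN block doing both jobs mentioned above, this sketch sets FS and prints a column heading before any input line is read. It assumes a POSIX sh and awk; the sample records and the USER/UID heading are invented for the example.

```shell
# Two invented records in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
halt:x:7:0:halt:/sbin:/sbin/halt'

report=$(printf '%s\n' "$sample" | awk '
BEGIN {
    FS = ":"              # initialize the field separator
    print "USER\tUID"     # heading, printed before the first input line
}
{ print $1 "\t" $3 }
')

echo "$report"
```

The heading appears exactly once, on the first line, followed by one tab-separated line per record.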
Awk also provides another special block, called the END block. Awk executes this block after all lines in the input file have been processed. Typically, the END block is used to perform final calculations or to print summaries that should appear at the end of the output stream.
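Here is a minimal sketch of an END block printing a summary. It assumes a POSIX sh and awk and three invented /etc/passwd-style records; the variable name count is just an illustrative choice, and it relies on awk treating an unset variable as 0 the first time it is used in arithmetic.

```shell
# Three invented records in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync'

summary=$(printf '%s\n' "$sample" | awk '
{ count = count + 1 }               # runs once per input line
END { print "accounts: " count }    # runs once, after the last line
')

echo "$summary"
```

Only the END block prints anything here, so the output is the single line accounts: 3.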
Regular expressions and blocks
Awk lets you use regular expressions to selectively execute an individual block of code, depending on whether the regular expression matches the current line. The following example script outputs only those lines that contain the character sequence foo: