awk is a powerful text-processing tool on Linux. In short, awk reads a file line by line, splits each line into fields (using whitespace as the default separator), and then performs various kinds of analysis on those fields.
Basic usage of awk
The basic form of an awk command is as follows:
awk 'pattern {action}' filenames
Here, pattern is what awk looks for in the input, and action is a series of commands executed when a line matches. The curly braces ({}) around an action are not needed in every program (a pattern on its own simply prints the matching lines), but they group the series of instructions that belong to a given pattern. When the pattern is a regular expression, it is enclosed in slashes.
In practice, awk is usually invoked like this:
awk [-F field-separator] 'commands' input-file(s)
Here, commands is the actual awk program, and [-F field-separator] is optional. input-file(s) are the files to be processed. In awk, each item of a line that is delimited by the field separator is called a field. If no -F field separator is given, the default separator is whitespace (spaces and tabs).
Typically, awk treats each line of a file as its unit of processing: it reads the input one line at a time and runs the corresponding commands on each line.
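Here is a minimal sketch of the pattern {action} form using the default whitespace separator. The three input lines are made-up data, and the test $3 == "admin" is only an illustration of a pattern.

```shell
#!/bin/sh
# Hypothetical input: name, age, role. Line 2 uses several spaces and
# line 3 uses a tab, but both count as a single separator by default.
# No -F option is given, so awk splits each line on runs of whitespace.
# Pattern: $3 == "admin"   Action: {print $1}
printf 'alice 30 admin\nbob   25 user\ncarol\t41 admin\n' | awk '$3 == "admin" {print $1}'
# prints:
# alice
# carol
```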
Example of using awk
Take the /etc/passwd file as an example. Running cat /etc/passwd produces output like the following (only the first 4 lines are shown):
# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
1. Awk+action usage
Use awk to extract the account names:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
Here is what the awk command above does. It reads one record at a time, with records separated by the newline character '\n' (so it reads line by line). It splits each record into fields using the specified field separator (-F ':' sets the separator to ':'). It then executes the command (print $1). $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs). In /etc/passwd, $1 is the user name, $2 the next field, and so on.
To print both the account name from /etc/passwd and that account's shell, separated by a comma, use the following command:
# cat /etc/passwd | awk -F ':' '{print $1 "," $7}'
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
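The contents of /etc/passwd differ from system to system. The sketch below therefore runs the two commands above on the four sample lines shown earlier. It also shows an equivalent form that sets the output field separator OFS, one of the built-in variables, instead of concatenating a literal comma. The variable name passwd_sample is just for illustration.

```shell
#!/bin/sh
# The four sample lines from /etc/passwd shown above.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh'

# Field 1 is the account name.
printf '%s\n' "$passwd_sample" | awk -F ':' '{print $1}'

# Field 7 is the login shell; the string "," is concatenated between the fields.
printf '%s\n' "$passwd_sample" | awk -F ':' '{print $1 "," $7}'

# Same output: each comma in a print list is replaced by OFS, set here with -v.
printf '%s\n' "$passwd_sample" | awk -F ':' -v OFS=',' '{print $1, $7}'
```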
2. Awk+pattern usage
Search /etc/passwd for all lines that contain the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of pattern usage: the action runs only on lines that match the pattern (here, root). No action is specified, so awk falls back to its default action and prints each matching line.
A regular-expression pattern is written between slashes, i.e. /pattern/:
awk '/pattern/'
Searches support regular expressions. For example, to match only lines that start with root: awk -F: '/^root/' /etc/passwd
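To show the difference an anchor makes, the sketch below adds one hypothetical extra line, "operator". Its home directory contains "root", but its account name does not. The unanchored /root/ matches both lines, while /^root/ matches only the line that begins with root.

```shell
#!/bin/sh
# Sample passwd-style lines; the "operator" entry is a hypothetical extra line.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
operator:x:11:0:operator:/root:/sbin/nologin'

# /root/ matches "root" anywhere in the line: prints the root and operator lines.
printf '%s\n' "$passwd_sample" | awk -F: '/root/'

# /^root/ is anchored to the start of the line: prints only the root line.
printf '%s\n' "$passwd_sample" | awk -F: '/^root/'
```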
3. Awk+pattern+action usage
Search /etc/passwd for all lines that contain root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.
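Below is a self-contained sketch of pattern + action on the four sample lines. The second command shows that a pattern does not have to be a regular expression. A comparison such as $3 == 0 (UID 0) also works as a pattern.

```shell
#!/bin/sh
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh'

# Regex pattern + action: print the shell (field 7) of lines containing "root".
printf '%s\n' "$passwd_sample" | awk -F: '/root/ {print $7}'

# Comparison pattern + action: print accounts whose UID (field 3) is 0.
printf '%s\n' "$passwd_sample" | awk -F: '$3 == 0 {print $1}'
```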
Extended usage of awk
1. awk built-in variables
ARGC  Number of command-line arguments
ARGV  Array of command-line arguments
ENVIRON  Array of the current environment variables, e.g. ENVIRON["HOME"]
FILENAME  Name of the current input file
FNR  Record number within the current input file
FS  Input field separator, equivalent to the -F command-line option
NF  Number of fields in the current record
NR  Number of records read so far, across all input files
OFS  Output field separator (a space by default)
ORS  Output record separator (a newline by default)
RS  Input record separator (a newline by default)
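The difference between FILENAME, NR, FNR and NF is easiest to see across more than one input file. The sketch below assumes mktemp -d is available, as it is on typical Linux systems. It uses two throwaway files whose names and contents are hypothetical.

```shell
#!/bin/sh
# Two small hypothetical input files in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each new file.
# NF is the number of fields in the current record.
(cd "$dir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)
# prints:
# one.txt 1 1 3
# one.txt 2 2 2
# two.txt 3 1 1
```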
Here are some simple uses:
1. Print the second line of a file
awk 'NR==2'
2. Print lines 2 through 4 of a file
awk 'NR==2,NR==4'
3. Remove all blank lines (print only lines that contain at least one field)
awk NF
4. Print the last line of a file
awk 'END {print}'
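Here are the four one-liners above applied to a small made-up input. Line 3 of the input is empty, and line 4 contains only spaces, so NF is 0 for both and the NF filter drops them. As the gawk manual notes, gawk, mawk and BWK awk keep the last record in $0 inside END, but some older awk implementations do not.

```shell
#!/bin/sh
# Hypothetical five-line input produced by a helper function.
sample() { printf 'one\ntwo\n\n   \nfive\n'; }

sample | awk 'NR==2'          # the second line: two
sample | awk 'NR==2,NR==4'    # range pattern: lines 2 through 4
sample | awk 'NF'             # only lines with fields: one, two, five
sample | awk 'END {print}'    # the last line: five
```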
2. print and printf
awk provides two output functions: print and printf.
The arguments to print can be variables, numbers, or strings. String literals must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. In the output, each comma is replaced by the output field separator (OFS), which is a space by default.
printf works much like printf in C: it formats its arguments according to a format string. When the output is complex, printf gives finer control and the code is easier to understand.
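The sketch below contrasts the three cases on made-up "name size" data. It shows print with commas, print without commas, and printf with a C-style format string. Note that printf adds no newline unless the format contains \n.

```shell
#!/bin/sh
# Hypothetical two-column input: name and size in bytes.
data='alpha 1536
beta 20'

# With commas: values are joined by OFS (a space by default).
printf '%s\n' "$data" | awk '{print $1, $2}'     # first line: alpha 1536
# Without commas: values are concatenated directly.
printf '%s\n' "$data" | awk '{print $1 $2}'      # first line: alpha1536
# printf: left-align the name in 6 columns, right-align the size in 5.
printf '%s\n' "$data" | awk '{printf "%-6s %5d bytes\n", $1, $2}'
```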
AWK programming
Variables and Assignments
In addition to its built-in variables, awk lets you define your own variables.
The following command counts the accounts in /etc/passwd:
awk '{count++; print $0} END{print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40
count is a user-defined variable. The earlier actions {} each contained a single print statement, but an action {} can contain several statements separated by semicolons (;).
count is not initialized here. Uninitialized variables default to 0, but it is better practice to initialize count to 0 explicitly:
awk 'BEGIN {count=0; print "[Start]user count is", count} {count=count+1; print $0} END{print "[End]user count is", count}' /etc/passwd
[Start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[End]user count is 40
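Here is the same BEGIN / main / END structure run on the four sample lines, so the final count is predictable. To keep the output short, the per-line print $0 is omitted. BEGIN runs once before any input, the unnamed block runs once per line, and END runs once after the last line.

```shell
#!/bin/sh
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh'

printf '%s\n' "$passwd_sample" | awk '
BEGIN { count = 0; print "[Start]user count is", count }
      { count = count + 1 }
END   { print "[End]user count is", count }'
# prints:
# [Start]user count is 0
# [End]user count is 4
```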
Count the total size, in bytes, of the files in a directory:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[End]size is", size}'
[End]size is 8657198
To display the size in megabytes (M):
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[End]size is", size/1024/1024, "M"}'
[End]size is 8.25889 M
Note that this total does not include the contents of subdirectories.
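Real ls -l output depends on the directory. The sketch below therefore feeds fixed ls -l style lines to the same summing logic. The file names and sizes are hypothetical. The leading "total" line that ls -l prints has an empty $5, which adds 0 to the sum. Non-integer results such as the MB value are printed using awk's default OFMT of %.6g.

```shell
#!/bin/sh
# Hypothetical `ls -l` output: field 5 is the size in bytes.
listing='total 12
-rw-r--r-- 1 user user 1048576 Jan  1 00:00 a.bin
-rw-r--r-- 1 user user 2097152 Jan  1 00:00 b.bin
drwxr-xr-x 2 user user    4096 Jan  1 00:00 subdir'

# Sum field 5 over every line.
printf '%s\n' "$listing" | awk 'BEGIN {size=0} {size=size+$5} END {print "[End]size is", size}'
# prints: [End]size is 3149824

# The same sum in MB.
printf '%s\n' "$listing" | awk 'BEGIN {size=0} {size=size+$5} END {print "[End]size is", size/1024/1024, "M"}'
# prints: [End]size is 3.00391 M
```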
Conditional statements
awk's conditional statements are borrowed from C and take the following forms:
if (expression) {
    statement;
    statement;
    ... ...
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
Count the total size of the files in a directory, skipping entries of size 4096 (typically subdirectories):
ls -l | awk 'BEGIN {size=0; print "[start]size is", size} {if ($5 != 4096) {size=size+$5;}} END{print "[End]size is", size/1024/1024, "M"}'
[start]size is 0
[End]size is 8.22339 M
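The example above uses only a plain if. The sketch below exercises the full if / else if / else form on made-up "name size" lines. It classifies each size with arbitrary, hypothetical thresholds: 100 MB and up is "large", 1 MB and up is "medium", and anything smaller is "small".

```shell
#!/bin/sh
# Hypothetical "name size-in-bytes" lines.
data='a.txt 512
b.iso 734003200
c.log 5242880'

printf '%s\n' "$data" | awk '{
    if ($2 >= 1024 * 1024 * 100) {
        label = "large"
    } else if ($2 >= 1024 * 1024) {
        label = "medium"
    } else {
        label = "small"
    }
    print $1, label
}'
# prints:
# a.txt small
# b.iso large
# c.log medium
```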
Loop statements
awk's loop statements are also borrowed from C: it supports while, do/while, for, break, and continue, with the same semantics as in C.
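This minimal sketch shows each loop keyword once. A for loop walks the fields of a line, continue skips non-numeric fields, and break stops at a sentinel value. The value -1 is an arbitrary, hypothetical choice of sentinel. The second command shows while and do/while, where do/while runs its body at least once.

```shell
#!/bin/sh
printf '3 x 4 -1 100\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) {
        if ($i !~ /^-?[0-9]+$/) continue   # skip non-numeric fields
        if ($i == -1) break                # stop at the sentinel
        sum += $i
    }
    print "sum:", sum
}'
# prints: sum: 7

awk 'BEGIN {
    n = 3
    while (n > 0) { printf "%d ", n; n-- }
    print ""
    do { print "do/while ran once" } while (0)
}'
# prints "3 2 1 " (with a trailing space), then "do/while ran once"
```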
Arrays
Array subscripts in awk can be numbers or strings, so a subscript is often called a key. Keys and values are stored internally in a hash table. A hash table has no inherent order, so iterating over an array with for (key in array) does not necessarily return the elements in the order you expect. Arrays and variables are created automatically when first used, and awk decides on its own whether they hold numbers or strings. Arrays in awk are typically used to collect information from records, compute totals, count words, track how many times a pattern is matched, and so on.
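The sketch below counts word occurrences in an array keyed by string, using made-up input. The for (word in count) loop visits keys in an unspecified order, so the output is piped through sort to make it predictable.

```shell
#!/bin/sh
printf 'the cat\nthe dog\nthe cat\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }        # one array entry per distinct word
END { for (word in count) print word, count[word] }' | sort
# prints:
# cat 2
# dog 1
# the 3
```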
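Display the accounts in /etc/passwd:
awk -F ':' 'BEGIN {count=0} {name[count] = $1; count++} END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......
This uses a for loop to traverse the array. Because the keys here are the numbers 0 to NR-1, a counting for loop prints the entries in order, unlike for (key in array).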