Brief Introduction
Awk is a powerful text-analysis tool. Compared with grep, which searches, and sed, which edits, awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then performs various kinds of analysis on those fields.
There are three versions of awk: awk, nawk, and gawk. Unless otherwise specified, awk here generally refers to gawk, the GNU version of awk.
Awk takes its name from the first letters of the surnames of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. Awk is in fact a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.
How to use
awk '{pattern + action}' {filenames}
Although the operations can be complex, the syntax is always the same: pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) do not have to appear in every program, but they are used to group a series of instructions under a specific pattern. A pattern is typically a regular expression enclosed in slashes.
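As a minimal sketch of the pattern and action form described above, the following feeds three made-up lines to awk through printf (the sample data is an assumption, not taken from any real file). Lines that match the regular expression /error/ trigger the action, which prints the first field of the line.

```shell
# Sample input is hypothetical; any text file would do.
# Pattern: /error/    Action: {print $1}
printf 'disk error on sda\nnetwork ok\ncpu error detected\n' |
  awk '/error/ {print $1}'
# prints:
# disk
# cpu
```

The second input line does not match the pattern, so no action runs for it and nothing is printed.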
The most basic function of the awk language is to browse and extract information from a file or string according to specified rules; only after awk has extracted the information can other text operations be performed. A complete awk script is typically used to format the information in a text file. awk usually processes a file one line at a time: it receives each line of the file in turn and then executes the corresponding commands.
Invoke awk
There are three ways of invoking awk.
1. Command-line mode
awk [-F field-separator] 'commands' input-file(s)
Here, commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.
In awk, each item on a line that is set off by the field separator is called a field. If the -F option is not given, the default field separator is whitespace (spaces or tabs).
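To illustrate the field separator, the sketch below splits one line with the default separator and another with -F. Both input lines are invented for illustration.

```shell
# Default separator: runs of spaces and tabs delimit fields.
printf 'alice   30\tparis\n' | awk '{print $3}'
# prints: paris

# Explicit separator with -F: split a colon-delimited line
# (a hypothetical passwd-style record).
printf 'alice:x:1000:1000::/home/alice:/bin/bash\n' | awk -F ':' '{print $1, $7}'
# prints: alice /bin/bash
```

Note that with -F ':' the empty fifth field still counts as a field, which is why the shell is field 7.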
2. Shell script mode
Put all the awk commands into a file and make that file executable, with the awk interpreter named on the first line of the script; the program is then run by typing the script name.
This corresponds to the first line of a shell script: #!/bin/sh
which is replaced by: #!/bin/awk -f (the -f option is needed so that awk reads its program from the script file)
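A minimal sketch of the script mode follows. The script name first-field.awk and the sample input are hypothetical, and the shebang line is built from whatever path the shell reports for awk, on the assumption that its location (/bin/awk, /usr/bin/awk, and so on) varies between systems.

```shell
# Build a throwaway executable awk script in a temporary directory.
dir=$(mktemp -d)
script="$dir/first-field.awk"

# First line: the awk interpreter with -f; second line: the awk program.
printf '#!%s -f\n{ print $1 }\n' "$(command -v awk)" > "$script"
chmod +x "$script"

# Run the script by name, giving it input on stdin.
printf 'root pts/1\ndmtsai pts/1\n' | "$script"
# prints:
# root
# dmtsai

rm -r "$dir"
```

Input files could equally be given as arguments after the script name instead of through a pipe.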
3. Put all the awk commands into a separate file, then invoke:
awk -f awk-script-file input-file(s)
Here the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
Getting Started example
The examples below are invoked mainly from the command line.
Suppose the output of last -n 5 is as follows:
[root@www ~]# last -n 5 <== show only the first five lines
root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)
1. To display only the accounts of the five most recent logins:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The awk workflow is as follows: awk reads one record, delimited by the '\n' newline character, then splits the record on the specified field delimiter and fills in the fields. $0 represents the whole record, $1 the first field, and $n the nth field. The default field delimiter is a space or a tab, so $1 is the logged-in user, $3 is the user's IP address, and so on. (A field can simply be thought of as a column.)
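The field variables can be seen side by side in a small sketch. The single input record below is hypothetical, laid out like a line of last output.

```shell
# One made-up login record; awk splits it on whitespace.
printf 'root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)\n' |
  awk '{
    print "whole record:", $0
    print "user:", $1
    print "ip:", $3
  }'
# prints:
# whole record: root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
# user: root
# ip: 192.168.1.100
```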
2. To display only the accounts in /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk + action: the action {print $1} is executed for every line. -F specifies ':' as the delimiter.
3. To display only the accounts in /etc/passwd and their corresponding shells, with the account and the shell separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh
4. To display only the accounts in /etc/passwd and their corresponding shells, separated by a comma, with a header line "name,shell" before all rows and "blue,/bin/nosh" added as the last line:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh
The awk workflow goes like this: BEGIN is executed first. awk then reads the file one record at a time, delimited by the \n newline character, splits each record on the specified field delimiter, and fills in the fields: $0 represents the whole record, $1 the first field, and $n the nth field. The action for the matching pattern is then executed, and awk moves on to the next record. Once all records have been read, the END action is executed.
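The order of BEGIN, the per-record action, and END can be sketched with a fixed input. The three passwd-style records below are made up so that the result does not depend on the local /etc/passwd.

```shell
# Three hypothetical passwd-style records stand in for /etc/passwd.
printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\nbin:x:2:2:bin:/bin:/bin/sh\n' |
  awk -F ':' '
    # BEGIN runs once, before any input is read.
    BEGIN { print "name,shell" }
    # The unconditional action runs once per record.
    { print $1 "," $7 }
    # END runs once, after the last record; NR holds the record count.
    END { print "total records: " NR }
  '
# prints:
# name,shell
# root,/bin/bash
# daemon,/bin/sh
# bin,/bin/sh
# total records: 3
```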
5. Search /etc/passwd for all lines containing the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern: only lines that match the pattern (here, root) execute the action. No action is specified here, so the default action prints the whole matching line.
The search supports regular expressions. For example, to find lines that start with root: awk -F: '/^root/' /etc/passwd
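The difference between an unanchored and an anchored pattern shows up when root also appears in the middle of a line. The records below are hypothetical; the operator account is included only because its home directory contains the string root.

```shell
# Hypothetical passwd-style records.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
bin:x:2:2:bin:/bin:/bin/sh'

# /root/ matches anywhere in the line: two records match.
printf '%s\n' "$data" | awk -F ':' '/root/ {print $1}'
# prints:
# root
# operator

# /^root/ is anchored to the start of the line: one record matches.
printf '%s\n' "$data" | awk -F ':' '/^root/ {print $1}'
# prints: root
```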
6. Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.
awk built-in variables
Awk has a number of built-in variables that hold environment information; these variables can be changed. Some of the most commonly used ones are listed below.
ARGC     | Number of command-line arguments
ARGV     | Array of command-line arguments
ENVIRON  | Array giving access to the system environment variables
FILENAME | Name of the file awk is currently reading
FNR      | Number of records read so far in the current file
FS       | Input field separator, equivalent to the command-line -F option
NF       | Number of fields in the current record
NR       | Number of records read so far
OFS      | Output field separator
ORS      | Output record separator
RS       | Input record separator
In addition, $0 refers to the entire record, $1 is the first field of the current line, $2 is the second field of the current line, and so on.
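A few of these variables can be exercised together in a short sketch. The two input records are made up, and the " | " output separator is an arbitrary choice for readability. $NF here means the field whose number is NF, that is, the last field.

```shell
# FS is set in BEGIN (same effect as -F ":"); OFS controls what the
# commas in print insert between arguments.
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:2:2:bin:/bin:/bin/sh\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " }
       { print NR, NF, $1, $NF }'
# prints:
# 1 | 7 | root | /bin/bash
# 2 | 7 | bin | /bin/sh
```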
1. For /etc/passwd, print the file name, the line number of each line, the number of columns in each line, and the full content of the line:
# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
2. Using printf instead of print can make the code more concise and easier to read:
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
print and printf
awk provides two output functions, print and printf.
The arguments to print can be variables, numeric values, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. A comma inserts the output field separator (OFS), which is a space by default.
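The effect of the comma in print can be checked with a one-line input (the input line is made up for illustration):

```shell
# With commas, print joins its arguments with OFS (a single space by default).
echo 'root /bin/bash' | awk '{print $1, $2}'
# prints: root /bin/bash

# Without commas, the arguments are concatenated directly.
echo 'root /bin/bash' | awk '{print $1 $2}'
# prints: root/bin/bash

# Changing OFS changes what the comma inserts.
echo 'root /bin/bash' | awk 'BEGIN {OFS = "-"} {print $1, $2}'
# prints: root-/bin/bash
```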
The printf function is used in much the same way as printf in C and can format strings. When the output is complex, printf works better and makes the code easier to understand.
awk Programming
Variables and Assignments
In addition to the built-in variables, awk also supports user-defined variables.
1. The following counts the accounts in /etc/passwd:
awk '{count++; print $0;} END{print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40
count is a user-defined variable. The earlier action {} blocks contained only a single print; in fact print is just one statement, and an action {} can contain multiple statements separated by semicolons (;).
2. count is not initialized above. Although it defaults to 0, the proper approach is to initialize it to 0 explicitly:
awk 'BEGIN {count=0; print "[start]user count is", count} {count=count+1; print $0;} END{print "[end]user count is", count}' /etc/passwd
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40
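The default value of an uninitialized variable can be observed directly. In this sketch the input lines and the variable name unset are hypothetical; unset is simply a name that is never assigned.

```shell
# Five made-up input lines; only the line count matters here.
printf 'a\nb\nc\nd\ne\n' |
  awk '
    # count is never assigned, so the first increment starts from 0.
    { count++ }
    END {
      print "count is", count
      # An unassigned variable acts as 0 in arithmetic...
      print "unset + 1 is", unset + 1
      # ...and as the empty string when used as text.
      print "unset as text is [" unset "]"
    }'
# prints:
# count is 5
# unset + 1 is 1
# unset as text is []
```

Initializing in BEGIN is still worthwhile for readability, and it matters when a variable should start from something other than 0.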
3. Count the number of bytes occupied by the files in a directory:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is", size}'
[end]size is 8657198
To display the result in MB:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is", size/1024/1024, "M"}'
[end]size is 8.25889 M
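The same accumulation can be tried on a fixed listing so that the result is reproducible. The ls -l style lines below are made up, and restricting the sum to regular files with the /^-/ pattern is a variation on the command above, not part of it.

```shell
# Made-up "ls -l" output: a "total" line, one directory, and two regular files.
listing='total 12
drwxr-xr-x 2 root root    4096 Feb 10 11:21 logs
-rw-r--r-- 1 root root 1048576 Feb 10 11:21 a.bin
-rw-r--r-- 1 root root  524288 Feb 10 11:21 b.bin'

# Sum field 5 for regular files only (lines starting with "-"),
# then report the total in bytes and in MB.
printf '%s\n' "$listing" |
  awk '/^-/ { size += $5 }
       END  { printf("[end]size is %d bytes, %.2f M\n", size, size/1024/1024) }'
# prints: [end]size is 1572864 bytes, 1.50 M
```

Without the /^-/ pattern, the size of the directory entry itself (4096 here) would be added to the total as well.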
Note that the total does not include files inside subdirectories.
Conditional Statement
The conditional statements in awk are borrowed from C; they take the following forms:
if (expression) {
    statement;
    statement;
    ... ...
}
if (expression) {
    statement1;
} else {
    statement2;
}
if (expression) {
    statement1;