Brief introduction
Awk is a powerful text-analysis tool. Compared with grep (searching) and sed (editing), it is especially strong at analyzing data and generating reports from it. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then performs various kinds of analysis and processing on those fields.
There are three main versions of AWK: awk, nawk, and gawk. Unless stated otherwise, "awk" here refers to gawk, the GNU implementation of awk.
Awk takes its name from the initials of its creators' family names: Alfred Aho, Peter Weinberger, and Brian Kernighan. Awk is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.
How to use
However complex an awk program becomes, the syntax is always awk 'pattern {action}' filenames, where pattern is what awk looks for in the data and action is a series of commands executed when a matching line is found. Curly braces ({}) do not have to appear in every program (a pattern on its own needs none), but they group a series of statements under a given pattern. A pattern is typically a regular expression enclosed in slashes.
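A minimal sketch of the three shapes a pattern/action program can take: pattern only, action only, and both. The sample lines and the function names (sample, pattern_only, action_only, pattern_action) are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical input: a user name, then that user's login shell.
sample() {
  printf '%s\n' 'root /bin/bash' 'daemon /bin/sh' 'alice /bin/bash'
}

# Pattern only: no braces, so every matching line is printed whole.
pattern_only() { sample | awk '/bash/'; }

# Action only: no pattern, so the action runs on every line.
action_only() { sample | awk '{ print $1 }'; }

# Pattern and action: the action runs only on lines matching the pattern.
pattern_action() { sample | awk '/bash/ { print $1 }'; }

pattern_only
action_only
pattern_action
```

Leaving out the action is the same as writing { print $0 }, which is why the pattern-only form prints entire lines.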
The most basic function of the awk language is to scan a file or string and extract information according to specified rules; once awk has extracted the information, it can perform other text operations on it. A complete awk script is typically used to format the information in a text file.
awk treats one line of a file as its unit of processing: it reads the file one line at a time and executes the corresponding commands on each line.
Invoke awk
There are three ways to invoke awk:
1. Command-line mode
awk [-F field-separator] 'commands' input-file(s)
Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) are the files to be processed.
In awk, each item of a line separated by the field delimiter is called a field. When no -F field-separator is given, fields are separated by whitespace (spaces or tabs) by default.
2. Shell script mode
Put all the awk commands in a file, make the file executable, and name the awk interpreter on the first line of the script; the script can then be run simply by typing its name.
That is, the first line of a shell script, #!/bin/sh,
is replaced by: #!/bin/awk -f (use the actual path of awk on your system, often /usr/bin/awk; the -f tells awk to read the rest of the file as its program).
3. Put all the awk commands in a separate file, and then call:
awk -f awk-script-file input-file(s)
The -f option loads the awk program from awk-script-file; input-file(s) are the same as above. This chapter focuses on the command-line approach.
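The three invocation styles above can be compared side by side in a small sketch. It assumes a writable temporary directory and a made-up two-line file in /etc/passwd format; the file names users.awk and passwd, and the functions way1 to way3, are hypothetical. Because awk's location differs between systems, the shebang path is taken from command -v awk rather than hard-coded.

```shell
#!/bin/sh
# Scratch directory for the demo files (removed when the script exits).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical input in /etc/passwd format.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$work/passwd"

# 1. Command line: the program is given inline, -F sets the separator.
way1() { awk -F ':' '{ print $1 }' "$work/passwd"; }

# 2. Executable script: the first line names the awk interpreter with -f.
#    Inside awk, that first line is just a comment, since it starts with #.
printf '#!%s -f\n' "$(command -v awk)" > "$work/users.awk"
printf '%s\n' 'BEGIN { FS = ":" }' '{ print $1 }' >> "$work/users.awk"
chmod +x "$work/users.awk"
way2() { "$work/users.awk" "$work/passwd"; }

# 3. Separate program file loaded with -f (no execute bit needed).
way3() { awk -f "$work/users.awk" "$work/passwd"; }

way1
way2
way3
```

All three print the same two account names; in the script file, BEGIN { FS = ":" } plays the role of -F ':'.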
Getting started examples
Suppose the output of last -n 5 is as follows:
The code is as follows:
[root@www ~]# last -n 5   <== show only the first five lines
root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in
root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)
root     pts/1   192.168.1.100  Mon Feb  9 11:41 - 18:30  (06:48)
dmtsai   pts/1   192.168.1.100  Mon Feb  9 11:41 - 11:41  (00:00)
root     tty1                   Fri Sep  5 14:09 - 14:10  (00:01)
To display only the account names of the five most recent logins:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The workflow is as follows: awk reads a record delimited by the newline character '\n', splits the record into fields using the specified field separator, and fills in the fields: $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (the space or Tab key), so $1 is the logged-in user, $3 is the IP address the user logged in from, and so on.
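To make the field numbering concrete, here is a small sketch that feeds one login record, written in the style of the last output above, through awk. The variable line and the functions fields and whole are hypothetical names for illustration.

```shell
#!/bin/sh
# One login record in the style of the `last -n 5` sample.
line='root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)'

# Whitespace splitting: $1 = user, $3 = IP, NF = number of fields.
# Runs of spaces count as a single separator.
fields() {
  printf '%s\n' "$line" | awk '{ print $1, $3, NF }'
}

# $0 is the whole record, exactly as it was read.
whole() {
  printf '%s\n' "$line" | awk '{ print $0 }'
}

fields
whole
```

The record splits into 10 fields (root, pts/1, 192.168.1.100, Tue, Feb, 10, 00:46, -, 02:28, (01:41)), so fields prints root 192.168.1.100 10.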
To show only the account names in /etc/passwd:
The code is as follows:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk + action: the action {print $1} is executed for every line.
-F specifies ':' as the field separator.
To display the account names in /etc/passwd together with each account's shell, separated by a Tab:
The code is as follows:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root	/bin/bash
daemon	/bin/sh
bin	/bin/sh
sys	/bin/sh
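A short sketch showing that the tab can come either from a literal "\t" concatenated between the fields or from setting the output field separator OFS and using a comma. The two input lines and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical /etc/passwd-style input.
passwd_sample() {
  printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                'daemon:x:1:1:daemon:/usr/sbin:/bin/sh'
}

# Literal tab concatenated between field 1 (account) and field 7 (shell).
with_tab() { passwd_sample | awk -F ':' '{ print $1 "\t" $7 }'; }

# Same result: a comma in print emits OFS, which BEGIN sets to a tab.
with_ofs() { passwd_sample | awk -F ':' 'BEGIN { OFS = "\t" } { print $1, $7 }'; }

with_tab
```

The OFS form is handy when several columns share the same separator, since only BEGIN has to change.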
To display each account in /etc/passwd with its shell, separated by a comma, add the column header name,shell before all rows, and append blue,/bin/nosh after the last line:
The code is as follows:
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh
The workflow is: awk executes BEGIN first, then reads the file one record at a time (records are split on the newline character \n), splits each record into fields using the specified field separator ($0 is the whole record, $1 the first field, $n the nth field), and runs the action associated with the pattern. It then reads the second record, and so on. Once all records have been read, awk executes END.
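The BEGIN / main block / END order described above can be seen in a small sketch. It also relies on NR still holding the total record count inside END. The input lines and the report function are hypothetical.

```shell
#!/bin/sh
# BEGIN runs before any input is read, the middle block once per record,
# and END after the last record (NR then holds the number of records read).
report() {
  printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' |
  awk -F ':' 'BEGIN { print "name,shell" }
              { print $1 "," $7 }
              END { print NR " records" }'
}

report
```

The output is the header, one line per record, and finally 2 records.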
Search /etc/passwd for all lines containing the keyword root:
The code is as follows:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern: only lines that match the pattern (here, root) have the action executed. Since no action is specified, awk prints the content of each matching line by default.
Searches support regular expressions; for example, to find lines that start with root: awk -F: '/^root/' /etc/passwd
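A quick sketch of the difference the ^ anchor makes. The three input lines are hypothetical, but in /etc/passwd format; the second one contains root only in its home-directory field.

```shell
#!/bin/sh
# Hypothetical /etc/passwd-style input; "operator" has /root as its home.
sample() {
  printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                'operator:x:11:0:operator:/root:/sbin/nologin' \
                'daemon:x:1:1:daemon:/usr/sbin:/bin/sh'
}

# Unanchored: matches "root" anywhere in the line.
anywhere() { sample | awk -F: '/root/ { print $1 }'; }

# Anchored with ^: matches only lines that begin with "root".
at_start() { sample | awk -F: '/^root/ { print $1 }'; }

anywhere
at_start
```

anywhere prints root and operator, while at_start prints only root.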
Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:
The code is as follows:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.
awk built-in variables
Awk has a number of built-in variables that hold environment information, and their values can be changed. The most commonly used ones are listed below.
ARGC      number of command-line arguments
ARGV      array of the command-line arguments
ENVIRON   array of the system environment variables
FILENAME  name of the file awk is currently reading
FNR       number of records read so far from the current file
FS        input field separator, equivalent to the -F command-line option
NF        number of fields in the current record
NR        number of records read so far (across all files)
OFS       output field separator
ORS       output record separator
RS        input record separator
In addition, $0 refers to the entire record, $1 to the first field of the current line, $2 to the second field of the current line, and so on.
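A small sketch exercising several of these variables. The files one.txt and two.txt, the environment variable DEMO_GREETING, and the function names are all hypothetical. Note how NR keeps counting across files while FNR restarts at 1 for each new file.

```shell
#!/bin/sh
# Two small scratch files (removed when the script exits).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'a b c\nd e\n' > "$work/one.txt"
printf 'f\n' > "$work/two.txt"

# FILENAME, NR (overall count), FNR (per-file count), NF (fields per record).
# The body runs in a subshell so the cd does not leak out of the function.
counters() (
  cd "$work" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt
)

# ARGC counts "awk" itself plus the operands; ARGV[0] is the program name.
# With only a BEGIN block, awk does not try to open the operands as files.
args() { awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' first second; }

# ENVIRON exposes the process environment as an array.
env_demo() { DEMO_GREETING=hello awk 'BEGIN { print ENVIRON["DEMO_GREETING"] }'; }

counters
args
env_demo
```

counters prints one.txt 1 1 3, one.txt 2 2 2, and two.txt 3 1 1; the NR==FNR test used in the notes further down relies on exactly this difference.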
For /etc/passwd, print the file name, the line number of each line, the number of columns in each line, and the full line content:
The code is as follows:
# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
Using printf instead of print makes the code more concise and easier to read:
The code is as follows:
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n", FILENAME, NR, NF, $0)}' /etc/passwd
print and printf
awk provides two output functions, print and printf.
The arguments to print can be variables, numeric values, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. Each comma is output as the output field separator (OFS), which is a space by default.
printf works much like printf in C and can format strings. When the output is complex, printf works better and makes the code easier to understand.
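A compact sketch contrasting the comma, plain concatenation, a changed OFS, and a printf format string. The inputs and function names are hypothetical; the printf formats (%-6s for a left-justified 6-character column, %3d for a right-justified 3-digit number) follow C conventions.

```shell
#!/bin/sh
# Comma: arguments are joined by OFS (a single space by default).
with_comma() { echo 'a b' | awk '{ print $1, $2 }'; }

# No comma: arguments are concatenated with nothing between them.
no_comma() { echo 'a b' | awk '{ print $1 $2 }'; }

# Changing OFS changes what each comma produces.
custom_ofs() { echo 'a b' | awk 'BEGIN { OFS = "-" } { print $1, $2 }'; }

# printf: no automatic newline, so \n is written explicitly.
formatted() { echo 'root 0' | awk '{ printf("%-6s|%3d|\n", $1, $2) }'; }

with_comma
no_comma
custom_ofs
formatted
```

The four functions print a b, ab, a-b, and root  |  0| respectively.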
Notes on everyday awk usage:
# Extract the lines that two files have in common
The code is as follows:
awk 'NR==FNR{a[$0]=0;next}{if ($0 in a) {print $0}}' file1 file2
# Extract the lines of file2 that do not appear in file1
The code is as follows:
awk 'NR==FNR{a[$0]=0;next}{if (!($0 in a)) {print $0}}' file1 file2
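A runnable sketch of the two commands above on made-up files. While NR==FNR holds, awk is still reading the first file, so each of its lines is stored as an array key and next skips the rest of the program; for the second file, the ($0 in a) test decides what is printed. The file contents and function names are hypothetical.

```shell
#!/bin/sh
# Two small scratch files (removed when the script exits).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf '%s\n' apple banana cherry > "$work/file1"
printf '%s\n' banana cherry date > "$work/file2"

# Lines of file2 that also occur in file1 (printed in file2's order).
common() {
  awk 'NR==FNR { a[$0] = 0; next } ($0 in a)' "$work/file1" "$work/file2"
}

# Lines of file2 that do not occur in file1.
only_in_file2() {
  awk 'NR==FNR { a[$0] = 0; next } !($0 in a)' "$work/file1" "$work/file2"
}

common
only_in_file2
```

One caveat of the NR==FNR idiom: if file1 is empty, NR==FNR is also true while reading file2, so nothing is printed.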
# Find the top 10 client IPs in an nginx access log
The code is as follows:
awk '{a[$1]++} END {for (i in a) print a[i], i}' access.log | sort -rn | head -10
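A self-contained sketch of the same counting pipeline, run on a few made-up log lines instead of a real access.log. It assumes, as the command above does, that the client IP is the first whitespace-separated field. Because for (i in a) visits keys in an unspecified order, the sort -rn step is what puts the busiest IP first.

```shell
#!/bin/sh
# Hypothetical log lines; only field 1 (the client IP) is used.
log_sample() {
  printf '%s\n' \
    '10.0.0.1 - - "GET / HTTP/1.1" 200 612' \
    '10.0.0.2 - - "GET /a HTTP/1.1" 200 612' \
    '10.0.0.1 - - "GET /b HTTP/1.1" 404 169'
}

# Count requests per IP, then sort numerically by count, highest first.
top_ips() {
  log_sample |
    awk '{ a[$1]++ } END { for (i in a) print a[i], i }' |
    sort -rn | head -n 10
}

top_ips
```

Here the output is 2 10.0.0.1 followed by 1 10.0.0.2.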
# Count how many times each subject appears
The code is as follows:
# cat test.txt
xqq Chinese Math
xq English Chinese
x Math Art
awk '{for (i=2;i<=NF;i++) a[$i]++} END {for (i in a) print i, a[i]}' test.txt
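A sketch of the subject count on hypothetical data in the same shape (a name in field 1, subjects in the remaining fields). The loop starts at i = 2 to skip the name, and the final sort only makes the unordered for (s in a) output stable.

```shell
#!/bin/sh
# Hypothetical input: a student name followed by that student's subjects.
subjects_sample() {
  printf '%s\n' 'xqq Chinese Math' 'xq English Chinese' 'x Math Art'
}

# Count every field from the 2nd to the last (NF), then list the totals.
count_subjects() {
  subjects_sample |
    awk '{ for (i = 2; i <= NF; i++) a[$i]++ }
         END { for (s in a) print s, a[s] }' |
    sort
}

count_subjects
```

For this input, Chinese and Math each appear twice, while Art and English each appear once.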
# Get the system's IP address
The code is as follows:
ifconfig eth0 | awk 'NR==2{print $2}' | cut -d: -f2
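Because ifconfig output depends on the system, this sketch runs the same awk/cut pipeline on two sample outputs instead of a live interface. The samples imitate the older net-tools layout (inet addr:...) and the newer one (inet followed directly by the address); the exact sample text and function names are assumptions for illustration. Lines without a ':' pass through cut unchanged, which is why the pipeline works for both layouts.

```shell
#!/bin/sh
# Sample in the older ifconfig layout: field 2 of line 2 is "addr:IP".
old_format() {
  printf '%s\n' \
    'eth0      Link encap:Ethernet  HWaddr 00:0c:29:aa:bb:cc' \
    '          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0'
}

# Sample in the newer ifconfig layout: field 2 of line 2 is the IP itself.
new_format() {
  printf '%s\n' \
    'eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500' \
    '        inet 192.168.1.100  netmask 255.255.255.0  broadcast 192.168.1.255'
}

# Take field 2 of line 2, then keep whatever follows the first ':'.
extract_ip() { awk 'NR==2 { print $2 }' | cut -d: -f2; }

old_format | extract_ip
new_format | extract_ip
```

Both calls print 192.168.1.100.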