Linux awk command in detail
Introduction
awk is a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.
There are three common versions: awk, nawk, and gawk. Unless stated otherwise, awk generally refers to gawk, the GNU version of AWK.
awk takes its name from the initials of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK really is a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.
Usage
awk 'pattern {action}' filenames
However complex the operation, the syntax is always the same: pattern is what awk looks for in the data, and action is the series of commands executed when a match is found. Curly braces ({}) need not always appear in a program, but they are used to group the series of commands that belongs to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
The most basic function of the awk language is to browse and extract information from a file or string according to specified rules. Once awk has extracted the information, other text operations can be performed on it. A complete awk script is typically used to format the information in a text file.
awk generally processes a file one line at a time: each time it receives a line, it runs the corresponding commands on that text.
Calling awk
There are three ways to call awk:
1. Command line
awk [-F field-separator] 'commands' input-file(s)
Here commands is the actual awk program, [-F field-separator] is optional, and input-file(s) are the files to be processed.
In awk, each line of a file is a record, and each piece of a record delimited by the field separator is called a field. If no -F field separator is given, the default separator is whitespace.
2. Shell script
Put all the awk commands into a file and make that file executable, with the awk interpreter named on the first line of the script; the program is then run by typing the script name.
This is the same idea as the first line of a shell script, except that #!/bin/sh is replaced with #!/bin/awk -f (the -f is needed so that awk reads its program from the script file).
3. Separate program file
Put all the awk commands into a separate file, and then call:
awk -f awk-script-file input-file(s)
Here the -f option loads the awk program from awk-script-file, and input-file(s) is the same as above.
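As a rough illustration of the first and third methods, the sketch below runs the same one-line program once inline and once from a file loaded with -f. The file names first_field.awk and sample.txt, the temporary directory, and the sample records are all made up for this example.

```shell
# Work in a throwaway directory so nothing on the system is touched.
workdir=$(mktemp -d)

# Method 3: keep the awk program in its own file (hypothetical name).
cat > "$workdir/first_field.awk" <<'EOF'
# Print the first colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Sample input standing in for a real file such as /etc/passwd.
printf 'root:x:0:0\ndaemon:x:1:1\n' > "$workdir/sample.txt"

# Method 1: program given directly on the command line.
inline_out=$(awk -F ':' '{ print $1 }' "$workdir/sample.txt")

# Method 3: the same program loaded with -f.
file_out=$(awk -f "$workdir/first_field.awk" "$workdir/sample.txt")

printf '%s\n' "$inline_out"
printf '%s\n' "$file_out"

rm -rf "$workdir"
```

Both calls should print the same two names. For the second method, the program file would additionally start with an interpreter line and be made executable with chmod +x; the exact path to awk differs between systems.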
This chapter focuses on the command line method.
Introductory examples
Assume that the output of last -n 5 is as follows:
[root@www ~]# last -n 5    <== show only the first five lines
root pts/1 192.168.1.100 Tue Feb 10 still logged in
root pts/1 192.168.1.100 Tue Feb 10
root pts/1 192.168.1.100 Mon Feb 9
dmtsai pts/1 192.168.1.100 Mon Feb 9
root tty1 Fri Sep 5
To display only the account names of the five most recent logins:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
awk works as follows: it reads one record, delimited by the newline character '\n', splits the record into fields using the field separator, and fills in the field variables. $0 is the whole record, $1 is the first field, and $n is the nth field. The default field separator is whitespace (spaces or tabs), so here $1 is the login user, $3 is the login IP address, and so on.
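The field variables can be tried on a single record without running last at all. This is a minimal sketch; the sample line is invented in the style of the output above, and the variable name fields_out is just for illustration.

```shell
# One made-up record in the style of the `last` output.
line='root pts/1 192.168.1.100 Tue Feb 10'

# $1 and $3 are individual fields; $0 is the whole record.
fields_out=$(printf '%s\n' "$line" |
  awk '{ print "user=" $1; print "ip=" $3; print "record=" $0 }')

printf '%s\n' "$fields_out"
```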
To display only the accounts in /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk with an action only: the action {print $1} is executed on every line.
-F specifies that the field separator is ':'.
To display only the accounts in /etc/passwd and the shell of each account, with the account and the shell separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh
To display only the accounts in /etc/passwd and the shell of each account, separated by a comma, with a header line "name,shell" before all rows and a final line "blue,/bin/nosh" after the last row:
# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh
The full awk workflow is as follows: first the BEGIN block is executed. Then awk reads the file one record at a time, each record delimited by the newline character '\n', splits the record into fields using the field separator, and fills in the field variables ($0 is the whole record, $1 the first field, $n the nth field). It then runs the action belonging to each pattern that matches, and moves on to the next record. After all records have been read, the END block is executed.
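The order of BEGIN, the per-record action, and END can be checked on a couple of invented passwd-style records. This is only a sketch; the sample data and the variable name begin_end_out are assumptions made for the example.

```shell
# Two made-up passwd-style records.
begin_end_out=$(printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\n' |
  awk -F ':' '
    BEGIN { print "name,shell" }        # runs once, before any input is read
          { print $1 "," $7 }           # runs once per record
    END   { print "records read: " NR } # runs once, after the last record
  ')

printf '%s\n' "$begin_end_out"
```

The header comes first, then one line per record, then the END line, which can still see the final value of NR.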
Search /etc/passwd for all lines containing the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of a pattern. Only lines that match the pattern (root here) have the action executed on them. No action is specified, so by default the whole line is printed.
The search pattern supports regular expressions, for example: awk -F: '/^root/' /etc/passwd
Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.
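The difference between an unanchored and an anchored pattern is easy to see on a small sample. In this sketch the three records, including the operator account whose home directory happens to contain the word root, are invented for illustration.

```shell
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
operator:x:11:0:operator:/root:/sbin/nologin'

# /root/ matches anywhere in the record; /^root/ matches only at the start.
anywhere=$(printf '%s\n' "$sample" | awk -F ':' '/root/ { print $1 }')
anchored=$(printf '%s\n' "$sample" | awk -F ':' '/^root/ { print $1 }')

printf 'anywhere:\n%s\n' "$anywhere"
printf 'anchored:\n%s\n' "$anchored"
```

The unanchored pattern also selects operator, because the match is tested against the whole record rather than against the first field.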
awk built-in variables
awk has many built-in variables that hold environment information, and these variables can be changed. The most commonly used ones are listed below.
ARGC        number of command-line arguments
ARGV        array of command-line arguments
ENVIRON     array giving access to the system environment variables
FILENAME    name of the file awk is currently reading
FNR         record number within the current file
FS          input field separator, equivalent to the command-line -F option
NF          number of fields in the current record
NR          number of records read so far
OFS         output field separator
ORS         output record separator
RS          input record separator
In addition, $0 refers to the entire record, $1 is the first field of the current line, $2 is the second field of the current line, and so on.
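A few of these variables can be seen working together in a short sketch. The input records and the " | " output separator are arbitrary choices made for the example.

```shell
# FS splits the input on ':'; OFS is placed between the comma-separated print arguments.
vars_out=$(printf 'root:x:0\ndaemon:x:1:extra\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1 }')

printf '%s\n' "$vars_out"
```

Each output line shows the record number, the number of fields in that record, and the first field.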
For /etc/passwd, print the file name, the line number, the number of columns, and the full content of each line:
# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
Using printf instead of print makes the code more concise and easier to read:
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
print and printf
awk provides both print and printf for output.
The arguments of print can be variables, numbers, or strings. Strings must be enclosed in double quotation marks, and arguments must be separated by commas. Without commas, the arguments are concatenated and cannot be told apart in the output. The comma stands for the output field separator (OFS), which is a space by default.
The printf function is essentially the same as printf in C and formats strings. When the output is complex, printf is easier to use and the code is easier to understand.
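The effect of the comma in print, and the extra control printf gives, can be compared side by side. This is a small sketch on one invented record; the format string is just one possible layout.

```shell
rec='root:x:0:0:root:/root:/bin/bash'

# With a comma, print inserts OFS (a space by default) between the arguments.
with_comma=$(printf '%s\n' "$rec" | awk -F ':' '{ print $1, $7 }')

# Without a comma, the two values are simply concatenated.
no_comma=$(printf '%s\n' "$rec" | awk -F ':' '{ print $1 $7 }')

# printf pads the name to 8 characters and zero-pads the UID to 3 digits.
formatted=$(printf '%s\n' "$rec" |
  awk -F ':' '{ printf("%-8s uid=%03d shell=%s\n", $1, $3, $7) }')

printf '%s\n' "$with_comma" "$no_comma" "$formatted"
```

Note that printf, unlike print, does not add a newline on its own, so the format string ends with \n.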
awk programming
Variables and assignments
In addition to its built-in variables, awk lets you define your own variables.
The following example counts the accounts in /etc/passwd:
awk '{count++; print $0;} END {print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40
count is a user-defined variable. The earlier action {} blocks contained only a single print; in fact print is just one statement, and an action {} may contain several statements separated by semicolons.
count is not initialized here. Although it defaults to 0, the safer practice is to initialize it to 0 explicitly:
awk 'BEGIN {count=0; print "[start] user count is", count} {count=count+1; print $0;} END {print "[end] user count is", count}' /etc/passwd
[start] user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end] user count is 40
Count the number of bytes occupied by the files in a directory:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end] size is", size}'
[end] size is 8657198
To show the result in megabytes (M):
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end] size is", size/1024/1024, "M"}'
[end] size is 8.25889 M
Note: the total does not include the contents of subdirectories.
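The same accumulation can be reproduced on fixed input, which makes the arithmetic easy to check by hand. In this sketch the two lines in ls -l layout are invented, and the extra counter files is a hypothetical variable added only to show a second user-defined variable.

```shell
# Made-up lines in `ls -l` layout; field 5 is the size in bytes.
listing='-rw-r--r-- 1 user group 1048576 Jan 1 00:00 a.bin
-rw-r--r-- 1 user group 524288 Jan 1 00:00 b.bin'

# size is initialized in BEGIN; files is left uninitialized and starts at 0.
size_out=$(printf '%s\n' "$listing" |
  awk 'BEGIN { size = 0 }
       { size += $5; files++ }
       END { print files, "files,", size / 1024 / 1024, "M" }')

printf '%s\n' "$size_out"
```

With 1048576 + 524288 bytes, the total comes to 1.5 M.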
Conditional statements
Conditional statements in awk are borrowed from C. They can take the following forms:
if (expression) {
    statement;
    statement;
    ......
}

if (expression) {
    statement1;
} else {
    statement2;
}

if (expression1) {
    statement1;
} else if (expression2) {
    statement2;
} else {
    statement3;
}
Count the number of bytes occupied by the files in a directory, skipping entries whose size is 4096 bytes (usually directories):
ls -l | awk 'BEGIN {size=0; print "[start] size is", size} {if ($5 != 4096) {size=size+$5;}} END {print "[end] size is", size/1024/1024, "M"}'
[start] size is 0
[end] size is 8.22339 M
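The if / else if / else form can be sketched on a few invented ls -l style lines. Note that this example identifies directories by the leading d in the permissions field instead of by the 4096-byte size used above; that is a substitution made for the illustration, not the method from the text.

```shell
listing='drwxr-xr-x 2 user group 4096 Jan 1 00:00 dir
-rw-r--r-- 1 user group 2048 Jan 1 00:00 a.txt
-rw-r--r-- 1 user group 0 Jan 1 00:00 empty.txt'

# Classify each entry; $NF is the last field, here the file name.
if_out=$(printf '%s\n' "$listing" | awk '{
  if ($1 ~ /^d/) {
    kind = "directory"
  } else if ($5 == 0) {
    kind = "empty file"
  } else {
    kind = "file"
  }
  print $NF ": " kind
}')

printf '%s\n' "$if_out"
```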
Loop statements
Loop statements in awk are also borrowed from C. awk supports while, do/while, for, break, and continue, and these keywords have the same semantics as in C.
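Since no input is needed to try the loop keywords, the whole sketch below runs inside a BEGIN block. The three small computations are arbitrary and chosen only so that each keyword appears once.

```shell
loop_out=$(awk 'BEGIN {
  # for + continue: add up the odd numbers from 1 to 10
  for (i = 1; i <= 10; i++) {
    if (i % 2 == 0) continue
    odd_sum += i
  }

  # while + break: first power of 2 greater than 100
  p = 1
  while (1) {
    if (p > 100) break
    p *= 2
  }

  # do/while: smallest n whose square is at least 50
  n = 0
  do {
    n++
  } while (n * n < 50)

  print odd_sum, p, n
}')

printf '%s\n' "$loop_out"
```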
Arrays
Because array subscripts in awk can be numbers or strings, a subscript is usually called a key. Keys and values are stored in an internal hash table. Because a hash table is not kept in order, array contents may not come out in the order you expect when they are displayed. Like variables, arrays are created automatically when first used, and awk decides automatically whether each value is stored as a number or a string. In general, arrays in awk are used to collect information from records: computing sums, counting words, tracking how many times a pattern has matched, and so on.
Display the accounts in /etc/passwd:
awk -F ':' 'BEGIN {count=0;} {name[count]=$1; count++;} END {for (i=0; i<NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......
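Arrays with string keys are a natural fit for counting. The sketch below counts how many accounts use each shell in three invented passwd-style records. Because for (key in array) visits keys in no particular order, the output is piped through sort to make it predictable.

```shell
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

# count[] is keyed by the shell path in field 7; entries are created on first use.
shell_counts=$(printf '%s\n' "$sample" |
  awk -F ':' '{ count[$7]++ } END { for (sh in count) print sh, count[sh] }' |
  sort)

printf '%s\n' "$shell_counts"
```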