Linux awk command details, awk command details

Source: Internet
Author: User

Awk is a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes the resulting fields.

Awk comes in three variants: awk, nawk, and gawk. Unless stated otherwise, awk generally means gawk, the GNU version of AWK.

The name awk comes from the initials of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.


awk '{pattern + action}' {filenames}

Although the operations may become complex, the syntax always follows this form. pattern is what awk looks for in the data, and action is the series of commands executed when a match is found. In practice the pattern is written in front of the braces, as in awk '/regex/ {action}' file. Curly braces ({}) need not always appear in a program, but they are used to group the series of commands that belongs to a given pattern. A pattern is typically a regular expression enclosed in slashes.
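As a minimal sketch of the pattern and action pairing described above, the following command pipes three made-up lines into awk (the sample data and the variable name result are assumptions for illustration, not from the original) and prints the second field of every line that matches /error/:

```shell
# Hypothetical sample input; only lines matching the pattern /error/ trigger the action.
result=$(printf 'error disk\ninfo boot\nerror net\n' | awk '/error/ {print $2}')
echo "$result"
```

Lines that do not match the pattern, such as the "info boot" line, are skipped without running the action.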

The most basic function of the awk language is to browse and extract information based on specified rules in a file or string. Only after awk extracts information can other text operations be performed. A complete awk script is usually used to format information in a text file.

In general, awk processes a file one line at a time: each time it receives a line, it executes the corresponding commands on that text.


Calling awk

There are three ways to call awk:

1. Command-line mode: awk [-F field-separator] 'commands' input-file(s). Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed. In awk, each item in a line that is delimited by the field separator is called a field. If -F is not given, the default field separator is a space.

2. Shell-script mode: put all the awk commands into a file and make it executable, with the awk interpreter named on the first line of the script, then run it by typing the script name. This is the same mechanism as the first line of a shell script: #!/bin/sh is replaced with #!/bin/awk -f.

3. Script-file mode: put all the awk commands into a separate file, and then call: awk -f awk-script-file input-file(s). The -f option loads the awk program from awk-script-file; input-file(s) is the same as above.
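The command-line and -f script-file methods can be compared side by side. This is only a sketch: the file names users.txt and first.awk, the sample records, and the variable names are assumptions made up for the illustration. The shell-script method is left out because the path of the awk interpreter differs between systems.

```shell
# Hypothetical sample data and script file, created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'alice:x:1000\nbob:x:1001\n' > "$tmpdir/users.txt"

# Command-line method: the program is given directly as an argument.
cli_out=$(awk -F ':' '{print $1}' "$tmpdir/users.txt")

# Script-file method: the same program is stored in a file and loaded with -f.
printf '{print $1}\n' > "$tmpdir/first.awk"
file_out=$(awk -F ':' -f "$tmpdir/first.awk" "$tmpdir/users.txt")

echo "$cli_out"
echo "$file_out"
rm -rf "$tmpdir"
```

Both calls run the same program, so they should print the same two account names.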

This chapter focuses on the command line method.


Introductory examples

Assume that the output of last -n 5 is as follows:

[root@www ~]# last -n 5 <= retrieve only the first five lines
root     pts/1   Tue Feb 10 21 still logged in
root     pts/1   Tue Feb 10)
root     pts/1   Mon Feb 9-()
dmtsai   pts/1   Mon Feb 9-()
root     tty1    Fri Sep 5)

To display only the account names of the five most recent logins:

# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root

The awk workflow is as follows: read one record, delimited by the '\n' newline character, split the record into fields according to the specified field separator, and fill in the field variables. $0 holds the entire record, $1 the first field, and $n the nth field. The default field separator is a space or a tab, so $1 is the logged-in user, $3 is the login IP address, and so on.
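The field variables can be seen on a single record. In this sketch the input line (a user, a terminal, and an IP address) is invented for illustration and is not real last output:

```shell
# One hypothetical record with three whitespace-separated fields.
result=$(printf 'root pts/1 192.168.1.100\n' |
    awk '{print "record=" $0; print "first=" $1; print "third=" $3}')
echo "$result"
```

$0 reproduces the whole line, while $1 and $3 pick out individual fields.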


To display only the account names in /etc/passwd:

# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys

This is an example of awk with an action only: the action {print $1} is executed on every line.

-F specifies ':' as the field separator.


To display only the accounts in /etc/passwd and their corresponding shells, separated by a tab:

# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh


To display only the accounts in /etc/passwd and their shells, separated by commas, with a "name,shell" header line before all rows and "blue,/bin/nosh" appended as the last line:

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

The awk workflow is as follows: first execute the BEGIN block; then read the file one record at a time, delimited by the \n newline character; split each record into fields according to the specified field separator and fill in the field variables ($0 holds the entire record, $1 the first field, $n the nth field); then execute the action that corresponds to the matching pattern. awk then reads the next record, and so on until all records have been read, and finally executes the END block.
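The order of the three phases can be traced with a small sketch. The two input lines and the message texts are made up for illustration:

```shell
# BEGIN runs before any input is read, the middle action runs once per record,
# and END runs after the last record (NR then holds the total record count).
result=$(printf 'a\nb\n' |
    awk 'BEGIN {print "start"} {print "line " NR ": " $0} END {print "done after " NR " records"}')
echo "$result"
```

The "start" line appears before any input line, and the "done" line appears only after both records have been processed.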


Search for all rows containing the root keyword in /etc/passwd:

# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern. The action is executed only for lines that match the pattern (root here). Since no action is specified, each matching line is printed by default.

The pattern can be a regular expression, for example: awk -F: '/^root/' /etc/passwd
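The effect of the ^ anchor can be checked on two invented records; the account name chroot is hypothetical and exists only to contain "root" somewhere other than the start of the line:

```shell
# /root/ matches "root" anywhere in the line; /^root/ only at the start of the line.
unanchored=$(printf 'root:x:0\nchroot:x:5\n' | awk -F ':' '/root/ {print $1}')
anchored=$(printf 'root:x:0\nchroot:x:5\n' | awk -F ':' '/^root/ {print $1}')
echo "unanchored: $unanchored"
echo "anchored: $anchored"
```

The unanchored pattern selects both records, while the anchored one selects only the record that begins with root.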


Search for all rows containing the root keyword in /etc/passwd and display the corresponding shell:

# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash

Here the action {print $7} is specified.


Awk built-in Variables

Awk has many built-in variables used to set environment information. These variables can be changed. The following lists the most common variables.

ARGC       number of command-line arguments
ARGV       array of command-line arguments
ENVIRON    array of system environment variables
FILENAME   name of the file awk is currently reading
FNR        number of records read in the current file
FS         input field separator, equivalent to the command-line -F option
NF         number of fields in the current record
NR         number of records read so far
OFS        output field separator
ORS        output record separator
RS         input record separator

In addition, the $0 variable refers to the entire record. $1 is the first field of the current line, $2 is the second field of the current line, and so on.
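A few of these variables can be combined in one short sketch. The two colon-separated input lines are made up, and setting FS inside BEGIN is used here as an alternative to the -F option:

```shell
# FS splits the input on ':', OFS joins the printed values with '-',
# NR is the record number, NF the field count, and $NF the last field.
result=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN {FS=":"; OFS="-"} {print NR, NF, $1, $NF}')
echo "$result"
```

Because the print arguments are separated by commas, awk inserts OFS between them in the output.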


Print the file name, the line number, the number of columns, and the full content of each line of /etc/passwd:

# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh


Use printf instead of print to make the code more concise and easy to read

 awk  -F ':'  '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd


Print and printf

Both print and printf are provided in awk.

The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotation marks, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. Each comma is output as the output field separator (OFS), which is a space by default.

The printf function is similar to the printf function in C language. It can format strings. When the output is complex, printf is easier to use and the code is easier to understand.
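The difference between the two output statements can be sketched on one invented record; the format string %-6s|%03d is an arbitrary choice for the illustration:

```shell
# print with a comma inserts a space (the default OFS); without a comma the
# values are concatenated; printf pads and zero-fills according to its format.
result=$(printf 'root 0\n' |
    awk '{print $1, $2; print $1 $2; printf("%-6s|%03d\n", $1, $2)}')
echo "$result"
```

Unlike print, printf does not add a newline on its own, so the format string ends with \n.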


Awk Programming

Variables and assignments

In addition to the built-in variables of awk, awk can also customize variables.

The following example counts the number of accounts in /etc/passwd:

awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is  40

count is a custom variable. The earlier action blocks contained only a single print; in fact print is just one statement, and an action {} can contain multiple statements separated by semicolons.


count is not initialized here. Although it defaults to 0, the safer practice is to initialize it explicitly:

awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
[start]user count is  0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is  40
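The default value of an unset variable can be checked directly. In this sketch the variable names n and s are arbitrary, and no input file is needed because everything runs in BEGIN:

```shell
# An unset awk variable acts as 0 in a numeric context and as the empty
# string in a string context, so n++ works without prior assignment.
result=$(awk 'BEGIN {print "as number:", n + 0; print "as string: [" s "]"; n++; print "after increment:", n}')
echo "$result"
```

Explicit initialization is still clearer to readers, which is the point of the BEGIN {count=0} form.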


Count the number of bytes occupied by the files in a folder:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'
[end]size is  8657198


To display the size in MB:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'
[end]size is  8.25889 M

Note: the total does not include files in subdirectories.


Condition Statement

The conditional statements in awk are borrowed from C. They take the following forms:

if (expression) {
    statement;
    statement;
    ... ...
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}


Count the number of bytes occupied by the files in a folder, excluding entries whose size is 4096 (usually directories):

ls -l | awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'
[start]size is  0
[end]size is  8.22339 M
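The if / else if / else chain can also be tried on its own. In this sketch the three input numbers and the thresholds 10 and 100 are arbitrary values chosen for illustration:

```shell
# Classify each input number with an if / else if / else chain.
result=$(printf '5\n50\n500\n' | awk '{
    if ($1 < 10) {
        label = "small"
    } else if ($1 < 100) {
        label = "medium"
    } else {
        label = "large"
    }
    print $1, label
}')
echo "$result"
```

Only the first branch whose condition is true runs for each record.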


Loop statement

The loop statements in awk are also borrowed from C: while, do/while, for, break, and continue are supported, with the same semantics as in C.
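A short sketch of the loop forms, run entirely inside BEGIN so that no input is needed. The variable names and the limits 5 and 20 are arbitrary choices for the illustration:

```shell
result=$(awk 'BEGIN {
    # for with continue: add up only the odd numbers from 1 to 5
    sum = 0
    for (i = 1; i <= 5; i++) {
        if (i % 2 == 0) continue
        sum += i
    }
    print "odd sum:", sum

    # while with break: double p until it exceeds 20
    p = 1
    while (1) {
        if (p > 20) break
        p *= 2
    }
    print "first power of 2 above 20:", p

    # do/while: the body runs once even though the condition is false
    runs = 0
    do {
        runs++
    } while (runs < 0)
    print "do/while runs:", runs
}')
echo "$result"
```

As in C, do/while tests its condition after the body, so the body always executes at least once.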



Arrays

Because array subscripts in awk can be either numbers or strings, a subscript is usually called a key. Keys and values are stored in an internal hash table. Because a hash table does not preserve insertion order, the array contents may not be displayed in the order you expect. Like variables, arrays are created automatically when first used, and awk decides automatically whether a value is stored as a number or a string. Arrays in awk are generally used to collect information from records: computing sums, counting words, tracking how many times a pattern is matched, and so on.


Display the accounts in /etc/passwd:

awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......

Here a for loop is used to traverse the array.
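Arrays can also be keyed by strings and traversed with the for (key in array) form. The following sketch counts how often each value occurs; the five input lines are made up, and the output is piped through sort because the traversal order of for (key in array) is not guaranteed:

```shell
# Use the field value itself as the array key and count occurrences.
result=$(printf 'bash\nsh\nbash\nsh\nsh\n' |
    awk '{count[$1]++} END {for (shell in count) print shell, count[shell]}' |
    sort)
echo "$result"
```

The same idea applies to counting the login shells in the seventh field of /etc/passwd.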
