Linux awk command details

Source: Internet
Author: User

Introduction

awk is a powerful text analysis tool. Where grep searches and sed edits, awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace (spaces or tabs) as the default separator, and then analyzes and processes those fields.
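
The line-by-line splitting described above can be tried on a few lines of inline text. This is only a minimal sketch: the sample names and values are made up for illustration, and the variable out exists only so the result can be printed.

```shell
# Hypothetical sample input: three whitespace-separated fields per line.
# The second line uses a tab and repeated spaces to show that a run of
# whitespace counts as a single separator by default.
out=$(printf 'alice 30 paris\nbob\t25    rome\n' | awk '{ print $1, $3 }')

# Prints the first and third field of each line.
printf '%s\n' "$out"
```

Here $1 and $3 pick out the first and third fields regardless of how much whitespace separates them.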

awk has three different versions: awk, nawk, and gawk. Unless otherwise specified, awk generally refers to gawk, the GNU version of AWK.

awk takes its name from the first letters of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.

Usage

awk '{pattern + action}' {filenames}

However complex the operation, the syntax always takes this form. pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) do not have to appear everywhere in a program, but they are used to group a series of commands under a particular pattern. A pattern is typically a regular expression enclosed in slashes.
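
To make the roles of pattern and action concrete, here is a small sketch on made-up data (the item names and quantities are assumptions, not taken from any real file). It shows a pattern with an action, a pattern alone, and an action alone.

```shell
# Hypothetical inventory lines: item name, then quantity.
data='apple 3
banana 12
cherry 7'

# Pattern with action: print the first field of every line matching /an/.
names=$(printf '%s\n' "$data" | awk '/an/ { print $1 }')

# Pattern only: the default action prints the whole matching line.
lines=$(printf '%s\n' "$data" | awk '/rr/')

# Action only: with no pattern, the action runs on every line.
second=$(printf '%s\n' "$data" | awk '{ print $2 }')

printf '%s\n' "$names" "$lines" "$second"
```

Only the banana line contains "an" and only the cherry line contains "rr", so the first two commands each select a single line, while the third prints the quantity from every line.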

The most basic function of the awk language is to browse and extract information from files or strings according to specified rules. Once awk has extracted the information, other text operations can be performed on it. A complete awk script is typically used to format the information in a text file.

Normally, awk processes a file one line at a time: it receives a line and then executes the corresponding commands to process the text.

Calling awk

There are three ways to call awk:

1. Command line
awk [-F field-separator] 'commands' input-file(s)
Here, commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.
In awk, each item on a line that is delimited by the field separator is called a field. If no -F field separator is specified, the default field separator is whitespace.

2. Shell script
Put all the awk commands into a file and make that awk program executable. The awk interpreter is named on the first line of the script, and the program is then called by typing the script name.
This corresponds to the first line of a shell script: #!/bin/sh
which can be changed to: #!/bin/awk -f
(the -f is needed so that awk reads its program from the script file; the path to awk may differ between systems)

3. Put all the awk commands into a separate file, and then call:
awk -f awk-script-file input-file(s)
The -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
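
As a rough sketch of methods 1 and 3, the same one-line program can be given inline or loaded from a file. The temporary file created with mktemp and the sample input a:b:c are assumptions made for illustration. Method 2 is not shown because the interpreter path on the first line varies between systems.

```shell
# Write a small awk program to a temporary file (hypothetical content).
script=$(mktemp)
printf '%s\n' '{ print $2 }' > "$script"

# Method 1: program text on the command line, with -F setting the separator.
inline=$(printf 'a:b:c\n' | awk -F: '{ print $2 }')

# Method 3: the same program loaded from a file with -f.
fromfile=$(printf 'a:b:c\n' | awk -F: -f "$script")

rm -f "$script"
printf '%s %s\n' "$inline" "$fromfile"
```

Both calls print the second colon-separated field, so the two results should be identical.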

 

This chapter focuses on the command line method.

Introductory examples

Assume that the output of last -n 5 is as follows:

[root@www ~]# last -n 5    <== show only the first five lines
root pts/1 192.168.1.100 Tue Feb 10 :21 still logged in
root pts/1 192.168.1.100 Tue Feb 10)
root pts/1 192.168.1.100 Mon Feb 9)
dmtsai pts/1 192.168.1.100 Mon Feb 9)
root tty1 Fri Sep 5)

To display only the accounts of the five most recent logins:

# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root

This is an example of awk + action: the action {print $1} is executed on every line.

The -F option specifies the field separator; in the following examples it is ':'.

To display only the accounts in /etc/passwd and their corresponding shells, separated by a tab:

# cat /etc/passwd | awk -F':' '{print $1"\t"$7}'
root	/bin/bash
daemon	/bin/sh
bin	/bin/sh
sys	/bin/sh

To display only the accounts in /etc/passwd and their corresponding shells, separated by a comma, with a "name,shell" header line added before all rows and a "blue,/bin/nosh" line added after the last row:

cat /etc/passwd | awk -F':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

The awk workflow is as follows: first execute BEGIN; then read the file one record at a time, where records are delimited by the \n newline character; split each record into fields according to the specified field separator and populate the field variables, where $0 is the whole record, $1 is the first field, and $n is the nth field; then execute the action corresponding to the pattern. awk then reads the next record, and so on until all records have been read, and finally executes the END block.
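
The order of execution described above can be traced with a short sketch. The two input records (x:1 and y:2) are made up for illustration.

```shell
out=$(printf 'x:1\ny:2\n' | awk -F: '
    BEGIN { print "begin" }                       # runs once, before any input
          { print "record " NR ": " $1 "=" $2 }   # runs once for each record
    END   { print "end after " NR " records" }    # runs once, after all input
')

printf '%s\n' "$out"
```

The BEGIN line appears first, followed by one line per record, and the END line appears last with NR holding the total number of records read.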

Search for all lines in /etc/passwd that contain the keyword root:

# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern: the action runs only on lines that match the pattern (root here). No action is specified, so by default the whole matching line is printed.

Regular expressions are supported in the search, for example: awk -F: '/^root/' /etc/passwd
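
The difference between an unanchored and an anchored pattern can be seen on a few passwd-style lines. The entries below, including the chroot and alice accounts, are hypothetical and are not read from a real /etc/passwd.

```shell
# Hypothetical passwd-style records (not a real /etc/passwd).
data='root:x:0:0:root:/root:/bin/bash
chroot:x:1:1:demo:/home/chroot:/bin/sh
alice:x:2:2:Alice:/home/alice:/bin/sh'

# /root/ matches "root" anywhere on the line.
anywhere=$(printf '%s\n' "$data" | awk -F: '/root/ { print $1 }')

# /^root/ matches only lines that begin with "root".
anchored=$(printf '%s\n' "$data" | awk -F: '/^root/ { print $1 }')

printf '%s\n' "$anywhere" "$anchored"
```

The unanchored pattern also selects the chroot line, while the ^ anchor restricts the match to the line that starts with root.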

Search for all lines in /etc/passwd that contain the keyword root, and display the corresponding shell:

# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash

The action {print $7} is specified here.

awk built-in variables

awk has many built-in variables that hold environment information, and these variables can be changed. The most commonly used ones are listed below.

ARGC       number of command-line arguments
ARGV       array of command-line arguments
ENVIRON    array giving access to the system environment variables
FILENAME   name of the file awk is currently reading
FNR        record number within the current file
FS         input field separator; equivalent to the -F command-line option
NF         number of fields in the current record
NR         number of records read so far
OFS        output field separator
ORS        output record separator
RS         input record separator

In addition, $0 refers to the entire record, $1 is the first field of the current line, $2 is the second field, and so on.
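
A few of these variables can be observed together in a short sketch. The input lines and the choice of "-" as the output separator are assumptions made for illustration.

```shell
# FS is set in BEGIN instead of with -F; OFS is what print places
# between comma-separated arguments.
out=$(printf 'a:b:c\nd:e\n' | awk '
    BEGIN { FS = ":"; OFS = "-" }
    { print NR, NF, $1, $NF }    # record number, field count, first and last field
')

printf '%s\n' "$out"
```

$NF is the value of the last field, because NF holds the number of fields in the current record.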

Print, for each line of /etc/passwd, the file name, the line number, the number of columns, and the full line content:

# awk -F':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print makes the code more concise and easier to read:

awk -F':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

print and printf

awk provides both print and printf for output.

The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotation marks, and arguments are separated by commas. Without commas, the arguments are concatenated and can no longer be told apart. The comma here plays the role of the output field separator, which by default is a space.

The printf function is essentially the same as printf in C and formats strings. When the output is complex, printf is easier to use and the code is easier to understand.
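
The behaviours of print and printf described above can be compared in a short sketch. The strings and numbers are arbitrary sample values.

```shell
# Without a comma the strings are concatenated; with a comma, print
# inserts the output field separator (a space by default).
joined=$(awk 'BEGIN { print "a" "b", "c" }')

# Changing OFS changes what the comma produces.
custom=$(awk 'BEGIN { OFS = ":"; print "a", "b" }')

# printf takes a C-style format: left-justified string in 6 columns,
# integer in 4 columns, float with 2 decimals.
padded=$(awk 'BEGIN { printf("%-6s|%4d|%.2f\n", "ab", 42, 3.14159) }')

printf '%s\n' "$joined" "$custom" "$padded"
```

Note that printf, unlike print, does not add a newline on its own, which is why the format string ends with \n.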

awk programming

Variables and assignments

In addition to the built-in variables, you can also define your own variables in awk.

The following example counts the number of accounts in /etc/passwd:

awk '{count++; print $0;} END {print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40

count is a user-defined variable. The earlier action blocks contained only a single print, but print is just one statement; an action {} can contain multiple statements, separated by semicolons.

count is not initialized here. Although its default value is 0, it is good practice to initialize it to 0 explicitly:

awk 'BEGIN {count=0; print "[start] user count is", count} {count=count+1; print $0;} END {print "[end] user count is", count}' /etc/passwd
[start] user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end] user count is 40
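
The default value of a variable that has never been assigned can be checked directly. In this minimal sketch the variable names s, n, and count are arbitrary and the three input lines are made up.

```shell
# An unset variable behaves as the empty string in string context
# and as 0 in numeric context.
defaults=$(awk 'BEGIN { print "[" s "]", n + 0, (n == 0), (s == "") }')

# Because of that, a counter works without explicit initialization.
total=$(printf 'a\nb\nc\n' | awk '{ count++ } END { print count }')

printf '%s\n' "$defaults" "$total"
```

The comparisons are parenthesized because a bare > inside print would be read as output redirection; keeping the same habit for == makes the intent clear.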

Count the total number of bytes occupied by the files in a directory:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end] size is", size}'
[end] size is 8657198

To show the size in MB:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end] size is", size/1024/1024, "M"}'
[end] size is 8.25889 M

Note: the total does not include the contents of subdirectories.
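
Because real ls -l output differs from machine to machine, the same summation can be sketched on a fixed listing. The file names, owners, and sizes below are invented, and printf is used instead of print so that the number of decimals is explicit.

```shell
# Hypothetical listing in ls -l layout; the fifth field is the size in bytes.
listing='total 12
-rw-r--r-- 1 user user 1048576 Jan  1 00:00 a.bin
-rw-r--r-- 1 user user 524288 Jan  1 00:00 b.bin
drwxr-xr-x 2 user user 4096 Jan  1 00:00 sub'

# The "total" line has no fifth field, so it adds 0 to the sum.
out=$(printf '%s\n' "$listing" | awk '
    { size += $5 }
    END { printf("%d bytes, %.2f M\n", size, size / 1024 / 1024) }
')

printf '%s\n' "$out"
```

The directory entry sub contributes only its own 4096 bytes, which matches the note that the contents of subdirectories are not counted.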

Conditional statements

Conditional statements in awk are borrowed from C. They take the following forms:

if (expression) {
    statement;
    statement;
    ... ...
}

if (expression) {
    statement1;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}

Count the bytes occupied by the files in a directory, excluding entries whose size is 4096 (usually directories):

ls -l | awk 'BEGIN {size=0; print "[start] size is", size} {if ($5 != 4096) {size=size+$5;}} END {print "[end] size is", size/1024/1024, "M"}'
[start] size is 0
[end] size is 8.22339 M
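
The if / else if / else form can be sketched on a few numbers. The thresholds and the labels small, medium, and large are arbitrary choices for illustration.

```shell
out=$(printf '5\n50\n500\n' | awk '{
    if ($1 < 10) {
        label = "small"
    } else if ($1 < 100) {
        label = "medium"
    } else {
        label = "large"
    }
    print $1, label
}')

printf '%s\n' "$out"
```

Each input line falls into exactly one branch, so every number is printed once with its label.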

Loop statements

Loop statements in awk are likewise borrowed from C: while, do/while, for, break, and continue are supported, and these keywords have the same semantics as in C.
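
A compact sketch exercising each of these keywords follows. The calculations themselves (summing odd numbers, finding a power of two) are arbitrary and serve only to give each loop something to do.

```shell
out=$(awk 'BEGIN {
    # for with continue: sum the odd numbers from 1 to 9
    for (i = 1; i <= 9; i++) {
        if (i % 2 == 0) continue
        sum += i
    }

    # while with break: find the first power of 2 above 100
    p = 1
    while (1) {
        if (p > 100) break
        p *= 2
    }

    # do/while: the body runs at least once
    n = 0
    do { n++ } while (n < 3)

    print sum, p, n
}')

printf '%s\n' "$out"
```

The whole program lives in a BEGIN block, so it runs without reading any input.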

Arrays

Because array subscripts in awk can be numbers or strings, a subscript is usually called a key. Keys and values are both stored in an internal hash table. Because a hash table is not stored in order, the array contents may not be displayed in the order you expect. Like variables, arrays are created automatically when they are used, and awk automatically determines whether a value is stored as a number or a string. In general, arrays in awk are used to collect information from records: computing sums, counting words, tracking how many times a pattern is matched, and so on.
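
Word counting, one of the uses mentioned above, can be sketched with string keys. The input words are made up, and the output is piped through sort because the traversal order of for (key in array) is not defined.

```shell
out=$(printf 'red blue red\ngreen red blue\n' | awk '
    # Use each word as a key; the element is created on first use.
    { for (i = 1; i <= NF; i++) count[$i]++ }

    # for (key in array) visits every key, in no particular order.
    END { for (word in count) print word, count[word] }
' | sort)

printf '%s\n' "$out"
```

Sorting outside awk is one simple way to get stable output; it is a choice made for this sketch, not something the array itself provides.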

Show the accounts in /etc/passwd:

awk -F':' 'BEGIN {count=0;} {name[count]=$1; count++;} END {for (i=0; i<NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......

Here a for loop is used to traverse the array.
