An introduction to awk on Linux


Brief introduction

Awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then performs whatever processing you specify on those fields.

There are three versions of awk: awk, nawk, and gawk. Unless stated otherwise, awk here generally means gawk, the GNU version of awk.

Awk takes its name from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. Awk is in fact a language in its own right, the AWK programming language, which its three creators formally described as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

How to use

awk 'pattern {action}' {filenames}

Although the operations can be complex, the syntax is always the same: pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) need not always appear in the program, but they are used to group a series of instructions under a particular pattern. The pattern is typically a regular expression, written between slashes.

The most basic function of awk is to browse a file or string and extract information according to specified rules; only after extracting that information can awk carry out other text operations. A complete awk script is typically used to format the information in a text file.

Typically, awk processes a file one line at a time. It takes each line of the file in turn and executes the appropriate commands on it.

Invoking awk

There are three ways to invoke awk.

1. Command line mode

awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.

In awk, each item on a line, as delimited by the field separator, is called a field. If the -F option is not given, the default field separator is whitespace.

2. Shell script mode

Put all the awk commands in a file and make that file executable, with the awk interpreter named on the first line of the script. The program is then run by typing the script name.

This corresponds to the first line of a shell script: #!/bin/sh

which is replaced by: #!/bin/awk -f

3. Put all the awk commands in a separate file, and then call:

awk -f awk-script-file input-file(s)

Here the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above. This chapter focuses on the command-line approach.
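The three invocation styles can be compared side by side. The following is a minimal sketch, not a definitive recipe: the file names sample.txt and names.awk and the sample data are made up for illustration, and the interpreter path is looked up with command -v because awk is not installed at /bin/awk on every system.

```shell
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: colon-separated account name and shell.
printf 'root:/bin/bash\ndaemon:/bin/sh\n' > sample.txt

# 1. Command-line mode: the program is given inline.
inline_out=$(awk -F ':' '{print $1}' sample.txt)

# 2. Executable script: the first line names the awk interpreter with -f.
awk_path=$(command -v awk)
printf '#!%s -f\nBEGIN { FS = ":" }\n{ print $1 }\n' "$awk_path" > names.awk
chmod +x names.awk
script_out=$(./names.awk sample.txt)

# 3. Separate program file loaded explicitly with -f.
file_out=$(awk -f names.awk sample.txt)

echo "$inline_out"
```

All three variables should hold the same two names. Inside the script file the separator is set with FS in a BEGIN block, since there is no command line on which to pass -F.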

Getting started examples

Suppose the output of last -n 5 is as follows:

The code is as follows:

[root@www ~]# last -n 5 <== show only the first five lines
root pts/1 192.168.1.100 Tue Feb 11:21 still in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)

To display only the accounts of the five most recent logins:

# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root

The workflow is as follows: awk reads one record, delimited by the '\n' newline character, splits the record into fields by the specified field separator, and fills in the fields: $0 represents all the fields, $1 the first field, and $n the nth field. The default field separator is a space or a [tab], so $1 is the logged-in user, $3 is the logged-in user's IP, and so on.
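The field splitting described above can be checked on a single record. This is a small sketch that feeds awk one line in the format of the last output shown earlier; the variable names are chosen here for illustration only.

```shell
# One sample record in the format of the `last` output.
line='root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)'

user=$(printf '%s\n' "$line" | awk '{print $1}')    # first field
ip=$(printf '%s\n' "$line" | awk '{print $3}')      # third field
nfields=$(printf '%s\n' "$line" | awk '{print NF}') # number of fields
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # the entire record

echo "user=$user ip=$ip fields=$nfields"
```

With whitespace as the separator this record splits into ten fields, and $0 still holds the unmodified line.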

To display only the accounts in /etc/passwd:

The code is as follows:

# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys

This is an example of awk + action, where the action {print $1} is executed for each line.

-F specifies that the field separator is ':'.

To display only the /etc/passwd accounts and the shell for each account, separated by a tab:

The code is as follows:

# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh

To display only the /etc/passwd accounts and the shell for each account, separated by a comma, with the column header name,shell added as the first line and "blue,/bin/nosh" added as the last line:

The code is as follows:

# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

The awk workflow is this: BEGIN is executed first, and then the file is read. awk reads one record, delimited by the \n newline character, splits the record into fields by the specified field separator, and fills in the fields ($0 represents all the fields, $1 the first field, and $n the nth field). It then runs the action that corresponds to the pattern and moves on to the second record. Once all the records have been read, the END operation is executed.
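The order of BEGIN, the per-record action, and END can be seen in a short run. The sketch below uses two made-up lines in /etc/passwd format rather than the real file, and the total line in END is added here only to show that NR is still available after the last record.

```shell
# Hypothetical sample data in /etc/passwd format (not read from the real file).
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

report=$(printf '%s\n' "$passwd_sample" | awk -F ':' '
    BEGIN { print "name,shell" }   # runs once, before any input is read
          { print $1 "," $7 }      # runs for every record
    END   { print "total:" NR }    # runs once, after the last record
')

echo "$report"
```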

Search /etc/passwd for all lines containing the keyword root

The code is as follows:

# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern: only lines that match the pattern (root here) have the action executed. No action is specified, so the default is to print the whole line.

The search supports regular expressions. For example, to find lines that start with root: awk -F: '/^root/' /etc/passwd

Search /etc/passwd for all lines containing the keyword root and display the corresponding shell

The code is as follows:

# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash

Here the action {print $7} is specified.
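The difference between an unanchored pattern, an anchored pattern, and a pattern with an action shows up as soon as root appears somewhere other than the account name. The following sketch uses made-up sample lines in /etc/passwd format; the operator entry is included only because its home directory contains the string root.

```shell
# Hypothetical sample data; the third line mentions /root but is not the root account.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
operator:x:11:0:operator:/root:/sbin/nologin'

# Pattern only: every line containing "root" is printed whole.
any_root=$(printf '%s\n' "$passwd_sample" | awk -F ':' '/root/')

# Count how many lines that unanchored pattern matched.
match_count=$(printf '%s\n' "$passwd_sample" | awk -F ':' '/root/ {n++} END {print n}')

# Anchored pattern plus action: only lines starting with "root", print the shell.
root_shell=$(printf '%s\n' "$passwd_sample" | awk -F ':' '/^root/ {print $7}')

echo "$match_count lines mention root; root uses $root_shell"
```

The unanchored /root/ matches two of the three lines, while /^root/ matches only the root account.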

awk built-in variables

awk has a number of built-in variables that hold environment information. These variables can be changed; some of the most commonly used are listed below.

ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Array of system environment variables
FILENAME  Name of the file awk is currently reading
FNR       Number of records read from the current file
FS        Input field separator, equivalent to the command-line -F option
NF        Number of fields in the current record
NR        Number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator

In addition, the $0 variable refers to the entire record. $1 represents the first field of the current line, $2 represents the second field of the current line, and so on.

Print statistics for /etc/passwd: the file name, the line number of each line, the number of columns in each line, and the full content of the line:

The code is as follows:

# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print can make the code simpler and easier to read:

The code is as follows:

awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
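NR and FNR differ only when awk is given more than one file, which a single-file example cannot show. The sketch below is an illustration under assumed inputs: the two small files one.txt and two.txt are created here in a temporary directory and are not part of the original article.

```shell
# Create two small colon-separated files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a:b:c\nd:e\n' > one.txt
printf 'f:g:h:i\n' > two.txt

# NR keeps counting across files; FNR restarts at 1 for each file.
summary=$(awk -F ':' '{printf("%s NR=%d FNR=%d NF=%d\n", FILENAME, NR, FNR, NF)}' one.txt two.txt)

echo "$summary"
```

On the single line of two.txt, NR is 3 while FNR is back to 1, and NF reflects the four fields on that line.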

print and printf

awk provides two output functions: print and printf.

The arguments to print can be variables, numeric values, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. The comma here stands for the output field separator, which is a space by default.

printf is essentially the same as printf in the C language and produces formatted output. When the output is complex, printf works better and the code is easier to understand.
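The effect of the comma in print, and the contrast with printf, is easy to check on a two-field line. This is a minimal sketch; the input strings and the format widths are arbitrary choices for illustration.

```shell
# Comma between arguments: fields are joined by OFS (a space by default).
with_comma=$(echo 'a b' | awk '{print $1, $2}')

# No comma: the arguments are concatenated directly.
no_comma=$(echo 'a b' | awk '{print $1 $2}')

# Changing OFS changes what the comma produces.
custom_ofs=$(echo 'a b' | awk 'BEGIN {OFS = "-"} {print $1, $2}')

# printf gives explicit control over width, padding, and the newline.
formatted=$(echo 'a 7' | awk '{printf("%-5s|%03d\n", $1, $2)}')

echo "$with_comma / $no_comma / $custom_ofs / $formatted"
```

Unlike print, printf does not append a newline on its own, which is why the format string ends in \n.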

Notes on everyday awk usage:

# Print the lines that appear in both files

The code is as follows:

awk 'NR==FNR{a[$0]=0;next}{if ($0 in a) {print $0}}' file1 file2

# Print the lines of file2 that do not appear in file1

The code is as follows:

awk 'NR==FNR{a[$0]=0;next}{if (!($0 in a)) {print $0}}' file1 file2
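Both two-file commands rely on the same idiom: NR==FNR is true only while the first file is being read, so that block records every line of file1 and next skips the rest of the program; the remaining rule then runs only for file2. The sketch below applies it to two made-up fruit lists, with the array named seen for readability.

```shell
# Create two small sample files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\ndate\n' > file2

# Lines of file2 that also appear in file1.
common=$(awk 'NR==FNR {seen[$0]; next} $0 in seen' file1 file2)

# Lines of file2 that do not appear in file1.
only_in_file2=$(awk 'NR==FNR {seen[$0]; next} !($0 in seen)' file1 file2)

echo "common: $common"
echo "only in file2: $only_in_file2"
```

Merely referencing seen[$0] creates the array element, so no value needs to be assigned, and a pattern with no action prints the matching line.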

# Compute the top 10 IPs by number of requests in an nginx access log

The code is as follows:

awk '{a[$1]++}END{for (i in a) print a[i],i}' access.log | sort -rn | head -10
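The counting step uses an associative array keyed by the first field, and the ordering is left to sort because for (i in a) visits keys in no guaranteed order. Here is a small sketch on made-up log lines; the addresses and request strings are invented, and only the first field matters to the program.

```shell
# Hypothetical access-log lines; the client IP is the first field.
log='10.0.0.1 - - "GET /"
10.0.0.2 - - "GET /a"
10.0.0.1 - - "GET /b"
10.0.0.3 - - "GET /"
10.0.0.1 - - "GET /c"
10.0.0.2 - - "GET /d"'

# Count requests per IP, then sort by count (descending) and keep the top two.
top=$(printf '%s\n' "$log" \
    | awk '{hits[$1]++} END {for (ip in hits) print hits[ip], ip}' \
    | sort -rn | head -2)

echo "$top"
```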

# Count the occurrences of each subject

The code is as follows:

# cat test.txt
XQQ Chinese Mathematics
XQ English language
X Mathematics Art

awk '{for (i=2;i<=NF;i++) a[$i]++}END{for (i in a) print i,a[i]}' test.txt

# Get the system IP address

The code is as follows:

ifconfig eth0 | awk 'NR==2{print $2}' | cut -d: -f2
