The structure of an AWK program is as follows:
awk 'BEGIN{print "START"} pattern {commands} END{print "END"}' file
An awk program consists of three parts: a BEGIN statement block, an END statement block, and generic statements guarded by a pattern (a regular expression or relational expression). Any of the three parts can be omitted. The BEGIN statement block executes at the very beginning of the program and usually does initialization work; the END statement block executes at the end of the program and usually formats the final results. The pattern part works as follows: awk reads a line, checks whether the line matches the given pattern, and if it matches, executes the statements in the corresponding {} block.
Simply put, the BEGIN statement block (if any) executes first; then the data in the file is checked line by line against the pattern, and for each line that matches, the statements in {} are executed (nothing is done for lines that don't match); finally, the END statement block (if any) executes at the end of the program.
You can also feed awk data through a pipe, as follows:
cat file | awk 'BEGIN{print "START"} pattern {commands} END{print "END"}'
2.2 Hello, World
Let's take a look at how to print Hello, World in awk. As mentioned above, an AWK program consists of three parts (BEGIN statement block, pattern, END statement block), any of which can be omitted. The simplest ways to print Hello, World are as follows:
Only the BEGIN statement block:
echo | awk 'BEGIN{print "Hello, World"}'
Only the END statement block:
echo | awk 'END{print "Hello, World"}'
Only the pattern part:
Here the pattern is empty, so there is only a {} statement block; awk therefore considers every input line to match the pattern and executes the contents of the {} block.
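To make the pattern-only case concrete, here is a minimal sketch: with an empty pattern, the {} block runs for every input line, including the single empty line that echo produces.

```shell
# Empty pattern: every input line matches, so the action block runs
# once for the one (empty) line that echo emits.
echo | awk '{print "Hello, World"}'
```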
2.3 In-depth understanding of three statement blocks
Again, awk executes the BEGIN statement block before processing any input, then reads a line of data, checks whether the line satisfies the pattern, and if so executes the contents of the {} block corresponding to that pattern; it then reads the next line and performs the pattern check again, until all the data has been read. At the end of the program, it executes the statements in the END block. A standard awk program looks like this:
seq 5 | awk 'BEGIN{print "BEGIN"} $1 > 2 {print} END{print "END"}'
Output result:
BEGIN
3
4
5
END
The program first executes the BEGIN statement block and outputs BEGIN. For each line, it then checks whether the pattern $1 > 2 (explained later) is satisfied; if so, it executes the statement in the {} block, that is, print. In awk, print with nothing after it prints the contents of the current line. The END statement block executes at the end of the program and prints END.
2.4 Special variables in awk
AWK handles text very conveniently because it provides a number of built-in features and variables that allow us to easily manipulate the data.
NR: the number of records read so far (Number of Records); during execution it equals the current line number
NF: the number of fields in the current line (Number of Fields)
$0: a variable containing the textual content of the current line during execution
$1: the contents of the first field of the current line
$2: the contents of the second field of the current line
FS: the input field separator, similar to the -d option of the cut command and the -t option of the sort command
OFS: the output field separator; the default is a single space
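A minimal sketch tying these variables together (the two-line input here is made up for illustration):

```shell
# NR is the current line number, NF the field count, and $1 the first
# field of each line.
printf 'a b c\nd e\n' | awk '{print NR, NF, $1}'
# prints: 1 3 a
#         2 2 d

# Setting OFS changes the separator that a comma in print produces.
echo 'x y' | awk 'BEGIN{OFS="-"} {print $1, $2}'
# prints: x-y
```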
2.5 Examples
Here are a few simple examples, assuming you have a file emp.data with the following data:
Beth  4.00  0
Dan   3.75  0
Kathy 4.00  10
Mark  5.00  20
Mary  5.50  22
Susie 4.25  18
Here, the first column is the employee's name, the second column is the hourly rate, and the third column is the hours worked. To get the pay of each employee, you only need the following statement:
awk '$3 > 0 {print $1, $2 * $3}' emp.data
Since awk has already split the line into fields for us, we just reference them directly: the variable $1 means the first field of the current line, that is, the name, and the variable $3 refers to the third field, that is, the hours worked.
If you want to see which employees are not working, that is, employees who worked 0 hours, just do this:
awk '$3 == 0 {print}' emp.data
or this:
awk '$3 == 0 {print $0}' emp.data
The two statements above are for demonstration: print without any arguments prints the entire contents of the current line, and $0 also represents the entire contents of the current line, so the two statements are equivalent. If you want to see only the names of the employees who worked 0 hours:
awk '$3 == 0 {print $1}' emp.data
You can also print the line number, to make it easy to see how many employees are not working:
awk '$3 == 0 {print NR, ":\t", $1}' emp.data
If you just want to see Kathy's pay:
awk '$1 == "Kathy" {print $2 * $3}' emp.data
or use a regular expression:
awk '$1 ~ /Kathy/ {print $1, $2 * $3}' emp.data
Output can also be formatted with printf in awk; printf is used in the same way as printf in the C language, for example:
awk '$3 > 0 {printf("Total pay for %s is %.2f\n", $1, $2 * $3)}' emp.data
Patterns can also be combined, for example:
awk '$2 >= 4 || $3 >= 20 {print $0}' emp.data
The output is as follows:
Beth  4.00  0
Kathy 4.00  10
Mark  5.00  20
Mary  5.50  22
Susie 4.25  18
2.6 Data validation
Patterns are typically used to select the data that needs to be processed; for example, above we used $3 > 0 to select employees whose hours worked are not 0. Patterns can also be used like this:
awk 'NR < 5 {}'            # process only the first 4 lines (line number less than 5)
awk 'NR == 1, NR == 4 {}'  # lines numbered between 1 and 4
awk '/linux/ {}'           # lines that contain the pattern linux
awk '!/linux/ {}'          # lines that do not contain the pattern linux
In addition, patterns can be used to validate the data, as follows:
NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
2.7 The function of the BEGIN statement block
The BEGIN statement block is typically used to output header information, or to do some processing beforehand. For example, we can print a header like this:
awk 'BEGIN{print "NAME RATE HOURS"; print ""} {print}' emp.data
The output is as follows:
NAME RATE HOURS

Beth  4.00  0
Dan   3.75  0
Kathy 4.00  10
Mark  5.00  20
Mary  5.50  22
Susie 4.25  18
A more common usage is to set FS in the BEGIN statement block. FS is the field separator; as mentioned earlier, it is similar to the -d option of the cut command and the -t option of the sort command. For example, to view each user and their home directory using cut:
cut -d: -f1,6 /etc/passwd
It's also easy with awk:
awk 'BEGIN{FS=":"} {print $1, $6}' /etc/passwd
We can also put the awk program into a file:
# print.awk - print user and home dir
BEGIN { FS = ":" }
{ print $1, $6 }
Then to execute it, you only need to enter the following command:
awk -f print.awk /etc/passwd
or:
cat /etc/passwd | awk -f print.awk
2.8 End Statement Block
The END statement block is primarily used to output aggregated information. For example, to find out exactly how many employees there are:
awk 'END {print NR, "employees"}' emp.data
We can also easily get the total pay and the average pay per employee:
# count.awk - compute the average pay
{ pay = pay + $2 * $3 }
END { print NR, "employees"
      print "total pay is", pay
      print "average pay is", pay/NR }
awk -f count.awk emp.data
The output is as follows:
6 employees
total pay is 337.5
average pay is 56.25
2.9 Control Statements in awk
Control statements in awk have the same syntax as in C, except that there is no switch statement.
# reverse - print input in reverse order by line
{ line[NR] = $0 }   # remember each input line
END { i = NR
      while (i > 0) {
          print line[i]
          i--
      }
}

# reverse - print input in reverse order by line
{ line[NR] = $0 }   # remember each input line
END { for (i = NR; i > 0; i--)
          print line[i]
}
In addition, the for loop is slightly different: in awk, it comes in two forms:
for (i = 1; i <= NF; i++) { print $i }   # C-style counting loop
for (i in array) { print array[i] }      # loop over the subscripts of an array
2.10 Associative arrays
The so-called associative array is an array whose subscripts can be either numbers or strings. To demonstrate the use of associative arrays, consider the following examples. Suppose there is a file named file with the following data:
item1,200
item2,500
item3,900
item2,800
item1,600
In file, some items appear several times. Suppose you want to sum the values of identical items:
awk -F, '{a[$1] += $2} END{for (i in a) print i "," a[i]}' file
The output is as follows:
item1,800
item2,1300
item3,900
There are two new points of knowledge here. First, when using associative arrays, we don't need to initialize the array, and we don't have to know how many elements it holds, because we can output the contents of the array with a second for loop (the for-in form). Second, in the print statement the items are not separated by commas; instead the fields are connected directly, and adjacent strings are concatenated in awk, as follows:
str = "Hello" "World" "!"
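As a quick runnable sketch of this concatenation-by-juxtaposition rule (the strings are arbitrary):

```shell
# Adjacent string constants are concatenated with no operator between them.
awk 'BEGIN { str = "Hello" "World" "!"; print str }'
# prints: HelloWorld!
```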
If we just want to print the largest value for each item, instead of adding them up, we can do the following:
awk -F, '{if (a[$1] < $2) a[$1] = $2} END{for (i in a) print i, a[i]}' OFS=, file
The output is as follows:
item1,600
item2,800
item3,900
To count the number of occurrences of each item:
awk -F, '{a[$1]++} END{for (i in a) print i, a[i]}' file
The output is as follows:
item1 2
item2 2
item3 1
To print only the first occurrence of each item:
awk -F, '!a[$1]++' file
The output is as follows:
item1,200
item2,500
item3,900
Here there is only the pattern, without a {} statement block; the default action is to print the line.
2.11 Built-in functions in AWK
AWK provides a number of built-in functions to help users work more efficiently.
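Since no examples are given here, below is a small sketch of a few common built-ins (length, substr, toupper, and sqrt, all in POSIX awk); the input line is made up for illustration:

```shell
# length($0): length of the line; substr(s, m, n): n characters of s
# starting at position m; toupper(s): uppercase copy; sqrt(x): square root.
echo 'hello world' | awk '{print length($0), substr($1, 1, 4), toupper($2), sqrt(16)}'
# prints: 11 hell WORLD 4
```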
3. A handful of useful one-liners
Print the total number of input lines:
END {print NR}
Print the tenth input line:
NR == 10
Print the last field of every input line
{print $NF}
Print the last field of the last input line
{field = $NF} END {print field}
Print every input line with more than four fields:
NF > 4
Print every input line in which the last field is more than 4:
$NF > 4
Print the total number of fields in all input lines:
{nf = nf + NF} END {print nf}
Print the total number of lines that contain Beth
/Beth/ {nlines = nlines + 1} END {print nlines}
Print the largest first field and the line that contains it (assumes some $1 is positive):
$1 > max {max = $1; maxline = $0} END {print max, maxline}
Print every line that has at least one field:
NF > 0
Print every line longer than 80 characters:
length($0) > 80
Print the number of fields in every line followed by the line itself:
{print NF, $0}
Print the first two fields, in opposite order, of every line:
{print $2, $1}
Exchange the first two fields of every line and then print the line:
{temp = $1; $1 = $2; $2 = temp; print}
Print every line with the first field replaced by the line number
{$1 = NR; print}
Print every line after erasing the second field:
{$2 = ""; print}
Print in reverse order the fields of every line
{ for (i = NF; i > 0; i--) printf("%s ", $i)
  print "" }
Print the sum of the fields of every line:
{ sum = 0
  for (i = 1; i <= NF; i++) sum = sum + $i
  print sum }
Add up all the fields in all lines and print the sum:
{ for (i = 1; i <= NF; i++) sum = sum + $i }
END { print sum }
Print every line after replacing each field by its absolute value:
{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i
  print }
4. Application of the AWK program
4.1 Count word Occurrences
The first program we analyze counts the number of occurrences of each word. Because awk provides associative arrays and initializes variables to 0 by default, awk solves this problem very conveniently. I discussed this problem in a previous article, where I tried to solve it using C, C++, and shell.
This problem is simplest to solve with awk, which also makes it easy to deal with punctuation. The program is as follows:
# wordfreq - print number of occurrences of each word
# input:  text
# output: number-word pairs sorted by number
{
    gsub(/[.,:;!?(){}]/, "")    # remove punctuation
    for (i = 1; i <= NF; i++)
        count[$i]++
}
END {
    for (w in count)
        print count[w], w | "sort -rn"
}
4.2 Data processing
Although awk is not all-powerful, it solves a lot of problems; what it is best at is data processing. Suppose we have the following data:
USSR     8649   275   Asia
Canada   3852    25   North America
China    3705  1032   Asia
USA      3615   237   North America
Brazil   3286   134   South America
India    1267   746   Asia
Mexico    762    78   North America
France    211    55   Europe
Japan     144   120   Asia
Germany    96    61   Europe
England    94    56   Europe
The first column is the country, the second is the country's area, the third is its population, and the last is the continent. If we want to sort in ascending order by continent name, and within each continent in descending order of each country's population density, what should we do?
I don't know how this would be done in Excel. A benefit of learning awk is that you can solve many of the problems you would otherwise use Excel for, without having to study Excel, and awk is clearly more flexible.
To solve this kind of problem, the basic idea in awk is: prepare the data, sort it (or perform other processing), and then format the output.
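The prepare-sort-format idea can be sketched as a pipeline. This is only an illustrative sketch, not the article's own solution: it assumes tab-separated fields, uses a small made-up subset of the data, and scales density as 1000*$3/$2 (an assumption, reasonable if areas are in thousands of square miles and populations in millions):

```shell
# Prepare: emit "continent <TAB> density <TAB> country" with awk,
# then sort ascending by continent and descending (numeric) by density.
printf 'USSR\t8649\t275\tAsia\nCanada\t3852\t25\tNorth America\nJapan\t144\t120\tAsia\n' |
awk 'BEGIN { FS = "\t" } { printf("%s\t%.1f\t%s\n", $4, 1000*$3/$2, $1) }' |
sort -t "$(printf '\t')" -k1,1 -k2,2rn
```

A final formatting stage (another awk that pretty-prints the sorted stream) would complete the pipeline.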
4.3 Markov chain algorithm
For the definition of the Markov chain algorithm, see this article, in which I discussed implementations of the algorithm in various languages. Here it simply serves to demonstrate the merits of awk; the program is as follows and is not discussed further.
# Copyright (C) 1999 Lucent Technologies
# Excerpted from 'The Practice of Programming'
# by Brian W. Kernighan and Rob Pike

# markov.awk: markov chain algorithm for 2-word prefixes
BEGIN { MAXGEN = 10000; NONWORD = "\n"; w1 = w2 = NONWORD }

{   for (i = 1; i <= NF; i++) {                 # read all words
        statetab[w1,w2,++nsuffix[w1,w2]] = $i
        w1 = w2
        w2 = $i
    }
}

END {
    statetab[w1,w2,++nsuffix[w1,w2]] = NONWORD  # add tail
    w1 = w2 = NONWORD
    for (i = 0; i < MAXGEN; i++) {              # generate
        r = int(rand()*nsuffix[w1,w2]) + 1      # nsuffix >= 1
        p = statetab[w1,w2,r]
        if (p == NONWORD)
            exit
        print p
        w1 = w2                                 # advance chain
        w2 = p
    }
}
Transferred from: http://www.lvtao.net/tool/awk.html