AWK Concise Tutorial

Source: Internet
Author: User

A few readers of my post from a couple of days ago, "Linux tips you should know", asked me to teach them awk and sed, and so this article appeared. I suspect that these young Gen Y friends may find ancient artifacts like awk and sed a little unfamiliar, so it falls to an old-timer like me to reheat some leftovers. Besides, awk came out of Bell Labs in 1977, a Year of the Snake, and it is about the same age as I am, so it deserves an article.

The name awk comes from the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. If you want to learn awk, one classic book has to be mentioned: The AWK Programming Language, which scores 9.4 on Douban! It sells for 1022.30 dollars on Amazon.

This tutorial does not try to be exhaustive. Like my earlier introduction to the Go language, it is all examples and basically no filler.

I only want to achieve two things:

1) You can finish reading it on the bus to work, or while sitting on the toilet (guaranteed to take no longer than one sitting).

2) I want this post to work like a striptease that stirs up your interest; after that, you have to get hands-on yourself.

Enough talk, let's start undressing (note: we only go as far as topless).

Getting on stage

I extracted the following information from the netstat command as a use case:

$ cat netstat.txt
Proto Recv-Q Send-Q Local-Address          Foreign-Address             State
tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
tcp        0      0 coolshell.cn:80        124.205.5.146:18245         TIME_WAIT
tcp        0      0 coolshell.cn:80        61.140.101.185:37538        FIN_WAIT2
tcp        0      0 coolshell.cn:80        110.194.134.189:1032        ESTABLISHED
tcp        0      0 coolshell.cn:80        123.169.124.111:49809       ESTABLISHED
tcp        0      0 coolshell.cn:80        116.234.127.77:11502        FIN_WAIT2
tcp        0      0 coolshell.cn:80        123.169.124.111:49829       ESTABLISHED
tcp        0      0 coolshell.cn:80        183.60.215.36:36970         TIME_WAIT
tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1
tcp        0      0 coolshell.cn:80        110.194.134.189:4796        ESTABLISHED
tcp        0      0 coolshell.cn:80        183.60.212.163:51082        TIME_WAIT
tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK
tcp        0      0 coolshell.cn:80        123.169.124.111:49840       ESTABLISHED
tcp        0      0 coolshell.cn:80        117.136.20.85:50025         FIN_WAIT2
tcp        0      0 :::22                  :::*                        LISTEN

The following is the simplest and most commonly used awk example. It prints the 1st and 4th columns:

    • The curly braces inside the single quotes hold the awk statements. Note that they must be enclosed in single quotes.
    • $1..$n refer to the 1st through nth columns. Note: $0 refers to the whole line.

$ awk '{print $1, $4}' netstat.txt
Proto Local-Address
tcp 0.0.0.0:3306
tcp 0.0.0.0:80
tcp 127.0.0.1:9000
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp :::22
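
As a quick check of the field variables, here is a minimal sketch. The one-line record is a made-up sample fed on standard input, and the shell variable names are only for illustration. NF is awk's built-in count of fields, so $NF is the last field.

```shell
# Hypothetical sample record, fed to awk on stdin.
line='tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN'

# $1 and $4 are the first and fourth fields.
first_and_fourth=$(echo "$line" | awk '{print $1, $4}')

# $0 is the whole line; $NF is the last field (NF = number of fields).
whole=$(echo "$line" | awk '{print $0}')
last=$(echo "$line" | awk '{print $NF}')

echo "$first_and_fourth"
echo "$whole"
echo "$last"
```

The comma in print puts the output field separator (a space by default) between the two values.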

Let's look at awk's formatted output, which works just like printf in C:

$ awk '{printf "%-8s %-8s %-8s %-18s %-22s %-15s\n",$1,$2,$3,$4,$5,$6}' netstat.txt
Proto    Recv-Q   Send-Q   Local-Address      Foreign-Address        State
tcp      0        0        0.0.0.0:3306       0.0.0.0:*              LISTEN
tcp      0        0        0.0.0.0:80         0.0.0.0:*              LISTEN
tcp      0        0        127.0.0.1:9000     0.0.0.0:*              LISTEN
tcp      0        0        coolshell.cn:80    124.205.5.146:18245    TIME_WAIT
tcp      0        0        coolshell.cn:80    61.140.101.185:37538   FIN_WAIT2
tcp      0        0        coolshell.cn:80    110.194.134.189:1032   ESTABLISHED
tcp      0        0        coolshell.cn:80    123.169.124.111:49809  ESTABLISHED
tcp      0        0        coolshell.cn:80    116.234.127.77:11502   FIN_WAIT2
tcp      0        0        coolshell.cn:80    123.169.124.111:49829  ESTABLISHED
tcp      0        0        coolshell.cn:80    183.60.215.36:36970    TIME_WAIT
tcp      0        4166     coolshell.cn:80    61.148.242.38:30901    ESTABLISHED
tcp      0        1        coolshell.cn:80    124.152.181.209:26825  FIN_WAIT1
tcp      0        0        coolshell.cn:80    110.194.134.189:4796   ESTABLISHED
tcp      0        0        coolshell.cn:80    183.60.212.163:51082   TIME_WAIT
tcp      0        1        coolshell.cn:80    208.115.113.92:50601   LAST_ACK
tcp      0        0        coolshell.cn:80    123.169.124.111:49840  ESTABLISHED
tcp      0        0        coolshell.cn:80    117.136.20.85:50025    FIN_WAIT2
tcp      0        0        :::22              :::*                   LISTEN

Take off the coat: filtering records

Let's see how to filter records (the condition below is: the third column equals 0 and the 6th column is LISTEN):

$ awk '$3==0 && $6=="LISTEN" ' netstat.txt
tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
tcp        0      0 :::22                  :::*                        LISTEN

Here "==" is the comparison operator. The other comparison operators are: !=, >, <, >=, <=
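
To try the other operators, here is a small sketch on a made-up four-line sample (a header plus three records in the style of netstat.txt; the shell variable names are only for illustration). One point worth knowing: awk compares numerically when both sides look like numbers and as strings otherwise, so a text header such as "Send-Q" can satisfy $3 > 0.

```shell
# Hypothetical sample data: a header line plus three connection records.
data='Proto Recv-Q Send-Q Local-Address Foreign-Address State
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK'

# != : every data row whose state is not LISTEN (NR>1 skips the header).
not_listen=$(echo "$data" | awk 'NR>1 && $6 != "LISTEN" {print $6}')

# "Send-Q" is not a number, so it is compared with 0 as a string and matches.
with_header=$(echo "$data" | awk '$3 > 0 {print $3}')

# Adding 0 forces a numeric comparison; the header becomes 0 and drops out.
numeric_only=$(echo "$data" | awk '$3+0 > 0 {print $3}')

echo "$not_listen"
echo "$with_header"
echo "$numeric_only"
```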

Let's look at a few more ways to filter records:

$ awk ' $3>0 {print $0}' netstat.txt
Proto Recv-Q Send-Q Local-Address          Foreign-Address             State
tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1
tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK

If we need the table header, we can bring in the built-in variable NR:

$ awk '$3==0 && $6=="LISTEN" || NR==1 ' netstat.txt
Proto Recv-Q Send-Q Local-Address          Foreign-Address             State
tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
tcp        0      0 :::22                  :::*                        LISTEN

Now add formatted output:

$ awk '$3==0 && $6=="LISTEN" || NR==1 {printf "%-20s %-20s %s\n",$4,$5,$6}' netstat.txt
Local-Address        Foreign-Address      State
0.0.0.0:3306         0.0.0.0:*            LISTEN
0.0.0.0:80           0.0.0.0:*            LISTEN
127.0.0.1:9000       0.0.0.0:*            LISTEN
:::22                :::*                 LISTEN

Built-in variables

Speaking of built-in variables, here are some of awk's built-in variables:

$0        the current record (this variable holds the entire line)
$1~$n     the nth field of the current record; fields are separated by FS
FS        the input field separator; the default is a space or tab
NF        the number of fields in the current record, i.e. how many columns it has
NR        the number of records read so far, i.e. the line number, starting from 1; with multiple files, this value keeps accumulating
FNR       the record number within the current file; unlike NR, this is each file's own line number
RS        the input record separator; the default is a newline
OFS       the output field separator; the default is also a space
ORS       the output record separator; the default is a newline
FILENAME  the name of the current input file
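
The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two throwaway files (one.txt and two.txt are made-up names) created in a temporary directory:

```shell
# Two hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n'          > "$dir/two.txt"

# For each record print: NR (running count), FNR (count within the current
# file), NF (number of fields), and FILENAME with its directory stripped.
report=$(awk '{ n = split(FILENAME, parts, "/"); print NR, FNR, NF, parts[n] }' \
    "$dir/one.txt" "$dir/two.txt")

echo "$report"
rm -r "$dir"
```

NR keeps counting across both files, while FNR starts again at 1 when the second file begins.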

How do we use them? For example, to print line numbers:

$ awk '$3==0 && $6=="ESTABLISHED" || NR==1 {printf "%02d %s %-20s %-20s %s\n",NR, FNR, $4,$5,$6}' netstat.txt
01 1 Local-Address        Foreign-Address      State
07 7 coolshell.cn:80      110.194.134.189:1032 ESTABLISHED
08 8 coolshell.cn:80      123.169.124.111:49809 ESTABLISHED
10 10 coolshell.cn:80      123.169.124.111:49829 ESTABLISHED
14 14 coolshell.cn:80      110.194.134.189:4796 ESTABLISHED
17 17 coolshell.cn:80      123.169.124.111:49840 ESTABLISHED

Specifying the delimiter

$ awk 'BEGIN{FS=":"} {print $1,$3,$6}' /etc/passwd
root 0 /root
bin 1 /bin
daemon 2 /sbin
adm 3 /var/adm
lp 4 /var/spool/lpd
sync 5 /sbin
shutdown 6 /sbin
halt 7 /sbin

The command above is equivalent to the following (-F specifies the delimiter):

$ awk -F: '{print $1,$3,$6}' /etc/passwd

Note: to specify more than one delimiter, you can do this:

awk -F '[;:]'
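
A minimal sketch of the multi-delimiter form, using a made-up record that mixes both characters (the record and variable names are only for illustration):

```shell
# Hypothetical record that mixes two delimiters, ';' and ':'.
record='alice;1001:/home/alice'

# The bracket expression [;:] makes either character a field separator.
fields=$(echo "$record" | awk -F '[;:]' '{print $1, $2, $3}')

echo "$fields"
```

Because the value given to -F is treated as a regular expression when it is longer than one character, any regex can serve as the separator.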

Now an example that uses \t as the output delimiter (the /etc/passwd file used below is delimited by colons):

$ awk -F: '{print $1,$3,$6}' OFS="\t" /etc/passwd
root    0       /root
bin     1       /bin
daemon  2       /sbin
adm     3       /var/adm
lp      4       /var/spool/lpd
sync    5       /sbin
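
One detail about OFS that is easy to trip over: it is used between the comma-separated arguments of print, but $0 itself is only rebuilt with OFS after a field has been assigned. A small sketch on a made-up passwd-style record; here OFS is set with the -v option, which assigns the variable before any input is read:

```shell
# Hypothetical colon-delimited record in the style of /etc/passwd.
record='root:x:0:0:root:/root:/bin/bash'

# Commas in print are replaced by OFS.
picked=$(echo "$record" | awk -F: -v OFS='-' '{print $1, $3, $6}')

# print $0 shows the original line untouched ...
unchanged=$(echo "$record" | awk -F: -v OFS='-' '{print $0}')

# ... until a field is assigned, which rebuilds $0 using OFS.
rebuilt=$(echo "$record" | awk -F: -v OFS='-' '{$1 = $1; print $0}')

echo "$picked"
echo "$unchanged"
echo "$rebuilt"
```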

Take off the shirt: string matching

Let's look at some examples of string matching:

$ awk '$6 ~ /FIN/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1       Local-Address   Foreign-Address State
6       coolshell.cn:80 61.140.101.185:37538    FIN_WAIT2
9       coolshell.cn:80 116.234.127.77:11502    FIN_WAIT2
13      coolshell.cn:80 124.152.181.209:26825   FIN_WAIT1
18      coolshell.cn:80 117.136.20.85:50025     FIN_WAIT2

$ awk '$6 ~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1       Local-Address   Foreign-Address State
5       coolshell.cn:80 124.205.5.146:18245     TIME_WAIT
6       coolshell.cn:80 61.140.101.185:37538    FIN_WAIT2
9       coolshell.cn:80 116.234.127.77:11502    FIN_WAIT2
11      coolshell.cn:80 183.60.215.36:36970     TIME_WAIT
13      coolshell.cn:80 124.152.181.209:26825   FIN_WAIT1
15      coolshell.cn:80 183.60.212.163:51082    TIME_WAIT
18      coolshell.cn:80 117.136.20.85:50025     FIN_WAIT2

The first example matches the FIN states, and the second matches the states that contain WAIT. The ~ operator introduces a pattern match, and the pattern itself goes between the two slashes (/.../). This is simply regular-expression matching.
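
Since the pattern is a regular expression, the usual anchors work too. A short sketch on a made-up list of state names (one per line, so $1 is the state; the variable names are only for illustration):

```shell
# Hypothetical state column values, one per line.
states='LISTEN
TIME_WAIT
FIN_WAIT1
FIN_WAIT2
ESTABLISHED'

# /WAIT/ matches anywhere in the field.
any_wait=$(echo "$states" | awk '$1 ~ /WAIT/')

# ^ anchors the match to the start of the field, $ to the end.
starts_fin=$(echo "$states" | awk '$1 ~ /^FIN/')
ends_ed=$(echo "$states" | awk '$1 ~ /ED$/')

echo "$any_wait"
echo "$starts_fin"
echo "$ends_ed"
```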

In fact, awk can also match against the whole line, the way grep does, like this:

$ awk '/LISTEN/' netstat.txt
tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
tcp        0      0 :::22                  :::*                        LISTEN

We can use "/FIN|TIME/" to match FIN or TIME:

$ awk '$6 ~ /FIN|TIME/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1       Local-Address   Foreign-Address State
5       coolshell.cn:80 124.205.5.146:18245     TIME_WAIT
6       coolshell.cn:80 61.140.101.185:37538    FIN_WAIT2
9       coolshell.cn:80 116.234.127.77:11502    FIN_WAIT2
11      coolshell.cn:80 183.60.215.36:36970     TIME_WAIT
13      coolshell.cn:80 124.152.181.209:26825   FIN_WAIT1
15      coolshell.cn:80 183.60.212.163:51082    TIME_WAIT
18      coolshell.cn:80 117.136.20.85:50025     FIN_WAIT2

Now an example of negating a pattern:

$ awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1       Local-Address   Foreign-Address State
2       0.0.0.0:3306    0.0.0.0:*       LISTEN
3       0.0.0.0:80      0.0.0.0:*       LISTEN
4       127.0.0.1:9000  0.0.0.0:*       LISTEN
7       coolshell.cn:80 110.194.134.189:1032    ESTABLISHED
8       coolshell.cn:80 123.169.124.111:49809   ESTABLISHED
10      coolshell.cn:80 123.169.124.111:49829   ESTABLISHED
12      coolshell.cn:80 61.148.242.38:30901     ESTABLISHED
14      coolshell.cn:80 110.194.134.189:4796    ESTABLISHED
16      coolshell.cn:80 208.115.113.92:50601    LAST_ACK
17      coolshell.cn:80 123.169.124.111:49840   ESTABLISHED
19      :::22   :::*    LISTEN

Or:

awk '!/WAIT/' netstat.txt

Splitting files

Splitting a file with awk is simple: just use redirection. The following example splits the file by the 6th column, which is quite simple (NR!=1 means the table header is not processed).

$ awk 'NR!=1{print > $6}' netstat.txt

$ ls
ESTABLISHED  FIN_WAIT1  FIN_WAIT2  LAST_ACK  LISTEN  netstat.txt  TIME_WAIT

$ cat ESTABLISHED
tcp        0      0 coolshell.cn:80        110.194.134.189:1032        ESTABLISHED
tcp        0      0 coolshell.cn:80        123.169.124.111:49809       ESTABLISHED
tcp        0      0 coolshell.cn:80        123.169.124.111:49829       ESTABLISHED
tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
tcp        0      0 coolshell.cn:80        110.194.134.189:4796        ESTABLISHED
tcp        0      0 coolshell.cn:80        123.169.124.111:49840       ESTABLISHED

$ cat FIN_WAIT1
tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1

$ cat FIN_WAIT2
tcp        0      0 coolshell.cn:80        61.140.101.185:37538        FIN_WAIT2
tcp        0      0 coolshell.cn:80        116.234.127.77:11502        FIN_WAIT2
tcp        0      0 coolshell.cn:80        117.136.20.85:50025         FIN_WAIT2

$ cat LAST_ACK
tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK

$ cat LISTEN
tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
tcp        0      0 :::22                  :::*                        LISTEN

$ cat TIME_WAIT
tcp        0      0 coolshell.cn:80        124.205.5.146:18245         TIME_WAIT
tcp        0      0 coolshell.cn:80        183.60.215.36:36970         TIME_WAIT
tcp        0      0 coolshell.cn:80        183.60.212.163:51082        TIME_WAIT

You can also write only selected columns to the files:

awk 'NR!=1{print $4,$5 > $6}' netstat.txt
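
If you want to try redirection without littering the current directory, here is a sketch that writes into a temporary directory. The sample file, the dir variable, and passing it in with -v are choices made for this illustration. When the target file name is built by concatenation, wrapping it in parentheses keeps the expression unambiguous.

```shell
# Throwaway directory holding a hypothetical sample: header plus three records.
dir=$(mktemp -d)
printf '%s\n' \
    'Proto Recv-Q Send-Q Local-Address Foreign-Address State' \
    'tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN' \
    'tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT' \
    'tcp 0 0 :::22 :::* LISTEN' > "$dir/sample.txt"

# Write column 4 of each record to a file named after column 6.
# Inside awk, ">" truncates a file the first time it is opened and then
# keeps appending to it for the rest of the run.
awk -v dir="$dir" 'NR != 1 { print $4 > (dir "/" $6) }' "$dir/sample.txt"

listen=$(cat "$dir/LISTEN")
time_wait=$(cat "$dir/TIME_WAIT")
echo "$listen"
echo "$time_wait"
rm -r "$dir"
```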

A slightly more complicated one (note the if-else-if statement; as you can see, awk is really a script interpreter):

$ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt";
else if($6 ~ /LISTEN/) print > "2.txt";
else print > "3.txt" }' netstat.txt

$ ls ?.txt
1.txt  2.txt  3.txt

$ cat 1.txt
tcp        0      0 coolshell.cn:80        124.205.5.146:18245         TIME_WAIT
tcp        0      0 coolshell.cn:80        110.194.134.189:1032        ESTABLISHED
tcp        0      0 coolshell.cn:80        123.169.124.111:49809       ESTABLISHED
tcp        0      0 coolshell.cn:80        123.169.124.111:49829       ESTABLISHED
tcp        0      0 coolshell.cn:80        183.60.215.36:36970         TIME_WAIT
tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
tcp        0      0 coolshell.cn:80        110.194.134.189:4796        ESTABLISHED
tcp        0      0 coolshell.cn:80        183.60.212.163:51082        TIME_WAIT
tcp        0      0 coolshell.cn:80        123.169.124.111:49840       ESTABLISHED

$ cat 2.txt
tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
tcp        0      0 :::22                  :::*                        LISTEN

$ cat 3.txt
tcp        0      0 coolshell.cn:80        61.140.101.185:37538        FIN_WAIT2
tcp        0      0 coolshell.cn:80        116.234.127.77:11502        FIN_WAIT2
tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1
tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK
tcp        0      0 coolshell.cn:80        117.136.20.85:50025         FIN_WAIT2

Statistics

The following command computes the total size of all the .c, .cpp, and .h files:

$ ls -l *.cpp *.c *.h | awk '{sum+=$5} END {print sum}'
2511401

Now let's count the connections in each state (there is a hint of real programming here; we are all programmers, so I won't explain. Note the use of the array):

$ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
TIME_WAIT, 3
FIN_WAIT1, 1
ESTABLISHED, 6
FIN_WAIT2, 3
LAST_ACK, 1
LISTEN, 4
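
For readers who do want the array spelled out, here is a minimal sketch on a made-up list of states. The order in which "for (k in a)" visits keys is unspecified, so this version pipes the result through sort to get a stable listing.

```shell
# Hypothetical state column, one value per line.
states='LISTEN
ESTABLISHED
TIME_WAIT
ESTABLISHED
LISTEN
ESTABLISHED'

# a[$1]++ creates the key on first use and increments its count.
# "for (k in a)" visits keys in an unspecified order, so sort the result.
counts=$(echo "$states" | awk '{ a[$1]++ } END { for (k in a) print k ", " a[k] }' | sort)

echo "$counts"
```

awk arrays are associative: any string can be a key, and an element that has never been set reads as empty (zero in arithmetic), which is why no initialization is needed.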

Next, let's total up how much memory each user's processes use (note: the value summed is the RSS column):

$ ps aux | awk 'NR!=1{a[$1]+=$6;} END { for(i in a) print i ", " a[i]"KB";}'
dbus, 540KB
mysql, 99928KB
www, 3264924KB
root, 63644KB
hchen, 6020KB

Take off the underwear: awk scripts

Above we saw the END keyword. END means "after all the lines have been processed". Having mentioned END, we must also introduce BEGIN: the two keywords mark what runs before and after the main processing. The syntax is as follows:

    • BEGIN { statements to run before processing starts }
    • END { statements to run after all the lines have been processed }
    • { statements to run for each line }
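
The three kinds of block can be seen side by side in a short sketch. The three-line score list below is made up for illustration and is unrelated to the score file used later.

```shell
# Hypothetical scores, one "name score" pair per line.
scores='Ann 80
Ben 60
Cid 70'

summary=$(echo "$scores" | awk '
BEGIN { print "start" }         # runs once, before any input is read
      { total += $2 }           # runs for every input line
END   { print "lines:", NR      # runs once, after the last line
        print "average:", total / NR }
')

echo "$summary"
```

Inside END, NR still holds the number of the last record read, which makes it a convenient divisor for averages.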

To get this straight, let's take a look at the following example:

Suppose there is such a file (Student score table):

$ cat score.txt
Marry   2143 78 84 77
Jack    2321 66 78 45
Tom     2122 48 77 71
Mike    2537 87 97 95
Bob     2415 40 57 62

Our awk script is as follows (I did not write it on the command line because that would be hard to read, and this also introduces another way of using awk):

$ cat cal.awk
#!/bin/awk -f
# before running
BEGIN {
    math = 0
    english = 0
    computer = 0

    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# while running
{
    math+=$3
    english+=$4
    computer+=$5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
}
# after running
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}

Let's look at the result (you can also run it as ./cal.awk score.txt):

$ awk -f cal.awk score.txt
NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
---------------------------------------------
Marry  2143     78       84       77      239
Jack   2321     66       78       45      189
Tom    2122     48       77       71      196
Mike   2537     87       97       95      279
Bob    2415     40       57       62      159
---------------------------------------------
  TOTAL:       319      393      350
AVERAGE:     63.80    78.60    70.00

Environment variables

Now that we know how to write scripts, let's see how to interact with environment variables (using the -v option and ENVIRON; a variable read through ENVIRON must be exported):

$ x=5

$ y=10
$ export y

$ echo $x $y
5 10

$ awk -v val=$x '{print $1, $2, $3, $4+val, $5+ENVIRON["y"]}' OFS="\t" score.txt
Marry   2143    78      89      87
Jack    2321    66      83      55
Tom     2122    48      82      81
Mike    2537    87      102     105
Bob     2415    40      62      72
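
The same two mechanisms in a self-contained sketch, with one extra check showing what happens when the variable is not exported. The name demo_unexported is made up for this illustration.

```shell
# x is a plain shell variable; y is exported into the environment.
x=5
y=10
export y

# val receives the value of x through -v; y is read through ENVIRON.
# Adding 0 makes the string from ENVIRON explicitly numeric.
sums=$(echo '1 2' | awk -v val="$x" '{ print $1 + val, $2 + ENVIRON["y"] + 0 }')

# A variable that was never exported is not visible in ENVIRON:
# the lookup yields an empty string.
unset demo_unexported
demo_unexported=7
missing=$(awk 'BEGIN { print "[" ENVIRON["demo_unexported"] "]" }')

echo "$sums"
echo "$missing"
```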

A few tricks

Finally, let's look at a few small examples:
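
One small example in that spirit, offered as an illustration rather than one of the author's own: printing only the lines longer than a given length. The input text and the threshold of 10 are made up.

```shell
# Hypothetical input: three lines of different lengths.
text='short
a somewhat longer line
mid-size'

# length($0) is the number of characters in the current line;
# a pattern with no action prints every line that matches.
long_lines=$(echo "$text" | awk 'length($0) > 10')

echo "$long_lines"
```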

Do it yourself.

For more on the topics covered here, refer to the gawk manual:

    • Built-in variables, see: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din-Variables
    • Flow control, see: http://www.gnu.org/software/gawk/manual/gawk.html#Statements
    • Built-in functions, see: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din
    • Regular expressions, see: http://www.gnu.org/software/gawk/manual/gawk.html#Regexp

(End of full text)
