Some readers who saw my post from two days ago, "Linux skills you should know", hoped I would teach them awk and sed, and so this article appeared. I suspect my younger friends may be a little unfamiliar with ancient artifacts such as awk and sed, so it falls to an old-timer like me to reheat the leftovers. Besides, awk was born at Bell Labs in 1977, the Year of the Snake, which makes it the same age as me, so I owe it an article.
awk takes its name from the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. You cannot talk about learning awk without mentioning a true classic, The AWK Programming Language, which scores 9.4 on Douban! It sells for 1022.30 dollars on Amazon.
This tutorial does not try to be exhaustive. Like my earlier introduction to the Go language, it is all examples and basically no filler.
I only have two goals:
1) You should be able to finish reading this on the bus to work, or while sitting on the toilet (guaranteed to take no longer than one sitting).
2) I just want this post to be like a hot stripper that stirs up your interest; after that, you have to get hands-on yourself.
Enough talk, let's start stripping (note: this is topless only).
Taking the stage
I extracted the following information from the netstat command as a use case:
    $ cat netstat.txt
    Proto Recv-Q Send-Q Local-Address          Foreign-Address             State
    tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
    tcp        0      0 coolshell.cn:80        124.205.5.146:18245         TIME_WAIT
    tcp        0      0 coolshell.cn:80        61.140.101.185:37538        FIN_WAIT2
    tcp        0      0 coolshell.cn:80        110.194.134.189:1032        ESTABLISHED
    tcp        0      0 coolshell.cn:80        123.169.124.111:49809       ESTABLISHED
    tcp        0      0 coolshell.cn:80        116.234.127.77:11502        FIN_WAIT2
    tcp        0      0 coolshell.cn:80        123.169.124.111:49829       ESTABLISHED
    tcp        0      0 coolshell.cn:80        183.60.215.36:36970         TIME_WAIT
    tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
    tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1
    tcp        0      0 coolshell.cn:80        110.194.134.189:4796        ESTABLISHED
    tcp        0      0 coolshell.cn:80        183.60.212.163:51082        TIME_WAIT
    tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK
    tcp        0      0 coolshell.cn:80        123.169.124.111:49840       ESTABLISHED
    tcp        0      0 coolshell.cn:80        117.136.20.85:50025         FIN_WAIT2
    tcp        0      0 :::22                  :::*                        LISTEN
The following is the simplest and most common use of awk: it prints columns 1 and 4.
- The curly braces inside the single quotes hold the awk statements. Note that the program can only be enclosed in single quotes.
- $1..$n refer to the 1st through nth columns. Note: $0 refers to the whole line.

    $ awk '{print $1, $4}' netstat.txt
    Proto Local-Address
    tcp 0.0.0.0:3306
    tcp 0.0.0.0:80
    tcp 127.0.0.1:9000
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp coolshell.cn:80
    tcp :::22
Now let's look at awk's formatted output, which works just like printf in C:

    $ awk '{printf "%-8s %-8s %-8s %-18s %-22s %-15s\n",$1,$2,$3,$4,$5,$6}' netstat.txt
    Proto    Recv-Q   Send-Q   Local-Address      Foreign-Address        State
    tcp      0        0        0.0.0.0:3306       0.0.0.0:*              LISTEN
    tcp      0        0        0.0.0.0:80         0.0.0.0:*              LISTEN
    tcp      0        0        127.0.0.1:9000     0.0.0.0:*              LISTEN
    tcp      0        0        coolshell.cn:80    124.205.5.146:18245    TIME_WAIT
    tcp      0        0        coolshell.cn:80    61.140.101.185:37538   FIN_WAIT2
    tcp      0        0        coolshell.cn:80    110.194.134.189:1032   ESTABLISHED
    tcp      0        0        coolshell.cn:80    123.169.124.111:49809  ESTABLISHED
    tcp      0        0        coolshell.cn:80    116.234.127.77:11502   FIN_WAIT2
    tcp      0        0        coolshell.cn:80    123.169.124.111:49829  ESTABLISHED
    tcp      0        0        coolshell.cn:80    183.60.215.36:36970    TIME_WAIT
    tcp      0        4166     coolshell.cn:80    61.148.242.38:30901    ESTABLISHED
    tcp      0        1        coolshell.cn:80    124.152.181.209:26825  FIN_WAIT1
    tcp      0        0        coolshell.cn:80    110.194.134.189:4796   ESTABLISHED
    tcp      0        0        coolshell.cn:80    183.60.212.163:51082   TIME_WAIT
    tcp      0        1        coolshell.cn:80    208.115.113.92:50601   LAST_ACK
    tcp      0        0        coolshell.cn:80    123.169.124.111:49840  ESTABLISHED
    tcp      0        0        coolshell.cn:80    117.136.20.85:50025    FIN_WAIT2
    tcp      0        0        :::22              :::*                   LISTEN
Taking off the coat: filtering records
Let's see how to filter records (the filter below is: column 3 equals 0 and column 6 is LISTEN):

    $ awk '$3==0 && $6=="LISTEN" ' netstat.txt
    tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
    tcp        0      0 :::22                  :::*                        LISTEN
Here "==" is a comparison operator. The other comparison operators are: !=, >, <, >=, <=
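To make the other operators concrete, here is a minimal sketch that runs on a made-up three-line sample rather than on the netstat data. The file name sample.txt and its contents are assumptions chosen only for illustration.

```shell
# Hypothetical sample: a name and a queue length per line (not from netstat.txt).
workdir=$(mktemp -d)
printf 'a 0\nb 5\nc 12\n' > "$workdir/sample.txt"

# != keeps the rows whose second column is not 0.
not_zero=$(awk '$2 != 0 {print $1}' "$workdir/sample.txt")

# >= keeps the rows whose second column is at least 5.
at_least_5=$(awk '$2 >= 5 {print $1 ":" $2}' "$workdir/sample.txt")

# < keeps the rows whose second column is below 5.
below_5=$(awk '$2 < 5 {print $1}' "$workdir/sample.txt")

echo "$not_zero"
echo "$at_least_5"
echo "$below_5"
rm -rf "$workdir"
```

Because the second column holds numbers in every row, awk compares numerically here, so 12 counts as larger than 5.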
Let's look at various ways of filtering records:

    $ awk ' $3>0 {print $0}' netstat.txt
    Proto Recv-Q Send-Q Local-Address          Foreign-Address             State
    tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
    tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1
    tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK

(The header line shows up as well: Send-Q is not a number, so awk compares it with 0 as a string, and "Send-Q" sorts after "0".)
If we need the table header, we can bring in the built-in variable NR:

    $ awk '$3==0 && $6=="LISTEN" || NR==1 ' netstat.txt
    Proto Recv-Q Send-Q Local-Address          Foreign-Address             State
    tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
    tcp        0      0 :::22                  :::*                        LISTEN
Now add formatted output:

    $ awk '$3==0 && $6=="LISTEN" || NR==1 {printf "%-20s %-20s %s\n",$4,$5,$6}' netstat.txt
    Local-Address        Foreign-Address      State
    0.0.0.0:3306         0.0.0.0:*            LISTEN
    0.0.0.0:80           0.0.0.0:*            LISTEN
    127.0.0.1:9000       0.0.0.0:*            LISTEN
    :::22                :::*                 LISTEN
Built-in variables
Speaking of built-in variables, here are some of the ones awk provides:

$0       | The current record (this variable holds the whole line)
$1 ~ $n  | The nth field of the current record; fields are separated by FS
FS       | The input field separator; defaults to a space or tab
NF       | The number of fields in the current record, i.e. how many columns it has
NR       | The number of records read so far, i.e. the line number, starting from 1; with multiple files it keeps accumulating
FNR      | The current record number; unlike NR, this is the line number within each individual file
RS       | The input record separator; defaults to a newline
OFS      | The output field separator; defaults to a space
ORS      | The output record separator; defaults to a newline
FILENAME | The name of the current input file
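The difference between NR and FNR is easiest to see with two input files. The sketch below is only an illustration: the files one.txt and two.txt and their contents are made up, and it also shows OFS and ORS at work.

```shell
# Two hypothetical input files, created in a scratch directory.
workdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$workdir/one.txt"
printf 'f\n' > "$workdir/two.txt"

# Run from inside the directory so FILENAME prints as a bare name.
# NR keeps counting across files; FNR restarts at 1 for each file.
vars=$(cd "$workdir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)

# OFS and ORS change what print puts between fields and after each record.
joined=$(cd "$workdir" && awk 'BEGIN {OFS="-"; ORS=";"} {print $1, NF}' one.txt)

echo "$vars"
echo "$joined"
rm -rf "$workdir"
```

The first command prints one line per record with the file name, the running line number, the per-file line number, and the field count. The second joins everything onto a single line because ORS is no longer a newline.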
How are they used? For example, to print the line number:

    $ awk '$3==0 && $6=="ESTABLISHED" || NR==1 {printf "%02s %s %-20s %-20s %s\n",NR, FNR, $4,$5,$6}' netstat.txt
    01 1 Local-Address        Foreign-Address      State
    07 7 coolshell.cn:80      110.194.134.189:1032 ESTABLISHED
    08 8 coolshell.cn:80      123.169.124.111:49809 ESTABLISHED
    10 10 coolshell.cn:80      123.169.124.111:49829 ESTABLISHED
    14 14 coolshell.cn:80      110.194.134.189:4796 ESTABLISHED
    17 17 coolshell.cn:80      123.169.124.111:49840 ESTABLISHED
Specifying the delimiter

    $ awk 'BEGIN{FS=":"} {print $1,$3,$6}' /etc/passwd
    root 0 /root
    bin 1 /bin
    daemon 2 /sbin
    adm 3 /var/adm
    lp 4 /var/spool/lpd
    sync 5 /sbin
    shutdown 6 /sbin
    halt 7 /sbin
The command above is equivalent to the following (-F specifies the delimiter):

    $ awk -F: '{print $1,$3,$6}' /etc/passwd

Note: if you want to specify more than one delimiter, you can do this:

    awk -F '[;:]'
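A quick way to check the bracket form is to feed awk a single line. The input string below is made up for illustration; it mixes both separators.

```shell
# Made-up input that uses both ";" and ":" as separators.
# NF is printed first to show that the line really splits into three fields.
fields=$(echo 'root;0:/root' | awk -F '[;:]' '{print NF ": " $1, $2, $3}')
echo "$fields"
```

The value given to -F is treated as a regular expression when it is longer than one character, which is why the bracket expression works as "either of these characters".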
Now let's look at an example that uses \t as the output delimiter (it reads the /etc/passwd file, which is delimited by ":"):

    $ awk -F: '{print $1,$3,$6}' OFS="\t" /etc/passwd
    root    0       /root
    bin     1       /bin
    daemon  2       /sbin
    adm     3       /var/adm
    lp      4       /var/spool/lpd
    sync    5       /sbin
Taking off the shirt: string matching
Let's look at a few string-matching examples:

    $ awk '$6 ~ /FIN/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1       Local-Address   Foreign-Address State
    6       coolshell.cn:80 61.140.101.185:37538    FIN_WAIT2
    9       coolshell.cn:80 116.234.127.77:11502    FIN_WAIT2
    13      coolshell.cn:80 124.152.181.209:26825   FIN_WAIT1
    18      coolshell.cn:80 117.136.20.85:50025     FIN_WAIT2

    $ awk '$6 ~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1       Local-Address   Foreign-Address State
    5       coolshell.cn:80 124.205.5.146:18245     TIME_WAIT
    6       coolshell.cn:80 61.140.101.185:37538    FIN_WAIT2
    9       coolshell.cn:80 116.234.127.77:11502    FIN_WAIT2
    11      coolshell.cn:80 183.60.215.36:36970     TIME_WAIT
    13      coolshell.cn:80 124.152.181.209:26825   FIN_WAIT1
    15      coolshell.cn:80 183.60.212.163:51082    TIME_WAIT
    18      coolshell.cn:80 117.136.20.85:50025     FIN_WAIT2
The first example above matches the FIN states, and the second matches every state containing the text WAIT. The ~ operator introduces a pattern match, and whatever sits between the two slashes is the pattern. The pattern is a regular expression.
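Since the pattern is a regular expression, the usual anchors work too. The following is a minimal sketch on a made-up states.txt (the file name and its four lines are assumptions for illustration); it also shows that matching is case sensitive.

```shell
# Hypothetical sample: an id and a connection state per line.
workdir=$(mktemp -d)
printf 'a FIN_WAIT1\nb TIME_WAIT\nc FIN_WAIT2\nd fin_wait9\n' > "$workdir/states.txt"

# ^ anchors the pattern at the start of the field.
# Line d does not match because the pattern is upper case.
starts_with_fin=$(awk '$2 ~ /^FIN/ {print $1}' "$workdir/states.txt")

# $ anchors the pattern at the end of the field,
# so FIN_WAIT1 and FIN_WAIT2 are excluded.
ends_with_wait=$(awk '$2 ~ /WAIT$/ {print $1}' "$workdir/states.txt")

echo "$starts_with_fin"
echo "$ends_with_wait"
rm -rf "$workdir"
```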
In fact, awk can also match against whole lines the way grep does, like this:

    $ awk '/LISTEN/' netstat.txt
    tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
    tcp        0      0 :::22                  :::*                        LISTEN
We can use "/FIN|TIME/" to match either FIN or TIME:

    $ awk '$6 ~ /FIN|TIME/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1       Local-Address   Foreign-Address State
    5       coolshell.cn:80 124.205.5.146:18245     TIME_WAIT
    6       coolshell.cn:80 61.140.101.185:37538    FIN_WAIT2
    9       coolshell.cn:80 116.234.127.77:11502    FIN_WAIT2
    11      coolshell.cn:80 183.60.215.36:36970     TIME_WAIT
    13      coolshell.cn:80 124.152.181.209:26825   FIN_WAIT1
    15      coolshell.cn:80 183.60.212.163:51082    TIME_WAIT
    18      coolshell.cn:80 117.136.20.85:50025     FIN_WAIT2
Now an example of negating a pattern:

    $ awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1       Local-Address   Foreign-Address State
    2       0.0.0.0:3306    0.0.0.0:*       LISTEN
    3       0.0.0.0:80      0.0.0.0:*       LISTEN
    4       127.0.0.1:9000  0.0.0.0:*       LISTEN
    7       coolshell.cn:80 110.194.134.189:1032    ESTABLISHED
    8       coolshell.cn:80 123.169.124.111:49809   ESTABLISHED
    10      coolshell.cn:80 123.169.124.111:49829   ESTABLISHED
    12      coolshell.cn:80 61.148.242.38:30901     ESTABLISHED
    14      coolshell.cn:80 110.194.134.189:4796    ESTABLISHED
    16      coolshell.cn:80 208.115.113.92:50601    LAST_ACK
    17      coolshell.cn:80 123.169.124.111:49840   ESTABLISHED
    19      :::22   :::*    LISTEN

Or:

    awk '!/WAIT/' netstat.txt
Splitting files
Splitting a file with awk is easy: just use redirection. The example below splits the file by the value of column 6, which is quite simple (NR!=1 means the header line is not processed).

    $ awk 'NR!=1{print > $6}' netstat.txt

    $ ls
    ESTABLISHED  FIN_WAIT1  FIN_WAIT2  LAST_ACK  LISTEN  netstat.txt  TIME_WAIT

    $ cat ESTABLISHED
    tcp        0      0 coolshell.cn:80        110.194.134.189:1032        ESTABLISHED
    tcp        0      0 coolshell.cn:80        123.169.124.111:49809       ESTABLISHED
    tcp        0      0 coolshell.cn:80        123.169.124.111:49829       ESTABLISHED
    tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
    tcp        0      0 coolshell.cn:80        110.194.134.189:4796        ESTABLISHED
    tcp        0      0 coolshell.cn:80        123.169.124.111:49840       ESTABLISHED

    $ cat FIN_WAIT1
    tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1

    $ cat FIN_WAIT2
    tcp        0      0 coolshell.cn:80        61.140.101.185:37538        FIN_WAIT2
    tcp        0      0 coolshell.cn:80        116.234.127.77:11502        FIN_WAIT2
    tcp        0      0 coolshell.cn:80        117.136.20.85:50025         FIN_WAIT2

    $ cat LAST_ACK
    tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK

    $ cat LISTEN
    tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
    tcp        0      0 :::22                  :::*                        LISTEN

    $ cat TIME_WAIT
    tcp        0      0 coolshell.cn:80        124.205.5.146:18245         TIME_WAIT
    tcp        0      0 coolshell.cn:80        183.60.215.36:36970         TIME_WAIT
    tcp        0      0 coolshell.cn:80        183.60.212.163:51082        TIME_WAIT
You can also write only selected columns to the files:

    awk 'NR!=1{print $4,$5 > $6}' netstat.txt
Now something a bit more complicated (note the if-else-if statement, which shows that awk really is a script interpreter):

    $ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt";
    else if($6 ~ /LISTEN/) print > "2.txt";
    else print > "3.txt" }' netstat.txt

    $ ls ?.txt
    1.txt  2.txt  3.txt

    $ cat 1.txt
    tcp        0      0 coolshell.cn:80        124.205.5.146:18245         TIME_WAIT
    tcp        0      0 coolshell.cn:80        110.194.134.189:1032        ESTABLISHED
    tcp        0      0 coolshell.cn:80        123.169.124.111:49809       ESTABLISHED
    tcp        0      0 coolshell.cn:80        123.169.124.111:49829       ESTABLISHED
    tcp        0      0 coolshell.cn:80        183.60.215.36:36970         TIME_WAIT
    tcp        0   4166 coolshell.cn:80        61.148.242.38:30901         ESTABLISHED
    tcp        0      0 coolshell.cn:80        110.194.134.189:4796        ESTABLISHED
    tcp        0      0 coolshell.cn:80        183.60.212.163:51082        TIME_WAIT
    tcp        0      0 coolshell.cn:80        123.169.124.111:49840       ESTABLISHED

    $ cat 2.txt
    tcp        0      0 0.0.0.0:3306           0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:80             0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:9000         0.0.0.0:*                   LISTEN
    tcp        0      0 :::22                  :::*                        LISTEN

    $ cat 3.txt
    tcp        0      0 coolshell.cn:80        61.140.101.185:37538        FIN_WAIT2
    tcp        0      0 coolshell.cn:80        116.234.127.77:11502        FIN_WAIT2
    tcp        0      1 coolshell.cn:80        124.152.181.209:26825       FIN_WAIT1
    tcp        0      1 coolshell.cn:80        208.115.113.92:50601        LAST_ACK
    tcp        0      0 coolshell.cn:80        117.136.20.85:50025         FIN_WAIT2
Statistics
The following command computes the total size of all the .c, .cpp, and .h files.

    $ ls -l *.cpp *.c *.h | awk '{sum+=$5} END {print sum}'
    2511401
Next, let's count the connections in each state (you can see some real programming here; we are all programmers, so I won't explain. Note the use of the array):

    $ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
    TIME_WAIT, 3
    FIN_WAIT1, 1
    ESTABLISHED, 6
    FIN_WAIT2, 3
    LAST_ACK, 1
    LISTEN, 4
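One detail worth knowing about the for (i in a) loop: awk does not promise any particular order for array keys, so the same counting command may list the states in a different order on another machine. A minimal sketch, on made-up input, that pipes the result through sort to get a stable order:

```shell
# Made-up input: a header line followed by three state values.
# The awk program counts each value; sort makes the output order predictable.
counts=$(printf 'State\nLISTEN\nTIME_WAIT\nLISTEN\n' |
    awk 'NR!=1 {a[$1]++} END {for (i in a) print i ", " a[i]}' |
    sort)
echo "$counts"
```

Sorting outside awk keeps the one-liner portable across awk implementations.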
Now let's tally how much memory each user's processes take up (note: this sums the RSS column):

    $ ps aux | awk 'NR!=1{a[$1]+=$6;} END { for(i in a) print i ", " a[i]"KB";}'
    dbus, 540KB
    mysql, 99928KB
    www, 3264924KB
    root, 63644KB
    hchen, 6020KB
Taking off the underwear: awk scripts
In the examples above we saw the END keyword. END marks the block that runs after all the lines have been processed. Having mentioned END, we should also introduce BEGIN: the two keywords mean "before processing" and "after processing" respectively. The syntax is as follows:
- BEGIN { statements to run before any line is processed }
- END { statements to run after all the lines have been processed }
- { statements to run for each line }
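The three kinds of block can be seen working together in a very small sketch. The three input numbers are made up for illustration, and the variable name total is just a choice for this example.

```shell
# Feed three made-up numbers to awk and summarize them.
summary=$(printf '3\n4\n5\n' | awk '
BEGIN { total = 0; print "start" }                 # runs once, before any input is read
      { total += $1 }                              # runs for every input line
END   { print "lines=" NR, "total=" total }        # runs once, after the last line
')
echo "$summary"
```

Inside END, NR still holds the number of the last record read, which is why it can be used as the line count.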
To make this clear, let's look at the following example.
Suppose there is a file like this (a student score table):

    $ cat score.txt
    Marry   2143 78 84 77
    Jack    2321 66 78 45
    Tom     2122 48 77 71
    Mike    2537 87 97 95
    Bob     2415 40 57 62
Our awk script is as follows (I did not write it on the command line because it is hard to read there, and this also introduces another way of running awk):

    $ cat cal.awk
    #!/bin/awk -f
    # before processing
    BEGIN {
        math = 0
        english = 0
        computer = 0

        printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
        printf "---------------------------------------------\n"
    }
    # while processing
    {
        math+=$3
        english+=$4
        computer+=$5
        printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
    }
    # after processing
    END {
        printf "---------------------------------------------\n"
        printf "  TOTAL:%10d %8d %8d \n", math, english, computer
        printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
    }
Let's look at the result (you can also run it as ./cal.awk score.txt):

    $ awk -f cal.awk score.txt
    NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL
    ---------------------------------------------
    Marry  2143     78       84       77      239
    Jack   2321     66       78       45      189
    Tom    2122     48       77       71      196
    Mike   2537     87       97       95      279
    Bob    2415     40       57       62      159
    ---------------------------------------------
      TOTAL:       319      393      350
    AVERAGE:     63.80    78.60    70.00
Environment variables
Now that we have covered scripts, let's see how awk interacts with environment variables (using the -v option and ENVIRON; a variable read through ENVIRON must be exported):

    $ x=5

    $ y=10
    $ export y

    $ echo $x $y
    5 10

    $ awk -v val=$x '{print $1, $2, $3, $4+val, $5+ENVIRON["y"]}' OFS="\t" score.txt
    Marry   2143    78      89      87
    Jack    2321    66      83      55
    Tom     2122    48      82      81
    Mike    2537    87      102     105
    Bob     2415    40      62      72
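To see why the export matters, here is a minimal sketch with three made-up variables. The names demo_val, DEMO_Y, and DEMO_HIDDEN are hypothetical and chosen only to avoid clashing with anything real.

```shell
# Hypothetical variable names, used only for this illustration.
demo_val=5            # plain shell variable: handed to awk with -v
export DEMO_Y=10      # exported: awk can read it through ENVIRON
unset DEMO_HIDDEN
DEMO_HIDDEN=99        # set but not exported: awk cannot see it

env_result=$(echo 1 | awk -v val="$demo_val" '{
    # An environment variable that was never exported reads as an empty string.
    hidden = (ENVIRON["DEMO_HIDDEN"] == "") ? "unset" : "set"
    print $1 + val, $1 + ENVIRON["DEMO_Y"], hidden
}')
echo "$env_result"
```

The -v route works for any shell value because the shell expands it before awk starts, while ENVIRON only ever reflects what the awk process inherited.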
A few fancy tricks
Finally, let's look at a few small examples:
Do it yourself.
For more on the topics covered here, refer to the gawk manual:
- Built-in variables, see: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din-Variables
- Flow control, see: http://www.gnu.org/software/gawk/manual/gawk.html#Statements
- Built-in functions, see: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din
- Regular expressions, see: http://www.gnu.org/software/gawk/manual/gawk.html#Regexp
(End of full text)
AWK Concise Tutorial