Qiniu problem solutions
When you have to read through a large volume of logs, awk is a good foundation for quick statistics, so let me show you what awk can really do. The part01 source file used in this article is delimited by spaces, which is the default separator. awk is a line processor: unlike full-screen (editor-style) processing, it does not run out of memory or slow down on large files, because it handles one line at a time. It is usually used to format text. Its processing model is simple: each line is processed in order, and summary statistics can be output once every line has been handled.
Example: this is one of the simplest cases, and a good place to start.
hdfs dfs -cat /flume/2015-03-25/req_io/15-30/* | awk -F '\t' '$7==401{print $0}' | sort | uniq -c | sort -rn
Do not put a shell variable inside the program here; write the specific string directly. The program is single-quoted, so the shell does not expand it and awk will not see the variable's value.
awk command form:
awk [-F|-f|-v] 'BEGIN{} //{command1; command2} END{}' file
[-F|-f|-v] Main options: -F specifies the field separator, -f invokes a script file, -v defines a variable (var=value)
' ' Quotes around the code block
BEGIN Initialization code block, executed before any line is processed; mainly used to reference global variables and set the FS separator
// Match code block; can be a string or a regular expression
{} Command code block, containing one or more commands
; Separates multiple commands
END Final code block, executed after every line has been processed; mainly used for final calculations or for printing summary information
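The three kinds of block described above can be seen together in a minimal sketch. The sample records and the function name count_401 are made up for illustration; they are not taken from part01.

```shell
# Hypothetical sample: "client status" pairs, one record per line.
sample='10.0.0.1 200
10.0.0.2 401
10.0.0.1 401'

count_401() {
  awk '
    BEGIN { n = 0 }                # runs once, before the first line is read
    $2 == 401 { n++ }              # pattern + action: runs for each matching line
    END { print "401 count:", n }  # runs once, after the last line
  '
}

printf '%s\n' "$sample" | count_401
# prints: 401 count: 2
```

The BEGIN block is optional here, since awk variables start out empty (zero in arithmetic), but it makes the initialization explicit.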
Special points:
$0 The entire current line
$1 The first field of each line
NF Number of fields in the current line
NR Record (line) number; keeps incrementing across multiple files
FNR Like NR, but restarts at 1 for each file instead of incrementing across files
\t Tab
\n Line feed
FS Field separator, usually defined in BEGIN
RS Input record separator; defaults to a newline (that is, input is read one line at a time)
~ Match; a regular-expression match rather than the exact comparison done by ==
!~ No match; the negated regular-expression match
== Equal; the values must be identical (exact comparison)
!= Not equal (exact comparison)
&& Logical AND
|| Logical OR
+ In a regular expression, one or more of the preceding item
/[0-9][0-9]+/ Two or more digits
/[0-9][0-9]*/ One or more digits
FILENAME Name of the current input file
OFS Output field separator; defaults to a space and can be changed to a tab, etc.
ORS Output record separator; defaults to a newline, so each result is printed on its own line
-F '[:#/]' Defines three separators (:, # and /)
Now let's see how to use it, hands-on:
1, print the contents of the log part01.
Liuhanlindemac:downloads yishiyaonie$ awk '{print}' part01
2, print as many lines as the file has, each containing only the letter a.
Liuhanlindemac:downloads yishiyaonie$ awk '{print "a"}' part01
3, -F defines the field separator. By default, whitespace is used as the separator.
Liuhanlindemac:downloads yishiyaonie$ awk -F ' ' '{print}' part01
The separator between the quotes can be any character; it marks where one field stops and the next begins.
4, print fields on separate lines.
Liuhanlindemac:downloads yishiyaonie$ awk '{print $1; print $2}' part01
5, format the output data
Liuhanlindemac:downloads yishiyaonie$ awk '{print $1,$3,$4}' OFS='\t' part01
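As a small sketch of how OFS behaves, the following uses a made-up three-field file in place of part01 (the variable name ofs_file and the sample contents are assumptions for illustration only).

```shell
# Hypothetical three-field sample file standing in for part01.
ofs_file=$(mktemp)
printf '%s\n' 'a b c' 'd e f' > "$ofs_file"

# A comma in print inserts OFS between the values; here OFS is a tab.
awk '{print $1, $3}' OFS='\t' "$ofs_file"

# Assigning to a field makes awk rebuild $0 with OFS,
# so the whole line is re-joined with the new separator.
awk -v OFS=':' '{$1 = $1; print}' "$ofs_file"
```

The first command prints a and c (then d and f) separated by a tab; the second prints a:b:c and d:e:f. Without the assignment to $1, print would emit the original line unchanged, because OFS only applies when awk joins fields itself.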
6, call an awk script to process the file.
Script code:
BEGIN{
FS=":"
}
{print $1}
Liuhanlindemac:downloads yishiyaonie$ awk -f script part01
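A minimal, self-contained sketch of the script-file approach follows. The file names first_field.awk and sample.txt and the sample records are hypothetical; the point is only that -f (lowercase) loads the program from a file, while -F (uppercase) sets the separator.

```shell
# Write a small awk script and a sample input to a temporary directory.
script_dir=$(mktemp -d)
cat > "$script_dir/first_field.awk" <<'EOF'
BEGIN { FS = ":" }   # split fields on colons
{ print $1 }         # print the first field of every line
EOF
printf '%s\n' 'root:x:0' 'mail:x:8' > "$script_dir/sample.txt"

# -f loads the program from the script file.
awk -f "$script_dir/first_field.awk" "$script_dir/sample.txt"

# The equivalent one-liner, with the separator given by -F instead of BEGIN.
awk -F: '{print $1}' "$script_dir/sample.txt"
```

Both commands print root and mail, one per line.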
7, awk custom output:
Liuhanlindemac:downloads yishiyaonie$ awk -F ' ' '{print "username: " $1 "\t\t UID: " $3}' part01
awk '{print $1 $3}' part01 //$1 and $3 are output together, with no separator
awk '{print $1,$3}' part01 //with a comma, $1 and $3 are separated by a space
awk '{print $1 " " $3}' part01 //manually add a space between $1 and $3
8, awk output based on the number of fields in a record.
awk '{print $NF}' part01 //print the value of field number NF (the last field) of each line
awk 'NF==4 {print}' part01 //display rows with only 4 fields
awk 'NF>2{print $0}' part01 //show rows with more than 2 fields
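To make the NF examples concrete, here is a short sketch on made-up input (the sample variable is an assumption, not data from part01).

```shell
# Hypothetical input with 4, 2 and 3 fields per line.
sample='a b c d
e f
g h i'

printf '%s\n' "$sample" | awk '{print NF, $NF}'     # field count, then the last field
printf '%s\n' "$sample" | awk 'NF == 4 {print}'     # only lines with exactly 4 fields
printf '%s\n' "$sample" | awk 'NF > 2 {print $0}'   # lines with more than 2 fields
```

The first command prints "4 d", "2 f" and "3 i"; the second prints only "a b c d"; the third prints "a b c d" and "g h i". Note the difference between NF (the number of fields) and $NF (the value of the last field).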
9, awk handling of rows:
awk '{print NR,NF,$NF,"\t",$0}' part01 //print the line number, field count, last field value, a tab, and the whole line
awk 'NR==5{print}' part01 //show line 5
awk 'NR==5 || NR==6{print}' part01 //show lines 5 and 6
awk 'NR!=1{print}' part01 //do not display the first line
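The difference between NR and FNR only shows up with more than one input file. The sketch below assumes two small made-up files, a.txt and b.txt, created just for the demonstration.

```shell
# Two hypothetical input files.
nr_dir=$(mktemp -d)
printf '%s\n' 'x1' 'x2' > "$nr_dir/a.txt"
printf '%s\n' 'y1' 'y2' > "$nr_dir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
awk '{print NR, FNR, $0}' "$nr_dir/a.txt" "$nr_dir/b.txt"

# Skip the first line of every file (NR != 1 would skip only the very first line).
awk 'FNR != 1 {print}' "$nr_dir/a.txt" "$nr_dir/b.txt"
```

The first command prints "1 1 x1", "2 2 x2", "3 1 y1", "4 2 y2"; the second prints x2 and y2.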
10, matching character processing:
// plain string match, !// plain string mismatch, ~// field value match, !~// field value mismatch
awk '/183.198.46.6/' part01
awk '/mail/,/mysql/{print}' /etc/passwd //range match
awk '/[2][7][7]*/{print $0}' /etc/passwd //matches lines containing a number that starts with 27, such as 27, 277, 2777 ...
awk -F: '$1~/mail/{print}' /etc/passwd //show the line only when $1 matches the specified content
awk -F: '{if($1~/mail/) print}' /etc/passwd //same as above
awk -F: '$1!~/mail/{print $0}' /etc/passwd //mismatch
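One detail worth illustrating: in a pattern such as /183.198.46.6/, an unescaped dot matches any character, and ~ searches anywhere in the field. The sketch below contrasts == with ~ on made-up addresses (the sample lines are assumptions for illustration).

```shell
# Hypothetical log lines: an exact address, a longer one, and a near miss.
sample='183.198.46.6 GET
183.198.46.60 GET
183x198.46.6 GET'

# == is an exact string comparison on the whole field: one line matches.
printf '%s\n' "$sample" | awk '$1 == "183.198.46.6" {print $1}'

# ~ is a regex search: the dot matches any character and the match may
# occur anywhere in the field, so all three lines match.
printf '%s\n' "$sample" | awk '$1 ~ /183.198.46.6/ {print $1}'

# Anchoring with ^ and $ and escaping the dots narrows the regex
# to the exact address again.
printf '%s\n' "$sample" | awk '$1 ~ /^183\.198\.46\.6$/ {print $1}'
```

For exact values such as IP addresses, == is usually the simpler and safer choice; ~ is better suited to partial or pattern-based matching.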
11, the if statement must be used inside {}, and the condition must be enclosed in ()
awk -F: '{if($1~/mail/) print}' /etc/passwd //shorthand
awk -F: '{if($1~/mail/) {print}}' /etc/passwd //full form
awk -F: '{if($1~/mail/) {print} else {print $1}}' /etc/passwd //if...else...
12, conditional expressions: == != > >=
awk '$1=="183.198.46.6"{print $0}' part01
awk -F: '{if($1=="mysql") print $0}' /etc/passwd //same form as above
awk -F: '$1!="mysql"{print $0}' /etc/passwd //not equal to
awk -F: '$3>1000{print $0}' /etc/passwd //greater than
awk -F: '$3>=100{print $0}' /etc/passwd //greater than or equal to
awk -F: '$3<1{print $0}' /etc/passwd //less than
awk -F: '$3<=1{print $0}' /etc/passwd //less than or equal to
13, logical operators: && ||
awk '$1~/183.198.46.6/ && $4~/2015:19:14:40/{print $7}' part01
awk '$1~/mail/ && $3>8 {print}' part01 //logical AND: $1 matches mail and $3>8
awk -F: '{if($1~/mail/ && $3>8) print}' /etc/passwd
awk -F: '$1~/mail/ || $3>1000 {print}' /etc/passwd //logical OR
14, numerical operations
awk -F: '$3 > 100' /etc/passwd
awk -F: '$3 > 100 || $3 < 5' /etc/passwd
awk -F: '$3+$4 > 200' /etc/passwd
awk -F: '/mysql|mail/{print $3+10}' /etc/passwd //print the third field plus 10
awk -F: '/mysql/{print $3-$4}' /etc/passwd //subtraction
awk -F: '/mysql/{print $3*$4}' /etc/passwd //product
awk '/MemFree/{print $2/1024}' /proc/meminfo //division
awk '/MemFree/{print int($2/1024)}' /proc/meminfo //truncate to an integer
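A short sketch of the division and int() examples, run on a made-up line in the style of /proc/meminfo so that it works on any system (the value 1536 is an assumption chosen to give round numbers).

```shell
# Hypothetical line in the style of /proc/meminfo (value in kB).
line='MemFree: 1536 kB'

printf '%s\n' "$line" | awk '/MemFree/ {print $2 / 1024}'               # 1.5
printf '%s\n' "$line" | awk '/MemFree/ {print int($2 / 1024)}'          # 1
printf '%s\n' "$line" | awk '/MemFree/ {printf "%.1f MB\n", $2 / 1024}' # 1.5 MB
```

int() drops the fractional part rather than rounding to the nearest integer; when a fixed number of decimals is wanted, printf with a format such as %.1f is an alternative.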
15, output separator OFS
awk '$1~/183.198.46.6/ || NR==1 {print NR,$1,$4,$7}' OFS='\t' part01
//Output the rows whose field 1 matches 183.198.46.6 (plus the first row), printing the line number and fields 1, 4 and 7 of each row, separated by tabs. The main purpose is to make the output easier to read.
16, output processing results to a file
Using redirection: awk '$1~/183.198.46.6/ || NR==1 {print NR,$1,$4,$7}' OFS='\t' part01 > 1.txt
Writing directly from the command code block: awk 'NR!=1{print > "./fs"}' part01
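The second form, writing from inside the program, can be sketched as follows. The output path, the variable name out and the sample lines are assumptions; passing the path with -v avoids having to embed it in the single-quoted program.

```shell
# Hypothetical input with a header line, and a temporary output location.
out_dir=$(mktemp -d)
sample='header
row1
row2'

# The target after > must be a string (a quoted literal or a variable holding one).
# Inside awk, > truncates the file when it is first opened; later prints in
# the same run are appended to it.
printf '%s\n' "$sample" | awk -v out="$out_dir/rows.txt" 'NR != 1 {print > out}'

cat "$out_dir/rows.txt"
```

The file ends up containing row1 and row2. Writing from inside awk is mainly useful when one program needs to split its output across several files; for a single output file, shell redirection is simpler.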
17, formatted output
awk '{printf "%-8s %-8s %-10s\n", $1,$2,$3}' part01
printf produces formatted output
% introduces a format specifier
-8 left-aligns the value in a width of 8 characters
s means string type
The command prints the first three fields of each line: the first field as a string (width 8), the second as a string (width 8), and the third as a string (width 10).
netstat -anp | awk '$6=="LISTEN" || NR==1 {printf "%-10s %-10s %-10s \n", $1,$2,$3}'
netstat -anp | awk '$6=="LISTEN" || NR==1 {printf "%-3s %-10s %-10s %-10s \n", NR,$1,$2,$3}'
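To see the effect of the width and alignment flags, the sketch below wraps each column in brackets. The function name fmt_rows and the two sample records are made up for illustration.

```shell
# %-6s : string, left-aligned, padded to 6 characters
# %6d  : integer, right-aligned, padded to 6 characters
# %-8s : string, left-aligned, padded to 8 characters
fmt_rows() {
  awk '{printf "[%-6s][%6d][%-8s]\n", $1, $2, $3}'
}

printf '%s\n' 'tcp 0 LISTEN' 'udp 12 OPEN' | fmt_rows
# [tcp   ][     0][LISTEN  ]
# [udp   ][    12][OPEN    ]
```

Unlike print, printf does not add a newline on its own, which is why the format string ends with \n.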
18, if statement
awk '{if($1~/183.198.46.6/) {print "OK"} else {print "NOP"}}' part01
awk 'BEGIN{a=0;b=0} {if($1~/183.198.46.6/) {a++; print "OK"} else {b++; print "NOP"}} END{print a,b}' OFS='\t\t' part01 //on a match a is incremented, otherwise b
awk -F: '{if($3<100) next; else print}' /etc/passwd //lines with $3 less than 100 are skipped, the others are shown
awk -F: 'BEGIN{i=1} {if(i<NF) print NR,NF,i++}' /etc/passwd
awk -F: 'BEGIN{i=1} {if(i<NF) {print NR,NF} i++}' /etc/passwd
Another form:
awk -F: '{print ($3>100 ? "yes" : "no")}' /etc/passwd
awk -F: '{print ($3>100 ? $3":\tyes" : $3":\tno")}' /etc/passwd
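The three control-flow forms in this item (if/else with counters, next, and the ternary operator) can be tried on a small made-up table of names and numeric IDs. The sample data and the counter names big and small are assumptions for illustration.

```shell
# Hypothetical "name id" records.
sample='root 0
mail 8
alice 1000'

# if / else with counters, summarized in END.
# Adding 0 forces a numeric 0 to be printed if a counter was never incremented.
printf '%s\n' "$sample" | awk '
  { if ($2 > 100) { big++ } else { small++ } }
  END { print big + 0, small + 0 }'

# next abandons the current line and moves on to the next one.
printf '%s\n' "$sample" | awk '{ if ($2 < 100) next; print $1 }'

# The ternary operator is a compact if / else inside an expression.
# The parentheses matter: without them, > after print is read as redirection.
printf '%s\n' "$sample" | awk '{ print $1, ($2 > 100 ? "yes" : "no") }'
```

The first command prints "1 2", the second prints alice, and the third prints "root no", "mail no", "alice yes".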
19, while statement
awk 'BEGIN{i=1} {while(i<NF) print NF,$i,i++}' part01
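In the command above, i is set once in BEGIN and never reset, and the condition i<NF stops before the last field. If the goal is to visit every field of every line, one possible variant (a sketch, with the made-up function name walk_fields) resets the index per line and uses <=:

```shell
walk_fields() {
  awk '{
    i = 1                    # reset the index for every line
    while (i <= NF) {        # visit each field, including the last one
      print NR, i, $i        # line number, field number, field value
      i++
    }
  }'
}

printf '%s\n' 'a b c' 'd e' | walk_fields
```

For the two sample lines this prints "1 1 a", "1 2 b", "1 3 c", "2 1 d", "2 2 e".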
20, arrays
awk 'NR!=1 {a[$1]++} END {for (i in a) {print a[i], "\t", i}}' part01
awk 'NR!=1{a[$6]++} END{for (i in a) printf "%-20s %-10s %-5s \n", i, "\t", a[i]}' part01
These commands group the lines by a field (the first field in the first command) and count how many times each value is repeated.
Well, that covers most of the syntax; what we need now is more practice with awk, a powerful text statistics tool.
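The counting idiom can be sketched end to end on a few made-up records (the function name count_first_field and the sample addresses are assumptions). Because for (key in array) visits keys in an unspecified order, the result is piped through sort to get a stable listing.

```shell
# Hypothetical access-log style records: client address and path.
sample='183.198.46.6 /a
10.0.0.1 /b
183.198.46.6 /c'

count_first_field() {
  # count[$1]++ creates the entry on first use and increments it afterwards.
  awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |
    sort -rn   # most frequent value first
}

printf '%s\n' "$sample" | count_first_field
# 2 183.198.46.6
# 1 10.0.0.1
```

This gives the same kind of result as the sort | uniq -c | sort -rn pipeline from the first example, but without sorting the full input first.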
Application 1
awk -F: '{print NF}' helloworld.sh //output how many fields each line of the file has
awk -F: '{print $1,$2,$3,$4,$5}' helloworld.sh //output the first 5 fields
awk -F: '{print $1,$2,$3,$4,$5}' OFS='\t' helloworld.sh //output the first 5 fields, tab-delimited
awk -F: '{print NR,$1,$2,$3,$4,$5}' OFS='\t' helloworld.sh //tab-delimited output of the first 5 fields, with line numbers
Application 2
awk -F'[:#]' '{print NF}' helloworld.sh //specify multiple delimiters (: and #) and output how many fields each line has
awk -F'[:#]' '{print $1,$2,$3,$4,$5,$6,$7}' OFS='\t' helloworld.sh //tab-delimited output of multiple fields
Application 3
awk -F'[:#/]' '{print NF}' helloworld.sh //specify three delimiters and output the number of fields per line
awk -F'[:#/]' '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' helloworld.sh //output multiple fields
Application 4: calculate the total size of the regular files in the /home directory, in KB
ls -l | awk 'BEGIN{sum=0} !/^d/{sum+=$5} END{print "Total size is:", sum/1024, "KB"}'
ls -l | awk 'BEGIN{sum=0} !/^d/{sum+=$5} END{print "Total size is:", int(sum/1024), "KB"}' //int truncates to an integer
Application 5: count the number of connections in the LISTEN and CONNECTED states in netstat -anp
netstat -anp | awk '$6~/LISTEN|CONNECTED/{sum[$6]++} END{for (i in sum) printf "%-10s %-6s %-3s \n", i, " ", sum[i]}'
Application 6: count the total number of regular files owned by each user in the /home directory
ls -l | awk 'NR!=1 && !/^d/{sum[$3]++} END{for (i in sum) printf "%-6s %-5s %-3s \n", i, " ", sum[i]}'
mysql 199
root 374
Count the total size of the regular files owned by each user in the /home directory
ls -l | awk 'NR!=1 && !/^d/{sum[$3]+=$5} END{for (i in sum) printf "%-6s %-5s %-3s %-2s \n", i, " ", sum[i]/1024/1024, "MB"}'
Application 7: output a score table
awk 'BEGIN{math=0;eng=0;com=0;printf "Lineno. Name No. Math English Computer Total\n";printf "------------------------------------------------------------\n"}{math+=$3; eng+=$4; com+=$5;printf "%-8s %-7s %-7s %-7s %-9s %-10s %-7s \n",NR,$1,$2,$3,$4,$5,$3+$4+$5} END{printf "--------