I. AWK description
Awk is a programming language for processing text and data on Linux/Unix. Its input can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/Unix. It can be used directly on the command line, but it is more often used as a script.
Awk processes text and data as follows: it scans the input line by line, from the first line to the last, looks for lines that match a given pattern, and performs the actions you specify on those lines. If no action is specified, matching lines are printed to standard output (the screen); if no pattern is specified, the action is applied to every line.
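A minimal sketch of the two defaults described above. The sample data (names and numbers) is made up for illustration only:

```shell
# Hypothetical sample input: a name and a number on each line.
data='root 0
mysql 27
alice 1000'

# Pattern only: matching lines are printed unchanged.
pattern_only=$(printf '%s\n' "$data" | awk '/root/')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$data" | awk '{print $1}')

echo "$pattern_only"
echo "$action_only"
```

The first command prints only the line containing root; the second prints the first word of every line.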
The name awk comes from the first letters of its three authors' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan.
Gawk is the GNU version of awk and provides some extensions from Bell Labs and GNU. The awk described below is GNU gawk; on many Linux systems awk is linked to gawk, so everything below is described simply as awk.
Two. awk command format and options
2.1. AWK has two forms of syntax
awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)
2.2. Command options
(1) -F fs or --field-separator fs: specifies the input field separator; fs is a string or a regular expression, such as -F:.
(2) -v var=value or --assign var=value: assigns a user-defined variable.
(3) -f scriptfile or --file scriptfile: reads the awk commands from a script file.
(4) -mf nnn and -mr nnn: set internal limits to the value nnn; the -mf option limits the maximum number of fields and the -mr option limits the maximum record size. These two options are extensions of the Bell Labs version of awk and do not apply to standard awk.
(5) -W compat or --compat, -W traditional or --traditional: runs awk in compatibility mode, so that gawk behaves exactly like standard awk and all gawk extensions are ignored.
(6) -W copyleft or --copyleft, -W copyright or --copyright: prints short copyright information.
(7) -W help or --help, -W usage or --usage: prints all awk options with a short description of each.
(8) -W lint or --lint: prints warnings about constructs that are not portable to traditional UNIX platforms.
(9) -W lint-old or --lint-old: prints warnings about constructs that are not portable to the original version of UNIX awk.
(10) -W posix: turns on compatibility mode with the following additional restrictions: \x escape sequences are not recognized; when FS is a single space, a newline does not act as a field separator; the keyword func is not recognized as a synonym for function; the operators ** and **= cannot be used in place of ^ and ^=; fflush is not available.
(11) -W re-interval or --re-interval: allows interval expressions in regular expressions (see the POSIX character classes in grep), such as the bracket expression [[:alpha:]].
(12) -W source program-text or --source program-text: uses program-text as the source code; it can be mixed with the -f option.
(13) -W version or --version: prints the version information, which is useful in bug reports.
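A minimal sketch of the three most commonly used options (-F, -v, and -f) working together. The sample data, the variable name prefix, and the temporary script file are illustrative assumptions:

```shell
# Hypothetical colon-separated sample input.
data='root:x:0
alice:x:1000'

# -F sets the input field separator; -v assigns a variable before the script runs.
out=$(printf '%s\n' "$data" | awk -F: -v prefix='user=' '{print prefix $1}')
echo "$out"

# -f reads the program from a file instead of the command line.
script=$(mktemp)
printf '%s\n' '{ print $3 }' > "$script"
from_file=$(printf '%s\n' "$data" | awk -F: -f "$script")
rm -f "$script"
echo "$from_file"
```

Only options that exist in every common awk are used here; the -W options listed above are gawk-specific and are not exercised.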
Three. Patterns and actions
An awk script is made up of patterns and actions:
pattern {action}, such as $ awk '/root/' test or $ awk '$3 < 100' test.
Both parts are optional. If there is no pattern, the action is applied to every record; if there is no action, every matching record is printed. By default each input line is one record, but you can specify a different record separator with the RS variable.
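As a small illustration of changing the record separator, the sketch below sets RS to a semicolon so that one physical line holds three records. The input string is invented for the example:

```shell
# Records are separated by ";" instead of newlines.
out=$(printf 'a 1;b 2;c 3' | awk 'BEGIN { RS = ";" } { print NR ": " $1 }')
echo "$out"
```

Each semicolon-delimited chunk is numbered as its own record, which shows that "record" and "line" are only the same thing by default.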
3.1. Patterns
A pattern can be any one of the following:
(1) Regular expression: uses the extended set of wildcard characters.
(2) Relational expression: uses the relational operators from the operator table below and can compare strings or numbers; for example, $2 > $1 selects rows whose second field is greater than the first.
(3) Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).
(4) pattern, pattern: specifies a range of rows. This syntax cannot include the BEGIN and END patterns.
(5) BEGIN: lets you specify an action that runs before the first input record is processed; global variables are usually set here.
(6) END: lets you specify an action that runs after the last input record has been read.
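The pattern types listed above can be combined in one program. The following is a sketch with invented sample data; each rule is labeled with the kind of pattern it uses:

```shell
data='1 apple
2 start
3 pear
4 stop
5 plum'

out=$(printf '%s\n' "$data" | awk '
BEGIN          { print "begin" }          # runs before any input is read
$1 > 4         { print "big: " $2 }       # relational expression
$2 ~ /^ap/     { print "match: " $2 }     # pattern-matching expression
/start/,/stop/ { print "range: " $2 }     # range pattern
END            { print "records: " NR }   # runs after the last record
')
echo "$out"
```

Every rule is tested against every record in the order written, so a single record can trigger more than one action.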
3.2. Actions
An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in curly braces. Actions fall into four main groups:
(1) Variable or array assignment
(2) Output commands
(3) Built-in functions
(4) Control-flow commands
Four. AWK's built-in variables
Variable | Description
$n | The nth field of the current record; fields are separated by FS.
$0 | The complete input record.
ARGC | The number of command-line arguments.
ARGIND | The position of the current file in the command line (counting from 0).
ARGV | An array containing the command-line arguments.
CONVFMT | Number conversion format (default is %.6g).
ENVIRON | An associative array of environment variables.
ERRNO | Description of the last system error.
FIELDWIDTHS | A space-separated list of field widths.
FILENAME | The name of the current file.
FNR | Like NR, but relative to the current file.
FS | The field separator (default is any whitespace).
IGNORECASE | If true, matching ignores case.
NF | The number of fields in the current record.
NR | The number of records read so far.
OFMT | The output format for numbers (default is %.6g).
OFS | The output field separator (default is a space).
ORS | The output record separator (default is a newline).
RLENGTH | The length of the string matched by the match function.
RS | The record separator (default is a newline).
RSTART | The starting position of the string matched by the match function.
SUBSEP | The array subscript separator (default is \034).
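A short sketch of how several of these variables behave when more than one file is read. The two temporary files and their contents are assumptions made for the example:

```shell
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# NR counts records across all files, FNR restarts for each file,
# NF is the number of fields and $NF is the last field.
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
rm -rf "$dir"
echo "$out"
```

Only variables available in every common awk are used; ARGIND, ERRNO, FIELDWIDTHS, and IGNORECASE are gawk extensions and are left out.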
Five. Awk operators
Operator | Description
= += -= *= /= %= ^= **= | Assignment
?: | C-style conditional expression
|| | Logical OR
&& | Logical AND
~ !~ | Matches and does not match a regular expression
< <= > >= != == | Relational operators
(space) | Concatenation
+ - | Addition, subtraction
* / % | Multiplication, division, and remainder
+ - ! | Unary plus, unary minus, and logical NOT
^ ** | Exponentiation
++ -- | Increment and decrement, as prefix or postfix
$ | Field reference
in | Array membership
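A few of these operators in action, as a sketch that runs entirely in a BEGIN block so it needs no input. The variable names are arbitrary:

```shell
out=$(awk 'BEGIN {
    s = "a" "b"                      # concatenation: adjacent strings are joined
    n = 7 % 3                        # remainder
    p = 2 ^ 3                        # exponentiation
    t = (n > 0) ? "yes" : "no"       # conditional expression
    seen["x"] = 1
    m = ("x" in seen) ? "in" : "out" # array membership
    r = ("root" ~ /^ro/)             # regular expression match yields 1 or 0
    print s, n, p, t, m, r
}')
echo "$out"
```

The ** and **= forms are avoided here because, as noted for the posix option, they are not accepted everywhere.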
Six. Records and fields
6.1. Records
Awk treats each line that ends with a newline character as a record.
Record separators: the default input and output record separators are both newlines, stored in the built-in variables RS and ORS respectively.
The $0 variable: refers to the entire record. For example, $ awk '{print $0}' test prints every record in the test file.
The NR variable: a counter whose value increases by 1 for each record read.
For example, $ awk '{print NR,$0}' test prints every record in the test file, each preceded by its record number.
6.2. Fields
Each word in a record is called a field; by default fields are separated by spaces or tabs. Awk tracks the number of fields and stores it in the built-in variable NF. For example, $ awk '{print $1,$3}' test prints the first and third space-separated columns (fields) of the test file.
6.3. Field separators
The built-in variable FS holds the input field separator; its default is a space or tab. The -F command-line option changes the value of FS. For example, $ awk -F: '{print $1,$5}' test prints the first and fifth colon-separated columns.
Several field separators can be used at the same time by writing them inside square brackets. For example, $ awk -F'[ :\t]' '{print $1,$3}' test treats a space, a colon, and a tab as separators.
The output field separator is a space by default and is stored in OFS. For example, in $ awk -F: '{print $1,$5}' test, the comma between $1 and $5 stands for the value of OFS.
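The difference between the comma (which inserts OFS) and plain juxtaposition (which concatenates) is easy to see side by side. This is a sketch; the sample line imitates a passwd entry and is not taken from a real system:

```shell
line='root:x:0:0:root:/root:/bin/sh'

# The comma in print inserts OFS between the two fields.
with_comma=$(printf '%s\n' "$line" | awk -F: 'BEGIN { OFS = "-" } { print $1, $7 }')

# Without the comma the fields are concatenated directly.
no_comma=$(printf '%s\n' "$line" | awk -F: '{ print $1 $7 }')

# A bracket expression lets several characters act as separators.
multi=$(printf 'a:b c\n' | awk -F'[: ]' '{ print $2, $3 }')

echo "$with_comma"
echo "$no_comma"
echo "$multi"
```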
Seven. Gawk-specific regular expression metacharacters
The following metacharacters are specific to gawk and do not work in UNIX versions of awk.
(1) \y: matches the empty string at the beginning or end of a word.
(2) \B: matches the empty string within a word.
(3) \<: matches the empty string at the beginning of a word (anchors the start).
(4) \>: matches the empty string at the end of a word (anchors the end).
(5) \w: matches a word-constituent character (a letter, digit, or underscore).
(6) \W: matches a character that is not word-constituent.
(7) \`: matches the empty string at the beginning of a string.
(8) \': matches the empty string at the end of a string.
Eight. Match operator (~)
Matches a regular expression against a record or a field. For example, $ awk '$1 ~ /^root/' test prints the rows of the test file whose first column starts with root.
Nine. Comparison expressions
conditional expression1 ? expression2 : expression3
For example: $ awk '{max = ($1 > $3) ? $1 : $3; print max}' test. If the first field is greater than the third, $1 is assigned to max; otherwise $3 is assigned to max.
$ awk '$1 + $2 < 100' test. If the sum of the first and second fields is less than 100, the row is printed.
$ awk '$1 > 5 && $2 < 10' test. If the first field is greater than 5 and the second field is less than 10, the row is printed.
10. Range patterns
A range pattern matches all rows from the first occurrence of the first pattern through the first occurrence of the second pattern. If the second pattern never appears, the range extends to the end of the input. For example, $ awk '/root/,/mysql/' test prints every row from the first occurrence of root through the first occurrence of mysql.
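A sketch of the range behavior, including the case where the range opens a second time and the closing pattern never comes. The input lines are invented:

```shell
data='one
root
two
mysql
three
root
four'

# The first range runs from root through mysql; the second range
# opens at the later root and, with no mysql left, runs to the end.
out=$(printf '%s\n' "$data" | awk '/root/,/mysql/')
echo "$out"
```

The lines one and three fall outside both ranges and are not printed.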
Eleven. Examples
1. awk '/101/' file displays the lines of file that contain 101.
awk '/101/,/105/' file
awk '$1 == 5' file
awk '$1 == "CT"' file (the string must be enclosed in double quotes)
awk '$1 * $2 > 100' file
awk '$2 > 5 && $2 <= 15' file
2. awk '{print NR,NF,$1,$NF}' file displays the current record number, the number of fields, and the first and last fields of each row of file.
awk '/101/ {print $1,$2 + 10}' file displays the first field and the second field plus 10 for each matching row of file.
awk '/101/ {print $1$2}' file
awk '/101/ {print $1 $2}' file displays the first and second fields of each matching row of file, with no separator between them.
3. df | awk '$4 > 1000000' takes its input from a pipe; here it displays the rows whose 4th field satisfies the condition.
4. awk -F "|" '{print $1}' file uses the new separator "|".
awk 'BEGIN {FS = "[: \t|]"}
{print $1,$2,$3}' file changes the input separator by setting FS (FS = "[: \t|]").
Sep="|"
awk -F "$Sep" '{print $1}' file uses the value of the shell variable Sep as the separator.
awk -F '[ :\t|]' '{print $1}' file uses a regular expression as the separator; here a space, :, TAB, and | all act as separators.
awk -F '[][]' '{print $1}' file uses a regular expression as the separator; here it stands for [ and ].
5. awk -f awkfile file runs the commands in the file awkfile in order.
cat awkfile
/101/ {print "\047 Hello! \047"}  # prints ' Hello! ' for each matching line; \047 stands for a single quotation mark
{print $1,$2}  # prints the first two fields of every row, because there is no pattern
6. awk '$1 ~ /101/ {print $1}' file displays the first field of each row (record) whose first field matches 101.
7. awk 'BEGIN {OFS = "%"}
{print $1,$2}' file changes the output format by setting the output separator (OFS = "%").
8. awk 'BEGIN {max = 100; print "max=" max}
{max = ($1 > max ? $1 : max); print $1, "Now max is " max}' file
BEGIN marks the action performed before any row is processed. The program finds the maximum value of the first field of file.
9. awk '$1 * $2 > 100 {print $1}' file displays the first field of each row (record) where the product of the first two fields is greater than 100.
10. awk '$1 == "Chi" {$3 = "China"; print}' file replaces the 3rd field of each matching row and then displays the row (record).
awk '{$7 %= 3; print $7}' file divides the 7th field by 3, assigns the remainder to the 7th field, and prints it.
11. awk '/tom/ {wage = $2 + $3; print wage}' file assigns a value to the variable wage on each matching row and prints it.
12. awk '/tom/ {count++}
END {print "tom was found " count " times"}' file
END marks the action performed after all input lines have been processed.
13. awk '{gsub(/\$/, ""); gsub(/,/, ""); cost += $4}
END {print "The total is $" cost > "filename"}' file
The gsub function replaces $ and , with an empty string; the total is then written to filename. Sample input:
1 2 3 $1,200.00
1 2 3 $2,300.00
1 2 3 $4,000.00
awk '{gsub(/\$/, ""); gsub(/,/, "");
if ($4 > 1000 && $4 < 2000) c1 += $4;
else if ($4 > 2000 && $4 < 3000) c2 += $4;
else if ($4 > 3000 && $4 < 4000) c3 += $4;
else c4 += $4; }
END {printf "c1=[%d];c2=[%d];c3=[%d];c4=[%d]\n", c1, c2, c3, c4}' file
Conditional statements are written with if and else if.
awk '{gsub(/\$/, ""); gsub(/,/, "");
if ($4 > 3000 && $4 < 4000) exit;
else c4 += $4; }
END {printf "c1=[%d];c2=[%d];c3=[%d];c4=[%d]\n", c1, c2, c3, c4}' file
exit stops processing when the condition is met, but the END action is still performed.
awk '{gsub(/\$/, ""); gsub(/,/, "");
if ($4 > 3000) next;
else c4 += $4; }
END {printf "c4=[%d]\n", c4}' file
next skips the current row when the condition is met and continues with the next row.
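The next and exit behavior can be checked against the three sample rows shown earlier. This is a sketch; the variable names total and sum and the thresholds are chosen for the illustration:

```shell
data='1 2 3 $1,200.00
1 2 3 $2,300.00
1 2 3 $4,000.00'

# next: rows above 3000 are skipped, the rest are summed.
out=$(printf '%s\n' "$data" | awk '
{
    gsub(/\$/, "")            # drop the dollar sign
    gsub(/,/, "")             # drop the thousands separator
    if ($4 > 3000) next       # skip this row and read the next one
    total += $4
}
END { printf "total=[%d]\n", total }
')

# exit: reading stops at the first row above 2000, but END still runs.
early=$(printf '%s\n' "$data" | awk '
{ gsub(/[$,]/, ""); if ($4 > 2000) exit; sum += $4 }
END { print "sum=" sum }')

echo "$out"
echo "$early"
```

Because gsub modifies $0, awk splits the record into fields again, so $4 holds the cleaned number by the time it is compared.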
14. awk '{print FILENAME, $0}' file1 file2 file3 > fileall
Writes the contents of file1, file2, and file3 to fileall, with each line preceded by the name of the file it came from.
15. awk '$1 != previous {close(previous); previous = $1}
{print substr($0, index($0, " ") + 1) > $1}' fileall
Splits the merged file back into 3 files that are identical to the originals.
16. awk 'BEGIN {"date" | getline d; print d}'
The output of date is piped to getline and assigned to the variable d, which is then printed.
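The same command-to-getline pipe can be sketched with a command whose output is predictable. Here echo hello stands in for date, and the variable names cmd and d are arbitrary:

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline d      # read one line of the command output into d
    close(cmd)           # close the pipe once the output is consumed
    print "got: " d
}')
echo "$out"
```

Storing the command in a variable makes it easy to pass exactly the same string to close, which is how awk identifies the pipe.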
17. awk 'BEGIN {system("printf \"Input your name: \""); getline d; print "\nYour name is", d, "\b!\n"}'
Reads a name interactively with getline and displays it.
awk 'BEGIN {FS = ":"; while ((getline < "/etc/passwd") > 0) {if ($1 ~ "050[0-9]_") print $1}}'
Prints the user names in /etc/passwd that contain 050x_.
18. awk '{i = 1; while (i < NF) {print NF, $i; i++}}' file loops with a while statement.
awk '{for (i = 1; i < NF; i++) {print NF, $i}}' file loops with a for statement.
type file | awk -F "/" '
{for (i = 1; i < NF; i++)
{if (i == NF-1) {printf "%s", $i}
else {printf "%s/", $i}}}'
Displays the directory part of the path that type reports for file.
Displaying dates with for and if:
awk 'BEGIN {
for (j = 1; j <= 12; j++)
{flag = 0;
printf "\nMonth %d\n", j;
for (i = 1; i <= 31; i++)
{
if (j == 2 && i > 28) flag = 1;
if ((j == 4 || j == 6 || j == 9 || j == 11) && i > 30) flag = 1;
if (flag == 0) {printf "%02d%02d ", j, i}
}
}
}'
19. To use a shell variable inside an awk program, close the single quotes around it so that the shell expands it; inside the single-quoted program, "$Flag" in double quotes is just a string.
Flag=abcd
awk 'BEGIN {print "'$Flag'"}' prints abcd
awk 'BEGIN {print "$Flag"}' prints $Flag
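Two alternatives that avoid splicing quotes are the -v option and the ENVIRON array, both described earlier. This is a sketch; the awk variable name flag is an arbitrary choice:

```shell
Flag=abcd

# -v copies the shell value into an awk variable.
via_v=$(awk -v flag="$Flag" 'BEGIN { print flag }')

# Variables placed in the environment are visible through ENVIRON.
via_env=$(Flag="$Flag" awk 'BEGIN { print ENVIRON["Flag"] }')

# Inside the single-quoted program, "$Flag" is just a string.
literal=$(awk 'BEGIN { print "$Flag" }')

echo "$via_v" "$via_env" "$literal"
```

One difference worth knowing: -v processes backslash escape sequences in the value, while ENVIRON passes the value through unchanged.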
20. Other small examples
$ awk '/^(no|so)/' test ----- prints all lines that begin with no or so.
$ awk '/^[ns]/ {print $1}' test ----- if the record starts with n or s, prints its first field.
$ awk '$1 ~ /[0-9][0-9]$/ {print $1}' test ----- if the first field ends with two digits, prints it.
$ awk '$1 == 100 || $2 < 50' test ----- if the first field equals 100 or the second field is less than 50, prints the line.
$ awk '$1 != 10' test ----- if the first field is not equal to 10, prints the line.
$ awk '/test/ {print $1 + 10}' test ----- if the record matches the regular expression test, adds 10 to the first field and prints the result.
$ awk '{print ($1 > 5 ? "ok " $1 : "error " $1)}' test ----- if the first field is greater than 5, prints the value of the expression after the question mark; otherwise prints the value of the expression after the colon.
$ awk '/^root/,/^mysql/' test ----- prints all records from a record that begins with root through a record that begins with mysql. If another record beginning with root is found later, printing resumes until the next record beginning with mysql or the end of the file.
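awk '{$x = ""; print}' filename ----- prints every field except field x (replace x with the field number).
awk -v var="'" '{print var $1 var}' urfile ----- wraps the first field in single quotes.
awk -v var="'" '{print "UPDATE pim.rms_resourceinfo SET r_ext5=" var $2 var " where r_code=" var $1 var ";"}' ----- builds an SQL UPDATE statement with both fields wrapped in single quotes.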