1. Introduction to Awk
Awk is a programming language for processing text and data on Linux/Unix. Input can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/Unix. It can be used directly on the command line, but it is more often used in scripts. Awk scans its input line by line, from the first line to the last, looks for lines that match a given pattern, and performs the actions you specify on those lines. If no action is specified, the matching lines are printed to standard output (the screen); if no pattern is specified, the action is applied to every line. The name awk is formed from the first letters of its three authors' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan. gawk is the GNU version of awk; it provides the Bell Labs extensions as well as some GNU extensions. The descriptions below use GNU gawk as the example. On many Linux systems awk is linked to gawk, so it is simply called awk in the rest of these notes.
2. awk command format and options
2.1. awk command format, basic usage, and patterns
· awk [options] 'script' var=value file(s)
· awk [options] -f scriptfile var=value file(s)
Basic usage:
1) Enter the awk command directly on the shell command line:
awk [-F field-separator] 'awk program' input-file
2) Put the awk program into a script file and then call it with the awk command:
awk -f awk-script-file input-file (the first line of the script file does not need to start with #!/bin/awk -f)
3) Put the awk program into a script file, make the script file executable, and run the script file directly:
./awk-script-file input-file (the first line of the script file must be #!/bin/awk -f)
awk '
BEGIN { actions }
/pattern/ { actions }
/pattern/ { actions }
END { actions }
' files
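As a hedged illustration of the layout above, the following sketch runs a three-part program on a small made-up file. The file contents and the names tmp, count, and out are assumptions for this example only.

```shell
# Create a small sample file (hypothetical data for illustration).
tmp=$(mktemp)
printf 'root:x:0\nmysql:x:27\nroot2:x:1\n' > "$tmp"

# BEGIN runs once before any input is read, the /root/ rule runs for each
# matching line, and END runs once after the last line.
out=$(awk -F: '
BEGIN  { print "start" }
/root/ { count++; print $1 }
END    { print "matched", count }
' "$tmp")

echo "$out"
rm -f "$tmp"
```

With this sample data the program prints start, then root and root2, then matched 2.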
2.2. Command options
-F fs or --field-separator fs
Specifies the input field separator. fs is a string or a regular expression, for example -F:.
-v var=value or --assign var=value
Assigns a value to a user-defined variable before the program starts.
-f scriptfile or --file scriptfile
Reads the awk program from the script file.
-mf nnn and -mr nnn
Set internal limits to the value nnn: the -mf option limits the maximum number of fields, and the -mr option limits the maximum record size. These two options are extensions of the Bell Labs version of awk and are not available in standard awk.
-W compat or --compat, -W traditional or --traditional
Runs awk in compatibility mode, so that gawk behaves exactly like standard awk and all GNU extensions are ignored.
-W copyleft or --copyleft, -W copyright or --copyright
Prints a brief copyright message.
-W help or --help, -W usage or --usage
Prints all awk options with a short description of each.
-W lint or --lint
Prints warnings about constructs that are not portable to traditional Unix platforms.
-W lint-old or --lint-old
Prints warnings about constructs that are not portable to the original version of Unix awk.
-W posix or --posix
Turns on compatibility mode, with these additional restrictions: \x escape sequences are not recognized; when FS is a single space, a newline does not act as a field separator; the keyword func is not recognized as a synonym for function; the operators ** and **= cannot be used in place of ^ and ^=; and fflush is not available.
-W re-interval or --re-interval
Allows interval expressions in regular expressions. See also the POSIX character classes described for grep, such as the bracket expression [[:alpha:]].
-W source program-text or --source program-text
Uses program-text as awk source code. This option can be mixed with the -f option.
-W version or --version
Prints version information, which is useful when reporting bugs.
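The three most commonly used options (-F, -v, and -f) can be tried out with a short sketch like the one below. The sample data, the variable name prefix, and the temporary file names are assumptions made for the example.

```shell
tmp=$(mktemp)
prog=$(mktemp)
printf 'alice:30\nbob:25\n' > "$tmp"

# -F sets the input field separator; -v assigns a variable before the program runs.
out1=$(awk -F: -v prefix="user=" '{ print prefix $1 }' "$tmp")

# -f reads the awk program from a file instead of the command line.
printf '{ total += $2 }\nEND { print total }\n' > "$prog"
out2=$(awk -F: -f "$prog" "$tmp")

echo "$out1"
echo "$out2"
rm -f "$tmp" "$prog"
```

Only POSIX options are used here, so the sketch should behave the same in gawk and other awk implementations.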
3. Patterns and actions
An awk script is made up of patterns and actions:
pattern {action}, for example $ awk '/root/' test or $ awk '$3 < 100' test.
Both parts are optional. If there is no pattern, the action is applied to every record; if there is no action, every matching record is printed. By default each input line is one record, but the user can choose a different record separator by setting the RS variable.
3.1. Patterns
A pattern can be any one of the following:
· /regular expression/: uses the extended set of regular expression metacharacters.
· Relational expression: uses the relational operators from the operator table below and can be a string or numeric comparison, such as $2 > $1, which selects lines whose second field is greater than the first.
· Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).
· pattern, pattern: specifies a range of lines. This syntax cannot include the BEGIN and END patterns.
· BEGIN: lets the user specify actions that run before the first input record is processed. Global variables are usually set here.
· END: lets the user specify actions that run after the last input record has been read.
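The first three pattern types in the list above can be compared side by side in a small sketch. The sample file and the shell variable names re, rel, and nomatch are hypothetical.

```shell
tmp=$(mktemp)
printf 'apple 3 5\nbanana 7 2\ncherry 4 4\n' > "$tmp"

# Regular expression pattern: lines containing "an".
re=$(awk '/an/' "$tmp")

# Relational expression: second field numerically greater than the third.
rel=$(awk '$2 > $3' "$tmp")

# Pattern-matching expression: first field does NOT start with "b".
nomatch=$(awk '$1 !~ /^b/ { print $1 }' "$tmp")

echo "$re"
echo "$rel"
echo "$nomatch"
rm -f "$tmp"
```

The first two commands have no action, so they fall back to printing the whole matching line.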
3.2. Actions
An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in curly braces. There are four main kinds:
· Variable or array assignments
· Output commands
· Built-in functions
· Control-flow statements
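A single action can combine all four kinds of statement. The sketch below is one possible arrangement; the data and the names n and label are assumptions for illustration.

```shell
tmp=$(mktemp)
printf 'pear 2\nplum 11\nfig 5\n' > "$tmp"

out=$(awk '{
    n = length($1)                    # assignment using a built-in function
    if ($2 > 10) {                    # control-flow statement
        label = "many"
    } else {
        label = "few"
    }
    print $1, n, label                # output command
}' "$tmp")

echo "$out"
rm -f "$tmp"
```

Because there is no pattern, the action runs for every line of the input.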
4. awk built-in variables
Table 1. awk built-in variables
Variable | Description
$n | Field n of the current record; fields are separated by FS.
$0 | The complete input record.
ARGC | The number of command-line arguments.
ARGIND | The position of the current file on the command line (its index in ARGV, which is numbered from 0).
ARGV | An array that contains the command-line arguments.
CONVFMT | Number conversion format (default is %.6g).
ENVIRON | An associative array of environment variables.
ERRNO | A description of the last system error.
FIELDWIDTHS | A space-separated list of field widths.
FILENAME | The name of the current file.
FNR | Like NR, but relative to the current file.
FS | Field separator (default is any whitespace).
IGNORECASE | If true, matching ignores case.
NF | The number of fields in the current record.
NR | The number of records read so far.
OFMT | The output format for numbers (default is %.6g).
OFS | Output field separator (default is a space).
ORS | Output record separator (default is a newline).
RLENGTH | The length of the string matched by the match function.
RS | Record separator (default is a newline).
RSTART | The starting position of the string matched by the match function.
SUBSEP | Array subscript separator (default is \034).
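To see how a few of these variables change while input is read, here is a small sketch that processes two files. The file names one.txt and two.txt and their contents are made up for the example.

```shell
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# NR counts records across all files, FNR restarts for each file,
# and NF is the number of fields in the current record.
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)

echo "$out"
rm -rf "$dir"
```

On the third record NR is 3 while FNR is back to 1, because that record is the first line of the second file.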
5. awk operators
Table 2. Operators
Operator | Description
= += -= *= /= %= ^= **= | Assignment
?: | C-style conditional expression
|| | Logical OR
&& | Logical AND
~ !~ | Matches a regular expression, does not match a regular expression
< <= > >= != == | Relational operators
space | String concatenation
+ - | Addition, subtraction
* / % | Multiplication, division, remainder
+ - ! | Unary plus, unary minus, logical NOT
^ ** | Exponentiation
++ -- | Increment, decrement (as prefix or postfix)
$ | Field reference
in | Array membership
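Several of the operators in Table 2 can be exercised in a BEGIN block, which needs no input file. This is only a sketch; the variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
    s = "foo" "bar"                     # juxtaposition concatenates strings
    p = 2 ^ 10                          # exponentiation
    r = 17 % 5                          # remainder
    t = (p > 1000) ? "big" : "small"    # conditional expression
    m = (s ~ /^foo/) && (s !~ /baz/)    # regex match combined with logical AND
    arr["k"] = 1
    inarr = ("k" in arr)                # array membership test
    print s, p, r, t, m, inarr
}')

echo "$out"
```

Comparison, match, and membership expressions evaluate to 1 for true and 0 for false, which is why the last two values print as 1.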
6. Records and fields
6.1. Records
Awk treats each line that ends with a newline character as a record.
Record separator: by default, both the input and the output record separators are newlines; they are stored in the built-in variables RS and ORS.
The variable $0 refers to the entire record. For example, $ awk '{print $0}' test prints every record in the file test.
The variable NR is a counter whose value increases by 1 after each record is processed. For example, $ awk '{print NR,$0}' test prints every record in the file test, preceded by its record number.
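The effect of the record separator can be seen by numbering records with NR, first with the default RS and then with a different one. Using ";" as the separator is an assumption chosen for this sketch.

```shell
# With the default RS each line is one record; NR numbers them.
out1=$(printf 'one\ntwo\n' | awk '{ print NR ": " $0 }')

# Setting RS to ";" makes awk split records on semicolons instead of newlines.
out2=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

echo "$out1"
echo "$out2"
```

The second input deliberately has no trailing newline, so the last record is exactly c.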
6.2. Fields
Each word in a record is called a field; by default, fields are separated by spaces or tabs. Awk keeps track of the number of fields and stores it in the built-in variable NF. For example, $ awk '{print $1,$3}' test prints the first and third whitespace-separated columns (fields) of the file test.
6.3. Field separators
The built-in variable FS holds the input field separator, which by default is spaces or tabs. The value of FS can be changed with the -F command-line option. For example, $ awk -F: '{print $1,$5}' test prints the first and fifth columns of a colon-delimited file.
To use several field separators at the same time, write them inside square brackets. For example, $ awk -F'[: \t]' '{print $1,$3}' test treats a space, a colon, and a tab as separators.
The output field separator OFS defaults to a space. For example, in $ awk -F: '{print $1,$5}' test, the comma between $1 and $5 is replaced in the output by the value of OFS.
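The following sketch puts the three separator cases next to each other. The passwd-style line and the " | " value for OFS are assumptions made for the example.

```shell
line='root:x:0:0:super user:/root:/bin/sh'

# -F: splits on colons; the comma in print inserts OFS (a space by default).
out1=$(printf '%s\n' "$line" | awk -F: '{ print $1, $5 }')

# A bracket expression lets several characters act as separators.
out2=$(printf 'a:b c\td\n' | awk -F'[: \t]' '{ print $2, $4 }')

# Setting OFS changes what the comma in print produces.
out3=$(printf '%s\n' "$line" | awk -F: -v OFS=' | ' '{ print $1, $7 }')

echo "$out1"
echo "$out2"
echo "$out3"
```

Note that the fifth field keeps its internal space, because only the colon acts as a separator in the first command.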
7. gawk-specific regular expression metacharacters
The common metacharacter set is not covered here; see my sed and grep study notes. The following metacharacters are specific to gawk and do not work in Unix versions of awk.
Metacharacter | Description
\y | Matches the empty string at the beginning or end of a word.
\B | Matches the empty string within a word.
\< | Matches the empty string at the beginning of a word (anchors the start).
\> | Matches the empty string at the end of a word (anchors the end).
\w | Matches any word-constituent character (letter, digit, or underscore).
\W | Matches any character that is not word-constituent.
\` | Matches the empty string at the beginning of the string.
\' | Matches the empty string at the end of the string.
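A quick way to try the word-boundary operators is shown below. Because they are gawk extensions, this sketch calls gawk explicitly and skips itself when gawk is not installed; the sample words are made up.

```shell
# These operators are gawk extensions, so only run when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    # \< and \> anchor the match to the start and end of a word.
    out1=$(printf 'cat\nconcat\ncat food\n' | gawk '/\<cat\>/')
    # \B matches only inside a word; \y matches at either edge of a word.
    out2=$(printf 'cat\nconcat\n' | gawk '/\Bcat\y/')
    echo "$out1"
    echo "$out2"
else
    echo "gawk not found; skipping"
fi
```

The first pattern rejects concat because cat does not start a word there, while the second pattern accepts only concat for the same reason.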
8. POSIX character classes
See my grep study notes.
9. Match operator (~)
Used to match a regular expression against a record or a field. For example, $ awk '$1 ~ /^root/' test displays the lines of the file test whose first column starts with root.
10. Comparison expressions
A conditional expression has the form expression1 ? expression2 : expression3. For example: $ awk '{max = ($1 > $3) ? $1 : $3; print max}' test. If the first field is greater than the third field, $1 is assigned to max; otherwise $3 is assigned to max.
$ awk '$1 + $2 < 100' test. If the sum of the first and second fields is less than 100, the line is printed.
$ awk '$1 > 5 && $2 < 10' test. If the first field is greater than 5 and the second field is less than 10, the line is printed.
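The conditional expression and the compound comparison can be checked against a tiny two-line file. The numbers and the shell variable names larger and both are assumptions for this sketch.

```shell
tmp=$(mktemp)
printf '8 3 6\n2 9 4\n' > "$tmp"

# Conditional expression: pick the larger of fields 1 and 3 on each line.
larger=$(awk '{ max = ($1 > $3) ? $1 : $3; print max }' "$tmp")

# Compound condition: first field above 5 AND second field below 10.
both=$(awk '$1 > 5 && $2 < 10' "$tmp")

echo "$larger"
echo "$both"
rm -f "$tmp"
```

Parentheses around the comparison are not strictly required, but they make the conditional expression easier to read.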
11. Range patterns
A range pattern matches all lines from the first occurrence of the first pattern through the first occurrence of the second pattern. If the second pattern never appears, the range runs to the end of the input. For example, $ awk '/root/,/mysql/' test shows all lines from the first occurrence of root through the first occurrence of mysql.
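The behaviour of a range pattern, including what happens when the range opens again later in the file, can be observed with the sketch below. The sample lines are hypothetical.

```shell
tmp=$(mktemp)
printf 'header\nroot start\nmiddle\nmysql end\ntrailer\nroot again\nlast\n' > "$tmp"

# Print every line inside a root..mysql range, prefixed with its record number.
out=$(awk '/root/,/mysql/ { print NR ": " $0 }' "$tmp")

echo "$out"
rm -f "$tmp"
```

Lines 2 to 4 form the first range. Line 6 opens a second range that never sees mysql, so it continues to the end of the file.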
12. An example of verifying the validity of a passwd file
$ cat /etc/passwd | awk -F: '
NF != 7 {
printf("line %d, does not have 7 fields: %s\n", NR, $0)}
$1 !~ /[A-Za-z0-9]/ {printf("line %d, non alpha and numeric user id: %s\n", NR, $0)}
$2 == "*" {printf("line %d, no password: %s\n", NR, $0)}'
cat sends its output to awk, and awk sets the field separator to a colon.
If the number of fields (NF) is not equal to 7, the action that follows is executed.
printf prints the string "line ??, does not have 7 fields" and displays the record.
If the first field does not contain any letters or digits, printf prints "non alpha and numeric user id" and displays the record number and the record.
If the second field is an asterisk, the string "no password" is printed, along with the record number and the record itself.
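To try these checks without touching the real /etc/passwd, the sketch below runs two of the rules against a few invented passwd-style lines. The user names and paths are hypothetical.

```shell
tmp=$(mktemp)
# Hypothetical entries: one valid, one with too few fields,
# and one with an asterisk in the password field.
printf '%s\n' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/sh' \
  'broken:x:1001' \
  'svc:*:1002:1002:Service:/srv:/bin/false' > "$tmp"

out=$(awk -F: '
NF != 7   { printf("line %d, does not have 7 fields: %s\n", NR, $0) }
$2 == "*" { printf("line %d, no password: %s\n", NR, $0) }
' "$tmp")

echo "$out"
rm -f "$tmp"
```

The valid first line produces no output, since neither rule matches it.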
13. Multiple command execution
Write the commands one after another; unlike sed, there is no need to add the -e option. For example:
awk '/[1-9]\.[0-9][0-9]$/{print $0, "*"} /0\.[1-9][1-9]/{print;}' zdd.txt
Fruits priced at $1 or more get a trailing * to attract attention. The program above contains two patterns, each with its own action.
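A self-contained variant of this idea is sketched below with an invented price list. The second pattern is written slightly differently from the original (it is anchored to the end of the line) so that each sample line matches exactly one rule.

```shell
tmp=$(mktemp)
printf 'apple 1.50\nkiwi 0.45\nmango 2.25\n' > "$tmp"

# Two independent pattern-action rules in one program: every input line is
# tested against both, in the order they are written.
out=$(awk '
/[1-9]\.[0-9][0-9]$/ { print $0, "*" }
/ 0\.[0-9][0-9]$/    { print }
' "$tmp")

echo "$out"
rm -f "$tmp"
```

Putting each rule on its own line is optional, but it keeps longer programs readable.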
14. Formatted printing
The %s format specifier prints a string. It can be given a width, and shorter strings are padded with spaces: a positive width right-aligns the string and a negative width left-aligns it. For example, %3s means a field 3 columns wide, right-aligned; if the string is actually longer than 3 characters, its full width is used.
File name left-aligned, size left-aligned
ls -l | awk '{printf "%-16s\t%-16s\n", $9, $5;}'
File name left-aligned, size right-aligned
ls -l | awk '{printf "%-16s\t%16s\n", $9, $5;}'
File name right-aligned, size left-aligned
ls -l | awk '{printf "%16s\t%-16s\n", $9, $5;}'
File name right-aligned, size right-aligned
ls -l | awk '{printf "%16s\t%16s\n", $9, $5;}'
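Since ls -l output differs between systems, the padding rules are easier to verify with fixed strings. In this sketch the square brackets are only there to make the padding visible.

```shell
# Negative width: left-aligned, padded on the right.
left=$(awk 'BEGIN { printf "[%-6s]\n", "ab" }')

# Positive width: right-aligned, padded on the left.
right=$(awk 'BEGIN { printf "[%6s]\n", "ab" }')

# A string longer than the width is printed in full, not truncated.
wide=$(awk 'BEGIN { printf "[%3s]\n", "abcdef" }')

echo "$left"
echo "$right"
echo "$wide"
```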
15. Several examples
· $ awk '{print $3}' test ----- extracts the contents of the third field (column).
· $ awk '/^(no|so)/' test ----- prints all lines that begin with no or so.
· $ awk '/^[ns]/{print $0}' test ----- if the record starts with n or s, prints this record.
· $ awk '$1 ~ /[0-9][0-9]$/ {print $0}' test ----- if the first field ends with two digits, prints this record.
· $ awk '$1 == 100 || $2 < 50' test ----- if the first field equals 100 or the second field is less than 50, prints the line.
· $ awk '$1 != 10' test ----- if the first field is not equal to 10, prints the line.
· $ awk '/test/{print $1 + 10}' test ----- if the record matches the regular expression test, adds 10 to the first field and prints the result.
· $ awk '{print ($1 > 5 ? "ok " $1 : "error " $1)}' test ----- if the first field is greater than 5, prints the value of the expression after the question mark; otherwise prints the value of the expression after the colon.
· $ awk '/^root/,/^mysql/' test ----- prints all records in the range from a record that begins with root through a record that begins with mysql. If another record beginning with root is found later, printing resumes until the next record beginning with mysql or the end of the file.