Http://blog.chinaunix.net/uid-25120309-id-3801250.html
I. Introduction to awk
awk is a programming language for processing text and data on Linux/Unix. The data can come from standard input, from one or more files, or from the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, and is a powerful programming tool on Linux/Unix. It can be used directly on the command line, but it is more often used in scripts.
How awk processes text and data: it scans the input line by line, from the first line to the last, looks for lines that match a given pattern, and performs the action you specify on those lines. If no action is specified, the matching lines are printed to standard output (the screen); if no pattern is specified, the action is applied to every line.
The name awk is formed from the first letters of its three authors' surnames: Alfred Aho, Peter Weinberger, and Brian Kernighan.
gawk is the GNU version of awk; it provides some Bell Labs and GNU extensions. The description below uses GNU gawk as its example. On Linux systems awk is already linked to gawk, so everything below simply refers to awk.
II. awk command format and options
2.1 awk has two forms of syntax
1. Command-line form
awk [-F field-separator] 'commands' input-file(s)
Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) names the file or files to be processed.
awk splits each line of a file into pieces at the field separator; each piece is called a field. If no separator is given with -F, the default field separator is whitespace.
2. Put all the awk commands in a separate file, then invoke:
awk -f awk-script-file input-file(s)
Here the -f option loads the awk script from awk-script-file; input-file(s) is the same as above.
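The two forms can be compared side by side. The following is a minimal sketch on made-up sample data; the temporary files and the shell variable names (data, script, inline, from_file) are illustrative assumptions, not part of the original text.

```shell
# Sample input: colon-separated "name:shell" records (made-up data).
data=$(mktemp)
printf 'root:/bin/bash\ndaemon:/bin/sh\n' > "$data"

# Form 1: the awk program is given inline on the command line.
inline=$(awk -F: '{ print $1 }' "$data")

# Form 2: the same program is stored in a script file and loaded with -f.
script=$(mktemp)
printf '{ print $1 }\n' > "$script"
from_file=$(awk -F: -f "$script" "$data")

# Both forms print the first field of every line.
printf '%s\n' "$inline"
printf '%s\n' "$from_file"

rm -f "$data" "$script"
```

The script-file form tends to pay off once the program grows beyond a line or two, since it avoids shell quoting problems.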
2.2 Command options
(1) -F fs or --field-separator fs: specifies the input field separator; fs is a string or a regular expression, e.g. -F:.
(2) -v var=value or --assign var=value: assigns a value to a user-defined variable.
(3) -f scriptfile or --file scriptfile: reads the awk commands from a script file.
(4) -mf nnn and -mr nnn: set internal limits to the value nnn; -mf sets the maximum number of fields and -mr sets the maximum record size. These two options are extensions from the Bell Labs version of awk and do not apply in standard awk.
(5) -W compat or --compat, -W traditional or --traditional: runs awk in compatibility mode, so that gawk behaves exactly like standard awk and all awk extensions are ignored.
(6) -W copyleft or --copyleft, -W copyright or --copyright: prints a short copyright notice.
(7) -W help or --help, -W usage or --usage: prints all awk options with a short description of each.
(8) -W lint or --lint: prints warnings about constructs that are not portable to traditional Unix platforms.
(9) -W lint-old or --lint-old: prints warnings about constructs that are not portable to the original version of Unix awk.
(10) -W posix: turns on compatibility mode with the following additional restrictions: \x escape sequences are not recognized; the keyword func is not accepted as a synonym for function; when FS is a single space, newline does not act as a field separator; the operators ** and **= cannot be used in place of ^ and ^=; fflush is not available.
(11) -W re-interval or --re-interval: allows interval expressions in regular expressions (see the POSIX character classes in grep), such as the bracket expression [[:alpha:]].
(12) -W source program-text or --source program-text: uses program-text as the program source; it can be mixed with the -f option.
(13) -W version or --version: prints version information, for use in bug reports.
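Of these options, -F and -v are the ones used most often. A minimal sketch of the two together, assuming made-up "name:age" data and a hypothetical awk variable called min:

```shell
# Sample input: "name:age" records (made-up data).
data=$(mktemp)
printf 'alice:30\nbob:17\ncarol:45\n' > "$data"

# -F: splits each line on colons.
# -v min=18 sets the awk variable min before the program starts.
adults=$(awk -F: -v min=18 '$2 >= min { print $1 }' "$data")

printf '%s\n' "$adults"
rm -f "$data"
```

Passing values with -v keeps shell variables out of the quoted awk program, which is usually easier to read than splicing them into the program text.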
III. How to use
# awk 'pattern {action}' filenames
Although the operations can be complex, the syntax is always the same: pattern describes what awk looks for in the data, and action is the series of commands executed when a match is found. The curly braces ({}) need not always appear in a program, but they are used to group the series of instructions that belong to a particular pattern. When the pattern is a regular expression, it is enclosed in slashes.
The most basic function of the awk language is to browse and extract information from a file or string according to specified rules; once awk has extracted the information, other text operations can be carried out. A complete awk script is typically used to format the information in a text file.
awk normally treats one line of a file as its unit of processing: for each line it receives, it executes the corresponding commands to process the text.
IV. Patterns and actions
awk scripts are made up of patterns and actions:
pattern {action}, for example $ awk '/root/' test, or $ awk '$3 < 100' test.
Both parts are optional: if there is no pattern, the action is applied to every record; if there is no action, every matching record is printed. By default each input line is one record, but the user can specify a different record separator through the RS variable.
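The two defaults can be seen on a small input. This is a minimal sketch; the sample data and the shell variable names are made up for illustration.

```shell
# Sample input: "name uid" records (made-up data).
data=$(mktemp)
printf 'root 0\nmysql 27\nguest 500\n' > "$data"

# Pattern without action: every matching record is printed unchanged.
pattern_only=$(awk '/root/' "$data")

# Action without pattern: the action runs on every record.
action_only=$(awk '{ print $1 }' "$data")

printf '%s\n' "$pattern_only"
printf '%s\n' "$action_only"
rm -f "$data"
```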
4.1. Patterns
A pattern can be any one of the following:
(1) Regular expression: uses the extended set of wildcard (meta) characters.
(2) Relational expression: uses the relational operators from the operator table below; the comparison can be between strings or numbers, e.g. $2 > $1 selects the lines whose second field is greater than the first.
(3) Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).
(4) pattern, pattern: specifies a range of lines. This syntax cannot include the BEGIN and END patterns.
(5) BEGIN: lets the user specify an action that runs before the first input record is processed; global variables are usually set here.
(6) END: lets the user specify an action that runs after the last input record has been read.
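A short sketch combining BEGIN, a relational pattern, and END in one program. The fruit data and the variable names (report, total) are made up for illustration.

```shell
# Sample input: "name quantity" records (made-up data).
data=$(mktemp)
printf 'apple 3\nbanana 12\ncherry 7\n' > "$data"

report=$(awk '
    BEGIN  { print "fruit count" }            # runs before the first record is read
    $2 > 5 { print $1, $2; total += $2 }      # relational pattern on the second field
    END    { print "total", total }           # runs after the last record
' "$data")

printf '%s\n' "$report"
rm -f "$data"
```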
4.2. Actions
An action consists of one or more commands, functions, or expressions, separated by newlines or semicolons and enclosed in curly braces. There are four main kinds:
(1) Assignments to variables or arrays
(2) Output commands
(3) Built-in functions
(4) Control-flow statements
V. awk's built-in variables
Variable | Description
$n | The nth field of the current record; fields are separated by FS.
$0 | The complete input record.
ARGC | The number of command-line arguments.
ARGIND | The position on the command line of the file currently being processed (counting from 0).
ARGV | An array containing the command-line arguments.
CONVFMT | The number conversion format (default %.6g).
ENVIRON | An associative array of environment variables.
ERRNO | A description of the last system error.
FIELDWIDTHS | A space-separated list of field widths.
FILENAME | The name of the current file.
FNR | Like NR, but relative to the current file.
FS | The field separator (default is any whitespace).
IGNORECASE | If true, matching ignores case.
NF | The number of fields in the current record.
NR | The number of the current record.
OFMT | The output format for numbers (default %.6g).
OFS | The output field separator (default is a space).
ORS | The output record separator (default is a newline).
RLENGTH | The length of the string matched by the match function.
RS | The record separator (default is a newline).
RSTART | The start position of the string matched by the match function.
SUBSEP | The array subscript separator (default \034).
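The difference between NR and FNR only shows up when awk reads more than one file. A minimal sketch, assuming two made-up files named one.txt and two.txt in a temporary directory:

```shell
# Two small sample files (made-up names and contents).
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# For each record print: file name, overall record number,
# record number within the current file, and number of fields.
summary=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)

printf '%s\n' "$summary"
rm -rf "$dir"
```

NR keeps counting across files (1, 2, 3), while FNR restarts at 1 when two.txt begins.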
VI. awk operators
Operator | Description
= += -= *= /= %= ^= **= | Assignment
?: | C-style conditional expression
|| | Logical OR
&& | Logical AND
~ !~ | Matches a regular expression, does not match a regular expression
< <= > >= != == | Relational operators
(space) | Concatenation
+ - | Addition, subtraction
* / % | Multiplication, division, remainder
+ - ! | Unary plus, unary minus, logical NOT
^ ** | Exponentiation
++ -- | Increment, decrement (prefix or postfix)
$ | Field reference
in | Array membership
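A few of these operators in action. This is a minimal sketch that runs entirely in a BEGIN block, so it needs no input file; the variable names (n, m, seen, has_k, has_z) are made up for illustration, and only operators available in POSIX awk are used.

```shell
ops=$(awk 'BEGIN {
    n = 7
    print n % 3                        # remainder
    print 2 ^ 10                       # exponentiation
    print "a" "b"                      # concatenation: two strings side by side
    print (n > 5 ? "big" : "small")    # conditional expression
    m = ("x7" ~ /[0-9]$/)              # match operator yields 1 or 0
    print m
    seen["k"] = 1
    has_k = ("k" in seen)              # array membership test
    has_z = ("z" in seen)
    print has_k, has_z
}')

printf '%s\n' "$ops"
```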
VII. Records and fields
7.1. Records
awk treats each line that ends with a newline character as one record.
Record separators: by default both the input and the output record separator are a newline; they are stored in the built-in variables RS and ORS.
The variable $0 refers to the entire record. For example, $ awk '{print $0}' test prints every record in the file test.
The variable NR is a counter; its value increases by 1 after each record is processed.
For example, $ awk '{print NR,$0}' test prints every record in the file test, with the record number in front of each record.
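A small sketch of NR and of changing RS. The input strings are made up, and the semicolon separator is just an example choice.

```shell
# Default RS: each line is one record, NR numbers them.
numbered=$(printf 'first\nsecond\n' | awk '{ print NR, $0 }')

# RS set to ";" in BEGIN: records are now delimited by semicolons.
by_semicolon=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR, $0 }')

printf '%s\n' "$numbered"
printf '%s\n' "$by_semicolon"
```

RS has to be set before the first record is read, which is why the assignment sits in a BEGIN block.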
7.2. Fields
Each word in a record is called a field; by default fields are separated by spaces or tabs. awk keeps track of the number of fields and stores it in the built-in variable NF. For example, $ awk '{print $1,$3}' test prints the first and third whitespace-separated columns (fields) of the file test.
7.3. Field separators
The built-in variable FS holds the input field separator; its default value is a space or tab. The value of FS can be changed with the -F command-line option. For example, $ awk -F: '{print $1,$5}' test prints the first and fifth columns of a colon-separated file.
Several field separators can be used at once by writing them inside square brackets, for example $ awk -F'[:\t ]' '{print $1,$3}' test, which uses space, colon, and tab as separators.
The output field separator is a space by default and is stored in OFS. For example, in $ awk -F: '{print $1,$5}' test, the comma between $1 and $5 stands for the value of OFS.
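The interaction between -F, a bracketed set of separators, and OFS can be sketched on a single line of input. The sample line is modeled on an /etc/passwd entry but is made up, and the shell variable names are illustrative.

```shell
line='root:x:0:0:super user:/root:/bin/bash'

# Input split on ":", output joined with the default OFS (a space).
default_ofs=$(printf '%s\n' "$line" | awk -F: '{ print $1, $7 }')

# Same fields, but OFS changed to a comma in BEGIN.
custom_ofs=$(printf '%s\n' "$line" | awk -F: 'BEGIN { OFS = "," } { print $1, $7 }')

# Bracket expression: split on either a colon or a space,
# so "super user" becomes two separate fields ($5 and $6).
multi=$(printf '%s\n' "$line" | awk -F'[: ]' '{ print $5, $6, NF }')

printf '%s\n' "$default_ofs" "$custom_ofs" "$multi"
```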
VIII. Match operator (~)
The match operator tests a record or field against a regular expression. For example, $ awk '$1 ~ /^root/' test prints the lines of the file test whose first column starts with root.
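A minimal sketch of ~ and its negation !~ on made-up data; the file contents and variable names are illustrative only.

```shell
# Sample input: "name uid" records (made-up data).
data=$(mktemp)
printf 'root 0\nnotroot 1\nmysql 27\n' > "$data"

# Lines whose first field starts with "root".
matched=$(awk '$1 ~ /^root/' "$data")

# First field of the lines where it does NOT start with "root".
unmatched=$(awk '$1 !~ /^root/ { print $1 }' "$data")

printf '%s\n' "$matched"
printf '%s\n' "$unmatched"
rm -f "$data"
```

Because of the ^ anchor, "notroot" is not matched even though it contains the text root.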
IX. Comparison expressions
The conditional form is: conditional-expression1 ? expression2 : expression3
For example:
$ awk '{max = ($1 > $3) ? $1 : $3; print max}' test. If the first field is greater than the third field, $1 is assigned to max; otherwise $3 is assigned to max.
$ awk '$1 + $2 < 100' test. If the sum of the first and second fields is less than 100, the line is printed.
$ awk '$1 > 5 && $2 < 10' test. If the first field is greater than 5 and the second field is less than 10, the line is printed.
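These expressions can be tried on a small numeric file. A minimal sketch, assuming made-up two-column data and comparing the first two fields:

```shell
# Sample input: two numeric columns (made-up data).
data=$(mktemp)
printf '3 9\n8 2\n6 6\n' > "$data"

# Conditional expression: pick the larger of the two fields on each line.
maxes=$(awk '{ max = ($1 > $2) ? $1 : $2; print max }' "$data")

# Compound condition: first field above 5 AND second field below 10.
both=$(awk '$1 > 5 && $2 < 10' "$data")

printf '%s\n' "$maxes"
printf '%s\n' "$both"
rm -f "$data"
```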
X. Range patterns
A range pattern matches all lines from the first occurrence of the first pattern through the first occurrence of the second pattern. If the second pattern never appears, the range runs to the end of the input; if the first pattern never appears, nothing is matched. For example, $ awk '/root/,/mysql/' test prints all lines from the first occurrence of root through the first occurrence of mysql.
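A sketch of a range pattern on made-up data, including the case where a range opens a second time and is never closed:

```shell
# Sample input: one word per line (made-up data).
data=$(mktemp)
printf 'bin\nroot\ndaemon\nmysql\nsys\nroot\ngames\n' > "$data"

# Print from each line matching /root/ through the next line matching /mysql/.
ranged=$(awk '/root/,/mysql/' "$data")

printf '%s\n' "$ranged"
rm -f "$data"
```

The first range covers root, daemon, mysql. The line sys falls outside it. The second root opens a new range, and since no further mysql follows, that range runs to the end of the input and includes games.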
XI. Examples
1. Getting-started examples
1.1 Show the 5 most recently logged-in accounts:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
1.2 To display only the accounts in /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
1.3 To display only the accounts in /etc/passwd and their shells, with the account and the shell separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root	/bin/bash
daemon	/bin/sh
bin	/bin/sh
sys	/bin/sh
1.4 To display only the accounts in /etc/passwd and their shells, separated by commas, with a header line name,shell before all the rows and a final line "blue,/bin/nosh" after the last row:
# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh
1.5 Search /etc/passwd for all lines containing the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern: only lines that match the pattern (here, root) trigger the action. No action is specified, so by default the whole line is printed.
The search supports regular expressions, for example lines that start with root: awk -F: '/^root/' /etc/passwd
1.6 Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
1.7 Other small examples:
$ awk '/^(no|so)/' test ----- prints all lines that begin with no or so.
$ awk '/^[ns]/{print $0}' test ----- if the record starts with n or s, print the record.
$ awk '$1 ~ /[0-9][0-9]$/ {print $0}' test ----- if the first field ends with two digits, print the record.
$ awk '$1 == 100 || $2 < 50' test ----- if the first field equals 100 or the second field is less than 50, print the line.
$ awk '$1 != 10' test ----- if the first field is not equal to 10, print the line.
$ awk '/test/{print $1 + 10}' test ----- if the record matches the regular expression test, add 10 to the first field and print the result.
$ awk '{print ($1 > 5 ? "ok " $1 : "error " $1)}' test ----- if the first field is greater than 5, print the value of the expression after the question mark; otherwise print the value of the expression after the colon.
$ awk '/^root/,/^mysql/' test ----- prints all records in the range from a record that begins with root to a record that begins with mysql. If another record beginning with root is found later, printing resumes and continues until the next record beginning with mysql, or until the end of the file.
2. Examples of awk built-in variables
Print, for /etc/passwd, the file name, the line number of each line, the number of columns in each line, and the full content of the line:
# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
Using printf instead of print can make the code more concise and easier to read:
# awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
awk provides two output functions, print and printf.
The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated and can no longer be told apart. The comma here stands for the output field separator (OFS), which is a space by default.
printf works much like printf in C: it formats strings, and when the output is complex, printf is more convenient and makes the code easier to understand.
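The effect of the comma in print, and of a printf format, can be sketched on one made-up line. The format string %-6s|%3d is only an example: it left-justifies the name in 6 columns and right-justifies the number in 3.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# With a comma, print inserts OFS (a space) between the arguments.
with_comma=$(printf '%s\n' "$line" | awk -F: '{ print $1, $3 }')

# Without a comma, the arguments are concatenated.
without_comma=$(printf '%s\n' "$line" | awk -F: '{ print $1 $3 }')

# printf gives explicit control over widths; note that it needs its own "\n".
formatted=$(printf '%s\n' "$line" | awk -F: '{ printf("%-6s|%3d\n", $1, $3) }')

printf '%s\n' "$with_comma" "$without_comma" "$formatted"
```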
3. awk custom variables
3.1. The following counts the number of accounts in /etc/passwd:
count is a custom variable. The earlier action {} blocks contained only a single print, but print is just one statement; an action {} can hold several statements, separated by semicolons.
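A minimal sketch of such a count, run on made-up passwd-style data rather than the real /etc/passwd. It relies on awk treating an unset variable as 0, so count is not initialized explicitly; the message text is an illustrative choice.

```shell
# Sample input in /etc/passwd style (made-up, shortened records).
data=$(mktemp)
printf 'root:x:0\ndaemon:x:1\nbin:x:2\n' > "$data"

# Two statements in one action, separated by a semicolon:
# increment the custom variable count, then print the record.
counted=$(awk '{ count++; print $0 } END { print "user count is", count }' "$data")

printf '%s\n' "$counted"
rm -f "$data"
```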
3.2. count does not have to be initialized, because an unset variable defaults to 0, but the proper approach is to initialize it to 0 explicitly:
# awk 'BEGIN {count=0; print "[start]user count is", count} {count=count+1; print $0;} END{print "[end]user count is", count}' /etc/passwd
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40
3.3. Count the number of bytes used by the files in a directory:
# ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is", size}'
[end]size is 8657198
3.4 To display the result in megabytes (M):
# ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is", size/1024/1024, "M"}'
[end]size is 8.25889 M
Note that the total does not include the contents of subdirectories.
4. Conditional statements
Count the number of bytes used by the files in a directory, skipping entries whose size is 4096 (typically directories):
# ls -l | awk 'BEGIN {size=0; print "[start]size is", size} {if ($5!=4096) {size=size+$5;}} END{print "[end]size is", size/1024/1024, "M"}'
[end]size is 8.22339 M
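The if statement can be checked on a fixed listing instead of live ls -l output. The listing below is made up; the second command is an alternative (not from the original text) that skips directories by testing the first character of the mode column rather than relying on the size 4096.

```shell
# A made-up listing in "ls -l" layout: two files and one directory.
listing='-rw-r--r-- 1 u g 1000 Jan 1 00:00 a.txt
drwxr-xr-x 2 u g 4096 Jan 1 00:00 sub
-rw-r--r-- 1 u g 500 Jan 1 00:00 b.txt'

# if statement: add the size column only when it is not 4096.
total=$(printf '%s\n' "$listing" | awk '{ if ($5 != 4096) size += $5 } END { print size }')

# Alternative: a pattern that keeps only regular files (mode starts with "-").
files_only=$(printf '%s\n' "$listing" | awk '$1 ~ /^-/ { size += $5 } END { print size }')

printf '%s\n' "$total" "$files_only"
```

The size test also skips any regular file that happens to be exactly 4096 bytes, which is why the mode-based variant may be the safer choice.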
5. Looping statements
Show the accounts in /etc/passwd:
# awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
...
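Besides the counted for loop used above, awk also has a for (key in array) form. A minimal sketch of both on made-up data; the array names (name, n) are illustrative, and the output of the second loop is piped through sort because awk does not guarantee any iteration order for the in form.

```shell
# Sample input: "account:shell" records (made-up data).
data=$(mktemp)
printf 'root:/bin/bash\ndaemon:/bin/sh\nbin:/bin/sh\n' > "$data"

# Counted loop: store each account by record number, then walk 1..NR.
indexed=$(awk -F: '{ name[NR] = $1 }
    END { for (i = 1; i <= NR; i++) print i, name[i] }' "$data")

# for-in loop: count how many accounts use each shell.
per_shell=$(awk -F: '{ n[$2]++ }
    END { for (s in n) print s, n[s] }' "$data" | LC_ALL=C sort)

printf '%s\n' "$indexed"
printf '%s\n' "$per_shell"
rm -f "$data"
```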