Brief introduction
Awk is a powerful text-analysis tool. Compared with grep (searching) and sed (editing), awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then performs whatever analysis you specify on those fields.
There are three versions of awk: awk, nawk, and gawk. Unless stated otherwise, awk here generally refers to gawk, the GNU version of awk.
Awk takes its name from the first letters of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. Awk is in fact a language in its own right: the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on input, and generate reports, among many other things.
How to use
awk 'pattern { action }' filenames
Although what awk does can be complex, the syntax is always the same: pattern is what awk looks for in the data, and action is a series of commands that are executed when a match is found. Curly braces ({}) do not have to appear in every program, but they are used to group the series of instructions that belongs to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
The most basic job of the awk language is to scan a file or string and extract information according to specified rules; only after the information has been extracted can other text operations be performed. A complete awk script is typically used to format the information in a text file.
Awk normally treats one line of a file as its unit of processing. It takes the input one line at a time and executes the appropriate commands on each line.
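As a minimal sketch of the pattern and action idea (the sample log lines are invented for illustration), the rule below runs its action only on lines that match /error/ and prints the second field of each:

```shell
# Sample input, invented for this illustration; one record per line.
input='ok start
error disk full
ok done
error net down'

# The pattern /error/ selects lines; the action prints the second field.
matched=$(printf '%s\n' "$input" | awk '/error/ { print $2 }')
echo "$matched"
```

Lines that do not match the pattern are skipped without any output.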
Invoke awk
There are three ways to call awk.
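A hedged sketch of two common ways to call awk: an inline program on the command line, and a program stored in a file and passed with -f. A third style that is often listed, an executable script that starts with an awk interpreter line, is not shown. The temporary file here stands in for a hypothetical program file.

```shell
# 1. Inline program given directly on the command line.
inline=$(printf 'a b\nc d\n' | awk '{ print $2 }')

# 2. The same program kept in a file (a temporary one here) and loaded with -f.
prog=$(mktemp)
printf '{ print $2 }\n' > "$prog"
from_file=$(printf 'a b\nc d\n' | awk -f "$prog")
rm -f "$prog"

echo "$inline"
echo "$from_file"
```

Both calls run the same program, so they should print the same two lines.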
Description
Awk is designed to work on data streams and can operate on both columns and rows, whereas sed is used mostly for matching, substitution, and deletion.
Awk has many built-in features, such as arrays and functions. Flexibility is awk's biggest advantage.
The structure of awk
awk '
BEGIN { print "Start" }
pattern { commands }
END { print "End" }
' file
The line breaks are only there for readability; this is really a single command.
An awk script usually has three parts:
1. A BEGIN statement block
2. A general statement block that can use pattern matching
3. An END statement block
Any of the three parts may be omitted from a script. The script is usually enclosed in single or double quotes.
For example:
awk 'BEGIN { i = 0 } { i++ } END { print i }' filename
Working principle
The awk command works as follows:
1. Execute the statements in the BEGIN { commands } block.
2. Read a line from the file or from stdin and execute pattern { commands }; repeat until all input has been read.
3. Finally, execute the END { commands } block.
As a reminder, any of these blocks may be omitted.
Awk can do far more than this.
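The three steps above can be sketched as follows. This is a small illustration with made-up input, not a canonical script: BEGIN runs once before any input, the middle block runs once per line, and END runs once after the last line.

```shell
# BEGIN runs first, the unnamed block runs for every input line,
# and END runs after all input has been read.
result=$(printf 'x\ny\nz\n' | awk '
BEGIN { print "Start"; n = 0 }
      { n++ }
END   { print "lines:", n; print "End" }')
echo "$result"
```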
Getting Started example:
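echo | awk '{ var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1, var2, var3; }'
Prints: v1 v2 v3
Explanation: the comma separates the arguments of print, which are written out with a space between them.
echo | awk '{ var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1 "-" var2 "-" var3; }'
Prints: v1-v2-v3
Explanation: the double-quoted "-" strings are concatenated with the variables.
With no symbol at all between the variables (print var1 var2 var3), the values are simply run together as v1v2v3 rather than printed as separate items.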
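The difference between the comma and plain concatenation can be checked side by side. This is a small sketch with shortened variable names (a and b are chosen here for brevity); the third line additionally assumes the built-in OFS variable, which controls what the comma inserts.

```shell
# Comma: arguments are joined with OFS (a single space by default).
with_comma=$(echo | awk '{ a = "v1"; b = "v2"; print a, b }')
# Juxtaposition: the strings are concatenated with nothing in between.
concatenated=$(echo | awk '{ a = "v1"; b = "v2"; print a b }')
# Changing OFS changes what the comma inserts.
with_ofs=$(echo | awk 'BEGIN { OFS = "-" } { a = "v1"; b = "v2"; print a, b }')
printf '%s\n%s\n%s\n' "$with_comma" "$concatenated" "$with_ofs"
```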
Reading --help (the full documentation is very large and complex: the official manual takes a 410-page PDF to cover it, so do not expect a single article to say it all.)
Usage: awk [POSIX or GNU style options] -f scriptfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:              GNU long options:
-f scriptfile               --file=scriptfile
-F fs                       --field-separator=fs
    Specifies the input field separator; fs is either a string or a regular expression.
-v var=val                  --assign=var=val
    Assigns the external value val to the variable var.
-m[fr] val
-O                          --optimize
    Enables some optimizations of the program's internal representation.
-W compat                   --compat
    Runs awk in compatibility mode, so that gawk behaves exactly like standard awk and all gawk extensions are ignored.
-W copyleft                 --copyleft
    Prints short copyright information.
-W copyright                --copyright
    Prints a short version of the General Public License and then exits.
-W dump-variables[=file]    --dump-variables[=file]
    Prints a sorted list of global variables with their types and final values.
-W exec=file                --exec=file
    Similar to -f, but with two differences (too long to cover here).
-W gen-po                   --gen-po
    (Too much detail to cover here.)
-W help                     --help
    Prints help.
-W lint[=fatal]             --lint[=fatal]
    Warns about constructs that are dubious or not portable to other awk implementations.
-W lint-old                 --lint-old
    Warns about constructs that are not portable to traditional UNIX awk.
-W non-decimal-data         --non-decimal-data
    Enables automatic interpretation of octal and hexadecimal values in input data.
-W profile[=file]           --profile[=file]
    Enables profiling of awk programs.
-W posix                    --posix
    Operates in strict POSIX mode.
-W re-interval              --re-interval
    Allows interval expressions in regular expressions.
-W source=program-text      --source=program-text
-W traditional              --traditional
    Uses traditional UNIX awk regular expression matching.
-W usage                    --usage
-W use-lc-numeric           --use-lc-numeric
    Forces use of the locale's decimal point character when numeric input data is parsed.
-W version                  --version
To report bugs, see the "Bugs" node in "gawk.info", which is the "Reporting Problems and Bugs" section in the printed version.
Note: gawk is the GNU version of awk; on Ubuntu, gawk has to be installed even just to see this help.
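Two of the options listed above, -F and -v, come up constantly. The sketch below uses an invented passwd-style line and a hypothetical variable name (label) to show what each one does; it relies only on options that standard awk also supports.

```shell
# -F sets the input field separator; -v assigns a variable before the program runs.
line='root:x:0:0:root:/root:/bin/bash'   # sample passwd-style line, made up for the demo

first_last=$(printf '%s\n' "$line" | awk -F: '{ print $1, $NF }')
labelled=$(printf '%s\n' "$line" | awk -F: -v label=user '{ print label "=" $1 }')

echo "$first_last"
echo "$labelled"
```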
We will not read through all of it here. To keep things informative and fun, let's start with some basics:
Some special variables:
NR: the number of records read so far; during execution it corresponds to the current line number
NF: the number of fields; during execution it corresponds to the number of fields in the current line
$0: the text of the current line during execution
$1: the text of the first field
$2: the text of the second field
Example:
Example 1.
# The trailing \ continues the command on the next line
echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | \
awk '{
    print "Line no:" NR ", No of fields:" NF, "$0=" $0, "$1=" $1, "$2=" $2, "$3=" $3
}'
Note: $1 prints the first field, $NF prints the last field, and $(NF-1) prints the second-to-last field.
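The note about $NF and $(NF-1) can be verified with a short sketch. The two input lines are invented, and printf is used instead of echo -e only to keep the example portable across shells.

```shell
# For each line print: record number, field count, first field,
# last field, and second-to-last field.
fields=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $1, $NF, $(NF-1) }')
echo "$fields"
```

On the two-field line, the first field is also the second-to-last one, so d appears twice.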
Example 2.
seq 5 | awk 'BEGIN { sum = 0; print "summation:" } { print $1 "+"; sum += $1 } END { print "=="; print sum }'
This example uses the basic three-part format.
sum is initialized in BEGIN, which also prints "summation:".
The middle block prints the first column and then adds it to sum.
sum is printed in END.
Example 3. Passing external variables with -v
$ VAR=10000
$ echo | awk -v VARIABLE=$VAR '{ print VARIABLE }'
There is another, more flexible way to pass several external variables to awk, for example:
$ var1="value1" var2="value2"
$ echo | awk '{ print v1, v2 }' v1=$var1 v2=$var2
If the input comes from a file rather than from stdin:
$ awk '{ print v1, v2 }' v1=$var1 v2=$var2 filename
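The two ways of passing variables are not quite equivalent. The sketch below (the variable name v1 and the value hello are illustrative) shows the practical difference: a -v assignment is already in effect inside BEGIN, while an assignment written after the program is handled like a file argument and only takes effect once input processing starts.

```shell
# With -v, the variable is already set when BEGIN runs.
early=$(awk -v v1=hello 'BEGIN { print "[" v1 "]" }')

# An assignment placed after the program is processed like a file argument,
# so it is still empty in BEGIN but set for the input that follows.
late=$(echo | awk 'BEGIN { print "[" v1 "]" } { print "[" v1 "]" }' v1=hello)

echo "$early"
echo "$late"
```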
Example 4
$ awk 'NR < 5'            # lines whose line number is less than 5
$ awk 'NR == 1, NR == 4'  # lines 1 through 4
$ awk '/linux/'           # lines containing linux (the pattern can be any regular expression)
$ awk '!/linux/'          # lines that do not contain linux
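These four filters can be tried on a small sample. The five input lines below are made up for the demonstration; each command has a pattern and no action, so matching lines are printed as they are.

```shell
sample=$(printf 'one\nlinux two\nthree\nfour linux\nfive\n')

first_two=$(printf '%s\n' "$sample" | awk 'NR < 3')             # by line number
range=$(printf '%s\n' "$sample" | awk 'NR == 2, NR == 4')       # inclusive range
has_linux=$(printf '%s\n' "$sample" | awk '/linux/')            # regex match
no_linux=$(printf '%s\n' "$sample" | awk '!/linux/')            # negated match

printf '%s\n--\n%s\n--\n%s\n--\n%s\n' "$first_two" "$range" "$has_linux" "$no_linux"
```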
This is my first write-up on the topic; the aim is to give a fairly complete picture of awk in about two pages.
awk Supplement
Earlier we covered the basics of awk. I was pleasantly surprised to find a very detailed article on awk; rather than reproduce it, I follow its outline and write things up in my own way.
This part covers built-in variables and some of the string functions.
Built-in variables (some translations call these special variables or environment variables; following the official translation, they are called built-in variables here)
Variable | Description
$n | The nth field of the current record; fields are separated by FS.
$0 | The entire input record.
ARGC | The number of command-line arguments.
ARGIND (gawk) | The position on the command line of the current file, counting from 0.
ARGV | An array containing the command-line arguments.
BINMODE (gawk) | On non-POSIX systems, specifies that all I/O uses binary mode.
CONVFMT | The number conversion format (default %.6g).
ENVIRON | An associative array of environment variables.
ERRNO (gawk) | A description of the last system error.
FIELDWIDTHS (gawk) | A space-separated list of field widths.
FILENAME | The name of the current file.
FNR | Like NR, but relative to the current file.
FPAT (gawk) | A regular expression (given as a string) that tells gawk to build fields from the text that matches it.
FS | The field separator (default: any whitespace).
IGNORECASE (gawk) | If true, matching is case-insensitive.
LINT (gawk) | When true (nonzero or non-null), gawk behaves as if the --lint command-line option had been given.
NF | The number of fields in the current record.
NR | The number of records read so far.
OFMT | The output format for numbers (default %.6g).
OFS | The output field separator (default: a space).
ORS | The output record separator (default: a newline).
PROCINFO (gawk) | An array whose elements give access to information about the running awk program.
RLENGTH | The length of the string matched by the match function.
RS | The record separator (default: a newline).
RT (gawk) | Set each time a record is read; it holds the input text that matched RS.
RSTART | The starting position of the string matched by the match function.
SUBSEP | The array subscript separator (default \034).
TEXTDOMAIN (gawk) | Used for internationalization of the program.
Variables marked (gawk) are the newly added built-in variables, which are gawk extensions.
Simple example:
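1.
sed 1q /etc/passwd | awk 'BEGIN { FS = ":" } { print }'
Prints the first line of /etc/passwd with the colon as the field separator (FS is set in BEGIN so that it already applies to the first record).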
2.
awk 'END { print FILENAME }' awk.txt
Prints the name of the input file.
3. seq 100 | awk 'NR == 4, NR == 6'
Prints lines 4 through 6.
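A few of the built-in variables can be seen working together in the following sketch. The records and the two temporary files are invented for the demonstration; FS and OFS control how fields are split and joined, and FNR restarts at 1 for each input file while NR keeps counting.

```shell
# FS splits input fields on ":" and OFS joins output fields with " -> ".
records=$(printf 'alice:30\nbob:25\n' |
    awk 'BEGIN { FS = ":"; OFS = " -> " } { print NR, $1, $2 }')

# NR counts records across all files; FNR restarts for each file.
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\n' > "$f2"
counts=$(awk '{ print NR, FNR }' "$f1" "$f2")
rm -f "$f1" "$f2"

echo "$records"
echo "$counts"
```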
Here are some of awk's built-in string functions.
length(string):
Returns the length of the string.
index(string, search_string):
Returns the position at which search_string appears in string.
split(string, array, delimiter):
Splits string into pieces at each delimiter and stores the pieces in array.
substr(string, start, length):
Returns the substring of string that begins at position start and is length characters long.
sub(regex, replacement_str, string):
Replaces the first match of the regular expression in string with replacement_str.
gsub(regex, replacement_str, string):
Similar to sub(), but replaces everything the regular expression matches.
match(string, regex):
Checks whether the regular expression matches the string. It returns a nonzero value if there is a match and 0 otherwise. match() has two related special variables, RSTART and RLENGTH: RSTART holds the position where the matched text starts, and RLENGTH holds the length of the matched text.
Example:
1. $ awk '{ sub(/test/, "mytest"); print }' testfile
The match is made against the entire record, and only the first match is replaced.
2. $ awk '{ sub(/test/, "mytest", $1); print }' testfile
The match is made against the first field of each record, and only the first match is replaced.
3. $ awk '{ print index("mytest", "test") }' testfile
Returns the position of test in mytest; the result should be 3.
4. $ awk '{ print length("test") }' testfile
Returns the length of the string test.
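The string functions described above can be exercised in one BEGIN block, which needs no input file. This is a small sketch on an invented sample string; the variable names (s, parts, t, u) are chosen only for the illustration.

```shell
out=$(awk 'BEGIN {
    s = "this is a test"
    print length(s)                  # number of characters
    print index(s, "test")           # position of the substring
    n = split(s, parts, " ")         # pieces go into the array parts
    print n, parts[2]
    print substr(s, 6, 2)            # 2 characters starting at position 6
    t = s; sub(/is/, "IS", t); print t      # first match only
    u = s; gsub(/is/, "IS", u); print u     # every match
    if (match(s, /a t/)) print RSTART, RLENGTH
}')
echo "$out"
```

Copying s into t and u before calling sub() and gsub() keeps the original string intact, since both functions modify their third argument in place.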
awk Supplement two
This part is written rather roughly because time was short.
One. Built-in functions
Note the usual syntax convention: [a] means that a is optional.
Numeric functions
Function | Description
atan2(y, x) | Returns the arctangent of y/x, in radians
cos(x) | Returns the cosine of x
exp(x) | Returns e raised to the power x
int(x) | Returns the nearest integer to x, truncated toward 0
log(x) | Returns the natural logarithm of x
rand() | Returns a random number
sin(x) | Returns the sine of x
sqrt(x) | Returns the positive square root of x
srand([x]) | Sets the starting point (seed) for generating random numbers
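A short sketch of the numeric functions in the table above. The seed value 42 is arbitrary, and the rand() check only assumes that the result lies in the range from 0 up to, but not including, 1.

```shell
nums=$(awk 'BEGIN {
    print int(3.9), int(-3.9)          # truncation toward zero
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)      # arctangent used to obtain pi
    srand(42); r = rand()
    print (r >= 0 && r < 1)            # prints 1 when r is in range
}')
echo "$nums"
```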
String-manipulation functions
Note: functions marked (gawk) are gawk-specific; other awk versions do not have them.
Function | Description
asort(source [, dest [, how]]) (gawk) | Sorts the array by value and returns the number of elements in it (there is more to it than this)
asorti(source [, dest [, how]]) (gawk) | Like asort(), but sorts by index rather than by value
gensub(regexp, replacement, how [, target]) (gawk) | Searches target for matches of the regular expression regexp and returns the string with the replacements made
gsub(regexp, replacement [, target]) | Replaces every match of the regular expression with replacement
index(in, find) | Returns the position at which find appears in the string in
length([string]) | Returns the number of characters in string
match(string, regexp [, array]) | Checks whether the regular expression matches the string
patsplit(string, array [, fieldpat [, seps]]) (gawk) | Splits string into the pieces defined by fieldpat and stores them in array; the separator strings are stored in the seps array
split(string, array [, fieldsep [, seps]]) | Splits string at each separator and stores the pieces in array
sprintf(format, expression1, ...) | Returns the formatted string instead of printing it
strtonum(str) (gawk) | Converts a string to a number
sub(regexp, replacement [, target]) | Replaces the first match of the regular expression with replacement
substr(string, start [, length]) | Returns the part of string selected by position and length
tolower(string) | Converts to lowercase
toupper(string) | Converts to uppercase
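As an illustration of a few rows of the table, the sketch below uses only functions that are not gawk-specific: toupper, tolower, sprintf, and gsub. The sample string and format are made up; note that gsub also returns the number of replacements it made.

```shell
text=$(awk 'BEGIN {
    s = "Hello World"
    print toupper(s)
    print tolower(s)
    padded = sprintf("%-6s|%05.1f", "ab", 3.14159)   # build a string, print it later
    print padded
    n = gsub(/o/, "0", s)        # n is the number of replacements made
    print n, s
}')
echo "$text"
```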
Input/output functions
Function | Description
close(filename [, how]) | Closes a file's input or output stream
fflush([filename]) | Flushes any buffered output associated with filename
system(command) | Executes an operating system command and returns its status to the awk program
Time functions
Function | Description
mktime(datespec) | Converts datespec into a timestamp of the same form as returned by systime()
strftime([format [, timestamp [, utc-flag]]]) | Formats the timestamp according to format and returns the resulting date string
systime() | Returns the current system time, accurate to the second
Bit-manipulation functions
Function | Description
and(v1, v2) | Returns the bitwise AND of v1 and v2
compl(val) | Returns the bitwise complement of val
lshift(val, count) | Returns val shifted left by count bits
or(v1, v2) | Returns the bitwise OR of v1 and v2
rshift(val, count) | Returns val shifted right by count bits
xor(v1, v2) | Returns the bitwise XOR of v1 and v2
Getting type information
Function | Description
isarray(x) | Returns true if x is an array, otherwise false
String-translation functions
Function | Description
bindtextdomain(directory [, domain]) | Sets the directory in which awk searches for the message translations of a text domain
dcgettext(string [, domain [, category]]) | Returns the translation of string in the text domain domain for the locale category category
dcngettext(string1, string2, number [, domain [, category]]) | Returns the plural form used for number of the translation of string1 and string2 in the text domain domain for the locale category category
The built-in functions also have more advanced uses, with many possible examples; I will add them later when there is a chance.
Two. Custom functions
The format is as follows:
function name([parameter-list])
{
    body-of-function
}
For example:
function myprint(num)
{
    printf "%6.3g\n", num
}
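A sketch of how the myprint function above might be called from a complete program. The two input numbers are made up; the function is defined once and then invoked for the first field of every line.

```shell
formatted=$(printf '3.14159\n42\n' | awk '
function myprint(num)
{
    printf "%6.3g\n", num
}
{ myprint($1) }')
echo "$formatted"
```

The %6.3g format keeps three significant digits and right-aligns the result in a field six characters wide.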
Awk has a great many features; this is all I intend to write for now. More may come up later in examples, when awk is combined with other commands.