A shell command a day, Linux text processing series: the awk command in detail

Source: Internet
Author: User


Brief introduction



awk is a powerful text-analysis tool. Compared with grep for searching and sed for editing, awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then performs various kinds of analysis on the resulting fields.
awk comes in three versions: awk, nawk, and gawk. Unless stated otherwise, awk here refers to gawk, the GNU version of awk.
awk takes its name from the initials of the surnames of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. awk is in fact a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.



How to use



awk 'pattern { action }' filenames



Although the operations can be complex, the syntax is always the same: pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) are not always required in a program, but they are used to group a series of commands under a particular pattern. The pattern is typically a regular expression enclosed in slashes.



The most basic function of the awk language is to browse and extract information from a file or string according to specified rules; once awk has extracted the information, other text operations can be performed on it. A complete awk script is typically used to format the information in a text file.

awk normally processes a file one line at a time: it receives each line of the file in turn and then executes the corresponding commands on it.






Invoke awk



There are three ways of calling awk.
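The list of the three ways is missing from this page. As a minimal sketch, assuming the three ways usually meant are an inline program on the command line, a program file loaded with -f, and an executable script whose first line names the awk interpreter, they could look like this. All file names below are hypothetical, and the interpreter path is looked up with command -v rather than hard-coded.

```shell
# Work in a throwaway directory; every file name here is hypothetical.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/data.txt"

# 1. Program given inline on the command line
inline=$(awk 'END { print NR }' "$tmpdir/data.txt")

# 2. Program stored in a file and loaded with -f
printf 'END { print NR }\n' > "$tmpdir/count.awk"
from_file=$(awk -f "$tmpdir/count.awk" "$tmpdir/data.txt")

# 3. Executable script whose first line names the awk interpreter
printf '#!%s -f\nEND { print NR }\n' "$(command -v awk)" > "$tmpdir/count"
chmod +x "$tmpdir/count"
as_script=$("$tmpdir/count" "$tmpdir/data.txt")

echo "$inline $from_file $as_script"
rm -rf "$tmpdir"
```

All three calls run the same one-line program, which counts the lines of the input file.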



Description



awk is designed for use in data streams and can operate on both columns and rows, whereas sed is mostly used for matching, replacing, and deleting.
awk has many built-in features, such as arrays and functions. Flexibility is awk's greatest strength.



The structure of awk
awk '
BEGIN { print "Start" }
pattern { commands }
END { print "End" }
' file
The line breaks are only there for readability; this is really a single command.



An awk script usually has three parts:
1. The BEGIN statement block
2. The common statement block, which can use pattern matching
3. The END statement block
Any of the three parts may be omitted from a script. The script is usually enclosed in single or double quotes.
For example:

awk 'BEGIN { i = 0 } { i++ } END { print i }' filename



Working principle



The awk command works as follows:

1. Execute the statements in the BEGIN { commands } block.
2. Read a line from the file or stdin and execute pattern { commands }; repeat until all input has been read.
3. Finally, execute the END { commands } block.

Again, any one of these blocks may be omitted.
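The three steps above can be traced with a small sketch. The two input lines are made up for illustration; the program prints a marker from each block so the order of execution is visible.

```shell
# BEGIN runs once before any input, the middle block once per line,
# and END once after the last line has been read.
trace=$(printf 'x\ny\n' | awk '
BEGIN { print "begin" }
      { print "line " NR ": " $0 }
END   { print "end, " NR " lines read" }')
echo "$trace"
```

The output starts with "begin", then one line per input record, then the END summary.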



And awk can do far more than this.



Getting started example:

echo | awk '{ var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1, var2, var3; }'

Prints: v1 v2 v3

Explanation: the comma acts as the delimiter; the values are printed separated by a space.

echo | awk '{ var1 = "v1"; var2 = "v2"; var3 = "v3"; print var1 "-" var2 "-" var3; }'

Prints: v1-v2-v3

Explanation: the double-quoted strings act as connectors; they are concatenated with the variables.

With no comma or other symbol between the variables, they are joined directly and the output is v1v2v3.



Reading --help (the help documentation is large and complex; believe it or not, the official manual takes a 410-page PDF to cover it.)


Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:                    GNU long options:
-f progfile                       --file=progfile
-F fs                             --field-separator=fs
    Specifies the input field separator, which can be a string or a regular expression.
-v var=val                        --assign=var=val
    Assigns the external value val to the variable var.
-m[fr] val
-O                                --optimize
    Enables some optimizations on the internal representation of the program.
-W compat                         --compat
    Runs awk in compatibility mode: gawk behaves exactly like standard awk, and all gawk extensions are ignored.
-W copyleft                       --copyleft
    Prints short copyright information.
-W copyright                      --copyright
    Prints a short version of the General Public License, then exits.
-W dump-variables[=file]          --dump-variables[=file]
    Prints a sorted list of the global variables, their types, and their final values.
-W exec=file                      --exec=file
    Similar to -f, but differs from it in two ways (I will upload the relevant document later; it is too long to cover here).
-W gen-po                         --gen-po
    (Too much content to cover here.)
-W help                           --help
    Prints the help text.
-W lint[=fatal]                   --lint[=fatal]
    Warns about constructs that are suspicious or not portable to other awk implementations.
-W lint-old                       --lint-old
    Warns about constructs that are not portable to traditional Unix awk.
-W non-decimal-data               --non-decimal-data
    Enables automatic interpretation of octal and hexadecimal values in input data.
-W profile[=file]                 --profile[=file]
    Enables awk profiling.
-W posix                          --posix
    Runs in strict POSIX mode.
-W re-interval                    --re-interval
    Allows interval expressions in regular expressions.
-W source=program-text            --source=program-text
-W traditional                    --traditional
    Uses traditional Unix awk regular expression matching.
-W usage                          --usage
-W use-lc-numeric                 --use-lc-numeric
    Forces use of the locale's decimal point character when parsing numeric input data.
-W version                        --version
To report bugs, see the "Bugs" node in "gawk.info", which is the section "Reporting Problems and Bugs" in the printed version.


Note: gawk is the GNU version of awk; on Ubuntu, gawk must be installed even to get this help output.

We will not read through all of it this time. To keep things informative and fun, let's start with some basics:



Some special variables:

NR: the number of records read so far; during execution it corresponds to the current line number
NF: the number of fields; during execution it corresponds to the number of fields in the current line
$0: the text of the current line during execution
$1: the text of the first field
$2: the text of the second field



Example:

Example 1.

# A trailing \ continues the command on the next line
echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | \
awk '{
print "Line no:" NR ", No of fields:" NF, "$0=" $0, "$1=" $1, "$2=" $2, "$3=" $3
}'

Note: $1 prints the first field, $NF prints the last field, and $(NF-1) prints the second-to-last field.
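As a quick illustration of the note above, here is a minimal sketch that uses NF to address fields from the end of the line. The single input line is made up for the example.

```shell
# NF is the field count, so $NF is the last field and $(NF-1) the one before it.
ends=$(echo "line1 f2 f3" | awk '{ print NF, $1, $NF, $(NF-1) }')
echo "$ends"   # prints: 3 line1 f3 f2
```

The parentheses in $(NF-1) matter: $NF-1 would instead subtract 1 from the value of the last field.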



Example 2.

seq 5 | awk 'BEGIN { sum = 0; print "Summation:" } { print $1 "+"; sum += $1 } END { print "=="; print sum }'

This example uses the basic three-part format.

BEGIN initializes sum and prints "Summation:".
The middle block prints the first column and adds its value to sum.

END prints sum.



Example 3. External variables with -v

$ VAR=10000
$ echo | awk -v variable=$VAR '{ print variable }'

There is another flexible way to pass several external variables to awk, for example:

$ var1="value1" var2="value2"
$ echo | awk '{ print v1, v2 }' v1=$var1 v2=$var2

If the input comes from a file:

awk '{ print v1, v2 }' v1=$var1 v2=$var2 filename
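One practical difference between the two ways of passing variables is when the assignment happens. The sketch below, with a made-up variable value and /dev/null standing in for an input file, suggests that -v values are already visible in BEGIN, while assignments written after the program are made only when awk reaches them in the argument list.

```shell
# -v assignments are made before BEGIN runs.
with_v=$(awk -v v1=hello 'BEGIN { print "[" v1 "]" }')

# Assignments placed after the program are processed in argument order,
# that is, after BEGIN; /dev/null stands in for an (empty) input file.
as_operand=$(awk 'BEGIN { print "[" v1 "]" } END { print "[" v1 "]" }' v1=hello /dev/null)

echo "$with_v"      # [hello]
echo "$as_operand"  # [] from BEGIN, then [hello] from END
```

If a value is needed inside BEGIN, -v is therefore the safer choice.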



Example 4

$ awk 'NR < 5'          # lines whose line number is less than 5
$ awk 'NR==1, NR==4'    # lines with line numbers 1 through 4
$ awk '/linux/'         # lines containing "linux" (the pattern can be a regular expression)
$ awk '!/linux/'        # lines that do not contain "linux"
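The patterns in Example 4 take no input file as written, so here is a small sketch that feeds them made-up sample lines. A pattern with no action prints every matching line; the last command adds an action to a range pattern to count the lines it selects.

```shell
# Hypothetical sample text, three lines
sample=$(printf 'linux kernel\nbsd kernel\ngnu/linux\n')

matching=$(printf '%s\n' "$sample" | awk '/linux/')    # lines containing "linux"
others=$(printf '%s\n' "$sample" | awk '!/linux/')     # lines without "linux"
in_range=$(printf '%s\n' "$sample" | awk 'NR==1, NR==2 { n++ } END { print n }')

echo "$matching"
echo "$others"
echo "$in_range"
```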



This is my first write-up on the topic; I aim to give a fairly complete picture of awk in two pages.



awk Supplement

After we covered the basics of awk, I was pleasantly surprised to find a detailed article about it. Rather than reproduce or translate it, I will write up parts of it in my own way.

This part covers built-in variables and some string functions.

Built-in variables (also translated as special variables or environment variables; following the official translation, they are called built-in variables here)





Variable      Description

$n            The nth field of the current record; fields are separated by FS.
$0            The entire input record.
ARGC          The number of command-line arguments.
ARGIND        The position of the current file on the command line (counting from 0).
ARGV          An array containing the command-line arguments.
BINMODE       On non-POSIX systems, specifies binary mode for all I/O.
CONVFMT       The number conversion format (default %.6g).
ENVIRON       An associative array of environment variables.
ERRNO         A description of the last system error.
FIELDWIDTHS   A space-separated list of field widths.
FILENAME      The name of the current file.
FNR           Like NR, but relative to the current file.
FPAT          A regular expression (as a string) that tells gawk to create fields from the text that matches it.
FS            The field separator (default is whitespace).
IGNORECASE    If true, matching is case-insensitive.
LINT          When true (nonzero or non-null), gawk behaves as if the --lint command-line option were in effect.
NF            The number of fields in the current record.
NR            The number of the current record.
OFMT          The output format for numbers (default %.6g).
OFS           The output field separator (default is a space).
ORS           The output record separator (default is a newline).
PROCINFO      An array whose elements provide information about the running awk program.
RLENGTH       The length of the string matched by the match function.
RS            The record separator (default is a newline).
RT            Set each time a record is read; holds the input text that matched RS.
RSTART        The starting position of the string matched by the match function.
SUBSEP        The array subscript separator (default \034).
TEXTDOMAIN    The text domain used to internationalize the program.





In the original page, the newly added built-in variables were shown in blue.
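A few of these variables can be seen working together in a short sketch. The passwd-style records are made up for illustration; FS and OFS are set in BEGIN so that they apply from the first record.

```shell
# FS controls how input is split, OFS how print joins its arguments.
report=$(printf 'root:x:0\ndaemon:x:1\n' |
  awk 'BEGIN { FS = ":"; OFS = " -> " } { print NR, $1, NF }')
echo "$report"
```

Each output line shows the record number, the first field, and the field count, joined by the output separator.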






Simple examples:

1. sed 1q /etc/passwd | awk '{ FS = ":"; print }'

Prints the first line of /etc/passwd, with the field separator set to a colon. Note that an FS assigned inside an action takes effect only from the next record onward.

2. awk 'END { print FILENAME }' awk.txt

Prints the name of the text file.

3. seq 100 | awk 'NR==4, NR==6'

Prints lines 4 to 6.



Here are some of the string functions built into awk.

length(string):
Returns the length of the string.

index(string, search_string):
Returns the position at which search_string appears in string.

split(string, array, delimiter):
Splits the string on the delimiter and stores the resulting pieces in the array.

substr(string, start [, length]):
Returns the substring of string that begins at position start and is at most length characters long.

sub(regex, replacement_str, string):
Replaces the first match of the regular expression in string with replacement_str.

gsub(regex, replacement_str, string):
Like sub(), but replaces everything the regular expression matches.

match(string, regex):
Checks whether the regular expression matches the string. Returns a nonzero value on a match and 0 otherwise. match() has two related special variables, RSTART and RLENGTH: RSTART holds the position at which the matched text starts, and RLENGTH holds its length.
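A minimal sketch of how match(), RSTART, RLENGTH, substr(), split(), index(), and length() fit together. The input string and the array name parts are made up for illustration.

```shell
parsed=$(echo "error code=404 found" | awk '{
    # match() sets RSTART and RLENGTH, which substr() can then use
    if (match($0, /[0-9]+/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)

    # split() returns the number of pieces stored in the array
    n = split("a,b,c", parts, ",")
    print n, parts[1], parts[3]

    # index() takes the string to search first, then the text to find
    print index("mytest", "test"), length("mytest")
}')
echo "$parsed"
```

The first output line gives the start position and length of the first run of digits, followed by the digits themselves.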



Examples:

1. $ awk '{ sub(/test/, "mytest"); print }' testfile

The match is made against the entire record, and only the first match is replaced.

2. $ awk '{ sub(/test/, "mytest", $1); print }' testfile

The match is made against the first field of each record, and only the first match is replaced.

3. $ awk '{ print index("mytest", "test") }' testfile

Returns the position of test in mytest; the result should be 3.

4. $ awk '{ print length("test") }'

Returns the length of the string test.
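The difference between sub() and gsub(), and the effect of the optional third argument, can be sketched as follows. The input line is made up for illustration.

```shell
# sub() replaces only the first match in the record
once=$(echo "test test test" | awk '{ sub(/test/, "X"); print }')

# gsub() replaces every match and returns how many it replaced
all=$(echo "test test test" | awk '{ n = gsub(/test/, "X"); print n ": " $0 }')

# a third argument restricts the replacement to that field
field=$(echo "test test test" | awk '{ sub(/test/, "X", $2); print }')

echo "$once"    # X test test
echo "$all"     # 3: X X X
echo "$field"   # test X test
```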



awk Supplement, part two

This section may be a bit rough; I had too little time.

One. Built-in functions
Note the usual syntax convention: [a] means that a is optional.



Numeric functions

Function      Description

atan2(y, x)   Returns the arctangent of y/x, in radians
cos(x)        Returns the cosine of x
exp(x)        Returns the exponential of x (e^x)
int(x)        Returns the nearest integer to x, truncated toward 0
log(x)        Returns the natural logarithm of x
rand()        Returns a random number
sin(x)        Returns the sine of x
sqrt(x)       Returns the positive square root of x
srand([x])    Sets the seed (starting point) for generating random numbers
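A short sketch of a few numeric functions from the table. The seed value 42 is arbitrary; the point of the last lines is only that reseeding with the same value repeats the sequence and that rand() stays in the range from 0 up to (but not including) 1.

```shell
nums=$(awk 'BEGIN {
    print int(3.9), int(-3.9)        # truncation toward zero, not rounding
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)    # atan2(0, -1) is pi

    srand(42); a = rand()
    srand(42); b = rand()            # same seed, same first value
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range
}')
echo "$nums"
```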





String manipulation functions
Note: functions marked (gawk) are specific to gawk; other versions of awk do not have them.

Function                                              Description

asort(source [, dest [, how]]) (gawk)                 Sorts the values of the array source and returns the number of elements (there is more to it)
asorti(source [, dest [, how]]) (gawk)                Like asort, with subtle differences (it sorts the indices rather than the values)
gensub(regexp, replacement, how [, target]) (gawk)    Searches target for matches of regexp and returns the string with the replacements made
gsub(regexp, replacement [, target])                  Replaces every match of the regular expression with replacement
index(in, find)                                       Returns the position at which find appears in the string in
length([string])                                      Returns the number of characters in string
match(string, regexp [, array])                       Checks whether the regular expression matches the string
patsplit(string, array [, fieldpat [, seps]]) (gawk)  Splits string into pieces defined by fieldpat and stores them in array; the separator strings are stored in the seps array
split(string, array [, fieldsep [, seps]])            Splits string on the separator and stores the pieces in array
sprintf(format, expression1, ...)                     Returns the formatted string instead of printing it
strtonum(str) (gawk)                                  Converts a string to a number
sub(regexp, replacement [, target])                   Replaces the first match of the regular expression with replacement
substr(string, start [, length])                      Returns a substring, selected by position and length
tolower(string)                                       Converts to lowercase
toupper(string)                                       Converts to uppercase





Input and output functions

Function                  Description

close(filename [, how])   Closes a file or pipe used for input or output
fflush([filename])        Flushes any buffered output associated with filename
system(command)           Executes an operating-system command and returns its exit status to the awk program
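A minimal sketch of system() and close(). The commands "exit 3" and sort are arbitrary stand-ins: the first shows that system() hands the command's exit status back to the program, the second that close() on an output pipe waits for the command to finish before awk continues.

```shell
io=$(awk 'BEGIN {
    status = system("exit 3")     # run a shell command, keep its exit status

    print "b\na" | "sort"         # send two lines to sort through a pipe
    close("sort")                 # wait for sort to finish and write its output

    print "sort finished, earlier status was " status
}')
echo "$io"
```

The sorted lines appear first because sort writes them when the pipe is closed, before awk prints its own final line.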





Time functions

Function                                        Description

mktime(datespec)                                Converts datespec into a timestamp in the same form as returned by systime()
strftime([format [, timestamp [, utc-flag]]])   Formats timestamp according to format and returns the resulting date string
systime()                                       Returns the current system time as a timestamp, accurate to the second





Bit-manipulation functions

Function             Description

and(v1, v2)          Returns the bitwise AND of v1 and v2
compl(val)           Returns the bitwise complement of val
lshift(val, count)   Returns val shifted left by count bits
or(v1, v2)           Returns the bitwise OR of v1 and v2
rshift(val, count)   Returns val shifted right by count bits
xor(v1, v2)          Returns the bitwise XOR of v1 and v2





Getting type information

Function     Description

isarray(x)   Returns true if x is an array, otherwise false





String-translation functions

Function                                                       Description

bindtextdomain(directory [, domain])                           Sets the directory in which gawk looks for the message translation files of a text domain
dcgettext(string [, domain [, category]])                      Returns the translation of string in the text domain domain for the locale category category
dcngettext(string1, string2, number [, domain [, category]])   Returns the plural form used for number of the translation of string1 and string2 in the text domain domain for the locale category category





The built-in functions also have some more advanced uses and many examples; I will add to this when I have the chance.



Two. Custom functions

The format is as follows:

function name([parameter-list])
{
body-of-function
}

For example:

function myprint(num)
{
printf "%6.3g\n", num
}
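To show the function being called, here is a small sketch that defines myprint inside a complete program and calls it from BEGIN. The two numbers are arbitrary test values.

```shell
formatted=$(awk '
function myprint(num)
{
    # width 6, three significant digits
    printf "%6.3g\n", num
}
BEGIN { myprint(3.14159); myprint(1234.5) }')
echo "$formatted"
```

The first value is padded to six characters; the second needs more than three significant digits before the decimal point, so %g switches to exponent notation.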





awk has many more features than this, but I will stop here. More may come up later in examples that combine awk with other commands.

