awk Introduction
Linux has three classic text-processing tools, often called the "three musketeers": grep, sed, and awk. grep is a text filter, sed is a line-oriented stream editor, and awk is a report generator that formats text. "Formatting" here has nothing to do with formatting a file system; it means laying out ("typesetting") the contents of a file for display. On Linux the implementation normally used is GNU awk, abbreviated gawk. On systems where awk is a symbolic link to gawk, the awk and gawk commands are the same program. gawk is a procedural programming language: it supports the features expected of one, such as conditional tests, arrays, and loops, so gawk can also be described as a scripting-language interpreter.
1) Usage format, options
Basic format: awk [options] 'program' file ...
program: the awk program text, of the form pattern {action statements; ...}, usually enclosed in single or double quotes;
pattern:
the condition part; it determines when the action statements are triggered, including the special events BEGIN and END
action statements:
one or more statements separated by semicolons, such as print and printf
Options (optional):
-F: specifies the field separator used for the input;
-v var=value: defines a custom variable
separators, fields, and records
When awk runs, each field produced by splitting on the separator is labeled $1, $2, ..., $n; these are called field identifiers. $0 refers to the whole record (all fields). Note: the $ character here does not mean the same thing as $ in shell variables.
Each line of the file is called a record.
If the action is omitted, the default is print $0.
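The options and field identifiers above can be tried on a single line of input. This is a minimal sketch: the sample record and the variable name label are made up for illustration.

```shell
# Sample record is invented for illustration; it mimics an /etc/passwd line.
line='alice:x:1001:1001::/home/alice:/bin/bash'

# -F: sets the input field separator; -v defines the awk variable "label".
fields=$(printf '%s\n' "$line" | awk -F: -v label='user' '{print label, $1, $3}')
echo "$fields"

# A pattern with no action falls back to the default action, print $0.
whole=$(printf '%s\n' "$line" | awk -F: '/alice/')
echo "$whole"
```

The first command prints the label followed by fields 1 and 3; the second prints the whole record unchanged.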
2) How awk works
Principle:
awk reads the text one line at a time, splits the line on the input separator (by default, runs of whitespace), and stores the resulting n pieces in built-in variables named $1, $2, $3, and so on up to the last piece. awk can then process these pieces individually: display one field or a particular group of fields, or do further work on them such as counting and arithmetic.
Here's how it works:
• Step one: execute the statements in the BEGIN {action; ...} block;
• Step two: read a line from the file or from standard input (stdin) and execute the pattern {action; ...} blocks against it; this repeats line by line, from the first line to the last, until the input is fully read;
• Step three: on reaching the end of the input stream, execute the END {action; ...} block.
· The BEGIN block is executed before awk starts reading lines from the input stream. It is optional; statements such as variable initialization and printing the header of an output table usually go here.
· The END block is executed after awk has read all lines from the input stream. Work such as printing the analysis results or a summary for all lines is done here. It is also optional.
The general commands in the pattern {action} blocks are the most important part, yet they too are optional. If no action is given for a pattern, the default is {print}, which prints each matching line; awk runs these blocks for every line it reads.
Usage examples:
~]# awk 'BEGIN{print "hello,awk!"}'
hello,awk!
# The BEGIN block runs first and does not need an input file
~]# awk 'END{print "bye,awk!"}' /etc/passwd
bye,awk!
# The END block runs after the text has been processed
~]# awk -F: 'BEGIN{print "hello,awk!"} /root/{print $1,$2} END{print "bye,awk!"}' /etc/passwd
hello,awk!
root x
operator x
bye,awk!
# BEGIN runs first; then each line containing "root" is found and its first and second columns are printed; finally END runs
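The same three-step order can be reproduced without /etc/passwd. The following sketch feeds three made-up numbers through a BEGIN block, a per-line block, and an END block; the variable name total is chosen only for illustration.

```shell
# Three made-up input lines; the main block runs once per line.
report=$(printf '10\n20\n30\n' | awk '
BEGIN { print "start"; total = 0 }             # runs before any input is read
      { total += $1 }                          # runs for every input line
END   { print "lines:", NR, "total:", total }  # runs after the last line
')
echo "$report"
```

The output is "start" followed by "lines: 3 total: 60".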
Usage notes:
1. print
Points:
(1) items are separated by commas;
(2) each output item can be a string, a numeric value, a field of the current record, a variable, or an awk expression;
(3) if the items are omitted, it is equivalent to print $0;
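A small sketch of the three points, using an invented one-line input:

```shell
# Items separated by commas are joined with OFS (a single space by default).
with_comma=$(echo 'a b c' | awk '{print $1, $3}')
# Items placed side by side without a comma are concatenated.
no_comma=$(echo 'a b c' | awk '{print $1 $3}')
# print with no items is the same as print $0.
bare=$(echo 'a b c' | awk '{print}')
echo "$with_comma"; echo "$no_comma"; echo "$bare"
```

This prints "a c", then "ac", then "a b c".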
2. Variables
2.1 Built-in variables
FS: input field separator, whitespace by default;
OFS: output field separator, a single space by default;
RS: input record separator, a newline by default;
ORS: output record separator, a newline by default;
NF: number of fields in the current record
{print NF} prints the field count; {print $NF} prints the last field
NR: number of records (lines) read so far;
FNR: number of records read so far from the current file, counted separately for each file;
FILENAME: name of the current file;
ARGC: number of command-line arguments;
ARGV: array holding the arguments given on the command line;
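The built-in variables can be observed directly. In this sketch the input lines and the file names one.txt and two.txt are made up, and the temporary directory exists only for the demonstration.

```shell
# FS and OFS set in BEGIN; NR, NF and $NF read for each record.
per_line=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $NF }')
echo "$per_line"

# NR keeps counting across files, FNR restarts for each file.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"
counts=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
rm -rf "$tmpdir"
echo "$counts"
```

The first command prints "1-3-c" and "2-2-e"; in the second, the single line of two.txt is reported with NR 3 and FNR 1.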
2.2 Custom variables
(1) -v var=value
Variable names are case-sensitive;
(2) defined directly in the program
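Both ways of defining a variable, as a minimal sketch; the names greeting, count, and Count are illustrative only.

```shell
# (1) Defined on the command line with -v; usable even inside BEGIN.
from_cli=$(awk -v greeting='hello' 'BEGIN { print greeting }')
# (2) Defined directly in the program; names are case-sensitive,
#     so "count" and "Count" are two different variables.
in_prog=$(awk 'BEGIN { count = 1; Count = 2; print count, Count }')
echo "$from_cli"; echo "$in_prog"
```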
3. printf command
Formatted output: printf format, item1, item2, ...
(1) format must be given;
(2) printf does not add a newline automatically; write the newline control character \n explicitly;
(3) format must contain one format specifier for each item that follows.
Format specifiers:
%c: displays a character (a numeric item is shown as the character with that ASCII code);
%d, %i: displays a decimal integer;
%e, %E: displays a number in scientific notation;
%f: displays a floating-point number;
%g, %G: displays a number in scientific notation or floating-point form, whichever is shorter;
%s: displays a string;
%u: unsigned integer;
%%: displays the % character itself;
Modifiers:
#[.#]: the first number sets the display width; the second sets the precision after the decimal point;
Example: %3.1f
-: left-align
+: display the sign of a numeric value
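Several specifiers and modifiers combined in one call, as a rough sketch; the values are arbitrary and the | characters only make the column widths visible.

```shell
formatted=$(awk 'BEGIN {
    # left-aligned string, width-5 integer, width-6 float with 2 decimals,
    # character from code 65, scientific notation, forced sign, literal %
    printf "%-8s|%5d|%6.2f|%c|%e|%+d|100%%\n", "disk", 42, 3.14159, 65, 1234.5, 7
}')
echo "$formatted"
```

Note the explicit \n at the end of the format string; without it the output would not end with a newline.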
4. Operators
Arithmetic operators:
x+y, x-y, x*y, x/y, x^y, x%y
-x: negates the value;
+x: converts to a numeric value;
String operator: concatenation, written with no operator symbol (juxtaposition)
Assignment operators:
=, +=, -=, *=, /=, %=, ^=
++, --
Comparison operators:
>, >=, <, <=, !=, ==
Pattern-matching operators:
~: matches
!~: does not match
Logical operators:
&&
||
!
Function call:
function_name(argu1, argu2, ...)
Conditional expression:
selector ? if-true-expression : if-false-expression
~]# awk -F: '{usertype = ($3>=1000) ? "Common User" : "Sysadmin or Sysuser"; printf "%15s:%-s\n",$1,usertype}' /etc/passwd
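The operator groups can be exercised in a single BEGIN block with no input file. A minimal sketch; all variable names are illustrative.

```shell
ops=$(awk 'BEGIN {
    x = 7; y = 2
    print x + y, x - y, x * y, x % y, x ^ y   # arithmetic
    s = "foo" "bar"                           # concatenation has no operator symbol
    print s
    m = ("foobar" ~ /^foo/)                   # 1 when the pattern matches
    n = ("foobar" !~ /^foo/)                  # 0 here, because it does match
    print m, n
    kind = (x >= 5) ? "big" : "small"         # conditional expression
    print kind
}')
echo "$ops"
```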
5. PATTERN
(1) empty: the empty pattern matches every line;
(2) /regular expression/: only lines matched by the regular expression are processed;
(3) relational expression: the result is either "true" or "false"; a line is processed only when the result is "true";
True: the result is a non-zero number or a non-empty string;
(4) line ranges:
startline,endline: /pat1/,/pat2/
Note: giving line numbers directly (such as 2,10) is not supported; test NR instead:
~]# awk -F: '(NR>=2&&NR<=10){print $1}' /etc/passwd
(5) BEGIN/END patterns
BEGIN{}: executed only once, before awk starts processing the text in the file;
END{}: executed only once, after the text processing is complete;
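Each pattern type can be compared on the same small input. The four sample lines are invented for this sketch.

```shell
data=$(printf 'alpha 1\nbeta 2\ngamma 3\ndelta 4\n')

# (2) regular expression pattern, default action
by_regex=$(printf '%s\n' "$data" | awk '/^g/')
# (3) relational expression pattern
by_relation=$(printf '%s\n' "$data" | awk '$2 >= 3 { print $1 }')
# (4) range pattern: from the line matching beta through the line matching gamma
by_range=$(printf '%s\n' "$data" | awk '/beta/,/gamma/ { print $1 }')
# Line numbers are selected with a relational expression on NR.
by_nr=$(printf '%s\n' "$data" | awk 'NR>=2 && NR<=3 { print $1 }')

echo "$by_regex"; echo "$by_relation"; echo "$by_range"; echo "$by_nr"
```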
6. Common actions
(1) Expressions
(2) Control statements: if, while, and so on;
(3) Compound statements: several statements grouped in braces;
(4) Input statements
(5) Output statements
7. Control statements
if (condition) {statements}
if (condition) {statements} else {statements}
while (condition) {statements}
do {statements} while (condition)
for (expr1;expr2;expr3) {statements}
break
continue
delete array[index]
delete array
exit
{statements}
7.1 If-else
Syntax: if (condition) statement [else statement]
~]# awk -F: '{if($3>=1000) {printf "Common user: %s\n",$1} else {printf "root or Sysuser: %s\n",$1}}' /etc/passwd
~]# awk -F: '{if($NF=="/bin/bash") print $1}' /etc/passwd
~]# awk '{if(NF>5) print $0}' /etc/fstab
~]# df -h | awk -F[%] '/^\/dev/{print $1}' | awk '{if($NF>=20) print $1}'
Usage scenario: making a conditional test on the whole line or on a field obtained by awk;
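The first if-else command above can be checked against known input. This sketch uses two made-up records in place of /etc/passwd, with only three fields each.

```shell
users=$(printf 'root:x:0\nalice:x:1001\n' | awk -F: '{
    if ($3 >= 1000) {
        printf "Common user: %s\n", $1
    } else {
        printf "root or Sysuser: %s\n", $1
    }
}')
echo "$users"
```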
7.2 While Loop
Syntax: while (condition) statement
When the condition is "true" the loop is entered; when it is "false" the loop exits;
Usage scenario: processing several fields of a line one by one in the same way, or processing each element of an array in turn;
~]# awk '/^[[:space:]]*linux16/{i=1;while(i<=NF) {print $i,length($i); i++}}' /etc/grub2.cfg
~]# awk '/^[[:space:]]*linux16/{i=1;while(i<=NF) {if(length($i)>=7) {print $i,length($i)}; i++}}' /etc/grub2.cfg
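The field-by-field loop can be tried without a grub2.cfg file. The input line below is a shortened, made-up stand-in for a linux16 entry.

```shell
lengths=$(echo 'linux16 /vmlinuz ro quiet' | awk '{
    i = 1
    while (i <= NF) {
        # report only the fields that are at least 7 characters long
        if (length($i) >= 7) print $i, length($i)
        i++
    }
}')
echo "$lengths"
```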
7.3 Do-while Loop
Syntax: do statement while (condition)
Meaning: the loop body is executed at least once
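A minimal sketch of the "at least once" behavior: the condition is false from the start, yet the body still runs one time.

```shell
once=$(awk 'BEGIN {
    i = 10
    do {
        print "body ran with i =", i
        i++
    } while (i < 5)   # already false on the first check
}')
echo "$once"
```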
7.4 For Loop
Syntax: for (expr1;expr2;expr3) statement
for (variable assignment;condition;iteration process) {for-body}
~]# awk '/^[[:space:]]*linux16/{for(i=1;i<=NF;i++) {print $i,length($i)}}' /etc/grub2.cfg
Special usage:
A for loop can iterate over the elements of an array;
Syntax: for (var in array) {for-body}
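Both forms of the for loop, sketched on invented data: a counted loop that reverses the fields of a line, and a for-in loop that counts the indexes of a small array.

```shell
# Counted form: walk the fields from last to first.
reversed=$(echo 'one two three' | awk '{
    out = $NF
    for (i = NF - 1; i >= 1; i--) out = out " " $i
    print out
}')
echo "$reversed"

# for-in form: k takes each index of the array in turn (order is unspecified).
keys=$(awk 'BEGIN { a["x"] = 1; a["y"] = 2; n = 0; for (k in a) n++; print n }')
echo "$keys"
```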
7.5 Switch statement
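Syntax: switch (expression) {case VALUE1 or /regexp/: statement; case VALUE2 or /regexp2/: statement; ...; default: statement}
Note: switch is a gawk extension;
7.6 Break and continue
break: exits the innermost enclosing loop
continue: skips to the next iteration of the innermost enclosing loop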
7.7 Next
Ends processing of the current line early and moves directly on to the next line;
~]# awk -F: '{if($3%2!=0) next; print $1,$3}' /etc/passwd
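The difference between next, continue, and break is easiest to see side by side. A small sketch on made-up input:

```shell
# next: skip records whose second field is odd, print the rest.
even_only=$(printf 'a 1\nb 2\nc 3\nd 4\n' | awk '{ if ($2 % 2 != 0) next; print $1, $2 }')
echo "$even_only"

# continue and break act on the enclosing loop, not on the input record.
loop=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i == 2) continue   # skip the rest of this iteration
        if (i == 5) break      # leave the loop entirely
        print i
    }
}')
echo "$loop"
```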
8. Array
Associative array: array[index-expression]
index-expression:
(1) any string can be used; string literals must be enclosed in double quotation marks;
(2) if a referenced array element does not yet exist, awk automatically creates it and initializes its value to the empty string;
To test whether an element exists in an array, use the form "index in array";
weekdays["mon"]="Monday"
To iterate over every element of the array, use a for loop;
for (var in array) {for-body}
~]# awk 'BEGIN{weekdays["mon"]="Monday";weekdays["tue"]="Tuesday";for(i in weekdays) {print weekdays[i]}}'
Note: var iterates over the indexes of the array, not its values;
state["LISTEN"]++
state["ESTABLISHED"]++
~]# netstat -tan | awk '/^tcp\>/{state[$NF]++}END{for(i in state) {print i,state[i]}}'
~]# awk '{ip[$1]++}END{for(i in ip) {print i,ip[i]}}' /var/log/httpd/access_log
Exercise 1: count the number of occurrences of each file system type in the /etc/fstab file;
~]# awk '/^UUID/{fs[$3]++}END{for(i in fs) {print i,fs[i]}}' /etc/fstab
Exercise 2: count the occurrences of each word in the specified file;
~]# awk '{for(i=1;i<=NF;i++) {count[$i]++}}END{for(i in count) {print i,count[i]}}' /etc/fstab
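The word-count exercise can be verified on a fixed input. In this sketch the two input lines are invented, and the result is piped through sort because for-in visits the indexes in no guaranteed order.

```shell
counts=$(printf 'ext4 xfs\nxfs swap xfs\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }
END {
    for (word in count) print word, count[word]
    # "in" tests membership without creating the element
    if (!("btrfs" in count)) print "btrfs", 0
}' | sort)
echo "$counts"
```

Testing with "index in array" matters here: referring to count["btrfs"] directly would create that element as a side effect.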
9. Functions
9.1 Built-in functions
Numeric processing:
rand(): returns a random number between 0 and 1;
String processing:
length([s]): returns the length of the specified string;
sub(r,s,[t]): searches the string t for text matching the pattern r and replaces the first match with s;
gsub(r,s,[t]): searches the string t for text matching the pattern r and replaces every match with s;
split(s,a[,r]): splits the string s on the separator r and stores the pieces in the array a;
~]# netstat -tan | awk '/^tcp\>/{split($5,ip,":");count[ip[1]]++}END{for(i in count) {print i,count[i]}}'
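The built-in functions listed above, sketched on fixed strings so the results are predictable. The address 10.0.0.1:22 is a made-up stand-in for the fifth column of netstat output.

```shell
funcs=$(awk 'BEGIN {
    s = "a-b-c"
    t = s; sub(/-/, "+", t)               # replaces the first match only
    u = s; n = gsub(/-/, "+", u)          # replaces every match, returns the count
    k = split("10.0.0.1:22", part, ":")   # returns the number of pieces
    r = rand(); ok = (r >= 0 && r < 1)    # rand() stays within [0, 1)
    print t
    print u, n
    print k, part[1], part[2]
    print length("hello"), ok
}')
echo "$funcs"
```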
This article is from the "Wang Liming" blog; reprinting is declined.