Linux Text Formatting tool awk
I. Text processing tools
grep, sed, and awk are all text processing tools. Each has its own strengths and weaknesses, and none can fully replace another; otherwise there would be no need for three of them. Of the three, sed and awk are the more powerful, and each is effectively a small language in its own right.
grep: a text filter. If you only need to filter text, use grep, which is much more efficient than the others.
sed: the Stream EDitor. By default it works only on the pattern space and leaves the original data untouched. Use sed when the data is to be processed line by line.
awk: a report generator that formats text before displaying it. If you need to produce reports from the processed data, or the data is to be processed by column, awk is the best choice.
II. What awk can do
Treat text files as text databases made up of records and fields
Use variables to manipulate the database
Use arithmetic and string operators
Use common programming constructs such as loops and conditionals
Format reports
Define functions
Run UNIX commands from scripts
Process the results of UNIX commands
Handle command-line arguments more flexibly
Process multiple input streams more easily
III. Syntax format
# awk [options] 'script' file1 file2 ...
# awk [options] 'PATTERN { action }' file1 file2 ...
1. Options
-F fs or --field-separator fs:
Specifies the separator used to split each input line into fields. fs is a string or a regular expression, such as -F:
# awk -F: '/root/{print $1,$NF}' /etc/passwd
# awk -F: '/root/{print $1$NF}' /etc/passwd
# awk -F: '/root/{print $1 $NF}' /etc/passwd
# awk -F: '/root/{print $1"#"$NF}' /etc/passwd
-v var=value:
Variables defined with the -v option exist before the script runs, so they can be used in the BEGIN block of the script;
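To try both options without depending on /etc/passwd, here is a minimal sketch on an inline record. The sample line and the variable name prefix are made up for illustration.

```shell
# One passwd-style record, inlined so the example does not depend on system files.
line='root:x:0:0:root:/root:/bin/bash'

# -F: splits on colons; $1 is the first field and $NF the last.
fields=$(printf '%s\n' "$line" | awk -F: '{print $1, $NF}')
echo "$fields"

# -v sets a variable before the script starts, so BEGIN can already read it.
greeting=$(awk -v prefix='user' 'BEGIN {print prefix ":", "ready"}')
echo "$greeting"
```

The first command should print the user name and the login shell separated by a space; the second shows that prefix is already set when BEGIN runs.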
2. awk output: print and printf
(1) print format:
print item1, item2, ...
Key points:
① Items are separated by commas, and the output items are separated by a blank (the output field separator);
② An output item can be a string, a numeric value, a field of the current record (such as $1), a variable, or an awk expression. Numeric values are converted to strings before being output;
③ The items after print can be omitted, in which case it is equivalent to print $0. Therefore, to output a blank line, use print "";
Note: In awk, $ denotes a field, and user variables are written without $. This differs from both shell and Perl. In shell, a variable is defined without $ and referenced with $. In Perl, $ is required both when defining and when referencing a variable ($ in Perl marks a scalar, while "@" and "%" mark arrays and hashes).
Example:
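A small sketch of the print rules on a made-up input line (the name and number are placeholders, not data from this article):

```shell
line='alice 30'

# A comma between items inserts the output field separator (a space by default).
with_comma=$(printf '%s\n' "$line" | awk '{print $1, $2}')
# Items placed side by side with no comma are concatenated.
no_comma=$(printf '%s\n' "$line" | awk '{print $1 $2}')
# print with no items is the same as print $0.
whole=$(printf '%s\n' "$line" | awk '{print}')
# User variables are written without $; only fields use it.
user_var=$(printf '%s\n' "$line" | awk '{name = $1; print name}')

printf '%s\n' "$with_comma" "$no_comma" "$whole" "$user_var"
```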
(2) printf format:
printf format, item1, item2, ...
Format specifiers start with %, followed by a character:
%c
Converts a number to its ASCII character. For example, printf "%c", 67 outputs C.
%d, %i
Prints a decimal integer. For example, printf "%d\n", 6.745 outputs 6.
%e, %E
Prints a number in scientific (exponential) notation. For example, printf "%4.3e\n", 6745 outputs 6.745e+03.
%f
Prints a number in floating-point notation. For example, printf "%4.3f\n", 6745 outputs 6745.000.
%s
Prints a string. For example, printf "%10s\n", 6745 outputs six spaces followed by 6745.
Variable formats:
N$
Position specifier (a gawk extension), used to change the order in which the arguments are output. printf "%s %s %s\n", "I", "LOVE", "YOU" outputs: I LOVE YOU. With the positions rearranged, printf "%3$s %2$s %1$s\n", "YOU", "LOVE", "I" also outputs: I LOVE YOU
Modifiers:
N: display width;
-: left alignment; +: show the sign of a numeric value (right alignment is the default);
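The specifiers and modifiers can be checked quickly from a BEGIN block, which needs no input file. This is only a sketch with made-up values; the N$ position specifier is left out because it is not available in every awk.

```shell
# %c turns a number into the character with that code.
as_char=$(awk 'BEGIN {printf "%c\n", 67}')
# %d truncates to an integer.
as_int=$(awk 'BEGIN {printf "%d\n", 6.745}')
# %.3f prints three digits after the decimal point.
as_float=$(awk 'BEGIN {printf "%.3f\n", 6745}')
# %-10s left-aligns in a 10-character column; %5d right-aligns in 5.
row=$(awk 'BEGIN {printf "[%-10s][%5d]\n", "root", 0}')

printf '%s\n' "$as_char" "$as_int" "$as_float" "$row"
```

Unlike print, printf does not append a newline, which is why each format string ends in \n.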
3. Patterns and actions
(1) A pattern can be any of the following:
/regular expression/: uses the extended set of regular expression metacharacters.
Relational expression: uses the relational operators from the operator table below. It can compare strings or numbers; for example, $2 > $1 selects lines whose second field is greater than the first.
Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).
pattern1, pattern2: specifies a range of lines. This syntax cannot include the BEGIN and END patterns.
BEGIN: specifies actions that run before the first input record is processed. Global variables can be set here.
END: specifies actions that run after the last input record has been read.
(2) An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in braces. It has four main parts: variable or array assignments, output commands, built-in functions, and control-flow statements.
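Each pattern type can be tried on a few inline lines. The data below is invented purely for illustration.

```shell
data='alpha 1 5
beta 7 2
gamma 3 3
delta 9 4'

# /regex/ pattern: lines containing "ta".
regex=$(printf '%s\n' "$data" | awk '/ta/ {print $1}')
# Relational pattern: second field greater than or equal to the third.
relational=$(printf '%s\n' "$data" | awk '$2 >= $3 {print $1}')
# Match operator on one field: first field does not start with a or b.
nomatch=$(printf '%s\n' "$data" | awk '$1 !~ /^[ab]/ {print $1}')
# Range pattern: from the line matching /beta/ through the line matching /gamma/.
range=$(printf '%s\n' "$data" | awk '/beta/,/gamma/ {print $1}')
# BEGIN runs before any input is read, END after the last record.
framed=$(printf '%s\n' "$data" | awk 'BEGIN {print "start"} END {print "lines:", NR}')

printf '%s\n' "$regex" "$relational" "$nomatch" "$range" "$framed"
```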
4. Variables
(1) Built-in variables: record variables
FS: Field Separator
The field separator used when reading input (the default is whitespace)
RS: Record Separator
The record separator used when reading input (the default is a newline)
OFS: Output Field Separator
The output field separator (the default is a space)
ORS: Output Record Separator
The output record separator (the default is a newline)
Note:
The fields run from $1, $2 up to $NF, and the entire line is $0. If $0 is assigned a new value, $1, $2, ... and NF are all recalculated. Likewise, if any $i is changed, $0 is rebuilt using OFS.
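The rebuilding rule in the note is easy to miss, so here is a short sketch on a made-up record. It assumes nothing beyond standard awk behavior.

```shell
line='a:b:c'

# Changing only OFS does not touch $0: the record is printed as it was read.
untouched=$(printf '%s\n' "$line" | awk -F: -v OFS='-' '{print $0}')
# Assigning to any field rebuilds $0 with OFS between the fields.
rebuilt=$(printf '%s\n' "$line" | awk -F: -v OFS='-' '{$2 = "X"; print $0}')
# Assigning to $0 re-splits the record, so NF and $1..$NF are recomputed.
resplit=$(printf '%s\n' "$line" | awk -F: '{$0 = "p:q"; print NF, $2}')

printf '%s\n' "$untouched" "$rebuilt" "$resplit"
```

A common idiom that follows from this is the no-op assignment $1 = $1, which forces $0 to be rebuilt with the new OFS.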
(2) Built-in variables: data variables
NR: Number of Records
The number of records awk has processed so far; with multiple files, the count runs continuously across all of them.
NF: Number of Fields
The number of fields in the current record
FNR
The record number relative to the current file
ARGV
An array holding the command-line strings. For example, in awk '{print $0}' a.txt b.txt, ARGV[0] holds awk and ARGV[1] holds a.txt
ARGC
The number of arguments on the awk command line
FILENAME
The name of the file awk is currently processing
ENVIRON
An associative array of the current shell environment variables and their values
Usage examples: NR, NF (fields are separated by spaces by default), FNR, ARGV, FILENAME, ENVIRON.
Note:
The ARGV array consists of ARGV[0] ... ARGV[ARGC-1]; its first subscript is 0 rather than 1, unlike other arrays in awk.
The ENVIRON array is very useful when shell and awk interact. Use ENVIRON["PARA_NAME"] to obtain the value of the environment variable $PARA_NAME; the quotation marks "" are required!
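The data variables are easiest to understand with two input files. The sketch below creates throwaway files in a temporary directory; the file names and the PARA_NAME value are arbitrary.

```shell
# Two small throwaway files (names are arbitrary).
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/a.txt"
printf 'three\n' > "$tmpdir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$tmpdir" && awk '{print FILENAME, NR, FNR, NF}' a.txt b.txt)
echo "$counts"

# ARGV[0] is the program name; the file operands follow from ARGV[1].
args=$(cd "$tmpdir" && awk 'BEGIN {print ARGC, ARGV[1], ARGV[2]}' a.txt b.txt)
echo "$args"

# ENVIRON exposes exported variables; the quotes around the name are required.
from_env=$(PARA_NAME=hello awk 'BEGIN {print ENVIRON["PARA_NAME"]}')
echo "$from_env"

rm -rf "$tmpdir"
```

Note that ARGC is 3 here: the program name plus the two file names. The script text itself is not counted.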
5. Standard output and redirection
(1) Output redirection
print items > output-file
print items | command
(2) Special file descriptors:
/dev/stdin: standard input
/dev/stdout: standard output
/dev/stderr: error output
/dev/fd/N: a specific file descriptor; for example, /dev/stdin is equivalent to /dev/fd/0;
Example:
# awk -F" " '{printf "%-15s %i\n",$1,$3 > "/dev/stderr" }' /etc/issue
# awk -F" " '{printf "%-15s %i\n",$1,$3 > "/dev/null" }' /etc/issue
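A self-contained sketch of the three forms, using a temporary file and invented input. It assumes a system where /dev/stderr is available, as on Linux.

```shell
tmpfile=$(mktemp)

# print > "file" writes to the file instead of standard output.
printf 'b\na\nc\n' | awk -v out="$tmpfile" '{print $1 > out}'
saved=$(cat "$tmpfile")

# print | "command" pipes the output through an external command.
sorted=$(printf 'b\na\nc\n' | awk '{print $1 | "sort"}')

# Writing to /dev/stderr keeps diagnostics out of standard output.
only_stdout=$(printf 'x\n' | awk '{print "warning" > "/dev/stderr"; print "data"}' 2>/dev/null)

rm -f "$tmpfile"
printf '%s\n' "$saved" "$sorted" "$only_stdout"
```

Inside awk, > truncates the file only once, when it is first opened; later prints in the same run append to it.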
6. awk operators:
(1) Arithmetic operators:
-x: negation
+x: convert to a numeric value
x ^ y, x ** y: exponentiation
x * y: multiplication
x / y: division
x + y: addition
x - y: subtraction
x % y: remainder
(2) String operator: there is only one, string concatenation, and it is written with no symbol at all.
(3) Assignment operators: = += -= *= /= %= ^= **= ++ --
Note that if a pattern begins with =, writing /=/ may cause a syntax error because /= is read as an operator; use /[=]/ instead;
(4) Boolean values: in awk, any nonzero number or non-empty string is true; anything else is false.
Comparison operators:
x < y: true if x is less than y.
x <= y: true if x is less than or equal to y.
x > y: true if x is greater than y.
x >= y: true if x is greater than or equal to y.
x == y: true if x is equal to y.
x != y: true if x is not equal to y.
x ~ y: true if the string x matches the regexp denoted by y.
x !~ y: true if the string x does not match the regexp denoted by y.
subscript in array: true if the array has an element with that subscript.
Logical operators between expressions: && ||
Conditional expression: selector ? if-true-exp : if-false-exp, which corresponds to the shell form if selector; then if-true-exp; else if-false-exp; fi
Functions:
function_name(para1, para2)
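A few of these operators in one place, as a sketch with made-up values. The ** spelling is avoided because not every awk accepts it.

```shell
# ^ is exponentiation and % is the remainder.
arith=$(awk 'BEGIN {print 2 ^ 10, 17 % 5, -3 + 10}')
# Concatenation has no operator: adjacent values are simply joined.
concat=$(awk 'BEGIN {a = "foo"; b = "bar"; print a b}')
# == compares, = assigns; ~ tests a regex match. True is 1, false is 0.
compare=$(awk 'BEGIN {x = 5; eq = (x == 5); ne = (x != 5); m = ("abc" ~ /^a/); print eq, ne, m}')
# Conditional expression: selector ? if-true : if-false.
label=$(printf '0\n1000\n' | awk '{print ($1 == 0 ? "admin" : "user")}')

printf '%s\n' "$arith" "$concat" "$compare" "$label"
```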
7. control statements
(1) if-else
Syntax:
if (condition) {then-body} else {[else-body]}
Example:
# awk -F: '{if ($3==0) {print $1, "Administrator";} else { print $1,"Common User"}}' /etc/passwd
# awk -F: '{if ($1=="root") print $1, "Admin"; else print $1, "Common User"}' /etc/passwd
# awk -F: '{if ($1=="root") printf "%-15s: %s\n", $1,"Admin"; else printf "%-15s: %s\n", $1, "Common User"}' /etc/passwd
# awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd
(2) while
Syntax:
while (condition) {statement1; statement2; ...}
Example:
# awk -F: '{i=1;while (i<=3) {print $i;i++}}' /etc/passwd
# awk -F: '{i=1;while (i<=NF) { if (length($i)>=4) {print $i}; i++ }}' /etc/passwd
# awk '{i=1;while (i<=NF) {if ($i>=20000) print $i; i++}}' random.txt
Here random.txt is a file containing a set of random numbers.
(3) do-while: executes the loop body at least once, whether or not the condition is met.
Syntax:
do {statement1; statement2; ...} while (condition)
Example:
# awk 'BEGIN{ sum=0; i=0; do { sum+=i; i++; } while (i<=100); print sum; }'
(4) for
Syntax:
for (variable assignment; condition; iteration process) {statement1; statement2; ...}
Example:
# awk -F: '{for(i=1;i<=3;i++) { if (length($i)>=8) {print $i}}}' /etc/passwd
The for loop can also be used to traverse array elements:
Syntax:
for (i in array) {statement1; statement2; ...}
Example:
# awk -F: '$NF!~/^$/{BASH[$NF]++}END{for(A in BASH){printf "%-15s:%i\n",A,BASH[A]}}' /etc/passwd
(5) case
Syntax:
switch (expression) {case VALUE or /REGEXP/: statement1; statement2; ... default: statement1; ...}
break and continue: often used in loops or case statements.
next: ends processing of the current line early and moves on to the next line. For example, the following command displays users with an odd ID:
Example:
# awk -F: '{if($3%2==0) next;print $1,$3}' /etc/passwd
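Since only next has an example, here is a sketch that contrasts it with break and continue. The numbers and the two sample lines are invented.

```shell
line='3 8 5 12 7 20'

# continue skips the rest of the loop body for odd values;
# break leaves the loop at the first value above 10.
evens=$(printf '%s\n' "$line" | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i > 10) break
        if ($i % 2 == 1) continue
        print $i
    }
}')

# next abandons the current record entirely and moves to the following one.
kept=$(printf 'skip me\nkeep me\n' | awk '/^skip/ {next} {print $2, $1}')

printf '%s\n' "$evens" "$kept"
```

break and continue act on the enclosing loop within one record, while next acts on the record itself.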
9. Using arrays in awk
(1) array[index-expression]
index-expression can be any string. Note that if an array element does not already exist when it is referenced, awk automatically creates it and initializes it to an empty string. Therefore, to test whether an element exists in an array, you must use the form index in array. To traverse every element of the array, use the following special construct:
Syntax:
for (var in array) {statement1; ...}
Here var refers to the array subscript, not the element value. Example:
# netstat -ant | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
(2) To delete an element from an associative array, use the delete command, in the format:
delete array[index]
10. awk built-in functions
split(string, array [, fieldsep [, seps]])
Function: splits string using fieldsep as the separator and saves the pieces in the array named array. The array subscripts are a sequence starting from 1. Example:
# netstat -ant | awk '/:80\>/{split($5,clients,":");IP[clients[1]]++}END{for(i in IP){print IP[i],i}}' | sort -rn | head -50
# netstat -tan | awk '/:80\>/{split($5,clients,":");ip[clients[4]]++}END{for(a in ip) print ip[a],a}' | sort -rn | head -50
# df -lh | awk '!/^File/{split($5,percent,"%");if(percent[1]>=20){print $1}}'
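The netstat examples depend on live connections, so here is a sketch of split, the in test, and delete on fixed values. The address and the array names are placeholders.

```shell
result=$(awk 'BEGIN {
    # split fills parts[1..n] and returns n.
    n = split("10.0.0.7:443", parts, ":")
    print n, parts[1], parts[2]

    count["tcp"] = 2
    # "in" tests membership without creating the element.
    if (!("udp" in count)) print "udp absent"
    # Merely referencing an element creates it as an empty string.
    dummy = count["udp"]
    if ("udp" in count) print "udp now present"

    # delete removes a single element.
    delete count["udp"]
    if (!("udp" in count)) print "udp deleted"
}')
echo "$result"
```

This is why a test such as if (count["udp"] == "") is a poor existence check: the test itself adds the element.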
length([string])
Function: returns the number of characters in string.
substr(string, start [, length])
Function: returns the substring of string that begins at position start and is length characters long; start counts from 1;
# tail -10 /etc/passwd | awk -F: '{print substr($1,1,6)}'
system(command)
Function: runs the system command and returns its exit status to awk.
# awk 'BEGIN{print system("ls -l")}'
systime()
Function: returns the number of seconds from January 1, 1970 to the current time (not counting leap seconds).
tolower(s)
Function: converts all letters in s to lowercase.
toupper(s)
Function: converts all letters in s to uppercase.
# awk 'BEGIN{s="acl";print toupper(s)}'
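The string functions and system can be tried together from BEGIN blocks. This is a sketch with an arbitrary sample string; systime is omitted because it is a gawk extension and its result changes every second.

```shell
# length counts characters; substr positions start at 1.
sizes=$(awk 'BEGIN {s = "administrator"; print length(s), substr(s, 1, 5), substr(s, 6)}')
# toupper and tolower return a converted copy of their argument.
cases=$(awk 'BEGIN {print toupper("acl"), tolower("ACL")}')
# system() returns the exit status of the command, not its output.
status=$(awk 'BEGIN {rc = system("exit 3"); print rc}')

printf '%s\n' "$sizes" "$cases" "$status"
```

To capture the output of a command rather than its status, awk offers the "command" | getline form instead of system.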
===================================================== ========================================================== ====
PS:
Awk simple application ends here!