Linux Text Formatting tool awk

Source: Internet
Author: User
Tags print format

I. Text processing tools

grep, sed, and awk are all text processing tools, and each has its own strengths and weaknesses. None of them can completely replace the others; otherwise there would be no need for three separate commands. Of the three, sed and awk are the more powerful, and each is effectively a small language in its own right.

grep: a text filter. If you only need to filter text, use grep, which is much more efficient than the other two.

sed: Stream EDitor. By default it works only on the pattern space and does not modify the original data. Use sed when the data is processed line by line.

awk: a report generator that formats data before displaying it. If you need to generate reports from the processed data, or the data is processed by column, awk is the best choice.

II. What awk can do

  • Treat text files as text databases made up of records and fields

  • Use variables to manipulate the database

  • Use arithmetic and string operators

  • Use common programming constructs, such as loops and conditionals

  • Generate formatted reports

  • Define functions

  • Run UNIX commands from scripts

  • Process the output of UNIX commands

  • Handle command-line arguments more flexibly

  • Work with multiple input streams more easily

III. Syntax

# awk [options] 'script' file1 file2 ...
# awk [options] 'PATTERN { action }' file1 file2 ...

1. Options

-F fs or --field-separator fs:

Specifies the separator used to split input lines into fields. fs is a string or a regular expression, such as -F:

# awk -F: '/root/{print $1,$NF}' /etc/passwd
# awk -F: '/root/{print $1$NF}' /etc/passwd
# awk -F: '/root/{print $1 $NF}' /etc/passwd
# awk -F: '/root/{print $1"#"$NF}' /etc/passwd

-v var=value:

Variables defined with the -v option exist before the script runs, so they can be used in the BEGIN block of the script.
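The two options can be tried without touching the real /etc/passwd. The sketch below uses a small made-up sample (the entries and the variable names are assumptions for illustration only):

```shell
# Hypothetical sample standing in for /etc/passwd.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# -F: splits each line on colons; $1 is the first field, $NF the last.
fields=$(printf '%s\n' "$passwd_sample" | awk -F: '/root/ { print $1, $NF }')
echo "$fields"

# -v sets a variable before the script starts, so BEGIN can already use it.
greeting=$(awk -v name="awk" 'BEGIN { print "hello, " name }')
echo "$greeting"
```

Only the first sample line contains "root", so the first command prints one line.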

2. awk output: print and printf

(1) print format:

print item1, item2, ...

Key points:

① Items are separated by commas; in the output they are separated by the output field separator, which is a space by default;

② An output item can be a string or a number, a field of the current record (such as $1), a variable, or an awk expression. Numbers are converted to strings before being output;

③ The items after print can be omitted, in which case it is equivalent to print $0. To output a blank line, use print "";

Note: In awk, $ refers to a field, and user variables are written without $. This differs from shell and Perl. In shell, a variable is defined without $ and referenced with $. In Perl, $ is required both when defining and when referencing a scalar (the @ and % symbols mark arrays and hashes).

Example
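A minimal sketch of the print rules above, using a made-up input line (the variable names are illustrative assumptions):

```shell
line='alpha beta gamma'

# A comma between items inserts the output field separator (a space by default).
with_comma=$(echo "$line" | awk '{ print $1, $3 }')

# Without a comma the two items are simply concatenated.
without_comma=$(echo "$line" | awk '{ print $1 $3 }')

# print with no items is the same as print $0.
whole_line=$(echo "$line" | awk '{ print }')

# A user variable (n) is written without $; only fields use $.
field_count=$(echo "$line" | awk '{ n = NF; print n }')

printf '%s\n' "$with_comma" "$without_comma" "$whole_line" "$field_count"
```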

(2) printf format:

printf format, item1, item2, ...

Format specifiers start with %, followed by a character:

%c

Converts a number to the corresponding ASCII character. For example, printf "%c", 67 prints C.

%d, %i

Prints a decimal integer. For example, printf "%d\n", 6.745 prints 6.

%e, %E

Prints a number in scientific (exponential) notation. For example, printf "%4.3e\n", 6745 prints 6.745e+03.

%f

Prints a number in floating-point notation. For example, printf "%4.3f\n", 6745 prints 6745.000.

%s

Prints a string. For example, printf "%10s\n", 6745 prints six spaces followed by 6745, for a total width of ten characters.

Positional specifiers:

N$

A position indicator that selects which argument a specifier consumes (supported by gawk). printf "%s %s %s\n", "I", "LOVE", "YOU" prints: I LOVE YOU. With the arguments reordered, printf "%3$s %2$s %1$s\n", "YOU", "LOVE", "I" also prints: I LOVE YOU

Modifiers

N: display width;

-: left alignment (the default is right alignment); +: always show the sign of a numeric value;
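The specifiers and modifiers above can be checked in a BEGIN block, which needs no input file. This is only a sketch; the square brackets are added here to make the padding visible:

```shell
fmt_out=$(awk 'BEGIN {
    printf "%c\n", 67          # character with code 67
    printf "%d\n", 6.745       # integer part only
    printf "%4.3e\n", 6745     # exponential notation, 3 decimals
    printf "%4.3f\n", 6745     # fixed notation, 3 decimals
    printf "[%10s]\n", 6745    # width 10, right aligned
    printf "[%-10s]\n", 6745   # width 10, left aligned
    printf "%+d\n", 5          # explicit sign
}')
printf '%s\n' "$fmt_out"
```

Unlike print, printf does not add a newline on its own, which is why each format string ends in \n.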

3. Patterns and actions

(1) A pattern can be any of the following:

  • /regular expression/: uses extended regular expressions.

  • Relational expression: uses the relational operators from the operator table below, comparing either strings or numbers. For example, $2 > $1 selects lines whose second field is greater than the first.

  • Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).

  • pattern1, pattern2: a range pattern, which selects a range of lines. This syntax cannot include the BEGIN and END patterns.

  • BEGIN: specifies actions that run before the first input record is processed. Global variables are usually set here.

  • END: specifies actions that run after the last input record has been read.

(2) An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in braces. Actions fall into four main categories:

  • Variable or array assignment

  • Output commands

  • Built-in functions

  • Control flow commands
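The pattern types listed above can be sketched against a small made-up table (the data and variable names are assumptions for illustration):

```shell
data='name qty
apple 5
pear 12
plum 7
fig 30'

# /regex/ pattern: lines starting with p.
by_regex=$(printf '%s\n' "$data" | awk '/^p/ { print $1 }')

# Relational pattern: skip the header, keep rows whose second field exceeds 6.
by_relation=$(printf '%s\n' "$data" | awk 'NR > 1 && $2 > 6 { print $1 }')

# Matching operator: first field ends in g.
by_match=$(printf '%s\n' "$data" | awk '$1 ~ /g$/ { print $1 }')

# Range pattern: from the apple line through the plum line.
by_range=$(printf '%s\n' "$data" | awk '/apple/,/plum/ { print $1 }')

# BEGIN and END: initialise before the first record, report after the last.
total=$(printf '%s\n' "$data" | awk 'BEGIN { sum = 0 } NR > 1 { sum += $2 } END { print "total", sum }')

printf '%s\n' "$by_regex" "$by_relation" "$by_match" "$by_range" "$total"
```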

4. Variables

(1) Built-in variables: record variables

FS: field separator; used to split each record into fields when reading input (default: whitespace)
RS: record separator; used to split the input into records (default: a newline)
OFS: output field separator (default: a space)
ORS: output record separator (default: a newline)

Note:

The fields are $1, $2, through $NF, and the whole line is $0. If $0 is assigned a new value, $1, $2, ... and NF are recalculated. Likewise, if any $i is changed, $0 is rebuilt using OFS.
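The recalculation rule in the note can be observed directly. A small sketch with made-up input:

```shell
# No field is modified, so $0 keeps its original separators.
unchanged=$(echo 'a:b:c' | awk -F: -v OFS='-' '{ print $0 }')

# Assigning to a field rebuilds $0 with OFS between the fields.
rebuilt=$(echo 'a:b:c' | awk -F: -v OFS='-' '{ $2 = "X"; print $0 }')

# Assigning to $0 re-splits the record, so NF and $2 change.
resplit=$(echo 'a b c' | awk '{ $0 = "one two"; print NF, $2 }')

printf '%s\n' "$unchanged" "$rebuilt" "$resplit"
```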

(2) Built-in variables: data variables

NR: number of input records processed so far; with multiple files, the count continues across all of them.
NF: number of fields in the current record
FNR: record number relative to the current file
ARGV: array holding the command-line strings. For example, in awk '{print $0}' a.txt b.txt, ARGV[0] holds awk and ARGV[1] holds a.txt
ARGC: number of command-line arguments of the awk command
FILENAME: name of the file currently being processed
ENVIRON: associative array of the current shell environment variables and their values
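The difference between NR and FNR only shows up with more than one input file. The sketch below creates two throwaway files in a temporary directory (the file names are assumptions) and prints FILENAME, NR, FNR, and NF for every record:

```shell
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/one.txt"
printf 'f\n' > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)
printf '%s\n' "$counters"

rm -rf "$tmp"
```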

Note:

The ARGV array consists of ARGV[0] ... ARGV[ARGC-1]. Its first subscript is 0 rather than 1, unlike most other arrays in awk.

The ENVIRON array is very useful when the shell and awk interact. Use ENVIRON["PARA_NAME"] to obtain the value of the environment variable $PARA_NAME; the quotation marks are required!
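A short sketch of ARGC, ARGV, and ENVIRON. Because the scripts consist only of a BEGIN block, awk never opens a.txt or b.txt, so the files do not need to exist. The value of ARGV[0] depends on the awk implementation in use, so it is printed but not relied on:

```shell
# ARGC counts the program name plus the two file arguments.
argc=$(awk 'BEGIN { print ARGC }' a.txt b.txt)

# ARGV[1] is the first argument after the program name.
first_arg=$(awk 'BEGIN { print ARGV[1] }' a.txt b.txt)

# ARGV[0] is the name awk was started under (implementation dependent).
awk 'BEGIN { print "ARGV[0] =", ARGV[0] }' a.txt b.txt

# PARA_NAME is exported only for this one command; the subscript must be quoted.
env_val=$(PARA_NAME=hello awk 'BEGIN { print ENVIRON["PARA_NAME"] }')

printf '%s\n' "$argc" "$first_arg" "$env_val"
```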

5. Standard output and redirection

(1) Output redirection

print items > output-file
print items | command

(2) Special file descriptors:

/dev/stdin: standard input
/dev/stdout: standard output
/dev/stderr: error output
/dev/fd/N: a specific file descriptor. For example, /dev/stdin is equivalent to /dev/fd/0;

Example:

# awk -F" " '{printf "%-15s %i\n",$1,$3 > "/dev/stderr" }' /etc/issue
# awk -F" " '{printf "%-15s %i\n",$1,$3 > "/dev/null" }' /etc/issue
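The three redirection forms can be sketched with made-up input and a temporary directory (the path and variable names are assumptions):

```shell
tmp=$(mktemp -d)

# print > file: the target file name is passed in with -v.
printf 'b 2\na 1\n' | awk -v out="$tmp/names.txt" '{ print $1 > out }'
names=$(cat "$tmp/names.txt")

# print | command: every record is sent to a single sort process.
sorted=$(printf 'b 2\na 1\n' | awk '{ print $1 | "sort" }')

# Output sent to /dev/stderr is not captured with standard output.
stdout_only=$(awk 'BEGIN { print "warning" > "/dev/stderr"; print "data" }' 2>/dev/null)

printf '%s\n' "$names" "$sorted" "$stdout_only"
rm -rf "$tmp"
```

Within one awk run, > truncates the file only when it is first opened; later print statements to the same file append to it.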

6. awk operators:

(1) Arithmetic operators:

-x: negation
+x: conversion to a number
x ^ y, x ** y: exponentiation
x * y: multiplication
x / y: division
x + y: addition
x - y: subtraction
x % y: remainder

Example
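A quick check of the arithmetic operators in a BEGIN block. The sketch uses ^ rather than ** for exponentiation, since ** is not accepted by every awk implementation:

```shell
# Negation, power, product, quotient, sum, difference, remainder for x=7, y=2.
arith=$(awk 'BEGIN { x = 7; y = 2; print -x, x ^ y, x * y, x / y, x + y, x - y, x % y }')

# Unary plus forces a numeric conversion; the leading digits of the string are used.
as_number=$(awk 'BEGIN { print +"42abc" }')

printf '%s\n' "$arith" "$as_number"
```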

(2) String operator: there is only one, concatenation, and it is written with no symbol at all; the strings are simply placed next to each other.

(3) Assignment operators: = += -= *= /= %= ^= **= ++ --

Note that if a regular expression pattern begins with =, writing /=/ may cause a syntax error because it can be read as the /= operator; write /[=]/ instead.

(4) Boolean values: in awk, any nonzero number or non-empty string is true; zero and the empty string are false.

Comparison operators:

x < y: true if x is less than y
x <= y: true if x is less than or equal to y
x > y: true if x is greater than y
x >= y: true if x is greater than or equal to y
x == y: true if x is equal to y
x != y: true if x is not equal to y
x ~ y: true if the string x matches the regexp denoted by y
x !~ y: true if the string x does not match the regexp denoted by y
subscript in array: true if the array has an element with that subscript

Logical operators between expressions: && and ||

Example:
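The sketch below exercises the comparison, matching, membership, and logical operators. Note the second test: two string constants are compared as strings, so "10" sorts before "9". The array name seen is an illustrative assumption:

```shell
cmp_out=$(awk 'BEGIN {
    if (10 > 9)     print "numeric: 10 is greater than 9"
    if ("10" < "9") print "string: 10 sorts before 9"
    if ("abc" ~ /^a/ && "abc" !~ /z/) print "regex: starts with a, contains no z"
    seen["k"] = 1
    if (("k" in seen) || ("missing" in seen)) print "subscript k exists"
    if (!0 && !"") print "0 and the empty string are false"
}')
printf '%s\n' "$cmp_out"
```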

Conditional expression: selector ? if-true-exp : if-false-exp

This behaves like the shell construct: if selector; then if-true-exp; else if-false-exp; fi

Example
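A minimal sketch of the conditional expression, labelling made-up user records by their numeric ID:

```shell
# kind becomes "admin" when the second field is 0, otherwise "user".
labels=$(printf 'root 0\nalice 1000\n' | awk '{ kind = ($2 == 0) ? "admin" : "user"; print $1, kind }')
printf '%s\n' "$labels"
```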

Function calls:

function_name(para1, para2)

7. Control statements

(1) if-else

Syntax:

if (condition) {then-body} else {[else-body]}

Example:

# awk -F: '{if ($3==0) {print $1, "Administrator";} else { print $1,"Common User"}}' /etc/passwd
# awk -F: '{if ($1=="root") print $1, "Admin"; else print $1, "Common User"}' /etc/passwd
# awk -F: '{if ($1=="root") printf "%-15s: %s\n", $1,"Admin"; else printf "%-15s: %s\n", $1, "Common User"}' /etc/passwd
# awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd
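Since the contents of /etc/passwd differ from system to system, the same if-else logic can be tried on a fixed, made-up sample (the entries below are assumptions):

```shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/sh'

# UID 0 is labelled Administrator, everyone else Common User.
roles=$(printf '%s\n' "$passwd_sample" |
    awk -F: '{ if ($3 == 0) print $1, "Administrator"; else print $1, "Common User" }')

# Count the accounts whose UID is at least 500.
regular=$(printf '%s\n' "$passwd_sample" |
    awk -F: -v sum=0 '{ if ($3 >= 500) sum++ } END { print sum }')

printf '%s\n' "$roles" "$regular"
```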

(2) while

Syntax:

while (condition) {statement1; statement2; ...}

Example:

# awk -F: '{i=1;while (i<=3) {print $i;i++}}' /etc/passwd
# awk -F: '{i=1;while (i<=NF) { if (length($i)>=4) {print $i}; i++ }}' /etc/passwd
# awk '{i=1;while (i<=NF) {if ($i>=20000) print $i; i++}}' random.txt

The random.txt file contains a set of random numbers.
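The second command above can be sketched on a single made-up passwd line, printing only the fields that are at least four characters long:

```shell
# Walk the fields from 1 to NF and print those with length >= 4.
long_fields=$(echo 'root:x:0:0:root:/root:/bin/bash' |
    awk -F: '{ i = 1; while (i <= NF) { if (length($i) >= 4) print $i; i++ } }')
printf '%s\n' "$long_fields"
```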

(3) do-while

Executes the loop body at least once, regardless of whether the condition is met.

Syntax:

do {statement1; statement2; ...} while (condition)

Example:

# awk 'BEGIN{ sum=0; i=0; do { sum+=i; i++ } while (i<=100); print sum }'
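Two small sketches of do-while: the sum from the example, and a loop whose condition is false from the start but whose body still runs once:

```shell
# Sum of the integers 0 through 100.
sum=$(awk 'BEGIN { sum = 0; i = 0; do { sum += i; i++ } while (i <= 100); print sum }')

# The condition i < 5 is false at once, yet the body has already run.
once=$(awk 'BEGIN { i = 10; do { print "ran with i=" i; i++ } while (i < 5) }')

printf '%s\n' "$sum" "$once"
```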

(4) for

Syntax: for (variable assignment; condition; iteration process) {statement1; statement2; ...}

Example:

# awk -F: '{for(i=1;i<=3;i++) { if (length($i)>=8) {print $i}}}' /etc/passwd

The for loop can also be used to traverse array elements:

Syntax:

for (i in array) {statement1; statement2; ...}

Example:

# awk -F: '$NF!~/^$/{BASH[$NF]++}END{for(A in BASH){printf "%-15s:%i\n",A,BASH[A]}}' /etc/passwd
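Both forms of for can be sketched on a made-up passwd sample (the entries are assumptions). The order in which for (s in count) visits subscripts is not defined, so the output is piped through sort to make it stable:

```shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/sh'

# Counting loop: print the first three fields of the first record only.
first_three=$(printf '%s\n' "$passwd_sample" |
    awk -F: 'NR == 1 { for (i = 1; i <= 3; i++) print $i }')

# Array traversal: count how many accounts use each login shell.
shells=$(printf '%s\n' "$passwd_sample" |
    awk -F: '$NF !~ /^$/ { count[$NF]++ } END { for (s in count) print s, count[s] }' |
    LC_ALL=C sort)

printf '%s\n' "$first_three" "$shells"
```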

(5) case

Syntax: switch (expression) {case VALUE or /REGEXP/: statement1; statement2; ... default: statement1; ...}

Note that switch is a gawk extension and is not available in every awk implementation.

break and continue are often used in loops or case statements. next terminates processing of the current line early and moves on to the next line. For example, the following command displays users with an odd ID:

Example:

# awk -F: '{if($3%2==0) next;print $1,$3}' /etc/passwd
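A sketch of next, continue, and break. The passwd entries are made up, and the helper variables sep and out exist only for this illustration:

```shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/sh'

# next: skip records with an even UID, print the rest.
odd_ids=$(printf '%s\n' "$passwd_sample" |
    awk -F: '{ if ($3 % 2 == 0) next; print $1, $3 }')

# continue skips one iteration; break leaves the loop entirely.
odds=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # stop once i passes 7
        sep = (out == "") ? "" : " "
        out = out sep i
    }
    print out
}')

printf '%s\n' "$odd_ids" "$odds"
```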

9. Using arrays in awk

(1) array[index-expression]

index-expression can be any string. Note that if an array element does not exist when it is referenced, awk automatically creates it and initializes it to an empty string. Therefore, to test whether an element exists in an array, use the form index in array. To traverse every element of an array, use the following special construct:

Syntax

for (var in array) {statement1; ...}

var iterates over the array subscripts, not the element values. Example:

# netstat -ant | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
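Because netstat output varies from machine to machine, the sketch below feeds awk a few made-up lines in the same column layout (the addresses and states are assumptions). It also shows why in is the safe membership test: merely referencing an element creates it.

```shell
netstat_sample='tcp 0 0 10.0.0.1:22 10.0.0.9:5000 ESTABLISHED
tcp 0 0 10.0.0.1:80 10.0.0.8:5001 TIME_WAIT
tcp 0 0 10.0.0.1:80 10.0.0.7:5002 ESTABLISHED
udp 0 0 0.0.0.0:68 0.0.0.0:*'

# Count TCP connections per state; sort makes the traversal order stable.
states=$(printf '%s\n' "$netstat_sample" |
    awk '/^tcp/ { ++S[$NF] } END { for (a in S) print a, S[a] }' | LC_ALL=C sort)

# The in test does not create the element, but referencing arr["x"] does.
membership=$(awk 'BEGIN {
    before = ("x" in arr) ? "yes" : "no"
    if (arr["x"] == "") touched = 1
    after = ("x" in arr) ? "yes" : "no"
    print before, after
}')

printf '%s\n' "$states" "$membership"
```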
(2) To delete an element from an associative array, use the delete command, in the format:

delete array[index]

10. awk built-in functions

split(string, array [, fieldsep [, seps]])

Function: splits string using fieldsep as the separator and saves the pieces in the array named array. The array subscripts are a sequence starting from 1. Example:

# netstat -ant | awk '/:80\>/{split($5,clients,":");IP[clients[1]]++}END{for(i in IP){print IP[i],i}}' | sort -rn | head -50
# netstat -tan | awk '/:80\>/{split($5,clients,":");ip[clients[4]]++}END{for(a in ip) print ip[a],a}' | sort -rn | head -50
# df -lh | awk '!/^File/{split($5,percent,"%");if(percent[1]>=20){print $1}}'
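A minimal sketch of split and delete on a made-up address:port string (the array name parts is an assumption):

```shell
split_out=$(awk 'BEGIN {
    # split returns the number of pieces; subscripts start at 1.
    n = split("10.0.0.8:5001", parts, ":")
    print n, parts[1], parts[2]

    # delete removes a single element from the array.
    delete parts[2]
    state = (2 in parts) ? "present" : "deleted"
    print state
}')
printf '%s\n' "$split_out"
```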

length([string])

Function: returns the number of characters in string.

substr(string, start [, length])

Function: extracts a substring of string, beginning at position start and length characters long; start counts from 1.

# tail -10 /etc/passwd | awk -F: '{print substr($1,1,6)}'

system(command)

Function: runs the system command and returns its exit status to awk; the command's own output goes directly to standard output.

# awk 'BEGIN{print system("ls -l")}'

systime()

Function: returns the number of whole seconds from January 1, 1970 to the current time (not counting leap seconds).

tolower(s)

Function: converts all letters in s to lowercase.

toupper(s)

Function: converts all letters in s to uppercase.

# awk 'BEGIN{s="acl";print toupper(s)}'
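The string functions and the return value of system can be checked together. The sketch uses the standard true and false commands, which produce no output, so only the exit status is visible:

```shell
# length, substr, toupper, and tolower on one sample string.
str_out=$(awk 'BEGIN {
    s = "Administrator"
    print length(s)
    print substr(s, 1, 5)
    print toupper(s)
    print tolower(s)
}')

# system returns the exit status of the command, not its output.
status_ok=$(awk 'BEGIN { rc = system("true"); print rc }')
status_fail=$(awk 'BEGIN { rc = system("false"); print rc }')

printf '%s\n' "$str_out" "$status_ok" "$status_fail"
```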

 

===================================================== ========================================================== ====

PS:

Awk simple application ends here!
