Introduction to awk commands

Source: Internet
Author: User


Awk is a powerful text-analysis tool. Compared with grep (searching) and sed (editing), awk is especially strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.
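To see the default splitting in action, here is a minimal sketch; show_fields is a hypothetical helper name, not part of awk. It prints the number of fields (NF) and the second field of each input line, showing that leading blanks are ignored and that a run of spaces or tabs counts as a single separator.

```shell
# show_fields (hypothetical helper): print the field count and the 2nd field
# of every input line, using awk's default whitespace field splitting.
show_fields() {
  awk '{ print NF, $2 }'
}

# Two leading spaces, three spaces and a tab between words: still 3 fields.
printf '  alpha   beta\tgamma\n' | show_fields
# → 3 beta
```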

There are three common versions of awk: awk, nawk, and gawk. The one usually found on Linux systems is gawk, the GNU implementation of AWK.

The name AWK comes from the initials of its creators' family names: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

Pattern Processing

cat /etc/passwd | head -5
# result
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

After processing:

cat /etc/passwd | head -5 | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
# result
name,shell
root,/bin/bash
bin,/sbin/nologin
daemon,/sbin/nologin
adm,/sbin/nologin
lp,/sbin/nologin
blue,/bin/nosh

The awk workflow is as follows. First the BEGIN block is executed. Then awk reads the input one record at a time; by default, records are separated by newlines (\n). Each record is split into fields according to the field separator: $0 holds the whole record, $1 the first field, and $n the nth field. The actions whose patterns match the record are then executed. This repeats for the next record until all records have been read, and finally the END block is executed.
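The order described above can be observed with a small sketch (trace_order is a hypothetical helper name). BEGIN prints before any input is read, the main block runs once per record, and END runs after the last record, when NR still holds the total number of records.

```shell
# trace_order (hypothetical helper): show when BEGIN, the main block, and END run.
trace_order() {
  awk '
    BEGIN { print "BEGIN runs once, before any input" }
          { print "record " NR ": " $0 }
    END   { print "END runs once, after " NR " records" }
  '
}

printf 'first\nsecond\n' | trace_order
# → BEGIN runs once, before any input
# → record 1: first
# → record 2: second
# → END runs once, after 2 records
```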

Pattern Matching

awk 'pattern {action}' filenames

The pattern is what awk looks for in the data, and the action is the series of commands executed when a match is found. Curly braces ({}) do not have to appear in every program, but they group the series of commands that belongs to a particular pattern. A pattern is often a regular expression enclosed in slashes (/.../).

For example, search for all lines in /etc/passwd that contain the keyword root:

awk -F: '/root/' /etc/passwd
# result
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

This is an example of a pattern: only lines that match the pattern (root here) execute the action, and because no action is specified, each matching line is printed by default. Regular expressions are supported in the search; for example, awk -F: '/^root/' /etc/passwd matches only lines that begin with root.
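The difference between matching a word anywhere in the line and anchoring it with ^ can be sketched on a few sample lines. The sample_passwd function and its three lines are made up for illustration and only stand in for /etc/passwd.

```shell
# sample_passwd (hypothetical): three made-up lines in /etc/passwd format.
sample_passwd() {
  printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'operator:x:11:0:operator:/root:/sbin/nologin' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash'
}

# /root/ matches "root" anywhere in the line (operator's home is /root).
sample_passwd | awk -F: '/root/ { print $1 }'
# → root
# → operator

# /^root/ matches only lines that begin with "root".
sample_passwd | awk -F: '/^root/ { print $1 }'
# → root
```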
Search for all lines in /etc/passwd that contain the keyword root and display the corresponding shell:

awk -F: '/root/{print $7}' /etc/passwd
# result
/bin/bash
/sbin/nologin

Awk built-in Variables

Awk has many built-in variables that hold information about its environment, and these variables can be changed. The most common ones are listed below.

ARGC      number of command-line arguments
ARGV      array of command-line arguments
ENVIRON   array of the system environment variables
FILENAME  name of the file awk is currently reading
FNR       record number within the current file
NR        total number of records read so far
NF        number of fields in the current record
FS        input field separator (equivalent to the -F command-line option)
OFS       output field separator
RS        input record separator
ORS       output record separator
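NR and FNR only differ when awk reads more than one file. The sketch below (nr_fnr_demo and the file names one.txt and two.txt are hypothetical) creates two small files in a temporary directory and prints FILENAME, NR, and FNR for each line: NR keeps counting across files, while FNR restarts at 1 for each new file.

```shell
# nr_fnr_demo (hypothetical helper): compare NR and FNR across two input files.
nr_fnr_demo() {
  demo_dir=$(mktemp -d) || return 1
  printf 'a\nb\n' > "$demo_dir/one.txt"
  printf 'c\n'    > "$demo_dir/two.txt"
  # Run in a subshell inside the directory so FILENAME is the bare file name.
  (cd "$demo_dir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)
  rm -r "$demo_dir"
}

nr_fnr_demo
# → one.txt 1 1 a
# → one.txt 2 2 b
# → two.txt 3 1 c
```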

For each line of /etc/passwd, print the file name, the line number, the number of columns, and the complete line content:
awk  -F ':'  '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd

Using printf instead of print makes the code more concise and easier to read:

awk  -F ':'  '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

awk provides both print and printf. The arguments to print can be variables, numeric values, or strings. Strings must be enclosed in double quotation marks, and arguments must be separated by commas, which print replaces with the output field separator (OFS). Without commas, the arguments are concatenated and can no longer be told apart in the output. printf works like the printf function in C and formats its output according to a format string. When the output is complex, printf is easier to use and the code is easier to understand.
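The difference between commas and plain juxtaposition in print, and the role of OFS, can be seen in a few one-liners; the input string 'a b' is arbitrary.

```shell
# With commas, print separates its arguments with OFS (a single space by default).
echo 'a b' | awk '{ print $1, $2 }'
# → a b

# Without commas, the arguments are concatenated.
echo 'a b' | awk '{ print $1 $2 }'
# → ab

# Changing OFS changes what the commas produce.
echo 'a b' | awk 'BEGIN { OFS = "," } { print $1, $2 }'
# → a,b

# printf takes a C-style format; %-5s left-aligns in a 5-character column.
echo 'a b' | awk '{ printf("%-5s|%s\n", $1, $2) }'
# → a    |b
```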

Awk Programming

Variables and assignments

Besides its built-in variables, awk lets you define your own. The following example counts the number of accounts in /etc/passwd.

awk '{count++; print $0;} END{print "user count is", count}' /etc/passwd
# result
root:x:0:0:root:/root:/bin/bash
...
user count is 40

count is not initialized here. Although an uninitialized variable defaults to 0, it is good practice to initialize it to 0 explicitly:

awk 'BEGIN {count=0; print "[start]user count is", count} {count=count+1; print $0;} END{print "[end]user count is", count}' /etc/passwd
# result
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40
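One concrete reason to initialize: an unset variable behaves as an empty string until it is used as a number, so with empty input the count++ version would print no number at all. A short sketch on made-up input; it also shows that the built-in NR already holds the record count in END.

```shell
# No input lines: count is never incremented, so it prints as an empty string.
printf '' | awk '{ count++ } END { print "count=[" count "]" }'
# → count=[]

# Adding 0 (or initializing count=0 in BEGIN) forces a numeric value.
printf '' | awk '{ count++ } END { print "count=[" count+0 "]" }'
# → count=[0]

# NR already holds the number of records read, so END can use it directly.
printf 'a\nb\nc\n' | awk 'END { print "user count is", NR }'
# → user count is 3
```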

Count the number of bytes used by the files in a directory:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is", size}'
# result
[end]size is 439289

To show the size in megabytes (M):

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is", size/1024/1024, "M"}'
# result
[end]size is 0.418939 M
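Note that ls -l also lists directories, whose size column is the size of the directory itself rather than of the files inside it, and starts with a "total" line. A minimal sketch, assuming the usual ls -l layout where regular-file lines start with "-" and the size is in column 5, counts only regular files and prints two decimals. sample_ls and its lines are made up for illustration.

```shell
# sample_ls (hypothetical): made-up lines in `ls -l` format.
sample_ls() {
  printf '%s\n' \
    'total 1544' \
    'drwxr-xr-x 2 user group    4096 Jan  1 00:00 subdir' \
    '-rw-r--r-- 1 user group 1048576 Jan  1 00:00 a.bin' \
    '-rw-r--r-- 1 user group  524288 Jan  1 00:00 b.bin'
}

# /^-/ keeps only regular files; column 5 is summed and shown in MiB.
sample_ls | awk '/^-/ { size += $5 } END { printf("%.2f M\n", size / 1024 / 1024) }'
# → 1.50 M
```

On a real directory, the same awk program can read the output of ls -l directly.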

Conditional Statements

awk's conditional statements are borrowed from the C language. They are written as follows:

if (expression) {
    statement;
    statement;
    ...
}

if (expression) {
    statement1;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
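The if / else if / else form can be sketched on passwd-style lines. classify_uid is a hypothetical helper, and the cut-off of 1000 for regular users is an assumption based on a common Linux convention, not something awk defines.

```shell
# classify_uid (hypothetical helper): label each account by its UID (field 3).
# The 1000 cut-off for regular users is an assumed convention.
classify_uid() {
  awk -F: '{
    if ($3 == 0) {
      kind = "superuser"
    } else if ($3 < 1000) {
      kind = "system"
    } else {
      kind = "regular"
    }
    print $1, kind
  }'
}

printf '%s\n' 'root:x:0:0' 'daemon:x:2:2' 'alice:x:1000:1000' | classify_uid
# → root superuser
# → daemon system
# → alice regular
```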

Loop Statements

awk's loop statements are also borrowed from C: while, do/while, for, break, and continue are supported, with the same semantics as in C. awk additionally provides for (key in array) for iterating over arrays.
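A single sketch can exercise each loop form on one input line. loop_demo and the marker words "skip" and "stop" are hypothetical choices for illustration.

```shell
# loop_demo (hypothetical helper): for, continue, while, break and do/while in awk.
loop_demo() {
  awk '{
    # for + continue: print every field except the word "skip".
    sep = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "skip") continue
      printf("%s%s", sep, $i)
      sep = " "
    }
    print ""

    # while + break: stop at the first field equal to "stop".
    j = 1
    while (j <= NF) {
      if ($j == "stop") break
      j++
    }
    print "stop at field", j

    # do/while: the body runs once before the condition is first tested.
    k = 0
    do {
      k++
    } while (k < 0)
    print "do ran", k, "time(s)"
  }'
}

echo 'a skip b stop c' | loop_demo
# → a b stop c
# → stop at field 4
# → do ran 1 time(s)
```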

Array

Because array subscripts in awk can be numbers or strings, a subscript is usually called a key. Keys and values are stored in an internal hash table. Because a hash table does not keep its entries in order, iterating over an array may not return the elements in the order you expect. Arrays and variables are created automatically when they are first used, and awk decides from context whether a value is treated as a number or a string. Arrays are typically used to collect information from records: computing sums, counting words, tracking how many times a pattern has matched, and so on.
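A word-count sketch shows both points: the count array springs into existence on first use, and for (w in count) returns keys in an unspecified order, which is why the output is piped through sort. word_count is a hypothetical helper name.

```shell
# word_count (hypothetical helper): count occurrences of each whitespace-separated word.
word_count() {
  awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }
    END { for (w in count) print w, count[w] }
  ' | sort
}

printf 'the cat\nthe dog\n' | word_count
# → cat 1
# → dog 1
# → the 2
```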

Show the accounts in /etc/passwd:

awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
# result
0 root
1 bin
2 daemon
3 adm
4 lp
...

Compute the difference between two files (print the lines of file2 whose first field does not appear as a first field in file1):

awk '{if(NR==FNR){a[$1]=0}if(NR!=FNR && !($1 in a))print}'  file1 file2
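The same difference can be written more compactly with the common NR == FNR idiom: while the first file is read, its keys are stored and next skips to the following line; for the second file only the pattern !($1 in seen) remains, and a pattern without an action prints the line. The helper name diff_keys and the sample files are hypothetical. One caveat, shared with the command above: if file1 is empty, NR == FNR also holds while file2 is read, so nothing is printed.

```shell
# diff_keys (hypothetical helper): print lines of the 2nd file whose first field
# does not appear as a first field in the 1st file.
diff_keys() {
  awk '
    NR == FNR { seen[$1]; next }   # 1st file: remember each key, print nothing
    !($1 in seen)                  # 2nd file: default action prints the line
  ' "$1" "$2"
}

demo_dir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$demo_dir/file1"
printf 'b 9\nc 3\n' > "$demo_dir/file2"
diff_keys "$demo_dir/file1" "$demo_dir/file2"
# → c 3
rm -r "$demo_dir"
```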

awk programming covers much more than this; only simple, common usage is listed here. For more, see http://www.gnu.org/software/gawk/manual/gawk.html


In the shell, what does $0 mean in an awk command?

awk processes its input one line at a time: the statements inside "{}" are executed once for each line of 1.txt.
Two terms used in awk:
Record (by default, each line of text)
Field (by default, each string separated by spaces or tabs within a record)
$0 stands for the whole record, and $1 for the first field of the record.
In general, print $0 prints the entire line (no backslash is needed before $0 inside single quotes), while print $1 prints only the first field of each line.
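A quick illustration of records and fields, using an arbitrary input line. The last command shows a related rule worth knowing: when a field is assigned, awk rebuilds $0 from the fields, joined with OFS.

```shell
# $0 is the whole record; $1 is its first field.
echo 'alpha beta gamma' | awk '{ print $0 }'
# → alpha beta gamma
echo 'alpha beta gamma' | awk '{ print $1 }'
# → alpha

# Assigning to a field rebuilds $0, joining all fields with OFS.
echo 'alpha beta gamma' | awk 'BEGIN { OFS = "-" } { $2 = "B"; print $0 }'
# → alpha-B-gamma
```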

How do you use the Linux awk command?

awk splits each line into several "fields" and processes them. It is convenient for processing small amounts of data.
Usage: awk 'condition1 {action1} condition2 {action2} ...' filename

last | awk '{print $1 "\t" $3}'   <= view login records, showing only the login name and the IP address, separated by a tab.
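Because every condition {action} pair is tested against every line, one line can trigger several actions. A sketch with made-up numeric input; threshold_demo is a hypothetical helper name.

```shell
# threshold_demo (hypothetical helper): two condition {action} pairs.
# 25 satisfies both conditions, 15 only the first, 5 neither.
threshold_demo() {
  awk '
    $1 > 10 { print $1, "is > 10" }
    $1 > 20 { print $1, "is > 20" }
  '
}

printf '%s\n' 5 15 25 | threshold_demo
# → 15 is > 10
# → 25 is > 10
# → 25 is > 20
```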

Awk built-in variables
Variable   Meaning
NF         total number of fields in the current line ($0)
NR         number of the line (record) awk is currently processing
FS         current field separator; the default is whitespace

Comparison operators in awk
Operator   Meaning
>          greater than
<          less than
>=         greater than or equal to
<=         less than or equal to
==         equal to
!=         not equal to
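The equality operator is ==, while a single = is assignment. The sketch below, on made-up input, shows why the difference matters: as a pattern, $3 = 0 assigns 0 to the third field, and since the assigned value 0 counts as false, no line is printed.

```shell
# == compares: prints the account whose third field is 0.
printf '%s\n' 'root:x:0' 'bin:x:1' | awk -F: '$3 == 0 { print $1 }'
# → root

# = assigns: the pattern's value is always 0 (false), so nothing is printed.
printf '%s\n' 'root:x:0' 'bin:x:1' | awk -F: '$3 = 0 { print $1 }'

# != and < combine with && as in C.
printf '%s\n' 'root:x:0' 'bin:x:1' | awk -F: '$3 != 0 && $3 < 10 { print $1 }'
# → bin
```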

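Example:
cat /etc/passwd | awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t" $3}'   <= split /etc/passwd on ":", select the lines whose third column is less than 10, and display only the account name and the third column. FS is set in BEGIN so that it also applies to the first line; assigning FS inside a main rule only takes effect from the next line onward.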
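The corrected command can be tried on made-up data. passwd_sample is a hypothetical stand-in for /etc/passwd, and -F: is shown as an equivalent way of setting FS before the first line is read. In the expected output, <TAB> stands for a tab character.

```shell
# passwd_sample (hypothetical): made-up lines in /etc/passwd format.
passwd_sample() {
  printf '%s\n' 'root:x:0:0' 'bin:x:1:1' 'alice:x:1000:1000'
}

# FS is set in BEGIN, so the first line is already split on ":".
passwd_sample | awk 'BEGIN { FS = ":" } $3 < 10 { print $1 "\t" $3 }'
# → root<TAB>0
# → bin<TAB>1

# Same result with the -F option.
passwd_sample | awk -F: '$3 < 10 { print $1 "\t" $3 }'
```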

The above is my own summary of awk, written by me rather than copied; I hope it helps.
