Linux Text Processing: awk, One of the Three Musketeers

Source: Internet
Author: User
Tags: mathematical functions, natural logarithm, rand, square root


1. Introduction
awk is a Linux command that is very powerful for processing files and the output of other commands. Compared with grep, which searches, and sed, which edits, awk is especially strong at analyzing data and generating reports. Put simply, awk reads its input line by line, splits each line into fields using whitespace as the default separator, and then processes the fields. In fact it is more like a programming language: it has user-defined variables, conditional statements, loops, arrays, regular expressions, functions, and so on. Whether the input is command output or a text file, awk reads it one line at a time, finds the lines that satisfy the conditions you give, and performs the specified operations on those lines. The design idea is simple, but applying it to real situations is not always so simple.
There are three common variants: awk, gawk, and nawk. On Linux, awk usually means gawk.

2. How awk works
awk 'BEGIN{commands} pattern{commands} END{commands}'
Step one: execute the statements in the BEGIN{commands} block.
Step two: read a line from the file or standard input (stdin) and execute the pattern{commands} block. awk scans the input line by line, repeating this step from the first line to the last until all input has been read.
Step three: execute the END{commands} block when the end of the input stream is reached.
The BEGIN block is executed before awk starts reading lines from the input stream. It is optional. Statements such as variable initialization and printing a table header are usually written in the BEGIN block.
The END block is executed after awk has read all lines from the input stream. Summary work, such as printing the results of analyzing all lines, is done in the END block. It is also optional.
The commands in the pattern block are the most important part, but this block is optional too. If a pattern is given without a {commands} block, the default action {print} is used, that is, each matching line is printed. A block without a pattern is executed for every line awk reads.
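The three steps above can be sketched with a small, self-contained run. This is only an illustration: the helper name summarize and the "name score" sample lines are made up for this example and do not come from the original article.

```shell
# summarize: hypothetical helper; reads "name score" lines from stdin.
summarize() {
    awk '
        BEGIN { print "name score" }       # runs once, before any input is read
        $2 >= 20 { print $1, $2; n++ }     # runs for each line where the pattern is true
        END { print "matched", n + 0 }     # runs once, after the last line
    '
}

# Invented sample input: three lines, two of which satisfy the pattern.
printf 'alice 10\nbob 20\ncarol 30\n' | summarize
```

With this input the header is printed first, then the lines for bob and carol, and finally the count of matched lines.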

3. awk patterns
A pattern can be any one of the following:
/regular expression/: uses the extended set of regular-expression metacharacters.
Relational expression: uses comparison operators, and can compare strings or numbers.
Pattern-matching expression: uses the operators ~ (match) and !~ (no match).
BEGIN block, pattern block, END block.
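A minimal sketch of the first three pattern kinds, run against a few invented "user uid shell" lines. The function names and the sample data are assumptions made for illustration only.

```shell
# Invented sample data: "user uid shell"
sample='root 0 /bin/bash
daemon 1 /bin/sh
alice 1000 /bin/bash'

by_regex()    { printf '%s\n' "$sample" | awk '/^root/'; }                   # /regular expression/
by_relation() { printf '%s\n' "$sample" | awk '$2 >= 1000 { print $1 }'; }   # relational expression
by_match()    { printf '%s\n' "$sample" | awk '$3 ~ /bash$/ { print $1 }'; } # ~ (match)
by_nomatch()  { printf '%s\n' "$sample" | awk '$3 !~ /bash$/ { print $1 }'; } # !~ (no match)

by_regex; by_relation; by_match; by_nomatch
```

by_regex has no action, so the whole matching line is printed. The other three print only the first field of the lines they select.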

4. Example
Suppose the output of last -n 5 is as follows:
# last -n 5 <== show only the five most recent entries
root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)
To show only the accounts of the five most recent logins:
# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root

The awk workflow is as follows: awk reads a record terminated by the '\n' newline character, splits the record into fields using the specified field separator, and fills in the field variables. $0 represents the whole record, $1 represents the first field, and $n represents the nth field. The default field separator is a space or a tab, so here $1 is the login user, $3 is the login IP address, and so on.
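The field splitting described above can be checked on a single line of input. This is a small sketch: the helper name show_fields is hypothetical, and the sample line is modeled on the first three columns of the last output.

```shell
# show_fields: hypothetical helper; prints the whole record, two fields, and the field count.
show_fields() {
    awk '{
        print "record=" $0                                  # $0 is the whole line
        print "first=" $1, "third=" $3, "count=" NF         # NF is the number of fields
    }'
}

printf 'root pts/1 192.168.1.100\n' | show_fields
```

NF is a built-in variable holding the number of fields in the current record, so $NF refers to the last field.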

To show only the accounts in /etc/passwd:
# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of awk with only an action: the action {print $1} is executed for every line.
-F sets the field separator to ':'.

To show only the accounts in /etc/passwd and their shells, separated by a tab:
# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh

To show only the accounts in /etc/passwd and their shells, separated by commas, with a header line name,shell before all rows and "blue,/bin/nosh" added as the last line:
# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh
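The same report can be reproduced without touching the real /etc/passwd by feeding passwd-style lines on standard input. In this sketch the helper name passwd_report and the two sample lines are assumptions. It also sets the built-in variable OFS (the output field separator) instead of concatenating "," by hand, which is a variation on the command above rather than the article's own form.

```shell
# passwd_report: hypothetical helper; expects /etc/passwd-style lines on stdin.
passwd_report() {
    awk -F ':' '
        BEGIN { OFS = ","; print "name", "shell" }   # header, printed before any input
        { print $1, $7 }                             # account and shell, joined by OFS
        END { print "blue", "/bin/nosh" }            # extra row appended after the last line
    '
}

# Invented sample input in /etc/passwd format.
printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/bin/sh\n' | passwd_report
```

A comma between print arguments inserts OFS, so changing the separator later means editing one assignment instead of every print statement.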

To search /etc/passwd for all lines containing the keyword root:
# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern: the action is executed only for lines that match the pattern (here, root). No action is specified, so by default each matching line is printed.
The search supports regular expressions. For example, to find lines that start with root: awk -F: '/^root/' /etc/passwd

To search /etc/passwd for all lines containing the keyword root and display the corresponding shell:
# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
Here the action {print $7} is specified.

5. Operators

Operator                   Description
= += -= *= /= %= ^= **=    Assignment
?:                         C-style conditional expression
||                         Logical OR
&&                         Logical AND
~ !~                       Regular expression match and non-match
< <= > >= != ==            Relational operators
(space)                    Concatenation
+ -                        Addition, subtraction
* / %                      Multiplication, division, remainder
+ - !                      Unary plus, unary minus, logical NOT
^ **                       Exponentiation
++ --                      Increment and decrement, as prefix or suffix
$                          Field reference
in                         Array membership

Example:
# awk 'BEGIN{a="b";print a++,++a;}'
0 2
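A few more of the operators from the table, as a minimal sketch. The wrapper name ops_demo and all values are invented for illustration. The increments are stored in variables before printing so the result does not depend on the order in which print evaluates its arguments.

```shell
# ops_demo: hypothetical wrapper around a BEGIN-only awk program (no input is read).
ops_demo() {
    awk 'BEGIN {
        a = "b"                        # a non-numeric string counts as 0 in arithmetic
        x = a++; y = ++a               # post-increment yields 0, then pre-increment yields 2
        print x, y
        print 2 ^ 3, 7 % 3             # exponentiation and remainder
        print "foo" "bar"              # values placed side by side are concatenated
        print (5 > 3 ? "yes" : "no")   # C-style conditional expression
        arr["x"] = 1
        print (("x" in arr) ? "x is a key" : "x is not a key")   # array membership
    }'
}

ops_demo
```

The parentheses around the conditional expressions matter: inside a print statement, a bare > is treated as output redirection.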

6. Regular expressions in awk
The following operators are gawk extensions.
Operator    Description
\y          Matches the empty string at the beginning or end of a word
\B          Matches the empty string within a word
\<          Matches the empty string at the beginning of a word (anchors the start)
\>          Matches the empty string at the end of a word (anchors the end)
\w          Matches a word character (letter, digit, or underscore)
\W          Matches a non-word character
\`          Matches the empty string at the beginning of the string
\'          Matches the empty string at the end of the string

7. String functions
Function            Description
sub                 Matches a regular expression against the leftmost, longest substring of the target string and replaces it with the replacement string. If no target string is given, the whole record is used. Only the first match is replaced.
gsub                Like sub, but replaces every match in the target string.
index               Returns the position where the substring first occurs. Positions start at 1.
substr              Returns the substring starting at the given position (positions start at 1). If the specified length exceeds the actual length, the rest of the string is returned.
split               Splits a string into an array using the given separator. If no separator is given, the current value of FS is used.
length              Returns the number of characters in a string, or in the whole record if no argument is given.
match               Returns the position in the string where the regular expression matches, or 0 if it does not match. match sets the built-in variable RSTART to the start position of the matched substring and RLENGTH to its length. substr can use these variables to extract the matched text.
toupper, tolower    Convert a string to upper case or lower case. They are available in gawk and in other POSIX-compliant awk implementations.
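The functions in the table can be tried out in a BEGIN block, which needs no input file. This is a small sketch: the wrapper name strings_demo and the test strings are assumptions chosen for illustration.

```shell
# strings_demo: hypothetical wrapper around a BEGIN-only awk program (no input is read).
strings_demo() {
    awk 'BEGIN {
        s = "hello world hello"
        t = s; sub(/hello/, "HI", t); print t               # only the first match is replaced
        t = s; n = gsub(/hello/, "HI", t); print n, t       # every match is replaced; n is the count
        print index(s, "world")                             # 1-based position of the substring
        print substr(s, 7, 5)                               # 5 characters starting at position 7
        k = split("a:b:c", parts, ":"); print k, parts[2]   # number of pieces, then the second piece
        print length(s)                                     # number of characters
        if (match(s, /wor[a-z]+/))                          # sets RSTART and RLENGTH on success
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        print toupper("abc"), tolower("XYZ")
    }'
}

strings_demo
```

sub and gsub modify their third argument in place, which is why the sketch copies s into t before each call.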

8. Mathematical functions
Function       Return value
atan2(y, x)    Arctangent of y/x, in radians
cos(x)         Cosine of x (x in radians)
exp(x)         e raised to the power x
int(x)         Integer part of x (truncated, not rounded)
log(x)         Natural logarithm of x
rand()         Random number greater than or equal to 0 and less than 1
sin(x)         Sine of x (x in radians)
sqrt(x)        Square root of x
srand(x)       Sets x as the seed for rand()
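A short sketch exercising the functions above with values whose results are easy to check by hand. The wrapper name math_demo and the seed 42 are arbitrary choices for this example. Because the value returned by rand() depends on the awk implementation, the sketch only checks that it falls in the documented range.

```shell
# math_demo: hypothetical wrapper around a BEGIN-only awk program (no input is read).
math_demo() {
    awk 'BEGIN {
        print int(3.9), int(-3.9)        # truncation toward zero, no rounding
        print sqrt(16), exp(0), log(1)
        printf "%.4f\n", atan2(0, -1)    # arctangent of 0/-1 is pi
        printf "%.2f\n", exp(1)          # e to two decimal places
        srand(42); r = rand()            # seed, then draw one random number
        print (r >= 0 && r < 1 ? "in range" : "out of range")
    }'
}

math_demo
```

Calling srand with a fixed seed makes the sequence from rand() repeatable within one implementation, which is handy when testing.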


This article is from the "Technology life, Simple not simple" blog. Please be sure to keep this source: http://willis.blog.51cto.com/11907152/1845918
