05-linux Text Processing-awk

Last Update:2018-08-04 Source: Internet

Author: User

awk is a programming language for text processing and report generation.

Syntax:

  awk [options] 'pattern {action}' file

The pattern is a condition, and the action runs for every input record that satisfies it.

Parameter options for the awk command

-F Specifies the field delimiter (lowercase -f is a different option: it reads the awk program from a file)

-v Defines or modifies a variable inside awk
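As a small illustration of the two options, the sketch below feeds two made-up passwd-style lines to awk. The sample data and the variable name label are assumptions chosen for this example only.

```shell
# Hypothetical sample data: colon-separated fields, as in /etc/passwd.
sample='alice:x:1001
bob:x:1002'

# -F ':' sets the field delimiter; -v label=uid defines the awk variable
# "label" before the program starts running.
out=$(printf '%s\n' "$sample" | awk -F ':' -v label='uid' '{ print $1, label "=" $3 }')
echo "$out"
```

Each output line should show the first field followed by the label and the third field.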

Common features of the awk command

Specify separators to display a few columns

  # Extract the column that holds the URL from the log file
  awk -F "GET|HTTP" '{print $2}' access.log

Take out what you want with regular expressions

  # Analyze logs in the production environment to find out who is trying to crack user passwords
  awk '$6~/Failed/{print $11}' /var/log/secure

Show content within a range

  # Show lines 20 to 30 of a file
  awk 'NR==20,NR==30' filename

Statistical calculation

  # Calculate a sum
  awk '{sum+=$0} END{print sum}' test.txt

Array calculation and deduplication

  # Count log entries per value of the first column
  awk '{array[$1]++} END{for (key in array) print key, array[key]}' access.log
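The range, sum, and counting one-liners above can be tried without any log file. The following is a minimal sketch on generated input; the numbers and client addresses are made-up stand-ins for real data.

```shell
# Ten numbered input lines stand in for a real file (illustrative data).
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Range pattern: print from the record where NR==3 through the one where NR==5.
range=$(printf '%s\n' "$nums" | awk 'NR==3,NR==5')

# Sum: add each line to "sum", then print it once in the END block.
total=$(printf '%s\n' "$nums" | awk '{ sum += $0 } END { print sum }')

# Count occurrences of the first column (made-up client addresses).
# for-in order is unspecified, so the result is piped through sort.
counts=$(printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 |
  awk '{ seen[$1]++ } END { for (key in seen) print key, seen[key] }' | sort)

echo "$range"; echo "$total"; echo "$counts"
```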

Specify a delimiter to display a few columns

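# Print every line of /etc/passwd
awk '{print $0}' /etc/passwd

# Split each line on ':' and print the first field
awk -F":" '{ print $1 }' /etc/passwd

# Split each line on ':' and print the first and third fields (concatenated, with nothing between them)
awk -F":" '{ print $1 $3 }' /etc/passwd

# Split each line on ':' and print the first and third fields with formatting
awk -F":" '{ print $1 " " $3 }' /etc/passwd
awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd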

BEGIN and END blocks

The initialization code (the code inside the BEGIN block) is executed before awk begins processing the text in the input file.
awk executes the END block after all the lines in the input file have been processed.
The END block is typically used to perform final calculations or to print summary information that should appear at the end of the output stream.
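The order of execution described above can be seen in a short sketch. The three input numbers and the variable names count and sum are illustrative assumptions.

```shell
out=$(printf '%s\n' 3 4 5 | awk '
BEGIN { print "start" }                        # runs once, before any input
      { count++; sum += $1 }                   # runs for every input line
END   { print "lines:", count, "sum:", sum }   # runs once, after the last line
')
echo "$out"
```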

Assignment operators

$ awk 'BEGIN{a=5;a+=5;print a}'
10

Logical operators

$ awk 'BEGIN{a=1;b=2;print (a>2&&b>1,a==1||b>1)}'
0 1

Regular expression operators

$ awk 'BEGIN{a="100testaaa";if(a~/100/){print "ok"}}'
ok
$ echo|awk 'BEGIN{a="100testaaa"}a~/100/{print "ok"}'
ok

Relational operators

The > and < operators can perform either a string comparison or a numeric comparison.
If either operand is a string, the two operands are compared as strings.
Two numbers are compared numerically.
String comparison follows character (ASCII) order.

$ awk 'BEGIN{a=11;if(a>=9){print "ok"}}'
ok
$ awk 'BEGIN{a;if(a>=b){print "ok"}}'
ok

In the second command neither a nor b has been assigned a value, so the two compare as equal and the test succeeds.
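The difference between numeric and string comparison is easy to miss, so here is a hedged sketch. The variable names num, str, and forced are made up for the example; the results are printed as 1 (true) or 0 (false).

```shell
out=$(awk 'BEGIN {
    a = 11                  # numeric value
    b = "11"                # string value
    num = (a >= 9)          # numeric comparison: 11 >= 9 is true
    str = (b >= 9)          # string comparison: "11" sorts before "9", so false
    forced = (b + 0 >= 9)   # adding 0 converts b to a number first
    print num, str, forced
}')
echo "$out"
```

Adding 0 to a value is a common way to force a numeric comparison when a variable may hold a string.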

Arithmetic operators

In an arithmetic operation the operands are automatically converted to numeric values.
A string that does not start with a number is converted to 0; a string that starts with a number is converted to that leading number.

$ awk 'BEGIN{a="b";print a++,++a}'
0 2
$ awk 'BEGIN{a="20b4";print a++,++a}'
20 22
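The conversion rule can be checked directly by adding 0 to a string. This is a small sketch; the string "3.5kg" is an extra illustrative value that does not appear in the article.

```shell
out=$(awk 'BEGIN {
    print "b" + 0       # no leading number: 0
    print "20b4" + 0    # leading "20" is used: 20
    print "3.5kg" * 2   # leading "3.5" is used: 7
}')
echo "$out"
```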

Other operators

?: ternary operator

$ awk 'BEGIN{a="b";print a=="b"?"ok":"err"}'
ok
$ awk 'BEGIN{a="b";print a=="c"?"ok":"err"}'
err

awk built-in variables

Variable name	Properties
$0	Current record
$1 ~ $n	Nth field of the current record
FS	Input field delimiter; the default is a space
RS	Input record delimiter; the default is a newline
NF	Number of fields in the current record, that is, how many columns
NR	Number of records read so far, that is, the line number, starting from 1
OFS	Output field delimiter; the default is a space
ORS	Output record delimiter; the default is a newline
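A quick way to get a feel for these variables is to print them for each record. The two input lines below are made up for the illustration.

```shell
# For every line: record number, field count, first field, last field.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $1, $NF }')
echo "$out"
```

Because NF holds the number of fields, $NF always refers to the last field of the current record.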

Field delimiter FS

FS="\t+" splits on one or more tabs

$ cat tab.txt
ww	CC	IDD
$ awk 'BEGIN{FS="\t+"}{print $1,$2,$3}' tab.txt
ww CC IDD

FS="[[:space:]]+" splits on one or more whitespace characters, which is close to the default behavior

$ cat space.txt
we are           studying awk now!
$ awk -F '[[:space:]]+' '{print $1,$2}' space.txt
we are

FS="[ :]+" splits on one or more spaces or colons

$ cat hello.txt
root:x:0:0:root: /root:/bin/bash
$ awk -F '[ :]+' '{print $1,$2,$3}' hello.txt
root x 0
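To see why a regular-expression delimiter helps, the sketch below splits the same line twice and prints the sixth field inside brackets. The variable names plain and regex are assumptions for the example.

```shell
line='root:x:0:0:root: /root:/bin/bash'

# Single-character delimiter: the space after the colon stays in the field.
plain=$(printf '%s\n' "$line" | awk -F ':' '{ print "[" $6 "]" }')

# Regex delimiter: any run of spaces and colons counts as one separator.
regex=$(printf '%s\n' "$line" | awk -F '[ :]+' '{ print "[" $6 "]" }')

echo "$plain"; echo "$regex"
```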

Number of fields NF

With ":" as the delimiter, print the lines whose number of fields NF is 8

$ cat hello.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin:888
$ awk -F ":" 'NF==8{print $0}' hello.txt
bin:x:1:1:bin:/bin:/sbin/nologin:888

Number of records NR

Take the second line of the output and split it on one or more spaces or colons

$ ifconfig eth0 | awk -F '[ :]+' 'NR==2{print $4}'
192.168.10.10

RS record delimiter variable

Setting FS to "\n" tells awk that each field occupies one line.
Setting RS to "" tells awk that the address records are separated by blank lines.

$ cat recode.txt
Jimmy the Weasel
100 Pleasant Drive
San Francisco, CA 12345

Big Tony
200 Incognito Ave.
Suburbia, WA 67890
$ cat awk.txt
#!/bin/awk -f
BEGIN {
    FS="\n"
    RS=""
}
{
    print $1 ", " $2 ", " $3
}
$ awk -f awk.txt recode.txt
Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345
Big Tony, 200 Incognito Ave., Suburbia, WA 67890

OFS output field delimiter

$ cat hello.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin:888
$ awk 'BEGIN{FS=":"}{print $1","$2","$3}' hello.txt
root,x,0
bin,x,1
$ awk 'BEGIN{FS=":";OFS="#"}{print $1,$2,$3}' hello.txt
root#x#0
bin#x#1
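OFS only comes into play in certain situations, which the following sketch tries to make visible. The single input line is made up, and the last step (assigning a field to itself so that $0 is rebuilt) is an addition that the article itself does not cover.

```shell
out=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":"; OFS = "#" } {
    print $1, $2     # comma: fields are joined with OFS
    print $1 $2      # no comma: plain concatenation, OFS is not used
    $1 = $1          # assigning to a field rebuilds $0 with OFS
    print $0
}')
echo "$out"
```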

ORS output record delimiter

$ cat recode.txt
Jimmy the Weasel
100 Pleasant Drive
San Francisco, CA 12345

Big Tony
200 Incognito Ave.
Suburbia, WA 67890
$ cat awk.txt
#!/bin/awk -f
BEGIN {
    FS="\n"
    RS=""
    ORS="\n\n"
}
{
    print $1 ", " $2 ", " $3
}
$ awk -f awk.txt recode.txt
Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345

Big Tony, 200 Incognito Ave., Suburbia, WA 67890


Regular expression patterns

awk '/reg/{action}' file
/reg/ is a regular expression.
Records that match it are passed to the action for processing.

$ awk '/root/{print $0}' passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
$ awk -F : '$5~/root/{print $0}' passwd
root:x:0:0:root:/root:/bin/bash
# Extract the IP address
$ ifconfig eth0|awk 'BEGIN{FS="[[:space:]:]+"} NR==2{print $4}'
192.168.10.10
# Extract the IP address
$ ifconfig eth0|awk 'BEGIN{FS="([[:space:]]|:)+"} NR==2{print $4}'
192.168.10.10

Boolean expression

awk 'Boolean expression {action}' file
The action block is executed only if the preceding Boolean expression evaluates to true.

$ awk -F: '$1=="root"{print $0}' passwd
root:x:0:0:root:/root:/bin/bash
$ awk -F: '($1=="root")&&($5=="root"){print $0}' passwd
root:x:0:0:root:/root:/bin/bash
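Boolean patterns can also mix numeric and string tests. The sketch below assumes three made-up passwd-style entries and selects accounts with a UID of at least 1000 that have a login shell; the threshold and the account names are illustrative only.

```shell
# Hypothetical passwd-style entries.
users='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000::/home/alice:/bin/bash
svc:x:1001:1001::/home/svc:/sbin/nologin'

# Field 3 is compared as a number, field 7 as a string.
out=$(printf '%s\n' "$users" |
  awk -F: '($3 >= 1000) && ($7 != "/sbin/nologin") { print $1 }')
echo "$out"
```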

Conditional statements

{
    if ( $1 == "foo" ) {
        if ( $2 == "foo" ) {
            print "uno"
        } else {
            print "one"
        }
    } else if ( $1 == "bar" ) {
        print "two"
    } else {
        print "three"
    }
}
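To watch each branch fire, the same nested if / else if / else structure can be run on four made-up input lines, one per branch.

```shell
out=$(printf 'foo foo\nfoo x\nbar y\nz z\n' | awk '{
    if ($1 == "foo") {
        if ($2 == "foo") { print "uno" } else { print "one" }
    } else if ($1 == "bar") {
        print "two"
    } else {
        print "three"
    }
}')
echo "$out"
```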

Loop structure

do...while loop
{
    count=1
    do {
        print "I get printed at least once no matter what"
    } while ( count != 1 )
}

for loop
{
    for ( x=1; x<=4; x++ ) {
        print "iteration", x
    }
}

Break and continue

{
    x=1
    while (1) {
        if ( x==4 ) {
            x++
            continue
        }
        print "iteration", x
        if ( x>20 ) {
            break
        }
        x++
    }
}
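A shorter variant of the loop above, placed in a BEGIN block so that it needs no input, shows the effect of both statements. The upper limit of 6 is an arbitrary choice for the sketch.

```shell
out=$(awk 'BEGIN {
    x = 1
    while (1) {
        if (x == 4) { x++; continue }   # skip the print for 4
        print "iteration", x
        if (x >= 6) break               # leave the loop after 6
        x++
    }
}')
echo "$out"
```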

Array

{
    cities[1]="beijing"
    cities[2]="shanghai"
    cities["three"]="guangzhou"
    for ( c in cities ) {
        print cities[c]
    }
    print cities[1]
    print cities["1"]
    print cities["three"]
}
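awk array subscripts are always strings, which is why cities[1] and cities["1"] name the same element. The sketch below illustrates this together with the in operator and delete, two features that the article does not discuss; the helper variables has_three, has_two, and n are made up.

```shell
out=$(awk 'BEGIN {
    cities[1] = "beijing"
    cities["three"] = "guangzhou"
    print cities["1"]                           # same element as cities[1]
    has_three = ("three" in cities) ? "yes" : "no"
    has_two   = ("two" in cities) ? "yes" : "no"   # "in" does not create the element
    print has_three
    print has_two
    delete cities["three"]                      # remove a single element
    n = 0
    for (k in cities) n++                       # count the remaining elements
    print n
}')
echo "$out"
```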

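Example

View the server's connection states and summarize them:
netstat -an|awk '/^tcp/{++s[$NF]}END{for(a in s)print a,s[a]}'

Summarize the traffic in a web log. For each requested page or image, output the number of requests and the total size transferred, followed by the grand total of all traffic:
awk '{a[$7]+=$10;++b[$7];total+=$10}END{for(x in a)print b[x],x,a[x]|"sort -rn -k1";print"total size is :"total}' access_log
a[$7]+=$10 uses column 7 as the array subscript (column 10 is the size of the request in column 7) and accumulates the sizes, giving the total size per requested path. The for loop relies on a small trick: arrays a and b have the same subscripts, so a single for statement is enough.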
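The traffic summary can be tried on a few made-up log lines. The sketch below assumes the common access log layout in which field 7 is the request path and field 10 is the response size; the sample lines and the names size, hits, and cmd are illustrative. It also calls close() on the sort pipe, an addition to the original one-liner, so that the sorted rows are written before the total.

```shell
# Hypothetical access log lines (field 7 = path, field 10 = bytes sent).
log='10.0.0.1 - - [01/Jan/2018:00:00:01 +0800] "GET /index.html HTTP/1.1" 200 100
10.0.0.2 - - [01/Jan/2018:00:00:02 +0800] "GET /logo.png HTTP/1.1" 200 50
10.0.0.1 - - [01/Jan/2018:00:00:03 +0800] "GET /index.html HTTP/1.1" 200 100'

out=$(printf '%s\n' "$log" | awk '
{ size[$7] += $10; hits[$7]++; total += $10 }
END {
    cmd = "sort -rn -k1"
    for (page in size) print hits[page], page, size[page] | cmd
    close(cmd)    # wait for sort to finish so the total is printed last
    print "total size is: " total
}')
echo "$out"
```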

awk common functions

function Description

gsub(Ere, Repl, [In]) Works exactly like the sub function, except that every occurrence of the regular expression is replaced, not just the first.

sub(Ere, Repl, [In]) Replaces the first occurrence of the extended regular expression Ere with the string Repl in the string In, and returns the number of replacements. An & (ampersand) in Repl is replaced by the text that matched Ere. If In is not specified, the entire record ($0) is used.

index(String1, String2) Returns the position, counting from 1, at which String2 first occurs in String1. Returns 0 if String2 does not occur in String1.

length[(String)] Returns the length of String in characters. If String is not given, the length of the entire record ($0) is returned.

blength[(String)] Returns the length of String in bytes. If String is not given, the length of the entire record ($0) is returned. This function is not defined by POSIX and is missing from many awk implementations, gawk included.

substr(String, M, [N]) Returns the substring of String that begins at position M and is at most N characters long. The first character of String is position 1. If N is not specified, the substring runs from position M to the end of String.

match(String, Ere) Returns the position, counting from 1, at which the extended regular expression Ere first matches in String, or 0 if there is no match. The RSTART special variable is set to the return value. The RLENGTH special variable is set to the length of the matched string, or to -1 if no match is found.

split(String, A, [Ere]) Splits String into the array elements A[1], A[2], ..., A[n] and returns n. The separator is the extended regular expression Ere or, if Ere is not given, the current field delimiter (the FS special variable). The elements of A are created with string values unless the context indicates that a particular element should also have a numeric value.

tolower(String) Returns String with each uppercase character changed to lowercase. The uppercase and lowercase mappings are defined by the LC_CTYPE category of the current locale.

toupper(String) Returns String with each lowercase character changed to uppercase. The uppercase and lowercase mappings are defined by the LC_CTYPE category of the current locale.

sprintf(Format, Expr, Expr, ...) Formats the Expr arguments according to the printf-style format string Format and returns the resulting string.

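Replace
awk 'BEGIN{info="this is a test2010test!";gsub(/[0-9]+/,"!",info);print info}'
this is a test!test!
Every match of the regular expression /[0-9]+/ in info is replaced with "!" and the result is stored back in info. If the info argument is omitted, $0 is used.
Find
awk 'BEGIN{info="this is a test2010test!";print index(info,"test")?"ok":"not found";}'
ok
index returns 0 when the substring is not found.
Match lookup
awk 'BEGIN{info="this is a test2010test!";print match(info,/[0-9]+/)?"ok":"not found";}'
ok
If a number is found, the match succeeds and ok is printed; otherwise not found is printed.
Substring
awk 'BEGIN{info="this is a test2010test!";print substr(info,4,10);}'
s is a tes
Starting at the 4th character, take a substring 10 characters long.
Split
awk 'BEGIN{info="this is a test";n=split(info,tA," ");print n;for(k in tA){print k,tA[k];}}'
4
4 test
1 this
2 is
3 a
split breaks info apart and creates the array tA dynamically. The awk for ... in loop is unordered: it does not necessarily walk the subscripts from 1 to n, so the order of the last four lines can differ.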
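Several of the functions described above can be combined in one short program. The following is a minimal sketch that reuses the article's test string; the variable names s, t, and n and the format string are illustrative choices.

```shell
out=$(awk 'BEGIN {
    s = "this is a test2010test!"
    t = s
    n = gsub(/[0-9]+/, "<&>", t)     # & stands for the matched text
    print n, t
    if (match(s, /[0-9]+/))          # sets RSTART and RLENGTH
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    print toupper(substr(s, 1, 4))
    print sprintf("%05.1f", 3.14159)
}')
echo "$out"
```

Working on a copy (t) leaves the original string untouched, which matters because sub and gsub modify their third argument in place.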

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
