05 - Linux Text Processing: awk

Source: Internet
Author: User

  • awk is a programming language.
  • It is used for text processing and report generation.
Syntax
    awk [options] 'pattern {action}' file
    awk [parameters] 'condition {action}' file
Options for the awk command
  • -F specifies the input field separator
  • -v defines or modifies a variable inside awk
Common uses of the awk command
  • Specify separators and display selected columns
    # Extract just the URL column from the log file
    awk -F "GET|HTTP" '{print $2}' access.log
  • Pick out the content you want with regular expressions
    # Analyze the production logs to find out who is trying to crack user passwords
    awk '$6~/Failed/{print $11}' /var/log/secure
  • Show the content within a range
    # Show lines 20 to 30 of a file
    awk 'NR==20,NR==30' filename
  • Statistical calculation
    # Calculate a sum
    awk '{sum+=$0} END{print sum}' test.txt
  • Array calculation and deduplication
    # Log statistics and counting
    awk '{array[$1]++} END{for (key in array) print key, array[key]}' access.log
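
The range and sum one-liners above can be tried without a real log file. The following is a minimal sketch that assumes six made-up numeric lines as input; the variable names are only for illustration.

```shell
# Six sample lines (hypothetical input standing in for a real file).
nums=$(printf '%s\n' 10 20 30 40 50 60)

# Range pattern: print records 3 through 5.
range=$(printf '%s\n' "$nums" | awk 'NR==3,NR==5')
# Accumulate every line and print the sum once at the end.
total=$(printf '%s\n' "$nums" | awk '{sum+=$0} END{print sum}')

printf '%s\n' "$range" "$total"
# 30
# 40
# 50
# 210
```
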
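Specify a separator to display selected columns
    # Print every line of passwd
    awk '{print $0}' /etc/passwd
    # Using ':' as the separator, print the first field of each line
    awk -F":" '{ print $1 }' /etc/passwd
    # Using ':' as the separator, print the first and third fields of each line (joined with nothing in between)
    awk -F":" '{ print $1 $3 }' /etc/passwd
    # Using ':' as the separator, print the first and third fields of each line with formatting
    awk -F":" '{ print $1 " " $3 }' /etc/passwd
    awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd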
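
One detail in the commands above is easy to miss: fields written side by side in print are concatenated, while fields separated by a comma are joined with the output field separator. A small sketch, assuming a single made-up passwd-style line rather than the real /etc/passwd:

```shell
# Sample passwd-style line (hypothetical data, so the example does not depend on /etc/passwd).
line='root:x:0:0:root:/root:/bin/bash'

result=$(printf '%s\n' "$line" | awk -F: '{
    print $1 $3      # no comma: the two fields are concatenated
    print $1, $3     # comma: the fields are joined with OFS (a space by default)
}')
printf '%s\n' "$result"
# root0
# root 0
```
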
BEGIN and END blocks
    • The initialization code (the BEGIN block) is executed before awk starts processing the text in the input file.
    • awk executes the END block after all the lines in the input file have been processed.
    • The END block is typically used to perform final calculations or to print summary information that should appear at the end of the output stream.
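
The order of execution described above can be sketched as follows. The three input numbers are made up for the illustration.

```shell
result=$(printf '3\n4\n5\n' | awk '
    BEGIN { print "start" }                          # runs once, before any input is read
          { sum += $1 }                              # runs for every input line
    END   { print "lines:", NR; print "sum:", sum }  # runs once, after the last line
')
printf '%s\n' "$result"
# start
# lines: 3
# sum: 12
```
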
Assignment operators
    $ awk 'BEGIN{a=5;a+=5;print a}'
    10
Logical operators
    $ awk 'BEGIN{a=1;b=2;print (a>2&&b>1,a==1||b>1)}'
    0 1
Regular expression match operator
    $ awk 'BEGIN{a="100testaaa";if(a~/100/){print "ok"}}'
    ok
    $ echo|awk 'BEGIN{a="100testaaa"}a~/100/{print "ok"}'
    ok
Relational operators
    • > and < can perform either a string comparison or a numeric comparison.
    • If either operand is a string, the operands are compared as strings.
    • If both operands are numbers, they are compared numerically.
    • String comparison goes character by character in ASCII order.
    $ awk 'BEGIN{a=11;if(a>=9){print "ok"}}'
    ok
    $ awk 'BEGIN{a;if(a>=b){print "ok"}}'
    ok
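
The difference between the two kinds of comparison shows up most clearly with values such as 10 and 9. This is a small sketch; the variable names are illustrative. It also shows that input fields that look like numbers are compared numerically.

```shell
consts=$(awk 'BEGIN {
    n = (10 > 9)        # numeric constants: compared as numbers, so true (1)
    s = ("10" > "9")    # string constants: compared as strings, "1" sorts before "9", so false (0)
    print n, s
}')
# Fields read from input that look like numbers are compared as numbers.
fields=$(echo '10 9' | awk '{ r = ($1 > $2); print r }')

printf '%s\n' "$consts" "$fields"
# 1 0
# 1
```
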
Arithmetic operators
    • In arithmetic operations, the operands are automatically converted to numeric values.
    • A string that starts with a number takes the value of that leading number (for example, "20b4" becomes 20); any other non-numeric string becomes 0.
    $ awk 'BEGIN{a="b";print a++,++a}'
    0 2
    $ awk 'BEGIN{a="20b4";print a++,++a}'
    20 22
Other operators
    • ?: the ternary (conditional) operator
    $ awk 'BEGIN{a="b";print a=="b"?"ok":"err"}'
    ok
    $ awk 'BEGIN{a="b";print a=="c"?"ok":"err"}'
    err
awk built-in variables
Variable    Description
$0          The current record
$1 ~ $n     The nth field of the current record
FS          Input field separator; the default is a space, which splits on runs of spaces and tabs
RS          Input record separator; the default is a newline
NF          Number of fields in the current record, that is, how many columns it has
NR          Number of records read so far, that is, the line number, starting from 1
OFS         Output field separator; the default is a space
ORS         Output record separator; the default is a newline
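
A quick way to see NR, NF and the last field ($NF) change from record to record is to print them for a couple of lines. The two input lines here are made up.

```shell
result=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
printf '%s\n' "$result"
# 1 3 c
# 2 2 e
```
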
Field separator FS
    • FS="\t+" splits on one or more tabs
    $ cat tab.txt
    ww        CC      IDD
    $ awk 'BEGIN{FS="\t+"}{print $1,$2,$3}' tab.txt
    ww CC IDD
    • FS="[[:space:]]+" splits on one or more whitespace characters, much like the default
    $ cat space.txt
    we are           studying awk now!
    $ awk -F '[[:space:]]+' '{print $1,$2}' space.txt
    we are
    • FS="[ :]+" splits on one or more spaces or colons
    $ cat hello.txt
    root:x:0:0:root:/root:/bin/bash
    $ awk -F '[ :]+' '{print $1,$2,$3}' hello.txt
    root x 0
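
The regular-expression separators above can be checked without creating the sample files. This sketch feeds equivalent made-up lines through printf; the second line deliberately contains ": " so that a colon followed by a space is treated as a single separator.

```shell
# One or more tabs as the separator.
tabs=$(printf 'ww\t\tCC\tIDD\n' | awk 'BEGIN { FS = "\t+" } { print $1, $2, $3 }')
# One or more spaces or colons as the separator.
mixed=$(printf 'root:x:0:0:root: /root:/bin/bash\n' | awk -F '[ :]+' '{ print $1, $5, $6 }')

printf '%s\n' "$tabs" "$mixed"
# ww CC IDD
# root root /root
```
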
Number of fields NF
    • With ":" as the separator, print the lines whose field count NF is 8
    $ cat hello.txt
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin:888
    $ awk -F ":" 'NF==8{print $0}' hello.txt
    bin:x:1:1:bin:/bin:/sbin/nologin:888
Number of records NR
    • Take the second line and split it on one or more spaces or colons
    $ ifconfig eth0 | awk -F '[ :]+' 'NR==2{print $4}'
    192.168.10.10
RS record separator variable
    • Setting FS to "\n" tells awk that each field occupies one line.
    • Setting RS to "" tells awk that the address records are separated by blank lines.
    $ cat recode.txt
    Jimmy the Weasel
    100 Pleasant Drive
    San Francisco, CA 12345

    Big Tony
    200 Incognito Ave.
    Suburbia, WA 67890
    $ cat awk.txt
    #!/bin/awk
    BEGIN {
        FS="\n"
        RS=""
    }
    {
        print $1 ", " $2 ", " $3
    }
    $ awk -f awk.txt recode.txt
    Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345
    Big Tony, 200 Incognito Ave., Suburbia, WA 67890
OFS output field separator
    $ cat hello.txt
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin:888
    $ awk 'BEGIN{FS=":"}{print $1","$2","$3}' hello.txt
    root,x,0
    bin,x,1
    $ awk 'BEGIN{FS=":";OFS="#"}{print $1,$2,$3}' hello.txt
    root#x#0
    bin#x#1
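
OFS only takes effect where awk itself joins fields: between comma-separated print arguments, and when $0 is rebuilt after a field assignment. The sketch below, using one made-up input line, shows the cases side by side.

```shell
result=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":"; OFS = "#" } {
    print $1 $2 $3    # concatenation ignores OFS
    print $1, $2, $3  # commas insert OFS
    print $0          # $0 is unchanged, so it still contains ":"
    $1 = $1           # assigning to a field rebuilds $0 with OFS
    print $0
}')
printf '%s\n' "$result"
# rootx0
# root#x#0
# root:x:0
# root#x#0
```
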
ORS output record separator
    $ cat recode.txt
    Jimmy the Weasel
    100 Pleasant Drive
    San Francisco, CA 12345

    Big Tony
    200 Incognito Ave.
    Suburbia, WA 67890
    $ cat awk.txt
    #!/bin/awk
    BEGIN {
        FS="\n"
        RS=""
        ORS="\n\n"
    }
    {
        print $1 ", " $2 ", " $3
    }
    $ awk -f awk.txt recode.txt
    Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345

    Big Tony, 200 Incognito Ave., Suburbia, WA 67890

Regular expression patterns
    • awk '/reg/{action}' file
    • /reg/ is a regular expression.
    • Records that match it are passed to the action for processing.
    $ awk '/root/{print $0}' passwd
    root:x:0:0:root:/root:/bin/bash
    operator:x:11:0:operator:/root:/sbin/nologin
    $ awk -F : '$5~/root/{print $0}' passwd
    root:x:0:0:root:/root:/bin/bash
    $ ifconfig eth0|awk 'BEGIN{FS="[[:space:]:]+"} NR==2{print $4}'    # extract the IP
    192.168.10.10
    $ ifconfig eth0|awk 'BEGIN{FS="([[:space:]]|:)+"} NR==2{print $4}'    # extract the IP
    192.168.10.10
Boolean expressions
    • awk 'Boolean expression {action}' file runs the action block only when the Boolean expression evaluates to true.
    $ awk -F: '$1=="root"{print $0}' passwd
    root:x:0:0:root:/root:/bin/bash
    $ awk -F: '($1=="root")&&($5=="root"){print $0}' passwd
    root:x:0:0:root:/root:/bin/bash
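
Boolean patterns can mix numeric and string tests with && and ||. A minimal sketch, assuming three made-up passwd-style records instead of a real passwd file:

```shell
# Hypothetical records: name, password placeholder, uid, gid.
users='root:x:0:0
bin:x:1:1
daemon:x:2:2'

# Both conditions must hold: uid at least 1 and a name other than "bin".
both=$(printf '%s\n' "$users" | awk -F: '$3 >= 1 && $1 != "bin" { print $1 }')
# Either condition is enough: the name is root or the uid is 2.
either=$(printf '%s\n' "$users" | awk -F: '$1 == "root" || $3 == 2 { print $1, $3 }')

printf '%s\n' "$both" "$either"
# daemon
# root 0
# daemon 2
```
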
Conditional statements
    {
        if ( $1== "foo" ) {
            if ( $2== "foo" ) {
                print "uno"
            } else {
                print "one"
            }
        } else if ($1== "bar" ) {
            print "two"
        } else {
            print "three"
        }
    }
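
To see which branch fires, the conditional can be run against a few input lines. The four sample lines below are made up so that each branch is reached once.

```shell
result=$(printf 'foo foo\nfoo x\nbar y\nbaz z\n' | awk '{
    if ($1 == "foo") {
        if ($2 == "foo") {
            print "uno"
        } else {
            print "one"
        }
    } else if ($1 == "bar") {
        print "two"
    } else {
        print "three"
    }
}')
printf '%s\n' "$result"
# uno
# one
# two
# three
```
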
Loop structures
do...while loop
    {
        count=1
        do {
            print "I get printed at least once no matter what"
        } while ( count !=1 )
    }
for loop
    {
        for ( x=1;x<=4;x++ ) {
            print "iteration", x
        }
    }
break and continue
    {
        x=1
        while (1) {
            if ( x==4 ) {
                x++
                continue
            }
            print "iteration", x
            if ( x>20 ) {
                break
            }
            x++
        }
    }
Arrays
    {
        cities[1]="beijing"
        cities[2]="shanghai"
        cities["three"]="guangzhou"
        for( c in cities) {
            print cities[c]
        }
        print cities[1]
        print cities["1"]
        print cities["three"]
    }
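
Array subscripts in awk are strings, so cities[1] and cities["1"] name the same element. The sketch below illustrates that, along with the in membership test and delete, which the text above does not cover and which are added here only as closely related basics.

```shell
result=$(awk 'BEGIN {
    cities[1] = "beijing"
    cities["three"] = "guangzhou"
    print cities["1"]                     # same element as cities[1]
    has = (2 in cities) ? "yes" : "no"    # membership test; does not create cities[2]
    print has
    delete cities["three"]                # remove one element
    n = 0
    for (c in cities) n++                 # count the remaining elements
    print n
}')
printf '%s\n' "$result"
# beijing
# no
# 1
```
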
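Examples
View the server's connection states and summarize them
    netstat -an|awk '/^tcp/{++s[$NF]}END{for(a in s)print a,s[a]}'
Summarize web log traffic. The required output is the number of visits, the requested page or image, the total size of each request, and the total size of all traffic
    awk '{a[$7]+=$10;++b[$7];total+=$10}END{for(x in a)print b[x],x,a[x]|"sort -rn -k1";print"total size is :"total}' access_log
a[$7]+=$10 uses column 7 as the array subscript ($10 is the size of the response for the request in $7) and accumulates the sizes, which gives the total size transferred for each $7. The for loop relies on a small trick: arrays a and b share the same subscripts, so a single for statement is enough.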
awk common functions
Function    Description
gsub(Ere, Repl, [In])    Works exactly like sub, except that every occurrence of the extended regular expression is replaced.
sub(Ere, Repl, [In])    Replaces the first occurrence of the extended regular expression Ere in the string In with the string Repl, and returns the number of replacements. An & (ampersand) in Repl is replaced with the text of In that matched Ere. If In is not given, the default is the entire record ($0).
index(String1, String2)    Returns the position, counting from 1, at which String2 first occurs in String1. Returns 0 if String2 does not occur in String1.
length [(String)]    Returns the length, in characters, of String. If String is not given, returns the length of the entire record ($0).
blength [(String)]    Returns the length, in bytes, of String. If String is not given, returns the length of the entire record ($0). Not every awk implementation provides this function.
substr(String, M, [N])    Returns the substring of String that starts at position M and is at most N characters long. The first character of String is position 1. If N is not given, the substring runs from position M to the end of String.
match(String, Ere)    Returns the position, counting from 1, at which Ere first matches in String, or 0 if there is no match. The RSTART special variable is set to the return value. The RLENGTH special variable is set to the length of the matched string, or to -1 if no match is found.
split(String, A, [Ere])    Splits String into the array elements A[1], A[2], ..., A[n] and returns n. The string is split on the extended regular expression Ere, or on the current field separator (the FS special variable) if Ere is not given. The elements of A are created with string values unless the context indicates that a particular element should also have a numeric value.
tolower(String)    Returns String with each uppercase character changed to lowercase. The uppercase and lowercase mappings are defined by the LC_CTYPE category of the current locale.
toupper(String)    Returns String with each lowercase character changed to uppercase. The uppercase and lowercase mappings are defined by the LC_CTYPE category of the current locale.
sprintf(Format, Expr, Expr, ...)    Formats the Expr arguments according to the printf format string Format and returns the resulting string.
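
A few of the behaviours listed in the table are easier to see in action: the & placeholder and the return value of gsub, and the RSTART and RLENGTH variables set by match. The sample string is reused from the examples in this article; everything else is illustrative.

```shell
result=$(awk 'BEGIN {
    s = "this is a test2010test!"
    n = gsub(/test/, "[&]", s)    # & stands for the matched text; n is the number of replacements
    print n, s
    match(s, /[0-9]+/)            # sets RSTART and RLENGTH for the first run of digits
    print RSTART, RLENGTH
    match(s, /xyz/)               # no match: RSTART is 0 and RLENGTH is -1
    print RSTART, RLENGTH
    print toupper(substr(s, 1, 4))
}')
printf '%s\n' "$result"
# 2 this is a [test]2010[test]!
# 17 4
# 0 -1
# THIS
```
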
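Replace
    $ awk 'BEGIN{info="this is a test2010test!";gsub(/[0-9]+/,"!",info);print info}'
    this is a test!test!
Every part of info that matches the regular expression /[0-9]+/ is replaced with "!", and the result is assigned back to info. If info is not given, $0 is used by default.
Find
    $ awk 'BEGIN{info="this is a test2010test!";print index(info,"test")?"ok":"no found";}'
    ok
index returns 0 when the string is not found.
Match lookup
    $ awk 'BEGIN{info="this is a test2010test!";print match(info,/[0-9]+/)?"ok":"no found";}'
    ok
If a number is found, the match succeeds and ok is printed; otherwise it fails and "no found" is printed.
Substring
    $ awk 'BEGIN{info="this is a test2010test!";print substr(info,4,10);}'
    s is a tes
This extracts 10 characters starting from the 4th character.
Split
    $ awk 'BEGIN{info="this is a test";split(info,tA," ");print length(tA);for(k in tA){print k,tA[k];}}'
    4
    4 test
    1 this
    2 is
    3 a
This splits info and dynamically creates the array tA. awk's for...in loop is unordered; it does not necessarily start at subscript 1 and run to n.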
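
When the order of the pieces matters, one option is to loop over the numeric subscripts using the count that split returns, rather than using for...in. The count is also a portable alternative to calling length on an array, which is not supported by every awk implementation. A minimal sketch:

```shell
result=$(awk 'BEGIN {
    n = split("this is a test", words, " ")   # split returns the number of elements
    for (i = 1; i <= n; i++)                  # walk the subscripts in order 1..n
        print i, words[i]
}')
printf '%s\n' "$result"
# 1 this
# 2 is
# 3 a
# 4 test
```
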
