
A Brief Discussion of the Linux Text Processing Tools: awk, sed, and grep

Last Update: 2018-03-30 Source: Internet

Author: User


On the text processing tools of Linux

awk, the boss (the eldest of the three)

"Feature description"

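A language for text processing (selecting and filtering rows) with support for regular expressions.

NR is the current record (line) number, $n takes the nth column, and $NF is the last column.

NR==20,NR==30 selects lines 20 through 30.

FS is the field (column) separator; it cuts each line vertically into columns.

RS is the record (row) separator; it cuts the input horizontally into records.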
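A minimal sketch of NR, $1, $NF and FS together. The helper name show_fields is hypothetical, and the passwd-style sample lines are fed through printf instead of reading a real file:

```shell
# Hypothetical helper: print the line number, first column and last column
# of colon-separated input read from stdin.
show_fields() {
    awk -F ':' '{ print NR, $1, $NF }'
}

# Usage with two passwd-style sample lines
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' | show_fields
# prints:
# 1 root /bin/bash
# 2 bin /sbin/nologin
```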

"Syntax format"

awk [-F "delimiter"] '{print $1, $NF}' [file]
awk 'BEGIN{FS="[column delimiter]+"; RS="[line delimiter]+"; print "-begin-"} NR==n{action} END{print "-end-"}' xxx.txt

"Built-in variables"

Variable	Description
$n	The nth field of the current record; fields are separated by FS.
$0	The complete input record.
ARGC	The number of command-line arguments.
ARGIND	The index in ARGV of the file currently being processed (gawk only).
ARGV	The array of command-line arguments.
CONVFMT	Number conversion format (default %.6g).
ENVIRON	Associative array of environment variables.
ERRNO	A description of the last system error (gawk only).
FIELDWIDTHS	A space-separated list of field widths (gawk only).
FILENAME	The current input file name.
FNR	Like NR, but relative to the current file.
FS	The field separator (default: any run of whitespace).
IGNORECASE	If nonzero, matching ignores case (gawk only).
NF	The number of fields in the current record.
NR	The current record number (records read so far).
OFMT	The output format for numbers (default %.6g).
OFS	The output field separator (default a space).
ORS	The output record separator (default a newline).
RLENGTH	The length of the string matched by match().
RS	The record separator (default a newline).
RSTART	The position of the first character of the string matched by match().
SUBSEP	The array subscript separator (default \034).
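To see how NR, FNR, FILENAME and OFS differ, here is a small sketch that writes two hypothetical files (one.txt, two.txt) into a temporary directory and prints the variables for every record. NR keeps counting across files while FNR restarts at 1:

```shell
# Hypothetical demo: NR is global, FNR restarts for each input file,
# and OFS controls the separator that print puts between its arguments.
nr_vs_fnr() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf 'a\nb\n' > one.txt
    printf 'c\n' > two.txt
    awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, $0 }' one.txt two.txt
    rm -rf "$dir"
)

nr_vs_fnr
# prints:
# one.txt|1|1|a
# one.txt|2|2|b
# two.txt|3|1|c
```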

"Operators"

Operator	Description
= += -= *= /= %= ^= **=	Assignment
?:	C-style conditional expression
||	Logical OR
&&	Logical AND
~ !~	Matches / does not match a regular expression
< <= > >= != ==	Relational operators
(space)	String concatenation
+ -	Addition, subtraction
* / %	Multiplication, division, remainder
+ - !	Unary plus, unary minus, logical NOT
^ **	Exponentiation
++ --	Increment, decrement (as prefix or postfix)
$	Field reference
in	Array membership
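A short sketch exercising a few of these operators (~, ?:, %, ^ and in) on hypothetical service/port pairs. Only ^ is used for exponentiation, since ** is not available in every awk:

```shell
# Hypothetical demo of ~, ?:, %, ^ and "in" on made-up "name port" lines.
operator_demo() {
    printf 'ssh 22\nhttp 443\nmysql 3306\n' | awk '
        { seen[$1] = 1 }                                  # remember every name
        $1 ~ /^s/ { print $1, "starts with s" }           # regex match operator
        { print $1, ($2 % 2 == 0 ? "even" : "odd") }      # remainder + conditional
        END {
            print 2 ^ 10                                  # exponentiation
            print (("ftp" in seen) ? "ftp seen" : "ftp not seen")   # membership
        }'
}

operator_demo
```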

"String Function"

Function	Description
sub(r, s [, t])	Replaces the leftmost longest substring of t that matches the regular expression r with the string s. If t is omitted, the whole record ($0) is used. Only the first match is replaced.
gsub(r, s [, t])	Like sub, but replaces every match (global substitution).
index(s, t)	Returns the position of the first occurrence of t in s, counting from 1; returns 0 if t is not found.
substr(s, p [, n])	Returns the n-character substring of s that starts at position p (counting from 1); if n is omitted or exceeds the remaining length, returns the rest of the string.
split(s, a [, fs])	Splits s into the array a on the delimiter fs and returns the number of elements; if fs is omitted, the current FS value is used.
length([s])	Returns the number of characters in s, or in the current record if s is omitted.
match(s, r)	Returns the position in s where the regular expression r matches, or 0 if there is no match. It also sets RSTART to the start of the matched substring and RLENGTH to its length, so substr can easily extract the match.
toupper(s), tolower(s)	Convert s to uppercase or lowercase.
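A sketch that calls each string function once on a fixed sample line, so the return values in the table above can be checked directly (string_demo is a hypothetical name):

```shell
# Hypothetical demo of the awk string functions on one sample line.
string_demo() {
    echo 'hello world hello' | awk '{
        s = $0
        n = gsub(/hello/, "hi", s)         # replace every match, returns the count
        print n, s
        t = $0
        sub(/hello/, "hi", t)              # replace only the first match
        print t
        print index($0, "world")           # 1-based position of "world"
        print substr($0, 7, 5)             # 5 characters starting at position 7
        cnt = split("a:b:c", parts, ":")   # fill the array parts, return its size
        print cnt, parts[1], parts[3]
        print length($0)                   # characters in the record
        if (match($0, /wor+ld/)) print RSTART, RLENGTH
        print toupper(substr($0, 1, 5))
    }'
}

string_demo
```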

"Arithmetic functions"

Function	Description
atan2(y, x)	Arctangent of y/x, in the range -π to π
cos(x)	Cosine (x in radians)
exp(x)	Exponential function (e to the power x)
int(x)	Truncates x to an integer toward zero; it does not round
log(x)	Natural logarithm
rand()	Returns a random number greater than or equal to 0 and less than 1
sin(x)	Sine (x in radians)
sqrt(x)	Square root
srand([x])	Uses x as the seed for rand()
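A quick check of several arithmetic functions in a BEGIN block (no input needed). The seed 42 is arbitrary; it only makes rand() repeatable:

```shell
# Hypothetical demo of the awk arithmetic functions.
math_demo() {
    awk 'BEGIN {
        print int(3.9), int(-3.9)        # truncation toward zero, no rounding
        print sqrt(16)
        print exp(0), log(1)
        srand(42)                        # fixed seed: repeatable sequence
        r = rand()
        print ((r >= 0 && r < 1) ? "rand in [0,1)" : "unexpected")
        printf "%.4f\n", atan2(0, -1)    # pi
    }'
}

math_demo
```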

"Use Example"

1. Display only lines 20 through 30 of ett.txt (100 lines in total)

awk 'NR>19 && NR<31' ett.txt
awk '{if (NR>19 && NR<31) print $0}' ett.txt
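Both forms can be checked without ett.txt by letting seq 100 stand in for the 100-line file (an assumption for illustration). The range pattern NR==20,NR==30 from the feature description selects the same lines:

```shell
# seq 100 stands in for the 100-line ett.txt.
lines_20_to_30() {
    seq 100 | awk 'NR > 19 && NR < 31'
}

# Same selection written as a range pattern.
range_pattern() {
    seq 100 | awk 'NR == 20, NR == 30'
}

lines_20_to_30
```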

2. Add line numbers to the contents of a file

awk '{print NR, $0}' /etc/inittab

3. Print line 24 with its line number

awk 'NR==24 {print NR, $0}' /etc/inittab

4. Standard notation

awk -F '[:]+' 'NR==2{print $(NF-1)}' /etc/passwd
is equivalent to
awk 'BEGIN{FS="[:]+"} NR==2{print $(NF-1)}' /etc/passwd

awk 'BEGIN{RS="/"} {print $0}' /etc/passwd

5. Use one or more / characters as the record (row) separator and the default whitespace as the column separator, then print the record number and the second column of record 2

awk 'BEGIN{RS="[/]+"} NR==2{print NR, $2}' test
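A sketch of the same idea on a hypothetical one-line input. A multi-character (regular expression) RS like "[/]+" works in gawk and mawk; POSIX leaves it unspecified, so this assumes one of those two:

```shell
# Hypothetical input: records separated by runs of "/", fields by spaces.
rs_demo() {
    printf 'a b/c d//e f\n' | awk 'BEGIN { RS = "[/]+" } NR == 2 { print NR, $2 }'
}

rs_demo   # record 2 is "c d", so this prints: 2 d
```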

awk supports regular expressions:

6. Using : as the delimiter, print entire lines whose 5th column starts with s

awk -F ":" '$5~/^s/{print $0}' /etc/passwd

7. Using / as the delimiter, print entire lines whose second-to-last column is bin or sbin

awk -F "/" '$(NF-1) ~ /^s?bin$/' /etc/passwd

8. Print the first two columns of lines whose first column is exactly ssh, ftp, or mysql

awk '$1~/^(ssh|ftp|mysql)$/{print $1, $2}' /etc/services

9. Expected output: 6 0 1 2

echo "[Email protected]@@@@@@@@@@@@@@0=============1############# #2" | awk -F '[@=#]+'
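The key idea in example 9 is a bracket expression with + as the field separator, so a whole run of @, = or # characters counts as one delimiter. A sketch on a hypothetical input string (not the original one, which is partly lost):

```shell
# Hypothetical input: values separated by runs of @, = and #.
split_demo() {
    echo '5@@@@0====1####2' | awk -F '[@=#]+' '{ print NF, $1, $2, $3, $4 }'
}

split_demo   # prints: 4 5 0 1 2
```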

10. Print a header, line 2, and a footer

awk 'BEGIN{print "---BEGIN---"} NR==2{print $0} END{print "---END---"}' xxx.conf

11. Computing percentages with awk

Example 1:

Sample log lines look like this:
HTTP/youku.com 200
HTTP/youku.com 302
HTTP/youku.com 403
HTTP/youku.com 502
HTTP/baidu.com 302
HTTP/baidu.com 404

We want awk to compute, per domain, the percentage of return codes greater than or equal to 400. For example, youku has 4 lines in total, 2 of which have a return code greater than or equal to 400, so the result is 50%.

awk '{
    count[$1]++
    if ($2 >= 400) above400[$1]++
}
END {
    for (i in count) {
        print i, count[i], above400[i] / count[i] * 100 "%"
    }
}' < xxx.txt
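A runnable sketch using the six sample log lines above, fed through printf instead of xxx.txt. Because for (d in count) visits keys in an unspecified order, the output is piped through sort; an unset bad[d] counts as 0:

```shell
# Per-domain share of status codes >= 400, using the sample lines above.
status_ratio() {
    printf '%s\n' \
        'HTTP/youku.com 200' 'HTTP/youku.com 302' 'HTTP/youku.com 403' \
        'HTTP/youku.com 502' 'HTTP/baidu.com 302' 'HTTP/baidu.com 404' |
    awk '{
        count[$1]++                       # total lines per domain
        if ($2 >= 400) bad[$1]++          # lines with code >= 400
    }
    END {
        for (d in count)
            printf "%s %d %.0f%%\n", d, count[d], 100 * bad[d] / count[d]
    }' | sort
}

status_ratio
# prints:
# HTTP/baidu.com 2 50%
# HTTP/youku.com 4 50%
```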

Example 2:

Compute the percentage of lines in a file that contain error

awk '/error/{err++} END{print err, NR, err/NR*100 "%"}' < xxx.txt

12. Looking up values with an associative array

a.txt and b.txt have the same two fields (id|money). For IDs that appear in both files, print the lines of b.txt whose money value is larger than the one in a.txt.

cat >>a.txt <<EOF
1|1
3|3
5|5
7|7
9|9
EOF

cat >>b.txt <<EOF
1|1
2|2
3|30
4|4
5|5
6|6
7|70
8|8
9|9
10|10
EOF

awk -F '|' 'BEGIN{while ((getline < "a.txt") > 0) {user_map[$1] = $2}} {if (($1 in user_map) && user_map[$1] < $2) print $0}' b.txt

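Note: if a.txt does not exist, getline returns -1, which counts as true, so a loop written as while (getline < "a.txt") never ends. A program of mine once hung because of this, so I want to point it out specially. Testing the result explicitly, as in while ((getline < "a.txt") > 0), stops the loop both at end of file (0) and on an error (-1).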
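A self-contained sketch of the lookup, creating the two sample files from the example in a temporary directory. The array name money and the function name join_demo are hypothetical; -F sets FS before BEGIN runs, so getline < file splits a.txt on | as well:

```shell
# Load a.txt into an array keyed by id, then scan b.txt against it.
join_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf '1|1\n3|3\n5|5\n7|7\n9|9\n' > a.txt
    printf '1|1\n2|2\n3|30\n4|4\n5|5\n6|6\n7|70\n8|8\n9|9\n10|10\n' > b.txt
    awk -F '|' '
        BEGIN {
            # "> 0" ends the loop on EOF (0) and on a missing file (-1)
            while ((getline < "a.txt") > 0) money[$1] = $2
        }
        ($1 in money) && $2 > money[$1]     # print b.txt lines with a larger value
    ' b.txt
    rm -rf "$dir"
)

join_demo
# prints:
# 3|30
# 7|70
```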

13. The 9x9 multiplication table

awk 'BEGIN{for (i=1; i<10; i++) {for (j=1; j<=i; j++) printf "%d%s%d%s%d\t", i, "x", j, "=", i*j; print ""}}'

14. Counting Tomcat concurrent connections by TCP state

netstat -an | grep 10050 | awk '{count[$6]++} END{for (i in count) print i, count[i]}'
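The counting pattern can be tried without a live server by feeding hypothetical netstat -an style lines, where column 6 is the TCP state. The output is sorted because for (s in count) has no fixed order:

```shell
# Hypothetical netstat -an lines; column 6 holds the TCP state.
count_states() {
    printf '%s\n' \
        'tcp 0 0 10.0.0.1:10050 10.0.0.2:5000 ESTABLISHED' \
        'tcp 0 0 10.0.0.1:10050 10.0.0.3:5001 ESTABLISHED' \
        'tcp 0 0 10.0.0.1:10050 10.0.0.4:5002 TIME_WAIT' |
    awk '{ count[$6]++ } END { for (s in count) print s, count[s] }' | sort
}

count_states
# prints:
# ESTABLISHED 2
# TIME_WAIT 1
```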

sed, the second

"Feature description"

sed is short for stream editor. It is a powerful tool for editing, filtering, and transforming text. Common uses include adding, deleting, changing, and querying lines, filtering text, and extracting specific lines.

Parameters

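-n    #suppress the default output
-r    #use extended regular expressions
-i    #write the changes to the file
-e    #execute multiple sed commands
-f    #read the sed commands from a file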

sed commands

a    append text after a line
i    insert text before a line
d    delete
c    replace the specified lines
s    substitute the first match in each line
g    (flag for s) substitute all matches in each line
p    print
w    save to a file
e    execute a bash command
q    quit without reading any further

Processing flow: sed reads one line from a file or pipe, processes it, and outputs it; then it reads the next line, processes it, outputs it, and so on.

Add, delete, change, and query

a appends text after the specified line

i inserts text before the specified line

Add

Add a single line

sed '2a 106,dandan,cso' person.txt
sed '2i 106,dandan,cso' person.txt

Add multiple lines

sed '2a 106,dandan,cso\n107,bingbing,cco' person.txt
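A sketch of a and i on three sample lines instead of person.txt. It assumes GNU sed, which accepts the one-line form 2a text and turns \n inside the text into a line break; the function names are hypothetical:

```shell
# GNU sed assumed: one-line a/i text and \n escapes are GNU extensions.
add_one_after()  { sed '2a appended'; }          # new line after line 2
add_one_before() { sed '2i inserted'; }          # new line before line 2
add_two_after()  { sed '2a first\nsecond'; }     # two new lines after line 2

printf 'line1\nline2\nline3\n' | add_one_after
# prints: line1, line2, appended, line3 (one per line)
```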

Enterprise Case 1: Optimizing the SSH configuration (adding several parameters in one step)

One point in system optimization is changing the SSH remote-login configuration. The main step is to add the following 5 lines to the SSH server configuration file (the meaning of each parameter is covered in other courses):

Port 52113
PermitRootLogin no
PermitEmptyPasswords no
UseDNS no
GSSAPIAuthentication no

We could add this text with vi, but that is tedious. How can a single command insert these 5 lines before line 13?

sed -i '13i ####Chris-sshd-2016.5.4-youhua######\nPort 52113\nPermitRootLogin no\nPermitEmptyPasswords no\nUseDNS no\nGSSAPIAuthentication no\n#####--end--#######' /etc/ssh/sshd_config

Addresses are separated by commas: in n1,n2, each address can be a line number, a regular expression, or a combination of the two.

Other address examples

10{sed-commands}                  operate on line 10

10,20{sed-commands}               operate on lines 10 through 20, inclusive

10,+20{sed-commands}              operate on lines 10 through 30 (10+20), inclusive

1~2{sed-commands}                 operate on lines 1, 3, 5, 7, ...

10,${sed-commands}                operate on line 10 through the last line ($ means the last line), inclusive

/oldboy/{sed-commands}            operate on lines matching oldboy

/oldboy/,/alex/{sed-commands}     operate from a line matching oldboy through the next line matching alex

/oldboy/,${sed-commands}          operate from a line matching oldboy through the last line

/oldboy/,10{sed-commands}         operate from a line matching oldboy through line 10. Note: if oldboy is not matched within the first 10 lines, sed processes only the matching line itself after line 10, if there is one.

1,/alex/{sed-commands}            operate from line 1 through the first line matching alex

/oldboy/,+2{sed-commands}         operate on a line matching oldboy and the 2 lines following it
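These address forms are easy to verify on numbered input from seq. A sketch assuming GNU sed (first~step and addr,+N are GNU extensions); show_range is a hypothetical helper that joins the selected lines with spaces:

```shell
# Print the lines of "seq N" selected by a sed script, joined by spaces.
show_range() {
    seq "$1" | sed -n "$2" | paste -sd ' ' -
}

show_range 10 '3,+2p'       # 3 4 5
show_range 10 '1~3p'        # 1 4 7 10
show_range 10 '/^8$/,$p'    # 8 9 10
show_range 12 '/^11$/,3p'   # 11 (line 3 is already past, so only the match)
```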

Delete

d deletes the specified lines

sed 'd' person.txt                   #delete all lines

sed '2d' person.txt                  #delete line 2

sed '2,5d' person.txt                #delete lines 2 to 5

sed '3,$d' person.txt                #delete from line 3 to the end

sed '1~2d' person.txt                #delete lines 1, 3, 5, ...

sed '1,+2d' person.txt               #delete lines 1, 2, 3

sed '/zhangyao/d' person.txt         #delete lines matching zhangyao

sed '/oldboy/,/alex/d' person.txt    #delete from the line matching oldboy to the line matching alex

sed '/oldboy/,3d' person.txt         #delete from the line matching oldboy through line 3
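A sketch of several delete commands applied to seq 6 instead of person.txt, assuming GNU sed for 1~2d and 1,+2d. run_delete is a hypothetical helper that joins the surviving lines with spaces:

```shell
# Apply a sed script to "seq N" and join the remaining lines with spaces.
run_delete() {
    seq "$1" | sed "$2" | paste -sd ' ' -
}

run_delete 6 '2,5d'    # 1 6
run_delete 6 '1~2d'    # 2 4 6
run_delete 6 '3,$d'    # 1 2
run_delete 6 '1,+2d'   # 4 5 6
```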

Enterprise Case 2: Print the contents of a file, excluding lines that contain oldboy

sed '/oldboy/d' person.txt           #delete lines containing "oldboy"

Change (replace whole lines)

c replaces the specified lines with new text

sed '2c 106,dandan,cso' person.txt   #replace the contents of line 2

Text substitution

s: on its own, replaces the first match in each line

g: replaces all matches in each line

-i: modifies the file contents in place

sed substitution model (replace the box ▇ with the triangle ▲); the delimiter after s can be / or another character such as #

sed -i 's/▇/▲/g' oldboy.log
sed -i 's#▇#▲#g' oldboy.log
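A sketch contrasting s with and without g, plus an in-place edit on a temporary file standing in for oldboy.log (GNU sed -i assumed). The byte-wise match of ▇ works regardless of locale:

```shell
first_only()  { sed 's/a/b/'; }     # only the first "a" on each line
all_matches() { sed 's/a/b/g'; }    # every "a" on each line

# In-place edit (-i) on a temporary file instead of oldboy.log.
inplace_demo() (
    f=$(mktemp) || exit 1
    printf 'x ▇ y ▇\n' > "$f"
    sed -i 's#▇#▲#g' "$f"           # "#" used as the delimiter
    cat "$f"
    rm -f "$f"
)

echo aaa | first_only    # baa
echo aaa | all_matches   # bbb
```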

Enterprise Case 3: Modify a specific line of a configuration file

Specifying the line lets you modify the configuration file precisely and prevents changes to other places.

sed '3s#0#9#' person.txt

Variable substitution

x=a
y=b
echo $x $y
sed "s#$x#$y#g" test.txt
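The important detail is the double quotes: the shell expands $x and $y before sed sees the script, so sed receives s#a#b#g. A minimal sketch reading stdin instead of test.txt (var_subst is a hypothetical name):

```shell
var_subst() {
    x=a
    y=b
    # Double quotes let the shell expand $x and $y; single quotes would not.
    sed "s#$x#$y#g"
}

echo 'a1 a2' | var_subst   # b1 b2
```

If a variable can contain the delimiter (# here) or regex metacharacters, it needs escaping before being placed in the script.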

Group substitution with \( \) and \1

sed's \( \) remembers part of a regular expression: \1 is the first remembered pattern (the match inside the first pair of parentheses), \2 is the second (the match inside the second pair), and sed can remember up to 9.

Example: echo "I am oldboy teacher." To keep the word oldboy in this line and delete the rest, mark the part you want to keep with parentheses.

echo "I am oldboy teacher." | sed 's#^.*am \([a-z].*\) tea.*$#\1#g'
echo "I am oldboy teacher." | sed -r 's#^.*am ([a-z].*) tea.*$#\1#g'
echo "I am oldboy teacher." | sed -r 's#I (.*) (.*) teacher.#\1\2#g'

Command description

Idea: replace the string "I am oldboy teacher." with the characters oldboy.

In the explanation below, - stands for a space.

^.*am- -> matches any characters from the start of the line up to "am-", i.e. the "I am-" part of the string;
\([a-z].*\) -> the escaped parentheses \( \) form a group. [a-z] matches any one of the 26 lowercase letters, and [a-z].* together matches a letter followed by any number of characters. The goal is to match the oldboy string; because oldboy is to be kept, the match is enclosed in parentheses and retrieved later with \1.
-tea.*$ -> matches a space, then tea, then any characters up to the end of the line; it matches the "-teacher." that follows oldboy;
The \1 in the replacement is the content of the preceding parentheses, which is the oldboy string we want.
() are metacharacters of extended regular expressions. By default sed recognizes basic regular expressions, so to use grouping you must escape them as \( \).
With the -r option sed recognizes extended regular expressions; then \( \) matches literal parentheses instead, so a \1 that refers to them produces an error.
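Both the basic (\( \)) and extended (-r) forms from the example can be run side by side to confirm they keep only oldboy. A sketch; the function names are hypothetical and -r assumes GNU sed:

```shell
keep_word_bre() { sed 's#^.*am \([a-z].*\) tea.*$#\1#'; }    # basic regex: \( \)
keep_word_ere() { sed -r 's#^.*am ([a-z].*) tea.*$#\1#'; }   # extended regex: ( )

echo 'I am oldboy teacher.' | keep_word_bre   # oldboy
echo 'I am oldboy teacher.' | keep_word_ere   # oldboy
```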

Enterprise Case 4: Optimizing system boot services

chkconfig --list | grep "3:on" | grep -vE "sshd|crond|network|rsyslog|sysstat" | awk '{print $1}' | sed -r 's#^(.*)#chkconfig \1 off#g' | bash
chkconfig --list | grep "3:on"

The special symbol & represents the matched content

#→ Replace c with --c-- on lines 1 to 3

sed '1,3s#c#--&--#g' person.txt #→ here & equals c
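A sketch of & on four sample lines standing in for person.txt; only lines 1 to 3 are changed, and g wraps every c on those lines:

```shell
wrap_c() { sed '1,3s#c#--&--#g'; }   # & is whatever the pattern matched

printf 'abc\ncc\nxyz\nc\n' | wrap_c
# prints:
# ab--c--
# --c----c--
# xyz
# c
```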

Enterprise Case 5: Batch renaming files

for i in $(seq 5); do touch stu_102999_${i}_finished.jpg; done
ls | sed -r 's/(.*)_finished(.*)/mv &
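The rename command above is cut off. One possible completion, assuming the goal is to strip _finished from each name, generates one mv command per file with & and \1\2 and runs them with sh. This is a sketch in a temporary directory, not the author's exact command; dropping | sh previews the commands first:

```shell
# Sketch (assumption): remove "_finished" from each file name.
rename_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    for i in $(seq 5); do touch "stu_102999_${i}_finished.jpg"; done
    # e.g. "mv stu_102999_1_finished.jpg stu_102999_1.jpg", then run with sh
    ls | sed -r 's#(.*)_finished(.*)#mv & \1\2#' | sh
    ls
    rm -rf "$dir"
)

rename_demo   # lists stu_102999_1.jpg ... stu_102999_5.jpg
```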

Query

p prints the specified content, but because sed also prints every line by default, matched lines appear twice; use -n to cancel the default output

Query by line

sed '2p' person.txt
sed -n '2p' person.txt
sed -n '2,3p' person.txt
sed -n '1~2p' person.txt
sed -n 'p' person.txt

Query by string

sed -n '/cto/p' person.txt
sed -n '/cto/,/cfo/p' person.txt

Mixed query

sed -n '2,/cfo/p' person.txt
sed -n '/feixue/,2p' person.txt

#Special case: if feixue is not matched within the first two lines, sed keeps searching; when a later line matches feixue, only that line is printed.
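A sketch showing the doubled output of p without -n, a string range, and the feixue special case, on small sample inputs (GNU sed assumed; sed_joined is a hypothetical helper that joins output lines with spaces):

```shell
# Run sed with the given arguments and join its output lines with spaces.
sed_joined() {
    sed "$@" | paste -sd ' ' -
}

seq 3 | sed_joined '2p'                                        # 1 2 2 3
seq 3 | sed_joined -n '2p'                                     # 2
printf 'ceo\ncto\ncoo\ncfo\ncmo\n' | sed_joined -n '/cto/,/cfo/p'   # cto coo cfo
printf 'a\nb\nc\nfeixue\ne\n' | sed_joined -n '/feixue/,2p'         # feixue
```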

Other features

Backup

sed -i.bak '$a 1111111111' xxx.txt

Back up xxx.txt as xxx.txt.bak, then modify the original file by appending 1111111111 as its last line.
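A sketch of -i.bak in a temporary directory with a two-line stand-in for xxx.txt (GNU sed assumed): the .bak file keeps the original content and the edited file gains the new last line:

```shell
backup_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf 'one\ntwo\n' > xxx.txt
    sed -i.bak '$a 1111111111' xxx.txt   # edit in place, keep xxx.txt.bak
    echo "backup:";   cat xxx.txt.bak
    echo "modified:"; cat xxx.txt
    rm -rf "$dir"
)

backup_demo
```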

Save

Write every entire line containing SB to new.txt

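Uppercase and lowercase conversion

\L #convert everything that follows to lowercase

\l #convert the next character to lowercase

\U #convert everything that follows to uppercase

\u #convert the next character to uppercase

\E #used together with \U and \L to turn them off

sed -r 's/(.*),(.*),(.*)/\L\3,\E\1,\U\2/g' xxx.txt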
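These case escapes are GNU sed extensions. A sketch with hypothetical comma-separated input: \L lowercases the third field, \E stops the conversion so the first field is untouched, and \U uppercases the second; a second function uses \u to capitalize each word:

```shell
# GNU sed assumed for \L, \E, \U and \u in the replacement.
swap_case() { sed -r 's/(.*),(.*),(.*)/\L\3,\E\1,\U\2/'; }
cap_words() { sed -r 's/(^| )([a-z])/\1\u\2/g'; }

echo 'Abc,Def,GHI' | swap_case   # ghi,Abc,DEF
echo 'hello world' | cap_words   # Hello World
```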

Execute multiple sed commands

sed -e '3,$d' -e 's#10#01#g' xxx.txt
sed '3,$d; s#10#01#g' xxx.txt
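The two forms are equivalent; a sketch on three sample lines standing in for xxx.txt confirms they give the same result:

```shell
with_e()         { sed -e '3,$d' -e 's#10#01#g'; }   # two -e options
with_semicolon() { sed '3,$d; s#10#01#g'; }          # one script, ";" separated

printf '10\n210\n310\n' | with_e           # 01 and 201 (line 3 deleted)
printf '10\n210\n310\n' | with_semicolon   # same output
```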

Print invisible characters: l

sed -n 'l' xxx.txt

Replace abc with ABC (one-to-one character mapping)

tr 'abc' 'ABC' < xxx.txt
sed 'y#abc#ABC#' xxx.txt

Can operate on multiple files

sed 'y#abc#ABC#' xxx.txt 222.txt
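A sketch comparing sed's y command with tr on the same input. Both map each character in the first list to the character at the same position in the second, so the two lists must have equal length:

```shell
upper_abc_sed() { sed 'y/abc/ABC/'; }   # character-by-character mapping
upper_abc_tr()  { tr 'abc' 'ABC'; }     # tr reads stdin only

echo 'aabbcc' | upper_abc_sed   # AABBCC
echo 'aabbcc' | upper_abc_tr    # AABBCC
```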

Simulate other commands

Automatically uncomment lines and modify the path when creating an SVN repository

sed -i -r '12,13s/#//g' svnserve.conf
sed -i -r '20s/^# (.*)/\1/g' svnserve.conf
sed -i -r '27s/^# (.*)/\1/g' svnserve.conf
sed -i -r '12,13s/^# (.*)/\1/g' svnserve.conf
sed -i -r '32s/# (.*=)(.*)/\1 \/usr\/svndata\//' svnserve.conf

Do it all in one command

SvnPath='zhangzhicheng'
sed -i -r -e '20s/^# (.*)/\1/g' -e '27s/^# (.*)/\1/g' -e '12,13s/^# (.*)/\1/g' -e "32s/# (.*=)(.*)/\1 \/usr\/svndata\/$SvnPath/" svnserve.conf

grep, the youngest

"Feature description"

The third of the "Three Musketeers". It searches for and filters text strings; -v inverts the match.

"Option description"

Option	Description
-v	Invert the match: print the lines that do not contain the pattern
-A n	Also print the n lines after each match
-B n	Also print the n lines before each match
-C n	Also print n lines before and after each match
-n	Prefix each output line with its line number
-E	Use extended regular expressions (same as egrep)
-o	Print only the matched part of each line
-i	Ignore case
-a	Treat a binary file as text (add -a when grep reports a file as binary)
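A sketch exercising several of these options on four hypothetical sample lines (sample_lines is a made-up helper):

```shell
sample_lines() { printf 'one\ntwo\nthree\nFour\n'; }

sample_lines | grep -i 'four'        # Four     (ignore case)
sample_lines | grep -v 'o'           # three    (lines without "o")
sample_lines | grep -n 'three'       # 3:three  (with line number)
sample_lines | grep -A1 'one'        # one, two (match plus 1 line after)
sample_lines | grep -B1 'three'      # two, three
sample_lines | grep -o 'thr'         # thr      (only the matched part)
sample_lines | grep -E 'one|Four'    # one, Four (extended regex)
```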

"Basic examples"

Example 1: The file test.txt contains:

test

liyao

oldboy

Give a command that outputs the contents of test.txt without the line containing the oldboy string.

grep -v oldboy test.txt

Example 2: Filter the lines of /etc/services that contain the database port 3306 or 1521

grep -E "3306|1521" /etc/services

Example 3:

"Tips"

To remove blank lines from a file:

grep -v '^$' test.txt
egrep -o "^[^:]+" xxx.txt   #match the leading run of non-colon characters and output only the match (-o does not print the entire line)
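A sketch of both tips on sample input. It uses grep -Eo, which is equivalent to egrep -o and is the form newer GNU grep releases prefer:

```shell
drop_blank()  { grep -v '^$'; }        # remove empty lines
first_field() { grep -Eo '^[^:]+'; }   # leading run of non-colon characters

printf 'a\n\nb\n' | drop_blank                          # a, b
echo 'root:x:0:0:root:/root:/bin/bash' | first_field    # root
```

Lines that contain only spaces are not empty, so '^$' keeps them; a pattern such as '^[[:space:]]*$' with -v would drop those too.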


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
