A brief discussion of the Linux text-processing tools: awk, sed, grep

Source: Internet
Author: User

On the Linux text-processing tools

awk: the eldest of the "Three Musketeers"

"Feature description"

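A language for text processing (selecting and filtering lines) that supports regular expressions.
NR is the current record (line) number, $n selects field n, and $NF is the last field.
NR==20,NR==30 selects lines 20 through 30.
FS: the field (column) separator, which cuts each line vertically.
RS: the record (row) separator, which cuts the input horizontally into records.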
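A minimal runnable sketch of these variables, assuming a POSIX awk such as gawk or mawk. The sample data and shell variable names are made up for illustration:

```shell
# Hypothetical sample data: four colon-separated lines
data='a:1:x
b:2:y
c:3:z
d:4:w'

# -F ':' sets FS; $NF is the last field of each line
last=$(printf '%s\n' "$data" | awk -F ':' '{print $NF}')
echo "$last"

# NR==2,NR==3 is a range pattern: records 2 through 3, inclusive
range=$(printf '%s\n' "$data" | awk -F ':' 'NR==2,NR==3 {print NR, $1}')
echo "$range"

# RS=";" splits the input into records at each semicolon instead of each newline
recs=$(printf 'one;two;three' | awk 'BEGIN{RS=";"} {print NR ": " $0}')
echo "$recs"
```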

"Syntax format"

awk [-F 'delimiter'] '{print $1, $NF}' [file]
awk 'BEGIN{FS="[column delimiter]+"; RS="[line delimiter]+"; print "-begin-"} NR==n{action} END{print "-end-"}' xxx.txt

"Built-in variables"

$n            The nth field of the current record; fields are separated by FS.
$0            The complete input record.
ARGC          The number of command-line arguments.
ARGIND        The index in ARGV of the file currently being processed (gawk).
ARGV          The array of command-line arguments.
CONVFMT       The number-to-string conversion format (default %.6g).
ENVIRON       An associative array of environment variables.
ERRNO         A description of the last system error (gawk).
FIELDWIDTHS   A space-separated list of field widths (gawk).
FILENAME      The current file name.
FNR           Like NR, but counted relative to the current file.
FS            The input field separator (default: any run of whitespace).
IGNORECASE    If true, regular-expression matching ignores case (gawk).
NF            The number of fields in the current record.
NR            The number of records read so far.
OFMT          The output format for numbers (default %.6g).
OFS           The output field separator (default: a space).
ORS           The output record separator (default: a newline).
RLENGTH       The length of the string matched by match().
RS            The input record separator (default: a newline).
RSTART        The first position of the string matched by match().
SUBSEP        The array subscript separator (default \034).
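A small sketch of NR versus FNR, FILENAME, OFS, and NF, assuming a POSIX awk. The files one.txt and two.txt are hypothetical and are created in a scratch directory:

```shell
# Two hypothetical input files in a scratch directory
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"
cd "$dir" || exit 1

# NR keeps counting across files; FNR restarts at 1 in each file
counts=$(awk '{print FILENAME, NR, FNR, $0}' one.txt two.txt)
echo "$counts"

# OFS is placed between the comma-separated items of print; NF counts the fields
joined=$(echo 'x y z' | awk 'BEGIN{OFS="-"} {print $1, $2, $3, NF}')
echo "$joined"

cd / && rm -rf "$dir"
```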

"Operators"

= += -= *= /= %= ^= **=   Assignment
?:                        C-style conditional expression
||                        Logical OR
&&                        Logical AND
~ !~                      Match and non-match of a regular expression
< <= > >= != ==           Relational operators
(space)                   String concatenation
+ -                       Addition, subtraction
* / %                     Multiplication, division, remainder
+ - !                     Unary plus, unary minus, logical NOT
^ **                      Exponentiation
++ --                     Increment and decrement, as prefix or postfix
$                         Field reference
in                        Array membership
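A few of these operators in action, as a sketch that assumes a POSIX awk and uses made-up input lines:

```shell
# ~ tests a field against a regex; ?: chooses between two values
labels=$(printf 'apple 3\nbanana 12\ncherry 7\n' |
  awk '{ print $1, ($1 ~ /an/ ? "has-an" : "no-an"), ($2 >= 7 ? "big" : "small") }')
echo "$labels"

# A space concatenates; += assigns; % is the remainder; ^ is exponentiation
misc=$(awk 'BEGIN { x = 17; x += 3; print "x=" x, x % 6, 2 ^ 3 }')
echo "$misc"

# "in" checks whether a key exists in an array; !~ is the negated match
member=$(awk 'BEGIN { a["k"] = 1; k = ("k" in a); z = ("z" in a); n = ("abc" !~ /b/); print k, z, n }')
echo "$member"
```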
"String Function"
sub(r, s [, t])     Replaces the leftmost, longest substring of t that matches the regular expression r with s. If t is omitted, the whole record ($0) is used. Only the first match is replaced.
gsub(r, s [, t])    Like sub, but replaces every match (global substitution).
index(s, t)         Returns the position of the first occurrence of t in s, counting from 1 (0 if t does not occur).
substr(s, m [, n])  Returns the substring of s that starts at position m (counting from 1) and is n characters long. If n is omitted or exceeds the remaining length, the rest of the string is returned.
split(s, a [, fs])  Splits s into the array a on the delimiter fs and returns the number of elements. If fs is omitted, the current value of FS is used.
length([s])         Returns the number of characters in s, or in the current record if s is omitted.
match(s, r)         Returns the position in s where the regular expression r matches, or 0 if there is no match. It sets RSTART to the start of the matched substring and RLENGTH to its length, so substr(s, RSTART, RLENGTH) extracts the match.
toupper(s), tolower(s)  Convert a string to upper or lower case. They are part of POSIX awk, not only gawk.

"Arithmetic functions"
atan2(y, x)   Arctangent of y/x in radians, in the range -π to π
cos(x)        Cosine (x in radians)
exp(x)        e raised to the power x
int(x)        Integer part of x; it truncates toward zero and does not round
log(x)        Natural logarithm
rand()        Random number greater than or equal to 0 and less than 1
sin(x)        Sine (x in radians)
sqrt(x)       Square root
srand([x])    Sets x as the seed for rand(); without x, the time of day is used
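The string and arithmetic functions above can be tried in a single BEGIN block. This is a minimal sketch that assumes a POSIX awk (gawk or mawk) and uses made-up strings:

```shell
out=$(awk 'BEGIN {
  s = "foo bar foo"
  t = s; sub(/foo/, "X", t)          # only the first match is replaced
  u = s; n = gsub(/foo/, "X", u)     # every match is replaced; n counts them
  print t
  print u, n
  print index(s, "bar")              # positions start at 1
  print substr(s, 5, 3)
  print substr(s, 9, 100)            # length past the end: the rest of the string
  cnt = split("a:b:c", parts, ":")
  print cnt, parts[1], parts[3]
  print length(s)
  if (match(s, /ba+r/))
    print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
  print toupper("abc"), tolower("XyZ")
  print int(3.9), int(-3.9)          # int truncates toward zero
  print sqrt(16), exp(0), log(1)
  printf "%.4f\n", atan2(0, -1)      # atan2(0, -1) is pi
}')
echo "$out"
```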

"Usage examples"

1. Print only lines 20 through 30 of ett.txt (100 lines in total)
awk 'NR>19 && NR<31' ett.txt
awk '{if (NR>19 && NR<31) print $0}' ett.txt
2. Add line numbers to the contents of a file
awk '{print NR, $0}' /etc/inittab
3. Print line 24 together with its line number
awk 'NR==24 {print NR, $0}' /etc/inittab
4. Standard ways to set the separators
awk -F '[:]+' 'NR==2{print $(NF-1)}' /etc/passwd
is equivalent to
awk 'BEGIN{FS="[:]+"} NR==2{print $(NF-1)}' /etc/passwd
The record separator is set the same way:
awk 'BEGIN{RS="/"} {print $0}' /etc/passwd
5. Using one or more "/" characters as the record separator (fields still split on the default whitespace), print the record number and the second field of the second record (a regular-expression RS like this is a gawk/mawk extension)
awk 'BEGIN{RS="[/]+"} NR==2{print NR, $2}' test
awk supports regular expressions:
6. With ":" as the delimiter, print entire lines whose 5th field starts with s
awk -F ":" '$5~/^s/{print $0}' /etc/passwd
7. With "/" as the delimiter, print entire lines whose second-to-last field matches "sbin" or "bin"
awk -F "/" '$(NF-1) ~ /(s|)bin/' /etc/passwd
8. Print the first two fields of lines whose first field is exactly ssh, ftp, or mysql
awk '$1~/^(ssh|ftp|mysql)$/{print $1,$2}' /etc/services
9. Split on runs of several different delimiter characters; the output is 6 0 1 2
echo "6@@@@@@@@@@@@@@0=============1##############2" | awk -F '[@=#]+' '{print $1, $2, $3, $4}'
10. Print a header and a footer around line 2 with BEGIN and END
awk 'BEGIN{print "---BEGIN---"} NR==2{print $0} END{print "---END---"}' xxx.conf

11. Computing percentages with awk

Example one:

The log looks like this:
HTTP/youku.com 200
HTTP/youku.com 302
HTTP/youku.com 403
HTTP/youku.com 502
HTTP/baidu.com 302
HTTP/baidu.com 404

Task: for each domain, compute the percentage of requests whose return code is greater than or equal to 400. For example, youku.com has 4 lines in total, and 2 of them have a return code of 400 or more, which is 50%.
awk '{
    count[$1]++
    if ($2 >= 400) above400[$1]++
}
END {
    for (i in count)
        print i, count[i], above400[i] / count[i] * 100 "%"
}' < xxx.txt
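To check the percentage calculation against the sample log above, here is a self-contained sketch. The file name access.log is hypothetical, and the output goes through sort because for (d in count) visits keys in an unspecified order:

```shell
dir=$(mktemp -d)
# The six sample log lines from above, in a hypothetical file
cat > "$dir/access.log" <<'EOF'
HTTP/youku.com 200
HTTP/youku.com 302
HTTP/youku.com 403
HTTP/youku.com 502
HTTP/baidu.com 302
HTTP/baidu.com 404
EOF

# Per domain: total lines, then the share of lines with a return code >= 400
result=$(awk '{
    count[$1]++
    if ($2 >= 400) above400[$1]++
}
END {
    for (d in count)
        print d, count[d], above400[d] / count[d] * 100 "%"
}' "$dir/access.log" | sort)
echo "$result"
rm -rf "$dir"
```

A domain with no errors has no above400 entry, and an unset array element counts as 0 in arithmetic, so it prints 0%.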

Example two:

Compute the percentage of lines in a file that contain error

awk '/error/{err++} END{print err+0, NR, err/NR*100 "%"}' < xxx.txt

12. Accessing associative arrays

a.txt and b.txt have the same two fields (id|money). Print the lines of b.txt whose id also appears in a.txt and whose money value is larger than the one in a.txt.

cat >> a.txt << EOF
1|1
3|3
5|5
7|7
9|9
EOF
cat >> b.txt << EOF
1|1
2|2
3|30
4|4
5|5
6|6
7|70
8|8
9|9
10|10
EOF
awk -F '|' 'BEGIN{while ((getline < "a.txt") > 0) {user_map[$1] = $2}} {if ($1 in user_map) {if (user_map[$1] < $2) print $0}}' b.txt

Note: if a.txt does not exist, getline returns -1, which counts as true, so a loop written as while (getline < "a.txt") never ends. I once had a program hang for exactly this reason, so I want to point it out: compare the result of getline with > 0 instead.
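This small sketch shows both the pitfall and the guard, assuming a POSIX awk. It first prints the -1 that getline returns for a missing file (no-such-file.txt is a made-up name), then runs the guarded loop on the a.txt/b.txt data from above in a scratch directory. The array name money is only illustrative:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '1|1\n3|3\n5|5\n7|7\n9|9\n' > a.txt
printf '1|1\n2|2\n3|30\n4|4\n5|5\n6|6\n7|70\n8|8\n9|9\n10|10\n' > b.txt

# getline returns 1 (read a line), 0 (end of file), or -1 (error, e.g. missing file)
status=$(awk 'BEGIN { r = (getline line < "no-such-file.txt"); print r }')
echo "$status"

# Guarded loop: stops on 0 and on -1; the pattern prints b.txt lines with a larger value
bigger=$(awk -F '|' '
  BEGIN { while ((getline < "a.txt") > 0) money[$1] = $2 }
  ($1 in money) && money[$1] < $2
' b.txt)
echo "$bigger"

cd / && rm -rf "$dir"
```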

13. The 9x9 multiplication table

awk 'BEGIN{for (i=1; i<10; i++) {for (j=1; j<=i; j++) printf "%d%s%d%s%d\t", i, "x", j, "=", i*j; print ""}}'

14. Counting Tomcat concurrent connections

netstat -an | grep 10050 | awk '{count[$6]++} END{for (i in count) print i, count[i]}'

sed: the second of the "Three Musketeers"

"Feature description"

sed is short for stream editor. It is a powerful tool for manipulating, filtering, and transforming text. Its most common uses are adding, deleting, changing, and querying lines, filtering text, and extracting specific lines.

Parameters

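-n    # suppress the default output
-r    # use extended regular expressions
-i    # write the changes back to the file on disk
-e    # execute several sed commands
-f    # read the commands from a file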

sed commands

a    append text after a line
i    insert text before a line
d    delete
c    replace the specified lines
s    substitute; by default replaces the first match on each line
g    flag for s: replace all matches on the line
p    print
w    write to a file
e    execute a shell command (GNU sed)
q    quit without reading further input

Processing model: sed reads one line from a file or pipe, processes it, and prints it; then it reads the next line, processes it, prints it, and so on.

Add, delete, change, query

a appends text after the specified line

i inserts text before the specified line

Adding

Adding a single line

sed '2a 106,dandan,cso' person.txt
sed '2i 106,dandan,cso' person.txt

Adding multiple lines

sed '2a 106,dandan,cso\n107,bingbing,cco' person.txt
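This is a quick way to see where a and i put the new text. It assumes GNU sed, which accepts the one-line a/i form and turns \n in the text into a newline. The three lines stand in for person.txt, whose real contents are not shown here:

```shell
# Hypothetical stand-in for person.txt
people='101,oldboy,ceo
102,zhangyao,coo
103,alex,cto'

after=$(printf '%s\n' "$people" | sed '2a 106,dandan,cso')
before=$(printf '%s\n' "$people" | sed '2i 106,dandan,cso')
multi=$(printf '%s\n' "$people" | sed '2a 106,dandan,cso\n107,bingbing,cco')
echo "$after"; echo "$before"; echo "$multi"
```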

Enterprise case 1: optimizing the SSH configuration (adding several parameters with one command)

One item in system optimization is changing the configuration of the SSH remote-login service. The main step is to add the following 5 lines to the SSH configuration file. (The meaning of each parameter is covered in other courses.)

Port 52113
PermitRootLogin no
PermitEmptyPasswords no
UseDNS no
GSSAPIAuthentication no

We could edit the file in vi, but that is tedious. How can we insert these 5 lines before line 13 with a single command?

sed -i '13i ####Chris-sshd-2016.5.4-youhua######\nPort 52113\nPermitRootLogin no\nPermitEmptyPasswords no\nUseDNS no\nGSSAPIAuthentication no\n#####--end--#######' /etc/ssh/sshd_config

An address range is written n1,n2 with a comma; n1 and n2 can be line numbers, regular expressions, or a combination of the two.

More address examples

10{sed-commands}                 operate on line 10
10,20{sed-commands}              operate on lines 10 through 20, inclusive
10,+20{sed-commands}             operate on lines 10 through 30 (10+20), inclusive
1~2{sed-commands}                operate on lines 1, 3, 5, 7, ...
10,${sed-commands}               operate on line 10 through the last line ($ means the last line)
/oldboy/{sed-commands}           operate on lines that match oldboy
/oldboy/,/alex/{sed-commands}    operate from a line that matches oldboy through the next line that matches alex
/oldboy/,${sed-commands}         operate from a line that matches oldboy through the last line
/oldboy/,10{sed-commands}        operate from a line that matches oldboy through line 10. Note: if none of the first 10 lines matches oldboy, sed operates only on the lines after line 10 that match oldboy, if any.
1,/alex/{sed-commands}           operate from line 1 through the first line that matches alex
/oldboy/,+2{sed-commands}        operate on a line that matches oldboy and the 2 lines after it
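Several of these address forms can be tried on numbered input from seq. This sketch assumes GNU sed, since first~step and addr,+N are GNU extensions. The last command illustrates the /oldboy/,10 note:

```shell
# seq 15 prints 1..15, one number per line, as stand-in input
a=$(seq 15 | sed -n '3,5p')            # lines 3 through 5
b=$(seq 15 | sed -n '8,+2p')           # line 8 and the 2 lines after it
c=$(seq 15 | sed -n '1~4p')            # first~step: lines 1, 5, 9, 13
d=$(seq 15 | sed -n '/^13$/,$p')       # from the line "13" to the last line
e=$(seq 15 | sed -n '/^4$/,/^6$/p')    # from "4" through "6"
# No line up to 10 matches, so each later match (12, 14) is a one-line range
f=$(seq 15 | sed -n '/^1[24]$/,10p')
echo "$a"; echo "$b"; echo "$c"; echo "$d"; echo "$e"; echo "$f"
```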
Deleting

d deletes the specified lines

sed 'd' person.txt                   # delete all lines
sed '2d' person.txt                  # delete line 2
sed '2,5d' person.txt                # delete lines 2 through 5
sed '3,$d' person.txt                # delete from line 3 to the end
sed '1~2d' person.txt                # delete lines 1, 3, 5, ...
sed '1,+2d' person.txt               # delete lines 1, 2, 3
sed '/zhangyao/d' person.txt         # delete lines that match zhangyao
sed '/oldboy/,/alex/d' person.txt    # delete from the line matching oldboy through the line matching alex
sed '/oldboy/,3d' person.txt         # delete from the line matching oldboy through line 3
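The delete commands behave the same way on any input. Here is a sketch on seq output, assuming GNU sed for the 1~2 and ,+1 forms:

```shell
x=$(seq 6 | sed '2,4d')          # deletes lines 2 through 4, leaving 1 5 6
y=$(seq 6 | sed '1~2d')          # deletes the odd lines, leaving 2 4 6
z=$(seq 6 | sed '3,$d')          # deletes line 3 to the end, leaving 1 2
w=$(seq 6 | sed '/^2$/,+1d')     # deletes "2" and the line after it
echo "$x"; echo "$y"; echo "$z"; echo "$w"
```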

Enterprise case 2: print the contents of a file, excluding lines that contain oldboy

sed '/oldboy/d' person.txt           # delete lines containing "oldboy"
Changing: replacing whole lines

c replaces the specified lines with new text

sed '2c 106,dandan,cso' person.txt   # replace the contents of line 2
Text substitution

s: used alone, replaces the first match on each line

g: replaces every match on each line

-i: modifies the file contents

sed substitution pattern (here the box ▇ is replaced with the triangle ▲)

sed -i 's/▇/▲/g' oldboy.log
sed -i 's#▇#▲#g' oldboy.log          # any character, such as #, can serve as the delimiter

Enterprise case 3: modifying a configuration file at a specified line

Specifying the line lets you modify the configuration file precisely and prevents changes in other places.

sed '3s#0#9#' person.txt
Variable substitution
x=a
y=b
echo $x $y
sed "s#$x#$y#g" test.txt
Using group substitution: \(\) and \1

The \(\) construct in sed remembers part of a regular expression: \1 is the first remembered pattern (the match inside the first pair of parentheses), \2 is the second (the match inside the second pair), and sed can remember up to 9.

Example: echo I am oldboy teacher. To keep the word oldboy and delete the rest of the line, mark the part you want to keep with parentheses.

echo I am oldboy teacher. | sed 's#^.*am \([a-z].*\) tea.*$#\1#g'
echo I am oldboy teacher. | sed -r 's#^.*am ([a-z].*) tea.*$#\1#g'
echo I am oldboy teacher. | sed -r 's#I (.*) (.*) teacher.#\1\2#g'

Explanation

Goal: replace the whole line "I am oldboy teacher." with the string oldboy.

In the explanation below, - stands for a space.

    1. ^.*am- : matches any characters from the start of the line through "am-", i.e. the "I am-" part of the line.
    2. \([a-z].*\) : the escaped parentheses \(\) form a group. [a-z] matches any one of the 26 lowercase letters, and [a-z].* matches a letter followed by any number of characters; here it matches oldboy. Because oldboy is the part to keep, the match is enclosed in parentheses so that \1 can fetch it later.
    3. -tea.*$ : matches a space, then "tea", then any characters up to the end of the line, i.e. the "-teacher." that follows oldboy.
    4. The \1 in the replacement is the text matched inside the parentheses, which is the oldboy string we want.
    5. () are extended regular-expression metacharacters. By default sed uses basic regular expressions, so to group you must escape them as \(\).
    6. With the -r option sed uses extended regular expressions, so plain () form the group; writing \(\) there no longer creates a group, and \1 then fails with an error.
Enterprise case 4: optimizing system boot services

chkconfig --list | grep "3:on" | grep -vE "sshd|crond|network|rsyslog|sysstat" | awk '{print $1}' | sed -r 's#^(.*)#chkconfig \1 off#g' | bash
chkconfig --list | grep "3:on"
The special symbol & stands for the matched text

#→ Replace c with --c-- on lines 1 to 3

sed '1,3s#c#--&--#g' person.txt      #→ here & stands for c
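This small sketch shows & with the same 1,3 address. It assumes GNU sed and uses a made-up four-line input. The fourth line contains c but lies outside the range, so it is left alone:

```shell
# & in the replacement is replaced by whatever the pattern matched
out=$(printf 'abc\ncab\nxyz\ncc\n' | sed '1,3s#c#--&--#g')
echo "$out"
```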

Enterprise case 5: batch renaming files

for i in `seq 5`; do touch stu_102999_${i}_finished.jpg; done
ls | sed -r 's/(.*)_finished(.*)/mv &
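The rename command above is cut off in the source. The following self-contained sketch shows what it appears to do, assuming the goal is to strip "_finished" from each name and that no file name contains spaces or shell metacharacters. sed turns each name into an "mv old new" command, and sh runs those commands. It works in a scratch directory, so it touches nothing else:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Create stu_102999_1_finished.jpg ... stu_102999_5_finished.jpg
for i in $(seq 5); do touch "stu_102999_${i}_finished.jpg"; done

# & is the whole old name; \1\2 is the name without "_finished"
ls *_finished.jpg | sed -r 's/(.*)_finished(.*)/mv & \1\2/' | sh

renamed=$(ls)
echo "$renamed"
cd / && rm -rf "$dir"
```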
Querying

p prints the specified content, but sed also prints every line by default, so matching lines appear twice; use -n to cancel the default output

Query by line
sed '2p' person.txt
sed -n '2p' person.txt
sed -n '2,3p' person.txt
sed -n '1~2p' person.txt
sed -n 'p' person.txt
Query by string
sed -n '/cto/p' person.txt
sed -n '/cto/,/cfo/p' person.txt
Mixed query
sed -n '2,/cfo/p' person.txt
sed -n '/feixue/,2p' person.txt
# Special case: if neither of the first two lines matches feixue, sed keeps searching, and each later line that matches feixue is printed.
Other features

Backup

sed -i.bak '$a 1111111111' xxx.txt

This backs up xxx.txt as xxx.txt.bak, then modifies the original file by appending the line 1111111111 at the end.
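Here is a sketch of the backup behavior in a scratch directory, assuming GNU sed. The xxx.txt used here is a throwaway two-line file:

```shell
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/xxx.txt"

# -i.bak edits the file in place and keeps the original as xxx.txt.bak
sed -i.bak '$a 1111111111' "$dir/xxx.txt"

edited=$(cat "$dir/xxx.txt")
backup=$(cat "$dir/xxx.txt.bak")
echo "$edited"; echo "$backup"
rm -rf "$dir"
```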

Saving

Write each entire line that contains SB to new.txt (the w command)

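Uppercase and lowercase conversion

\L #convert everything that follows to lowercase

\l #convert only the next character to lowercase

\U #convert everything that follows to uppercase

\u #convert only the next character to uppercase

\E #used together with \U and \L to turn them off

sed -r 's/(.*),(.*),(.*)/\L\3,\E\1,\U\2/g' xxx.txt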
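Here is a sketch of the case escapes, assuming GNU sed (\L, \U, \E, \l, and \u are GNU extensions) and a made-up comma-separated line:

```shell
line='Alice,bob,CHARLIE'

# \L lowercases until \E; \U uppercases until the end of the replacement
out1=$(echo "$line" | sed -r 's/(.*),(.*),(.*)/\L\3,\E\1,\U\2/')

# \u and \l change only the next character
out2=$(echo "$line" | sed -r 's/(.*),(.*),(.*)/\u\2 \l\1/')

echo "$out1"; echo "$out2"
```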

Performing multiple sed commands

sed -e '3,$d' -e 's#10#01#g' xxx.txt
sed '3,$d; s#10#01#g' xxx.txt

Printing invisible characters: l

sed -n 'l' xxx.txt
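Both ways of passing several commands, as well as the l command, can be checked on a tiny made-up input. This sketch assumes GNU sed:

```shell
input='10 a
110
10 c'

# Two -e options and one script joined with ";" run the same commands in order
a=$(printf '%s\n' "$input" | sed -e '3,$d' -e 's#10#01#g')
b=$(printf '%s\n' "$input" | sed '3,$d; s#10#01#g')

# l shows non-printing characters: a tab appears as \t and each line ends with $
shown=$(printf 'a\tb\n' | sed -n 'l')
echo "$a"; echo "$b"; echo "$shown"
```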

Replace abc with ABC (character by character)

tr 'abc' 'ABC' < xxx.txt
sed 'y#abc#ABC#' xxx.txt
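A quick check that the two commands agree. Note that tr reads only standard input, so a file has to be redirected into it (tr 'abc' 'ABC' < xxx.txt). The input string here is made up:

```shell
# Both map a->A, b->B, c->C and leave every other character alone
a=$(echo 'aabbcc xyz' | tr 'abc' 'ABC')
b=$(echo 'aabbcc xyz' | sed 'y#abc#ABC#')
echo "$a"; echo "$b"
```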

sed can operate on multiple files

sed 'y#abc#ABC#' xxx.txt 222.txt

Simulating other commands

Automatically uncomment settings and modify the path when creating an SVN repository
sed -i -r '12,13s/#//g' svnserve.conf
sed -i -r '20s/^#(.*)/\1/g' svnserve.conf
sed -i -r '27s/^#(.*)/\1/g' svnserve.conf
sed -i -r '12,13s/^#(.*)/\1/g' svnserve.conf
sed -i -r '32s/#(.*=)(.*)/\1 \/usr\/svndata\//' svnserve.conf
All in one command
SvnPath='zhangzhicheng'
sed -i -r -e '20s/^#(.*)/\1/g' -e '27s/^#(.*)/\1/g' -e '12,13s/^#(.*)/\1/g' -e "32s/#(.*=)(.*)/\1 \/usr\/svndata\/$SvnPath/" svnserve.conf
grep: the third of the "Three Musketeers"

"Feature description"

It searches text and filters lines by string; -v inverts the match.

"Option description"

Option      Description

-v          Print the lines that do not contain the specified content (invert the match)
-A n        Also print the n lines after each match
-B n        Also print the n lines before each match
-C n        Also print the n lines before and after each match
-n          Print line numbers
-E          Use extended regular expressions (same as egrep)
-o          Print only the matched part
-i          Ignore case
-a          Process a binary file as if it were text; add -a when grep decides a file is binary
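Here is a sketch that exercises these options on a made-up five-line input, assuming GNU grep:

```shell
text='alpha
beta
gamma
delta
epsilon'

v=$(printf '%s\n' "$text" | grep -v 'a$')          # lines not ending in "a"
n=$(printf '%s\n' "$text" | grep -n 'gamma')        # prefixed with the line number
after=$(printf '%s\n' "$text" | grep -A 1 'beta')
before=$(printf '%s\n' "$text" | grep -B 1 'delta')
around=$(printf '%s\n' "$text" | grep -C 1 'gamma')
i=$(printf '%s\n' "$text" | grep -i 'BETA')
o=$(printf '%s\n' "$text" | grep -oE 'l[a-z]')      # only the matched parts
echo "$v"; echo "$n"; echo "$after"; echo "$before"; echo "$around"; echo "$i"; echo "$o"
```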

"Basic examples"

Example 1: the file test.txt contains:

test

liyao

oldboy

Give a command that prints the contents of test.txt without the lines containing the string oldboy.

grep -v oldboy test.txt

Example 2: from /etc/services, select the lines that contain the database port 3306 or 1521

grep -E "3306|1521" /etc/services

Example 3:

"Tips"

Remove the blank lines from a file:

grep -v '^$' test.txt
egrep -o "^[^:]+" xxx.txt   # match the run of non-colon characters at the start of each line and print only that match (-o does not print the whole line)
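Here is a sketch of both tips on made-up passwd-style input. It uses grep -E rather than egrep, which recent GNU grep releases report as obsolescent. The two are otherwise equivalent here:

```shell
input='root:x:0

daemon:x:1

bin:x:2'

# -v '^$' drops empty lines; -o prints only the leading non-colon text
nonblank=$(printf '%s\n' "$input" | grep -v '^$')
names=$(printf '%s\n' "$input" | grep -oE '^[^:]+')
echo "$nonblank"; echo "$names"
```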

