Shell scripting in detail: the sed and awk command tools

Source: Internet
Author: User


-----------------------------Overview-----------------------------------
Linux text-processing tools (by default the commands below only display their results; they do not modify the input files)

grep (line filtering with basic regular expressions)
egrep (extended regular expression support; equivalent to grep -E)
sed (stream editor; line-oriented filtering and editing)
awk (field- or column-oriented processing)


-


The concept of regular expressions

Regular expression: a single string that describes, and matches, a set of strings that follow a certain syntactic rule.

Regular expressions are composed of ordinary characters and special characters (metacharacters). They are widely used in scripting languages and text editors, for example in PHP, Python and the shell, and are often abbreviated as regex or regexp. They are used to search for and replace text that matches a pattern, which makes them a powerful text-matching tool.

They let you find and process the text you need quickly and efficiently, even in very large amounts of text.


-


Regular expressions come in two levels:

1 Basic regular expressions (BRE)
2 Extended regular expressions (ERE)


-


A regular expression describes the rule that the strings you want to match follow.

For example, the format of an email address: [email protected]

Or a mobile phone number (mainland China format): 1[356789][0-9]{9}
The first digit is always 1, [356789] limits the second digit to that set, and [0-9]{9} matches the remaining nine digits, giving 11 digits in total.

Or a person's age: [0-1]{1}[0-2][0-9]|[0-9]{2}
Pattern form: [range of allowed characters]{number of repetitions}
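As a quick sanity check, the phone-number pattern can be tried with grep -E. The ^ and $ anchors are an addition for this sketch so that longer digit strings are rejected, and the sample numbers are invented.

```shell
# Sample inputs (made up): valid, bad second digit, too short.
printf '13812345678\n12812345678\n1381234567\n' |
  grep -E '^1[356789][0-9]{9}$'
# Only 13812345678 is printed: 1 + one digit from [356789] + nine digits = 11 digits.
```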

--------------Basic regular expression metacharacters--------------
Basic regular expressions are the most commonly used part of regular expressions.

Non-printing characters: non-printing characters can also be part of a regular expression. The following list shows the escape sequences that represent them (character: description):
  • \cX matches the control character indicated by X. For example, \cM matches Control-M, that is, a carriage return. X must be a letter in A-Z or a-z; otherwise c is treated as a literal 'c' character.
  • \f matches a form feed. Equivalent to \x0c and \cL.
  • \n matches a line feed (newline). Equivalent to \x0a and \cJ.
  • \r matches a carriage return. Equivalent to \x0d and \cM.
  • \s matches any whitespace character, including spaces, tabs and form feeds. Equivalent to [ \f\n\r\t\v]. In Unicode-aware regular expressions it also matches full-width whitespace characters.
  • \S matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
  • \t matches a tab. Equivalent to \x09 and \cI.
  • \v matches a vertical tab. Equivalent to \x0b and \cK.

Note: these escapes come from Perl/JavaScript-style regular expressions, and support in the Linux tools varies. GNU sed understands \n, \t, \f, \r, \v and \cX; GNU grep accepts \s and \S; grep -P (Perl-compatible mode) supports most of the list.
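A small sketch of two of these escapes in practice, assuming GNU sed and GNU grep (BSD and macOS versions may not accept them). The input lines are invented.

```shell
# GNU sed accepts \t in a regex: turn every tab into a comma.
printf 'a\tb\tc\n' | sed 's/\t/,/g'            # a,b,c

# GNU grep accepts \s (any whitespace) as an extension.
printf 'no_space\nhas space\n' | grep '\s'     # has space

# \S is the opposite; -o prints each non-whitespace match on its own line.
printf 'x y\n' | grep -o '\S'                  # x, then y
```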


-


Special characters: special characters are characters with a special meaning, such as the * in runoo*b, which means "zero or more of the preceding character". To search for a literal * in a string, escape it by putting a \ in front of it: runo\*ob matches runo*ob. Many metacharacters need this treatment when you want to match them literally: put a backslash \ in front of them to escape them. The following list shows the special characters in regular expressions (character: description):
  • $ matches the end of the input string. If multiline mode is enabled (the Multiline property of a JavaScript RegExp object), $ also matches before '\n' or '\r'. To match the $ character itself, use \$.
  • ( ) mark the start and end of a subexpression, whose match can be captured for later use. To match these characters themselves, use \( and \).
  • * matches the preceding subexpression zero or more times. To match the * character itself, use \*.
  • + matches the preceding subexpression one or more times. To match the + character itself, use \+.
  • . matches any single character except the newline \n. To match a literal ., use \.
  • [ marks the start of a bracket expression. To match [ itself, use \[.
  • ? matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character itself, use \?.
  • \ marks the next character as a special character, a literal character, a backreference or an octal escape. For example, 'n' matches the character 'n', '\n' matches a line feed, '\\' matches '\', and '\(' matches '('.
  • ^ matches the start of the input string; inside a bracket expression it negates the set, meaning "any character not in this set". To match the ^ character itself, use \^.
  • { marks the start of a quantifier expression. To match { itself, use \{.
  • | indicates a choice between two alternatives. To match | itself, use \|.

(This list uses extended/Perl-style syntax. In the basic syntax used by grep and sed by default, the roles are reversed for ?, +, {, |, ( and ): the bare characters are literal and the backslashed forms are special.)
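The difference between an escaped and an unescaped metacharacter is easy to see with grep. The sample lines are invented for the illustration.

```shell
# \* is a literal asterisk, so only the line that really contains "*" matches.
printf 'runo*ob\nrunoooob\n' | grep 'runo\*ob'    # runo*ob

# Unescaped, o* means "zero or more o", so the pattern now matches runoooob
# and no longer matches runo*ob.
printf 'runo*ob\nrunoooob\n' | grep 'runo*ob'     # runoooob

# \$ matches a literal dollar sign instead of the end of the line.
printf 'pay $ 888\nprice 5\n' | grep '\$'         # pay $ 888
```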


-


Quantifiers: a quantifier specifies how many times a given component of a regular expression must occur for a match. There are six of them: *, +, ?, {n}, {n,} and {n,m}. The quantifiers are (character: description):

  • * matches the preceding subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
  • + matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
  • ? matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do", the "does" in "does" and the "do" in "doxy". ? is equivalent to {0,1}.
  • {n}: n is a non-negative integer; match exactly n times. For example, 'o{2}' does not match the 'o' in 'Bob' but matches the two o's in 'food'.
  • {n,}: n is a non-negative integer; match at least n times. For example, 'o{2,}' does not match the 'o' in 'Bob' but matches all the o's in 'foooood'. 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
  • {n,m}: m and n are non-negative integers with n <= m; match at least n and at most m times. For example, 'o{1,3}' matches the first three o's in 'fooooood'. 'o{0,1}' is equivalent to 'o?'. There must be no spaces between the comma and the two numbers.
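The quantifier examples can be checked with grep -E (the same as egrep), which uses the extended syntax shown above. The ^ and $ anchors in the last two commands are added here so that whole lines are tested; the inputs are invented.

```shell
# o{2}: two o's in a row somewhere in the line (longer runs contain them too).
printf 'Bob\nfood\nfoooood\n' | grep -E 'o{2}'    # food, foooood

# zo+: z followed by one or more o's, so "z" alone is rejected.
printf 'z\nzo\nzoo\n' | grep -E '^zo+$'           # zo, zoo

# (es)?: the group "es" is optional.
printf 'do\ndoes\ndoxy\n' | grep -E '^do(es)?$'   # do, does
```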


-


To illustrate:
$ vim test01.sh    # sample file used in the examples below; its contents:
gd
god
good
goood
gooood
gold
glad
gaad
abcEfg
food
12345678Z
888-88888888
6666-6666666
IP 192.168.120.5
IP 119.75.217.109
pay $ 888

$ sed -n '/\$/p' test01.sh
pay $ 888

$ awk '/\$/' test01.sh
pay $ 888

$ grep '\$' test01.sh
pay $ 888

$ grep '^[a-z]' test01.sh    # lines starting with a lowercase letter

$ sed -n '/^[a-z]/p' test01.sh    # lines starting with a lowercase letter

$ grep '[0-9]$' test01.sh    # lines ending with a digit

$ grep 'go.d' test01.sh
good
gold


-


Common metacharacters (cont.)
*: matches the preceding subexpression zero or more times
Example: goo*d, go.*d

[list]: matches any one character in the list
Example: [abc], [a-z], [a-z0-9]

[^list]: matches any one character that is not in the list
Example: [^a-z], [^0-9], [^a-z0-9]

\{n,m\}: matches the preceding subexpression n to m times; it comes in three forms: \{n\}, \{n,\} and \{n,m\}
Example: go\{2\}d, go\{2,3\}d, go\{2,\}d
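In grep's basic syntax the interval braces must be written with backslashes. A short check on a few of the sample words; the function name words is just a local helper for this sketch.

```shell
words() { printf 'gd\ngod\ngood\ngoood\n'; }

words | grep 'go\{2\}d'     # good          (exactly two o's between g and d)
words | grep 'go\{2,\}d'    # good, goood   (two or more)
words | grep 'go*d'         # gd, god, good, goood   (zero or more)
```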


-


To illustrate:
$ grep 'goo*d' test01.sh
god
good
goood
gooood

$ grep 'go[la]d' test01.sh
gold

$ sed -n '/go[la]d/p' test01.sh
gold

$ awk '/go[la]d/' test01.sh
gold

$ grep '[0-9]\{3,4\}-[0-9]\{7,8\}' test01.sh
888-88888888
6666-6666666

$ grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' test01.sh
IP 192.168.120.5
IP 119.75.217.109

-

### PS:
### grep 'go\{2\}d' test01.sh
### Whenever a pattern contains \{ \} (or other backslash escapes), wrap it in single quotes. Without the quotes the shell removes the backslashes before grep sees the pattern, so the command no longer works as intended.
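The effect of the quotes can be seen directly. In this sketch the only difference between the grep commands is whether the shell gets to remove the backslashes:

```shell
# Quoted: grep receives go\{2\}d, where \{2\} means "exactly two".
echo good | grep 'go\{2\}d'               # good

# Unquoted: the shell turns go\{2\}d into go{2}d before grep runs.
# In basic syntax a bare { is literal, so "good" does not match...
echo good | grep go\{2\}d || echo 'no match'

# ...but the literal text go{2}d does.
echo 'go{2}d' | grep go\{2\}d             # go{2}d
```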


-


-----------------Extended regular expressions--------------
Extended regular expressions extend the basic ones with additional metacharacters.

+: matches the preceding subexpression one or more times
Example: go+d matches g, then at least one o, then d
?: matches the preceding subexpression zero or one time
Example: go?d matches gd or god

To illustrate:
$ grep 'go+d' test01.sh    # plain grep uses basic syntax, where + is a literal character, so nothing matches
$ egrep 'go+d' test01.sh
god
good
goood
gooood

$ awk '/go+d/' test01.sh
god
good
goood
gooood

$ sed -n '/go+d/p' test01.sh    # sed also uses basic syntax by default, so nothing matches (GNU sed -E or -r enables extended syntax)

$ awk '/go?d/' test01.sh
gd
god
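If you want sed or grep to accept + and ? without backslashes, switch them to extended syntax. A sketch assuming GNU sed and GNU grep (GNU grep also supports the backslashed \+ form in basic syntax as an extension); words is a local helper for the invented input.

```shell
words() { printf 'gd\ngod\ngood\n'; }

words | sed -n -E '/go+d/p'    # god, good   (-E: extended syntax)
words | grep -E 'go?d'         # gd, god
words | grep 'go\+d'           # god, good   (GNU basic-syntax extension)
```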


Extended metacharacters (cont.)

(): treats the string inside the parentheses as a single unit
Example: (xyz)+ matches xyz repeated one or more times, such as xyzxyz

|: matches any one of the alternatives (logical OR)
Example 1: good|food matches good or food
Example 2: g(oo|la)d matches good or glad
To illustrate:
$ egrep 'g(oo)+d' test01.sh
good
gooood

$ egrep 'g(la|aa)d' test01.sh
glad
gaad

$ egrep -v '#|^$|^:' /etc/ssh/sshd_config    # hide lines that contain #, blank lines and lines starting with :
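The same idea is often used to view only the active settings of a configuration file. Here a few invented lines stand in for a real file, and the pattern drops comment lines and blank lines:

```shell
printf '# comment\n\nPort 22\n#PermitRootLogin yes\nPasswordAuthentication no\n' |
  grep -Ev '^#|^$'
# Port 22
# PasswordAuthentication no
```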

 
 
----------------------------------The sed tool-----------------------------
Overview of sed:

sed is a text-processing tool that reads text and processes it according to the conditions you specify, for example deleting, replacing or adding lines.

sed is a stream editor: a very useful text-processing tool that works well together with regular expressions. It stores the line currently being processed in a temporary buffer called the pattern space, applies the sed commands to the contents of that buffer, and when processing is done sends the buffer contents to the screen (standard output). It then moves on to the next line, repeating until the end of the file. The file itself does not change unless you edit it in place with -i or redirect the output to save it. sed is mainly used to edit one or more files automatically, to simplify repetitive operations on files, and to write conversion programs.
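A minimal sketch of the point that sed writes to the screen and leaves the input file alone; mktemp just creates a throwaway sample file.

```shell
f=$(mktemp)
printf 'one\ntwo\n' > "$f"

sed 's/o/0/' "$f"    # 0ne, tw0   (only the first o on each line is replaced)
cat "$f"             # one, two   (the file itself is unchanged)

rm -f "$f"
```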


-


sed command format:
Command syntax
sed -e 'edit command' file1 file2 ...
sed -n -e 'edit command' file1 file2 ...
sed -i -e 'edit command' file1 file2 ...


-


Common options
-e specifies the edit command to run; it can be omitted when there is only one edit command
-n prints only the lines that a command explicitly prints (for example with p), instead of every line read
-i edits the file in place instead of printing the result to the screen
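The effect of -n and -i on a two-line throwaway file. In-place editing with a bare -i is GNU sed behaviour; BSD sed expects a backup-suffix argument.

```shell
f=$(mktemp)
printf 'a\nb\n' > "$f"

sed '2p' "$f"           # a, b, b   (automatic print plus the explicit p)
sed -n '2p' "$f"        # b         (-n turns off automatic printing)

sed -i 's/a/A/' "$f"    # prints nothing; the file is rewritten
cat "$f"                # A, b

rm -f "$f"
```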


-


Edit command format
[address1[,address2]]operation[argument]

"Address" can be a line number, a regular expression or $ (the last line); with no address, every line is processed

"Operation" can be p, d, s, r, w, i, etc.


-
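The kinds of address described above, tried on three invented lines (lines is a local helper for the sample input):

```shell
lines() { printf 'x1\ny2\nx3\n'; }

lines | sed -n '2p'       # y2       (line number)
lines | sed -n '/^x/p'    # x1, x3   (regular expression)
lines | sed -n '$p'       # x3       ($ = last line)
lines | sed -n '1,2p'     # x1, y2   (range address1,address2)
lines | sed 's/x/X/'      # X1, y2, X3   (no address: every line)
```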


sed commands:

a\: appends text after the current line.
i\: inserts text before the current line.
c\: replaces the selected lines with new text.
d: deletes the selected lines.
D: deletes the first line of the pattern space.
s: substitutes (replaces) the specified characters.
h: copies the contents of the pattern space into the hold space (an in-memory buffer).
H: appends the contents of the pattern space to the hold space.
g: copies the contents of the hold space into the pattern space, overwriting its text.
G: appends the contents of the hold space to the text of the pattern space.
l: lists the pattern space, showing non-printing characters in visible form.
n: prints the pattern space (unless -n is given) and replaces it with the next input line, so the following commands operate on the new line instead of the script starting over from the first command.
N: appends the next input line to the pattern space with a newline between the two, which advances the current line number.
p: prints the pattern space.
P: (uppercase) prints the first line of the pattern space.
q: quits sed.
b label: branches to the label in the script; if the label is omitted, branches to the end of the script.
t label: conditional branch; if a substitution has succeeded since the last input line was read or the last t or T command ran, branches to the label (or to the end of the script if no label is given).
T label: the opposite branch (a GNU extension); if no substitution has succeeded since the last input line was read or the last t or T command ran, branches to the label (or to the end of the script).
r file: reads the contents of file and appends them after the current line.
w file: writes the pattern space to file.
W file: writes the first line of the pattern space to file.
!: applies the following command to all lines that are not selected.
=: prints the current line number.
#: starts a comment that extends to the next newline.

"Argument": the most common one is g, which makes s replace every match on a line instead of only the first.


-
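A few of the sed commands listed above in action, on invented input (lines is a local helper; the output of l is shown as GNU sed prints it):

```shell
lines() { printf 'aaa\nbbb\n'; }

lines | sed 's/a/X/'      # Xaa, bbb   (first match on each line only)
lines | sed 's/a/X/g'     # XXX, bbb   (g: every match on the line)
lines | sed -n '$='       # 2          (= on the last line gives the line count)
lines | sed '1q'          # aaa        (quit after line 1)
printf 'a\tb\n' | sed -n 'l'   # a\tb$  (l makes the tab and the line end visible)
```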


sed metacharacters:
^: matches the start of a line, e.g. /^sed/ matches all lines beginning with sed.
$: matches the end of a line, e.g. /sed$/ matches all lines ending in sed.
.: matches any single character except a newline, e.g. /s.d/ matches s, followed by any one character, followed by d.
*: matches zero or more occurrences of the preceding character, e.g. / *sed/ matches lines in which sed is preceded by zero or more spaces.
[]: matches one character from the specified set, e.g. /[sS]ed/ matches sed and Sed.
[^]: matches one character that is not in the specified set, e.g. /[^A-RT-Z]ed/ matches lines where ed is preceded by a character that is not in A-R or T-Z.
\(..\): matches a substring and saves it, e.g. s/\(love\)able/\1rs/ replaces loveable with lovers.
&: stands for the matched text in the replacement, e.g. s/love/**&**/ turns love into **love**.
\<: matches the start of a word, e.g. /\<love/ matches lines containing a word that begins with love.
\>: matches the end of a word, e.g. /love\>/ matches lines containing a word that ends with love.
x\{m\}: repeats character x exactly m times, e.g. /0\{5\}/ matches lines containing five consecutive 0s.
x\{m,\}: repeats character x at least m times, e.g. /0\{5,\}/ matches lines containing at least five consecutive 0s.
x\{m,n\}: repeats character x at least m and at most n times, e.g. /0\{5,10\}/ matches lines containing 5 to 10 consecutive 0s.
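The substring, & and word-boundary examples from the list, run on sample words (\< and \> are GNU extensions):

```shell
echo loveable | sed 's/\(love\)able/\1rs/'        # lovers
echo love     | sed 's/love/**&**/'               # **love**
printf 'lovely\nglove\n' | sed -n '/\<love/p'     # lovely  (word starts with love)
printf 'lovely\nglove\n' | sed -n '/love\>/p'     # glove   (word ends with love)
printf '00000\n0000\n'   | sed -n '/0\{5\}/p'     # 00000
```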


-


Examples of sed usage:

Delete examples
sed '3,5d' bfile    # delete lines 3 to 5
sed '/xml/d' bfile    # delete lines containing xml
sed '/^install/d' bfile    # delete lines beginning with install
sed '/arch$/d' bfile    # delete lines ending with arch
sed '$d' bfile    # delete the last line
sed '/^$/d' bfile    # delete all blank lines
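The delete commands can be tried on generated input instead of bfile; seq simply prints the numbers 1 to n, one per line, and the last input is invented.

```shell
seq 6 | sed '3,5d'      # 1, 2, 6
seq 3 | sed '$d'        # 1, 2

# Several delete conditions combined in one script.
printf 'install pkg\nuse xml\n\nkeep me\nlast arch\n' |
  sed '/^install/d; /xml/d; /^$/d; /arch$/d'      # keep me
```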


-


To illustrate:
$ vim test01.sh    # the same sample file, shown with line numbers for reference
     1 gd
     2 god
     3 good
     4 goood
     5 gooood
     6 gold
     7 glad
     8 gaad
     9 abcEfg
    10 food
    11 12345678Z
    12 888-88888888
    13 6666-6666666
    14 IP 192.168.120.5
    15 IP 119.75.217.109
    16 pay $ 888

$ sed -n '12p' test01.sh
888-88888888

$ sed -n '3,5p' test01.sh
good
goood
gooood

$ sed -n 'p;n' test01.sh    # print odd-numbered lines
gd
good
gooood
glad
abcEfg
12345678Z
6666-6666666
IP 119.75.217.109

$ sed -n 'n;p' test01.sh    # print even-numbered lines
god
goood
gold
gaad
food
888-88888888
IP 192.168.120.5
pay $ 888

$ sed -n '1,5{p;n}' test01.sh    # print the odd-numbered lines among lines 1-5
gd
good
gooood

$ sed -n '$p' test01.sh    # print the last line
pay $ 888

$ sed -n -e '1p;10p' test01.sh    # print lines 1 and 10
gd
food

$ sed '16d' test01.sh    # delete line 16 from the output; the file itself is not changed
gd
god
good
goood
gooood
gold
glad
gaad
abcEfg
food
12345678Z
888-88888888
6666-6666666
IP 192.168.120.5
IP 119.75.217.109

$ cat -n test01.sh    # check whether the file has extra blank lines

$ sed -i '/^$/d' test01.sh    # edit the file in place: delete blank lines


-


Replacement examples:

sed 's/xml/XML/' bfile    # replace the first xml on each line with XML
sed 's/xml//g' bfile    # delete every xml in the file
sed '3,5s/xml/XML/g' bfile    # replace every xml on lines 3-5 with XML
sed '/xml/s/com/COM/g' bfile    # replace com with COM on all lines containing xml
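The replacement examples, applied to three invented lines rather than bfile (lines is a local helper):

```shell
lines() { printf 'xml and xml\nsite.com\nxml.com\n'; }

lines | sed 's/xml/XML/'          # XML and xml, site.com, XML.com
lines | sed 's/xml//g'            # " and ", site.com, .com
lines | sed '/xml/s/com/COM/g'    # xml and xml, site.com, xml.COM
```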



-



To illustrate:
$ sed '/^IP/s/^/#/' test01.sh    # add # to the start of lines beginning with IP

$ sed '11s/$/abc/' test01.sh    # append abc to the end of line 11

$ sed '5,10s/$/abc/' test01.sh    # append abc to the end of lines 5-10

$ sed '/^abcd/s/$/abc/' test01.sh    # append abc to the end of lines starting with abcd



-



Running multiple edit commands
sed -e '3,5p' -e '3,5s/xml/XML/g' bfile
Multiple edit commands can also be saved in a file and run with -f scriptfile, so that several processing operations are done in one pass
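A sketch of both ways of running several commands; the script file is a throwaway created with mktemp, and the input lines are invented.

```shell
script=$(mktemp)
printf 's/xml/XML/g\n/^$/d\n' > "$script"   # one edit command per line

printf 'xml\n\nmore xml\n' | sed -f "$script"                 # XML, more XML
printf 'xml\n\nmore xml\n' | sed -e 's/xml/XML/g' -e '/^$/d'  # same result

rm -f "$script"
```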

To illustrate:
$ sed '2cABC' test01.sh    # replace line 2 with ABC
$ sed '2aABC' test01.sh    # insert ABC as a new line after line 2

$ sed '5r /proc/version' test01.sh    # append the contents of /proc/version after line 5

$ sed '2iabc123' test01.sh    # insert abc123 as a new line before line 2

$ sed '15,16w out.txt' test01.sh    # save lines 15-16 to out.txt in the current directory

$ sed '/^IP/{H;d};$G' test01.sh    # move the lines starting with IP to the end: H appends each one to the hold space, d deletes it, and G pastes the hold space after the last line (preceded by an empty line, because the hold space starts out empty)

$ sed '1,5H;15,16G' test01.sh    # copy lines 1-5 into the hold space, then paste them after line 15 and again after line 16
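The hold-space trick is easier to follow on a short invented input. With H every IP line is collected; with lowercase h each line overwrites the previous one, so only the last IP line survives (input is a local helper):

```shell
input() { printf 'a\nIP 1\nb\nIP 2\nc\n'; }

input | sed '/^IP/{H;d};$G'    # a, b, c, (empty line), IP 1, IP 2
input | sed '/^IP/{h;d};$G'    # a, b, c, IP 2
```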






--------------Introduction to the awk tool------------------
awk is another powerful editing tool. Like sed, it can carry out quite complex text operations without any interaction. Command format:

awk options 'pattern or condition {edit commands}' file1 file2
awk -f scriptfile file1 file2

How it works

awk reads the text line by line, splits each line into fields (separated by spaces by default), stores the fields in built-in variables, and runs the edit commands on the lines that match the pattern or condition
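A minimal sketch of the pattern {action} structure and of -f, using invented input; the program file is a throwaway created with mktemp, and people is a local helper.

```shell
people() { printf 'alice 30\nbob 17\n'; }

people | awk '$2 >= 18 {print $1}'    # alice              (the condition selects the line)
people | awk '{print $2, $1}'         # 30 alice, 17 bob   (no condition: every line)

prog=$(mktemp)
echo '{print NF}' > "$prog"           # the program can also live in a file
people | awk -f "$prog"               # 2, 2
rm -f "$prog"
```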


-


To illustrate:

$ vim 6.txt
11 22 33 44
55 66 77 88

$ awk '{print $1}' 6.txt
11
55

$ awk '{print "---" $3}' 6.txt
---33
---77

$ sed -i 's/ /:/g' 6.txt    # replace every space with :
$ cat 6.txt
11:22:33:44:
55:66:77:88:

$ awk -F ':' '{print $3}' 6.txt    # print the third column, using ":" as the delimiter
33
77


-


awk built-in variables:

FS: the field separator for each line of text; by default a space or tab (can be set with -F)
NF: the number of fields in the line currently being processed
NR: the line number (record number) of the line currently being processed
$0: the entire line currently being processed
$n: the nth field (column n) of the line currently being processed
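The built-in variables on invented input; setting FS in a BEGIN block is equivalent to -F, and $NF refers to the last field.

```shell
printf 'a b c\nd e\n' | awk '{print NR, NF, $0}'     # "1 3 a b c", "2 2 d e"
printf 'x:y:z\n' | awk -F ':' '{print $2}'           # y
printf 'x:y:z\n' | awk 'BEGIN {FS=":"} {print $NF}'  # z
```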


-


To illustrate:

$ awk -F ':' '{print $3, NF}' 6.txt
33 5
77 5
$ cat 6.txt
11:22:33:44:    # the trailing : creates an empty fifth field, so the line has 5 fields
55:66:77:88:

$ awk -F ':' '{print NR, $3, NF}' 6.txt
1 33 5
2 77 5

$ awk '{print $0}' 6.txt
11:22:33:44:
55:66:77:88:
11:22:33:44:
55:66:77:88:


-


Examples of the awk tool:
Printing text
awk 'NR==1,NR==3 {print}' bfile    # print lines 1 to 3
awk 'NR==1 || NR==3 {print}' bfile    # print lines 1 and 3
awk '/^root/ {print}' /etc/passwd    # print lines starting with root

Printing text by field
awk '{print $1, $3}' bfile    # print the first and third fields of each line
awk -F ":" '$2=="" {print}' /etc/shadow    # print the shadow entries of users with an empty password (needs root)
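The line-selection conditions, checked against seq output instead of bfile. When a pattern has no {action}, awk prints the matching lines.

```shell
seq 5 | awk 'NR==2,NR==4'      # 2, 3, 4   (range from line 2 to line 4)
seq 5 | awk 'NR==1 || NR==5'   # 1, 5
seq 5 | awk 'NR % 2 == 0'      # 2, 4      (even-numbered lines)
seq 3 | awk '{print $1 * 10}'  # 10, 20, 30
```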



-



Examples:
$ awk 'NR==1,NR==3 {print}' test01.sh
gd
god
good

$ awk '(NR>=1) && (NR<=3) {print}' test01.sh    # print lines 1 to 3
gd
god
good

$ awk 'NR==1 || NR==3 {print}' test01.sh    # print lines 1 and 3
gd
good

$ awk '(NR%2)==0 {print}' test01.sh    # print even-numbered lines

$ awk '(NR%2)==1 {print}' test01.sh    # print odd-numbered lines

$ awk '/^I/' test01.sh    # lines starting with uppercase I

$ awk -F ':' '$2=="" {print}' /etc/shadow    # users whose password field is empty (needs root)

$ awk -F ':' '{print $1, $3, $4, $7}' /etc/passwd    # user name, UID, GID and login shell

$ free -m | awk '/cache:/'    # older versions of free print a "-/+ buffers/cache" line; newer procps-ng versions do not
-/+ buffers/cache: 71 925
$ free -m | awk '/cache:/ {print $3+$4}'

$ free -m | awk '/cache:/ {print int($3/($3+$4)*100)}'    # int() truncates to an integer; it does not round
6











