I'm back! Advanced usage of sed

Source: Internet
Author: User

The advanced usage of sed, posted on linuxfans.org by Watch_1394, is reposted here!

I saw the pinned posts on the forum, one of which is about sed. I read it, but it is not complete. I learned this command from the electronic edition of O'Reilly's sed & awk, so I have translated the book's material on advanced sed usage, drawing on my own experience. Of course, I would like to buy the book, but it is rare and expensive. I hope this work helps people who are as poor as I am. If there are errors, don't throw bricks at me; just point them out.
I have been busy designing a chemical process recently, so I can only work on this slowly and will post it part by part. If I have time, I may also cover the advanced usage of awk. I hope the moderators do not mind. I have never been able to take part in open source by writing software, so I hope I can help push it forward from another direction.

The original book can be found here:
http://www.unix.org.ua/orelly/

First, you should understand what the pattern space is. The pattern space is the buffer that holds the line just read, and sed processes the line of text in this buffer. Understanding this helps with everything that follows.
Normally, sed reads the line to be processed into the pattern space, and the commands in the script are applied to it one by one. When the end of the script is reached, the line is output and the pattern space is cleared. sed then reads the next line of the file and repeats the process until the whole file has been processed.
However, for various reasons, for example when the user wants a command in the script to run only under a certain condition, or wants the pattern space to be kept for the next cycle, sed may not follow this normal flow. For these cases sed provides some advanced commands.
In general, these commands fall into three groups:
1. N, D, and P: work with a multiline pattern space;
2. h, H, g, G, and x: save the contents of the pattern space in the hold space for later editing;
3. :, b, and t: implement branches and conditionals in the script.
Working with a multiline pattern space:
Because regular expressions are line-oriented, a phrase that starts at the end of one line and finishes at the beginning of the next is quite difficult to handle with commands such as grep. However, sed's multiline commands N, D, and P handle this task easily.
The multiline Next command (N) is the counterpart of the next command (n). The latter outputs the contents of the pattern space and then reads the next line into the pattern space; the script does not return to the top, but continues with the command after n. The former keeps the existing contents of the pattern space and appends the new line to them, with the two separated by a newline, "\n". After N is executed, control continues with the commands that follow N.
It is worth noting that with a multiline pattern space, the special characters "^" and "$" match the beginning and end of the whole pattern space, not the positions just after and before an embedded "\n".
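As a small illustration of the difference between n and N, here is a sketch run on a made-up four-line input (the sample text and the variable names with_n and with_N are not from the book). It puts ">" where "^" matches, so you can see what the substitution was applied to in each case.

```shell
# Made-up sample input with an even number of lines, so that
# neither n nor N is ever run on the last line of input.
input='one
two
three
four'

# n: print the current line, then replace it with the next line.
# The s command after n therefore only sees "two" and "four".
with_n=$(printf '%s\n' "$input" | sed 'n
s/^/>/')

# N: append the next line, so the pattern space holds "one\ntwo".
# ^ matches only at the start of the whole pattern space,
# not after the embedded newline.
with_N=$(printf '%s\n' "$input" | sed 'N
s/^/>/')

printf 'with n:\n%s\n\nwith N:\n%s\n' "$with_n" "$with_N"
```

With n the marker lands on every second line; with N it lands on the first line of each pair, because the pair is treated as one pattern space.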
Example 1:
$ cat expl.1
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.
Replace "Owner and Operator Guide" with "Installation Guide":
$ sed '/Operator$/{
> N
> s/Owner and Operator\nGuide /Installation Guide\
> /
> }' expl.1
In the example above, the pattern matches the embedded newline between the two lines with "\n". In addition, to put a newline into the replacement text, you must escape it with a backslash at the end of the line, as shown above.
Let's look at another example:
Example 2:
$ cat expl.2
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

Look in the Owner and Operator Guide shipped with your system.

Two manuals are provided including the Owner and
Operator Guide and the User Guide.

The Owner and Operator Guide is shipped with your system.
$ sed 's/Owner and Operator Guide/Installation Guide/
> /Owner/{
> N
> s/ *\n/ /
> s/Owner and Operator Guide */Installation Guide\
> /
> }' expl.2
The result is as follows:
Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.

Look in the Installation Guide shipped with your system.

Two manuals are provided including the Installation Guide
and the User Guide.

The Installation Guide is shipped with your system.
It may seem unnecessary to use two substitute commands. In fact, if you remove the first one and run the script, two problems appear in the output. First, the last line is not replaced (and some versions of sed do not even print it). The last line matches "Owner", so N is executed, but the end of the file has already been reached: some versions print the line and exit, while others exit immediately without printing. This can be solved with "$!N", which means that N is not executed on the last line. Second, the line beginning "Look in the" is split into two lines, and the blank line between it and the next paragraph disappears. This is because N appends the blank line and the embedded newline is then replaced. So the two substitutions are not redundant after all.
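The "$!N" idiom can be tried in isolation. The sketch below joins lines in pairs on a made-up three-line input (the input and the variable name joined are only for illustration). Because N is skipped on the last line, the odd line out is simply printed, whichever way a given sed treats N at end of file.

```shell
# Made-up input with an odd number of lines.
input='a
b
c'

# $!N: append the next line unless this is the last line.
# s/\n/ / then joins the pair, if there is one, with a space.
joined=$(printf '%s\n' "$input" | sed '$!N
s/\n/ /')

printf '%s\n' "$joined"
```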
Example 3:
$ cat expl.3
<para>

This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.

<Figure Begin>

v.11111111111111111111111_00000000000000011111111111111000000
100001000100100010001000001000000000000000000000000000000000000
000000

<Figure End>

<para>

More lines of text to be found after the figure.
These lines should print.
The sed command is as follows:
$ sed '/<para>/{
> N
> c\
> .LP
> }
> /<Figure Begin>/,/<Figure End>/{
> w fig.interleaf
> /<Figure End>/i\
> .FG\
> <insert figure here>\
> .FE
> d
> }
> /^$/d' expl.3
The result is as follows:
.LP
This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.
.FG
<insert figure here>
.FE
.LP
More lines of text to be found after the figure.
These lines should print.
The content between <Figure Begin> and <Figure End> is written to the file "fig.interleaf". Note that the d command does not affect the text inserted by the i command.
The d command deletes the contents of the pattern space and reads a new line, and the sed script starts again from the top. The D command differs in that it deletes only the part of the pattern space up to and including the first embedded newline; it does not read a new line, and the script returns to the top to process what remains.
Example 4:
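$ cat expl.4
This line is followed by 1 blank line.

This line is followed by 2 blank lines.


This line is followed by 3 blank lines.



This line is followed by 4 blank lines.




This is the end.
The two delete commands give different results:
$ sed '/^$/{
> N
> /^\n$/D
> }' expl.4
$ sed '/^$/{
> N
> /^\n$/d
> }' expl.4
With D, every run of blank lines is reduced to a single blank line. With d, a run containing an even number of blank lines disappears completely, because the blank lines are deleted two at a time.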
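To compare the two delete commands side by side, here is a sketch on a shorter made-up input (the text and the variable names with_d and with_D are only for illustration), with runs of one, two, and three blank lines.

```shell
# Made-up input: "line 1" is followed by 1 blank line,
# "line 2" by 2 blank lines and "line 3" by 3 blank lines.
input='line 1

line 2


line 3



line 4'

# d: when two blank lines have been joined, delete both and
# start a new cycle with fresh input.
with_d=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/d
}')

# D: delete only up to the first embedded newline and rerun
# the script on the blank line that remains.
with_D=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/D
}')

printf 'with d:\n%s\n\nwith D:\n%s\n' "$with_d" "$with_D"
```

In the d output the two blank lines after "line 2" vanish entirely, while the D output keeps exactly one blank line after every text line.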
By default sed outputs every line of the file, whether processed or not. With the "-n" option this output is suppressed, and you then need a print command to produce output. The print command for a single-line pattern space is "p", and the print command for a multiline pattern space is "P". The P command prints the part of the pattern space up to the first embedded newline.
The P command usually appears after the N command and before the D command, forming an input/output loop. In this loop the pattern space always holds two lines of text, and the output is always one line. The purpose of the loop is to output the first line of the pattern space, return to the top of the script, and then process the second line. Without this loop, the whole pattern space would be output when the end of the script is reached, which may not be what the user wants and may make the program less efficient.
The following is an example:
Example 5:
$ cat expl.5
Here are examples of the UNIX
System. Where UNIX
System appears, it should be the UNIX
Operating System.
$ sed '/UNIX$/{
> N
> /\nSystem/{
> s// Operating &/
> P
> D
> }
> }' expl.5
The replacement result is:
Here are examples of the UNIX Operating
System. Where UNIX Operating
System appears, it should be the UNIX
Operating System.
You can replace "P" and "D" in the sed command with their lowercase forms to compare the two kinds of command.
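The N, P, D loop shows up in many short scripts. As one more sketch (this one is not from the book; the input and the variable name deduped are made up), the script below keeps a two-line window in the pattern space and prints the first line only when it differs from the second, which removes consecutive duplicate lines.

```shell
# Made-up input with consecutive duplicates.
input='a
a
b
c
c'

# $!N : append the next line (except on the last line)
# P   : print the first line unless both lines are identical
# D   : drop the first line and rerun the script on the rest
deduped=$(printf '%s\n' "$input" | sed '$!N
/^\(.*\)\n\1$/!P
D')

printf '%s\n' "$deduped"
```

At any moment the pattern space holds at most two lines and each pass through the script outputs at most one, which is the input/output loop described above.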
The following example is quite difficult:
Example 6:
$ cat expl.6
I want to see @f1(what will happen) if we put the
font change commands @f1(on a set of lines). If I understand
things (correctly), the @f1(third) line causes problems. (No?).
Is this really the case, or is it (maybe) just something else?

Let's test having two on a line @f1(here) and @f1(there) as
well as one that begins on one line and ends @f1(somewhere
on another line). What if @f1(it is here) on the line?
Another @f1(one).
What we want to do is replace each "@f1(...)" with "\fB...\fR". The following sed command does the job:
$ sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g
> /@f1(.*/{
> N
> s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
> P
> D
> }' expl.6
However, if we do not use this input/output loop and use N alone, problems can occur:
$ sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g
> /@f1(.*/{
> N
> s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
> }' expl.6
Here the line appended by N is output without ever passing through the first substitution, so a complete "@f1(...)" on that line, such as "@f1(it is here)" in the sample, is left unchanged. Scripts like this are fragile.

Holding lines:
The pattern space was defined earlier. sed also has a second buffer, called the hold space. The contents of the pattern space and the hold space can be copied back and forth with this set of commands:
Command     Abbreviation    Function
Hold        h or H          Copy or append the contents of the pattern space to the hold space
Get         g or G          Copy or append the contents of the hold space to the pattern space
Exchange    x               Swap the contents of the pattern space and the hold space
The difference between upper and lower case is that the uppercase commands append the source to the target, while the lowercase commands overwrite the target with the source. Note that H and G both add a newline after the existing contents of the target and then append the source after that newline.
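The overwrite/append distinction is easy to see on a two-line input. The following is a minimal sketch with made-up input and variable names (doubled, overwritten, appended), not an example from the book.

```shell
# Made-up two-line input.
input='first
second'

# G on every line appends "\n" plus the (still empty) hold
# space, so each line is followed by an empty line.
doubled=$(printf '%s\n' "$input" | sed G)

# h saves line 1; g on line 2 overwrites "second" with it.
overwritten=$(printf '%s\n' "$input" | sed '1h
2g')

# h saves line 1; G on line 2 appends it after a newline.
appended=$(printf '%s\n' "$input" | sed '1h
2G')

printf '%s\n---\n%s\n---\n%s\n' "$doubled" "$overwritten" "$appended"
```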
The following example shows the preliminary application of this part of content:
Example 7:
$ cat expl.7
1
2
11
22
111
222
The task is to swap the first line with the second, the third with the fourth, and the fifth with the sixth. The sed command is:
$ sed '
> /1/{
> h
> d
> }
> /2/{
> G
> }' expl.7
The process is as follows: sed reads the first line into the pattern space, the h command copies it to the hold space, and the d command clears the pattern space. Next, sed reads the second line into the pattern space, and the G command appends the contents of the hold space to it (note that a newline is added after the existing contents of the pattern space).
The final result is as follows:
2
1
22
11
222
111
When using the h or H command, it is common to follow it with the d command, so that the script never reaches its end for that line and the pattern space is not output. Also, if you change d to n, or G to g, the script no longer achieves its goal.
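The same save-and-retrieve idea extends beyond swapping pairs. As a sketch that is not taken from the book (input and the variable name reversed are made up), the script below uses -n instead of d to suppress output, builds the whole input in the hold space in reverse order, and prints it once at the last line.

```shell
# Made-up three-line input.
input='1
2
3'

# 1!G : on every line but the first, append what was saved so far
# h   : save the pattern space (newest line first) in the hold space
# $p  : on the last line, print the accumulated result
reversed=$(printf '%s\n' "$input" | sed -n '1!G
h
$p')

printf '%s\n' "$reversed"
```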
What is the most convenient way to convert letter case? Probably tr.
$ tr "[a-z]" "[A-Z]" < file
sed can also perform this conversion. The corresponding command is y:
$ sed '
> [address]y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' file
However, the y command transforms the whole line, so it cannot be used directly to change the case of only a few characters in a line. To do that, you need the hold and get commands described above.
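As a quick check of the tr and y forms, here is a sketch on made-up sample lines (the text and the variable names are only for illustration). The last command restricts y with an address, which still converts whole lines, only fewer of them.

```shell
line='find the match statement'

# tr converts every lowercase letter on standard input.
with_tr=$(printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]')

# y maps each character of the first list to the one in the
# same position of the second list, over the whole line.
with_y=$(printf '%s\n' "$line" |
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# With an address, y is applied only to the matching lines.
input='find the match statement
consult the get statement'
addressed=$(printf '%s\n' "$input" |
  sed '/match/y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

printf '%s\n%s\n%s\n' "$with_tr" "$with_y" "$addressed"
```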
Example 8:
$ cat expl.8
find the Match statement
Consult the Get statement.
using the Read statement to retrieve data
$ sed '/the .* statement/{
> h
> s/.*the \(.*\) statement.*/\1/
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> G
> s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
> }' expl.8
Taking the first line as an example, the command works as follows:
(1) "find the Match statement" is copied into the hold space;
(2) the substitution reduces the line to: Match;
(3) the result of (2) is converted to uppercase: MATCH;
(4) the content saved in the hold space in (1) is appended to the pattern space, which now holds:
MATCH\nfind the Match statement
(5) the substitution turns the pattern space into: find the MATCH statement.
The following example uses some hefty regular expressions, but don't worry: every difficulty can be worked through. The text this example was written for has to do with editing and formatting, which I do not do myself, so I have taken only the sed script, keeping the core and leaving out the details:
Example 9:
$ cat expl.9.sed
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\\.XX \/s\/\1\//
G
s/\n//
(1) h: copy the line of text into the hold space.
(2) s/[][\\*.]/\\&/g: this expression is the hard one. In a bracket expression, "]" loses its special meaning when it is the first character inside "[]". Inside "[]", "*" and "." are also taken literally; outside the brackets they are special and must be escaped with "\" to be matched literally. The script doubles the backslash inside the brackets; POSIX treats "\" as an ordinary character in a bracket expression, so "\\" there still matches a backslash. Although it does not occur in this expression, it is worth adding that inside "[]" a "^" means "not" only when it is the first character and is literal anywhere else, and that "$" is special only at the end of a regular expression. In the replacement, "\\" stands for a literal "\" and "&" stands for the matched text. So the second command replaces "[", "]", "\", "*", and "." with "\[", "\]", "\\", "\*", and "\.".
(3) x: swap the pattern space and the hold space. After this command the pattern space holds the original text, and the hold space holds the changed text in which every special character is preceded by "\".
(4) s/[\\&]/\\&/g: works on the pattern space; each "\" or "&" is replaced with "\\" or "\&".
(5) s/^\.XX // and s/$/\//: remove the leading ".XX " and add a "/" at the end of the pattern space.
(6) x: swap the contents of the two spaces again.
(7) s/^\\\.XX \(.*\)$/\/^\\\.XX \/s\/\1\//: nothing difficult here, only escapes that are easy to get lost in. Read it carefully and it poses no problem, so I will skip it.
(8) G: omitted.
(9) s/\n//: delete the newline.
What is this script for? Try it on the following text:
.XX "asterisk (*) metacharacter"
The results after each command are shown below. In each pair, the first line is the pattern space and the second is the hold space:
1. .XX "asterisk (*) metacharacter"
   .XX "asterisk (*) metacharacter"

2. \.XX "asterisk (\*) metacharacter"
   .XX "asterisk (*) metacharacter"

3. .XX "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"

4. .XX "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"

5. "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"

6. "asterisk (*) metacharacter"/
   \.XX "asterisk (\*) metacharacter"

7. \.XX "asterisk (\*) metacharacter"
   "asterisk (*) metacharacter"/

8. /^\.XX /s/"asterisk (\*) metacharacter"/
   "asterisk (*) metacharacter"/

9. /^\.XX /s/"asterisk (\*) metacharacter"/\n"asterisk (*) metacharacter"/

10. /^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/

Note that "s/[\\&]/\\&/g" does nothing in our example, but it is indispensable: in the replacement part of the s command, "\" and "&" have special meanings, so they must be escaped in advance.
Do you follow? When you want a shell script to generate a sed script of substitute commands automatically, you will find that this handling of special characters is essential.
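To watch the script produce its result, the sketch below feeds the sample entry through the ten commands and then runs the generated command back against the same entry. The variable names entry, generated, and roundtrip are made up for this illustration, and the script is passed inline rather than read from expl.9.sed.

```shell
entry='.XX "asterisk (*) metacharacter"'

# Run the Example 9 commands on the entry. Inside single quotes
# the shell leaves every backslash for sed to interpret.
generated=$(printf '%s\n' "$entry" | sed '
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\\.XX \/s\/\1\//
G
s/\n//')

printf '%s\n' "$generated"

# The generated text is itself a sed command. Until someone edits
# its replacement part it rewrites the entry to the same text, so
# the entry should come back unchanged.
roundtrip=$(printf '%s\n' "$entry" | sed "$generated")
printf '%s\n' "$roundtrip"
```

Because the special characters were escaped on the pattern side only, the generated command matches the entry literally, and whatever is later typed into its replacement side is what the entry becomes.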
With what we have covered so far, the hold space can even store many lines for later output. This is very effective for HTML and other text with an obvious structure. The following is an example:
Example 10:
$ cat expl.10
<p>My wife won't let me buy a power saw. She is afraid of an
accident if I use one.
So I rely on a hand saw for a variety of weekend projects like
building shelves.
However, if I made my living as a carpenter, I would
have to use a power
saw. The speed and efficiency provided by power tools
would be essential to being productive.</p>

<p>For people who create and modify text files,
sed and awk are power tools for editing.</p>

<p>Most of the things that you can do with these programs
can be done interactively with a text editor. However,
using these programs can save you hours of repetitive
work in achieving the same result.</p>

$ sed '/^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
Run this command and see what the result is. In fact, the result itself is not that important; what we should learn from this small program is the idea of flow control in a script. The "!" in the first part of the script means "process the lines that do not match". Because of the "d", processing of these lines never reaches the bottom of the script, so nothing is output for them. In the second part, the script does run to the end, and in the process the pattern space and the hold space are both cleared, ready for the next paragraph.
The example is complete, but one case remains: what happens if the last line of the file is not blank? Obviously, the last paragraph of the text will never be output. How can this be solved? The smartest way is to "create" a blank line. The new script is as follows:
$ sed '${
> /^$/!{
> H
> s/.*//
> }
> }
> /^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
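To see the blank-line trick at work, here is a sketch that runs the script above on a short made-up input of untagged paragraphs whose last line is not blank (the text and the variable names text and html are only for illustration).

```shell
# Made-up input: two paragraphs, no trailing blank line.
text='first paragraph,
second line.

last paragraph.'

html=$(printf '%s\n' "$text" | sed '${
/^$/!{
H
s/.*//
}
}
/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}')

printf '%s\n' "$html"
```

On the last line the first block saves the text and empties the pattern space, so the line then looks like a blank line to the rest of the script and the final paragraph is flushed like the others.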

Flow control commands
To give the user more freedom when writing sed scripts, sed allows labels to be set in a script with ":", and the "b" and "t" commands then control the flow. As the names suggest, "b" stands for "branch" and "t" for "test": the former is the branch command, and the latter is the test command.
First, labels. A label marks the place you want the flow to jump to. It goes on a line by itself and begins with a colon. No spaces or tabs are allowed between the colon and the label name, and a space at the end of the label is treated as part of the label.
Next, the b command. Its format is:
[address]b[label]
If the address matches, control jumps to the label: when a label is given, execution continues with the command that follows the label; when no label is given, control jumps straight to the end of the script. If the address does not match, the commands after b run as usual.
In some cases the b command resembles the ! command. The ! command only affects the commands in the {} that follows it, whereas b gives the user full freedom to choose which commands in the script are executed and which are not. Here are several classic uses of the b command:
(1) Create a loop:
:top
command1
command2
/pattern/b top
command3
(2) Skip some commands when a condition is met:
command1
/pattern/b end
command2
:end
command3
(3) Execute only one of two sets of commands:
command1
/pattern/b dothere
command2
b
:dothere
command3
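The loop pattern in (1) can be sketched on a concrete task that is not from the book: joining lines that end with a backslash to the line that follows them. The input and the variable name joined are made up for this illustration.

```shell
# Made-up input: the first two lines end with a backslash.
joined=$(printf '%s\n' 'one \' 'two \' 'three' 'four' | sed ':top
/\\$/{
N
s/\\\n//
b top
}')

printf '%s\n' "$joined"
```

While the pattern space still ends with a backslash, "b top" sends control back to the label, so any number of continuation lines is folded into one before the line is output.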
The format of the t command is the same as that of the b command:
[address]t[label]
If the address matches and a substitution has succeeded on the current input line, control jumps to the label given to the t command. The label rules are the same as for the b command above. The following is an example:
s/pattern/replacement/
t break
command
:break
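Combined with a label placed before the substitution, t gives a "repeat until nothing changes" loop. The sketch below is not from the book; it inserts thousands separators into made-up numbers, one comma per pass, and the variable name grouped is only for illustration.

```shell
# Each pass inserts one comma before the last three digits of the
# leading run of digits; t a repeats while a substitution was made.
grouped=$(printf '%s\n' 1234567 42 1000 | sed ':a
s/^\([0-9]*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t a')

printf '%s\n' "$grouped"
```

For 1234567 the first pass gives 1234,567 and the second gives 1,234,567; the third pass changes nothing, so t does not branch and the line is output.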
Consider the sed script from Example 6 again. On closer inspection it is not powerful enough: what if an @f1 construct spans more than two lines, for example three? That calls for the following enhanced version:
$ cat expl.6.sed
:begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D
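As a check of the three-line case, the sketch below runs the enhanced script on a small made-up input in which one @f1 construct spans three lines and another fits on one line. The sample text and the variable names sample and out are not from the book, and the script is passed inline rather than read from expl.6.sed.

```shell
# Made-up input: one construct spanning three lines, one on a single line.
sample='Start @f1(one
two
three) end.
Plain @f1(x) line.'

out=$(printf '%s\n' "$sample" | sed ':begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D')

printf '%s\n' "$out"
```

After the first N the closing parenthesis is still missing, so the substitution fails, t does not branch, and "b begin" goes round again to append a third line. Only then does the substitution succeed and t hand control to the P and D at the end.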
 
