Sed by example, part 1

Common threads: Sed by example, part 1

Reposted from: IBM developerWorks Chinese website


Sed is a very powerful and compact text stream editor. In this article, the second in the series, Daniel Robbins shows you how to use sed to perform string substitution, create larger sed scripts, and use sed's append, insert, and change line commands.

Sed is a very useful (but often forgotten) UNIX stream editor. It is ideal for batch-editing files or for creating shell scripts that modify existing files in powerful ways. This article builds on the previous article introducing sed.

Substitution!
Let's take a look at one of sed's most useful commands, the substitution command. It replaces a particular string or matched regular expression with another string. The following is an example of the most basic usage of the command:

 $ sed -e 's/foo/bar/' myfile.txt 

The above command replaces the first occurrence of 'foo' on each line of myfile.txt (if any) with the string 'bar', and then writes the file contents to standard output. Note that I said the first occurrence on each line, although this is normally not what you want. When doing string replacement, you usually want to perform a global replacement. That is, you want to replace all occurrences on every line, as follows:

$ sed -e 's/foo/bar/g' myfile.txt 

The 'g' option appended after the last slash tells sed to perform a global replacement.
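As a quick check of the difference, here is a minimal sketch that pipes one sample line through both forms of the command. The sample text is invented for illustration and is not the content of myfile.txt.

```shell
# Sample input invented for this illustration.
line='foo one foo two foo'

# Without 'g': only the first 'foo' on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')

# With 'g': every 'foo' on the line is replaced.
global=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

printf '%s\n' "$first_only"   # bar one foo two foo
printf '%s\n' "$global"       # bar one bar two bar
```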

There are several other things to understand about the 's///' substitution command. First, it is a command, and only a command; none of the previous examples specify an address. This means that 's///' can also be used with addresses to control which lines the command is applied to, as shown below:

 $ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt 

The above example replaces every occurrence of the phrase 'enchantment' with the phrase 'entrapment', but only on lines one through ten, inclusive.

 $ sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt 

This example replaces 'hills' with 'mountains', but only in blocks of text that begin with a blank line and end with a line beginning with the three characters 'END', inclusive.
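To see how the two kinds of address limit a substitution, the following sketch runs both on the same five lines of input. The input is made up for this illustration (its third line is empty and its fourth begins with END), and the line range is shortened to 1,2 to fit it.

```shell
# Five made-up lines: the third is empty, the fourth starts with END.
input=$(printf 'hills a\nhills b\n\nEND hills\nhills c')

# Line-number range: only lines 1 and 2 are touched.
by_number=$(printf '%s\n' "$input" | sed -e '1,2s/hills/mountains/g')

# Regular expression range: from the empty line through the END line.
by_regex=$(printf '%s\n' "$input" | sed -e '/^$/,/^END/s/hills/mountains/g')

printf '%s\n---\n%s\n' "$by_number" "$by_regex"
```

In the second result only the "END hills" line changes, because it is the only line inside the range that contains 'hills'.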

Another nice thing about the 's///' command is that there are many options for the '/' separator. If you are performing string replacement and the regular expression or replacement string contains many slashes, you can change the separator by specifying a different character after the 's'. For example, the following command replaces every occurrence of /usr/local with /usr:

 $ sed -e 's:/usr/local:/usr:g' mylist.txt 

In this example, the colon is used as the separator. If you ever need to use the separator character itself in the regular expression, put a backslash in front of it.
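The sketch below compares the slash and colon forms on an invented path list, and then shows a backslash protecting a colon that must be matched literally. The sample strings are assumptions for illustration only.

```shell
# Sample path list invented for this illustration.
paths='/usr/local/bin /usr/local/lib'

# With '/' as the separator, every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$paths" | sed -e 's/\/usr\/local/\/usr/g')

# With ':' as the separator, the slashes need no escaping.
with_colon=$(printf '%s\n' "$paths" | sed -e 's:/usr/local:/usr:g')

# A literal ':' inside the pattern is written as '\:'.
escaped=$(printf 'key:value\n' | sed -e 's:key\:value:key=value:')

printf '%s\n%s\n%s\n' "$with_slash" "$with_colon" "$escaped"
```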

Regular expression snafus
So far, we have only performed simple string replacement. Although this is handy, we can also match a regular expression. For example, the following sed command matches a phrase that begins with '<', ends with '>', and contains any number of characters in between. In the following example, this phrase is deleted (replaced with an empty string):

 $ sed -e 's/<.*>//g' myfile.html 

This is a good first attempt at a sed script that removes HTML tags from a file, but it does not work well because of a quirk of regular expressions. Why? When sed tries to match a regular expression on a line, it finds the longest match on the line. This was not a problem in my previous sed article, because we used the 'd' and 'p' commands, which always delete or print the entire line anyway. With the 's///' command, however, it makes a big difference, because the entire portion matched by the regular expression is replaced with the target string or, in this example, deleted. This means that the previous example turns the following line:

 <b>This</b> is what <b>I</b> meant. 

into:

 meant. 

rather than what we wanted:

 This is what I meant. 

Fortunately, there is an easy way to fix this. Instead of typing a regular expression that says "a '<' character followed by any number of characters and ending with a '>' character", we only need to type a regular expression that says "a '<' character followed by any number of non-'>' characters and ending with a '>' character". This will match the shortest possible match rather than the longest one. The new command is as follows:

 $ sed -e 's/<[^>]*>//g' myfile.html 

In the preceding example, '[^>]' specifies a "non-'>'" character, and the '*' after it completes the expression to mean "zero or more non-'>' characters". Test this command on a few sample HTML files, pipe the output to "more", and review the results carefully.
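The two commands can be compared directly on the sample line used above. This is a minimal sketch that feeds the line through a pipe instead of reading myfile.html.

```shell
html='<b>This</b> is what <b>I</b> meant.'

# '<.*>' grabs everything from the first '<' to the last '>' on the line.
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')

# '<[^>]*>' cannot run past a '>', so each tag is matched separately.
tags_only=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

printf '[%s]\n' "$greedy"      # [ meant.]
printf '[%s]\n' "$tags_only"   # [This is what I meant.]
```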

More character matching
The '[]' regular expression syntax has some additional options. To specify a range of characters, you can use a '-' as long as it is not in the first or last position, as follows:

 '[a-x]*' 

This will match zero or more characters, as long as all of them are 'a', 'b', 'c' ... 'v', 'w', or 'x'. In addition, you can use the '[:space:]' character class to match whitespace. The following is a fairly complete list of available character classes:


Character class   Description
[:alnum:]         Alphanumeric [a-z A-Z 0-9]
[:alpha:]         Alphabetic [a-z A-Z]
[:blank:]         Spaces or tabs
[:cntrl:]         Any control characters
[:digit:]         Numeric digits [0-9]
[:graph:]         Any visible characters (no whitespace)
[:lower:]         Lowercase [a-z]
[:print:]         Non-control characters
[:punct:]         Punctuation characters
[:space:]         Whitespace
[:upper:]         Uppercase [A-Z]
[:xdigit:]        Hex digits [0-9 a-f A-F]

It is advantageous to use character classes whenever possible, because they adapt better to non-English locales (including accented characters where necessary, and so on).
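Note that in a sed script a character class name is written inside a bracket expression, which gives the doubled brackets seen in '[[:space:]]'. The following sketch applies two of the classes from the table to an invented sample string containing a space, two digits, a colon, and a tab.

```shell
# Invented sample: "id 42:" followed by a tab and "ok".
sample=$(printf 'id 42:\tok')

# Replace each digit with the letter N.
digits_masked=$(printf '%s\n' "$sample" | sed -e 's/[[:digit:]]/N/g')

# Replace each whitespace character (space or tab) with an underscore.
spaces_marked=$(printf '%s\n' "$sample" | sed -e 's/[[:space:]]/_/g')

printf '%s\n%s\n' "$digits_masked" "$spaces_marked"
```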

Advanced replacement
We have seen how to perform simple and even reasonably complex substitutions, but sed can do more. In fact, you can refer to either parts of or the entire matched regular expression and use those parts to construct the replacement string. For example, say you are replying to a message. The following example prefixes each line with the phrase "ralph said: ":

 $ sed -e 's/.*/ralph said: &/' origmsg.txt 

The output is as follows:

 ralph said: Hiya Jim,
 ralph said:
 ralph said: I sure like this sed stuff!
 ralph said:

The replacement string in this example uses the '&' character, which tells sed to insert the entire matched regular expression. So whatever was matched by '.*' (the largest group of zero or more characters on the line, that is, the entire line) can be inserted anywhere in the replacement string, even multiple times. This is great, but sed is even more powerful.
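Here is a short sketch of '&' in a few positions, using invented one-line inputs. The last command relies on the fact that a backslash in front of '&' makes it a literal ampersand in the replacement.

```shell
# Prefix a line with a fixed phrase; '&' stands for the whole match.
quoted=$(printf 'Hiya Jim,\n' | sed -e 's/.*/ralph said: &/')

# '&' may appear more than once in the replacement.
tripled=$(printf 'echo\n' | sed -e 's/.*/& & &/')

# A backslash makes '&' a literal ampersand in the replacement.
literal=$(printf 'R and D\n' | sed -e 's/ and / \& /')

printf '%s\n%s\n%s\n' "$quoted" "$tripled" "$literal"
```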

Those wonderful backslashed parentheses
Even better than '&', the 's///' command allows us to define regions in the regular expression and then refer to these specific regions in the replacement string. For example, say we have a file that contains the following text:

 foo bar oni
 eeny meeny miny
 larry curly moe
 jimmy the weasel

Now suppose you want to write a sed script that replaces "eeny meeny miny" with "Victor eeny-meeny Von miny", and so on. To do this, first write a regular expression that matches three strings separated by spaces:

 '.* .* .*' 

Now we define the regions by inserting backslashed parentheses around each region of interest:

 '\(.*\) \(.*\) \(.*\)' 

This regular expression works the same way as the first one, except that it also defines three logical regions that can be referenced in the replacement string. The following is the final script:

 $ sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt 

As you can see, we refer to each parentheses-delimited region by typing '\x', where x is the number of the region, starting at one. The output is as follows:

 Victor foo-bar Von oni
 Victor eeny-meeny Von miny
 Victor larry-curly Von moe
 Victor jimmy-the Von weasel

As you become more familiar with sed, you will be able to perform fairly powerful text processing with minimal effort. You may want to think about how you would approach this problem in your favorite scripting language: could you easily fit the solution on one line?
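The sketch below runs the final script on one of the sample lines and then on an invented four-word line. The second case is worth noting: because '.*' takes the longest match it can, the first region swallows two words when the line has more than three.

```shell
script='s/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/'

# Three words: each region captures exactly one word.
three_words=$(printf 'eeny meeny miny\n' | sed -e "$script")

# Four words (invented input): the first region captures "a b".
four_words=$(printf 'a b c d\n' | sed -e "$script")

printf '%s\n' "$three_words"   # Victor eeny-meeny Von miny
printf '%s\n' "$four_words"    # Victor a b-c Von d
```

If each region must hold exactly one word, a pattern such as '[^ ]*' in place of '.*' is one way to keep the regions from crossing a space.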

Combined Use
When creating more complex sed scripts, we need the ability to enter more than one command. There are several ways to do this. First, you can use semicolons between the commands. For example, the following series of commands uses the '=' command, which tells sed to print the line number, and the 'p' command, which explicitly tells sed to print the line (since we are in '-n' mode):

 $ sed -n -e '=;p' myfile.txt 

Whenever two or more commands are specified, each command is applied (in order) to every line of the file. In the preceding example, the '=' command is applied to line 1 first, and then the 'p' command is applied. Then sed proceeds to line 2 and repeats the process. Although the semicolon is handy, there are cases where it does not work. Another alternative is to use two -e options to specify two separate commands:

 $ sed -n -e '=' -e 'p' myfile.txt 

However, when we get to the more complex append and insert commands, even multiple '-e' options will not help us. For complex multi-line scripts, the best way is to put the commands in a separate file. Then, reference this script file with the -f option:

 $ sed -n -f mycommands.sed myfile.txt 

This method may not be convenient, but it always works.
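For simple commands such as '=' and 'p', all three methods give the same result. The following sketch checks this on two invented input lines; the script file is a temporary file created with mktemp, standing in for mycommands.sed.

```shell
# Temporary script file standing in for mycommands.sed.
script_file=$(mktemp "${TMPDIR:-/tmp}/sedcmds.XXXXXX")
printf '=\np\n' > "$script_file"

input=$(printf 'alpha\nbeta')

with_semicolon=$(printf '%s\n' "$input" | sed -n -e '=;p')
with_two_e=$(printf '%s\n' "$input" | sed -n -e '=' -e 'p')
with_file=$(printf '%s\n' "$input" | sed -n -f "$script_file")

rm -f "$script_file"

# Each variable holds the line number followed by the line, for both lines.
printf '%s\n' "$with_file"
```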

Multiple commands at one address
Sometimes, you may want to specify multiple commands that apply to a single address. This is especially handy when you are performing a lot of 's///' substitutions to transform words or syntax in the source file. To apply multiple commands to one address, enter the sed commands in a file and use the '{ }' characters to group them, as shown below:

 1,20{
         s/[Ll]inux/GNU\/Linux/g
         s/samba/Samba/g
         s/posix/POSIX/g
 }

In the preceding example, three substitution commands are applied to lines 1 through 20, inclusive. You can also use regular expression addresses, or a combination of the two:

 1,/^END/{
         s/[Ll]inux/GNU\/Linux/g
         s/samba/Samba/g
         s/posix/POSIX/g
         p
 }

In this example, all the commands between '{ }' are applied to the lines from line 1 up to and including a line beginning with the letters "END" (or to the end of the file, if "END" is not found in the source file).
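A grouped script of this kind can be tried out as follows. This is a sketch with a shortened command list, a temporary script file, and three invented input lines, the second of which begins with END.

```shell
# Write a grouped script to a temporary file (quoted heredoc keeps '\/' intact).
script_file=$(mktemp "${TMPDIR:-/tmp}/sedgroup.XXXXXX")
cat > "$script_file" <<'EOF'
1,/^END/{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
}
EOF

# Invented input: the range covers lines 1 and 2 only.
input=$(printf 'linux and samba\nEND of linux\nlinux and samba')
grouped=$(printf '%s\n' "$input" | sed -f "$script_file")

rm -f "$script_file"
printf '%s\n' "$grouped"
```

The third line comes through unchanged because it lies after the line that closes the range.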

Appending, inserting, and changing rows
Now that we are writing sed scripts in separate files, we can take advantage of the append, insert, and change line commands. These commands insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output. The insert line command is used as follows:

 i\
 This line will be inserted before each line

If you do not specify an address for this command, it will apply to each line and generate the following output:

 This line will be inserted before each line
 line 1 here
 This line will be inserted before each line
 line 2 here
 This line will be inserted before each line
 line 3 here
 This line will be inserted before each line
 line 4 here

If you want to insert multiple lines before the current line, you can add additional lines by appending a backslash to the previous line, as shown below:

 i\
 insert this line\
 and this one\
 and this one\
 and, uh, this one too.

The append command works similarly, but it inserts one or more lines after the current line in the pattern space. It is used as follows:

 a\
 insert this line after each line.  Thanks! :)

On the other hand, the "change line" command actually replaces the current line in the pattern space. It is used as follows:

 c\
 You're history, original line! Muhahaha!

Because the append, insert, and change line commands need to be entered on multiple lines, we type them into a text sed script and tell sed to execute them with the '-f' option. Passing these commands to sed by other methods can cause problems.
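The following sketch puts one insert, one change, and one append command into a temporary script file and runs it with '-f'. The addresses (1, 2, and $ for the last line), the inserted text, and the three input lines are all invented for this illustration.

```shell
# Quoted heredoc: the backslashes and the '$' address are written literally.
script_file=$(mktemp "${TMPDIR:-/tmp}/sedaic.XXXXXX")
cat > "$script_file" <<'EOF'
1i\
inserted before line 1
2c\
line 2 was replaced
$a\
appended after the last line
EOF

result=$(printf 'line 1\nline 2\nline 3\n' | sed -f "$script_file")

rm -f "$script_file"
printf '%s\n' "$result"
```

With these addresses, the output should be the inserted line, "line 1", the replacement for line 2, "line 3", and finally the appended line.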

Next article
In the next and final article of this sed series, I will show you many excellent real-world examples of using sed for different kinds of tasks. I will not only show you what the scripts do, but also why they do what they do. After that, you will have additional ideas about how to use sed in your own projects. See you then!


About the author
Daniel Robbins lives in Albuquerque, New Mexico. He is the President and CEO of Gentoo Technologies, Inc., and the creator of Gentoo Linux (an advanced Linux for the PC) and the Portage system (a next-generation ports system for Linux). He has also contributed to the Macmillan books Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved with computers since the second grade, when he first encountered the Logo programming language and became hooked on Pac-Man. This may explain why he went on to work at Sony Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife, Mary, and his new daughter, Hadassia. You can contact Daniel Robbins at drobbins@gentoo.org.
