Linux sed command performs text substitution on files

Source: Internet
Author: User
Tags: character classes, posix

Let's look at one of sed's most useful commands, the substitution command. You can use it to replace a particular string, or a matched regular expression, with another string. Here is an example of its most basic use:

$ sed -e 's/foo/bar/' myfile.txt

The above command writes the contents of myfile.txt to standard output, with the first occurrence of 'foo' (if any) on each line replaced by the string 'bar'. Note that I said the first occurrence on each line, although this is usually not what you want. When you do a string substitution, you typically want it to be global, that is, to replace all occurrences on every line, as follows:

$ sed -e 's/foo/bar/g' myfile.txt

The 'g' option appended after the last slash tells sed to perform a global substitution.
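The difference between the two forms is easy to see on a small sample. The following is only a sketch: the input text and the variable names are made up for illustration and do not come from myfile.txt.

```shell
# Hypothetical sample input: "foo" appears twice on the first line.
input='foo and foo
no match here'

# Without the g flag, only the first occurrence on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed -e 's/foo/bar/')

# With the g flag, every occurrence on each line is replaced.
global=$(printf '%s\n' "$input" | sed -e 's/foo/bar/g')

printf '%s\n' "$first_only"
printf '%s\n' "$global"
```

Lines that do not contain 'foo' pass through unchanged in both cases.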

There are a few other things to know about the 's///' substitution command. First, it is a command, and only a command; no addresses were specified in any of the previous examples. This means that 's///' can also be combined with addresses to control which lines the command is applied to, as follows:

$ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt

The previous example replaces all occurrences of the phrase 'enchantment' with the phrase 'entrapment', but only on lines 1 through 10, inclusive.

$ sed -e '/^$/,/^end/s/hills/mountains/g' myfile3.txt

This example replaces 'hills' with 'mountains', but only in blocks of text that start with a blank line and end with a line beginning with the three characters 'end', inclusive.
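To check which lines an address selects, it helps to run the command on a few lines where the answer is obvious. This is a minimal sketch with invented sample text; only the two address forms are taken from the examples above.

```shell
# Numeric range: the substitution applies to lines 1 and 2 only.
input='hills one
hills two
hills three'
by_number=$(printf '%s\n' "$input" | sed -e '1,2s/hills/mountains/g')

# Regular expression range: from a blank line through the next line
# that begins with "end", both boundary lines included.
text='hills before

hills inside
end of hills block
hills after'
by_regex=$(printf '%s\n' "$text" | sed -e '/^$/,/^end/s/hills/mountains/g')

printf '%s\n' "$by_number"
printf '%s\n' "$by_regex"
```

In the second case, the line starting with 'end' is still inside the range, so its 'hills' is replaced as well.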

Another nice thing about the 's///' command is that we have many options for the '/' delimiter. If you are performing a substitution and the regular expression or the replacement string contains many slashes, you can change the delimiter by specifying a different character after the 's'. For example, the following replaces all occurrences of /usr/local with /usr:

$ sed -e 's:/usr/local:/usr:g' mylist.txt

In this example, a colon is used as the delimiter. Had we kept '/' as the delimiter without escaping the slashes in the paths, the command would have become:

$ sed -e 's//usr/local//usr/g' mylist.txt

This does not work, because sed cannot tell which slashes are delimiters and which belong to the paths.
If you need to use the delimiter character inside the regular expression, precede it with a backslash.
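Both ways of handling slashes give the same result, as the following sketch suggests. The two paths are made-up sample data.

```shell
# Hypothetical list of paths to rewrite.
paths='/usr/local/bin
/usr/local/lib'

# Option 1: use ':' as the delimiter, so slashes need no escaping.
with_colon=$(printf '%s\n' "$paths" | sed -e 's:/usr/local:/usr:g')

# Option 2: keep '/' as the delimiter and escape every slash in the paths.
with_escapes=$(printf '%s\n' "$paths" | sed -e 's/\/usr\/local/\/usr/g')

printf '%s\n' "$with_colon"
printf '%s\n' "$with_escapes"
```

The escaped form works, but the version with a different delimiter is much easier to read.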

Regular expression pitfalls
So far, we have only performed simple string substitutions. While that is handy, we can also match a regular expression. For example, the following sed command matches a phrase that begins with '<', ends with '>', and contains any number of characters in between. The example deletes the phrase (replaces it with an empty string):

$ sed -e 's/<.*>//g' myfile.html

This is a good first attempt at a sed script that removes HTML tags from a file, but it does not work well because of a peculiarity of regular expressions. Why? When sed tries to match a regular expression on a line, it looks for the longest match on the line. This was not an issue in my previous sed article, because we used the 'd' and 'p' commands, which always delete or print the entire line. With the 's///' command, however, it makes a big difference, because the entire portion matched by the regular expression is replaced with the target string or, in this case, deleted. This means that the above example will turn the following line:

<b>This</b> is what <b>I</b> meant.

into:

 meant.

rather than what we wanted, which is:

This is what I meant.

Fortunately, there is an easy way to fix this. Instead of typing a regular expression that says "a '<' character, followed by any number of characters, ending with a '>' character", we simply type one that says "a '<' character, followed by any number of non-'>' characters, ending with a '>' character". This has the effect of matching the shortest possible match instead of the longest. The new command looks like this:

$ sed -e 's/<[^>]*>//g' myfile.html

In the example above, '[^>]' specifies a "non-'>'" character, and the '*' that follows completes the expression to mean "zero or more non-'>' characters". Test the command on several HTML files, pipe the output to "more", and examine the results carefully.
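The two expressions can be compared side by side on a single line. This sketch assumes a sample line that contains two pairs of bold tags; the variable names are only for illustration.

```shell
# Assumed sample line with two pairs of tags.
line='<b>This</b> is what <b>I</b> meant.'

# Greedy: '<.*>' matches from the first '<' to the last '>' on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')

# Negated class: '<[^>]*>' cannot run past the first '>', so each tag
# is matched and removed on its own.
shortest=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')

printf '[%s]\n' "$greedy"
printf '[%s]\n' "$shortest"
```

The brackets in the output are there only to make the leading space left by the greedy version visible.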

More character matching
The '[ ]' regular expression syntax has some additional options. To specify a range of characters, you can use a '-', as long as it is not in the first or last position, as follows:

'[a-x]*'
This matches zero or more characters, all of which must be 'a', 'b', 'c' ... 'v', 'w', 'x'.

Inside the brackets you can also use POSIX character classes; for example, '[[:space:]]' matches a whitespace character and '[[:alpha:]]' matches a letter. It is advantageous to use character classes whenever possible, because they adapt better to non-English locales (including accented characters where needed, and so on).
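As a rough illustration of character classes in a substitution, the sketch below uses the POSIX classes '[:space:]' and '[:digit:]' inside a bracket expression. The sample strings are invented.

```shell
# Collapse each run of whitespace (tabs or spaces) into one underscore.
# printf expands \t into a real tab character before sed sees the input.
squeezed=$(printf 'a\tb  c\n' | sed -e 's/[[:space:]][[:space:]]*/_/g')

# Mask every digit with '#'.
masked=$(printf 'room 42\n' | sed -e 's/[[:digit:]]/#/g')

printf '%s\n' "$squeezed"
printf '%s\n' "$masked"
```

Note the doubled brackets: the class name, including its own '[:' and ':]', goes inside an ordinary '[ ]' expression.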

Advanced substitution features
We have seen how to perform simple and even fairly complex straight substitutions, but sed can do more. You can actually refer to either part or all of the matched regular expression and use those parts to construct the replacement string. As an example, suppose you are replying to a message. The following example adds the phrase "ralph said: " to the beginning of each line:

$ sed -e 's/.*/ralph said: &/' origmsg.txt

The output is as follows:

ralph said: Hiya Jim,
ralph said:
ralph said: I sure like this sed stuff!
ralph said:

The replacement string in this example uses the '&' character, which tells sed to insert the entire matched regular expression. So whatever was matched by '.*' (the largest group of zero or more characters on the line, that is, the entire line) can be inserted anywhere in the replacement string, even more than once. This is great, but sed is even more powerful.
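The following sketch shows '&' used once, used twice, and escaped. The sample strings are made up; the escaped form '\&', which inserts a literal ampersand, is standard sed behavior rather than something covered in the text above.

```shell
# Prefix every line with a fixed phrase; '&' stands for the matched text.
quoted=$(printf 'Hiya Jim,\nI sure like this sed stuff!\n' |
    sed -e 's/.*/ralph said: &/')

# '&' may appear more than once in the replacement.
doubled=$(printf 'echo\n' | sed -e 's/.*/& &/')

# To insert a literal ampersand, escape it with a backslash.
literal=$(printf 'salt and pepper\n' | sed -e 's/and/\&/')

printf '%s\n' "$quoted" "$doubled" "$literal"
```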

Those wonderful backslashed parentheses
Even better than '&', the 's///' command lets us define regions in the regular expression and then refer to those specific regions in the replacement string. As an example, suppose you have a file that contains the following text:

foo bar oni
eeny meeny miny
larry curly moe
jimmy the weasel

Now suppose you want to write a sed script that replaces "eeny meeny miny" with "Victor eeny-meeny Von miny", and so on. To do this, you first write a regular expression that matches three strings separated by spaces:

'.* .* .*'

Now we define the regions by inserting backslashed parentheses around each region of interest:

'\(.*\) \(.*\) \(.*\)'

This regular expression works the same as the first one, except that it also defines three logical regions that can be referred to in the replacement string. Here is the final script:

$ sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt

As you can see, each parenthesis-delimited region is referenced by typing '\x', where x is the number of the region, starting at 1. The output is as follows:

Victor foo-bar Von oni
Victor eeny-meeny Von miny
Victor larry-curly Von moe
Victor jimmy-the Von weasel

As you become more familiar with sed, you will be able to do fairly powerful text processing with minimal effort. You might want to think about how you would approach this problem in your favorite scripting language: could you easily fit the solution in one line?
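Two further uses of numbered regions are sketched below, on invented input: swapping two words, and what happens when the three-region expression from this section meets a line with four words. Because '.*' is greedy, the first region absorbs the extra word.

```shell
# Swap two space-separated words by referencing the regions in reverse.
swapped=$(printf 'larry curly\n' | sed -e 's/\([^ ]*\) \([^ ]*\)/\2 \1/')

# With four words, the first greedy '.*' takes "a b", leaving "c" and "d"
# for the second and third regions.
four_words=$(printf 'a b c d\n' |
    sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/')

printf '%s\n' "$swapped" "$four_words"
```

Using '[^ ]*' instead of '.*', as in the first command, keeps each region to a single word.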

Combining commands
When you start to create more complex sed scripts, you need to be able to enter more than one command. There are several ways to do this. First, you can use semicolons between the commands. For example, the following series of commands uses the '=' command, which tells sed to print the line number, and the 'p' command, which explicitly tells sed to print the line (since we are in '-n' mode).

$ sed -n -e '=;p' myfile.txt

Whenever two or more commands are specified, each command is applied, in order, to every line of the file. In the example above, the '=' command is applied to line 1 first, and then the 'p' command. Sed then moves on to line 2 and repeats the process. While the semicolon is handy, there are cases where it does not work. Another alternative is to use two -e options to specify two separate commands:

$ sed -n -e '=' -e 'p' myfile.txt

However, even multiple '-e' options do not help when you get to the more complex append and insert commands. For complex multi-line scripts, the best approach is to put the commands in a separate file. Then, reference the script file with the -f option:

$ sed -n -f mycommands.sed myfile.txt

This method may be less convenient, but it always works.
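For simple commands such as '=' and 'p', all three methods produce the same output. The sketch below checks this on two made-up input lines; the script file is a temporary file created with mktemp, standing in for mycommands.sed.

```shell
input='alpha
beta'

# Method 1: commands separated by a semicolon.
with_semicolon=$(printf '%s\n' "$input" | sed -n -e '=;p')

# Method 2: one -e option per command.
with_two_e=$(printf '%s\n' "$input" | sed -n -e '=' -e 'p')

# Method 3: commands stored in a script file, one per line.
script=$(mktemp)
printf '=\np\n' > "$script"
with_file=$(printf '%s\n' "$input" | sed -n -f "$script")
rm -f "$script"

printf '%s\n' "$with_file"
```

Each line number is printed on its own line, followed by the line itself.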

Multiple commands for one address
Sometimes, you may want to specify multiple commands that apply to a single address. This is especially handy when you are performing many 's///' substitutions to transform words or syntax in the source file. To apply multiple commands to one address, enter your sed commands in a file and use the '{ }' characters to group them, as follows:

1,20{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
        s/posix/POSIX/g
}

The previous example applies the three substitution commands to lines 1 through 20, inclusive. You can also use a regular expression address, or a combination of the two:

1,/^end/{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
        s/posix/POSIX/g
        p
}

This example applies all the commands between '{ }' to the lines starting at line 1 and continuing up to a line that begins with the letters "end" (or to the end of the file, if "end" is not found in the source file).
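A grouped script can be tried out on a few lines before it is used on a real file. In this sketch the script is written to a temporary file (a stand-in for a hand-written script file), the address range is shortened to lines 1 through 2, and the input lines are invented.

```shell
# Write a small grouped script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
1,2{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
}
EOF

# Only lines 1 and 2 fall inside the address range; line 3 is untouched.
grouped=$(printf 'linux and samba\nLinux again\nlinux untouched\n' |
    sed -f "$script")
rm -f "$script"

printf '%s\n' "$grouped"
```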

Append, insert, and change line
Now that we are writing sed scripts in separate files, we can take advantage of the append, insert, and change line commands. These commands insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output. The insert line command is used as follows:

i\
This line will be inserted before each line

If you do not specify an address for this command, it is applied to every line and produces output like this:

This line will be inserted before each line
line 1 here
This line will be inserted before each line
line 2 here
This line will be inserted before each line
line 3 here
This line will be inserted before each line
line 4 here

If you want to insert more than one line before the current line, you can add additional lines by appending a backslash to the previous line, as follows:

i\
insert this line\
and this one\
and this one\
and, uh, this one too.

The append command works similarly, but inserts one or more lines after the current line in the pattern space. It is used as follows:

a\
insert this line after each line. Thanks! :)

The change line command, on the other hand, actually replaces the current line in the pattern space. It is used as follows:

c\
You're history, original line! Muhahaha!

Because the append, insert, and change line commands need to be entered on multiple lines, you will want to type them into a text sed script and tell sed to execute it with the '-f' option. Passing these commands to sed by other methods causes problems.
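The three commands can be combined with addresses in one script file. The sketch below is an illustration under a few assumptions: the script is written to a temporary file, each command is given a line-number address so that its effect is easy to see, and the input lines are made up.

```shell
# Script with one insert, one append, and one change command.
script=$(mktemp)
cat > "$script" <<'EOF'
1i\
inserted before line 1
2a\
appended after line 2
3c\
line 3 was replaced
EOF

result=$(printf 'one\ntwo\nthree\n' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$result"
```

The quoted 'EOF' marker keeps the shell from interpreting the backslashes, so the script file contains them exactly as typed.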

Posted: 2008-05-20, edited on: 2008-05-20 09:11

From http://blogold.chinaunix.net/u2/68904/showart_695390.html
