Shell: sed by example, Part 2


sed is a very useful (but often forgotten) UNIX stream editor. It is ideal for batch-editing files or for writing shell scripts that modify existing files efficiently. This article continues the previous article about sed.

Substitution!
Let's take a look at one of sed's most useful commands, the substitution command. It replaces a particular string, or text matching a regular expression, with another string. Here is an example of the most basic use of this command:

$ sed -e 's/foo/bar/' myfile.txt
The above command replaces the first occurrence of 'foo' (if any) on each line of myfile.txt with the string 'bar', then writes the result to standard output. Note that I said the first occurrence on each line, which is usually not what you want. When replacing strings, you normally want a global replacement, that is, every occurrence on each line replaced, as follows:

$ sed -e 's/foo/bar/g' myfile.txt
The 'g' option appended after the last slash tells sed to perform a global replacement.
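To see the difference between the two forms without preparing a file, you can feed sed a single line through printf. This is a minimal sketch assuming a POSIX shell and a POSIX-compatible sed (GNU sed, for example); the sample line and variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: one line containing "foo" twice.
line='foo and foo again'

# Without the g flag, only the first "foo" on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')

# With the g flag, every "foo" on the line is replaced.
global=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

echo "$first_only"   # bar and foo again
echo "$global"       # bar and bar again
```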

There are a few other things to understand about the 's///' substitution command. First, it is a command, and only a command; none of the previous examples specifies an address. This means 's///' can also be combined with an address to control which lines it is applied to, as shown below:

$ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt
In the above example, every occurrence of 'enchantment' is replaced with 'entrapment', but only on lines 1 through 10 (inclusive).
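The effect of a line-number address can be checked the same way. The sketch below (sample data and variable names are hypothetical) builds twelve identical lines and confirms that only the first ten are changed:

```shell
#!/bin/sh
# Build 12 identical lines of made-up input.
input=$(for _ in 1 2 3 4 5 6 7 8 9 10 11 12; do echo enchantment; done)

# Apply the substitution only to lines 1 through 10.
result=$(printf '%s\n' "$input" | sed -e '1,10s/enchantment/entrapment/g')
printf '%s\n' "$result"

# Count the changed lines: prints 10 (lines 11 and 12 are untouched).
printf '%s\n' "$result" | grep -c entrapment
```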

$ sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt
In this example, 'hills' is replaced with 'mountains', but only within the block of text that starts at a blank line and ends at a line beginning with the three characters 'END' (both lines included).
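Here is a small sketch of the regular expression range, using made-up input in which 'hills' appears before, inside, and after the blank-line-to-END block:

```shell
#!/bin/sh
# Only line 3 lies inside the range that starts at the blank line
# and ends at the line beginning with END.
result=$(printf '%s\n' 'hills before' '' 'green hills' 'END of block' 'hills after' |
  sed -e '/^$/,/^END/s/hills/mountains/g')
printf '%s\n' "$result"
# hills before
#
# green mountains
# END of block
# hills after
```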

Another handy thing about the 's///' command is that the '/' separator has many alternatives. If you are replacing strings and the regular expression or replacement string contains many slashes, you can change the separator by specifying a different character right after the 's'. For example, the following replaces every occurrence of /usr/local with /usr:

$ sed -e 's:/usr/local:/usr:g' mylist.txt
In this example, the colon is used as the separator. If you need to use the separator character itself inside the regular expression, precede it with a backslash.
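As a sketch with a made-up path list, the colon form and the escaped-slash form below produce the same result; the first is simply easier to read:

```shell
#!/bin/sh
# Colon as the delimiter: the slashes in the paths need no escaping.
paths=$(printf '%s\n' '/usr/local/bin' '/usr/local/lib' '/opt/tool' |
  sed -e 's:/usr/local:/usr:g')
printf '%s\n' "$paths"   # /usr/bin, /usr/lib, /opt/tool (one per line)

# The same edit with the default '/' delimiter: every slash is escaped.
same=$(printf '%s\n' '/usr/local/bin' | sed -e 's/\/usr\/local/\/usr/g')
echo "$same"             # /usr/bin
```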

Regular expression pitfalls
So far, we have only performed simple string replacement. While that is handy, we can also match a regular expression. For example, the following sed command matches a phrase that begins with '<', ends with '>', and contains any number of characters in between. This phrase will be deleted (replaced with an empty string):

$ sed -e 's/<.*>//g' myfile.html
This is a good first attempt at a sed script that removes HTML tags from a file, but it will not work well because of a quirk of regular expression matching. Why? When sed tries to match a regular expression on a line, it looks for the longest match on that line. This was not a problem in my previous sed article, because we were using the 'd' and 'p' commands, which always delete or print the entire line anyway. With the 's///' command, however, it makes a big difference, because the entire portion matched by the regular expression is replaced with the target string or, in this example, deleted. This means the previous example will turn the following line:

<b>This</b> is what <b>I</b> meant.
Into this:

meant.
Rather than this, which is what we wanted:

This is what I meant.
Fortunately, there is an easy way to fix this. Instead of a regular expression that says "a '<' character followed by any number of characters and ending with a '>' character", we just need one that says "a '<' character followed by any number of non-'>' characters and ending with a '>' character". Because '[^>]' can never match a '>', each match now stops at the first '>' after its '<', so every tag is matched on its own instead of one long stretch of the line. The new command is as follows:

$ sed -e 's/<[^>]*>//g' myfile.html
In the above example, '[^>]' specifies a "non-'>'" character, and the '*' after it completes the expression to mean "zero or more non-'>' characters". Test this command on a few HTML files, pipe the output to 'more', and review the results carefully.
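The two commands can be compared directly on the sample line from the text. This sketch assumes a POSIX shell; the brackets in the echo output only make the leading space of the greedy result visible:

```shell
#!/bin/sh
# The sample line from the text above.
html='<b>This</b> is what <b>I</b> meant.'

# Greedy: '.*' runs from the first '<' to the LAST '>' on the line.
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')

# Negated class: '[^>]*' cannot cross a '>', so each tag is removed separately.
fixed=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

echo "[$greedy]"   # [ meant.]
echo "[$fixed]"    # [This is what I meant.]
```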

More character matching
The '[ ]' regular expression syntax has a few more options. To specify a range of characters, you can use '-', as long as it is not in the first or last position, as shown below:

'[a-x]*'
This matches zero or more characters, as long as every one of them is 'a', 'b', 'c' ... 'v', 'w', or 'x'. In addition, you can use character classes such as '[:space:]' (written inside a bracket expression, as in '[[:space:]]') to match whitespace. Here is a fairly complete list of available character classes:

Character class   Description
[:alnum:]         Alphanumeric [a-z A-Z 0-9]
[:alpha:]         Alphabetic [a-z A-Z]
[:blank:]         Space or tab
[:cntrl:]         Any control character
[:digit:]         Numeric digit [0-9]
[:graph:]         Any visible character (no whitespace)
[:lower:]         Lowercase [a-z]
[:print:]         Printable (non-control) character
[:punct:]         Punctuation character
[:space:]         Whitespace
[:upper:]         Uppercase [A-Z]
[:xdigit:]        Hexadecimal digit [0-9 a-f A-F]

It is advantageous to use character classes whenever possible, because they adapt better to non-English locales (including accented characters where needed, and so on).
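A short sketch of character classes in use, on a made-up line containing a tab. Because class contents depend on the locale, the sketch pins the C locale (an assumption made only to keep the output predictable):

```shell
#!/bin/sh
# Use the C locale so the class contents are predictable.
LC_ALL=C
export LC_ALL

# Made-up sample: a tab after "Order" and two spaces before "qty".
sample=$(printf 'Order\t#42,  qty 7!')

# Class names only work inside a bracket expression: [[:digit:]].
digits=$(printf '%s\n' "$sample" | sed -e 's/[[:digit:]]/N/g')

# Collapse each run of spaces or tabs into a single space.
spaced=$(printf '%s\n' "$sample" | sed -e 's/[[:blank:]][[:blank:]]*/ /g')

# Delete punctuation characters.
nopunct=$(printf '%s\n' "$sample" | sed -e 's/[[:punct:]]//g')

printf '%s\n' "$digits" "$spaced" "$nopunct"
```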

Advanced substitution
We have seen how to perform simple and even moderately complex substitutions, but sed can do more. In fact, you can refer to part or all of the text matched by the regular expression and use these pieces to build the replacement string. For example, suppose you are replying to a message. The following example prefixes each line with the phrase "ralph said: ":

$ sed -e 's/.*/ralph said: &/' origmsg.txt
The output is as follows:

ralph said: Hiya Jim,
ralph said:
ralph said: I sure like this sed stuff!
ralph said:
The replacement string in this example uses the '&' character, which tells sed to insert the entire text matched by the regular expression. So whatever matched '.*' (the largest group of zero or more characters on the line, which here is the whole line) can be inserted anywhere in the replacement string, even multiple times. This is great, but sed is even more powerful.
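A sketch that reproduces two of the output lines above (the message text is taken from them) and shows '&' used more than once in a replacement:

```shell
#!/bin/sh
# '&' stands for whatever the regular expression matched (here, the whole line).
quoted=$(printf '%s\n' 'Hiya Jim,' 'I sure like this sed stuff!' |
  sed -e 's/.*/ralph said: &/')
printf '%s\n' "$quoted"

# '&' can appear several times in the replacement.
twice=$(echo 'sed' | sed -e 's/.*/& & &/')
echo "$twice"   # sed sed sed
```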

Those wonderful backslashed parentheses
Even better than '&', the 's///' command lets us define regions in the regular expression and then refer to those specific regions in the replacement string. For example, suppose we have a file containing the following text:

foo bar oni
eeny meeny miny
larry curly moe
jimmy the weasel
Now suppose you want to write a sed script that replaces "eeny meeny miny" with "Victor eeny-meeny Von miny", and likewise for the other lines.
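########################################
Note: for this one line, a simpler sed script using the 'c' (change line) command also works, although it handles only that line rather than every line of the file. For example:
#!/bin/sed -f
/eeny meeny miny/c\
Victor eeny-meeny Von miny
Save the script, add execute permission (for example with chmod +x), and then run it with the input file as its argument.
############################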

To do this, first write a regular expression that matches the three strings separated by spaces:

'.* .* .*'
Now define the regions by inserting backslashed parentheses around each region of interest:

'\(.*\) \(.*\) \(.*\)'
This regular expression works the same way as the first one, except that it also defines three logical regions that can be referenced in the replacement string. Here is the final script:

$ sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt
As you can see, each parenthesis-delimited region is referenced by typing '\x', where x is the region number, starting at 1. The output is as follows:

Victor foo-bar Von oni
Victor eeny-meeny Von miny
Victor larry-curly Von moe
Victor jimmy-the Von weasel
As you become more familiar with sed, you will be able to do fairly powerful text processing with minimal effort. Think about how you would solve this problem in your favorite scripting language: could you easily fit the solution into a single line?
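The whole example can be reproduced without a file by piping the four sample lines into the same sed command. A minimal sketch:

```shell
#!/bin/sh
# The four sample lines from the text, piped straight into sed.
# Each line has exactly two spaces, so the three regions split on them.
result=$(printf '%s\n' 'foo bar oni' 'eeny meeny miny' 'larry curly moe' 'jimmy the weasel' |
  sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/')
printf '%s\n' "$result"
```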

Combining commands
When you start writing more complex sed scripts, you need to be able to enter more than one command. There are several ways to do this. First, you can separate commands with semicolons. For example, the following series of commands uses the '=' command, which tells sed to print the line number, and the 'p' command, which explicitly tells sed to print the line (because we are in '-n' mode):

$ sed -n -e '=;p' myfile.txt
Whenever two or more commands are specified, each command is applied to each line of the file in order. In the above example, the '=' command is applied to line 1 first, then the 'p' command. Next, sed moves on to line 2 and repeats the process. While the semicolon is handy, there are situations where it does not work. Another alternative is to use two -e options to specify two separate commands:

$ sed -n -e '=' -e 'p' myfile.txt
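As a quick check, the sketch below (made-up two-line input) runs both forms and shows they print the same thing:

```shell
#!/bin/sh
# Semicolon-separated commands versus two -e options.
semi=$(printf '%s\n' 'alpha' 'beta' | sed -n -e '=;p')
multi=$(printf '%s\n' 'alpha' 'beta' | sed -n -e '=' -e 'p')
printf '%s\n' "$semi"
# 1
# alpha
# 2
# beta
```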
However, when we get to the more complex append and insert commands, even multiple '-e' options will not help. For complex multi-line scripts, the best way is to put the commands into a separate file and then reference that script file with the -f option:

$ sed -n -f mycommands.sed myfile.txt
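A sketch of the '-f' approach. The script here is a temporary file created with mktemp (widely available, though not part of POSIX) standing in for mycommands.sed, and it holds the same '=' and 'p' commands, one per line:

```shell
#!/bin/sh
# Temporary command file (a stand-in for mycommands.sed).
script=$(mktemp)
cat > "$script" <<'EOF'
=
p
EOF

# -n turns off automatic printing; -f reads the commands from the file.
out=$(printf '%s\n' 'alpha' 'beta' | sed -n -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```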
This method may not be convenient, but it always works.

Multiple commands at one address
Sometimes you may want to specify several commands that apply to a single address. This is especially handy when you are performing many 's///' commands to transform words or syntax in a source file. To perform multiple commands per address, put your sed commands in a file and group them with '{ }' characters, as follows:

1,20{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
    s/posix/POSIX/g
}
In the above example, the three substitution commands are applied to lines 1 through 20 (inclusive). You can also use a regular expression address, or a combination of the two:

1,/^END/{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
    s/posix/POSIX/g
    p
}
This example applies all the commands between '{ }' to the lines starting at line 1 and ending at the first line that begins with the letters "END" (or at the end of the file, if "END" is not found in the source file).
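The following sketch runs the first grouped example from a script file, with the range narrowed to lines 1,2 (an illustrative change) so that the unaffected third line is visible; the input lines are made up:

```shell
#!/bin/sh
script=$(mktemp)
# Grouped commands, one per line, closing brace on its own line.
cat > "$script" <<'EOF'
1,2{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
    s/posix/POSIX/g
}
EOF

out=$(printf '%s\n' 'linux and samba' 'posix Linux' 'linux again' | sed -f "$script")
printf '%s\n' "$out"
# GNU/Linux and Samba
# POSIX GNU/Linux
# linux again      <- line 3 is outside the 1,2 range
rm -f "$script"
```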

Append, insert, and change line
Now that we are writing sed scripts in separate files, we can take advantage of the append, insert, and change line commands. These commands insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output. The insert line command is used as follows:

i\
This line will be inserted before each line
If you do not specify an address for this command, it is applied to every line and produces output like the following:

This line will be inserted before each line
line 1 here
This line will be inserted before each line
line 2 here
This line will be inserted before each line
line 3 here
This line will be inserted before each line
line 4 here
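A sketch that reproduces this output for two made-up input lines, with the 'i\' command stored in a temporary script file:

```shell
#!/bin/sh
script=$(mktemp)
# No address, so the text is inserted before every input line.
cat > "$script" <<'EOF'
i\
This line will be inserted before each line
EOF

out=$(printf '%s\n' 'line 1 here' 'line 2 here' | sed -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```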
If you want to insert multiple lines before the current line, you can add more lines by ending each preceding line with a backslash, as shown below:

i\
insert this line\
and this one\
and, uh, this one too.
The append command works similarly, but it inserts one or more lines after the current line in the pattern space. It is used as follows:

a\
insert this line after each line.  Thanks! :)
The "change line" command, on the other hand, actually replaces the current line in the pattern space. It is used as follows:

c\
You're history, original line! Muhahaha!
Because the append, insert, and change line commands need to be entered on multiple lines, you will want to type them into a text sed script and tell sed to run it with the '-f' option. Feeding these commands to sed in other ways may cause problems.
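A sketch of the append and change commands in a temporary script file; the addresses 1 and 2 and the input lines are illustrative choices, not from the original:

```shell
#!/bin/sh
script=$(mktemp)
# 1a\ appends a line after line 1; 2c\ replaces line 2 entirely.
cat > "$script" <<'EOF'
1a\
appended after line 1
2c\
You're history, original line! Muhahaha!
EOF

out=$(printf '%s\n' 'first' 'second' 'third' | sed -f "$script")
printf '%s\n' "$out"
# first
# appended after line 1
# You're history, original line! Muhahaha!
# third
rm -f "$script"
```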

Author: "Daily Yunhui"
