Sed by example, Part 1


Enter sed
Wouldn't it be nice if you could automate the process of editing files, so that you could edit them in batch mode or even write scripts that make sophisticated changes to existing files? Fortunately, there is a way to do exactly this, and it is called "sed."

Sed is a lightweight stream editor that is included on almost all UNIX platforms, Linux included. Sed has a lot of nice features. First of all, it is very small, typically many times smaller than your favorite scripting language. Second, because sed is a stream editor, it can edit data that it receives from standard input, such as from a pipeline. So you don't need to have the data to be edited stored in a file on disk. Because data can just as easily be piped to sed, it is very easy to use sed as one stage of a long, complex pipeline in a powerful shell script. Try doing that with your favorite editor.

GNU sed
Fortunately for us Linux users, one of the nicest versions of sed happens to be GNU sed, whose current version is 3.02. Every Linux distribution has (or at least should have) GNU sed. GNU sed is popular not only because its source code is freely distributable, but also because it has many handy, time-saving extensions to the POSIX sed standard. In addition, GNU sed does not suffer from many of the limitations of earlier versions of sed, such as the line length limit; GNU sed can easily handle lines of any length.

The latest GNU sed
While researching this article, I noticed that several online sed enthusiasts referred to GNU sed 3.02a. Strangely, sed 3.02a was nowhere to be found on ftp.gnu.org (see Resources for these links), so I had to look elsewhere. I found it in /pub/sed on alpha.gnu.org. I happily downloaded it, compiled it, and installed it, only to find a few minutes later that the latest version of sed was 3.02.80, whose source code sat right next to the 3.02a source on alpha.gnu.org. After installing GNU sed 3.02.80, I was finally ready to go.

The right sed
This series uses GNU sed 3.02.80. Some (but very few) of the most advanced examples in the upcoming articles of this series will not work with GNU sed 3.02 or 3.02a. If you are not using GNU sed, your results may differ. Why not take some time to install GNU sed 3.02.80 now? Then you will not only be ready for the rest of the series, but you will also be using what is arguably the best sed available.

Sed examples
Sed works by performing any number of user-specified editing operations ("commands") on the input data. Sed is line-based, so the commands are executed in order on each line. Sed then writes its results to standard output (stdout); it does not modify any input files.

Let's look at some examples. The first few are a bit odd, because I am using them to demonstrate how sed works rather than to perform any useful task. However, if you are new to sed, it is important to understand them. Here is the first example:

$ sed -e 'd' /etc/services


If you type this command, you will get absolutely no output. So, what happened? In this example, sed is invoked with one editing command, 'd'. Sed opens the /etc/services file, reads a line into its pattern buffer, executes the editing command ("delete line"), and then prints the pattern buffer (which is now empty). It then repeats these steps for each subsequent line. This produces no output because the 'd' command wipes out every line in the pattern buffer.

There are a few things to note about this example. First, /etc/services is not modified at all. Again, this is because sed only reads the file you specify on the command line and uses it as input; it does not try to modify the file. Second, sed is line-oriented. The 'd' command does not simply tell sed to delete all the input data in one go. Instead, sed reads each line of /etc/services, one at a time, into its internal buffer, called the pattern buffer. Once a line is in the pattern buffer, sed executes the 'd' command and then prints the contents of the pattern buffer (nothing, in this case). Later, I'll show you how to use address ranges to control which lines a command applies to; when no address is given, the command applies to every line.
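To see both points without touching a real system file, here is a minimal sketch that runs the same 'd' command against a throwaway file. The file contents and the variable names (sample, output, lines_after) are made up for illustration only.

```shell
# Create a small temporary file with three lines (hypothetical sample data).
sample=$(mktemp)
printf 'line one\nline two\nline three\n' > "$sample"

# 'd' empties the pattern buffer for every line, so sed prints nothing.
output=$(sed -e 'd' "$sample")
printf 'sed printed: [%s]\n' "$output"

# The input file is only read, never modified: it still has three lines.
lines_after=$(wc -l < "$sample")
printf 'lines still in the file: %s\n' "$lines_after"

# Clean up the temporary file.
rm -f "$sample"
```

The brackets in the first message stay empty, while the line count shows that the file on disk was left alone.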

The third thing to note is the single quotes that enclose the 'd' command. It's a good idea to get into the habit of enclosing sed commands in single quotes, which disables shell expansion.
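As a rough illustration of why the quoting matters, the sketch below uses printf instead of sed to show what the shell would hand over as the command argument. The variable pattern_var is hypothetical and exists only to trigger an expansion.

```shell
# A shell variable that happens to be set (hypothetical, for illustration).
pattern_var='XYZ'

# Single quotes: the text reaches the program exactly as typed.
single=$(printf '%s' '/^$pattern_var/d')

# Double quotes: the shell expands $pattern_var before the program sees it.
double=$(printf '%s' "/^$pattern_var/d")

printf 'single quotes pass: %s\n' "$single"
printf 'double quotes pass: %s\n' "$double"
```

With single quotes, sed would receive the '$' untouched; with double quotes, the shell would have rewritten the command first.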

Another sed example
Here is an example of using sed to remove the first line of the /etc/services file from the output stream:

$ sed -e '1d' /etc/services | more


As you can see, this command is very similar to the first 'd' command, except that it is preceded by a '1'. If you guessed that '1' refers to the first line, you're right. Whereas the first example used 'd' alone, this time the 'd' is preceded by an optional numeric address. By using an address, you can tell sed to perform edits only on a particular line or lines.
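The same numeric address works on piped input, which makes it easy to experiment. A small sketch, with made-up input lines:

```shell
# Feed three lines to sed on standard input and delete only line 1.
result=$(printf 'first\nsecond\nthird\n' | sed -e '1d')

# Only "second" and "third" remain in the output.
printf '%s\n' "$result"
```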

Address ranges
Now, let's look at how to specify an address range. In this example, sed deletes lines 1 through 10 of the output:

$ sed -e '1,10d' /etc/services | more


When two addresses are separated by a comma, sed applies the command that follows to the range that starts at the first address and ends at the second. In this example, the 'd' command is applied to lines 1 through 10, inclusive. The command is not applied to any other lines, so they are printed unchanged.
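To check that the range is inclusive at both ends, here is a short sketch that numbers twelve input lines (arbitrary sample data) and applies the same '1,10d' command:

```shell
# Twelve numbered lines on standard input; delete lines 1 through 10.
result=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | sed -e '1,10d')

# Lines 1 and 10 are both gone; only 11 and 12 survive.
printf '%s\n' "$result"
```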

Addresses with regular expressions
Now it's time for a more useful example. Suppose you want to view the contents of your /etc/services file, but you aren't interested in the comments it contains. As you know, a comment in /etc/services is a line that begins with the '#' character. To leave out the comments, we want sed to delete the lines that begin with '#'. Here's how:

$ sed -e '/^#/d' /etc/services | more


Try this example and see what happens. You'll notice that sed performs the desired task. Now, let's figure out what happened.

To understand the '/^#/d' command, we first need to dissect it. First, set aside the 'd'; it is the same delete-line command we used earlier. The new part is '/^#/', which is a new kind of address: a regular expression address. Regular expression addresses are always surrounded by slashes. They specify a pattern, and the command that immediately follows a regular expression address is applied only to lines that match that pattern.
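A quick way to see exactly which lines the address selects is to run the command on a few hand-written lines. The sample lines below are invented and only loosely resemble /etc/services:

```shell
# Four sample lines: only the first one BEGINS with '#'.
result=$(printf '%s\n' \
    '# a comment line' \
    'ftp 21/tcp' \
    '  # indented comment' \
    'ssh 22/tcp # trailing comment' | sed -e '/^#/d')

# The indented and trailing comments survive, because '#' is not
# the first character on those lines.
printf '%s\n' "$result"
```

This also shows a limit of the approach: comments that do not start in the first column are left in place.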

So, '/^#/' is a regular expression. But what does it do? Clearly, this is a good time for a regular expression refresher.

Regular expression refresher
We can use regular expressions to express patterns that we may find in text. Have you ever used the '*' character on the shell command line? That usage is similar to, but not the same as, regular expressions. Here are the special characters you can use in regular expressions:

Character   Description
^           Matches the beginning of the line
$           Matches the end of the line
.           Matches any single character
*           Matches zero or more occurrences of the previous character
[ ]         Matches any one of the characters inside the [ ]

Probably the best way to get a feel for regular expressions is to look at a few examples. All of these examples are accepted by sed as valid addresses, which appear to the left of a command. Here are a few:

Regular expression   Description
/./                  Matches any line that contains at least one character
/../                 Matches any line that contains at least two characters
/^#/                 Matches any line that begins with a '#'
/^$/                 Matches all blank lines
/}$/                 Matches any line that ends with a '}' (no trailing spaces)
/} *$/               Matches any line that ends with a '}' followed by zero or more spaces
/[abc]/              Matches any line that contains a lowercase 'a', 'b', or 'c'
/^[abc]/             Matches any line that begins with an 'a', 'b', or 'c'

I encourage you to try several of these examples. Take some time to get familiar with regular expressions, and then try a few of your own creation. You can use a regexp this way:

$ sed -e '/regexp/d' /path/to/my/test/file | more


This causes sed to delete any matching lines. However, it may be easier to get familiar with regular expressions by telling sed to print the lines that match the regexp and discard those that don't, rather than the other way around. You can do that with the following command:

$ sed -n -e '/regexp/p' /path/to/my/test/file | more


Note the new '-n' option, which tells sed not to print the pattern space unless explicitly asked to. You'll also notice that we replaced the 'd' command with the 'p' command, which, as you might guess, explicitly asks sed to print the pattern space. This way, only the matching lines are printed.
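Here is a small sketch for experimenting with '-n' and 'p' using a few of the regular expressions from the table above. The sample file contents and variable names are made up for this illustration.

```shell
# Seven sample lines: "alpha", an empty line, "}", "}" plus two spaces,
# "# note", "x", and "zebra" (hypothetical test data).
sample=$(mktemp)
printf 'alpha\n\n}\n}  \n# note\nx\nzebra\n' > "$sample"

# Lines that begin with a, b, or c.
starts_abc=$(sed -n -e '/^[abc]/p' "$sample")

# Lines that contain a, b, or c anywhere.
contains_abc=$(sed -n -e '/[abc]/p' "$sample")

# Lines that end with '}' and nothing after it.
ends_brace=$(sed -n -e '/}$/p' "$sample")

# Lines that end with '}' followed by zero or more spaces.
brace_then_spaces=$(sed -n -e '/} *$/p' "$sample")

# Without -n, sed prints every line once and matching lines a second time,
# so the output has one line more than the input.
without_n=$(sed -e '/^[abc]/p' "$sample" | wc -l)

printf 'starts with a/b/c:\n%s\n' "$starts_abc"
printf 'contains a/b/c:\n%s\n' "$contains_abc"
printf 'lines printed without -n: %s\n' "$without_n"

rm -f "$sample"
```

Comparing starts_abc with contains_abc shows the effect of the '^' anchor: "zebra" contains a 'b' and an 'a' but does not begin with one.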

More on addresses
So far, we've looked at line addresses, line range addresses, and regexp addresses. But there are more possibilities. We can specify two regular expressions separated by a comma, and sed will match every line from the first line that matches the first regular expression up to and including the line that matches the second regular expression. For example, the following command prints a block of text that starts with a line containing "BEGIN" and ends with a line containing "END":

$ sed -n -e '/BEGIN/,/END/p' /my/test/file | more


If "BEGIN" isn't found, no data is printed. If "BEGIN" is found but no "END" appears on any line after it, every subsequent line is printed. This happens because of sed's stream-oriented nature: it cannot know whether an "END" will ever appear.
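The three cases just described can be tried on short, invented inputs. A minimal sketch:

```shell
# Case 1: both markers present. The range includes the marker lines.
block=$(printf 'header\nBEGIN\ninside 1\ninside 2\nEND\nfooter\n' |
    sed -n -e '/BEGIN/,/END/p')

# Case 2: BEGIN without a later END. Everything from BEGIN to the
# end of the input is printed.
open_ended=$(printf 'header\nBEGIN\ninside\nfooter\n' |
    sed -n -e '/BEGIN/,/END/p')

# Case 3: no BEGIN at all. Nothing is printed.
none=$(printf 'header\nfooter\n' | sed -n -e '/BEGIN/,/END/p')

printf 'both markers:\n%s\n' "$block"
printf 'no END:\n%s\n' "$open_ended"
printf 'no BEGIN: [%s]\n' "$none"
```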

C source code example
If you want to print only the main() function in a C source file, you can type:

$ sed -n -e '/main[[:space:]]*(/,/^}/p' sourcefile.c | more


This command has two regular expressions, '/main[[:space:]]*(/' and '/^}/', and one command, 'p'. The first regular expression matches the string "main" followed by any number of spaces or tabs, followed by an open parenthesis. This should match the start of your average ANSI C main() declaration.

In this particular regular expression, we encounter the '[[:space:]]' character class. This is simply a special keyword that tells sed to match either a tab or a space. If you want, instead of typing '[[:space:]]', you can type '[', then a literal space, then Control-V, then a literal tab and a ']'. The Control-V tells bash that you want to insert a "real" tab rather than perform command expansion. It is clearer, especially in scripts, to use the '[[:space:]]' character class.

OK, now on to the second regexp. '/^}/' matches a '}' character that appears at the beginning of a line. If your code is formatted nicely, this will match the closing brace of your main() function. If it isn't, it won't match correctly; this is one of the tricky things about pattern matching.

Because we are still in '-n' quiet mode, the 'p' command does what it is supposed to do: it explicitly tells sed to print the lines. Try running the command against a C source file. It should output the entire main() { } block, including the initial "main()" and the closing '}'.
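If you don't have a C file at hand, the sketch below writes a tiny one to a temporary file and runs the same command on it. The C program, including the helper() function, is invented for this test and assumes the "nicely formatted" layout described above, with the closing brace in the first column.

```shell
# Write a small, hypothetical C source file to a temporary location.
src=$(mktemp)
cat > "$src" <<'EOF'
#include <stdio.h>

static int helper(void)
{
    return 0;
}

int main (void)
{
    printf("hello\n");
    return helper();
}
EOF

# Print from the line containing "main (" through the next line
# that begins with '}'.
main_block=$(sed -n -e '/main[[:space:]]*(/,/^}/p' "$src")
printf '%s\n' "$main_block"

rm -f "$src"
```

The closing brace of helper() also sits in the first column, but it comes before the range starts, so it is not printed.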
