Shell-sed instance part 3

Last Update:2014-06-13 Source: Internet

Author: User


Robust sed
In the second sed article, I offered examples that demonstrated how sed works, but very few of them did anything particularly useful. In this final article of the series, I want to change that and put sed to practical use. I will show you a few examples that not only demonstrate sed's capabilities but also do some genuinely clever (and convenient) things. For example, in the second half of this article, I will show you how to design a sed script that converts a .QIF file from Intuit's Quicken financial program into a nicely formatted text file. Before doing that, let's look at some sed scripts that are less complex but still useful.

Text conversion
The first practical script converts UNIX-style text to DOS/Windows format. As you may know, DOS/Windows text files have a CR (carriage return) and LF (line feed) at the end of each line, while UNIX text has only a line feed. Sometimes you need to move UNIX text to a Windows system, and this script performs the necessary format conversion for you.

$ sed -e 's/$/\r/' myunix.txt > mydos.txt

In this script, the '$' regular expression matches the end of the line, and the '\r' tells sed to insert a carriage return right before it. Insert a carriage return before the line feed, and presto, each line ends in CR/LF. Note that '\r' is replaced with a CR only when using GNU sed 3.02.80 or later. If you have not installed GNU sed 3.02.80 yet, see the instructions in my first sed article.
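As a quick check of the conversion above, here is a minimal sketch that builds a two-line UNIX file and counts how many lines end in a carriage return afterwards. It assumes GNU sed (for '\r' in the replacement); the scratch directory and the sample contents are made up for illustration.

```shell
# Scratch directory so no real files are touched (file names mirror the article).
workdir=$(mktemp -d)
printf 'foo\nbar\n' > "$workdir/myunix.txt"

# Append a carriage return to the end of every line.
sed -e 's/$/\r/' "$workdir/myunix.txt" > "$workdir/mydos.txt"

# Count the lines that now end in CR ("od -c" would show the raw bytes).
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$workdir/mydos.txt")
echo "lines ending in CR: $crlf_lines"
```

Both lines should be counted, and the converted file should be two bytes longer than the original (one CR per line).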

I can't tell you how many times I have downloaded some example script or C code, only to find that it is in DOS/Windows format. While many programs don't mind DOS/Windows CR/LF text files, several definitely do. The most notable is bash, which chokes as soon as it encounters a carriage return. The following sed invocation converts DOS/Windows text to a trusty UNIX format:

$ sed -e 's/.$//' mydos.txt > myunix.txt

The way this script works is simple: the substitution regular expression matches the last character on the line, which happens to be the carriage return. We replace it with nothing, so it is deleted from the output entirely. If you use this script and notice that the last character of every line in the output has been deleted, you specified a text file that is already in UNIX format. There was no need to convert it!
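The sketch below runs the DOS-to-UNIX conversion on a small made-up file and also demonstrates the pitfall just described. The last command is an alternative that is not part of the original script: it strips only a trailing carriage return, which assumes a sed that understands '\r' in a regular expression (GNU sed does).

```shell
workdir=$(mktemp -d)
printf 'foo\r\nbar\r\n' > "$workdir/mydos.txt"

# The article's command: drop the last character of every line (the CR).
sed -e 's/.$//' "$workdir/mydos.txt" > "$workdir/myunix.txt"

# On text that is already in UNIX format, the same command eats a real character.
chopped=$(printf 'foo\n' | sed -e 's/.$//')

# Alternative (an assumption, GNU sed): remove the last character only if it is a CR.
untouched=$(printf 'foo\n' | sed -e 's/\r$//')

echo "chopped=$chopped untouched=$untouched"
```

The stricter pattern is safe to run twice, at the cost of depending on the '\r' escape.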

Reversing lines
Here is another handy little script. Like the "tac" command included with most Linux distributions, this script reverses the order of the lines in a file. The name "tac" may be a bit misleading, because "tac" does not reverse the position of characters within a line (left and right), but rather the position of lines within the file (top and bottom). Running "tac" on the following file:

foo
bar
oni

...produces the following output:

oni
bar
foo

You can use the following sed script for the same purpose:

$ sed -e '1!G;h;$!d' forward.txt > backward.txt

You will find this sed script useful if you are logged in to a FreeBSD system that happens not to have a "tac" command. While handy, it is also a good idea to understand why the script does what it does. Let's dissect it.

Reversal explained
First, this script contains three separate sed commands, separated by semicolons: '1!G', 'h', and '$!d'. Now it is time to understand the addresses used for the first and third commands. If the first command were '1G', the 'G' command would apply only to the first line. However, there is an additional '!' character. This '!' negates the address, meaning that the 'G' command applies to every line except the first. The '$!d' command is similar. If the command were '$d', it would apply the 'd' command only to the last line in the file (the '$' address is a simple way of specifying the last line). With the '!', however, '$!d' applies the 'd' command to every line except the last. Now all we need to understand is what the commands themselves do.
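To see address negation in isolation, the following sketch applies 'd' with plain and negated addresses to the three-line sample. The variable names are only for illustration.

```shell
# '1d' deletes only the first line.
all_but_first=$(printf 'foo\nbar\noni\n' | sed -e '1d')

# '1!d' deletes every line except the first.
first_only=$(printf 'foo\nbar\noni\n' | sed -e '1!d')

# '$!d' deletes every line except the last.
last_only=$(printf 'foo\nbar\noni\n' | sed -e '$!d')

echo "first_only=$first_only last_only=$last_only"
```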

When we run the line reversal script on the text file above, the first command that gets executed is 'h' (the '1!G' command is skipped on the first line). This command tells sed to copy the contents of the pattern space (the buffer that holds the current line being worked on) to the hold space (a temporary buffer). Then the 'd' command is executed, which deletes "foo" from the pattern space so that it is not printed once all the commands have been executed for this line.

Now, line two. After "bar" is read into the pattern space, the 'G' command is executed. It appends a newline and the contents of the hold space ("foo") to the pattern space ("bar"), so that the pattern space now contains "bar\nfoo". The 'h' command puts this back into the hold space for safekeeping, and then 'd' deletes the line from the pattern space so that it is not printed.

For the last line, "oni", the same steps are repeated, except that the contents of the pattern space are not deleted (because of the '$!' before the 'd'), so the contents of the pattern space (three lines) are printed to standard output.
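The walkthrough above can be reproduced with a short sketch. The first command runs the reversal script itself. The second replaces the final 'd' with sed's 'l' command, which prints the pattern space unambiguously (an embedded newline appears as \n and the end of the buffer as $), so you can watch the text accumulate line by line. The use of 'l' and '-n' here is an addition for tracing, not part of the original script.

```shell
# The reversal script from the article.
reversed=$(printf 'foo\nbar\noni\n' | sed -e '1!G;h;$!d')
echo "$reversed"

# Trace: show the pattern space after 'G' and 'h' have run on each input line.
trace=$(printf 'foo\nbar\noni\n' | sed -n -e '1!G;h;l')
echo "$trace"
```

The last line of the trace is the full reversed buffer that the real script prints when it reaches the end of the file.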

Now it is time to do some powerful data conversion with sed.

Sed QIF magic
For the past few weeks, I have been thinking about buying a copy of Quicken to balance my bank accounts. Quicken is a very nice financial program and would certainly do the job well. But after thinking about it, I decided that I could easily write some software to balance my checkbook myself. After all, I reasoned, I am a software developer!

I developed a nice little checkbook balancing program (using awk) that calculates my balance by parsing a text file containing all my transactions. After a bit of tweaking, I improved it so that I could keep track of different expense and income categories, just as Quicken does. But there was one more feature I wanted to add. I had recently switched my accounts to a bank with an online Web account interface. One day, I noticed that my bank's Web site allowed me to download my account information in Quicken's .QIF format. I immediately decided that it would be great to convert this information into my text format.

Two formats
Before viewing the QIF format, let's take a look at my checkbook.txt format:

28 Aug 2000     food    -       -       Y       Supermarket     30.94
25 Aug 2000     watr    -       103     Y       Check 103       52.86

In my file, all fields are separated by one or more tabs, and each transaction occupies one line. The field after the date lists the expense category (or "-" if the transaction is income). The third field lists the income category (or "-" if the transaction is an expense). Next come a check number field (again "-" if empty), a transaction cleared field ("Y" or "N"), a comment, and a dollar amount. Now let's look at the QIF format. When I viewed my downloaded QIF file in a text viewer, it looked like this:

!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/28/2000
T-8.25
N
PCHECKCARD PUNJAB RESTAURANT
^
D08/28/2000
T-17.17
N
PCHECKCARD SUPERMARKET

After scanning the file, it is not hard to work out its format. Ignoring the first line, the format is as follows:

D<date>
T<transaction amount>
N<check number>
P<description>
^ (this is the field separator)

Start processing
When you tackle a significant sed project like this one, don't get discouraged. Sed lets you massage the data gradually into its final form. As you progress, you can keep refining your sed script until the output is exactly what you expect. You do not need to get it right on the first try.

To start, first create a file named "qiftrans.sed" and begin massaging the data:

1d
/^^/d
s/[[:cntrl:]]//g

The first command, '1d', deletes the first line, and the second command removes those pesky '^' lines from the output. The last line removes any control characters that may exist in the file. Since I am dealing with a foreign file format, I want to eliminate the risk of encountering any control characters along the way. So far, so good. Now it is time to add some processing punch to this basic script.
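As a minimal sketch of these three commands at work, the following writes them to a script file and runs them over one made-up QIF record that has DOS-style line endings, so the control-character cleanup has something to remove. The sample data and the scratch directory are assumptions for illustration.

```shell
workdir=$(mktemp -d)

# The three-line cleanup script described above.
cat > "$workdir/qiftrans.sed" <<'EOF'
1d
/^^/d
s/[[:cntrl:]]//g
EOF

# One sample record, with a stray carriage return at the end of each line.
printf '!Type:Bank\r\nD08/28/2000\r\nT-8.15\r\nN\r\nPCHECKCARD SUPERMARKET\r\n^\r\n' \
    > "$workdir/sample.qif"

cleaned=$(sed -f "$workdir/qiftrans.sed" "$workdir/sample.qif")
echo "$cleaned"
```

Only the D, T, N and P lines should remain, with the carriage returns gone.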

First, add a '/^D/' address so that sed only begins processing when it encounters a line that starts with 'D', the first character of the QIF date field. When sed reads such a line into its pattern space, all the commands in the curly braces that follow the address are executed in order.

The first command in the curly braces transforms a line that looks like this:

D08/28/2000

into one that looks like this:

08/28/2000      OUTY    INNY

Of course, this format is not perfect yet, but that is fine. We will refine the contents of the pattern space gradually as we go. The next 12 lines have the net effect of converting the month to a three-letter abbreviation, and the last line removes the slashes from the date and puts the day first. We end up with this line:

28 Aug 2000     OUTY    INNY

The OUTY and INNY fields are placeholders that will be replaced later. They cannot be filled in yet: if the dollar amount is negative, OUTY and INNY will be set to "misc" and "-", but if the dollar amount is positive, they will be changed to "-" and "inco" respectively. Since the dollar amount has not been read yet, placeholders have to do for the time being.
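The date handling described above can be sketched as follows. This listing is pieced together from the description in the text (an address, one placeholder substitution, twelve month substitutions, and one reordering substitution), so treat it as a reconstruction rather than the author's exact file. It assumes GNU sed, which turns '\t' in the replacement into a tab; the file name "datestage.sed" is hypothetical.

```shell
workdir=$(mktemp -d)

cat > "$workdir/datestage.sed" <<'EOF'
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
}
EOF

# Feed it a single date line; the result ends with a tab after INNY.
date_line=$(printf 'D08/28/2000\n' | sed -f "$workdir/datestage.sed")
printf '%s\n' "$date_line"
```

The last substitution uses ':' as its delimiter so that the slashes in the date need no escaping.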

Refinement
Now it is time for some further refinement:

1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\3\tAMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([0-9]*\)NUM/\1/
    s/\([0-9]\),/\1/
}

The last seven lines are a bit complicated, so we will cover them in detail. First, there are three 'N' commands in a row. The 'N' command tells sed to read the next line of input and append it to the current pattern space. The three 'N' commands therefore append the next three lines to the current pattern space, which now looks like this:

28 Aug 2000     OUTY    INNY    \nT-8.15\nN\nPCHECKCARD SUPERMARKET
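To make the effect of 'N' visible on its own, this small sketch appends three lines to the pattern space and then replaces each embedded newline with " | " so that the joined record prints on one line. The separator is purely for display and is not part of the QIF script.

```shell
# Four input lines become one pattern space holding three embedded newlines.
joined=$(printf 'D08/28/2000\nT-8.15\nN\nPCHECKCARD SUPERMARKET\n' |
    sed -e 'N;N;N' -e 's/\n/ | /g')
echo "$joined"
```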

Sed's pattern space has become ugly. We need to remove the extra newlines and do some additional formatting. To do this, we use the substitution command. The pattern to match is:

'\nT.*\nN.*\nP.*'

This matches a newline followed by a 'T', zero or more characters, a newline, an 'N', any number of characters, a newline, a 'P', and then any number of characters. Phew! This regular expression matches the entire contents of the three lines just appended to the pattern space. But we want to reformat this region, not replace it entirely. The dollar amount, check number (if any), and description need to reappear in the replacement string. To do this, we surround those "interesting parts" with backslashed parentheses so that we can refer to them in the replacement string (using '\1', '\2', and '\3' to tell sed where to insert them). Here is the final command:

s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\3\tAMT\1AMT/

This command transforms our line into:

28 Aug 2000     OUTY    INNY    NUMNUM          Y       CHECKCARD SUPERMARKET   AMT-8.15AMT

Although this line is getting better, a few things look odd at first glance. The first is that silly "NUMNUM" string. What is its purpose? If you check the next two lines of the sed script, you will find out: they replace "NUMNUM" with "-", and replace "NUM", a number, and "NUM" with just the number. As you can see, wrapping the check number in a silly marker lets us conveniently insert a "-" when the field is empty.
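The following sketch isolates the joining substitution and the two "NUM" cleanups and runs them on two records: one without a check number and one with check number 103. The helper name join_record and the second sample record are hypothetical; GNU sed is assumed for '\t' in the replacement.

```shell
# Hypothetical helper: join a four-line record and tidy the check number field.
join_record() {
    sed -e 'N;N;N' \
        -e 's/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\3\tAMT\1AMT/' \
        -e 's/NUMNUM/-/' \
        -e 's/NUM\([0-9]*\)NUM/\1/'
}

# Empty check number: NUMNUM collapses to "-".
no_check=$(printf '28 Aug 2000\tOUTY\tINNY\t\nT-8.15\nN\nPCHECKCARD SUPERMARKET\n' | join_record)

# Check number present: the NUM markers are stripped and 103 remains.
with_check=$(printf '25 Aug 2000\tOUTY\tINNY\t\nT-52.86\nN103\nPCheck 103\n' | join_record)

printf '%s\n%s\n' "$no_check" "$with_check"
```

The amount is still wrapped in AMT markers at this point; a later step uses them to find it again.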

Finishing touches
The last line removes a comma that follows a digit. This converts dollar amounts like "3,231.00" into "3231.00", which is the format I use. Now let's take a look at the final script:
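A tiny sketch of the comma removal follows. Note that without the 'g' flag the substitution removes only the first such comma on the line; the second command shows a variant with 'g', which is an addition here and not part of the original script.

```shell
# As in the script: remove the first comma that follows a digit.
one_comma=$(echo '3,231.00' | sed -e 's/\([0-9]\),/\1/')

# Variant (an assumption): the 'g' flag handles amounts with several commas.
two_commas=$(echo '1,234,567.00' | sed -e 's/\([0-9]\),/\1/g')

echo "$one_comma $two_commas"
```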

The final "QIF to text" script:

1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/; s/^02/Feb/; s/^03/Mar/
    s/^04/Apr/; s/^05/May/; s/^06/Jun/
    s/^07/Jul/; s/^08/Aug/; s/^09/Sep/
    s/^10/Oct/; s/^11/Nov/; s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\3\tAMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([0-9]*\)NUM/\1/
    s/\([0-9]\),/\1/
    /AMT-[0-9]*\.[0-9]*AMT/b fixnegs
    s/AMT\(.*\)AMT/\1/
    s/OUTY/-/
    s/INNY/inco/
    b done
:fixnegs
    s/AMT-\(.*\)AMT/\1/
    s/OUTY/misc/
    s/INNY/-/
:done
}

The additional eleven lines use substitution and some branching functionality to polish the output. Let's look at this line first:

/AMT-[0-9]*\.[0-9]*AMT/b fixnegs

This line contains a branch command of the form "/regexp/b label". If the pattern space matches the regular expression, sed branches to the fixnegs label. You can easily spot this label, which appears as ":fixnegs" in the code. If the regular expression does not match, processing continues with the next command as usual.
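To illustrate the "/regexp/b label" form by itself, here is a minimal sketch unrelated to QIF: lines that start with '#' branch straight to a label at the end of the script, so the substitution that prefixes "> " is skipped for them. The label name "skip" and the sample input are made up.

```shell
# Lines matching /^#/ jump to :skip; every other line gets a "> " prefix.
quoted=$(printf 'keep\n#comment\nalso keep\n' |
    sed -e '/^#/b skip' -e 's/^/> /' -e ':skip')
echo "$quoted"
```

Each label and each 'b' command sits in its own -e expression here, because a label name runs to the end of the line.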

Now that you understand how the command itself works, let's look at the branch. The branch regular expression matches the string 'AMT', followed by a '-', any number of digits, a '.', any number of digits, and 'AMT'. As I am sure you have guessed, this regular expression deals specifically with negative dollar amounts. Earlier, we surrounded the dollar amount with 'AMT' strings so that we could easily find it later. Because the regular expression only matches dollar amounts that begin with '-', the branch is taken only when we are dealing with an expense. For an expense, OUTY should be set to 'misc', INNY should be set to '-', and the minus sign in front of the amount should be removed. If you follow the flow of the code, you will see that this is exactly what happens. If the branch is not taken, OUTY is replaced with '-' and INNY with 'inco'. Finished! Our output line is now perfect:

28 Aug 2000     misc    -       -               Y       CHECKCARD SUPERMARKET   8.15
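As an end-to-end sketch, the following writes the conversion script to a file and runs it over the three sample transactions shown earlier. It assumes GNU sed (for '\t' in replacements); the file name "sample.qif" and the scratch directory are made up for illustration, and the month substitutions are grouped with semicolons only to keep the listing short.

```shell
workdir=$(mktemp -d)

cat > "$workdir/qiftrans.sed" <<'EOF'
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/; s/^02/Feb/; s/^03/Mar/
    s/^04/Apr/; s/^05/May/; s/^06/Jun/
    s/^07/Jul/; s/^08/Aug/; s/^09/Sep/
    s/^10/Oct/; s/^11/Nov/; s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\3\tAMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([0-9]*\)NUM/\1/
    s/\([0-9]\),/\1/
    /AMT-[0-9]*\.[0-9]*AMT/b fixnegs
    s/AMT\(.*\)AMT/\1/
    s/OUTY/-/
    s/INNY/inco/
    b done
:fixnegs
    s/AMT-\(.*\)AMT/\1/
    s/OUTY/misc/
    s/INNY/-/
:done
}
EOF

# The three sample transactions from the QIF listing.
printf '%s\n' '!Type:Bank' \
    'D08/28/2000' 'T-8.15' 'N' 'PCHECKCARD SUPERMARKET' '^' \
    'D08/28/2000' 'T-8.25' 'N' 'PCHECKCARD PUNJAB RESTAURANT' '^' \
    'D08/28/2000' 'T-17.17' 'N' 'PCHECKCARD SUPERMARKET' > "$workdir/sample.qif"

converted=$(sed -f "$workdir/qiftrans.sed" "$workdir/sample.qif")
printf '%s\n' "$converted"
```

Each transaction should come out as one tab-separated line with "misc" in the expense column, "-" for income and check number, and the amount without its minus sign.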

Don't get confused
As you can see, converting data with sed is not that hard, as long as you approach the problem one step at a time. Do not try to accomplish everything with a single sed command, or all at once. Instead, work your way toward the goal gradually, and keep improving your sed script until the output looks just the way you want it to. Sed packs a lot of power, and I hope you are now quite familiar with its inner workings and will keep working to learn more about it!

Author: "Daily Yunhui"

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
