How to solve common problems with sed (Linux sed command series)

Source: Internet
Author: User
Tags: expression engine

Contents:
1. Variable substitution in sed
2. Backreference failure
3. File-saving problems with the "-i" option
4. Greedy matching
5. The conflict between the sed commands "a" and "N"

1. Variable substitution in sed

When using sed in a script, you often need to reference a shell variable inside the sed expression, or even let the shell substitute part of the sed command line. Many people have run into this and failed to get the quoting right. In fact, this is not a sed issue but a feature of the shell. Understanding how quoting works for sed goes a long way toward understanding shell quoting in general, and the same reasoning applies to other tools that have their own syntax parsing, such as awk and mysql.

For example, suppose we want to print the last five lines of a.txt. It is tempting to write the following:

total=`wc -l <a.txt`
sed -n '$((total-4)),$p' a.txt

Unfortunately, this reports an error. On the one hand, "$" is a special symbol in sed: used as an address, it denotes the last line of the input stream. The "$" in $(()) is therefore parsed by sed as well. On the other hand, $(()) is shell arithmetic, not sed arithmetic, so it must be exposed to the shell so that the shell can evaluate it.

The shell has three quoting situations: single quotes, double quotes, and no quotes.

  • Single quotes: every character inside single quotes is literal. Note that a single quote cannot appear inside single quotes, not even when escaped with a backslash.
  • Double quotes: every character inside double quotes is literal except "\", "$", and "`" (backquote). If history expansion with "!" is enabled, the exclamation mark is also an exception.
  • No quotes: almost equivalent to double quotes, except that brace expansion, tilde expansion, pathname expansion, and word splitting are also performed.

The description above is not complete, but it is sufficient here. It also only describes what quotes do literally. Their real purpose is to determine which "words" on the command line are parsed by the shell and which are not.

Every character inside single quotes is literal, so the shell parses nothing there: variables are not expanded, command substitution and arithmetic are not performed, and pathname expansion does not happen. If a character must be interpreted by the command being run, which has its own parser, protect it from the shell with single quotes. For example, "$", "!", and "{}" all have special meanings in sed. For sed to see them, they should be in single quotes; otherwise errors or ambiguities can result. All three sed statements below need single quotes to give the correct result.

sed '$d' filename
sed '1!d' filename
sed -n '2{p;q}' filename
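
As a quick check of these three statements, the sketch below runs them against a small sample input instead of a real file. The three-line input and the variable names are made up for illustration, and the "2{p;q}" form without a semicolon before "}" assumes GNU sed.

```shell
# Hypothetical three-line input used in place of "filename".
input=$(printf 'one\ntwo\nthree')

# '$d': "$" is the last-line address, so the last line is deleted.
drop_last=$(printf '%s\n' "$input" | sed '$d')

# '1!d': "!" negates the address, so every line except line 1 is deleted.
first_only=$(printf '%s\n' "$input" | sed '1!d')

# '2{p;q}': "{}" groups commands; print line 2, then quit.
second_only=$(printf '%s\n' "$input" | sed -n '2{p;q}')

printf '%s\n' "$drop_last" "---" "$first_only" "---" "$second_only"
```

In a script the single quotes mainly protect "$"; in an interactive bash session they also keep "!" away from history expansion.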

Conversely, for a special character to be parsed by the shell, it must not be inside single quotes. Use double quotes or no quotes at all, even if the result looks odd. The arithmetic expansion $(()) above is meant for the shell, so it must be exposed to the shell with double quotes or no quotes. The correct statements are:

sed -n $((total-4))',$p' a.txt
sed -n "$((total-4))"',$p' a.txt
sed -n "$((total-4)),\$p" a.txt

The quoting in these statements looks odd to the eye. The shell, however, does not care how a command looks: it has its own fixed rules for splitting a command line into words.
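
The following is a minimal, self-contained sketch of the "last five lines" command. It uses a temporary eight-line file created with mktemp in place of a.txt; the file contents and variable names are assumptions made for the example.

```shell
# Hypothetical sample file: eight numbered lines in a temporary location.
tmp=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 > "$tmp"

total=$(wc -l < "$tmp")

# Double quotes expose $((...)) to the shell; single quotes keep $p for sed.
last_five=$(sed -n "$((total-4))"',$p' "$tmp")

# The same command in double quotes only, with the sed "$" escaped.
last_five_dq=$(sed -n "$((total-4)),\$p" "$tmp")

printf '%s\n' "$last_five"
rm -f "$tmp"
```

Both forms hand sed the same script, "4,$p", once the shell has finished its expansions.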

From this we can draw a set of conclusions about how sed interacts with the shell:

  • Whatever the shell must parse goes unquoted or inside double quotes;
  • A character that is special to both the shell and the command being run, and that sed should parse, goes inside single quotes, or inside double quotes with a backslash escape;
  • For characters that are special to neither, the kind of quotes does not matter.
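
One way to see these rules at work is to print the word that the shell would hand to sed, without running sed at all. The sketch below does that with printf; the variable n and the names single, double, and mixed are made up for the example.

```shell
n=5

# Single quotes: the shell touches nothing, so sed would receive $n,$p verbatim.
single=$(printf '%s' '$n,$p')

# Double quotes: $n is expanded by the shell, \$ reaches sed as a plain $.
double=$(printf '%s' "$n,\$p")

# Mixed: a double-quoted part for the shell glued to a single-quoted part for sed.
mixed=$(printf '%s' "$n"',$p')

printf '%s\n' "$single" "$double" "$mixed"
```

Adjacent quoted and unquoted pieces with no space between them form a single word, which is why the mixed form works.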

Using command substitution, the statement that makes sed print the last five lines is therefore:

sed -n `expr $(wc -l <a.txt) - 4`',$p' a.txt
Here `expr $(wc -l <a.txt) - 4` must be parsed by the shell, so it must not be in single quotes. The $p part must reach sed so that "$" means the last line, so it is put in single quotes to keep the shell away from it.

A more complex case is substituting a variable inside a sed regular expression. For example, print from the line in a.txt that starts with the string held in the variable str through the last line.

str="abc"
sed -n /^$str/',$p' a.txt
Because $str is not quoted, the shell replaces it with "abc" as intended. There are several other ways to write this command:

sed -n '/^'$str'/,$p' a.txt
sed -n "/^$str"'/,$p' a.txt
sed -n "/^$str/,\$p" a.txt
sed -n "/^$str/,"'$'p a.txt
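
A small sketch of the same command on sample data follows. The temporary file, its contents, and the value of str are assumptions chosen for illustration; str deliberately contains a space to show why keeping the expansion inside double quotes is the safer habit.

```shell
# Hypothetical sample file standing in for a.txt.
tmp=$(mktemp)
printf '%s\n' 'xyz' 'abc one' 'def' 'ghi' > "$tmp"

str="abc one"

# Unquoted, /^$str/ would be split into two words at the space and sed would
# receive a broken script, so $str is kept inside double quotes here.
out_mixed=$(sed -n "/^$str/"',$p' "$tmp")
out_double=$(sed -n "/^$str/,\$p" "$tmp")

printf '%s\n' "$out_mixed"
rm -f "$tmp"
```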
Here is a slightly harder case that involves sed's own special symbols: replace the password in the last line of /etc/shadow with "$1$123456$wOSEtcyiP2N/IfIl15W6Z0".

[root@xuexi ~]# tail -n 1 /etc/shadow
userX:$6$hS4yqJu7WQfGlk0M$Xj/logs./dxjn0zadaxqum1_cuwvryzuu6npplwoyv8expa.:0:99999:7:::
The replacement statement is as follows:

old_pass="$(tail -n 1 /etc/shadow | cut -d':' -f2)"
new_pass='$1$123456$wOSEtcyiP2N/IfIl15W6Z0'
sed -n '$'s%$old_pass%$new_pass%p /etc/shadow

Because old_pass and new_pass contain the "/" and "$" symbols, the delimiter of the "s" command is changed to "%". The trailing "p" flag is needed because "-n" suppresses automatic printing. Also look closely at old_pass: it contains ".", which is a regular expression metacharacter, so the pattern can match more than the literal string.
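
If the pattern must match only the literal string, one possible approach (not part of the original command) is to backslash-escape the metacharacters before handing the variables to sed. The sketch below works on a temporary file holding a made-up line in /etc/shadow format; the file, the sample hash, and the helper variables old_re and new_rep are all assumptions for illustration.

```shell
# "." matches any character, so an unescaped pattern is looser than it looks.
loose=$(printf '%s\n' 'abXcd' | sed -n '/ab.cd/p')
strict=$(printf '%s\n' 'abXcd' | sed -n '/ab\.cd/p')

# Made-up line in /etc/shadow format, kept in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'userX:$6$salt$Xj/ab.cd:17000:0:99999:7:::' > "$tmp"

old_pass=$(tail -n 1 "$tmp" | cut -d: -f2)
new_pass='$1$123456$wOSEtcyiP2N/IfIl15W6Z0'

# Escape basic-regex metacharacters and the "%" delimiter in the pattern.
old_re=$(printf '%s\n' "$old_pass" | sed 's/[][\.*^$%]/\\&/g')
# In the replacement only "\", "&" and the delimiter are special.
new_rep=$(printf '%s\n' "$new_pass" | sed 's/[\&%]/\\&/g')

# '$' is the sed last-line address; the double-quoted part is for the shell.
result=$(sed '$'"s%$old_re%$new_rep%" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```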

2. Backreference failure

When alternation "|" is used in a regular expression, a backreference does not work if the group it refers to took no part in the match. For example, "(a)\1u|b\1" matches only lines containing "aau" and does not match lines containing "ba". On the right side of "|", group \1 takes no part in the match, so the \1 there is invalid, while the \1 on the left side is valid.

This is a property of regular expression matching in general, not just of sed. Other tools that use basic or extended regular expression engines behave the same way.

In addition, a backreference used in the "s" command cannot refer to a group outside that "s" command. For example:

echo "ab3456cd" | sed -r "/(ab)/s/([0-9]+)/\1/"

The result is "ab3456cd", not "ababcd". If \1 is changed to \2, sed reports the error "invalid reference \2 on `s' command's RHS".
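
The sketch below reproduces this behavior and shows one way to obtain "ababcd": move the group into the pattern of the "s" command itself. The second command is an illustration added here, not something from the original text, and "-r" assumes GNU sed.

```shell
# \1 refers to ([0-9]+) inside the s command, so the digits replace themselves.
out_inner=$(echo "ab3456cd" | sed -r '/(ab)/s/([0-9]+)/\1/')

# Capturing "ab" inside the s pattern makes it available to the replacement.
out_fixed=$(echo "ab3456cd" | sed -r 's/(ab)[0-9]+/\1\1/')

printf '%s\n' "$out_inner" "$out_fixed"
```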

3. File-saving problems with the "-i" option

With the "-i" option, sed creates a temporary file, writes its output there, and then renames the temporary file to the name of the source file. This is how the file is saved, and it is why sed ignores the read-only attribute of the file.

Whether a file may be renamed, moved, or deleted is controlled by the permissions of the directory that contains it. If the directory is read-only, sed cannot save the result with "-i", even if the file itself is writable.
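
The rename behavior can be observed through the inode number of the file, which changes after an in-place edit. The sketch below assumes GNU sed (BSD sed needs a suffix argument after "-i") and uses a throwaway directory from mktemp; the file name demo.txt is made up.

```shell
dir=$(mktemp -d)
f="$dir/demo.txt"
echo "hello" > "$f"
chmod 444 "$f"                      # the file itself is read-only

inode_before=$(ls -i "$f" | awk '{print $1}')

# Succeeds anyway: the directory is writable, so the temporary file can be
# created there and renamed over the original.
sed -i 's/hello/world/' "$f"

inode_after=$(ls -i "$f" | awk '{print $1}')
content=$(cat "$f")

printf '%s %s %s\n' "$inode_before" "$inode_after" "$content"
rm -rf "$dir"
```

A different inode number after the edit indicates that the name now points to a new file, not to the original one modified in place.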

4. Greedy matching

Greedy matching means that when a regular expression can match several candidates, it takes the longest one. The simplest example: given the data "abcdbaz", the regular expression "a.*b" can match both "ab" and "abcdb". Because matching is greedy, it takes the longest, "abcdb".

echo "abcdbaz" | grep -o "a.*b"
abcdb

One shortcoming of basic and extended regular expressions is that they have no native way to turn greedy matching off. Fuller implementations, such as Perl regular expressions and those of other programming languages, let you add "?" after a quantifier to request the shortest match explicitly, as in the pattern "a.*?b".

echo "abcdbaz" | grep -P -o "a.*?b"
ab

To get around greedy matching in basic or extended regular expressions, the only option is a workaround based on the negated bracket expression "[^...]". For example:

echo "abcdbaz" | grep -o "a[^b]*b"
ab

This workaround does not make the quantifier non-greedy. "[^b]*" still consumes as many characters as it can; it just cannot cross a "b", so the match has to end at the first "b". When "abcdbaz" is matched against "a[^b]*b", "[^b]*" matches nothing because the character after "a" is already "b", and the result is "ab". With "a.*b" in a backtracking engine, by contrast, ".*" first runs to the end of the line and then gives characters back one at a time until a "b" can match. This is called "backtracking".
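
The same contrast can be seen in sed itself. The sketch below replaces the matched part with "X" so that the extent of each match is visible; the variable names are made up for the example.

```shell
# Greedy: "a.*b" runs to the last "b", so "abcdb" is replaced.
greedy=$(echo "abcdbaz" | sed 's/a.*b/X/')

# Bounded: "[^b]*" cannot cross a "b", so only "ab" is replaced.
bounded=$(echo "abcdbaz" | sed 's/a[^b]*b/X/')

printf '%s\n' "$greedy" "$bounded"
```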

For example, each line of the /etc/passwd file has the following format:

root:x:0:0:root:/root:/bin/bash

Use sed to greet each user in /etc/passwd, with output of the form "hello root" and "hello nobody".

First extract the first column of the file, the user name. Because every line in this file uses colons to separate fields, matching only the first field with a regular expression means getting around greedy matching. The statement is as follows:

sed -r 's/^([^:]*):.*/hello \1/' /etc/passwd

Note that sed supports only basic and extended regular expressions, so the negated bracket expression "[^:]*" is what stops the match at the first colon.
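
To try the greeting command without reading the real /etc/passwd, the sketch below feeds it two sample lines. The sample entries and variable names are assumptions for illustration, and "-r" assumes GNU sed.

```shell
# Two made-up lines in /etc/passwd format.
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'nobody:x:99:99:Nobody:/:/sbin/nologin')

# ([^:]*) captures everything before the first colon; ":.*" eats the rest.
greetings=$(printf '%s\n' "$passwd_sample" | sed -r 's/^([^:]*):.*/hello \1/')

printf '%s\n' "$greetings"
```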

What about the first two fields of /etc/passwd? Just repeat the "[^:]*" expression as a unit.

sed -r 's/^([^:]*):([^:]*):.*/hello \1 \2/' /etc/passwd

The third field?

sed -r 's/^([^:]*:){2}([^:]*):.*/hello \2/' /etc/passwd

The third and fifth fields? There is no shortcut: the fourth field has to be written out explicitly.

sed -r 's/^([^:]*:){2}([^:]*):([^:]*):([^:]*):.*/hello \2 \4/' /etc/passwd

The third through fifth fields? That is simpler: repeat the group three times.

sed -r 's/^([^:]*:){2}(([^:]*:){3}).*/hello \2/' /etc/passwd

In this result, fields 3 to 5 necessarily carry the ":" separators. Want to remove them? Forget it. sed is not good at processing fields: getting around greedy matching makes the expressions complicated, hard to read, and not very efficient. Using sed to process fields is definitely not recommended.
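
For comparison, the sketch below extracts the third and fifth fields from two sample lines twice: once with the sed expression and once with awk, a tool built for field processing. The awk command is an illustration added here and does not come from the original text; the sample entries and variable names are made up.

```shell
# Two made-up lines in /etc/passwd format.
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'nobody:x:99:99:Nobody:/:/sbin/nologin')

# sed: every field up to the fifth has to be spelled out in the pattern.
with_sed=$(printf '%s\n' "$passwd_sample" |
    sed -r 's/^([^:]*:){2}([^:]*):([^:]*):([^:]*):.*/hello \2 \4/')

# awk: split on ":" and name the fields by number.
with_awk=$(printf '%s\n' "$passwd_sample" | awk -F: '{print "hello", $3, $5}')

printf '%s\n' "$with_sed"
```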

5. The conflict between the sed commands "a" and "N"

The sed "a" command queues the given text in memory and appends it to the output stream when the content of the pattern space is output.

For example, append the line "matched successful" after the line matching "ccc".

echo -e "aaa\nbbb\nccc\nddd" | sed '/ccc/a matched successful'
aaa
bbb
ccc
matched successful
ddd

Used on its own, the "a" command works smoothly. Now try combining it with "N".

echo -e "aaa\nbbb\nccc\nddd" | sed '/ccc/{a\
matched successful
;N}'

aaa
bbb
matched successful
ccc
ddd

Wasn't the text supposed to be appended after the matching line? Why did it land before it? Even allowing for "N" reading the next line, shouldn't the text come after "ddd"? To understand this you need to know how sed outputs the pattern space; for details, see the introductory article in this Linux sed series. The output mechanism of the "N" command is described briefly here.

Whether sed reads the next line automatically or the "n" or "N" command reads it, every read is preceded by an output action on the pattern space. When "N" is about to read the next line, it first checks whether there is another line to read.

If there is, the process can be pictured as locking the pattern space, performing the automatic output and the clearing of the pattern space, unlocking the pattern space, appending a newline "\n" to it, and finally reading the next line and appending it after that newline. Because the pattern space is locked, the automatic output writes an empty stream and the pattern space is not actually cleared. Note that this is not the same as output being disabled. The visible result is identical, but with an empty output stream there is still an output action that writes to standard output, whereas with output disabled there is no output action at all.

If there is no line left to read, sed outputs the pattern space automatically, clears it, and exits. The process is roughly as follows:

if [ "$line" -ne "$last_line_num" ]; then
    lock pattern_space;
    auto_print;
    remove_pattern_space;
    unlock pattern_space;
    append "\n" to pattern_space;
    read next_line to pattern_space;
else
    auto_print;
    remove_pattern_space;
    exit;
fi

Back to combining the "a" command with the "N" command. The queued text of the "a" command ends up before the matching line because of that empty output stream. "N" performs an output action when it prepares to read the next line, even though what it outputs is empty. The "a" command is always waiting for an output action from sed, and as soon as one happens it appends its queued text right after it. So "matched successful" is appended after the empty output. Only then does "N" read the next line, and the pattern space that is eventually output contains "ccc\nddd", which produces the "unexpected" result.
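
The effect of command order can be checked with the sketch below, which runs "a" before "N" and then "N" before "a" on the same input. The second ordering is an illustration added here; the one-line form of "a" and the split "-e" fragments assume GNU sed, and the variable names are made up.

```shell
input=$(printf 'aaa\nbbb\nccc\nddd')

# "a" first: the queued text is flushed when "N" reads the next line,
# so it is written before "ccc\nddd" is printed.
a_then_n=$(printf '%s\n' "$input" |
    sed -e '/ccc/{' -e 'a matched successful' -e 'N' -e '}')

# "N" first: the text is queued after the read and is flushed at the end
# of the cycle, after the pattern space "ccc\nddd" has been printed.
n_then_a=$(printf '%s\n' "$input" |
    sed -e '/ccc/{' -e 'N' -e 'a matched successful' -e '}')

printf '%s\n' "$a_then_n" "---" "$n_then_a"
```

Putting "N" ahead of "a" is one way to make the appended text follow both lines.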

Permanent link to this article: https://www.bkjia.com/Linux/2018-03/151220.htm
