sed (stream editor)
Function description: processes text files according to a script.
Syntax: sed [-hnV] [-e <script>] [-f <script file>] [text file]
Note: sed reads text files and edits them according to the instructions in the script.
Parameters:
-e <script> or --expression=<script>: process the input text with the script given in the option.
-f <script file> or --file=<script file>: process the input text with the script file given in the option.
-h or --help: display help.
-n or --quiet or --silent: suppress the automatic output, so that only the lines the script prints explicitly are displayed.
-V or --version: display version information.
How sed works:
sed is a non-interactive stream editor. Non-interactive means that you supply the editing commands on the command line (or in a script file) and then view the output on the screen, instead of editing the text interactively. Stream editor means that sed reads one line at a time from a file (or from standard input), processes that line, writes the result to the screen (unless the default output has been suppressed and no print command is used), and then reads the next line. The whole file flows through sed line by line, like a stream, and is output line by line.
Next, let's look at sed's working process. sed does not work on the original input directly. It copies each line it reads into a buffer and processes the content of that buffer. After processing, the result is not written back to the original file but sent to the screen (use shell output redirection to save it). sed maintains two buffers while it runs: the active "pattern space" and the auxiliary "hold space". In each cycle, sed loads a line into the pattern space, applies the commands to it, outputs the pattern space to the screen, and then replaces the content of the pattern space with the next line, and so on.
The hold space is normally not used, but some commands can exchange data between the pattern space and the hold space. Because sed performs all of its operations on the text in these buffers, it does not damage the original file.
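The buffer behaviour described above can be checked with a small experiment. The following is only a sketch: the file name sample.txt and its contents are made up for illustration, and the `x` command (swap pattern space and hold space) is used as one example of a command that touches the hold space.

```shell
# Create a small sample file (hypothetical name) in a temporary directory.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/sample.txt"

# Edit the stream: sed writes the modified lines to standard output.
edited=$(sed 's/two/TWO/' "$tmpdir/sample.txt")
printf 'edited output:\n%s\n' "$edited"

# The file on disk is unchanged, because sed only worked on its buffers.
original=$(cat "$tmpdir/sample.txt")
printf 'file on disk:\n%s\n' "$original"

# 'x' swaps pattern space and hold space, so every line is printed one
# cycle late and the first output line is the (empty) initial hold space.
swapped=$(sed 'x' "$tmpdir/sample.txt")
printf 'after x:\n%s\n' "$swapped"

rm -rf "$tmpdir"
```

The last command shows that the hold space starts out empty and keeps its content from one cycle to the next, while the pattern space is refilled on every cycle.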
sed command format
The sed command format is as follows:
sed [options] 'commands' filename
Here, commands is a sed command. It should be enclosed in a pair of single quotes so that the shell does not interpret it. Its format is as follows:
[address-range][sed-command] or
[pattern-to-match][sed-command]
address-range is the range of lines to be processed. pattern-to-match is a regular expression to be matched. sed-command is the sed command applied to the selected lines. The following is a simple example:
sed -n '1,3p' students
This command prints lines 1 to 3 of the students file to the screen. Note that there is no space between the address range and the sed command; if spaces are added, sed ignores them. The -n option suppresses the default output. By default, sed reads a line into the pattern space and, whether or not the line was processed, outputs the content of the pattern space to the screen before reading the next line. With -n, a line is output only when the p command is applied to it. If the p command is executed on a line without the -n option, that line is printed twice.
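The interaction between -n and p can be seen with a four-line input. This is a minimal sketch; the names are made-up sample data standing in for the students file.

```shell
# Hypothetical four-line input standing in for the "students" file.
input=$(printf 'alice\nbob\ncarol\ndave\n')

# Without -n: lines 1-3 appear twice (once from p, once from the
# default output); line 4 appears once.
without_n=$(printf '%s\n' "$input" | sed '1,3p')

# With -n: only the lines explicitly printed by p appear.
with_n=$(printf '%s\n' "$input" | sed -n '1,3p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$without_n" "$with_n"
```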
An address can be a single number, which stands for a line number, or a range expressed by two addresses separated by a comma. Each end of the range can be a line number or a regular expression, and the two kinds can be combined.
pattern-to-match is a pattern to be matched; sed executes sed-command on every line that matches it. In fact, pattern-to-match can also be regarded as an address that selects all lines matching the pattern. Therefore, the sed format can be summarized as follows:
sed [options] '[address-range][sed-command]' filename
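The address forms described above (line number, numeric range, pattern, and a mix of pattern and line number) can be compared side by side. The sketch below uses made-up sample data.

```shell
# Hypothetical sample input.
input=$(printf 'apple\nbanana\ncherry\ndate\n')

by_number=$(printf '%s\n' "$input" | sed -n '2p')        # line 2 only
by_range=$(printf '%s\n' "$input" | sed -n '2,3p')       # lines 2 through 3
by_pattern=$(printf '%s\n' "$input" | sed -n '/an/p')    # lines matching /an/
mixed=$(printf '%s\n' "$input" | sed -n '/cherry/,$p')   # first match to last line

printf '%s\n' "number: $by_number" "range: $by_range" \
    "pattern: $by_pattern" "mixed: $mixed"
```

In the last command, `$` is the address of the last line, which is why the script is kept in single quotes.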
File spacing:
# Add a blank line after each line (double-space the file)
sed G
# Delete all existing blank lines, then add a blank line after each line.
# In the output, every line of text is followed by exactly one blank line.
sed '/^$/d;G'
# Add two blank lines after each line
sed 'G;G'
# Undo the double spacing produced by the first script (that is, delete all even-numbered lines)
sed 'n;d'
# Insert a blank line before each line that matches "regex"
sed '/regex/{x;p;x;}'
# Insert a blank line after each line that matches "regex"
sed '/regex/G'
# Insert a blank line before and after each line that matches "regex"
sed '/regex/{x;p;x;G;}'
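As a quick check of the spacing scripts, the sketch below double-spaces a three-line sample, undoes the double spacing again, and inserts a blank line above a matching line. The input is made up for illustration.

```shell
# Hypothetical three-line input.
input=$(printf 'a\nb\nc\n')

# G appends a newline plus the (empty) hold space after every line.
doubled=$(printf '%s\n' "$input" | sed G)

# n prints the current line and fetches the next one; d then deletes it,
# so every even-numbered line is dropped.
restored=$(printf '%s\n' "$input" | sed G | sed 'n;d')

# x;p;x prints the empty hold space before the matching line.
above=$(printf '%s\n' "$input" | sed '/b/{x;p;x;}')

printf 'doubled:\n%s\n\nrestored:\n%s\n\nblank above b:\n%s\n' \
    "$doubled" "$restored" "$above"
```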
Numbering:
# Number each line of a file (simple left alignment). A tab (see the note
# on '\t' at the end of this article) is used instead of spaces to preserve the margins.
sed = filename | sed 'N;s/\n/\t/'
# Number each line of a file (number on the left, right-aligned).
sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'
# Number each line of a file, but only print the numbers of non-blank lines.
sed '/./=' filename | sed '/./N; s/\n/ /'
# Count the lines (emulates "wc -l")
sed -n '$='
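The numbering scripts work in two passes: `=` prints each line number on a line of its own, and the second sed joins each number with its line. A minimal sketch on made-up input, using a space instead of a tab as the separator so that it does not depend on '\t' support:

```shell
# Hypothetical three-line input.
input=$(printf 'first\nsecond\nthird\n')

# '$=' prints the line number of the last line, i.e. the line count.
count=$(printf '%s\n' "$input" | sed -n '$=')

# First pass prints "1", "first", "2", "second", ...; the second pass
# joins each pair of lines with a single space.
numbered=$(printf '%s\n' "$input" | sed = | sed 'N;s/\n/ /')

printf 'count: %s\n%s\n' "$count" "$numbered"
```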
Text conversion and substitution:
# Unix environment: convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//'  # assumes that all lines end with CR/LF
sed 's/^M$//'  # in bash/tcsh, press Ctrl-V then Ctrl-M to enter ^M
sed 's/\x0D$//'  # ssed, gsed 3.02.80, and later
# Unix environment: convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/"  # command line under ksh
sed 's/$'"/`echo \\\r`/"  # command line under bash
sed "s/$/`echo \\\r`/"  # command line under zsh
sed 's/$/\r/'  # gsed 3.02.80 and later
# DOS environment: convert Unix newlines (LF) to DOS format.
sed "s/$//"  # method 1
sed -n p  # method 2
# DOS environment: convert DOS newlines (CR/LF) to Unix format.
# The following script only works with UnxUtils sed 4.0.7 and later. The UnxUtils
# version can be identified by its custom "--text" option: run sed with "--help" and
# check whether "--text" is listed. Other DOS versions of sed cannot perform this
# conversion, but "tr" can be used instead.
sed "s/\r//" infile >outfile  # UnxUtils sed v4.0.7 or later
tr -d \r <infile >outfile  # GNU tr 1.22 or later
# Delete leading whitespace (spaces and tabs) from each line,
# aligning all text flush left
sed 's/^[ \t]*//'  # see the note on '\t' at the end of this article
# Delete trailing whitespace (spaces and tabs) from each line
sed 's/[ \t]*$//'  # see the note on '\t' at the end of this article
# Delete both leading and trailing whitespace from each line
sed 's/^[ \t]*//;s/[ \t]*$//'
# Insert 5 spaces at the beginning of each line (shifts the whole text five characters to the right)
sed 's/^/     /'
# Align all text flush right on a 79-column width
sed -e :a -e 's/^.\{1,78\}$/ &/;ta'  # 78 characters plus 1 space
# Center all text on a 79-column width. In method 1, spaces are added both before
# and after the text of each line. In method 2, spaces are only added before the text,
# half of them are then removed, and no trailing spaces are appended to the line.
sed -e :a -e 's/^.\{1,77\}$/ & /;ta'  # method 1
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/'  # method 2
# Find the string "foo" in each line and replace it with "bar"
sed 's/foo/bar/'  # replaces only the first "foo" in each line
sed 's/foo/bar/4'  # replaces only the fourth "foo" in each line
sed 's/foo/bar/g'  # replaces all "foo" in each line
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/'  # replaces the next-to-last "foo"
sed 's/\(.*\)foo/\1bar/'  # replaces only the last "foo"
# Replace "foo" with "bar" only on lines that contain "baz"
sed '/baz/s/foo/bar/g'
# Replace "foo" with "bar" only on lines that do not contain "baz"
sed '/baz/!s/foo/bar/g'
# Change "scarlet", "ruby", or "puce" to "red"
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'  # works with most versions of sed
gsed 's/scarlet\|ruby\|puce/red/g'  # GNU sed only
# Reverse the order of the lines: the first line becomes the last, and so on (emulates "tac").
# For some reason, HHsed v1.5 deletes blank lines in the file when these commands are used
sed '1!G;h;$!d'  # method 1
sed -n '1!G;h;$p'  # method 2
# Reverse the characters of each line: the first character becomes the last, and so on (emulates "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
# Join each pair of lines into one line (similar to "paste")
sed '$!N;s/\n/ /'
# If a line ends with a backslash (\), append the next line to it
# and remove the backslash from the end of the original line
sed -e :a -e '/\\$/N; s/\\\n//; ta'
# If a line begins with an equal sign, append it to the previous line
# and replace the leading "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'
# Add comma separators to numeric strings, changing "1234567" to "1,234,567"
gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta'  # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other versions of sed
# Add comma separators to numbers with decimal points and minus signs (GNU sed)
gsed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
# Add a blank line after every 5 lines (after lines 5, 10, 15, 20, and so on)
gsed '0~5G'  # GNU sed only
sed 'n;n;n;n;G;'  # other versions of sed
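A few of the conversion scripts above can be tried on made-up input. The sketch below sticks to the portable variants; the whitespace example only strips spaces so that it does not depend on '\t' support.

```shell
# Strip leading and trailing spaces.
trimmed=$(printf '   padded text  \n' | sed 's/^[ ]*//;s/[ ]*$//')

# Replace only the fourth "foo", then only the last "foo".
fourth=$(printf 'foo foo foo foo foo\n' | sed 's/foo/bar/4')
last=$(printf 'foo foo foo\n' | sed 's/\(.*\)foo/\1bar/')

# Reverse the order of the lines (like "tac"): every line except the
# first gets the hold space appended, and the result is saved again.
reversed=$(printf '1\n2\n3\n' | sed '1!G;h;$!d')

# Insert thousands separators by looping until nothing is left to change.
commas=$(printf '1234567\n' | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')

printf '%s\n' "$trimmed" "$fourth" "$last" "$reversed" "$commas"
```

The last command relies on the greedy `.*`: each pass through the loop inserts one comma, working from the right, and `ta` jumps back to the label `a` as long as a substitution was made.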
Selective printing of certain lines:
# Print the first 10 lines of a file (emulates "head")
sed 10q
# Print the first line of a file (emulates "head -1")
sed q
# Print the last 10 lines of a file (emulates "tail")
sed -e :a -e '$q;N;11,$D;ba'
# Print the last two lines of a file (emulates "tail -2")
sed '$!N;$!D'
# Print the last line of a file (emulates "tail -1")
sed '$!d'  # method 1
sed -n '$p'  # method 2
# Print the next-to-last line of a file
sed -e '$!{h;d;}' -e x  # if the file has only one line, prints a blank line
sed -e '1{$q;}' -e '$!{h;d;}' -e x  # if the file has only one line, prints that line
sed -e '1{$d;}' -e '$!{h;d;}' -e x  # if the file has only one line, prints nothing
# Print only the lines that match a regular expression (emulates "grep")
sed -n '/regexp/p'  # method 1
sed '/regexp/!d'  # method 2
# Print only the lines that do not match the regular expression (emulates "grep -v")
sed -n '/regexp/!p'  # method 1, corresponds to the command above
sed '/regexp/d'  # method 2, simpler syntax
# Find "regexp" and print the line immediately before the matching line, but not the matching line itself
sed -n '/regexp/{g;1!p;};h'
# Find "regexp" and print the line immediately after the matching line, but not the matching line itself
sed -n '/regexp/{n;p;}'
# Print the line containing "regexp" together with the lines before and after it, preceded by the
# line number of the matching line (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h
# Print lines containing "AAA", "BBB", and "CCC" (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d'  # the order of the strings does not affect the result
# Print lines containing "AAA", "BBB", and "CCC" (in that order)
sed '/AAA.*BBB.*CCC/!d'
# Print lines containing "AAA", "BBB", or "CCC" (emulates "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d  # most versions of sed
gsed '/AAA\|BBB\|CCC/!d'  # GNU sed only
# Print a paragraph if it contains "AAA" (paragraphs are separated by blank lines)
# HHsed v1.5 must add "G;" after "x;" in the next three scripts.
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'
# Print a paragraph if it contains "AAA", "BBB", and "CCC" (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'
# Print a paragraph if it contains any of "AAA", "BBB", or "CCC"
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d'  # GNU sed only
# Print only lines of 65 characters or more
sed -n '/^.\{65\}/p'
# Print only lines of fewer than 65 characters
sed -n '/^.\{65\}/!p'  # method 1, corresponds to the script above
sed '/^.\{65\}/d'  # method 2, simpler syntax
# Print part of a file: from the line matching the regular expression to the last line
sed -n '/regexp/,$p'
# Print part of a file by line number (lines 8 to 12, inclusive)
sed -n '8,12p'  # method 1
sed '8,12!d'  # method 2
# Print line 52
sed -n '52p'  # method 1
sed '52!d'  # method 2
sed '52q;d'  # method 3, more efficient on large files
# Starting at line 3, print every 7th line
gsed -n '3~7p'  # GNU sed only
sed -n '3,${p;n;n;n;n;n;n;}'  # other versions of sed
# Print the text between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p'  # case sensitive
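The head, tail, and grep emulations above can be compared on a short numbered input. This is only a sketch on made-up data, with smaller counts than in the list so that the result is easy to read.

```shell
# Twelve numbered lines as hypothetical input.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

head3=$(printf '%s\n' "$numbers" | sed 3q)          # like "head -3"
tail2=$(printf '%s\n' "$numbers" | sed '$!N;$!D')   # like "tail -2"
last=$(printf '%s\n' "$numbers" | sed -n '$p')      # like "tail -1"
range=$(printf '%s\n' "$numbers" | sed -n '8,10p')  # lines 8 to 10
kept=$(printf 'keep\ndrop me\nkeep too\n' | sed '/drop/d')  # like "grep -v drop"

printf '%s\n' "head3: $head3" "tail2: $tail2" "last: $last" \
    "range: $range" "kept: $kept"
```

The `tail -2` emulation keeps a sliding window of two lines in the pattern space: N appends the next line and D drops the oldest one, until the last line has been read.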
Selective deletion of certain lines:
# Print the whole file except the text between two regular expressions
sed '/Iowa/,/Montana/d'
# Delete adjacent duplicate lines from a file (emulates "uniq").
# The first line of each set of duplicates is kept and the others are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# Delete duplicate lines from a file, whether or not they are adjacent. Take care not to
# exceed the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
# Delete all lines except duplicate lines (emulates "uniq -d")
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'
# Delete the first 10 lines of a file
sed '1,10d'
# Delete the last line of a file
sed '$d'
# Delete the last two lines of a file
sed 'N;$!P;$!D;$d'
# Delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D'  # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba'  # method 2
# Delete every 8th line
gsed '0~8d'  # GNU sed only
sed 'n;n;n;n;n;n;n;d;'  # other versions of sed
# Delete the lines that match a pattern
sed '/pattern/d'  # deletes lines containing pattern; pattern can be
# replaced with any valid regular expression.
# Delete all blank lines from a file (same effect as "grep '.'")
sed '/^$/d'  # method 1
sed '/./!d'  # method 2
# Keep only the first of several adjacent blank lines, and delete the blank lines at the top and end of the file
# (emulates "cat -s")
sed '/./,/^$/!d'  # method 1: leaves no blank line at the top and allows one at the end of the file
sed '/^$/N;/\n$/D'  # method 2: allows one blank line at the top and leaves none at the end of the file
# Keep only the first two of several adjacent blank lines
sed '/^$/N;/\n$/N;//D'
# Delete all blank lines at the top of the file
sed '/./,$!d'
# Delete all blank lines at the end of the file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}'  # works with all versions of sed
sed -e :a -e '/^\n*$/N;/\n$/ba'  # same as above, but does not work with gsed 3.02.*
# Delete the last line of each paragraph
sed -n '/^$/{p;h;};/./{x;/./p;}'
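Several of the deletion scripts can be verified on small inputs. The sketch below uses made-up data and only portable commands.

```shell
# Collapse adjacent duplicates (like "uniq"): N builds a two-line window,
# P prints the first line unless both lines are equal, D drops it.
uniq_like=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

# Remove every blank line.
no_blank=$(printf 'x\n\ny\n\n\nz\n' | sed '/^$/d')

# Keep only the first of several adjacent blank lines (like "cat -s").
squeezed=$(printf 'x\n\n\n\ny\n' | sed '/./,/^$/!d')

# Delete by position: the last line, then the first two lines.
drop_last=$(printf '1\n2\n3\n' | sed '$d')
drop_first2=$(printf '1\n2\n3\n4\n' | sed '1,2d')

printf '%s\n---\n' "$uniq_like" "$no_blank" "$squeezed" "$drop_last" "$drop_first2"
```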
Special applications:
# Remove the nroff overstrikes (character, backspace) from man pages. On Unix System V or in the
# bash shell, you may need to add the -e option to the 'echo' command.
sed "s/.`echo \\\b`//g"  # the outer double quotes are required (Unix environment)
sed 's/.^H//g'  # in bash or tcsh, press Ctrl-V and then Ctrl-H
sed 's/.\x08//g'  # hexadecimal notation for sed 1.5, GNU sed, and ssed
# Extract the header of a newsgroup or e-mail message
sed '/^$/q'  # deletes everything after the first blank line
# Extract the body of a newsgroup or e-mail message
sed '1,/^$/d'  # deletes everything up to the first blank line
# Extract the "Subject" field from the mail header and remove the leading "Subject: "
sed '/^Subject: */!d; s///;q'
# Get the reply address from the mail header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'
# Get the e-mail address proper. Extracts the address part from the single
# header line produced by the previous script.
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
# Add an angle bracket and a space to the beginning of each line (quote a message)
sed 's/^/> /'
# Delete the angle bracket and space from the beginning of each line (unquote a message)
sed 's/^> //'
# Remove most HTML tags (including tags that span several lines)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
# Decode uuencoded files that are split into several parts. Removes the header information and keeps only
# the uuencoded part. The files must be passed to sed in the correct order. Version 1 can be entered
# directly on the command line; version 2 can be placed in a shell script with execute permission.
# (Modified from a script by Rahul Dhesi.)
sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode  # vers. 1
sed '/^end/,/^begin/d' "$@" | uudecode  # vers. 2
# Sort the paragraphs of a file alphabetically. Paragraphs are separated by (one or more) blank lines.
# GNU sed uses "\v" for the vertical tab, which serves here as a placeholder for the newline. You can
# also use any other character that does not occur in the file.
sed '/./{H;d;};x;s/\n/={NL}=/g' file | sort | sed '1s/={NL}=//;s/={NL}=/\n/g'
gsed '/./{H;d};x;y/\n/\v/' file | sort | sed '1s/\v//;y/\v/\n/'
# Compress each .TXT file individually, delete the source file, and give each .ZIP file
# the same name as the original (only the extension differs). (DOS environment: "dir /b"
# displays file names without paths.)
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat
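The mail-related scripts in this section can be tried on a tiny message. The sketch below uses an invented message (the sender, address, and subject are placeholders) and shows header extraction, body extraction, the Subject field, and quoting.

```shell
# A made-up message: two header lines, a blank line, two body lines.
message=$(printf 'From: Alice <alice@example.org>\nSubject: Weekly report\n\nBody line 1\nBody line 2\n')

# Header: quit at the first blank line. Body: delete up to and including it.
header=$(printf '%s\n' "$message" | sed '/^$/q')
body=$(printf '%s\n' "$message" | sed '1,/^$/d')

# Subject: the empty pattern in s/// reuses the last regular expression,
# so the "Subject: " prefix is removed before sed quits.
subject=$(printf '%s\n' "$message" | sed '/^Subject: */!d; s///;q')

# Quote the body for a reply.
quoted=$(printf '%s\n' "$body" | sed 's/^/> /')

printf 'subject: %s\n%s\n' "$subject" "$quoted"
```

Note that the shell's command substitution strips the trailing blank line from `header`, which sed itself does print.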
Using sed: sed accepts one or more editing commands and applies them, in sequence, to each line that it reads.
After reading the first line of input, sed applies all the commands to it and outputs the result. It then reads the second line of input and applies all the commands to it, and the cycle repeats. The preceding examples assume that sed reads from the standard input device (that is, the console; usually the input is piped in). When one or more file names are given as arguments on the command line, these files replace standard input as the input of sed. The output of sed is sent to standard output (the screen). Therefore:
cat filename | sed '10q'  # uses piped input
sed '10q' filename  # same effect, without piped input
sed '10q' filename > newfile  # redirects the output to disk
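The three invocations above can be compared directly. The following sketch assumes a temporary directory and hypothetical file names (numbers.txt, newfile); it is meant only to show that all three forms produce the same text.

```shell
# Build a 12-line sample file in a temporary directory (hypothetical names).
tmpdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$tmpdir/numbers.txt"

piped=$(cat "$tmpdir/numbers.txt" | sed '10q')    # input from a pipe
direct=$(sed '10q' "$tmpdir/numbers.txt")         # input from a file argument
sed '10q' "$tmpdir/numbers.txt" > "$tmpdir/newfile"   # output saved to disk
saved=$(cat "$tmpdir/newfile")
lines_saved=$(wc -l < "$tmpdir/newfile" | tr -d ' ')

printf 'lines written to newfile: %s\n' "$lines_saved"
rm -rf "$tmpdir"
```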
To learn more about the sed command, including how to use a script file instead of the command line, see "sed & awk, 2nd Edition" by Dale Dougherty and Arnold Robbins (O'Reilly, 1997; http://www.ora.com), "UNIX Text Processing" by Dale Dougherty and Tim O'Reilly (Hayden Books, 1987), or the tutorial written by Mike Arst, which is distributed in an archive named "U-SEDIT2.ZIP" (found on many sites). To explore the full potential of sed, you need a solid understanding of regular expressions. For more information about regular expressions, see "Mastering Regular Expressions" by Jeffrey Friedl (O'Reilly, 1997).
The manual pages ("man") provided by Unix systems may also be helpful (try "man sed" and "man regexp", or look at the regular expression section of "man ed"), but the information in the manual is terse, which has long been criticized. The manual is not intended to teach beginners how to use sed or regular expressions; it is a reference text for those who are already familiar with these tools.
Quoting syntax: the preceding examples use single quotes ('...') instead of double quotes ("...") to enclose the editing commands, because sed is usually used on Unix platforms. Inside single quotes, the Unix shell (command interpreter) does not interpret the dollar sign ($) or backquotes (`...`). Inside double quotes, the dollar sign is expanded as a variable or parameter value, and a command in backquotes is executed and replaced by its output. In "csh" and the shells derived from it, the exclamation mark (!) must be preceded by a backslash (like this: \!) for the preceding examples to run properly, even inside single quotes. All DOS versions of sed require double quotes ("...") instead of single quotes to enclose the command.
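The difference between the two kinds of quotes shows up as soon as a script contains a dollar sign. In the sketch below, `p` is a hypothetical shell variable that was chosen only to make the expansion visible.

```shell
input=$(printf 'first\nlast\n')

# Single quotes: the shell passes $p to sed untouched, so sed sees the
# address "$" (last line) followed by the command p.
single=$(printf '%s\n' "$input" | sed -n '$p')

# Double quotes: the shell expands $p before sed runs. With the
# hypothetical variable below, sed actually receives the script "1p".
p='1p'
double=$(printf '%s\n' "$input" | sed -n "$p")

# Escaping the dollar sign inside double quotes restores the sed meaning.
escaped=$(printf '%s\n' "$input" | sed -n "\$p")

printf 'single: %s\ndouble: %s\nescaped: %s\n' "$single" "$double" "$escaped"
```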
Use of '\t': to keep this article concise, we use '\t' in the scripts to represent a tab character. However, most versions of sed do not recognize '\t'. Therefore, when you type one of these scripts on the command line, press the TAB key to enter the tab instead of typing '\t'. The following tools support '\t' as a regular expression character that represents a tab: awk, perl, HHsed, sedmod, and GNU sed v3.02.80.
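In a shell script, where pressing the TAB key is not convenient, one possible workaround is to let printf produce the tab and place it in the sed script through a shell variable. This is a sketch of that idea, not a technique taken from the scripts above; the variable name `tab` is arbitrary.

```shell
# Store a literal tab character in a shell variable.
tab=$(printf '\t')

# Double quotes are needed here so that the shell expands $tab; the
# bracket expression then contains a space and a real tab character.
stripped=$(printf '\t  indented\n' | sed "s/^[ $tab]*//")

printf '[%s]\n' "$stripped"
```

Because the script is in double quotes, any `$` that is meant for sed itself would have to be escaped as `\$`.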
Versions of sed: the versions of sed differ, and some syntactic differences between them are to be expected. In particular, most of them do not support labels (:name) or branch commands (b, t) inside an editing command, unless they are placed at the end of the command. In this document, we try to use highly portable syntax so that most sed users can use these scripts. However, GNU sed allows a more concise syntax. Imagine how the reader feels on seeing a long command such as:
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
The good news is that GNU sed lets you write the command more compactly:
sed '/AAA/b;/BBB/b;/CCC/b;d'  # or even
sed '/AAA\|BBB\|CCC/b;d'
In addition, note that although many versions of sed accept a command such as "/one/ s/RE1/RE2/", which contains a space before the 's', some of these versions do not accept the command "/one/! s/RE1/RE2/". In that case, you only need to remove the space.
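The portable form and the compact form can be compared on a small input. In the sketch below, the compact form is only run when `sed --version` succeeds, which is used here as a rough and by no means definitive test for GNU sed; otherwise the portable result is reused.

```shell
# Hypothetical input: three lines that match and one that does not.
input=$(printf 'AAA one\nnothing\nBBB two\nCCC three\n')

# Portable form: every branch command ends its own -e fragment.
portable=$(printf '%s\n' "$input" | sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d)

# Compact form, guarded by a rough check for GNU sed (assumption:
# non-GNU versions reject the --version option).
if sed --version >/dev/null 2>&1; then
    compact=$(printf '%s\n' "$input" | sed '/AAA/b;/BBB/b;/CCC/b;d')
else
    compact=$portable
fi

printf '%s\n' "$portable"
```

A `b` without a label branches to the end of the script, so a matching line skips the final `d` and is printed.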
Speed optimization: when commands run slowly because the input file is large or the processor or hard disk is slow, consider adding an address expression before the substitution command ("s/.../.../") to increase the speed. For example:
sed 's/foo/bar/g' filename  # standard substitution command
sed '/foo/ s/foo/bar/g' filename  # faster
sed '/foo/ s//bar/g' filename  # abbreviated form
When you only need to print the first part of a file, or delete everything after it, you can use the "q" command (quit) in the script. This saves a lot of time when processing large files. Therefore:
sed -n '45,50p' filename  # prints lines 45 to 50
sed -n '51q;45,50p' filename  # same, but much faster
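The following sketch checks that the optimized forms give the same output as the plain ones. The generated file big.txt is a hypothetical test file; no timings are shown, since they depend on the machine and the sed version.

```shell
# Generate a 100000-line test file in a temporary directory.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print "line " i }' > "$tmpdir/big.txt"

# Both commands print lines 45-50; the second stops reading at line 51.
slow=$(sed -n '45,50p' "$tmpdir/big.txt")
fast=$(sed -n '51q;45,50p' "$tmpdir/big.txt")

# The empty pattern in s//bar/g reuses the address /foo/, so the
# abbreviated form is equivalent to s/foo/bar/g on matching lines.
shorthand=$(printf 'foo foo\nbaz\n' | sed '/foo/s//bar/g')

printf '%s\n' "$fast" "$shorthand"
rm -rf "$tmpdir"
```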
If you have other one-line scripts to share, or if you find errors in this document, please e-mail the author of this document (Eric Pement). In the e-mail, please remember to state your version of sed, the operating system it runs on, and a suitable description of the problem. The one-line scripts mentioned in this article are sed scripts with a command line length of 65 characters or less.
In most cases, a sed script can be written as a single line regardless of its length (using the '-e' option and ';'), as long as the command interpreter supports it. Therefore, the one-line scripts mentioned here are limited in length as well as written on a single line. What matters is not that these scripts fit on a single line, but that they are compact enough for users to use them easily on the command line.