Use sed to convert plain-text articles into custom-styled CSDN blog articles

Source: Internet
Author: User

Abstract: This article summarizes sed's functions and features, basic syntax, and caveats. It closes with several small examples, in particular the scripts used to convert this article from plain text into a CSDN blog post.

1 sed Overview

sed is short for stream editor. It is a non-interactive editor, which makes it very different from an interactive editor such as vim.

1.1 Main application fields
  • (1) Batch editing of a large number of documents;
  • (2) Embedding editing commands in scripts.
1.2 Unique characteristics of sed
  • (1) It processes text line by line, only one line at a time. The advantage is that, no matter how large the input file is, sed reads and processes a single line at a time, so memory usage is low and processing is fast;
  • (2) It fully supports pipelines: it reads from standard input and writes to standard output, which suits Unix pipeline processing;
  • (3) It does not change the source file by default. All editing commands operate on the pattern space and the results go to standard output, so there is no need to worry about destroying the source file.
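
Characteristics (2) and (3) can be sketched as follows. This is a minimal illustration, assuming a POSIX shell with mktemp available; the function names mark_sed and demo_source_unchanged are hypothetical and do not come from the original article.

```shell
# Hypothetical helper: replace the first "sed" on each line with "SED".
# It reads standard input and writes standard output, so it fits in a pipeline.
mark_sed() {
    sed -e 's/sed/SED/'
}

# Show that the edited text goes to standard output while the file on disk
# keeps its original content. The temporary file name comes from mktemp.
demo_source_unchanged() {
    tmp=$(mktemp) || return 1
    printf 'sed is a stream editor\n' > "$tmp"
    mark_sed < "$tmp"     # edited copy on standard output
    cat "$tmp"            # the source file is still unmodified
    rm -f "$tmp"
}
```

Calling demo_source_unchanged prints the edited line first and the untouched original second; mark_sed can equally sit in the middle of a longer pipeline.
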
1.3 General sed workflow
  • (1) Read the first line from the input into the pattern space;
  • (2) Determine whether the current line satisfies the line constraint. If not, go to (4);
  • (3) Edit the pattern space;
  • (4) Print the content of the pattern space to standard output;
  • (5) Determine whether the current line is the last line. If so, go to (7);
  • (6) Clear the pattern space, read the next line, and go to (2) (the hold space is not cleared);
  • (7) End.

Note that the line constraint only decides whether a line is edited. Lines that do not satisfy the constraint are still read into the pattern space, because sed processes every input line from start to end (unless the q command makes sed exit early).
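
The workflow can be observed with two small commands. This is a sketch only; the function names edit_second_line and first_two_lines are hypothetical.

```shell
# Only line 2 satisfies the constraint, yet every line is read and printed;
# the constraint merely decides which line gets the edit.
edit_second_line() {
    sed -e '2s/^/> /'
}

# q prints the current pattern space and then stops reading input,
# so processing ends after line 2.
first_two_lines() {
    sed -e '2q'
}
```

With the three input lines a, b and c, edit_second_line prints all three lines and only b gains the "> " prefix, while first_two_lines prints a and b and never reaches c.
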

2 sed line constraints

The basic syntax of sed is very concise: a line constraint followed by an editing command. If no line constraint is given, every line satisfies it by default. There are two kinds of line constraints: a single-line constraint and a continuous-block constraint. Examples:

  • (1) sed -e 'd' myfile: all lines satisfy the constraint.
  • (2) sed -e '11d' myfile: only line 11 of myfile satisfies the constraint.
  • (3) sed -e '11,23d' myfile: lines 11, 12, ..., 23 satisfy the constraint.
  • (4) sed -e '/^[A-Z]/d' myfile: lines matching the regular expression ^[A-Z] satisfy the constraint.
  • (5) sed -e '/regex1/,/regex2/d' myfile: the first line matching regex1 satisfies the constraint, and so do the following lines up to and including the first later line that matches regex2; the lines after that do not.

The judgment process is:

  • (a) Check whether the current line matches regex1; if not, read the next line and go to (a);
  • (b) Apply the edit to the current line and read the next line. If that line matches regex2, it is edited as the last line of the block, and later lines do not satisfy the constraint again until one matches regex1; otherwise, go to (b).

It follows that 3,1 is satisfied only by line 3: when the second line number is less than or equal to the first, only one line is selected. Likewise, 3,3 is satisfied only by the third line.
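
The constraints discussed so far can be tried out with a small helper. This is a sketch; select_lines and its parameter are hypothetical names, and the helper prints the selected lines with p rather than deleting them with d, so the selection is visible.

```shell
# Print exactly the lines selected by a sed line constraint.
# $1 is the constraint, for example 2,4 or /regex1/,/regex2/.
# Input is read from standard input.
select_lines() {
    sed -n -e "${1}p"
}
```

Fed the five lines 1 to 5, select_lines '2,4' prints 2, 3 and 4, while select_lines '3,1' and select_lines '3,3' both print only 3. Fed the lines a, b, a, c, select_lines '/a/,/a/' prints a, b, a, because the second expression is first tested on the line after the one that opened the block.
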

As you can see, a comma is used to represent a continuous block. In addition, sed supports nesting of line constraints, as follows:

/regex1/{
    operation 1
    /regex2/operation 2
    operation 3
}

In this case, operation 2 is applied only to lines that match both regex1 and regex2, while operations 1 and 3 are applied to every line that matches regex1. Finally, sed also supports negating a constraint with !. For example, $! is satisfied by every line except the last.
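
Nesting and negation can be sketched like this. The function names and the begin/end markers are hypothetical, chosen only for the illustration.

```shell
# Inside the block selected by /begin/,/end/, delete only the lines that
# also match /#/. Lines outside the block are left alone, even if they
# contain a "#".
strip_comments_in_block() {
    sed -e '/begin/,/end/{' -e '/#/d' -e '}'
}

# $! selects every line except the last one.
all_but_last() {
    sed -n -e '$!p'
}
```

Passing each part of the braces as a separate -e argument is one way to write the nested form on a single command line; in a script file the same commands would sit on separate lines.
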

3 sed basic commands

Before describing the editing commands in detail, a reminder: the order of commands matters a great deal, because all sed editing commands act on the same pattern space. The same group of commands in a different order may produce completely different results.

3.1 =, p, d

= outputs the line number of the line in the pattern space, and p outputs the content of the pattern space. d deletes the content of the pattern space and ends the current pass through the script.
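
A short sketch of these three commands, which also shows why command order matters. All function names are hypothetical.

```shell
# = writes the line number on its own line before the line itself.
number_lines() { sed -e '='; }

# Without -n, p prints the line once and the automatic print at the end
# of the pass prints it again, so every line appears twice.
duplicate_lines() { sed -e 'p'; }

# Same two commands, different order, different result:
# p;d prints each line once (d suppresses the automatic print),
# d;p prints nothing because d ends the pass before p is reached.
print_then_delete() { sed -e 'p;d'; }
delete_then_print() { sed -e 'd;p'; }
```
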

3.2 s

This is probably the most important sed editing command; it replaces text. The syntax is s/regular expression to match/new content/. By default, only the first match on a line is replaced. To replace every match on the line, use s/regular expression/new content/g. The most important knowledge here is really that of regular expressions, which by default are the same basic regular expressions that grep uses. Note that the separator / can be replaced by almost any other character, and that the "new content" part has metacharacters of its own (such as &), which differ from the regular expression metacharacters.
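
The points above can be sketched with four one-line substitutions. The function names and sample strings are hypothetical.

```shell
# Without g, only the first match on each line is replaced.
first_only() { sed -e 's/a/A/'; }

# With g, every match on the line is replaced.
every_match() { sed -e 's/a/A/g'; }

# In the new content, & stands for the whole matched text.
bracket_digits() { sed -e 's/[0-9][0-9]*/[&]/g'; }

# A different separator avoids escaping the slashes in a path.
retarget_path() { sed -e 's|/usr/local|/opt|'; }
```

For the input banana, first_only gives bAnana and every_match gives bAnAnA; bracket_digits turns a1b22 into a[1]b[22].
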

3.3 i, a, c

i inserts one or more lines above the current pattern space; a is similar but appends them below the pattern space; c replaces the content of the pattern space. These commands use a multi-line syntax, so they are usually written in a sed script file. For example:

i\
first line inserted above the current line\
second line inserted above the current line
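
The multi-line syntax can also be used from the shell by putting real newlines inside the quoted script. This is a sketch; the function names and the inserted texts are hypothetical.

```shell
# Insert a line above line 2 and append a line below it.
# The text of i and a starts on the line after the backslash.
frame_second_line() {
    sed -e '2i\
--- above ---
2a\
--- below ---'
}

# c replaces the selected line with the given text.
replace_second_line() {
    sed -e '2c\
replaced'
}
```

With the input lines a, b, c, frame_second_line prints a, "--- above ---", b, "--- below ---", c, and replace_second_line prints a, replaced, c.
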

3.4 n

In the discussion so far, sed reads a line into the pattern space, executes the editing commands, and outputs the result. Once the current line is fully processed, sed clears the pattern space and automatically reads the next line into it. In fact, the n(ext) command can read the next input line in the middle of editing. The content of the pattern space is then completely replaced by the next line (the old content is printed first unless -n is given) and the current line number increases by 1, but control does not return to the start of the script; it continues with the command after n. For example, the following command prints the even-numbered lines of a file:

sed -n -e 'n;p'
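
The even-line command can be compared with its mirror image. This is a sketch reading from standard input; the function names are hypothetical.

```shell
# n replaces the pattern space with the next line, then p prints it:
# lines 2, 4, 6, ... are printed.
even_lines() { sed -n -e 'n;p'; }

# Swapping the two commands prints the current line first and then
# skips the next one: lines 1, 3, 5, ... are printed.
odd_lines() { sed -n -e 'p;n'; }
```
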

3.5 r, R, w, W

These four commands read and write external files and are rarely needed.

4 sed advanced commands

4.1 Hold space and pattern space: N, g, G, h, H, x, D, P

The pattern space was mentioned earlier, but in the previous examples it held only one line of text. In fact, the pattern space is not limited to one line and can hold several. Besides the pattern space, sed has a second buffer called the hold space. With these two buffers, sed can handle almost any editing task.

Earlier we used the n command to read the next line into the pattern space, replacing its current content. The uppercase N is different: it also reads the next line, but appends it to the current content of the pattern space instead of replacing it. g copies the content of the hold space into the pattern space, and h does the opposite. The uppercase versions G and H append instead of overwriting. x swaps the content of the pattern space and the hold space. d deletes the entire pattern space, and D deletes only the first line in the pattern space. Both return execution to the beginning of the script, but after D, if the pattern space is not empty, sed does not read the next input line. P is the counterpart of p and prints only the first line of the pattern space.
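
A few of these commands in action, as a minimal sketch. The function names are hypothetical, and join_pairs and swap_pairs assume an input with an even number of lines.

```shell
# N appends the next line to the pattern space, which then holds
# "line1\nline2"; the s command turns the embedded newline into a space.
join_pairs() { sed -e 'N;s/\n/ /'; }

# G appends the hold space to the pattern space. The hold space is empty
# here, so every line is followed by a blank line.
double_space() { sed -e 'G'; }

# h saves line 1 in the hold space, n loads line 2, G appends the saved
# line 1 after it, and p prints the swapped pair.
swap_pairs() { sed -n -e 'h;n;G;p'; }
```
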

4.3 Execution flow: b, t

Normally, a set of sed commands is executed in the order in which it is written. For some advanced edits, sed provides special commands that change the order of execution. The syntax is very similar to labels and jumps in C. b is short for branch and t is short for test. b is an unconditional jump; t is a conditional jump, taken only if an s command has made a successful substitution since the current input line was read or since the last t was executed.

edit operation 1
:mylabel
edit operation 2
[address]t mylabel
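
Two small loops illustrate the two jumps. This is a sketch; the function names and labels are hypothetical, and add_commas assumes each line starts with a plain integer.

```shell
# t loop: each pass inserts one comma before the last three digits of the
# leading digit run; t jumps back as long as the substitution succeeded.
add_commas() {
    sed -e ':again' \
        -e 's/^\([0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/' \
        -e 't again'
}

# b loop: while the current line is not the last, append the next line and
# jump back; on the last line, turn all embedded newlines into spaces.
join_all_lines() {
    sed -e ':more' -e '$!{' -e 'N' -e 'b more' -e '}' -e 's/\n/ /g'
}
```

For example, add_commas turns 1234567 into 1,234,567 in two passes of the loop, and join_all_lines turns the lines a, b, c into the single line "a b c".
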

5 Notes

5.1 Meaning of metacharacters in different contexts

A sed line constraint uses $ to denote the last line, while $ in a regular expression denotes the end of a line, and the new-content part of the s command has special metacharacters of its own. Shell metacharacters make the situation more complex still. You must therefore pay attention to the context in which a metacharacter is used.
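
The different meanings can be placed side by side. This is a sketch with hypothetical function names.

```shell
# $ as a line constraint: the last line of the input.
last_line() { sed -n -e '$p'; }

# The same script inside double quotes: the shell would try to expand $p,
# so the dollar sign has to be escaped for the shell.
last_line_dq() { sed -n -e "\$p"; }

# $ inside a regular expression: the end of each line.
append_bang() { sed -e 's/$/!/'; }

# & in the new content means "the matched text"; \& is a literal ampersand.
wrap_and() { sed -e 's/and/[&]/'; }
literal_amp() { sed -e 's/and/\&/'; }
```
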

6 Practical sed scripts

6.1 Converting between UNIX and Windows line breaks

sed -e 's/$/\r/' myunix.txt > mydos.txt
sed -e 's/.$//' mydos.txt > myunix.txt
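
A variant of the two conversions that does not rely on sed understanding \r in the new content, which not every sed implementation does. This is a sketch; the function names are hypothetical, and the carriage return is produced by printf and passed to sed through a shell variable.

```shell
# Append a carriage return to the end of every line (UNIX -> Windows).
unix_to_dos() {
    cr=$(printf '\r')
    sed -e "s/\$/$cr/"
}

# Remove a trailing carriage return, and only that character
# (Windows -> UNIX). Lines without one are left unchanged.
dos_to_unix() {
    cr=$(printf '\r')
    sed -e "s/$cr\$//"
}
```

Unlike 's/.$//', dos_to_unix does not eat the last character of a line that has no carriage return, so running it on a file that is already in UNIX format is harmless.
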

6.2 Reversing line order

This function is similar to the tac command.

sed -e '1!G;h;$!d' forward.txt > backword.txt
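
The same script wrapped as a filter, with its three commands annotated. This is a sketch; the function name reverse_lines is hypothetical.

```shell
# 1!G  on every line but the first, append the hold space (all earlier
#      lines, already reversed) after the current line
# h    copy the result back into the hold space
# $!d  on every line but the last, delete the pattern space without
#      printing; the last line falls through and is printed with
#      everything accumulated behind it
reverse_lines() {
    sed -e '1!G;h;$!d'
}
```

The whole input ends up in the hold space, so this approach gives up the low memory usage that line-by-line processing normally provides.
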

6.3 Converting plain text into custom-styled blog articles

The built-in editor for writing CSDN blog posts is very uncomfortable to use, and the CSDN blog system does not support user-defined styles, which is frustrating. I personally like writing plain text in vim, so I thought of using the powerful sed editor to convert plain text into a custom-styled CSDN blog post. The plain-text format follows a few rules: each paragraph begins with two spaces of indentation; headings use the fixed numbering format 1, 1.1, 1.1.1; list items use the format (1), (2), ... or (a), (b), ...; structures are separated by blank lines; and text enclosed between :nochange and nochange: is output as is. This article was generated automatically by the sed scripts below. Before conversion, the text is HTML-encoded so that characters in the original text cannot conflict with HTML. The scripts are as follows:

sed script for HTML-safe encoding:

# htmlencode
s/&/\&amp;/g
s/>/\&gt;/g
s/</\&lt;/g
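
The encoding script can be run as a filter like this. It is a sketch with a hypothetical function name; the three substitutions are the ones listed above.

```shell
# HTML-encode standard input. The ampersand must be handled first:
# otherwise the & characters introduced by &gt; and &lt; would be
# encoded a second time.
html_encode() {
    sed -e 's/&/\&amp;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/</\&lt;/g'
}
```

In the new content, \& produces a literal ampersand; an unescaped & would insert the matched text instead.
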

sed script for converting plain text to HTML:

# Insert CSS before the first line
1{
i\
<div class="smstong">\
<style>\
.smstong {font-size: 14px;}\
h1, h2, h3 {font-family: "";}\
h1 {font-size: 20px;}\
h2 {font-size: 18px;}\
h3 {font-size: 16px;}\
ul {list-style-type: none;}\
p {font-size: 14px; text-indent: 2em;}\
p.summary {font-size: 14px; font-family = "";}\
</style>
}
# The title must be a single line
/[[:space:]]*Title/d
# Summary
/[[:space:]]*Summary[:]/{
i\
<p class="summary">
:summary
n
s;^$;</p>;
t done
b summary
}
# Titles of all levels
s/^[0-9]\+[^.]*$/

