Sed command + Regular Expression

Last Update: 2018-12-04 Source: Internet

Author: User



Sed is a non-interactive text editor that edits text read from a file or from standard input. Standard input may come from the keyboard, file redirection, a string or variable, or a pipe. Sed can edit small or large files equally well. Because it is non-interactive, sed can edit and delete text in files without the user being present while it runs. Sed makes all of its changes in a single pass over the input, which makes it efficient and, most importantly for users, saves time. Sed selects the lines to change by line number or by regular expression.
How sed reads data:
Sed reads one line of text at a time from a file or from standard input and copies it into an editing buffer. It then applies the commands from the command line or script in order, each using a pattern or a line number to decide whether to edit the line. Sed repeats this process for every input line until the input ends.
Calling sed:
There are three ways to call sed: type the commands on the command line; put the sed commands in a script file and pass it to sed; or put the sed commands in a script file and make the script executable.
sed [options] 'sed commands' input_file  Use sed commands on the command line. Enclose the commands in single quotes so that the shell does not interpret them.
sed [options] -f sed_script_file input_file  Use a sed script file.
sed_script_file [options] input_file  Run an executable sed script file whose first line names the sed interpreter, such as #!/bin/sed -f.
Options:
-n  Do not print. By default sed writes every line, edited or not, to standard output; with -n it writes only what it is told to, for example with the p command, which prints the selected lines.
-e  The next argument is an editing command. Use this option when giving several edits.
-f  Tells sed that its commands come from a script file. For example:
sed -f myscript.sed input_file  Here myscript.sed is a file containing sed commands.
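The three calling styles and the -n, -e and -f options can be tried in one short shell session. This is a minimal sketch assuming a POSIX shell with sed on the PATH; temp.txt, myscript.sed and script.sed are hypothetical sample files created in a scratch directory. Because the sed interpreter path varies between systems, the sketch writes the #! line with the path reported by command -v sed instead of hard-coding /bin/sed.

```sh
#!/bin/sh
# Sketch: three ways to call sed, plus the -n, -e and -f options.
# All file names are hypothetical and created in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'line one\nline two\nline three\n' > temp.txt

# 1. Commands on the command line, in single quotes.
#    -n turns off automatic printing, so only line 2 (selected by 2p) appears.
out_cmdline=$(sed -n '2p' temp.txt)

#    Each -e adds one more edit: replace "line" with "LINE", then delete line 3.
out_multi=$(sed -e 's/line/LINE/' -e '3d' temp.txt)

# 2. Commands kept in a script file and passed with -f.
printf 's/one/1/\n' > myscript.sed
out_script=$(sed -f myscript.sed temp.txt)

# 3. An executable script whose first line names the sed interpreter.
printf '#!%s -f\ns/three/3/\n' "$(command -v sed)" > script.sed
chmod u+x script.sed
out_exec=$(./script.sed temp.txt)

printf '%s\n---\n' "$out_cmdline" "$out_multi" "$out_script" "$out_exec"
```

Looking up the interpreter path keeps the executable-script variant portable; on a system where sed lives in /bin, the generated first line is the familiar #!/bin/sed -f.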
You can redirect sed's output to a file to save it.
Ways to locate text with sed:
x  A line number, such as 1
x,y  A range of lines from x to y; for example, 2,5 selects lines 2 through 5
/pattern/  Lines that contain the pattern, such as /Disk/ or /[A-Z]/
/pattern/pattern/  Lines that contain two patterns, such as /Disk/disks/ (sed rejects this form as written; nesting one address inside another, as in /Disk/{/disks/p;}, selects lines that match both)
/pattern/,x  A range from the first line that matches the pattern through line x, such as /Disk/,3
x,/pattern/  A range from line x through the next line that matches the pattern, such as 3,/Disk/
x,y!  Lines outside the range x to y
Basic sed editing commands:
p   Print the matching lines
=   Display line numbers
a\  Append new text after the located line
i\  Insert new text before the located line
d   Delete the located lines
l   List the line, showing non-printing characters as octal ASCII codes
n   Print the current line (unless -n is in effect) and read the next input line into the buffer
g   Copy the hold space into the pattern space, replacing its contents
c\  Change (replace) the located text with new text
s   Substitute a replacement for text that matches a pattern
r   Read text from another file
w   Write text to a file
q   Quit after the first pattern match is complete
y   Transliterate characters
{}  Group commands to run on the located lines
Basic sed programming examples:
Display lines with p (print): sed -n '2p' temp.txt shows only line 2; the -n option stops sed from printing every other line.
Print a range: sed -n '1,3p' temp.txt prints lines 1 to 3.
Print by pattern: sed -n '/movie/p' temp.txt prints the lines that contain movie.
Query by line number and pattern: sed -n '3,/movie/p' temp.txt prints from line 3 through the first later line that contains movie.
Show the entire file: sed -n '1,$p' temp.txt ($ stands for the last line).
Any characters: sed -n '/.*ing/p' temp.txt prints lines containing ing preceded by any characters. Note that the pattern is .*ing, not *ing.
Print line numbers: sed -e '/music/=' temp.txt prints the line number of each line containing music just before that line.
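The printing examples can be run against a small made-up temp.txt to see exactly which lines each command selects, together with the remaining address forms (/pattern/,x and x,y!) and a few other commands from the list (=, y, q, n, g). This is a sketch assuming a POSIX shell and sed; the file contents are invented. It also uses h, which the list does not describe, only to copy line 1 into the hold space so that g has something to copy back. Note how 3,/movie/ prints from line 3 through the first later line containing movie.

```sh
#!/bin/sh
# Sketch: printing examples, more address forms and a few other commands.
cd "$(mktemp -d)" || exit 1
printf 'intro\nsinging\nmovie night\nmusic hall\nmovie club\n' > temp.txt

p_two=$(sed -n '2p' temp.txt)             # only line 2
p_range=$(sed -n '1,3p' temp.txt)         # lines 1 to 3
p_movie=$(sed -n '/movie/p' temp.txt)     # every line containing "movie"
p_mixed=$(sed -n '3,/movie/p' temp.txt)   # line 3 through the next "movie" line after it
p_pat_x=$(sed -n '/singing/,4p' temp.txt) # first "singing" line through line 4
p_not=$(sed -n '2,4!p' temp.txt)          # every line outside 2 to 4
p_all=$(sed -n '1,$p' temp.txt)           # $ addresses the last line
p_ing=$(sed -n '/.*ing/p' temp.txt)       # any characters followed by "ing"
p_num=$(sed -e '/music/=' temp.txt)       # without -n: the number, then every line

c_y=$(sed -n 'y/mo/MO/;3p' temp.txt)      # y: m->M and o->O; only line 3 is printed
c_q=$(sed '/singing/q' temp.txt)          # q: print through the first "singing" line, then stop
c_n=$(sed -n 'n;p' temp.txt)              # n: move on one line, so p prints lines 2 and 4
c_g=$(sed -n '1h;5{g;p;}' temp.txt)       # h saves line 1; g copies it back on line 5

printf '%s\n---\n' "$p_two" "$p_range" "$p_movie" "$p_mixed" "$p_pat_x" \
  "$p_not" "$p_all" "$p_ing" "$p_num" "$c_y" "$c_q" "$c_n" "$c_g"
```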
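Append text (create a sed script file): make the script executable with chmod u+x script.sed, then run it with ./script.sed temp.txt. In the script below, a\ appends text after each line matching name1; the backslash must end its line, and the next line holds the text to add:
#!/bin/sed -f
/name1/a\
Here add new line.
Insert text: replace /name1/a\ with 4i\ in the script. Here 4 is a line number, and i inserts the text before that line.
Modify text: replace /name1/a\ with /name1/c\ in the script. The c command replaces the entire matching line.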
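The append, insert and change variants can be compared by writing each one to its own script file and running it with sed -f. This sketch assumes a POSIX shell and sed; the script and data file names are hypothetical. The quoted heredoc delimiter ('EOF') keeps the shell from touching the trailing backslash that a\, i\ and c\ require.

```sh
#!/bin/sh
# Sketch: a\ (append), i\ (insert) and c\ (change) from script files.
cd "$(mktemp -d)" || exit 1
printf 'name1\nname2\nname3\nname4\n' > temp.txt

# Append a line after each line matching name1.
cat > append.sed <<'EOF'
/name1/a\
Here add new line.
EOF

# Insert a line before line 4.
cat > insert.sed <<'EOF'
4i\
Here add new line.
EOF

# Replace each line matching name1 with the new text.
cat > change.sed <<'EOF'
/name1/c\
Here add new line.
EOF

s_a=$(sed -f append.sed temp.txt)
s_i=$(sed -f insert.sed temp.txt)
s_c=$(sed -f change.sed temp.txt)
printf '%s\n---\n' "$s_a" "$s_i" "$s_c"
```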
Delete text: sed '1d' temp.txt deletes line 1, and sed '1,4d' temp.txt deletes lines 1 to 4.
Replace text: sed 's/source/okstr/' temp.txt replaces the first occurrence of source on each line with okstr.
sed 's/\$//g' temp.txt deletes all the $ symbols in the text.
sed 's/source/okstr/w temp2.txt' temp.txt also writes the changed lines to the file temp2.txt.
Add text in front of the matched string: sed 's/source/"Add before" &/p' temp.txt
The result adds "Add before" in front of the source string. Here & stands for the matched text that sed found and saved.
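The delete and substitute examples can be checked on a small made-up file. This is a sketch assuming a POSIX shell and sed; temp.txt and temp2.txt are hypothetical. The & example adds -n so that each changed line prints once; without -n, sed's automatic output plus p would print every changed line twice.

```sh
#!/bin/sh
# Sketch: d, s, an escaped $, the w flag and & on a hypothetical five-line file.
cd "$(mktemp -d)" || exit 1
printf 'source one\ncost $5 $6\nsource source\nlast\nend\n' > temp.txt

d_one=$(sed '1d' temp.txt)                             # delete line 1
d_range=$(sed '1,4d' temp.txt)                         # delete lines 1 to 4
s_first=$(sed 's/source/okstr/' temp.txt)              # first "source" on each line only
s_dollar=$(sed 's/\$//g' temp.txt)                     # \$ is a literal dollar sign
s_amp=$(sed -n 's/source/"Add before" &/p' temp.txt)   # & is the matched text
sed 's/source/okstr/w temp2.txt' temp.txt > /dev/null  # changed lines also go to temp2.txt
s_w=$(cat temp2.txt)

printf '%s\n---\n' "$d_one" "$d_range" "$s_first" "$s_dollar" "$s_amp" "$s_w"
```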
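Write sed results to a file: sed '1,2w temp2.txt' temp.txt
sed '/name/w temp2.txt' temp.txt
Read text from a file: sed '/name/r temp2.txt' temp.txt inserts the contents of temp2.txt after each line that contains name.
Append text after each number: sed 's/[0-9][0-9]*/& pass/g' temp.txt adds " pass" after every run of digits. The pattern needs at least one digit, because [0-9]* alone also matches the empty string and would insert the text at every position in the line.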
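Writing to and reading from files, and appending text after numbers, can be tried as follows. This sketch assumes a POSIX shell and sed; lines12.txt, names.txt and extra.txt are hypothetical names used instead of reusing temp2.txt, so that each result stays separate.

```sh
#!/bin/sh
# Sketch: the w and r commands, and appending text after each run of digits.
cd "$(mktemp -d)" || exit 1
printf 'name: ann\nage 42\nname: bob\n' > temp.txt
printf 'EXTRA\n' > extra.txt

sed -n '1,2w lines12.txt' temp.txt               # lines 1-2 are written to lines12.txt
sed -n '/name/w names.txt' temp.txt              # lines containing "name" go to names.txt
w_lines=$(cat lines12.txt)
w_names=$(cat names.txt)
r_out=$(sed '/bob/r extra.txt' temp.txt)         # extra.txt is inserted after the "bob" line
n_out=$(sed 's/[0-9][0-9]*/& pass/g' temp.txt)   # " pass" after every run of digits

printf '%s\n---\n' "$w_lines" "$w_names" "$r_out" "$n_out"
```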
Pass a value from the shell to sed: echo $name | sed "s/go/$rep/g". Note that double quotes are needed here, so that the shell replaces $rep with its value before sed runs.
Quick one-liners:
's/\.$//g'  Delete the period at the end of each line
-e '/abcd/d'  Delete lines containing abcd
's/[ ][ ][ ]*/ /g'  Replace runs of two or more spaces with a single space
's/^[ ][ ]*//g'  Delete all leading spaces on each line
's/\.[ ][ ]*/ /g'  Replace a period followed by one or more spaces with a single space
'/^$/d'  Delete empty lines
's/^.//g'  Delete the first character of each line; by contrast, 's/\.//g' deletes all periods
's/COL\(...\)//g'  Delete COL together with the three characters that follow it
's/^\///g'  Delete a leading slash (/)

1. Use . to match a single character. The period "." matches any single character, whether at the start of a string or anywhere in the middle. Suppose you are filtering a text file of 10-character records in which the first four characters must be followed by XC. The matching pattern is: ....XC....
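Each quick one-liner can be checked against a short sample string, and the same session shows the ....XC.... pattern from item 1 picking out a record. This sketch assumes a POSIX shell and sed; all input strings are invented for illustration.

```sh
#!/bin/sh
# Sketch: the quick one-liners and the "...." single-character pattern.
o_period=$(echo 'End here.' | sed 's/\.$//g')                   # trailing period removed
o_abcd=$(printf 'keep\nabcd line\n' | sed -e '/abcd/d')         # line with abcd deleted
o_spaces=$(echo 'two  spaces   here' | sed 's/[ ][ ][ ]*/ /g')  # 2+ spaces become 1
o_lead=$(echo '   indented' | sed 's/^[ ][ ]*//g')              # leading spaces removed
o_dot=$(echo 'One.  Two.' | sed 's/\.[ ][ ]*/ /g')              # period plus spaces become 1 space
o_empty=$(printf 'a\n\nb\n' | sed '/^$/d')                      # empty line deleted
o_first=$(echo 'xyz' | sed 's/^.//g')                           # first character removed
o_col=$(echo 'aCOLxyzb' | sed 's/COL\(...\)//g')                # COL plus 3 characters removed
o_slash=$(echo '/usr/bin' | sed 's/^\///g')                     # leading slash removed
o_xc=$(printf 'abcdXCefgh\nabXCdefghi\n' | sed -n '/....XC..../p') # 4 chars, XC, 4 chars

printf '%s\n' "$o_period" "$o_abcd" "$o_spaces" "$o_lead" "$o_dot" \
  "$o_empty" "$o_first" "$o_col" "$o_slash" "$o_xc"
```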
2. Use ^ to match a string or character at the start of a line. ^ matches characters or words only at the beginning of a line. To match lines whose 4th character is 1, use: ^...1
3. Use $ to match a string or character at the end of a line. $ is the opposite of ^: it matches at the end of a line and is placed after the text to match. To match the word jet01 at the end of a line, use: jet01$. To return only lines that contain exactly one character, use: ^.$
4. Use * to match zero or more repetitions of the preceding character. * applies to the single character (or bracket expression) immediately before it, so it matches that character repeated any number of times, including not at all.
5. Use \ to escape the meaning of a special character. Sometimes you need to find characters or strings that contain a character the system treats as special. To match all names ending with *.pas, use: \*\.pas
6. Use [] to match a range or set of characters. A bracket expression [] matches any one character from a given set. Commas are sometimes recommended between the items for readability, but inside [] a comma is an ordinary character that is itself matched, so [a,b] matches a, b, or a comma. A "-" indicates a range that starts with the character on its left and ends with the character on its right. To match any digit, use [0123456789] (or [0-9]); to match any letter, use [A-Za-z], which covers the ranges a-z and A-Z.
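Items 2 to 6 can be tried together with sed -n and p, which prints only the lines a pattern matches. This sketch assumes a POSIX shell and sed; r.txt and its lines are made up so that each pattern selects something different.

```sh
#!/bin/sh
# Sketch: ^, $, *, \ and [] in sed patterns on a hypothetical file.
cd "$(mktemp -d)" || exit 1
printf 'abc1xyz\nplane jet01\nx\nfile.pas\nfile*.pas\nvar1\nvarA\n' > r.txt

r_caret=$(sed -n '/^...1/p' r.txt)             # 4th character is 1
r_dollar=$(sed -n '/jet01$/p' r.txt)           # jet01 at the end of the line
r_single=$(sed -n '/^.$/p' r.txt)              # exactly one character
r_star=$(echo 'caaat ct' | sed 's/ca*t/X/g')   # a* matches zero or more a's
r_escape=$(sed -n '/\*\.pas/p' r.txt)          # literal "*.pas"
r_digit=$(sed -n '/var[0-9]/p' r.txt)          # var followed by a digit
r_letter=$(sed -n '/var[A-Za-z]/p' r.txt)      # var followed by a letter

printf '%s\n---\n' "$r_caret" "$r_dollar" "$r_single" "$r_star" \
  "$r_escape" "$r_digit" "$r_letter"
```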
7. Use \{\} to match a pattern a set number of times. * matches any number of occurrences; \{\} specifies how many. This form has three variants:
pattern\{n\}  The pattern occurs exactly n times.
pattern\{n,\}  The pattern occurs at least n times.
pattern\{n,m\}  The pattern occurs between n and m times, where n and m are integers from 0 to 255.
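The three interval forms can be compared on a few strings of A's followed by B. This sketch assumes a POSIX shell and sed, which uses the backslashed \{ \} form in basic regular expressions. The ^ and $ anchors are added here only so that the counts are exact; the last line shows that an unanchored A\{2\}B also matches inside AAAAB.

```sh
#!/bin/sh
# Sketch: \{n\}, \{n,\} and \{n,m\} interval expressions in sed.
input='AB
AAB
AAAAB'

i_exact=$(printf '%s\n' "$input" | sed -n '/^A\{2\}B$/p')    # exactly 2 A's
i_least=$(printf '%s\n' "$input" | sed -n '/^A\{4,\}B$/p')   # 4 or more A's
i_range=$(printf '%s\n' "$input" | sed -n '/^A\{1,2\}B$/p')  # 1 or 2 A's
i_loose=$(printf '%s\n' "$input" | sed -n '/A\{2\}B/p')      # any line containing AAB

printf '%s\n---\n' "$i_exact" "$i_least" "$i_range" "$i_loose"
```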
To match the letter A appearing twice followed by B, use: A\{2\}B. To match A appearing at least four times followed by B, use: A\{4,\}B.
In shell programming, one of the differences between a good script and a perfect script is familiarity with regular expressions and the ability to use them. Extracting a piece of text with one command saves far more time than using three or four commands to get the same result.

Detailed Description of Regular Expressions (1)
By Lu Xiaobo
If we ask UNIX fans what they like most, the answer may include not only stable systems and remote startup but also regular expressions; if we ask them what gives them the biggest headache, besides complicated process control and installation, the answer may again be regular expressions. So what is a regular expression, and how can we master regular expressions and use them correctly and flexibly? This article introduces them in the hope of helping readers who want to understand and master them. In brief, a regular expression is a powerful tool for pattern matching and replacement. Regular expressions appear in almost all UNIX-based tools, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed shell programs. Client-side scripting languages such as JavaScript also support them. Regular expressions have thus gone beyond the limits of any single language or system to become a widely accepted concept and feature. A regular expression lets users build a matching pattern from a series of special characters, compare that pattern with target objects such as data files, program input, or form input on a web page, and run different code depending on whether the target contains the pattern. For example, one of the most common uses of regular expressions is checking whether an email address entered online is correctly formatted. If the address matches the regular expression, the form information is processed normally; otherwise, a prompt asks the user to re-enter a valid email address. Regular expressions therefore play an important role in the decision logic of web applications.
With a preliminary understanding of what regular expressions do, let's look at their syntax. A regular expression generally looks like this: /love/. The part between the "/" delimiters is the pattern to be matched in the target object; you simply place the pattern you want to match between the delimiters. To let users define patterns more flexibly, regular expressions provide metacharacters: characters with a special meaning inside a regular expression, used to specify how the preceding character (the character just before the metacharacter) may occur in the target object. The most frequently used metacharacters are "+", "*", and "?". "+" specifies that the preceding character must appear one or more times in a row in the target object; "*" specifies that it may appear zero or more times; and "?" specifies that it may appear zero times or once. Let's see these metacharacters in use. /fo+/ contains the "+" metacharacter, so it matches strings such as "fool", "fo", or "football", in which the letter f is followed by one or more consecutive letters o. /eg*/ contains the "*" metacharacter, so it matches strings such as "easy", "ego", or "egg", in which the letter e is followed by zero or more consecutive letters g. /Wil?/ contains the "?" metacharacter, so it matches "Win" or "Wilson" in the target object: strings in which "Wi" is followed by zero or one letter l.
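The three quantifiers can be tried with grep -E, which uses extended regular expressions in which +, * and ? have the meanings described above (in sed's basic regular expressions only * is standard). This sketch assumes a POSIX shell and grep; the word lists are the article's examples plus one non-matching word each.

```sh
#!/bin/sh
# Sketch: +, * and ? in extended regular expressions, using grep -E.
w_plus=$(printf 'fool\nfo\nfootball\nfig\n' | grep -E 'fo+')   # f then one or more o
w_star=$(printf 'easy\nego\negg\nbig\n' | grep -E 'eg*')       # e then zero or more g
w_quest=$(printf 'Win\nWilson\nWall\n' | grep -E 'Wil?')       # Wi then an optional l

printf '%s\n---\n' "$w_plus" "$w_star" "$w_quest"
```

Because eg* also accepts zero g's, any line containing an e matches it; only "big" is filtered out.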
Besides metacharacters, you can also specify exactly how often a pattern occurs in the matched object. For example, the regular expression /jim{2,6}/ specifies that the character m can appear 2 to 6 times in a row, so it matches strings such as jimmy or jimmmmmy. With the basics in place, let's look at several other important metacharacters:
\s: matches a single whitespace character, including space, tab, and line break;
\S: matches any character except a whitespace character;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit, or underscore;
\W: matches any character not matched by \w;
.: matches any character except a line break.
(Note: \s and \S, like \w and \W, are inverse pairs.) The following examples show these metacharacters in use. /\s+/ matches one or more whitespace characters in the target object. /\d000/: given a complex financial statement, this expression quickly finds every amount that runs into thousands of RMB. Besides the metacharacters above, regular expressions have another distinctive kind of special character, the anchor (positioning character), which specifies where in the target object the pattern must appear. Commonly used anchors include "^", "$", "\b", and "\B". "^" specifies that the match must occur at the start of the target string, and "$" that it must occur at the end. "\b" specifies that the match must occur at a word boundary, that is, at the beginning or end of a word, while "\B" specifies that the match must lie inside a word, away from both of its boundaries.
In other words, a match governed by \B can neither start nor end a word. Similarly, "^" and "$", and "\b" and "\B", can be regarded as two pairs of inverse operators. For example, /^hell/ contains the "^" anchor, so it matches strings in the target object that start with "hell", such as "hell", "hello", or "hellhound". /ar$/ contains the "$" anchor, so it matches strings ending in "ar", such as "car", "bar", or "ar". /\bbom/ starts with the "\b" anchor, so it matches words beginning with "bom", such as "bomb" or "bom". /man\b/ ends with the "\b" anchor, so it matches words ending in "man", such as "human", "woman", or "man". To let users define patterns more flexibly, regular expressions also allow a range of characters in place of one specific character. For example, /[A-Z]/ matches any uppercase letter from A to Z; /[a-z]/ matches any lowercase letter from a to z; /[0-9]/ matches any digit from 0 to 9; and /([a-z][A-Z][0-9])+/ matches strings made of a lowercase letter, an uppercase letter, and a digit, repeated one or more times, such as "aB0". Note that "()" groups characters in a regular expression: the contents of the parentheses must appear together, in that order, in the target object. The expression above therefore cannot match a string such as "abc", because the last character of "abc" is a letter rather than a digit. To perform an "or" operation like the one in programming logic, choosing one of several patterns to match, use the pipe character "|".
For example, /to|too|2/ matches "to", "too", or "2" in the target object. Another common operator is the negated character class "[^]". Unlike the anchor "^" described above, "[^]" specifies that the matched character must not be one of those listed in the pattern. For example, /[^A-C]/ matches any character in the target object except A, B, and C. In general, "^" is a negation operator when it appears at the start of "[]", and an anchor when it appears outside "[]". Finally, to match a metacharacter literally, escape it with the escape character "\". For example, /Th\*/ matches the literal string "Th*" in the target object.
This article is from the CSDN blog; please credit the source when reproducing it: http://blog.csdn.net/cxqdong/archive/2008/01/01/2007884.aspx
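Most of the article's examples can be reproduced with grep -E, whose extended regular expressions behave much like the /.../ patterns above. This is a sketch assuming GNU grep: \s, \w, \b and \B are GNU extensions there, and because \d is not part of POSIX extended regular expressions, the sketch substitutes the bracket expression [0-9]. The sample words are the article's own examples plus a non-matching word or two; -x requires the whole line to match, and -c counts matching lines to show how much more the unescaped Th* matches.

```sh
#!/bin/sh
# Sketch: the article's metacharacters with GNU grep -E (\s, \w, \b, \B are GNU extensions).
m_range=$(printf 'jimmy\njimmmmmy\njim\n' | grep -E 'jim{2,6}')       # m repeated 2 to 6 times
m_space=$(printf 'a b\nab\n' | grep -E '\s+')                         # contains whitespace
m_thousand=$(printf 'total 5000\ntotal 450\n' | grep -E '[0-9]000')   # POSIX stand-in for \d000
m_word=$(printf 'foo_1\n---\n' | grep -xE '\w+')                      # only letters, digits, _
m_hell=$(printf 'hello\nshell\nhellhound\n' | grep -E '^hell')        # starts with hell
m_ar=$(printf 'car\nbar\nart\n' | grep -E 'ar$')                      # ends with ar
m_bom=$(printf 'bomb\nabomb\n' | grep -E '\bbom')                     # bom at the start of a word
m_inner=$(printf 'bomb\nabomb\n' | grep -E '\Bbom')                   # bom inside a word
m_man=$(printf 'human\nwoman\nmanly\n' | grep -E 'man\b')             # man at the end of a word
m_group=$(printf 'aB0\nabc\naB0cD1\n' | grep -xE '([a-z][A-Z][0-9])+') # whole line of groups
e_or=$(printf 'to\ntoo\n2\ntwo\n' | grep -xE 'to|too|2')              # one of three alternatives
e_neg=$(printf 'A\nB\nD\nAC\n' | grep -xE '[^A-C]')                   # one character, not A to C
e_esc=$(printf 'Th*\nThe\nTh\n' | grep -E 'Th\*')                     # literal "Th*"
e_noesc=$(printf 'Th*\nThe\nTh\n' | grep -cE 'Th*')                   # count: T then any h's

printf '%s\n---\n' "$m_range" "$m_space" "$m_thousand" "$m_word" "$m_hell" \
  "$m_ar" "$m_bom" "$m_inner" "$m_man" "$m_group" "$e_or" "$e_neg" "$e_esc" "$e_noesc"
```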

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
