An Introduction to sed and gawk (Linux Command Line and Shell Scripting)
Source: Internet
Author: User
These two tools can greatly simplify the data processing tasks required.
19.1 Text Processing
Text processing here means simple command-line editing that makes it easy to format, insert, modify, or delete text elements.
Both sed and gawk provide these functions.
19.1.1 sed editor
sed is called a stream editor.
A stream editor edits a data stream based on a set of rules supplied before the editor processes the data.
sed processes the data in the data stream according to commands. Commands can be entered from the command line or stored in a command text file.
The sed editor does the following:
1) Read one line of data from the input at a time
2) Match the data according to the provided editing commands
3) Modify the data in the stream according to the command
4) Output new data to STDOUT
After sed matches all commands with one line of data, it reads the next line and repeats the process.
The format of the sed command is as follows:
sed options script file
Options allow you to modify the behavior of the sed command. The options that can be used are in the following table:
Option | Description
-e script | Adds the commands specified in script to the commands run while processing the input
-f file | Adds the commands specified in file to the commands run while processing the input
-n | Suppresses the automatic output of each line; use the print (p) command to produce output
The script parameter is usually a single command. If multiple commands are required, use the -e option and separate them with semicolons. There must be no space between the end of a command and the semicolon.
1. Define editor commands on the command line
By default, the sed editor applies the specified commands to the STDIN input stream, so you can pipe data directly into sed for processing.
$ echo "hahaha, I am xiaochongyong" | sed 's/xiaochongyong/Kobe Bryant/'
This replaces xiaochongyong with Kobe Bryant.
sed sends the result to STDOUT.
You can also process the data in a specified file like this:
$ sed 's/dog/cat/' my.txt
# Replaces the first dog on each line of my.txt with cat. This does not change my.txt; the result goes to STDOUT.
2. Use multiple editing commands
Use the -e option:
$ sed -e 's/dog/cat/; s/red/yellow/' my.txt
The sed command applies each command specified to each line in the text file.
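The following is a minimal sketch of how both commands are tried on every line. The sample text and the temporary file standing in for my.txt are assumptions made for illustration.

```shell
# Hypothetical sample data; a temp file stands in for my.txt.
data=$(mktemp)
printf 'the red dog\nthe red fox\n' > "$data"

# Both substitutions are attempted on every line, in order.
multi_out=$(sed -e 's/dog/cat/; s/red/yellow/' "$data")
echo "$multi_out"

rm -f "$data"
```

The second line contains no dog, so only the red substitution changes it.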
3. Read editing commands from a file
Specify the file with the -f option.
sed reads the commands in the specified file and applies them to each line of the data file.
For example, file.sed contains:
s/dog/cat/
s/red/blue/
s/xiao/yang/
It can be used like this:
$ sed -f file.sed my.txt
Tip: You can use .sed as the extension for the sed script file
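As a runnable sketch of the -f workflow, the script file and a one-line data file can be created in a temporary directory. The directory and the sample sentence are assumptions; only the three commands come from the text.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# The three commands from file.sed, one per line (no semicolons needed).
cat > "$workdir/file.sed" <<'EOF'
s/dog/cat/
s/red/blue/
s/xiao/yang/
EOF

# Hypothetical sample data.
printf 'xiao has a red dog\n' > "$workdir/my.txt"

script_out=$(sed -f "$workdir/file.sed" "$workdir/my.txt")
echo "$script_out"

rm -rf "$workdir"
```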
19.1.2 gawk program
It provides a programming language instead of just editor commands. In the gawk programming language, you can do:
1) Define variables to save data
2) Use arithmetic and string operators to process data
3) Use the concept of structured programming to add processing logic to data processing
4) Generate a formatted report by extracting the data elements from the data file, rearranging or formatting them.
The report generation capabilities of the gawk program are often used to extract data elements from large text files and format them into readable reports. For example, format the log file to find the error lines in the log file.
1.gawk command format
gawk options program file
Here is a description of the available options:
-F fs specifies the field separator that delimits data fields in a line
-f file reads the program from the specified file
-v var=value defines a variable var and sets its default value
-mf N specifies the maximum number of fields to process in the data file
-mr N specifies the maximum record size in the data file
-w keyword specifies the compatibility mode or warning level for gawk
gawk's power lies in its program scripts: you can write scripts that read the data in each text line, process and display it, and create any type of output report.
2. Read the program script from the command line
A gawk program script is defined by a pair of curly braces; the script commands must be placed between the two braces {}.
For example:
$ gawk '{print "hello, shell"}'
Because no file name is specified, this gawk program reads data from STDIN and waits for text input when run. It prints hello, shell once for each line entered.
Pressing Ctrl+D generates an EOF character in bash, which terminates the program.
3. Using data field variables
gawk automatically assigns a variable to each data element in a line.
For example:
$0 represents the entire line of text
$1 represents the first data field in the line
$2 represents the second data field in the line
$n represents the nth data field in the line
Example:
$ gawk '{print $2}' data.txt # Outputs the second data field of every line of data.txt
By default, fields are separated by whitespace (spaces or tabs); another separator can be specified with -F.
For example:
$ gawk -F: '{print $1}' /etc/passwd # Outputs the first field of each line of /etc/passwd, using a colon as the separator.
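A small sketch of field variables with the default separator and with -F follows. The passwd-style records are made up for illustration, and the fallback to awk when gawk is not installed is an assumption added for portability.

```shell
# Use gawk when available; fall back to awk otherwise (assumption).
AWK=gawk
command -v gawk >/dev/null 2>&1 || AWK=awk

# Hypothetical passwd-style records, not read from the real /etc/passwd.
records='root:x:0:0:root:/root:/bin/bash
xcy:x:1000:1000:xcy:/home/xcy:/bin/bash'

# -F: splits each line on colons, so $1 is the login name.
users=$(printf '%s\n' "$records" | "$AWK" -F: '{print $1}')

# With no -F, fields are split on whitespace.
second=$(printf 'one two three\n' | "$AWK" '{print $2}')

echo "$users"
echo "$second"
```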
4. Using multiple commands in a program script
Just put a semicolon between the commands:
$ echo "My name is xcy" | gawk '{$2 = "age"; $4 = 23; print $0}'
Note: The double quotes around age are required. Spaces before and after = are allowed.
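To see what assigning to field variables does to the line, here is a short sketch. It assumes the first command assigns to the second field ($2); the awk fallback is likewise an assumption.

```shell
# Use gawk when available; fall back to awk otherwise (assumption).
AWK=gawk
command -v gawk >/dev/null 2>&1 || AWK=awk

# Assigning to $2 and $4 rewrites those fields; $0 is rebuilt from the fields.
edited=$(echo "My name is xcy" | "$AWK" '{$2 = "age"; $4 = 23; print $0}')
echo "$edited"
```

The second field (name) becomes age and the fourth field (xcy) becomes 23.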
5. Read the program from the file
For example, a script test.gawk reads as follows:
{print $1 "'s home is " $6}
Usage:
$ gawk -F: -f test.gawk /etc/passwd
The content of test.gawk can also look like this, which is equivalent to specifying multiple commands, each placed on a separate line:
{
text = "'s home is "
print $1 text $6
}
6. Run the script before processing the data
gawk also allows specifying when a program script runs.
By default gawk reads a line of text from the input and then executes a program script on that line of data.
Sometimes you need to run a script before processing the data; the BEGIN keyword does this.
BEGIN is followed by a script, which also must be enclosed in {}; gawk runs it once before reading any input.
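A minimal sketch of a BEGIN block placed in front of a regular script is shown here. The header text and the two input lines are invented for illustration, and the awk fallback is an assumption.

```shell
# Use gawk when available; fall back to awk otherwise (assumption).
AWK=gawk
command -v gawk >/dev/null 2>&1 || AWK=awk

# The BEGIN script runs once, before the first input line is read;
# the second script then runs for every line.
begin_out=$(printf 'Line 1\nLine 2\n' |
    "$AWK" 'BEGIN {print "The data file contents:"} {print $0}')
echo "$begin_out"
```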
7. Run the script after processing the data
Similar to BEGIN, the END keyword lets you specify a script to run after all the data has been read.
For example:
$ gawk 'BEGIN {print "The test3 file:"} {print $0} END {print "The file End"}' test3
The script after END runs once, after the last line of test3 has been processed.
example:
File ga.gawk:
BEGIN {
    print "The latest list of users and shells"
    print "UserID \t Shell"
    print "--------- \t ---------"
    FS = ":"
}

{
    print $1 "\t" $7
}

END {
    print "This concludes the listing"
}
Usage:
$ gawk -f ga.gawk /etc/passwd
The output is a small formatted report: a header, one line per user with the login shell, and a closing line.
19.2 Sed Editor Basics
19.2.1 More Replacement Options
1. Substitute command syntax:
s/pattern/replacement/flags
There are four kinds of flags:
A number: indicates which occurrence of the pattern on the line the new text should replace
g: replaces all occurrences of the pattern
p: prints the lines on which a substitution was made
w file: writes the results of the substitution to a file
For example:
$ sed 's/old/new/2' data.txt # replaces the second old on each line with new
$ sed 's/old/new/g' data.txt # replaces every old
$ sed 's/old/new/p' data.txt # prints the lines that match the pattern in the substitute command, in addition to the normal output
$ sed -n 's/old/new/p' data.txt # -n suppresses the normal output, so only modified lines are shown
$ sed 's/old/new/w data.bak' data.txt # the w flag produces the same output but also saves the modified lines to the specified file
The normal output of the sed editor still goes to STDOUT; only the lines that match the pattern are saved in the specified output file. (If data.txt has 3 lines and only the first and third contain old, the second line is not saved in the output file.)
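The flags can be compared side by side with a small sketch. The three-line sample file is an assumption chosen so that the first and third lines contain old and the second does not.

```shell
workdir=$(mktemp -d)
# Hypothetical data: lines 1 and 3 contain "old", line 2 does not.
printf 'old old old\nnothing here\nold again\n' > "$workdir/data.txt"

# Number flag: replace only the second occurrence on each line.
second=$(sed 's/old/new/2' "$workdir/data.txt")

# g flag: replace every occurrence.
all=$(sed 's/old/new/g' "$workdir/data.txt")

# -n with the p flag: show only the lines that were changed.
changed=$(sed -n 's/old/new/p' "$workdir/data.txt")

# w flag: also save the changed lines to a file.
sed "s/old/new/w $workdir/data.bak" "$workdir/data.txt" > /dev/null
saved=$(cat "$workdir/data.bak")

echo "$second"; echo "$all"; echo "$changed"; echo "$saved"
rm -rf "$workdir"
```

Line 3 has only one old, so the number flag 2 leaves it unchanged.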
2. Replacing characters
Sometimes the pattern contains characters that are awkward to use in the substitute command, such as the forward slash (/).
For example, to replace the bash shell in /etc/passwd with the C shell, you can do this:
$ sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd
# This follows the usual format s/old/new/, with every / in the paths escaped by a backslash
Alternative: use another character, such as an exclamation mark, as the string delimiter in the substitute command.
$ sed 's!/bin/bash!/bin/csh!' /etc/passwd
Other characters, such as commas, also work. This makes path names easier to read and understand.
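A quick sketch confirms that the escaped form and the alternate-delimiter form give the same result. The passwd-style entry is a made-up sample rather than a line from the real /etc/passwd.

```shell
# Hypothetical passwd-style entry.
entry='xcy:x:1000:1000:xcy:/home/xcy:/bin/bash'

# Escaping every slash in the paths.
escaped=$(echo "$entry" | sed 's/\/bin\/bash/\/bin\/csh/')

# Using ! as the delimiter, so the slashes need no escaping.
readable=$(echo "$entry" | sed 's!/bin/bash!/bin/csh!')

echo "$escaped"
echo "$readable"
```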
19.2.2 Using Address
The default command applies to all lines, but you can also apply commands to specific lines.
The solution is line addressing.
There are two forms of line addressing:
1) A numeric range of lines
2) A text pattern that filters out lines
Command format:
[address] command
You can also group multiple commands at a specific address
[address] {
command1
command2
command3
}
1. Numeric line addressing
The address can be a single line number, such as:
$ sed '2s/old/new/' data.txt # Substitute only on line 2
It can also be a range: a starting line, a comma, and an ending line. For example:
$ sed '2,4s/old/new/' data.txt # Substitute on lines 2 through 4
$ sed '2,$s/old/new/' data.txt # Substitute from line 2 to the last line; $ means the last line
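The three numeric address forms can be tried on a four-line sample. The data below is an assumption; each line starts with old so the affected lines are easy to spot.

```shell
data=$(mktemp)
# Hypothetical four-line file.
printf 'old 1\nold 2\nold 3\nold 4\n' > "$data"

only2=$(sed '2s/old/new/' "$data")      # single line number
range=$(sed '2,3s/old/new/' "$data")    # inclusive range
to_end=$(sed '3,$s/old/new/' "$data")   # from line 3 to the last line

echo "$only2"; echo "$range"; echo "$to_end"
rm -f "$data"
```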
2. Using text pattern filters
sed also lets you specify a text pattern to filter the lines a command acts on. The format is as follows:
/pattern/command
The pattern must be enclosed in forward slashes. The sed editor applies the command only to lines containing the specified text pattern.
For example, to modify only the lines containing xcy:
$ sed '/xcy/s/bash/csh/' /etc/passwd
Here /xcy/ is the pattern and s/bash/csh/ is the command.
The sed editor uses a feature called regular expressions in text patterns to help you create more precise matching patterns.
3. Command combination
You can also run multiple commands on the specified lines by grouping them in braces.
For example:
$ sed '2{s/old/new/; s/dog/cat/}' data.txt
$ sed '3,${s/old/new/; s/dog/cat/}' data.txt
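Pattern addresses and grouped commands can be sketched together as follows. The sample records are invented, and the one-line brace syntax assumes GNU sed (other implementations may require a semicolon before the closing brace).

```shell
# Hypothetical records: only the first contains xcy.
records='xcy:/bin/bash
root:/bin/bash'

# The substitution runs only on lines matching /xcy/.
filtered=$(printf '%s\n' "$records" | sed '/xcy/s/bash/csh/')

# Both grouped commands run, but only on line 2 (GNU sed syntax).
grouped=$(printf 'old dog\nold dog\n' | sed '2{s/old/new/; s/dog/cat/}')

echo "$filtered"
echo "$grouped"
```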
19.2.3 Deleting Lines
The s command substitutes text.
The d command deletes lines.
For example:
$ sed 'd' data.txt # with no address, deletes every line
$ sed '2d' data.txt # deletes line 2
$ sed '2,$d' data.txt # deletes from line 2 to the last line
Pattern matching also works with the delete command:
$ sed '/xcy/d' data.txt # deletes the lines containing xcy
The lines are not deleted from the file itself; they are only removed from sed's output.
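The sketch below runs the delete forms on a small sample and then checks that the file itself is unchanged. The four-line file is an assumption for illustration.

```shell
data=$(mktemp)
# Hypothetical data; only line 2 contains xcy.
printf 'line 1\nline 2 xcy\nline 3\nline 4\n' > "$data"

no_second=$(sed '2d' "$data")     # drop line 2
head_only=$(sed '2,$d' "$data")   # drop line 2 through the last line
no_match=$(sed '/xcy/d' "$data")  # drop lines containing xcy

# The file still holds all four lines: d only affects sed's output.
line_count=$(wc -l < "$data" | tr -d ' ')

echo "$no_second"; echo "$head_only"; echo "$no_match"; echo "$line_count"
rm -f "$data"
```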
19.2.4 Inserting and Appending Text
The sed editor can insert and append text lines in the data stream:
Insert: the insert command (i) adds a new line before the specified line
Append: the append command (a) adds a new line after the specified line
In the standard format these commands cannot be written entirely on one line: the text to insert or append goes on a separate line after the command. GNU sed also accepts the one-line form used in the short examples here.
The format is as follows:
sed '[address]command\
new line'
For example: # Inserts a line before the input line
$ echo "This is line 2" | sed 'i\This is line 1'
Or: # Appends a line after the input line
$ echo "This is line 2" | sed 'a\This is line 1'
To insert or append data inside a data stream, you must use addressing to indicate where to add it.
$ sed '2a\this is append line' data.txt
$ sed '3i\this is insert line' data.txt
The following example adds two lines; press Enter after typing this is insert line 2' data.txt to run the command.
When adding multiple lines, every line of the inserted or appended text except the last must end with a backslash, like this is insert line 1\ below.
xcy@xcy-virtual-machine:~/shell/19zhang$ sed '1i\
> this is insert line 1\
> this is insert line 2' data.txt
this is insert line 1
this is insert line 2
This is line 1
This is line 2
This is line 3
This is line 4
xcy@xcy-virtual-machine:~/shell/19zhang$
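In script form, the insert and append commands can be written with the new text on its own line. This is a sketch with an assumed three-line sample file; the inserted and appended texts are placeholders.

```shell
data=$(mktemp)
# Hypothetical three-line file.
printf 'This is line 1\nThis is line 2\nThis is line 3\n' > "$data"

# i adds the text before the addressed line (line 2 here).
inserted=$(sed '2i\
inserted before line 2' "$data")

# a adds the text after the addressed line ($ is the last line).
appended=$(sed '$a\
appended after the last line' "$data")

echo "$inserted"
echo "$appended"
rm -f "$data"
```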
19.2.5 Modifying Lines
The change command (c) replaces the contents of an entire line in the data stream. It works the same way as the insert and append commands: you must specify the new line separately in the sed command.
$ sed '2c\This is change line.' data.txt
$ sed '2,3c\This is change line.' data.txt # Replaces lines 2 and 3 together with a single line
You can also address by text pattern. This changes every line containing line 3, so several lines are modified if several match:
$ sed '/line 3/c\This is change line.' data.txt
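The difference between changing one line and changing a range can be seen in this sketch, which assumes a four-line sample file like the data.txt used in the earlier terminal output.

```shell
data=$(mktemp)
printf 'This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n' > "$data"

# Change a single line.
one=$(sed '2c\
This is change line.' "$data")

# Change a range: the whole range collapses into one new line.
both=$(sed '2,3c\
This is change line.' "$data")

echo "$one"
echo "$both"
rm -f "$data"
```

With a range address the replacement text is printed once, not once per line.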
19.2.6 Conversion Command
The transform command (y) is the only sed editor command that operates on single characters.
The format is as follows:
[address]y/inchars/outchars/
The conversion command performs a one-to-one mapping of the inchars and outchars values.
The first character of inchars will be converted to the first character of outchars
The second character of inchars will be converted to the second character of outchars
And so on.
The length of inchars and outchars must be the same, otherwise an error will be reported.
example:
$ sed 'y/123/abc/' data.txt # 1->a, 2->b, 3->c
$ sed '2,3y/123/abc/' data.txt # an address can also be specified
The transform command is global: it converts every occurrence of the specified characters in a text line, regardless of where they appear.
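A one-line sketch shows the character mapping and its global behavior. The input sentence is an assumption; note that the digit 1 appears twice and is converted both times.

```shell
# Each character of 123 maps to the character at the same position in abc.
mapped=$(echo 'line 1, line 2, line 3, line 1' | sed 'y/123/abc/')
echo "$mapped"
```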
19.2.7 Printing Revisited
Three commands can be used to print information from the data stream:
The p command prints a text line
The equal sign (=) command prints line numbers
The l (lowercase L) command lists a line
1. Printing lines
$ echo "This is test" | sed 'p'
This prints the line twice: once from the p command and once from sed's normal output. To print selected lines of a file, use the -n option to suppress the normal output.
$ sed -n '2,3p' data.txt
$ sed -n '/line 2/p' data.txt # match lines by text pattern
Here is a more complex usage:
xcy@xcy-virtual-machine:~/shell/19zhang$ sed -n '/3/{
> p
> s/line/new_line/p
> }' data.txt
This is line 3
This is new_line 3
xcy@xcy-virtual-machine:~/shell/19zhang$
This first finds the line containing the number 3 and prints it, then replaces line with new_line on that line and prints it again. The output shows both the original line text and the new line text.
2. Printing line numbers
$ sed '=' data.txt
You can also print the line number and content of lines containing specified text. The following prints the line number and content of the line containing line 3.
xcy@xcy-virtual-machine:~/shell/19zhang$ sed -n '/line 3/{
> =
> p
> }' data.txt
3
This is line 3
xcy@xcy-virtual-machine:~/shell/19zhang$
3. Listing lines
The l command prints both the text and the non-printable ASCII characters in the data stream.
$ sed 'l' data.txt
Non-printable characters are shown as escape sequences (a tab appears as \t), and the end of each line is marked with a dollar sign ($).
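The = and l commands can be sketched on tiny inputs. The sample lines are assumptions, and the listing format shown by l is that of GNU sed.

```shell
# = prints the line number on its own line before each line.
numbered=$(printf 'first\nsecond\n' | sed '=')

# l makes the tab visible as \t and marks the end of the line with $.
# -n suppresses the normal copy of the line.
listed=$(printf 'tab\there\n' | sed -n 'l')

echo "$numbered"
echo "$listed"
```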
19.2.8 Processing Files with sed
1. Writing to a file
The w command is used to write lines to a file, the format is as follows:
[address] w filename
The filename can be a relative path or an absolute path. The file needs to have write permission.
example:
$ sed '2,3w write.txt' data.txt # writes lines 2 and 3 to write.txt
$ cat write.txt
$ sed '/xiaochongyong/w write.txt' data.txt # writes the lines containing xiaochongyong to write.txt
$ cat write.txt
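A sketch of the w command using a temporary directory follows. The four-line data file is an assumption, and -n is added here only to keep the normal output off the screen.

```shell
workdir=$(mktemp -d)
printf 'This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n' > "$workdir/data.txt"

# Write lines 2 and 3 to write.txt; everything after w is the file name.
sed -n "2,3w $workdir/write.txt" "$workdir/data.txt"

written=$(cat "$workdir/write.txt")
echo "$written"
rm -rf "$workdir"
```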
2. Read data from a file
The read command (r) allows you to insert data from a separate file into the data stream.
The format is as follows:
[address]r filename
You cannot use a range of addresses with the read command; you can specify only a single line number or a text pattern address. The sed editor inserts the text from the file after the addressed line.
Example:
$ sed '3r read.txt' data.txt # inserts the contents of read.txt after the third line of data.txt
$ sed '$r read.txt' data.txt # adds the text at the end of the data stream
$ sed '/xiaochongyong/r read.txt' data.txt # a text pattern also works
You can also use it like this:
xcy@xcy-virtual-machine:~/shell/19zhang$ sed '/line 2/{
> r read.txt
> d
> }' data.txt
This is line 1
This is read line 1
This is read line 2
This is read line 3
This is line 3
This is line 4
xcy@xcy-virtual-machine:~/shell/19zhang$
The three This is read line lines are the contents of read.txt.
This example finds the line containing line 2, reads in the contents of read.txt, and then deletes the line containing line 2.
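The read-then-delete technique can be reproduced with a self-contained sketch. Both files are created in a temporary directory, and their contents are shortened samples chosen for illustration.

```shell
workdir=$(mktemp -d)
printf 'This is line 1\nThis is line 2\nThis is line 3\n' > "$workdir/data.txt"
printf 'This is read line 1\nThis is read line 2\n' > "$workdir/read.txt"

# On the line matching /line 2/: queue read.txt for output, then delete
# the matching line, so the file contents take its place.
merged=$(sed "/line 2/{
r $workdir/read.txt
d
}" "$workdir/data.txt")

echo "$merged"
rm -rf "$workdir"
```

The r command must end its line, because everything after r up to the newline is taken as the file name.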
19.3 Summary
The key to using sed and gawk programs is how to use regular expressions. Regular expressions are the key to creating custom filters for extracting and processing data in text files.