Getting Started with sed and gawk (Linux Command Line and Shell Script Programming)

Source: Internet
Author: User
These two tools can greatly simplify common data-processing tasks.

19.1 Text Processing
Shell scripts often need a simple command-line editor that can easily format, insert, modify, or delete text elements.

sed and gawk both provide these capabilities.

19.1.1 sed editor
sed is called a stream editor.

A stream editor edits a stream of data according to a set of rules that you supply in advance, before the editor processes the data.

sed processes the data in the stream according to these commands, which can be entered on the command line or stored in a text file of commands.

The sed editor does the following:

1) Read one line of data from the input at a time

2) Match the data according to the provided editing commands

3) Modify the data in the stream according to the command

4) Output new data to STDOUT

After sed has applied all its commands to one line of data, it reads the next line and repeats the process, continuing until it reaches the end of the input.

The format of the sed command is as follows:

sed options script file

Options let you change the behavior of sed. The available options are listed in the following table:

Option | Description
-e script | Adds the commands in script to the commands run while processing the input
-f file | Adds the commands in file to the commands run while processing the input
-n | Suppresses sed's normal output; a line is output only when a print (p) command says so

The script parameter normally contains a single command. If you need more than one, either list them with the -e option, separated by semicolons with no space between the end of a command and the semicolon, or store them in a file and use the -f option.

1. Define editor commands on the command line

By default, sed applies the specified commands to the STDIN input stream, so you can pipe data directly into sed for processing.

$ echo "hahaha, I am xiaochongyong" | sed 's/xiaochongyong/Kobe Bryant/'

This replaces xiaochongyong with Kobe Bryant:

hahaha, I am Kobe Bryant

sed sends the result to STDOUT.

You can also modify the data in the specified file like this:

$ sed 's/dog/cat/' my.txt

This replaces the first dog on each line of my.txt with cat and sends the result to STDOUT. my.txt itself is not changed.
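A minimal sketch of the point above, assuming GNU sed on Linux. The file my.txt and its contents are made up for the demo, and everything runs in a throwaway directory:

```shell
#!/bin/sh
# Work in a temporary directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical sample file
printf 'the dog sees a dog\nno pets here\n' > my.txt

# Substitute on the fly; the result goes to STDOUT
sed 's/dog/cat/' my.txt
# the cat sees a dog
# no pets here

# The file on disk is unchanged
cat my.txt
# the dog sees a dog
# no pets here
```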

2. Use multiple editing commands

Use the -e option:

$ sed -e 's/dog/cat/; s/red/yellow/' my.txt

sed applies both commands to each line of the file.
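A short sketch, again with a hypothetical my.txt, showing that both commands run on every line and that separate -e options are an equivalent way to write the same thing:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input
printf 'The red dog\nA red fox and a dog\n' > my.txt

# Two commands in one -e script, separated by a semicolon
sed -e 's/dog/cat/; s/red/yellow/' my.txt
# The yellow cat
# A yellow fox and a cat

# Equivalent: one -e option per command
sed -e 's/dog/cat/' -e 's/red/yellow/' my.txt
```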

3. Read editing commands from a file

Use the -f option to specify the file.

sed reads the commands from that file and applies them to each line of the data file.

For example, suppose file.sed contains:

s/dog/cat/
s/red/blue/
s/xiao/yang/

Each command is on its own line, so no semicolons are needed.

You can then run it like this:

$ sed -f file.sed my.txt

Tip: Using .sed as the extension for sed script files makes them easy to recognize.
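A sketch that writes the file.sed script from above and runs it against a hypothetical my.txt (contents invented for the demo):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The script file from the text: one command per line
cat > file.sed <<'EOF'
s/dog/cat/
s/red/blue/
s/xiao/yang/
EOF

# Hypothetical input file
printf 'a red dog\nxiao has a dog\n' > my.txt

# Every command in file.sed is applied to every line
sed -f file.sed my.txt
# a blue cat
# yang has a cat
```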

19.1.2 gawk program
gawk goes further than sed: it provides a programming language rather than just editor commands. With the gawk programming language, you can:

1) Define variables to store data

2) Use arithmetic and string operators to process data

3) Use structured programming concepts, such as if-then statements and loops, to add logic to data processing

4) Generate formatted reports by extracting data elements from a data file and rearranging or formatting them

gawk's report-generation capabilities are often used to extract data elements from large text files and format them into readable reports. A typical example is formatting a log file so that its error lines stand out.

1. gawk command format

gawk options program file

Here is a description of the available options:

-F fs: specifies the field separator used to divide a line into data fields

-f file: reads the program from the specified file

-v var=value: defines a variable var and sets its initial value

-mf N: specifies the maximum number of fields to process in the data file

-mr N: specifies the maximum record size in the data file

-W keyword: specifies the compatibility mode or warning level for gawk

Modern gawk accepts -mf and -mr only for compatibility and otherwise ignores them, since it has no fixed limits.

gawk's real power lies in its program scripts: you can write a script that reads the data in each text line, processes and displays it, and produces any kind of output report.

2. Read the program script from the command line

A gawk program script is enclosed in a pair of curly braces ({}), and the script commands go between them. Because gawk treats the command-line script as a single text string, you must also enclose it in single quotes.

For example:

$ gawk '{print "hello, shell"}'

Because no file name is specified, gawk reads its data from STDIN, so the program keeps waiting for text input. Each time you type a line and press Enter, gawk runs the script on that line and prints hello, shell.

Pressing Ctrl+D in bash sends an end-of-file (EOF) to the program, which terminates it.
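Typing input and pressing Ctrl+D is interactive; a pipe gives the same effect non-interactively, because the pipe delivers end-of-file when printf finishes. This sketch assumes gawk is installed and falls back to the system awk if it isn't (reasonable here because only standard awk features are used):

```shell
#!/bin/sh
# Prefer gawk; fall back to the system awk if gawk isn't installed
AWK=$(command -v gawk || command -v awk)

# Two input lines, so the script runs twice; EOF ends the program
printf 'first\nsecond\n' | "$AWK" '{print "hello, shell"}'
# hello, shell
# hello, shell
```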

3. Using data field variables

gawk automatically assigns a variable to each data element in a line of text:

$0 represents the entire line of text

$1 represents the first data field in the line

$2 represents the second data field in the line

$n represents the nth data field in the line

Example:

$ gawk '{print $2}' data.txt    # prints the second data field of every line in data.txt

By default, fields are separated by whitespace (spaces or tabs). You can specify a different separator with the -F option.

For example:

$ gawk -F: '{print $1}' /etc/passwd    # prints the first field of each line of /etc/passwd, using a colon as the separator
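A sketch of field variables and -F, using made-up data rather than the real /etc/passwd; as before, it prefers gawk and falls back to awk:

```shell
#!/bin/sh
AWK=$(command -v gawk || command -v awk)
cd "$(mktemp -d)" || exit 1

# Whitespace-separated sample data (hypothetical)
printf 'One line of test text.\nTwo lines of test text.\n' > data.txt
"$AWK" '{print $2}' data.txt
# line
# lines

# Colon-separated records in the /etc/passwd layout (invented accounts)
printf 'root:x:0:0:root:/root:/bin/bash\nxcy:x:1000:1000::/home/xcy:/bin/bash\n' > passwd.sample
"$AWK" -F: '{print $1}' passwd.sample
# root
# xcy
```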

4. Using multiple commands in a program script

Separate the commands with a semicolon:

$ echo "My name is xcy" | gawk '{$2 = "age"; $4 = 23; print $0}'
My age is 23

Note: the string age must be enclosed in double quotes, while the number 23 needs no quotes. Spaces around = are optional.

5. Reading the program from a file

For example, suppose a script file test.gawk contains:

{print $1 "'s home is " $6}

Usage:

$ gawk -F: -f test.gawk /etc/passwd

test.gawk can also contain multiple commands, each on a separate line (no semicolons are needed):

{
    text = "'s home is "
    print $1 text $6
}
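A sketch of running the multi-line test.gawk with -f. The file passwd.sample is a hypothetical stand-in for /etc/passwd with two invented accounts:

```shell
#!/bin/sh
AWK=$(command -v gawk || command -v awk)
cd "$(mktemp -d)" || exit 1

# Multi-line version of test.gawk: one statement per line
cat > test.gawk <<'EOF'
{
    text = "'s home is "
    print $1 text $6
}
EOF

# Invented passwd-style entries; field 6 is the home directory
printf 'root:x:0:0:root:/root:/bin/bash\nxcy:x:1000:1000::/home/xcy:/bin/bash\n' > passwd.sample
"$AWK" -F: -f test.gawk passwd.sample
# root's home is /root
# xcy's home is /home/xcy
```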

6. Run the script before processing the data

gawk also allows specifying when a program script runs.

By default gawk reads a line of text from the input and then executes a program script on that line of data.

Sometimes you need to run a script before processing the data. The BEGIN keyword does this.

For example:

$ gawk 'BEGIN {print "The test3 file:"} {print $0}' test3

Here the script after BEGIN prints the heading once, before any data is read, and the second script prints every line of test3. The script after BEGIN must also be enclosed in {}.

7. Run the script after processing the data

Similarly to BEGIN, the END keyword specifies a script that gawk runs after it has finished reading the data.

For example:

$ gawk 'BEGIN {print "The test3 file:"} {print $0} END {print "The file End"}' test3

The END script prints The file End once, after the last line of test3 has been processed.
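A sketch with a two-line stand-in for test3 (its contents are invented), showing that BEGIN and END each run exactly once:

```shell
#!/bin/sh
AWK=$(command -v gawk || command -v awk)
cd "$(mktemp -d)" || exit 1

# Hypothetical test3 file
printf 'Line 1\nLine 2\n' > test3

# BEGIN runs before the first line, END after the last
"$AWK" 'BEGIN {print "The test3 file:"} {print $0} END {print "The file End"}' test3
# The test3 file:
# Line 1
# Line 2
# The file End
```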

example:

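File ga.gawk:

BEGIN {
    print "The latest list of users and shells"
    print "UserID \t Shell"
    print "--------- \t ---------"
    FS = ":"
}

{
    print $1 "\t" $7
}

END {
    print "This concludes the listing"
}

Usage:

$ gawk -f ga.gawk /etc/passwd

The result is a small formatted report: the title and column headings from BEGIN, one line per account with its user ID and login shell, and the closing line from END. Because FS = ":" is set in BEGIN, before the first line is read, no -F option is needed.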
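A sketch that saves ga.gawk and runs it against two invented passwd-style entries instead of the real /etc/passwd, so the output is predictable:

```shell
#!/bin/sh
# Prefer gawk; fall back to the system awk if gawk isn't installed
AWK=$(command -v gawk || command -v awk)
cd "$(mktemp -d)" || exit 1

# ga.gawk from the text
cat > ga.gawk <<'EOF'
BEGIN {
    print "The latest list of users and shells"
    print "UserID \t Shell"
    print "--------- \t ---------"
    FS = ":"
}

{
    print $1 "\t" $7
}

END {
    print "This concludes the listing"
}
EOF

# Two invented passwd-style entries
printf 'root:x:0:0:root:/root:/bin/bash\nxcy:x:1000:1000::/home/xcy:/bin/bash\n' > passwd.sample

# FS is set in BEGIN, so no -F option is needed
"$AWK" -f ga.gawk passwd.sample
```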

19.2 Sed Editor Basics
19.2.1 More Replacement Options
1. Substitute command syntax:

s/pattern/replacement/flags

There are four types of flags:

A number: indicates which occurrence of the pattern in each line is replaced

g: replaces every occurrence of the pattern in the line

p: prints the line if a substitution was made

w file: writes the result of the substitution to a file

For example:

$ sed 's/old/new/2' data.txt    # replaces only the second occurrence of old in each line

$ sed 's/old/new/g' data.txt    # replaces every occurrence

$ sed 's/old/new/p' data.txt    # prints each modified line; since sed also prints every line by default, modified lines appear twice

$ sed -n 's/old/new/p' data.txt    # -n suppresses the normal output, so only the modified lines are printed

$ sed 's/old/new/w data.bak' data.txt    # produces the normal output, but also saves the modified lines to data.bak

The normal output still goes to STDOUT, but only the lines in which a substitution was made are saved in the output file. For example, if data.txt has three lines and old appears in the first and third lines but not the second, data.bak contains only the modified first and third lines.
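A sketch of the four flags, assuming GNU sed. data.txt here is hypothetical, with old twice in line 1, not at all in line 2, and once in line 3, matching the w example above:

```shell
#!/bin/sh
# Scratch directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical data
printf 'old and old\nnothing here\nold again\n' > data.txt

# 2: only the second old on each line -> old and new / nothing here / old again
sed 's/old/new/2' data.txt

# g: every old -> new and new / nothing here / new again
sed 's/old/new/g' data.txt

# -n with p: only the changed lines -> new and old / new again
sed -n 's/old/new/p' data.txt

# w: normal output on STDOUT, changed lines also saved to data.bak
sed 's/old/new/w data.bak' data.txt
cat data.bak   # new and old / new again
```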


2. Replacement characters

Sometimes the text contains characters that are awkward to use in a substitution pattern, such as the forward slash (/).

For example, to replace the bash shell with the C shell in /etc/passwd, you must escape each forward slash with a backslash:

$ sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd

The command still has the form s/old/new/, but the escaped slashes make it hard to read.

To avoid this, sed lets you use a different character as the delimiter of the s command, such as an exclamation mark:

$ sed 's!/bin/bash!/bin/csh!' /etc/passwd

Other characters, such as commas, also work. Either way, path names become much easier to read and understand.
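A sketch showing that the escaped form and the !-delimited form produce the same result; passwd.sample is an invented one-line stand-in for /etc/passwd:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical passwd-style line
printf 'xcy:x:1000:1000::/home/xcy:/bin/bash\n' > passwd.sample

# Escaped slashes...
sed 's/\/bin\/bash/\/bin\/csh/' passwd.sample

# ...and the same substitution with ! as the delimiter
sed 's!/bin/bash!/bin/csh!' passwd.sample

# Both print: xcy:x:1000:1000::/home/xcy:/bin/csh
```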


19.2.2 Using Addresses
By default, sed commands apply to every line of text, but you can also apply a command only to specific lines.

This is done with line addressing.

There are two forms of line addressing:

1) A numeric range of lines

2) A text pattern that filters lines

Command format:

[address] command

You can also group multiple commands at a specific address

[address] {

         command1

         command2

         command3

}

1. Numeric line addressing

The address can be a single line number, for example:

$ sed '2s/old/new/' data.txt    # substitutes only in line 2

It can also be a range of lines, written as the starting line number, a comma, and the ending line number, for example:

$ sed '2,4s/old/new/' data.txt    # substitutes in lines 2 to 4

$ sed '2,$s/old/new/' data.txt    # substitutes from line 2 to the last line; $ means the last line
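A sketch of numeric addressing on a hypothetical four-line data.txt (This is line 1 to This is line 4, as in the later examples), substituting line with LINE so the affected lines are easy to spot:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# printf reuses its format for each argument: four lines
printf 'This is line %d\n' 1 2 3 4 > data.txt

sed '2s/line/LINE/' data.txt     # only line 2 changes
sed '2,3s/line/LINE/' data.txt   # lines 2 and 3 change
sed '2,$s/line/LINE/' data.txt   # line 2 through the last line change
```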

2. Using text pattern filters

You can specify a text pattern that selects the lines a command acts on. The format is:

/pattern/command

The pattern must be enclosed in forward slashes. sed applies the command only to lines that contain the pattern.

For example, to change only the line containing xcy:

$ sed '/xcy/s/bash/csh/' /etc/passwd

Here /xcy/ is the pattern and s/bash/csh/ is the command.

sed also lets you use regular expressions in text patterns, which allows far more flexible matching than the plain text used above.
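A sketch of a text pattern address on two invented passwd-style lines; only the line containing xcy is changed:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical passwd-style data
printf 'root:x:0:0:root:/root:/bin/bash\nxcy:x:1000:1000::/home/xcy:/bin/bash\n' > passwd.sample

sed '/xcy/s/bash/csh/' passwd.sample
# root:x:0:0:root:/root:/bin/bash
# xcy:x:1000:1000::/home/xcy:/bin/csh
```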

3. Grouping commands

You can also group multiple commands to run on a specific line or range of lines.

For example:

$ sed '2{s/old/new/; s/dog/cat/}' data.txt

$ sed '3,${s/old/new/; s/dog/cat/}' data.txt
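A sketch of command grouping, assuming GNU sed (which accepts } directly after the last command; some other sed implementations may want a ; before it). The data is hypothetical:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Three identical hypothetical lines
printf 'old dog\nold dog\nold dog\n' > data.txt

# Both substitutions run, but only on line 2
sed '2{s/old/new/; s/dog/cat/}' data.txt
# old dog
# new cat
# old dog

# Same group applied from line 2 to the last line
sed '2,${s/old/new/; s/dog/cat/}' data.txt
```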

19.2.3 Deleting Lines
The s command substitutes text.

The d command deletes lines.

For example:

$ sed 'd' data.txt    # with no address, deletes every line, so nothing is output

$ sed '2d' data.txt    # deletes line 2

$ sed '2,$d' data.txt    # deletes from line 2 to the last line

Pattern matching also works with the delete command:

$ sed '/xcy/d' data.txt    # deletes the lines containing xcy

The lines are not actually deleted from the file, only from sed's output.
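A sketch of the delete forms on a hypothetical four-line data.txt; the last command confirms the file still has all four lines:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'This is line %d\n' 1 2 3 4 > data.txt

sed 'd' data.txt           # no address: every line deleted, nothing printed
sed '2d' data.txt          # lines 1, 3 and 4
sed '2,$d' data.txt        # line 1 only
sed '/line 3/d' data.txt   # lines 1, 2 and 4
wc -l < data.txt           # still 4: the file itself is untouched
```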

19.2.4 Inserting and Appending Text
sed lets you insert and append lines of text to the data stream:

Insert: the i command adds a new line before the specified line

Append: the a command adds a new line after the specified line

The text to add is part of the command itself: it follows the i or a command and a backslash, and the command you choose determines whether it goes before or after the addressed line.

The format is as follows:

sed '[address]command\new line'

For example, this inserts a new line before the input line This is line 2:

$ echo "This is line 2" | sed 'i\This is line 1'

This appends the new line after it instead:

$ echo "This is line 2" | sed 'a\This is line 1'

Without an address, i and a act on every input line. To insert or append text at a specific place in a multi-line data stream, give an address:

$ sed '2a\this is append line' data.txt    # appends a line after line 2

$ sed '3i\this is insert line' data.txt    # inserts a line before line 3

To add more than one line, end every line of the inserted or appended text except the last with a backslash, as after this is insert line 1 in the following example, which inserts two lines before line 1. Press Enter after each backslash; bash waits for the closing quote, and the command runs once you type data.txt and press Enter.

 xcy@xcy-virtual-machine:~/shell/19zhang$ sed '1i\
 this is insert line 1\
 this is insert line 2' data.txt
 this is insert line 1
 this is insert line 2
 This is line 1
 This is line 2
 This is line 3
 This is line 4
 xcy@xcy-virtual-machine:~/shell/19zhang$
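A non-interactive sketch of i and a, assuming GNU sed (the one-line 'i\text' form is a GNU extension; POSIX sed expects the text on the next line). data.txt is the same hypothetical four-line file:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'This is line %d\n' 1 2 3 4 > data.txt

# One-line input: i puts the new text before it, a after it
echo "This is line 2" | sed 'i\This is line 1'
echo "This is line 2" | sed 'a\This is line 3'

# With an address: append after line 2, insert before line 3
sed '2a\This is append line' data.txt
sed '3i\This is insert line' data.txt

# Multi-line insert: every text line but the last ends with a backslash
sed '1i\
this is insert line 1\
this is insert line 2' data.txt
```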
 
19.2.5 Changing Lines
The change command (c) replaces the entire contents of a line in the data stream. It works the same way as the insert and append commands: you must specify the new line separately in the sed command.

$ sed '2c\This is change line.' data.txt

$ sed '2,3c\This is change line.' data.txt    # replaces lines 2 and 3 with a single line

You can also address lines with a text pattern. The following command changes the line containing line 3; if the pattern matches several lines, each of them is changed.

$ sed '/line 3/c\This is change line.' data.txt
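A sketch of c with a single line, a range, and a pattern, assuming GNU sed (whose c replaces a whole range with one copy of the text); data.txt is hypothetical:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'This is line %d\n' 1 2 3 4 > data.txt

sed '2c\This is change line.' data.txt      # line 2 replaced
sed '2,3c\This is change line.' data.txt    # lines 2-3 replaced by ONE line
sed '/line 3/c\This is change line.' data.txt   # matching line replaced
```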

19.2.6 Transforming Characters
The transform command (y) is the only sed command that works on individual characters.

The format is as follows:

[address]y/inchars/outchars/

The transform command maps the characters of inchars to those of outchars one to one:

The first character of inchars is converted to the first character of outchars

The second character of inchars is converted to the second character of outchars

And so on.

inchars and outchars must be the same length; otherwise sed reports an error.

Example:

$ sed 'y/123/abc/' data.txt    # 1->a, 2->b, 3->c

$ sed '2,3y/123/abc/' data.txt    # an address restricts it to specific lines

The transform command is global: it converts every occurrence of the specified characters in a line, wherever they appear.
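A sketch of y on invented data. The first line contains both 1 and 3, which shows the transform is applied to every matching character in the line:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical data
printf 'This is line 1 of 3\nThis is line 2\nThis is line 3\n' > data.txt

sed 'y/123/abc/' data.txt
# This is line a of c
# This is line b
# This is line c

# Only lines 2 and 3 are transformed; line 1 is printed unchanged
sed '2,3y/123/abc/' data.txt
```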

19.2.7 Printing Revisited
Three commands print information from the data stream:

The p command prints text lines

The equal sign (=) command prints line numbers

The l (lowercase L) command lists lines

1. Printing lines

$ echo "This is test" | sed 'p'    # prints the line twice: once from p, once from the normal output

To print specific lines of a file, use the -n option to suppress sed's normal output, so that only the lines selected by p are printed:

$ sed -n '2,3p' data.txt

$ sed -n '/line 2/p' data.txt    # selects lines with a text pattern

Here is a more complex example:

 xcy@xcy-virtual-machine:~/shell/19zhang$ sed -n '/3/{
 > p
 > s/line/new_line/p
 > }' data.txt
 This is line 3
 This is new_line 3
 xcy@xcy-virtual-machine:~/shell/19zhang$

sed first finds the line containing the number 3 and prints it, then replaces line with new_line in that line and prints it again. The output therefore shows both the original text and the modified text.
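A sketch of p with and without -n, on the same hypothetical four-line data.txt:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'This is line %d\n' 1 2 3 4 > data.txt

echo "This is test" | sed 'p'   # the line appears twice
sed -n '2,3p' data.txt          # only lines 2 and 3
sed -n '/line 2/p' data.txt     # only the matching line

# The grouped example from above: original, then modified, line 3
sed -n '/3/{
p
s/line/new_line/p
}' data.txt
```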

2. Printing line numbers

$ sed '=' data.txt

Combined with -n and a text pattern, = prints the line number and content of only the matching lines. This prints the number and text of the line containing line 3:

 xcy@xcy-virtual-machine:~/shell/19zhang$ sed -n '/line 3/{
 > =
 > p
 > }' data.txt
 3
 This is line 3
 xcy@xcy-virtual-machine:~/shell/19zhang$
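A sketch of the = command on the hypothetical four-line data.txt:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'This is line %d\n' 1 2 3 4 > data.txt

# = writes the line number on its own line, just before the line itself:
# 1, This is line 1, 2, This is line 2, ...
sed '=' data.txt

# With -n, only the number and text of the matching line are printed
sed -n '/line 3/{
=
p
}' data.txt
```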
 
3. Listing lines

The l command prints both the text and any nonprintable characters in the data stream, showing nonprintable characters as escape sequences (for example, \t for a tab).

$ sed 'l' data.txt

The end of each line is marked with a dollar sign ($).
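A sketch of l on a hypothetical file containing a tab, assuming GNU sed's output format. -n is used so only the listing appears, not the normal copy of the line as well:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# One hypothetical line containing a tab character
printf 'This\tline contains a tab\n' > tab.txt

sed -n 'l' tab.txt
# This\tline contains a tab$
```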

19.2.8 Using Files with sed
1. Writing to a file

The w command writes lines to a file. The format is as follows:

[address]w filename

The filename can be a relative or absolute path, but the user running sed must have write permission for it.

Example:

$ sed '2,3w write.txt' data.txt    # writes lines 2 and 3 to write.txt

$ cat write.txt

$ sed '/xiaochongyong/w write.txt' data.txt    # writes the lines containing xiaochongyong to write.txt

$ cat write.txt
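A sketch of w, assuming GNU sed; data.txt is hypothetical. -n keeps STDOUT quiet so only the file receives output. Note that sed truncates a w file when it starts, so the second command overwrites write.txt rather than adding to it:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical data
printf 'This is line 1\nxiaochongyong was here\nThis is line 3\n' > data.txt

sed -n '2,3w write.txt' data.txt
cat write.txt
# xiaochongyong was here
# This is line 3

sed -n '/xiaochongyong/w write.txt' data.txt
cat write.txt
# xiaochongyong was here
```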

2. Reading data from a file

The read command (r) inserts the data from a separate file into the data stream.

The format is as follows:

[address]r filename

The read command is used with a single address: either a line number or a text pattern. sed inserts the file's text after the addressed line.

Example:

$ sed '3r read.txt' data.txt    # inserts the contents of read.txt after line 3 of data.txt

$ sed '$r read.txt' data.txt    # adds the text at the end of the data stream

$ sed '/xiaochongyong/r read.txt' data.txt    # a text pattern works too

The read command can also be combined with delete:

 xcy@xcy-virtual-machine:~/shell/19zhang$ sed '/line 2/{
 > r read.txt
 > d
 > }' data.txt
 This is line 1
 This is read line 1
 This is read line 2
 This is read line 3
 This is line 3
 This is line 4
 xcy@xcy-virtual-machine:~/shell/19zhang$

The lines This is read line 1 through This is read line 3 come from read.txt.

This example finds the line containing line 2, queues the contents of read.txt to be inserted after it, and then deletes the line itself, so the file's contents effectively replace that line.
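A sketch of r that recreates the session above with hypothetical data.txt and read.txt files, assuming GNU sed:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input files
printf 'This is line %d\n' 1 2 3 4 > data.txt
printf 'This is read line %d\n' 1 2 3 > read.txt

sed '3r read.txt' data.txt   # read.txt appears after line 3
sed '$r read.txt' data.txt   # ...or at the very end

# Replace the matching line with the file: r queues the file, d drops the line
sed '/line 2/{
r read.txt
d
}' data.txt
```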

19.3 Summary
The key to using sed and gawk is understanding regular expressions, which let you create custom filters for extracting and manipulating data in text files.
