A first look at sed and gawk

Source: Internet
Author: User

To handle any type of data in a shell script, use sed and gawk.

They let you automatically process the text in a text file.


=====

The sed editor:

sed is known as a stream editor, which is the opposite of a normal interactive text editor. In an interactive editor such as vim, you use keyboard commands to insert, delete, or replace text in the data. A stream editor instead edits a stream of data based on a set of rules that you supply before the editor processes the data.

The sed editor can process data in a data stream based on commands entered on the command line or stored in a command text file. It reads one line at a time from the input, matches that line against the supplied editor commands, modifies the data in the stream as specified by the commands, and then prints the resulting data to STDOUT. After the stream editor has matched all the commands against a line of data, it reads the next line of data and repeats the process. When it has processed all the lines in the stream, it terminates.

Because the commands are applied line by line, sed makes its changes in a single pass through the data, which makes it much faster than an interactive editor.

sed format:

sed options script file

Options:

-e script  adds the commands specified in script to the commands run while processing the input

-f file  adds the commands specified in file to the commands run while processing the input

-n  does not produce output for each command; sed waits for the print command before producing output


The script parameter specifies a single command to apply to the stream data. If you need more than one command, you must either use the -e option to specify them on the command line or use the -f option to specify them in a separate file. There are many commands available for processing data.


----------

1. Using sed at the command line

By default, the sed editor applies the specified commands to the STDIN input stream.

[oh@localhost shell]$ echo 'This is a test' | sed 's/test/sed test/'
This is a sed test
[oh@localhost shell]$ 

Here I piped the text into sed and used the s command: s/aa/bb/ replaces aa with bb (by default, only the first match on each line).

[oh@localhost shell]$ cat testfile
this is the first line
This is a test
this is the second line
this is the third line
this is the end line
[oh@localhost shell]$ sed 's/this/that/' testfile
that is the first line
This is a test
that is the second line
that is the third line
that is the end line
[oh@localhost shell]$ 

It's very fast.

[oh@localhost shell]$ cat testfile
this is the first line
This is a test
this is the second line
this is the third line
this is the end line
[oh@localhost shell]$ 

You can see that this does not affect the original file. sed only sends the modified data to STDOUT.


Now I want to use multiple sed commands on the command line.

Use the -e option:

sed -e 's/bra/under/;s/asd/er/' file

[oh@localhost shell]$ cat testfile
this is the first line
This is a test
this is the second line
this is the third line
this is the end line
[oh@localhost shell]$ sed -e 's/this/that/;s/is/are/' testfile
that are the first line
Thare is a test
that are the second line
that are the third line
that are the end line
[oh@localhost shell]$ 

The commands are separated by a semicolon, with no space between the end of a command and the semicolon. Note that in the second line s/is/are/ replaces the first "is" it finds, which is the one inside "This".


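Of course, instead of semicolons you can also put each command on its own line at the shell's secondary prompt (>):

[oh@localhost shell]$ sed -e '
> s/this/that/
> s/is/are/
> '
testfile
testfile

Be sure to put the file name on the same line as the closing single quotation mark. The bash shell runs the command as soon as the closing quotation mark is entered and the line ends, so here sed started without a file name and simply echoed the word typed on STDIN.

So the example above is wrong.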

[oh@localhost shell]$ sed -e '
> s/th/aa/
> s/is/yu/
> ' testfile
aayu is the first line
Thyu is a test
aayu is the second line
aayu is the third line
aayu is the end line
[oh@localhost shell]$ 

That's all.

=======

Reading the editor commands from a file

This means putting a number of sed commands in a file.

[oh@localhost shell]$ cat sedd
s/this/that/
s/is/are/
[oh@localhost shell]$ 

It's that simple. No semicolon is needed at the end of each command.

Then run: sed -f sedd testfile

[oh@localhost shell]$ sed -f sedd testfile
that are the first line
Thare is a test
that are the second line
that are the third line
that are the end line
[oh@localhost shell]$ 

=========

Next, an introduction to gawk

It provides a programming-language style environment that lets you modify and rearrange the data in a file, which makes it more advanced than sed.

gawk is the GNU version of the original awk program from Unix. It takes stream editing to a new level by offering a programming language instead of just editor commands.


It is widely used to generate reports and to format log files.

gawk options program file


Options:

-F fs  specifies the field separator that delimits the data fields in a line

-f file  specifies the name of the file to read the program from

-v var=value  defines a variable in the gawk program and its default value

-mf N  specifies the maximum number of fields to process in the data file

-mr N  specifies the maximum record size in the data file

-W keyword  specifies the compatibility mode or warning level for gawk
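
As a small illustration of the -v option listed above, the sketch below passes a value into the program from the command line. The sample names and the variable name greeting are made up for this example, and the command is written as awk on the assumption that awk on your system is gawk or another POSIX awk; substitute gawk if you prefer.

```shell
# Hypothetical sample input: one name per line.
names='alice
bob
carol'

# -v defines the awk variable "greeting" before the program starts.
# Inside the program the variable is used without a $ sign.
greeted=$(printf '%s\n' "$names" | awk -v greeting="Hello" '{print greeting, $1}')

printf '%s\n' "$greeted"
```

The comma in the print statement separates the two values with a single space in the output.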

----------

Using gawk from the command line:

The program script must be placed inside curly braces {}, and the whole script wrapped in single quotation marks: '{ }'

For example: gawk '{print "Hello Oh"}'

print is a built-in gawk command.

If you run it just like this, gawk waits for input from STDIN until you signal that the stream has ended with EOF (end-of-file):

On the keyboard, press Ctrl+D.

[oh@localhost shell]$ gawk '{print "Hello Oh"}'
oh
Hello Oh
hi
Hello Oh
aaaaaa
Hello Oh
[oh@localhost shell]$ 

---

Using data field variables:

For the data in a text file, gawk automatically assigns a variable to each data element in each line. By default, the variables are assigned as follows:

$0 represents the entire line of text

$1 represents the first data field in the line of text

$2 represents the second data field in the line of text

$n represents the nth data field in the line of text


Fields are separated by a field delimiter.

When gawk reads a line of text, it splits the line into fields at the delimiter. The default field delimiter is any whitespace character (such as a space or a tab).

[oh@localhost shell]$ cat tf
This is the first line
This is the second line
This is the third line
This is the end line
[oh@localhost shell]$ gawk '{print $1}' tf
This
This
This
This
[oh@localhost shell]$ gawk '{print $0}' tf
This is the first line
This is the second line
This is the third line
This is the end line
[oh@localhost shell]$ gawk '{print $2}' tf
is
is
is
is
[oh@localhost shell]$ 

You can see that $n prints the nth field of every line in the file.
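
To see how the default delimiter treats mixed whitespace, here is a minimal sketch with a made-up input line that contains both a tab and a run of spaces. It is written with awk on the assumption that awk on your system is gawk or another POSIX awk.

```shell
# A hypothetical line: "one", a tab, "two", three spaces, "three".
line=$(printf 'one\ttwo   three')

# With the default delimiter, any run of spaces or tabs counts as one separator.
second=$(printf '%s\n' "$line" | awk '{print $2}')

# $0 is the whole line, unchanged.
whole=$(printf '%s\n' "$line" | awk '{print $0}')

echo "$second"
```

The second field comes out as two no matter how much whitespace surrounds it.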


gawk -F: '{print $1}' /etc/passwd

Here the delimiter is specified with -F:

[oh@localhost shell]$ gawk -F: '{print $1}' /etc/passwd
root
bin
daemon
adm
...
nfsnobody
abrt
gdm
tomcat
webalizer
sshd
mysql
tcpdump
oprofile
oh
[oh@localhost shell]$ 


Executing more than one command:

Separate the commands with semicolons, or put each one on its own line at the secondary prompt (>).

[oh@localhost shell]$ echo "My name is Oh" | gawk '{$4="HHH"; print $0}'
My name is HHH
[oh@localhost shell]$ gawk '{
> $4="ohhh"
> print $0}'
kkdfkds sjfksj sjfsklf fsfls    // I entered
kkdfkds sjfksj sjfsklf ohhh     // it output
ksjfkljj jj jj jj               // I entered
ksjfkljj jj jj ohhh             // it output
o o o                           // I entered
o o o ohhh                      // it output
[oh@localhost shell]$ 

-------

Writing the commands in a file:

cat asd:

{

test="Oh Oh"

print $1 test $6

}
Use: gawk -F: -f asd /etc/passwd


Inside the script, the variable test is referenced without the $ symbol. To put several commands inside one pair of curly braces, no semicolon is needed; just put each command on its own line.
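
The same idea can be tried without touching /etc/passwd. The sketch below is only an illustration: the script contents, the variable name label, and the sample record are made up, and awk is assumed to be gawk or another POSIX awk.

```shell
# Write a hypothetical awk script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
{
    label = " uses "
    print $1 label $3
}
EOF

# A colon-separated sample record, shaped like (but not taken from) /etc/passwd.
result=$(echo 'oh:x:/bin/bash' | awk -F: -f "$script")

echo "$result"
rm -f "$script"
```

The variable label is assigned and used without a $ sign, while the fields $1 and $3 keep theirs.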


-------

Running a script before processing data:

gawk 'BEGIN {print "Hello World"}'

Sometimes you want to run a script before processing any data, for example to create the header section of a report. The BEGIN keyword does this.

It forces gawk to execute the program script specified after the BEGIN keyword before reading any data:

[oh@localhost shell]$ gawk 'BEGIN {print "Hello World"}'
Hello World
[oh@localhost shell]$ 

This shows that after printing Hello World, gawk exits immediately without waiting for any input. In this command the BEGIN section is used only to display text.

The script that processes the data has to be written in a separate section.

[oh@localhost shell]$ cat tf
This is the first line
This is the second line
This is the third line
This is the end line
[oh@localhost shell]$ 

gawk 'BEGIN {print "The data4 file contents:"} {print $0}' tf

The data-processing script goes in its own pair of {}, but inside the same pair of single quotation marks.

[oh@localhost shell]$ gawk 'BEGIN {print "The data4 file contents:"} {print $0}' tf
The data4 file contents:
This is the first line
This is the second line
This is the third line
This is the end line
[oh@localhost shell]$ 


---------

Since there is a BEGIN keyword, there is naturally an END keyword as well.

END specifies a script that runs after the data has been processed.

[oh@localhost shell]$ gawk 'BEGIN {print "The data4 file contents:"} {print $0} END {print "End of File"}' tf
The data4 file contents:
This is the first line
This is the second line
This is the third line
This is the end line
End of File
[oh@localhost shell]$ 


A small example.

[oh@localhost shell]$ gawk -f script1 /etc/passwd
The latest list of users and shells
UserID Shell
----------
root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin
adm /sbin/nologin
lp /sbin/nologin
sync /bin/sync
shutdown /sbin/shutdown
halt /sbin/halt
mail /sbin/nologin
uucp /sbin/nologin
operator /sbin/nologin
games /sbin/nologin
gopher /sbin/nologin
ftp /sbin/nologin
nobody /sbin/nologin
dbus /sbin/nologin
usbmuxd /sbin/nologin
rpc /sbin/nologin
rtkit /sbin/nologin
avahi-autoipd /sbin/nologin
vcsa /sbin/nologin
apache /sbin/nologin
haldaemon /sbin/nologin
ntp /sbin/nologin
saslauth /sbin/nologin
postfix /sbin/nologin
pulse /sbin/nologin
rpcuser /sbin/nologin
nfsnobody /sbin/nologin
abrt /sbin/nologin
gdm /sbin/nologin
tomcat /sbin/nologin
webalizer /sbin/nologin
sshd /sbin/nologin
mysql /bin/bash
tcpdump /sbin/nologin
oprofile /sbin/nologin
oh /bin/bash
This concludes the listing
[oh@localhost shell]$ cat script1
BEGIN {
print "The latest list of users and shells"
print "UserID Shell"
print "----------"
FS=":"
}

{
print $1 " " $7
}

END {
print "This concludes the listing"
}
[oh@localhost shell]$ 

Assigning the FS variable in the script is another way to define the field delimiter.

As you can see, the BEGIN script runs only once, before any text is processed.

The END script runs only once, after all the text has been processed.

The script in the middle runs once for each line.
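
The three sections can also be tried on inline data. This is a minimal sketch with made-up records and a made-up counter variable named count; awk is assumed to be gawk or another POSIX awk.

```shell
# Hypothetical colon-separated records.
records='root:/bin/bash
bin:/sbin/nologin
oh:/bin/bash'

report=$(printf '%s\n' "$records" | awk '
BEGIN { FS = ":"; print "User list" }     # runs once, before any input is read
      { count++; print count ". " $1 }    # runs once for every input line
END   { print count " users total" }      # runs once, after the last line
')

printf '%s\n' "$report"
```

Because FS is set in the BEGIN section, it is already in effect when the first line is split into fields.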


----------

Back to the sed editor:

About the substitution command: s is shorthand for the substitute command. The basic form used so far is too simple, though.

1. Substitution flags:

By default, only the first matching string on each line is replaced.

[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/'
bb aa
[oh@localhost shell]$ 

To replace other occurrences, you have to use a substitution flag:

s/pattern/replacement/flags

There are four kinds of flags:

A number: replaces only that occurrence of the match (for example, 2 replaces the second match).

g: replaces all matches.

p: prints the lines that were changed by the substitution.

w file: writes the result of the substitution to the file.

[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/'
bb aa
[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/2'
aa bb
[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/2 1'
sed: -e expression #1, char 11: multiple number options to `s' command
[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/2,1'
sed: -e expression #1, char 10: unknown option to `s'
[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/2;1'
sed: -e expression #1, char 11: missing command
[oh@localhost shell]$ 
[oh@localhost shell]$ echo "aa aa" | sed 's/aa/bb/g'
bb bb

To output only the lines that were changed, combine the p flag with the sed -n option. -n suppresses sed's normal output, but the p flag still prints each modified line, so together they output only the lines modified by the substitute command.

Normally, sed sends its output to STDOUT. When you use w file, only the lines containing the matching pattern are saved to the specified output file.
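
The p and w flags described above can be sketched as follows. The input text and the temporary file are made up for this illustration.

```shell
# Hypothetical input: two lines contain "aa", one does not.
input='aa one
no match here
aa two aa'

outfile=$(mktemp)   # scratch file for the w flag

# -n suppresses the normal output; the p flag prints only the changed lines.
changed=$(printf '%s\n' "$input" | sed -n 's/aa/bb/p')

# The w flag saves the changed lines to a file; every line still goes to STDOUT.
all=$(printf '%s\n' "$input" | sed "s/aa/bb/w $outfile")
saved=$(cat "$outfile")

printf '%s\n' "$changed"
rm -f "$outfile"
```

Only the first aa on each line is replaced here, because neither command uses the g flag.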


2. Replacing awkward characters:

Replacing a forward slash is troublesome, because the forward slash is also the string delimiter of the s command.

You have to escape it with a backslash (\):

sed 's/\/etc/\/opt/' /etc/passwd

The readability is poor.

So use a different character, such as !, in place of / as the string delimiter:

sed 's!/etc!/opt!' /etc/passwd
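
To check that both spellings do the same thing, here is a short sketch that applies them to a single made-up string instead of the real /etc/passwd file.

```shell
# Hypothetical input string containing forward slashes.
path='/etc/passwd'

# Default delimiter: every / in the pattern and replacement must be escaped.
escaped=$(echo "$path" | sed 's/\/etc/\/opt/')

# Alternate delimiter (!): the slashes can be written as-is.
readable=$(echo "$path" | sed 's!/etc!/opt!')

echo "$escaped"
echo "$readable"
```

Both commands produce the same result; the second is simply easier to read.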


-------------

Using addresses

Line addressing is for when you don't want a command to apply to every line, but only to certain lines.

There are two forms of addressing:

1. A numeric range of lines.

2. A text pattern that filters the lines.

Both forms use the same command format:

[address]command

Or, to apply several commands to the same address:

address {

command1

command2

command3

}
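
The two forms of addressing and the grouped form can be sketched like this. The four-line input is made up for the example.

```shell
# Hypothetical four-line input.
text='line one
line two
line three
line four'

# Numeric address: the command applies to line 2 only.
single=$(printf '%s\n' "$text" | sed '2s/line/LINE/')

# Numeric range: the command applies to lines 2 through 3.
range=$(printf '%s\n' "$text" | sed '2,3s/line/LINE/')

# Text pattern: the command applies only to lines that match /three/.
pattern=$(printf '%s\n' "$text" | sed '/three/s/line/LINE/')

# Grouping: two commands share one address (line 4).
grouped=$(printf '%s\n' "$text" | sed '4{
s/line/LINE/
s/four/4/
}')

printf '%s\n' "$range"
```

Lines outside the address pass through to STDOUT unchanged.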














