Shang 15th Week SED study notes

Source: Internet
Author: User

Note: The code and images in this document are from sed & awk (second edition).

One. Working with text

sed is a non-interactive, stream-oriented editor, and awk is a pattern-matching programming language.

A typical use of awk is converting data into a formatted report.

Two. Understanding the basic operations of sed and awk

Example one: file1.txt

John Daggett, 341 King Road, Plymouth MA

Alice Ford, East Broadway, Richmond VA

Orville Thomas, 11345 Oak Bridge Road, Tulsa OK

Terry Kalkas, 402 Lans Road, Beaver Falls PA

Eric Adams, Post Road, Sudbury MA

Hubert Sims, 328A Brook Road, Roanoke VA

Amy Wilde, 334 Bayshore Pkwy, Mountain View CA

Sal Carpenter, 6th Street, Boston MA

Replacing MA with Massachusetts

$ sed 's/MA/Massachusetts/' file1.txt

Using multiple instructions

$ sed 's/ MA/, Massachusetts/; s/ PA/, Pennsylvania/' file1.txt

Or:

$ sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' file1.txt
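
The two ways of passing several instructions can be compared directly. The following is a minimal sketch, assuming a POSIX shell and a two-line sample in the style of file1.txt written to a temporary directory; the variable names are illustrative only.

```shell
# Hypothetical two-line sample in the same style as file1.txt.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'John Daggett, 341 King Road, Plymouth MA' \
  'Terry Kalkas, 402 Lans Road, Beaver Falls PA' > "$tmpdir/file1.txt"

# Form 1: a single script string with instructions separated by a semicolon.
with_semicolon=$(sed 's/ MA/, Massachusetts/; s/ PA/, Pennsylvania/' "$tmpdir/file1.txt")

# Form 2: one -e option per instruction.
with_e=$(sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' "$tmpdir/file1.txt")

# Both forms should print the same two edited lines.
printf '%s\n' "$with_semicolon"
rm -r "$tmpdir"
```

The leading space in each pattern keeps the substitution from touching the same two letters inside a longer word.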

Script: sedsrc

s/ MA/, Massachusetts/

s/ PA/, Pennsylvania/

s/ CA/, California/

s/ VA/, Virginia/

s/ OK/, Oklahoma/

Using script files

$ sed -f sedsrc file1.txt

Save output

$ sed -f sedsrc file1.txt > newfile.txt

Suppress automatic printing of input lines

$ sed -n 's/MA/Massachusetts/p' file1.txt
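
The script-file form and the -n option can be tried side by side. This is a small sketch under the assumption of a POSIX shell; the sample lines and the shortened two-instruction sedsrc are stand-ins written to a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  'John Daggett, 341 King Road, Plymouth MA' \
  'Alice Ford, East Broadway, Richmond VA' > "$tmpdir/file1.txt"

# A script file holds one instruction per line.
printf '%s\n' 's/ MA/, Massachusetts/' 's/ VA/, Virginia/' > "$tmpdir/sedsrc"

# -f reads the instructions from the script file; every input line is still printed.
all_lines=$(sed -f "$tmpdir/sedsrc" "$tmpdir/file1.txt")

# -n turns off automatic printing; the p flag prints only lines where the substitution succeeded.
changed_only=$(sed -n 's/ MA/, Massachusetts/p' "$tmpdir/file1.txt")

printf '%s\n' "$all_lines" '---' "$changed_only"
rm -r "$tmpdir"
```

With -n and no p flag, sed would print nothing at all, which is why the two are normally used together.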

Common causes of error messages

Unmatched quotation marks

Missing final "/" in s/src/dst/

Using awk

Using script files

$ awk -f scriptfile file1.txt

Print the first field of each line of the input file

$ awk '{ print $1 }' file1.txt

Print each line that matches this pattern

$ awk '/MA/' file1.txt

Limit output to only the first field of each matching record

$ awk '/MA/ { print $1 }' file1.txt

Changing the field separator

$ awk -F, '/MA/ { print $1 }' file1.txt

Use multiple commands, separated by semicolons

$ awk -F, '{ print $1; print $2; print $3 }' file1.txt
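
The effect of the field separator is easiest to see by printing the same field with and without -F. The sketch below assumes a POSIX shell and awk, with a two-line stand-in for file1.txt; the variable names are illustrative.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  'John Daggett, 341 King Road, Plymouth MA' \
  'Alice Ford, East Broadway, Richmond VA' > "$tmpdir/file1.txt"

# Default separator (whitespace): $1 is only the first word.
first_word=$(awk '/MA/ { print $1 }' "$tmpdir/file1.txt")

# -F, splits on commas: $1 is the whole name.
full_name=$(awk -F, '/MA/ { print $1 }' "$tmpdir/file1.txt")

# Several commands separated by semicolons: each field goes on its own line.
# Fields 2 and 3 keep the space that followed each comma.
fields=$(awk -F, '/MA/ { print $1; print $2; print $3 }' "$tmpdir/file1.txt")

printf '%s\n' "$first_word" "$full_name" "$fields"
rm -r "$tmpdir"
```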

Common causes of error messages

The procedure is not enclosed in curly braces {}.

The instructions are not enclosed in single quotes ('').

The regular expression is not enclosed in slashes (//).

Option  Description

-f  Filename of script follows.

-F  Change field separator.

-v  var=value follows.
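
Of the three options, -v is the only one not demonstrated in these notes. The following is a minimal sketch of passing a value into an awk program with -v; the variable name label and its value are made up for illustration.

```shell
# -v assigns an awk variable before the program starts running.
labelled=$(printf '%s\n' 'Amy Wilde, 334 Bayshore Pkwy, Mountain View CA' |
  awk -F, -v label='Name: ' '{ print label $1 }')

printf '%s\n' "$labelled"
```

Placing two expressions next to each other in print, as in label $1, concatenates them.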

Three. Understanding regular expression syntax

Expressions (pattern matching)

An arithmetic expression:

1+23*5 1+2*3 (1+2)

A specific pattern:

ABC ADC AEC ...

ab abb abbb abbbb ...

A regular expression describes a pattern or sequence of characters

The matching process of regular expressions

Metacharacters

.  Matches any single character except a newline. In awk, it can also match a newline.

*  Matches zero or more occurrences of the single character that immediately precedes it.

[...]  Matches any one of the characters enclosed in the square brackets. A ^ as the first character inside the brackets negates the match, and a - between two characters indicates a range.

^  As the first character of a regular expression, matches the beginning of the line. In awk, it matches the beginning of the string, even if the string contains embedded newlines.

$  As the last character of a regular expression, matches the end of the line. In awk, it matches the end of the string, even if the string contains embedded newlines.

\{n,m\}  Matches between n and m occurrences of the single character that immediately precedes it. \{n\} matches exactly n occurrences, and \{n,\} matches at least n occurrences.

\  Escapes the special character that follows it.

Extended metacharacters (egrep and awk)

+  Matches one or more occurrences of the preceding regular expression.

?  Matches zero or one occurrence of the preceding regular expression.

|  Matches either the preceding or the following regular expression (alternation).

()  Groups regular expressions.

{n,m}  Matches between n and m occurrences of the preceding regular expression. {n} matches exactly n occurrences, and {n,} matches at least n occurrences. This interval syntax is defined for POSIX egrep and POSIX awk, but many older awk implementations do not support it.
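
Each extended metacharacter can be checked against a short word list. The sketch below uses grep -E (the POSIX spelling of egrep) rather than awk, since interval support in awk varies by implementation; the word list is made up for illustration.

```shell
words=$(printf '%s\n' ac abc abbc adc)

# + : one or more b
plus=$(printf '%s\n' "$words" | grep -E '^ab+c$')

# ? : zero or one b
question=$(printf '%s\n' "$words" | grep -E '^ab?c$')

# | with () : either b or d in the middle
alternation=$(printf '%s\n' "$words" | grep -E '^a(b|d)c$')

# {2} : exactly two b
interval=$(printf '%s\n' "$words" | grep -E '^ab{2}c$')

printf 'plus: %s\n' "$(printf '%s' "$plus" | tr '\n' ' ')"
printf 'question: %s\n' "$(printf '%s' "$question" | tr '\n' ' ')"
printf 'alternation: %s\n' "$(printf '%s' "$alternation" | tr '\n' ' ')"
printf 'interval: %s\n' "$interval"
```

The ^ and $ anchors force the whole word to match, so each result shows exactly which repetitions the operator allows.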

Three steps to writing a regular expression:

1 Know what you want to match and how it might appear in the text.

2 Write a pattern that describes what you want to match.

3 Test the pattern to see what it matches.

Results of pattern matching:

Hits

Lines that I wanted to match and did match.

Misses

Lines that I did not want to match and did not match.

Omissions

Lines that I wanted to match but did not match.

False alarms

Lines that I did not want to match but did match.
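
The three steps and the four kinds of result can be walked through on a small case. This sketch assumes the goal is to find the word "book" in either capitalization; the sample lines and both patterns are illustrative and not taken from the notes.

```shell
sample=$(printf '%s\n' \
  'Here is a book.' \
  'Visit the bookstore.' \
  'Book one is out.' \
  'Nothing here.')

# First attempt: matches line 1 (hit) and line 2 (false alarm),
# skips line 3 (omission) and line 4 (miss).
first_try=$(printf '%s\n' "$sample" | grep 'book')

# Refined attempt: accept either case, and require a non-letter
# (or a line boundary) on both sides of the word.
second_try=$(printf '%s\n' "$sample" |
  grep -E '(^|[^[:alpha:]])[Bb]ook([^[:alpha:]]|$)')

printf '%s\n' 'first try:' "$first_try" 'second try:' "$second_try"
```

Testing the pattern against lines that should not match is what exposes the false alarm; testing only against the wanted lines would have hidden it.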

Character classes

[Ww]hat

\.H[12345]

Ranges of characters

[a-z]

[0-9]

[Cc]hapter[1-9]

[-+*/]

[0-1][0-9][-/][0-3][0-9][-/][0-9][0-9]

Excluding a class of characters

[^0-9]

Repeated occurrences of the character

10

50

100

500

1000

5000

[15]0*

[15]00*
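
The difference between the two patterns shows up once they are anchored and counted against the list. A minimal sketch, with the bare digits 1 and 5 added to the list as an assumption to expose the difference:

```shell
numbers=$(printf '%s\n' 1 5 10 50 100 500 1000 5000)

# [15]0* allows zero zeros, so the bare digits 1 and 5 also match.
loose=$(printf '%s\n' "$numbers" | grep -c '^[15]0*$')

# [15]00* requires at least one zero after the leading digit.
strict=$(printf '%s\n' "$numbers" | grep -c '^[15]00*$')

printf 'loose=%s strict=%s\n' "$loose" "$strict"
```

Writing the character once before the starred copy (00*) is the basic-regular-expression way of saying "one or more".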

Spans of characters

* and \{n,m\}

Matching phone numbers

[0-9]\{3\}-[0-9]\{7,8\}
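
The phone-number pattern can be tested with plain grep, which uses the \{n,m\} interval syntax. The numbers below are invented test data, and the ^ and $ anchors are an addition so that longer or shorter digit runs are rejected.

```shell
phones=$(printf '%s\n' 010-12345678 021-1234567 021-123456 10-12345678)

# Three digits, a hyphen, then seven or eight digits, and nothing else on the line.
matched=$(printf '%s\n' "$phones" | grep '^[0-9]\{3\}-[0-9]\{7,8\}$')

printf '%s\n' "$matched"
```

Without the anchors, a line such as 0021-123456789 would also match, because the pattern would be satisfied by a substring.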

Grouping operations

compan(y|ies)

Note: Most versions of sed and grep do not treat unescaped parentheses () as grouping operators, but egrep and all versions of awk do.
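
The note about parentheses can be verified by running the same pattern through grep, grep -E, and awk. This is a sketch with made-up sample lines; grep -E is used as the POSIX spelling of egrep.

```shell
text=$(printf '%s\n' 'one company' 'two companies' 'good companions')

# Extended syntax: ( | ) group and alternate.
egrep_hits=$(printf '%s\n' "$text" | grep -E 'compan(y|ies)')
awk_hits=$(printf '%s\n' "$text" | awk '/compan(y|ies)/')

# Basic syntax: the same characters are taken literally, so nothing matches.
# "|| true" keeps the non-zero exit status of grep from aborting a strict shell.
literal_count=$(printf '%s\n' "$text" | grep -c 'compan(y|ies)' || true)

printf '%s\n' "$egrep_hits"
printf 'basic grep matches: %s\n' "$literal_count"
```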

Four. Writing sed scripts

Pattern space

$ sed -e 's/pig/cow/' -e 's/cow/horse/'
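
The pig/cow/horse pair illustrates that each command works on the current contents of the pattern space, not on the original input line. A minimal sketch, assuming an input line that contains both words:

```shell
# First command turns "pig" into "cow"; the second then sees "cow and cow"
# and changes the first "cow" it finds.
forward=$(printf '%s\n' 'pig and cow' | sed -e 's/pig/cow/' -e 's/cow/horse/')

# Reversing the order changes the result.
reversed=$(printf '%s\n' 'pig and cow' | sed -e 's/cow/horse/' -e 's/pig/cow/')

printf '%s\n' "$forward" "$reversed"
```

Because the order of commands matters, a script that swaps or chains replacements needs to be read top to bottom as a sequence of edits to one buffer.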

A global perspective on addressing

By default, sed applies each command to every input line. A command can specify zero, one, or two addresses. Each address can be a regular expression that describes a pattern, a line number, or a line addressing symbol.

Example: file2.txt

.TS

Beijing,CN

.TE

Shanghai,CN

Guangzhou,CN

Shenyang,CN

$ sed '/Beijing/s/CN/China/' file2.txt

Delete all lines

d

Delete only the first line

1d

Delete the last line by using the addressing symbol $

$d

Delete empty lines; the regular expression must be enclosed in slashes (//)

/^$/d

Delete tbl input between the .TS and .TE tags

/^\.TS/,/^\.TE/d

Delete all lines from the fifth line to the end

5,$d

Mix a line address and a pattern address

$ sed '1,/^$/d' file2.txt

Delete all lines except those in the given range

1,5!d
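
The address forms above can be compared on one small input. The sketch below assumes a six-line sample whose third line is empty; the sample and the variable names are illustrative.

```shell
# Six lines; the third one is empty.
sample=$(printf '%s\n' one two '' four five six)

without_first=$(printf '%s\n' "$sample" | sed '1d')      # line number
without_last=$(printf '%s\n' "$sample" | sed '$d')       # $ means the last line
without_blank=$(printf '%s\n' "$sample" | sed '/^$/d')   # pattern address
after_blank=$(printf '%s\n' "$sample" | sed '1,/^$/d')   # line 1 through first empty line
first_five=$(printf '%s\n' "$sample" | sed '1,5!d')      # ! inverts the address

printf '%s\n' "$after_blank"
```

Here 1,5!d and $d give the same result only because the sample has exactly six lines.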

Grouping commands

/^\.TS/,/^\.TE/{

s/CN/China/

s/Beijing/BJ/

}

$ sed '2,3{s/CN/China/;s/a/b/;}' file.txt

Two substitutions over the same range can be enclosed in curly braces and separated by a semicolon.
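
A grouped script can be run from a file to confirm that the substitutions apply only inside the addressed range. This sketch assumes a shortened copy of file2.txt with uppercase .TS/.TE tags and a hypothetical script file named group.sed.

```shell
tmpdir=$(mktemp -d)

cat > "$tmpdir/file2.txt" <<'EOF'
.TS
Beijing,CN
.TE
Shanghai,CN
EOF

# Both substitutions are limited to the lines from .TS through .TE.
cat > "$tmpdir/group.sed" <<'EOF'
/^\.TS/,/^\.TE/{
s/CN/China/
s/Beijing/BJ/
}
EOF

grouped=$(sed -f "$tmpdir/group.sed" "$tmpdir/file2.txt")
printf '%s\n' "$grouped"
rm -r "$tmpdir"
```

The Shanghai line comes through unchanged because it lies after the closing .TE tag.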

Five. Basic sed commands

The syntax of sed commands

[address]command

The address is optional for every command. It can be a pattern (a regular expression enclosed in slashes), a line number, or a line addressing symbol. Most sed commands accept two addresses separated by a comma, and some commands accept only a single address.

Commands can also be grouped with curly braces so that they act on the same address. The first command can be placed on the same line as the opening brace, but in a script file the closing brace must be on a line by itself. Some implementations, such as GNU sed, also accept the closing brace on the same line after a semicolon.

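Substitution

[address]s/pattern/replacement/flags

The flags are:

n  A number from 1 to 512 indicating that only the nth occurrence of the pattern is replaced

g  Make the change globally, on all occurrences in the pattern space

p  Print the contents of the pattern space

w file  Write the contents of the pattern space to file

The following characters have a special meaning in the replacement section:

&  Replaced by the string matched by the regular expression

\n  Replaced by the nth substring (n is a single digit) matched by a \( \) group in the pattern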

$ cat test1

first:second

one:two

$ sed 's/\(.*\):\(.*\)/\2:\1/' test1

second:first

two:one
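
The & reference and the numeric and g flags from the list above can be tried the same way. A minimal sketch with made-up input strings:

```shell
# & stands for whatever the pattern matched.
wrapped=$(printf '%s\n' 'see section 3' | sed 's/[0-9][0-9]*/(&)/')

# A numeric flag replaces only that occurrence on the line.
second_only=$(printf '%s\n' 'a-b-c-d' | sed 's/-/+/2')

# g replaces every occurrence on the line.
every=$(printf '%s\n' 'a-b-c-d' | sed 's/-/+/g')

printf '%s\n' "$wrapped" "$second_only" "$every"
```

Without any flag, s replaces only the first occurrence on each line.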

Delete

[address]d

Deletes the contents of the pattern space and also changes the control flow of the script: once this command runs, no further commands are executed on the now "empty" pattern space. The delete command causes a new line of input to be read, and the script starts again from the top.

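The control-flow effect of d can be observed by placing another command after it. This is a small sketch with two invented input lines; the " [seen]" marker is only there to show which lines reached the second command.

```shell
# The second command appends a marker to every line it gets to process.
# The deleted line is never printed and never reaches that command.
survivors=$(printf '%s\n' 'keep me' 'drop me' |
  sed -e '/drop/d' -e 's/$/ [seen]/')

printf '%s\n' "$survivors"
```

Only the first line carries the marker, which shows that processing of the deleted line stopped at d.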