Introduction to SED and awk

Source: Internet
Author: User

Awk is a programming language dedicated to text processing. It is a real programming language, but a specialized one: although it can do arithmetic, it is not meant for writing system software or scientific computing. Unlike sed, awk has the features of a general programming language, including built-in functions, control-flow statements, and input and output statements. Its syntax resembles C, but all of its functionality is focused on text processing.
Compared with sed, awk is strongest at processing structured text, that is, text organized into records and fields.

Command Format

awk [-F value] [-v var=value] 'program text' [files...]
awk [-F value] [-v var=value] -f program-file [files...]

For example:

[alex@alexon:~]$ awk '{print}' persons.txt
1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
2321, Chris Paul, UI, Designer

This behaves just like cat. A more meaningful example:

[alex@alexon:~]$ awk -F , -v OFS=: '{print $1, $2, $3, $4}' persons.txt
1011: Alex Perkins: Product: Software Developer
3923: Jimmey Mills: Operation: COO
23934: Kevin Kim: Management: CEO
2321: Chris Paul: UI: Designer

Awk can recognize the text structure and format the output.
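
As a small sketch of the -F and -v options from the command format, the following recreates the sample data in a temporary file (the path and the variable name dept are illustrative assumptions) and selects one record by comparing a field with a variable passed in from the shell:

```shell
# Recreate the sample data in a temporary file (hypothetical location).
data=$(mktemp)
cat > "$data" <<'EOF'
1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
2321, Chris Paul, UI, Designer
EOF

# -F sets the field separator (here comma + space);
# -v assigns the awk variable dept before the program starts.
name=$(awk -F', ' -v dept="UI" '$3 == dept { print $2 }' "$data")
printf '%s\n' "$name"

rm -f "$data"
```

Using ", " as the separator keeps the leading space out of each field, so the comparison with dept is exact.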
Program format

This is the content of 'program text' or of the program file:

BEGIN {actions} /pattern/ {actions} END {actions}

BEGIN is executed before any input is processed. The pattern-action rules in the middle are called the body loop. END is executed after all input has been processed.
You can use \ at the end of a line to continue the program on the next line:

BEGIN {action} \
/pattern/ {action} \
END {action}

If the program is written in a file, it can be laid out like C code.

program-file.awk:
BEGIN {
    actions;
}
/pattern/ {
    actions;
}
END {
    actions;
}
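
A minimal sketch of the program-file form, assuming a hypothetical file name report.awk and a temporary working directory; it writes the program to a file and runs it with -f:

```shell
workdir=$(mktemp -d)

cat > "$workdir/persons.txt" <<'EOF'
1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
2321, Chris Paul, UI, Designer
EOF

# report.awk is an illustrative name, not something from the article.
cat > "$workdir/report.awk" <<'EOF'
BEGIN {
    FS = ", "            # fields are separated by comma + space
    print "ID\tNAME"
}
/Developer|Designer/ {
    print $1 "\t" $2
}
END {
    print NR " records read"
}
EOF

report=$(awk -f "$workdir/report.awk" "$workdir/persons.txt")
printf '%s\n' "$report"

rm -rf "$workdir"
```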


Awk first executes the BEGIN block, then executes the body for each line of the input. After all lines have been processed, it executes the END block. In other words, BEGIN and END are each executed only once, while the body is executed many times, depending on the number of lines and on which of them match the pattern. Because it runs repeatedly, it is called the body loop.
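
The execution order can be observed with a short sketch like this one (the messages and the temporary file are illustrative): BEGIN prints once, the pattern rule fires for each matching record, and END prints once at the end.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
2321, Chris Paul, UI, Designer
EOF

# n counts how often the body rule ran; NR in END is the total record count.
result=$(awk '
BEGIN      { print "start" }
/COO|CEO/  { n++; print "match: " $0 }
END        { print "matched " n " of " NR " records" }
' "$data")

printf '%s\n' "$result"
rm -f "$data"
```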

Built-in Variables

Awk assumes that the input is structured text in the form of a table: each line is a record and each column is a field. When awk reads data, it splits the text according to this structure and exposes the result through built-in variables:

FS -- Field separator; by default fields are separated by whitespace
RS -- Record separator; by default records are separated by newlines
FILENAME -- Name of the current input file
NF -- Number of fields in the current record
NR -- Number of records read so far, equivalent to a line number; it keeps increasing across multiple files
FNR -- Number of records read from the current file; it restarts for each file
$0 -- The entire current record
$n -- The nth field of the current record
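
As a sketch of how these variables behave, the following runs awk over two small files (the file names and contents are made up for illustration) and prints FILENAME, NR, FNR, NF, and the last field for every record:

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f g h i\n'    > "$dir/two.txt"

# NR keeps counting across files, FNR restarts for each file,
# NF is the field count, and $NF is therefore the last field.
vars=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
printf '%s\n' "$vars"

rm -rf "$dir"
```

This prints:

```
one.txt 1 1 3 c
one.txt 2 2 2 e
two.txt 3 1 4 i
```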

With these built-in variables, awk breaks the text down as it reads it: the input is turned into structured information in the form of a table.
Correspondingly, there are variables that control the output format:

OFS -- Output field separator: the field separator used for output
ORS -- Output record separator: the record separator used for output
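
A brief sketch of both output variables, using made-up separator strings. The second command relies on the fact that assigning to a field makes awk rebuild $0 with OFS, which is why the plain print there shows the new separator:

```shell
# OFS joins the comma-separated arguments of print; ORS terminates each record.
out=$(printf '1011,Alex\n3923,Jimmey\n' |
      awk -F, 'BEGIN { OFS = " -> "; ORS = " | " } { print $1, $2 }')
printf '%s\n' "$out"

# Assigning to any field ($1 = $1) rebuilds $0 using OFS.
rebuilt=$(printf 'a,b,c\n' | awk -F, -v OFS=: '{ $1 = $1; print }')
printf '%s\n' "$rebuilt"
```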

Statements (Actions)

Print statement

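print writes its arguments as strings. When the arguments are separated by commas, OFS is inserted between them in the output. When they are separated only by spaces, they are concatenated with no separator at all (in the second example below, the spaces in the output come from the fields themselves, which begin with a space after each comma in the input):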

[alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {print $1,$2,$3,$4}' persons.txt
1011; Alex Perkins; Product; Software Developer
3923; Jimmey Mills; Operation; COO
23934; Kevin Kim; Management; CEO
2321; Chris Paul; UI; Designer

[alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {print $1 $2 $3 $4}' persons.txt
1011 Alex Perkins Product Software Developer
3923 Jimmey Mills Operation COO
23934 Kevin Kim Management CEO
2321 Chris Paul UI Designer

When print is not followed by a parameter, the current record is output.
Printf statement

printf produces formatted output, much like printf in C.

[alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {printf "%d: ", NR; print $1,$2,$3,$4}' persons.txt
1: 1011; Alex Perkins; Product; Software Developer
2: 3923; Jimmey Mills; Operation; COO
3: 23934; Kevin Kim; Management; CEO
4: 2321; Chris Paul; UI; Designer
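
Beyond %d, printf also accepts C-style width and alignment specifiers. The following sketch (the column widths are arbitrary choices) lines up three fields in fixed-width columns:

```shell
# %-6s and %-14s left-align within 6 and 14 columns; %10s right-aligns within 10.
table=$(printf '1011,Alex Perkins,Product\n23934,Kevin Kim,Management\n' |
        awk -F, '{ printf "%-6s|%-14s|%10s\n", $1, $2, $3 }')
printf '%s\n' "$table"
```

Unlike print, printf does not append ORS, so the format string has to end with \n explicitly.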

Programming Language

Awk is similar to C: it has operators, built-in functions, and variables, and can implement very powerful functionality. This part is used less often and cannot be covered in a single article; refer to the awk man page or to books. Recommended: <sed & awk> and <Sed and Awk 101 Hacks>.
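
To give a flavor of these language features, here is a small sketch that combines an associative array, an if statement, and the built-in functions length() and toupper(). The third input row (Dana Lee) is a hypothetical record added only so that one department appears twice:

```shell
summary=$(printf '%s\n' \
    '1011,Alex Perkins,Product' \
    '3923,Jimmey Mills,Operation' \
    '5120,Dana Lee,Product' |
  awk -F, '
    {
        count[$3]++                      # associative array keyed by department
        if (length($2) > longest) {      # built-in length() inside an if statement
            longest = length($2)
            name = toupper($2)           # built-in toupper()
        }
    }
    END {
        printf "Product=%d Operation=%d longest=%s\n",
               count["Product"], count["Operation"], name
    }')
printf '%s\n' "$summary"
```

Variables such as longest and count need no declaration; an uninitialized variable acts as 0 or as the empty string depending on context.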

Regular Expression

Metacharacters include: ^ $ . [ ] | ( ) * + ?
Awk uses standard (POSIX extended) regular expressions:

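Anchors and wildcards:

^ ---- Beginning of the line
$ ---- End of the line
. ---- Any single character
\b ---- Word boundary in many regular expression flavors, including GNU sed. A word is defined as a series of letters or digits, and \b can be placed at either end or at both ends of it. In awk itself, \b inside a regular expression means backspace; gawk uses \y for a word boundary instead.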

Quantifiers

* ---- Zero or more times
+ ---- One or more times
? ---- Zero or one time
{m} ---- Exactly m times
{m,n} ---- From m to n times. For example, {1,5} means 1 to 5 times (1, 2, 3, 4, or 5 times)

Escape Character

\ ---- Escapes a special character so that it is matched literally

Character Set

[...] ---- Any one character in the set
[^...] ---- Any one character that is not in the set

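Operators

| ---- Alternation (or): ABC|123 matches ABC or 123. In the basic regular expressions of GNU sed this is written ABC\|123
(...) ---- Grouping: forms a group, mainly used for backreferences
\n ---- Backreference to the nth group: in sed, /\(123\)\1/ matches 123123. Awk regular expressions do not support backreferences in patterns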
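
A short sketch of several of these constructs in awk, using the ~ match operator to test individual fields. It assumes a copy of the sample data without spaces after the commas, so that the anchors ^ and $ apply to the bare field values:

```shell
matches=$(printf '%s\n' \
    '1011,Alex Perkins,Product,Software Developer' \
    '3923,Jimmey Mills,Operation,COO' \
    '23934,Kevin Kim,Management,CEO' \
    '2321,Chris Paul,UI,Designer' |
  awk -F, '
    $1 ~ /^2[0-9]+$/    { print "id starts with 2: " $1 }   # anchors, set, +
    $4 ~ /^(CEO|COO)$/  { print "executive: " $2 }          # grouping, alternation
  ')
printf '%s\n' "$matches"
```

Both rules are tested against every record, so a single record can produce more than one line of output.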
