Awk is a programming language dedicated to text processing. Yes, it really is a programming language, but a specialized one: although it can do arithmetic, it is not meant for writing system software or scientific computing code. Unlike sed, awk has the features of a programming language, including built-in functions, control-flow statements, and input and output statements. Its syntax looks a lot like C, but all of its features are geared toward text processing.
Compared with sed, awk's greatest strength is processing structured text, that is, text with a regular organization such as rows and columns.
Command Format
awk [-F value] [-v var=value] 'program text' [files...]
awk [-F value] [-v var=value] -f program-file [files...]
For example:
[alex@alexon:~]$ awk '{print}' persons.txt
1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
2321, Chris Paul, UI, Designer
So far this does nothing more than cat. A more meaningful example:
[alex@alexon:~]$ awk -F , -v OFS=: '{print $1, $2, $3, $4}' persons.txt
1011: Alex Perkins: Product: Software Developer
3923: Jimmey Mills: Operation: COO
23934: Kevin Kim: Management: CEO
2321: Chris Paul: UI: Designer
Awk can recognize the text structure and format the output.
Program format
That is, the content of 'program text' or of the program file:
BEGIN {actions} /pattern/ {actions} END {actions}
The BEGIN block is executed before any input is processed. The /pattern/ {actions} part in the middle is called the body loop. The END block is executed after all input has been processed.
You can use \ to continue the program across several lines:
BEGIN {action} \
/pattern/ {action} \
END {action}
If the program is written in a file, it can be laid out like C code.
program-file.awk:
BEGIN {
    actions;
}
/pattern/ {
    actions;
}
END {
    actions;
}
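As a minimal sketch of the -f form, the script below writes a small program file and a sample input into a temporary directory and runs one against the other. The file names count.awk and sample.txt, and the three sample rows (modeled on persons.txt), are made up for illustration.

```shell
# Work in a throwaway directory (all names here are illustrative).
dir=$(mktemp -d)

# Sample input, modeled on the persons.txt rows used in this article.
cat > "$dir/sample.txt" <<'EOF'
1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
EOF

# The awk program, laid out C-style in its own file.
cat > "$dir/count.awk" <<'EOF'
BEGIN {
    print "start";
}
/CEO/ {
    print "match: " $0;
}
END {
    print "done";
}
EOF

# Run the program file against the input with -f.
f_output=$(awk -f "$dir/count.awk" "$dir/sample.txt")
printf '%s\n' "$f_output"

rm -r "$dir"
```

Only the row containing CEO reaches the body action, so the output is the "start" line, one "match:" line, and the "done" line.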
Awk first executes the BEGIN block, then runs the body against each line of the input, and finally executes the END block after all lines have been processed. In other words, BEGIN and END each run exactly once, while the body may run many times, depending on the number of lines and on which of them match the pattern. Because it runs repeatedly, it is called the body loop.
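The execution order can be observed with a small experiment. This is only a sketch: the three input words and the counter name hits are arbitrary choices for the illustration.

```shell
# Three input lines; only two of them contain the letter "a".
order_output=$(printf 'apple\nkiwi\nbanana\n' | awk '
BEGIN { print "BEGIN runs once, before any input is read" }
/a/   { hits++; print "body ran for: " $0 }
END   { print "END runs once; body ran " hits " times" }
')
printf '%s\n' "$order_output"
```

The body action fires for apple and banana but not for kiwi, so the END block reports 2.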
Built-in Variables
Awk assumes that the input is structured text in the form of a table: each row is a record and each column is a field. When awk reads data, it processes the text according to this structure, using some built-in variables:
FS-- Field separator; by default, fields are separated by whitespace (spaces and tabs)
RS-- Record separator; by default, records are separated by newlines
FILENAME-- Name of the current input file
NF-- Number of fields in the current record
NR-- Number of records read so far, which with the default RS is the line number; it keeps increasing across multiple files
FNR-- Number of records read so far in the current file; it restarts for each file
$0-- The entire current record
$n-- The nth field of the current record
With these built-in variables, awk breaks the input down as it reads it, turning the text into structured information in the form of a table.
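To see how these variables change as awk reads its input, the following sketch runs one program over two small files. The file names one.txt and two.txt and their contents are invented for the example; $NF (the field whose number is NF, i.e. the last field) is standard awk but is not covered above.

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f g h i\n'    > "$dir/two.txt"

# cd inside the command substitution so FILENAME prints as a short relative name.
vars_output=$(cd "$dir" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, "NF=" NF, "last=" $NF }' one.txt two.txt)
printf '%s\n' "$vars_output"

rm -r "$dir"
```

NR keeps counting up to 3 across both files, while FNR drops back to 1 on the first line of two.txt.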
Correspondingly, there are variables that control the output format:
OFS-- Output field separator: the field delimiter used for output
ORS-- Output record separator: the record delimiter used for output
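A short sketch of both output variables together, assuming input rows that use a comma followed by a space between fields, as in persons.txt. The separator strings " | " and a line of two dashes are arbitrary choices for the illustration.

```shell
# FS splits the input on ", "; OFS and ORS shape what print writes.
sep_output=$(printf '1011, Alex Perkins, Product\n3923, Jimmey Mills, Operation\n' |
    awk 'BEGIN { FS = ", "; OFS = " | "; ORS = "\n--\n" } { print $1, $2, $3 }')
printf '%s\n' "$sep_output"
```

Each record is written with " | " between the fields and is followed by a line containing "--" instead of a plain newline.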
Statements (Actions)
print statement
print outputs its arguments as strings. When the arguments are separated by commas, OFS is inserted between them in the output. When they are separated only by spaces, they are concatenated directly and OFS is not used:
[alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {print $1,$2,$3,$4}' persons.txt
1011; Alex Perkins; Product; Software Developer
3923; Jimmey Mills; Operation; COO
23934; Kevin Kim; Management; CEO
2321; Chris Paul; UI; Designer
[alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {print $1 $2 $3 $4}' persons.txt
1011 Alex Perkins Product Software Developer
3923 Jimmey Mills Operation COO
23934 Kevin Kim Management CEO
2321 Chris Paul UI Designer
When print is not followed by a parameter, the current record is output.
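printf statement
printf produces formatted output and works much like printf in C. Unlike print, it does not append ORS automatically, so any newline has to be written in the format string.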
[alex@alexon:~]$ awk -F, 'BEGIN {OFS=";"} {printf "%d: ", NR; print $1,$2,$3,$4}' persons.txt
1: 1011; Alex Perkins; Product; Software Developer
2: 3923; Jimmey Mills; Operation; COO
3: 23934; Kevin Kim; Management; CEO
4: 2321; Chris Paul; UI; Designer
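Field widths and alignment work as they do in C. The sketch below is one possible layout, not the only one: the column widths (2, 14, and 6) and the two sample rows are assumptions picked for the example.

```shell
# %2d  right-aligns the record number in 2 columns,
# %-14s left-aligns the name in 14 columns,
# %6d  right-aligns the numeric ID in 6 columns.
printf_output=$(printf '1011, Alex Perkins, Product\n23934, Kevin Kim, Management\n' |
    awk -F', ' '{ printf "%2d. %-14s|%6d|\n", NR, $2, $1 }')
printf '%s\n' "$printf_output"
```

Note the explicit \n at the end of the format string; without it both records would be printed on one line.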
Programming Language
Awk's language is similar to C: it has operators, built-in functions, and variables, and these can be combined to do very powerful things. This part is used less often and cannot be covered properly in a single article; refer to the awk man page or to a book. Recommended: "sed & awk" and "Sed and Awk 101 Hacks".
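As a small taste of the language features, here is a sketch that combines a user variable, an if statement, an associative array, and the built-in functions length() and toupper(). The array name count, the variables longest and name, and the sample rows are all chosen for this illustration.

```shell
lang_output=$(printf '1011, Alex Perkins, Product\n3923, Jimmey Mills, Operation\n2321, Chris Paul, Product\n' |
    awk -F', ' '
    {
        count[$3]++                     # associative array keyed by department
        if (length($2) > longest) {     # built-in function plus a comparison
            longest = length($2)
            name = toupper($2)          # another built-in function
        }
    }
    END {
        printf "Product=%d Operation=%d longest=%s (%d)\n", count["Product"], count["Operation"], name, longest
    }')
printf '%s\n' "$lang_output"
```

Variables need no declaration: longest starts out empty and compares as 0, and count creates its entries on first use.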
Regular Expression
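Metacharacters include: ^ $ . [ ] | ( ) * + ?
Awk uses standard (POSIX extended) regular expressions:
Position characters:
^ ---- Beginning of line
$ ---- End of line
. ---- Any single character (awk also lets it match '\n', but a record normally contains no newline)
\b ---- Word boundary (the beginning or end of a word), where a word is a run of letters, digits, or underscores. It can be placed at either end of a pattern or at both ends. This is not part of POSIX awk; in gawk the word boundary is written \y (or \< and \> for the start and end of a word), because \b means backspace in awk.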
Quantifiers
* ---- Zero or more times
+ ---- One or more times
? ---- Zero or one time
{m} ---- Exactly m times
{m,n} ---- m to n times. For example, {1,5} means 1 to 5 times (1, 2, 3, 4, or 5 times)
Escape character
\ ---- Escapes a special character so that it is matched literally
Character sets
[...] ---- Any one character in the set
[^...] ---- Any one character that is not in the set
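Operators
| ---- Alternation (or): ABC|123 matches ABC or 123. Awk uses extended regular expressions, so the | is written without a backslash; \| is the basic-regex spelling used by sed and grep
(...) ---- Grouping: forms a group so that a quantifier or alternation applies to the whole group; in sed and grep it is also what back-references refer to
\n ---- Back-reference to the nth group. This is sed and grep syntax, where /\(123\)\1/ matches 123123. Awk regular expressions do not support back-references in patterns; gawk accepts \1 through \9 only in the replacement text of gensub()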
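The following sketch tries a few of these constructs against rows modeled on persons.txt. The variable names are illustrative, and only portable constructs (anchors, a character set, unescaped alternation, and the ? quantifier) are used.

```shell
sample='1011, Alex Perkins, Product, Software Developer
3923, Jimmey Mills, Operation, COO
23934, Kevin Kim, Management, CEO
2321, Chris Paul, UI, Designer'

# ^ anchors the match at the start of the record: IDs beginning with 23.
starts_23=$(printf '%s\n' "$sample" | awk -F', ' '/^23/ { print $2 }')

# ^ and $ around a character set: the fourth field is exactly CEO or COO.
c_level=$(printf '%s\n' "$sample" | awk -F', ' '$4 ~ /^C[EO]O$/ { print $2 }')

# | is alternation and is written without a backslash in awk.
alt=$(printf '%s\n' "$sample" | awk -F', ' '/UI|Product/ { print $1 }')

# ? makes the preceding character optional: matches "Jimmy " or "Jimmey ".
opt=$(printf '%s\n' "$sample" | awk -F', ' '$2 ~ /^Jimme?y / { print $2 }')

printf '%s\n' "$starts_23" "$c_level" "$alt" "$opt"
```

A bare /regex/ pattern is tested against the whole record ($0), while the ~ operator tests a single field.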