Awk is a great tool. Without it, writing shell scripts is much harder.
There are several versions of awk; the one described here is gawk (GNU awk). It scans files, finds the lines that match a specified pattern, and then performs a specified action on those lines.
The basic format is as follows:
gawk options 'pattern {action}' file-list
Here, file-list is gawk's data source, and options are command-line options. The main part, the "pattern {action}" pairs, must be placed in single quotes, and each action is enclosed in curly braces. If the main part gets long, it can be written to a file (the program file) and passed to gawk with -f program-file in place of the single-quoted text. That is more advanced usage, and awk programs have grammar rules of their own, so I will cover it in a later article. This article only introduces the basics.
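As a minimal sketch (using plain `awk`, which accepts the same program text as gawk), the following runs one program both inline and from a program file with `-f`. The temporary files and their contents are illustrative assumptions.

```shell
# Create a throwaway program file and data file (paths come from mktemp).
prog=$(mktemp)
data=$(mktemp)
printf '%s\n' '/ab/ { print "found:", $0 }' > "$prog"
printf 'abc\nxyz\n' > "$data"

# The program given inline, in single quotes...
inline=$(awk '/ab/ { print "found:", $0 }' "$data")
# ...and the same program read from the file with -f.
from_file=$(awk -f "$prog" "$data")

echo "$inline"
echo "$from_file"
rm -f "$prog" "$data"
```

Both commands print `found: abc`.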
1. Basics
Patterns and actions normally come in pairs, and the single quotes can hold many such pairs, so lines matching different patterns get different actions. A program file essentially contains the same pairs. If the pattern is omitted, every line of the file is selected; if the action is omitted, the default action prints each matching line to the screen. Both parts can be omitted at the same time, but the single quotes must still be typed, otherwise an error occurs; in that case gawk does nothing.
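A small sketch of these defaults, using plain `awk` and made-up sample lines:

```shell
data=$(printf 'apple\nbanana\ncherry\n')

# Pattern omitted: the action runs on every line.
all=$(printf '%s\n' "$data" | awk '{ print "line:", $0 }')
# Action omitted: each matching line is printed unchanged.
matched=$(printf '%s\n' "$data" | awk '/an/')
# Both omitted: the quotes are still required; the empty program prints nothing.
empty=$(printf '%s\n' "$data" | awk '')

echo "$all"
echo "$matched"
```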
Input is processed one line at a time: gawk applies every pattern-action pair to the current line, then moves on to the next line (and, after the last line of one file, to the first line of the next file in file-list). It does not run the first pattern-action pair over all the input and only then run the next pair.
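To see this ordering, here is a sketch with two pattern-less rules (plain `awk`, sample input made up): each line passes through both rules before the next line is read.

```shell
# Two rules with no pattern; both run on line "a" before line "b" is read.
out=$(printf 'a\nb\n' | awk '{ print "rule1:", $0 } { print "rule2:", $0 }')
echo "$out"
# rule1: a
# rule2: a
# rule1: b
# rule2: b
```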
2. Pattern
The basic form is a string enclosed in two slashes, such as /abc/; gawk then selects every line that contains that string. The slashes are required: a bare string does not work as a search pattern. The text between the slashes is a regular expression, so it can use metacharacters such as ^ (the start of a line) and | (alternation, i.e. "or"). Separate patterns can also be joined with logical operators such as || (not to be confused with the | used inside the slashes). For example:
gawk '/^ab/' file    (prints the lines that start with ab)
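A sketch of anchors, alternation inside the slashes, and || between patterns, run with plain `awk` on made-up lines:

```shell
data=$(printf 'abc\nxab\nhello\nabyss\nyes\n')

# ^ anchors the match at the start of the line: abc, abyss
starts_ab=$(printf '%s\n' "$data" | awk '/^ab/')
# | inside the slashes is regex alternation: abc, hello, abyss
ab_or_he=$(printf '%s\n' "$data" | awk '/^(ab|he)/')
# || outside the slashes combines two separate patterns: abc, abyss, yes
either=$(printf '%s\n' "$data" | awk '/^ab/ || /s$/')

echo "$starts_ab"
echo "$ab_or_he"
echo "$either"
```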
A pattern can also be negated, like grep's -v option, by putting "!" in front of the first slash:
gawk '!/^ab/' file    (prints the lines that do not start with ab)
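A quick sketch comparing the negated pattern with `grep -v` on the same made-up input (plain `awk`):

```shell
data=$(printf 'abc\nxab\nhello\nabyss\n')

# ! negates the pattern: keep lines that do NOT start with "ab".
not_ab=$(printf '%s\n' "$data" | awk '!/^ab/')
# grep -v with the same regex selects the same lines.
grep_v=$(printf '%s\n' "$data" | grep -v '^ab')

echo "$not_ab"    # xab, hello
```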
Probably the more common kind of pattern uses fields and variables. The patterns above only match against the whole line: if the pattern appears anywhere in a line, that line is selected, and the action then treats the whole line as one unit. Fields and variables let you work at a finer grain, inside the line. This is where gawk goes beyond grep and sed: it is better at handling structured data, where the data in the file is laid out "neatly" in some format, much like a spreadsheet with rudimentary "cells". For example, typing netstat -apn in a terminal produces output like this:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.1.1:53            0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:538             0.0.0.0:*               LISTEN      -
Every line has the same structure: the data is divided into fields, with a clear delimiter between them, such as a tab, spaces, or some other character.
In a pattern, individual fields can be referenced as $1, $2, $3, ...; $n is the nth field of the line, counting from the left. This lets you match at the field level instead of against the whole line. The syntax is $n ~ /regex/, for example:
gawk '$2 ~ /man/' file    (prints the lines whose second field contains "man")
gawk '$2 !~ /man/' file    (prints the lines whose second field does not contain "man", i.e. the negated match)
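As a sketch, here is field-level matching on a few lines copied from the netstat sample above (static text, not live output), using plain `awk` with the default whitespace splitting, so `$4` is the Local Address column:

```shell
# Lines shaped like the netstat -apn sample above.
data=$(printf '%s\n' \
  'tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN -' \
  'tcp 0 0 0.0.0.0:139 0.0.0.0:* LISTEN -' \
  'tcp 0 0 127.0.1.1:53 0.0.0.0:* LISTEN -')

# $4 ~ /regex/: keep loopback listeners (Local Address starts with 127.).
loopback=$(printf '%s\n' "$data" | awk '$4 ~ /^127\./')
# $4 !~ /regex/: the negated match keeps everything else.
other=$(printf '%s\n' "$data" | awk '$4 !~ /^127\./')
# Same pattern, but print only the matching field.
addrs=$(printf '%s\n' "$data" | awk '$4 ~ /^127\./ { print $4 }')

echo "$addrs"
```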
There are also two special patterns, BEGIN and END, which do not match any line of the file; they are just hooks. The action associated with BEGIN is executed before any input is read, and the action associated with END is executed after all lines of the input have been processed.
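A minimal sketch of BEGIN and END around a per-line rule that counts lines (plain `awk`, made-up input):

```shell
out=$(printf 'a\nb\nc\n' | awk '
  BEGIN { print "start" }      # runs once, before the first line is read
        { n++ }                # runs for every line
  END   { print "lines:", n }  # runs once, after the last line
')
echo "$out"
# start
# lines: 3
```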
3. Action
As emphasized earlier, an action must be enclosed in curly braces. An action can be a single statement or several statements, separated by semicolons.
Actions are rich and varied: an action is essentially a small program, much like a small shell script, that can combine several statements, use control structures such as if, and define variables. As mentioned earlier, when the action is omitted, the default is the print command, which prints the current line to standard output. It can also be given explicitly, for example: gawk '{print}' file.
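A sketch of one action with several semicolon-separated statements, an if/else, and a user-defined variable (`big`, a hypothetical name), plus an explicit `{ print }`. It uses plain `awk` and made-up numbers:

```shell
# One action: if/else with a counter variable; END reports the count.
out=$(printf '3\n10\n7\n' | awk '{ if ($1 > 5) { big++; print "big:", $1 } else print "small:", $1 } END { print "count:", big }')
echo "$out"
# small: 3
# big: 10
# big: 7
# count: 2

# An explicit { print } behaves like the default action.
explicit=$(printf 'x\ny\n' | awk '{ print }')
```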
4. Options
The main options are -f and -F. The former was described above; the latter specifies the field delimiter for each line. By default, fields are separated by whitespace (runs of spaces and tabs), but some files use other delimiters, such as colons. These must be specified explicitly, or gawk cannot tell the fields apart. For example:
gawk -F: '{print}' file    (reads the file's data using the colon as the delimiter)
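A sketch of what -F changes, using plain `awk` on a sample line in /etc/passwd format (the line itself is an illustrative assumption):

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Default splitting (whitespace): the line has no blanks, so $1 is the whole line.
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')
# With -F: each colon ends a field.
user=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')
login_shell=$(printf '%s\n' "$line" | awk -F: '{ print $7 }')

echo "$user $login_shell"    # root /bin/bash
```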
If there is more than one delimiter, enclose the delimiter characters in square brackets, for example:
gawk -F '[:-]' '{print}' file    (treats both : and - as delimiters. Note that this does not make the two characters together a single delimiter; each one is a delimiter on its own.)
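A sketch showing that each bracketed character is its own delimiter, using plain `awk` on made-up input (`NF`, the field count, is covered in the next section):

```shell
# Either ":" or "-" ends a field.
fields=$(printf 'a:b-c\n' | awk -F '[:-]' '{ print $1, $2, $3 }')   # a b c
# ":-" is two delimiters in a row, so there is an empty field between them.
nf_bracket=$(printf 'a:-b\n' | awk -F '[:-]' '{ print NF }')        # 3
# Without brackets, ":-" is one two-character delimiter.
nf_plain=$(printf 'a:-b\n' | awk -F ':-' '{ print NF }')            # 2
```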
5. Built-in variables
In gawk you can define your own variables inside the curly braces. gawk also has built-in variables that can be used directly, such as the field variables $1, $2, ... introduced above. Here are the others:
$0: the entire current line (also called a record). For example, gawk '/man/{print $0}' file prints each matching line in full (which is what the default action does), whereas gawk '/man/{print $1}' file prints only the first field of each matching line.
NF: the number of fields in the current line
NR: the number of the line currently being processed; NR is incremented by 1 for each line read
FILENAME: the name of the file currently being processed (empty when reading standard input)
FS: the input field separator (default: a single space, meaning fields are split on runs of spaces, tabs, and newlines; -F sets this variable)
OFS: the output field separator (default: a space)
ORS: the output record (line) separator (default: a newline)
RS: the input record separator (default: a newline)
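The built-in variables above can be sketched with plain `awk` on made-up input. Note the assumption in the OFS line: assigning `$1 = $1` forces the record to be rebuilt, which is when OFS is inserted between fields.

```shell
lines=$(printf 'man page\nno match\nbatman rises\n')
# $0 is the whole record; $1 is only its first field.
whole=$(printf '%s\n' "$lines" | awk '/man/ { print $0 }')   # man page / batman rises
first=$(printf '%s\n' "$lines" | awk '/man/ { print $1 }')   # man / batman

# NR (line number) and NF (field count) for each line.
counts=$(printf 'a b c\nd e\n' | awk '{ print NR, NF }')      # 1 3 / 2 2

# FILENAME holds the name of the file being read.
tmp=$(mktemp)
printf 'one\n' > "$tmp"
fname=$(awk '{ print FILENAME }' "$tmp")
rm -f "$tmp"

# FS set in BEGIN works like -F.
user=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":" } { print $1 }')
# OFS joins fields when the record is rebuilt ($1 = $1) and printed.
csv=$(printf 'a b c\n' | awk 'BEGIN { OFS = "," } { $1 = $1; print }')
# ORS replaces the newline written after each print.
joined=$(printf 'a\nd\n' | awk 'BEGIN { ORS = ";" } { print }')
# RS splits the input into records at "," instead of at newlines.
records=$(printf 'x,y,z\n' | awk 'BEGIN { RS = "," } { print $1 }')
```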
awk Basic Usage