Are you learning Linux? This article introduces AWK, a very useful text-processing tool.
The AWK utility comes with its own self-contained language. It is one of the most powerful data processing engines not only in Linux but in any environment. How much you can get out of this programming and data manipulation language (its name is derived from the initials of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan) depends on how well you know it. It lets you write short programs that read input files, sort data, process data, perform calculations on input, and generate reports, among many other things.
What is AWK?
In short, AWK is a programming language tool for processing text. The language of the AWK utility resembles shell programming languages in many ways, although AWK has its own syntax. AWK was originally designed for text processing, and the language is built around executing a series of commands whenever a pattern matches the input data. The utility scans each line of a file looking for patterns that match those given on the command line. If a match is found, it performs the next programming step. If no match is found, it moves on to the next line.
Although operations may be complex, the command syntax is always:
awk 'pattern { action }' filenames
Here, pattern is what AWK looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) need not always appear in a program, but they are used to group a series of commands under a specific pattern.
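As a rough illustration of the pattern-and-action form, the sketch below pipes three records from this article's sample listing into awk. The variable names (records, matched, surnames) are invented for the example, and it assumes a POSIX shell with any standard awk.

```shell
# Three records copied from the article's emp_names listing
records='46012 DULANEY EVAN MOBILE AL
46019 BOGUE ROBERT PHOENIX AZ
46024 WOOD WILLIAM MUNCIE IN'

# Pattern plus action: only lines containing PHOENIX reach the print
matched=$(printf '%s\n' "$records" | awk '/PHOENIX/ { print $1 }')
echo "$matched"

# Action without a pattern: the print runs for every line
surnames=$(printf '%s\n' "$records" | awk '{ print $2 }')
echo "$surnames"
```

The first command should print 46019 alone. The second should print the three surnames, one per line. The single quotes keep the shell from interpreting the braces and the $ signs before awk sees them.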
Understanding fields
The utility divides the input into records and fields. A record is a single line of input, and each record contains several fields. The default field separator is a space or a tab, and the record separator is a newline. By default, both tabs and spaces are treated as field separators (a run of several spaces still counts as one separator), but the separator can be changed from whitespace to any other character.
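A minimal sketch of both separator behaviors follows. The first input line mixes a run of spaces with a tab. The second is a hypothetical colon-delimited version of a record, which is not a format the article itself uses. The -F option is the standard awk way to set the field separator. The variable names are made up for the example.

```shell
# Runs of spaces and tabs count as a single separator by default
spaced=$(printf '46012    DULANEY\tEVAN\n' | awk '{ print $2, $3 }')
echo "$spaced"

# Hypothetical colon-delimited record; -F: makes the colon the separator
colon=$(printf '46012:DULANEY:EVAN:MOBILE:AL\n' | awk -F: '{ print $4 }')
echo "$colon"
```

The first command should print DULANEY EVAN and the second MOBILE.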
For demonstration, see the following employee list file saved as emp_names:
46012   DULANEY   EVAN      MOBILE    AL
46013   DURHAM    JEFF      MOBILE    AL
46015   STEEN     BILL      MOBILE    AL
46017   FELDMAN   EVAN      MOBILE    AL
46018   SWIM      STEVE     UNKNOWN   AL
46019   BOGUE     ROBERT    PHOENIX   AZ
46021   JUNE      MICAH     PHOENIX   AZ
46022   KANE      SHERYL    UNKNOWN   AR
46024   WOOD      WILLIAM   MUNCIE    IN
46026   FERGUS    SARAH     MUNCIE    IN
46027   BUCK      SARAH     MUNCIE    IN
46029   TUTTLE    BOB       MUNCIE    IN
When AWK reads the input, the entire record is assigned to the variable $0. Each field, as delimited by the field separator, is assigned to the variables $1, $2, $3, and so on. In essence, a line can contain any number of fields, and each field can be accessed by its field number. Therefore, the command
awk '{print $1,$2,$3,$4,$5}' emp_names
prints the following output:
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
It is worth noting that AWK treats the five fields as separated by whitespace, but when it prints them, there is only a single space between fields. By referring to each field by its number, you can choose to print only specific fields. For example, to print only the name in each record, select just the second and third fields:
$ awk '{print $2,$3}' emp_names
DULANEY EVAN
DURHAM JEFF
STEEN BILL
FELDMAN EVAN
SWIM STEVE
BOGUE ROBERT
JUNE MICAH
KANE SHERYL
WOOD WILLIAM
FERGUS SARAH
BUCK SARAH
TUTTLE BOB
$
You can also print fields in any order, regardless of their order in the record. For example, to display only the name fields in reverse order, with the first name followed by the last name:
$ awk '{print $3,$2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
ROBERT BOGUE
MICAH JUNE
SHERYL KANE
WILLIAM WOOD
SARAH FERGUS
SARAH BUCK
BOB TUTTLE
$
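To make the difference between $0 and the numbered fields concrete, here is a small sketch using one record with wide, column-style spacing. The spacing and the variable names (line, whole, fields) are assumptions made for the illustration.

```shell
# One record with several spaces between columns (spacing is assumed)
line='46012   DULANEY     EVAN        MOBILE   AL'

# $0 is the record exactly as it was read, spacing included
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')
echo "$whole"

# Fields listed with commas are joined by a single space on output
fields=$(printf '%s\n' "$line" | awk '{ print $1, $2, $3, $4, $5 }')
echo "$fields"
```

The first command should echo the line unchanged, while the second should collapse every gap to one space. The commas in the print statement are what insert that single space between fields.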
Using patterns
By adding a pattern that must be matched, you can choose to act on only specific records instead of all of them. The simplest form of pattern matching is a search, in which the text to match is enclosed in slashes (/pattern/). For example, to perform the previous operation only for employees living in Alabama:
$ awk '/AL/ {print $3,$2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
$
If you do not specify the field to be printed, the entire matching entry is printed:
$ awk '/AL/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
$
You can issue multiple commands for the same data by separating them with semicolons (;). For example, to print the name on one line and the city and state on the next:
$ awk '/AL/ {print $3,$2 ; print $4,$5}' emp_names
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
$
Had the semicolon been left out (print $3,$2,$4,$5), everything would appear on the same line. If, on the other hand, the two print statements are given in separate sets of braces, the result is quite different:
$ awk '/AL/ {print $3,$2} {print $4,$5}' emp_names
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
PHOENIX AZ
PHOENIX AZ
UNKNOWN AR
MUNCIE IN
MUNCIE IN
MUNCIE IN
MUNCIE IN
$
Fields 3 and 2 are printed only when AL is found in the record. Fields 4 and 5, however, are unconditional and are always printed. Only the commands in the first set of curly braces are governed by the pattern immediately in front of them (/AL/).
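The scoping rule described above can be sketched with two records from the listing. The second command is one possible way, not taken from the article, to make both blocks conditional by repeating the pattern in front of each. The variable names are invented for the example.

```shell
# Two records copied from the article's emp_names listing
records='46012 DULANEY EVAN MOBILE AL
46019 BOGUE ROBERT PHOENIX AZ'

# The pattern guards only the first block; the second runs on every line
mixed=$(printf '%s\n' "$records" | awk '/AL/ {print $3,$2} {print $4,$5}')
echo "$mixed"

# Repeating the pattern makes the second block conditional as well
guarded=$(printf '%s\n' "$records" | awk '/AL/ {print $3,$2} /AL/ {print $4,$5}')
echo "$guarded"
```

The first command should print three lines (EVAN DULANEY, MOBILE AL, PHOENIX AZ). The second should drop the PHOENIX AZ line, because neither block now runs for the Arizona record.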
The results are not easy to read and can be made clearer. First, insert a comma and a space between the city and the state. Then, add a blank line after each two-line entry:
$ awk '/AL/ {print $3,$2 ; print $4", "$5"\n"}' emp_names
EVAN DULANEY
MOBILE, AL

JEFF DURHAM
MOBILE, AL

BILL