Article title: AWK: Smart toolkit for Linux administrators. Linux is a technology channel of the IT lab in China. Includes basic categories such as desktop applications, Linux system management, kernel research, embedded systems, and open source.
AWK utility comes with its own self-contained language. it is not only one of the most powerful data processing engines in Linux but also in any environment. The maximum functionality of this programming and data operation language (its name is derived from the first letter of its founder, Alfred Aho, Peter Weinberger, and Brian Kernighan) depends on a person's knowledge. It allows you to create short programs that read input files, sort data, process data, perform calculations on input, and generate reports. There are countless other functions.
What is AWK? In short, AWK is a programming language tool used to process text. The language of the AWK utility is similar to the shell programming language in many ways, although AWK has its own syntax. When AWK was initially created, it was intended for text processing, and the basis of this language is to execute a series of commands as long as there is a pattern match in the input data. This utility scans each row of the file to find the pattern that matches the content given in the command line. If the matching content is found, perform the next programming step. If no matching content is found, process the next row.
Although operations may be complex, the command syntax is always:
Awk '{pattern + action}' {filenames}
Specifically, pattern indicates the content that AWK looks for in the data, and action is a series of commands executed when matching content is found. Curly braces ({}) do not always appear in the program, but they are used to group a series of commands according to a specific mode.
Understanding fields Utility divides each input line into records and fields. A record is a single row input, and each record contains several fields. The default field separator is space or tab, while the record separator is line feed. By default, both tabs and spaces are treated as field delimiters (multiple spaces are still regarded as one separator), but separators can be changed from spaces to any other character.
For demonstration, see the following employee list file saved as emp_names:
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
When AWK reads the input content, the entire record is allocated to the variable $0. Each field is separated by a field separator and assigned to variables $1, $2, and $3. In essence, a row can contain countless fields and each field can be accessed through the field number. Therefore
Awk '{print $1, $2, $3, $4, $5}' names
The printed output is
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
It is worth noting that AWK interprets the five fields separated by spaces, but when it prints the display content, there is only one space between each field. By specifying a unique number for each field, you can select to print only specific fields. For example, to print only the names of each record, you only need to select the second and third fields for printing:
$ Awk '{print $2, $3}' emp_names
DULANEY EVAN
DURHAM JEFF
STEEN BILL
FELDMAN EVAN
SWIM STEVE
BOGUE ROBERT
JUNE MICAH
KANE SHERYL
WOOD WILLIAM
FERGUS SARAH
BUCK SARAH
TUTTLE BOB
$
You can also specify to print fields in any order, regardless of how they exist in the record. Therefore, you only need to display the name field and display it in reverse order. first, display the name and then display the last name:
$ Awk '{print $3, $2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
ROBERT BOGUE
MICAH JUNE
SHERYL KANE
WILLIAM WOOD
SARAH FERGUS
SARAH BUCK
BOB TUTTLE
$
Usage mode By adding a pattern that must be matched, you can choose to operate only specific records instead of all records. The simplest form of pattern matching is search. the items to be matched are included in the slash (/pattern. For example, only employees living in Alabama perform the following operations:
$ Awk '/AL/{print $3, $2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
$
If you do not specify the field to be printed, the entire matching entry is printed:
$ Awk '/AL/'emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
$
Multiple commands of the same dataset can be separated by semicolons. For example, print the name in one row and the city and state name in the other row:
$ Awk '/AL/{print $3, $2; print $4, $5}' emp_names
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
$
If the semicolon (print $3, $2, $4, $5) is not used, all content is displayed in the same line. On the other hand, if two print statements are given respectively, different results will be generated:
$ Awk '/AL/{print $3, $2} {print $4, $5}' emp_names
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
PHOENIX AZ
PHOENIX AZ
UNKNOWN AR
MUNCIE IN
MUNCIE IN
MUNCIE IN
MUNCIE IN
$
Fields 3 and 2 are provided only when AL is found in the list. However, Fields 4 and 5 are unconditional and always print them. Only the commands in the first set of curly braces work for the commands (/AL/) next to each other.
The results are not easy to read and can be made clearer. First, insert a space and comma between the city and the state. Then, place an empty line after each two rows are displayed:
Between the fourth and fifth fields, add a comma and a space (between quotation marks). after the fifth Field, print a line break (\ n ). In the AWK print statement, you can also use all the special characters that can be used in the echo command, including:
\ N (line feed)
\ T (tabulation)
\ B (return)
\ F (paper feed)
\ R (press enter)
Therefore, to read all five fields originally separated by tabs and print them using tabs, you can program the following:
$ Awk '{print $1 "\ t" $2 "\ t" $3 "\ t" $4 "\ t" $5} 'emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$
Multiple criteria are set consecutively and separated by the pipe (|) symbol. you can search for multiple pattern matches at a time:
$ Awk '/AL | IN/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$
In this way, we can find matching records for each of the members of the Alabama and Silicon Valley. But when trying to find out who lives in Arizona, there is a problem:
$ Awk '/AR/'emp_names
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AZ
46026 FERGUS
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.