The AWK utility comes with its own self-contained language. It is one of the most powerful data-processing engines available in any Unix/Linux environment. The name of this programming and data-manipulation language comes from the last-name initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. What you can do with it is limited mainly by your own knowledge. It allows you to write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other tasks.
What is AWK?

In short, AWK is a programming language tool for processing text. The language of the AWK utility resembles the shell programming language in many ways, although AWK has its own syntax.

AWK was originally designed for text processing. The language is built on executing a series of commands whenever a pattern matches the input data. The utility scans each line of a file, looking for text that matches the pattern given on the command line. If a match is found, it carries out the next programming step; if not, it moves on to the next line.

Although the operations can become complex, the command syntax is always:

awk 'pattern {action}' filenames

Here, pattern is what AWK searches for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) do not always need to appear in a program, but they are used to group a series of commands that apply to a specific pattern.

Understanding fields

AWK divides its input into records, and each record into fields. A record is a single line of input, and each record contains several fields. The default field separator is whitespace (a space or a tab), and the default record separator is the newline. A run of several spaces still counts as a single separator. The separator can also be changed from whitespace to any other character.

To illustrate, look at the following employee list, saved as the file emp_names:

46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN

When AWK reads the input, it assigns the entire record to the variable $0. It then splits the record at each field separator and assigns the fields to the variables $1, $2, $3, and so on.
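The field-splitting rules above can be checked on a single record. This is a minimal sketch that assumes a POSIX sh and any POSIX-compatible awk. The sample record deliberately mixes a run of spaces and a tab. The built-in variable NF (the number of fields in the current record) is not mentioned in the text. It is used here only to show that a run of blanks counts as one separator.

```shell
#!/bin/sh
# Sketch: how awk splits one record into fields with the default separator.
# The sample record has a run of three spaces and one tab between fields.
record=$(printf '46012   DULANEY\tEVAN MOBILE AL')

# $0 is the whole record, printed unchanged (spacing and tab preserved).
whole=$(printf '%s\n' "$record" | awk '{print $0}')

# NF counts the fields; the run of spaces counts as a single separator.
count=$(printf '%s\n' "$record" | awk '{print NF}')

# $1, $2, ... are the individual fields.
first=$(printf '%s\n' "$record" | awk '{print $1}')
second=$(printf '%s\n' "$record" | awk '{print $2}')

printf '%s\n' "$whole" "$count" "$first" "$second"
```

Despite the uneven spacing, awk reports five fields, with 46012 in $1 and DULANEY in $2.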
A line can contain essentially any number of fields, and each field is accessed by its field number. Therefore, the command

$ awk '{print $1,$2,$3,$4,$5}' emp_names

produces the following output:

46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46022 KANE SHERYL UNKNOWN AR
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN

Note that AWK treats the five fields as separated by whitespace, but when it prints them, it puts exactly one space between fields. Because each field has its own number, you can choose to print only specific fields. For example, to print only the names from each record, select just the second and third fields:

$ awk '{print $2,$3}' emp_names
DULANEY EVAN
DURHAM JEFF
STEEN BILL
FELDMAN EVAN
SWIM STEVE
BOGUE ROBERT
JUNE MICAH
KANE SHERYL
WOOD WILLIAM
FERGUS SARAH
BUCK SARAH
TUTTLE BOB
$

You can also print fields in any order, regardless of their order in the record. For example, to show only the name fields with the first name before the last name:

$ awk '{print $3,$2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
ROBERT BOGUE
MICAH JUNE
SHERYL KANE
WILLIAM WOOD
SARAH FERGUS
SARAH BUCK
BOB TUTTLE
$

Working with patterns

By including a pattern that must be matched, you can operate on certain records only, rather than on all of them. The simplest form of pattern matching is a search, in which the text to be matched is enclosed in slashes (/pattern/).
For example, to perform the previous operation only on employees who live in Alabama:

$ awk '/AL/ {print $3,$2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
$

If you do not specify which fields to print, the entire matching record is printed:

$ awk '/AL/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
$

Multiple commands for the same set of data are separated with semicolons (;). For example, to print the name on one line and the city and state on another:

$ awk '/AL/ {print $3,$2 ; print $4,$5}' emp_names
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
$

Without the semicolon (print $3,$2,$4,$5), all four fields appear on the same line. If the two print statements are instead given in separate sets of braces, the result is quite different:

$ awk '/AL/ {print $3,$2} {print $4,$5}' emp_names
EVAN DULANEY
MOBILE AL
JEFF DURHAM
MOBILE AL
BILL STEEN
MOBILE AL
EVAN FELDMAN
MOBILE AL
STEVE SWIM
UNKNOWN AL
PHOENIX AZ
PHOENIX AZ
UNKNOWN AR
MUNCIE IN
MUNCIE IN
MUNCIE IN
MUNCIE IN
$

Fields 3 and 2 are printed only when AL is found in the record. Fields 4 and 5, however, are unconditional and are always printed. Only the commands in the first set of curly braces are tied to the pattern (/AL/) immediately preceding them.

This output is hard to read and can be made clearer. First, insert a comma and a space between the city and the state. Then, leave a blank line after each two-line entry:

$ awk '/AL/ {print $3,$2 ; print $4", "$5"\n"}' emp_names
EVAN DULANEY
MOBILE, AL

JEFF DURHAM
MOBILE, AL

BILL STEEN
MOBILE, AL

EVAN FELDMAN
MOBILE, AL

STEVE SWIM
UNKNOWN, AL

$

A comma and a space (between the quotation marks) are added between the fourth and fifth fields. After the fifth field, a newline character (\n) is printed. All the special characters that can be used with the echo command can also be used in AWK print statements, including:

\n (newline)
\t (tab)
\b (backspace)
\f (form feed)
\r (carriage return)
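The print and brace rules above can be checked on two records from emp_names. This is a minimal sketch that assumes a POSIX sh and any POSIX-compatible awk. The "AL:" and "all:" labels are illustrative additions, not part of the original examples. The sketch shows four things:

- A comma between print arguments inserts awk's output field separator (OFS, a single space by default).
- Adjacent values are simply concatenated.
- "\t" prints a tab.
- The /AL/ pattern guards only the brace group that immediately follows it, while a second group runs for every record.

```shell
#!/bin/sh
# Sketch: how print joins its arguments, and how brace groups bind to patterns.
# Two records copied from emp_names; only the first one contains AL.
input=$(printf '%s\n' '46012 DULANEY EVAN MOBILE AL' '46019 BOGUE ROBERT PHOENIX AZ')

# A comma inserts the output field separator (one space by default).
comma=$(printf '%s\n' "$input" | awk '/AL/ {print $4,$5}')

# Without a comma, values are concatenated, so the ", " literal is the only gap.
concat=$(printf '%s\n' "$input" | awk '/AL/ {print $4", "$5}')

# The "\t" escape inside a string prints a tab character.
tabbed=$(printf '%s\n' "$input" | awk '/AL/ {print $4"\t"$5}')

# /AL/ guards only the first brace group. The second group has no pattern,
# so for each record it runs right after the first group (matched or not).
groups=$(printf '%s\n' "$input" | awk '/AL/ {print "AL:", $3} {print "all:", $3}')

printf '%s\n' "$comma" "$concat" "$tabbed" "$groups"
```

The last command prints "AL: EVAN" and "all: EVAN" for the first record but only "all: ROBERT" for the second. This mirrors the split between the guarded and unguarded brace groups in the /AL/ example.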
To read all five fields, which were originally separated by tabs, and print them separated by tabs as well, you could write:

$ awk '{print $1"\t"$2"\t"$3"\t"$4"\t"$5}' emp_names
46012	DULANEY	EVAN	MOBILE	AL
46013	DURHAM	JEFF	MOBILE	AL
46015	STEEN	BILL	MOBILE	AL
46017	FELDMAN	EVAN	MOBILE	AL
46018	SWIM	STEVE	UNKNOWN	AL
46019	BOGUE	ROBERT	PHOENIX	AZ
46021	JUNE	MICAH	PHOENIX	AZ
46022	KANE	SHERYL	UNKNOWN	AR
46024	WOOD	WILLIAM	MUNCIE	IN
46026	FERGUS	SARAH	MUNCIE	IN
46027	BUCK	SARAH	MUNCIE	IN
46029	TUTTLE	BOB	MUNCIE	IN
$

You can search for more than one pattern at a time by listing the alternatives one after another, separated by the pipe (|) symbol:

$ awk '/AL|IN/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$

This finds every record for residents of Alabama and Indiana. But a problem occurs when you try to find the people who live in Arkansas (AR):

$ awk '/AR/' emp_names
46022 KANE SHERYL UNKNOWN AR
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
$

Employees 46026 and 46027 do not live in Arkansas; however, their first names contain the character sequence being searched for. In AWK, as in grep, sed, and most other Linux/Unix commands, a pattern matches anywhere in the record (line) unless you specify otherwise. To solve this problem, you must tie the search to a specific field. You do this with a tilde (~) and a reference to that field, as in the following example:

$ awk '$5 ~ /AR/' emp_names
46022 KANE SHERYL UNKNOWN AR
$

The counterpart of the tilde (which signifies a match) is a tilde preceded by an exclamation mark (!~).
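The difference between a whole-record match and a field match can be reproduced on three records copied from emp_names. This is a sketch that assumes a POSIX sh and a POSIX-compatible awk. The anchored pattern /^(AL|IN)$/ is an addition not used in the text. It shows one way to require the whole field to equal AL or IN, so that an alternation cannot match inside a longer word.

```shell
#!/bin/sh
# Sketch: whole-record matching versus matching tied to one field.
# Three records copied from emp_names.
data=$(printf '%s\n' \
  '46022 KANE SHERYL UNKNOWN AR' \
  '46026 FERGUS SARAH MUNCIE IN' \
  '46019 BOGUE ROBERT PHOENIX AZ')

# /AR/ alone matches anywhere in the line, so SARAH counts as a hit.
anywhere=$(printf '%s\n' "$data" | awk '/AR/ {print $1}')

# $5 ~ /AR/ tests only the fifth field (the state).
in_field=$(printf '%s\n' "$data" | awk '$5 ~ /AR/ {print $1}')

# $5 !~ /AR/ selects records whose fifth field does NOT contain AR.
not_in_field=$(printf '%s\n' "$data" | awk '$5 !~ /AR/ {print $1}')

# ^ and $ anchor the regex, so the whole fifth field must be exactly AL or IN.
exact=$(printf '%s\n' "$data" | awk '$5 ~ /^(AL|IN)$/ {print $1}')

printf '%s\n' "$anywhere" "$in_field" "$not_in_field" "$exact"
```

Applied to the full emp_names file, the same $5 ~ /^(AL|IN)$/ test selects the nine Alabama and Indiana records.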
The !~ operator tells AWK to select all lines in which the search sequence does not appear in the specified field:

$ awk '$5 !~ /AR/' emp_names
46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
46017 FELDMAN EVAN MOBILE AL
46018 SWIM STEVE UNKNOWN AL
46019 BOGUE ROBERT PHOENIX AZ
46021 JUNE MICAH PHOENIX AZ
46024 WOOD WILLIAM MUNCIE IN
46026 FERGUS SARAH MUNCIE IN
46027 BUCK SARAH MUNCIE IN
46029 TUTTLE BOB MUNCIE IN
$

In this case, every line without AR in the fifth field is displayed. This includes the two Sarah entries, which do contain AR, but in the third field rather than the fifth.

Braces and field separators

Curly braces play an important role in AWK commands. The actions that appear between them specify what will happen and when. When only one pair of braces is used, as in:

{print $3,$2}

all the actions between the braces are carried out together for each record. When more than one pair of braces is used, as in:

{print $3}{print $2}

the first group of commands is executed, and after it completes, the second group is executed. AWK repeats this sequence for every input record. Note the difference between the following two listings:

$ awk '{print $3,$2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
EVAN FELDMAN
STEVE SWIM
ROBERT BOGUE
MICAH JUNE
SHERYL KANE
WILLIAM WOOD
SARAH FERGUS
SARAH BUCK
BOB TUTTLE
$ awk '{print $3}{print $2}' emp_names
EVAN
DULANEY
JEFF
DURHAM
BILL
STEEN
EVAN
FELDMAN
STEVE
SWIM
ROBERT
BOGUE
MICAH
JUNE
SHERYL
KANE
WILLIAM
WOOD
SARAH
FERGUS
SARAH
BUCK
BOB
TUTTLE
$

To reiterate: with multiple sets of braces, AWK executes the commands in the first set and then processes the second set. A third set, if present, runs after the second completes, and so on. The second listing contains two separate print commands. The first is carried out and then the second, so each entry appears on two lines instead of one.

The separator that distinguishes one field from the next need not be whitespace; it can be any recognizable character.
To illustrate, assume that the emp_names file uses colons instead of tabs to separate fields:

$ cat emp_names
46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
46017:FELDMAN:EVAN:MOBILE:AL
46018:SWIM:STEVE:UNKNOWN:AL
46019:BOGUE:ROBERT:PHOENIX:AZ
46021:JUNE:MICAH:PHOENIX:AZ
46022:KANE:SHERYL:UNKNOWN:AR
46024:WOOD:WILLIAM:MUNCIE:IN
4602