Nawk Handbook
Chapter I Preface
Chapter II Introduction
Chapter III Reading input files
Chapter IV Printing output
Chapter V Patterns
Chapter VI Expressions as action statements
Chapter VII Control statements in actions
Chapter VIII Built-in functions
Chapter IX User-defined functions
Chapter X Examples
Chapter XI Conclusion
Chapter I Preface
Awk is a programming language with powerful facilities for processing data. Tasks such as modifying, comparing, and extracting data in text files can be done easily with a very short awk program. Writing the same program in a language such as C or Pascal would be inconvenient and time-consuming, and the program would be much larger.
Awk can split input data according to a user-defined format, and it can print data according to a user-defined format.
The name awk comes from the first letters of the last names of its original designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.
Awk was first completed in 1977. A new version was released in 1985, with far more functionality than the older version.
Gawk is GNU awk. Gawk was first completed in 1986 and has been refined and updated continually since then. Gawk contains all the features of awk.
The examples in this handbook use the following two input files.
File 'bbs-list':
aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
camelot 555-0542 C
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C
File 'shipped':
Jan 13 25 15 115
Feb 15 32 24 226
Mar 15 24 34 228
Apr 31 52 63 420
May 16 34 29 208
Jun 31 42 75 492
Jul 24 34 67 436
Aug 15 34 47 316
Sep 13 55 37 277
Oct 29 54 68 525
Nov 20 87 82 577
Dec 17 35 61 401
Jan 21 36 64 620
Feb 26 58 80 652
Mar 24 75 70 495
Apr 21 70 74 514
Chapter II Introduction
The main function of gawk is to search each line of a file for specified patterns. When a line matches a specified pattern, gawk executes the corresponding action on that line. Gawk processes every line of the input file in this manner until the end of the input file.
A gawk program consists of a series of patterns and actions. An action is written in braces {} and follows its pattern. An entire gawk program looks something like this:
pattern {action}
pattern {action}
In a gawk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, the action is executed for every line of the input file. If the action is omitted, the default action prints every input line that matches the pattern.
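The two omission cases can be sketched as follows. This is a minimal illustration, assuming a POSIX shell and an awk command on the PATH (gawk accepts the same programs); the three-line sample input is made up for the example.

```shell
# Hypothetical three-line input, used only for this illustration.
input='apple 1
banana 2
cherry 3'

# Pattern omitted: the action runs for every input line.
all=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Action omitted: every line matching the pattern is printed whole.
matched=$(printf '%s\n' "$input" | awk '/an/')

printf '%s\n' "$all" "$matched"
```

The first command prints the first field of all three lines; the second prints only the 'banana 2' line, because it is the only one containing 'an'.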
2.1 How to run a gawk program
There are basically two ways to run a gawk program.
If the gawk program is short, it can be written directly on the command line, as follows:
gawk 'program' input-file1 input-file2 ...
Here program consists of patterns and actions.
If the gawk program is longer, it is more convenient to store it in a file. With the patterns and actions written in a file named program-file, gawk is run as follows:
gawk -f program-file input-file1 input-file2 ...
When the gawk program is spread over more than one file, gawk is run as follows:
gawk -f program-file1 -f program-file2 ... input-file1 input-file2 ...
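As a rough sketch of the -f form, the following creates a one-rule program file and a tiny input file in a scratch directory and runs them. It assumes a POSIX shell with mktemp and an awk command (gawk can be substituted); the file contents are made up for the illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d)

# program-file holds the rules that would otherwise sit between the quotes.
cat > "$dir/program-file" <<'EOF'
/foo/ { print $1 }
EOF

# A tiny stand-in for an input file (made up for this sketch).
printf 'fooey 555-1234\ncore 555-2912\n' > "$dir/input-file1"

# Equivalent to: awk '/foo/ { print $1 }' input-file1
out=$(awk -f "$dir/program-file" "$dir/input-file1")
printf '%s\n' "$out"

rm -r "$dir"
```

Keeping the program in a file avoids shell quoting problems and makes longer programs easier to edit.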
2.2 A simple example
Here is a simple example. Because the gawk program is very short, it is written directly on the command line.
gawk '/foo/ {print $0}' bbs-list
The actual gawk program is /foo/ {print $0}. /foo/ is the pattern: it searches every line of the input file for the substring 'foo', and the action is executed on each line that contains 'foo'.
The action is print $0, which prints the contents of the current line. bbs-list is the input file.
Running the command above prints the following:
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C
2.3 A more complex example
gawk '$1 == "Feb" {sum = $2 + $3} END {print sum}' shipped
This example compares the first field of each line of the input file 'shipped' with "Feb". If they are equal, the sum of the 2nd and 3rd fields of that line is assigned to the variable sum.
This is repeated for every line of the input file. Finally, the value of sum is printed. END {print sum} means that the action print sum is executed once, after all input has been read. Because sum is assigned rather than accumulated, the value printed comes from the last matching line, 'Feb 26 58 80 652', which gives 26 + 58 = 84.
The result is:
84
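The program above uses '=' and therefore keeps only the value from the last Feb line. The sketch below contrasts that with '+=', which accumulates over every matching line. It assumes a POSIX shell and an awk command, and feeds in only the two Feb lines of 'shipped'.

```shell
# The two Feb records from the 'shipped' file.
feb='Feb 15 32 24 226
Feb 26 58 80 652'

# "=" overwrites sum on every match, so only the last Feb record counts.
last=$(printf '%s\n' "$feb" | awk '$1 == "Feb" { sum = $2 + $3 } END { print sum }')

# "+=" accumulates across all matching records instead.
total=$(printf '%s\n' "$feb" | awk '$1 == "Feb" { sum += $2 + $3 } END { print sum }')

printf 'last=%s total=%s\n' "$last" "$total"
```

This prints last=84 total=131, since 26 + 58 = 84 and 15 + 32 + 26 + 58 = 131.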
Chapter III Reading input files
Gawk input can be read from standard input or from specified files. The unit of input is called a "record", and gawk processes its input one record at a time. By default each record is one line. Each record is divided into fields.
3.1 How input is split into records
The gawk language splits its input into records. Records are separated by a record separator, whose default value is the newline character, so by default each line of text is one record.
The record separator can be changed by setting the built-in variable RS. RS is a string whose default value is "\n". Only the first character of RS is used as the record separator; any other characters in RS are ignored. (This describes traditional awk behavior; current gawk releases treat an RS longer than one character as a regular expression.)
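A small sketch of changing RS, assuming a POSIX shell and an awk command: here a semicolon separates the records instead of a newline. The input string is made up for the illustration, and NR is awk's built-in count of records read so far.

```shell
# Records are separated by ";" instead of newlines.
# printf adds no trailing newline here, so the last record is exactly "e f".
out=$(printf 'a b;c d;e f' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```

Each of the three semicolon-separated pieces is printed as its own numbered record.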
3.2 Fields
Gawk automatically splits each record into fields. Fields are similar to the words in a line: by default, gawk treats fields as separated by whitespace. In gawk, whitespace means one or more spaces or tabs.
In a gawk program, the first field is written '$1', the second field '$2', and so on. For example, suppose the input line is:
This seems like a pretty nice example.
The first field, $1, is 'This'; the second field, $2, is 'seems'; and so on.
Note in particular that the seventh field, $7, is 'example.' and not 'example'.
No matter how many fields there are, $NF refers to the last field of a record. In the example above, $NF is the same as $7, that is, 'example.'.
NF is a built-in variable whose value is the number of fields in the current record. $0, which looks like a reference to a zeroth field, is a special case that represents the entire record.
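The field variables can be checked directly. This is a minimal sketch, assuming a POSIX shell and an awk command, using a seven-word sentence as the single input record.

```shell
line='This seems like a pretty nice example.'

# Print the field count, the first two fields, the last field, and the record.
out=$(printf '%s\n' "$line" | awk '{ print NF; print $1; print $2; print $NF; print $0 }')
printf '%s\n' "$out"
```

The output shows NF as 7, and $NF as 'example.' including the period, because only whitespace separates fields.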
The following is a more complex example:
gawk '$1 ~ /foo/ {print $0}' bbs-list
The results are as follows:
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C
This example examines the first field of each record of the input file 'bbs-list'. If the field contains the substring 'foo', the record is printed.
3.3 How records are split into fields
Gawk splits a record into fields according to the field separator, which is held in the built-in variable FS.
For example, if the field separator is 'oo', the following line:
moo goo gai pan
is split into three fields: 'm', ' g', and ' gai pan'.
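The 'oo' split can be reproduced with a short sketch, assuming a POSIX shell and an awk command. The brackets in the output are added only to make the leading spaces in the second and third fields visible.

```shell
# Split on the two-character separator "oo" and show each field in brackets.
out=$(printf 'moo goo gai pan\n' |
      awk 'BEGIN { FS = "oo" } { printf "%d [%s] [%s] [%s]\n", NF, $1, $2, $3 }')
printf '%s\n' "$out"
```

Only the separator itself is removed, so the spaces that follow each 'oo' stay at the start of the next field.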
In a gawk program, you can use '=' to change the value of FS. For example:
gawk 'BEGIN {FS = ","}; {print $2}'
Suppose the input line is:
John Q. Smith, Oak St., Walamazoo, MI 42139
Gawk then prints the string ' Oak St.' (the space after the comma is part of the field). The action that follows BEGIN is executed once, before the first record is read.
Chapter IV Printing output
In a gawk program, the most common thing an action does is print output. Simple output is produced with the print statement; output in a more elaborate format is produced with the printf statement.
4.1 The print statement
The print statement produces output in a simple, standard format. Its form is:
print item1, item2, ...
On output, the items are separated by a single space, and the line ends with a newline.
A print statement with nothing after it is the same as 'print $0': it prints the current record. To print a blank line, use 'print ""'. To print fixed text, enclose the text in double quotes, as in 'print "Hello there"'.
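The forms of print described above can be tried together in one action. This is a minimal sketch, assuming a POSIX shell and an awk command; the one-line input is made up for the illustration.

```shell
out=$(printf 'Feb 15 32\n' | awk '{
    print                 # same as print $0: the whole record
    print ""              # a blank line
    print "Hello there"   # fixed text
    print $1, $2          # two items, separated by a space on output
}')
printf '%s\n' "$out"
```

The four statements produce four output lines, the second of which is empty.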
Here is an example that prints the first two fields of each input record:
gawk '{print $1, $2}' shipped
The results are as follows:
Jan 13
Feb 15
Mar 15
Apr 31
May 16
Jun 31
Jul 24
Aug 15
Sep 13
Oct 29
Nov 20
Dec 17
Jan 21
Feb 26
Mar 24
Apr 21
4.2 Output Separators
As mentioned earlier, when a print statement contains several items separated by commas, the items are separated by a single space in the output. Any string can be used as the output field separator instead, by setting the built-in variable OFS. The initial value of OFS is " ", that is, a single space.
The output of an entire print statement is called an output record. After printing the output record, print outputs one more string, called the output record separator. The built-in variable ORS holds this string. The initial value of ORS is "\n", that is, a newline.
The following example prints the first and second fields of each record, with a semicolon between the two fields, and adds a blank line after each output line.
gawk 'BEGIN {OFS = ";"; ORS = "\n\n"} {print $1, $2}' bbs-list
The results are as follows (the blank line after each output line is not shown):
aardvark;555-5553
alpo-net;555-3412
barfly;555-7685
bites;555-1675
camelot;555-0542
core;555-2912
fooey;555-1234
foot;555-6699
macfoo;555-6480
sdace;555-3430
sabafoo;555-2127
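The effect of OFS and ORS, including the blank lines, can be seen in this sketch. It assumes a POSIX shell and an awk command, and uses only the first two records of 'bbs-list', reduced to their first two fields.

```shell
# OFS joins the items of each print; ORS ends each output record.
# Note: $(...) strips the trailing newlines of the final record.
out=$(printf 'aardvark 555-5553\nalpo-net 555-3412\n' |
      awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }')
printf '%s\n' "$out"
```

With ORS set to two newlines, an empty line separates the two output records.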
4.3 The printf statement
The printf statement gives more precise control over the output format. With printf you can specify the width of each item and choose among several ways of formatting numbers.
The form of the printf statement is:
printf format, item1, item2, ...
The difference between print and printf is the format argument: printf takes one more argument than print, the format string. The format is the same as for the printf function of ANSI C. printf does not add a newline automatically, and the built-in variables OFS and ORS have no effect on printf statements.
A format specification starts with the character '%', followed by a format control letter.
The format control letters are as follows:
'c' prints a number as an ASCII character.
For example, 'printf "%c", 65' prints the character 'A'.
'd' prints a decimal integer.
'i' prints a decimal integer.
'e' prints a number in scientific (exponential) notation.
For example,
printf "%4.3e", 1950
prints '1.950e+03'.
'f' prints a number in floating-point notation.
'g' prints a number in either scientific notation or floating-point notation. If the absolute value of the number is at least 0.0001 and not too large for the requested precision, it is printed in floating-point notation; otherwise it is printed in scientific notation.
'o' prints an unsigned octal integer.
's' prints a string.
'x' prints an unsigned hexadecimal integer. The values 10 to 15 are shown as 'a' to 'f'.
'X' prints an unsigned hexadecimal integer. The values 10 to 15 are shown as 'A' to 'F'.
'%' is not really a format control letter; '%%' prints a single '%'.
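Several of the format control letters listed above can be exercised in a single printf. This is a small sketch, assuming a POSIX shell and an awk command; the argument values are chosen only for illustration.

```shell
# One value per format control letter: c, d, e, o, x, X, then a literal %.
out=$(awk 'BEGIN { printf "%c %d %4.3e %o %x %X %%\n", 65, 42, 1950, 8, 255, 255 }')
printf '%s\n' "$out"
```

The BEGIN rule needs no input file, which makes it a convenient place to experiment with format strings.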
A modifier can be placed between the '%' and the format control letter to control the output format further. The possible modifiers are:
'-' is used before the width and means the item is left-justified within the width. If '-' is absent, the item is right-justified within the specified width. For example:
printf "%-4s", "foo"
prints 'foo ' (with one trailing space).
'width' is a number giving the width of the field when the item is printed. For example:
printf "%4s", "foo"
prints ' foo' (with one leading space).
The width is a minimum width, not a maximum. If the item needs more room than the width allows, the width has no effect. For example:
printf "%4s", "foobar"
prints 'foobar'.
'.prec' is a number that specifies the precision. For numbers, it gives the number of digits printed to the right of the decimal point. For a string, it gives the maximum number of characters of the string that are printed.
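The modifiers can be compared side by side. In this sketch, which assumes a POSIX shell and an awk command, the brackets are added only to make the padding visible.

```shell
# Left-justify, right-justify, width too small, numeric precision, string precision.
out=$(awk 'BEGIN {
    printf "[%-4s] [%4s] [%4s] [%.2f] [%.3s]\n", "foo", "foo", "foobar", 3.14159, "foobar"
}')
printf '%s\n' "$out"
```

The third item shows that the width never truncates, while the last item shows that a precision on %s does.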
Chapter V Patterns
In a gawk program, the action of a rule is executed when its pattern matches the current input record.
5.1 Types of patterns
Gawk supports the following kinds of patterns:
/regular expression/
A regular expression used as a pattern. It matches every input record that contains text matching the regular expression.
expression
A single expression. It matches when its value is nonzero (for a number) or non-empty (for a string).
pat1, pat2
A pair of patterns separated by a comma, specifying a range of records.
BEGIN
END
Special patterns. Gawk executes the action attached to BEGIN once at the start, before any input is read, and the action attached to END once at the end, after all input is read.
null
The empty pattern. It matches every input record.
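Several of these pattern types can be combined in one program. The following is a rough sketch, assuming a POSIX shell with mktemp and an awk command; it uses five records from 'bbs-list', and the variable name count is made up for the illustration.

```shell
dir=$(mktemp -d)
cat > "$dir/bbs-list" <<'EOF'
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
EOF

out=$(awk '
    BEGIN             { print "start" }            # once, before any input
    /fooey/, /macfoo/ { print $1 }                 # range: fooey through macfoo
    $4 == "A"         { count++ }                  # expression pattern
    END               { print "A-rated:", count }  # once, after all input
' "$dir/bbs-list")
printf '%s\n' "$out"

rm -r "$dir"
```

The range pattern starts at the first record matching /fooey/ and ends with the record matching /macfoo/, so it selects three records; two records have 'A' in the fourth field.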
5.2 Regular expressions as patterns
A regular expression, regexp for short, is a way of describing a set of strings. A regular expression enclosed in slashes ('/') can be used as a gawk pattern.
An input record matches if it contains text matching the regexp. For example, if the pattern is /foo/, every input record that contains 'foo' matches.
The following example prints the 2nd field of each input record that contains 'foo'.
gawk '/foo/ {print $2}' bbs-list
The results are as follows:
555-1234
555-6699
555-6480
555-2127
A regexp can also be used in a comparison expression.
exp ~ /regexp/
The result is true if exp matches regexp.
exp !~ /regexp/
The result is true if exp does not match regexp.
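The two matching operators can be sketched on a single field. This example assumes a POSIX shell and an awk command and uses three records from 'bbs-list' reduced to two fields; the '^' in the first regexp, which anchors the match to the start of the field, is standard regexp syntax but is an addition for this illustration.

```shell
input='fooey 555-1234
macfoo 555-6480
core 555-2912'

# $1 ~ /^foo/ : the first field starts with "foo".
starts=$(printf '%s\n' "$input" | awk '$1 ~ /^foo/ { print $1 }')

# $1 !~ /foo/ : the first field does not contain "foo" anywhere.
without=$(printf '%s\n' "$input" | awk '$1 !~ /foo/ { print $1 }')

printf 'starts=%s without=%s\n' "$starts" "$without"
```

Unlike a bare /foo/ pattern, which tests the whole record, these expressions test only the first field.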
5.3 Comparison expressions as patterns
A comparison pattern tests the relationship between two numbers or strings, such as greater than or less than. Some comparison patterns are listed below:
x < y If x is less than y, the result is true.
x <= y If x is less than or equal to y, the result is true.
x > y If x is greater than y, the result is true.
x >= y If x is greater than or equal to y, the result is true.
x == y If x is equal to y, the result is true.
x != y If x is not equal to y, the result is true.
x ~ y If x matches the regular expression y, the result is true.
x !~ y If x does not match the regular expression y, the result is true.
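A few of these comparisons used as patterns, as a minimal sketch assuming a POSIX shell and an awk command, applied to three records from 'shipped':

```shell
input='Feb 15 32 24 226
Apr 31 52 63 420
Oct 29 54 68 525'

# Numeric comparisons on fields that look like numbers.
big=$(printf '%s\n' "$input" | awk '$2 >= 29 { print $1 }')
small=$(printf '%s\n' "$input" | awk '$5 < 500 { print $1 }')

# String comparison against a quoted constant.
oct=$(printf '%s\n' "$input" | awk '$1 == "Oct" { print $1 }')

printf '%s\n' "$big" "$small" "$oct"
```

Each command has no explicit test beyond the pattern itself: the action runs only on the records for which the comparison is true.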
If the x and y above are both numbers, they are compared as numbers; otherwise they will