"Reprint Update" of the Linux tool awk 2. Basics

Source: Internet
Author: User
Tags posix


1. awk Introduction

awk is a programming language for processing text and data on Linux/UNIX. Input can come from standard input, from one or more files, or from the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/UNIX. It can be used directly on the command line, but it is more often used in scripts. awk processes text by scanning the input line by line, from the first line to the last, looking for lines that match a given pattern and performing the actions you specify on those lines. If no action is specified, matching lines are printed to standard output (the screen); if no pattern is specified, the action is applied to every line.

The name awk is formed from the first letters of the last names of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. gawk is the GNU version of awk and provides some Bell Labs and GNU extensions. The awk described below is GNU gawk; on many Linux systems awk is a link to gawk, so the rest of this article simply calls it awk.
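
The two default behaviors described above (pattern without action, action without pattern) can be seen in a small sketch. The sample lines and the temporary file are made up for illustration and are not part of the original article.

```shell
# Hypothetical sample data, written to a temporary file for the demonstration.
data=$(mktemp)
printf 'root 0\nmysql 27\nroot2 5\n' > "$data"

# Pattern only, no action: matching lines are printed unchanged.
pattern_only=$(awk '/root/' "$data")

# Action only, no pattern: the action runs on every line.
action_only=$(awk '{ print $1 }' "$data")

printf '%s\n--\n%s\n' "$pattern_only" "$action_only"
rm -f "$data"
```

The first command prints only the two lines containing root; the second prints the first field of all three lines.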

2. awk command formats and options
2.1. awk has two forms of syntax
awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)
Basic usage:
1) Enter the awk program directly on the shell command line:
awk [-F field-separator] 'awk program' input-file
2) Put the awk program in a script file, then invoke it with the awk command:
awk -f awk-script-file input-file (the first line of the script file does not need to start with #!/bin/awk -f)
3) Put the awk program in a script file, make the script file executable, and run the script file directly:
./awk-script-file input-file (the first line of the script file must start with #!/bin/awk -f)

awk '
BEGIN { actions }
/pattern/ { actions }
/pattern/ { actions }
END { actions }
' files
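
The three ways of invoking awk can be compared side by side. This is only a sketch: the file names input.txt, second.awk, and second are hypothetical, and the interpreter path in the third variant is looked up with command -v instead of assuming /bin/awk.

```shell
dir=$(mktemp -d)
printf 'a:1\nb:2\n' > "$dir/input.txt"

# 1) Program given inline on the command line.
inline=$(awk -F: '{ print $2 }' "$dir/input.txt")

# 2) Program stored in a script file and loaded with -f.
printf '{ print $2 }\n' > "$dir/second.awk"
from_file=$(awk -F: -f "$dir/second.awk" "$dir/input.txt")

# 3) Executable script whose first line names the awk interpreter.
printf '#!%s -f\nBEGIN { FS = ":" }\n{ print $2 }\n' "$(command -v awk)" > "$dir/second"
chmod +x "$dir/second"
executable=$("$dir/second" "$dir/input.txt")

printf '%s\n' "$inline" "$from_file" "$executable"
rm -rf "$dir"
```

All three variants run the same program and should print the same second column.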
2.2. Command options

-F fs or --field-separator fs    Specifies the input field separator. fs is a string or a regular expression, for example -F:.
-v var=value or --assign var=value    Assigns a user-defined variable.
-f scriptfile or --file scriptfile    Reads the awk program from a script file.
-mf nnn and -mr nnn    Set internal limits to the value nnn: -mf limits the maximum number of fields and -mr limits the maximum record size. Both options are extensions of the Bell Labs version of awk and are not available in standard awk.
-W compat or --compat, -W traditional or --traditional    Runs awk in compatibility mode, so gawk behaves exactly like standard awk and all gawk extensions are ignored.
-W copyleft or --copyleft, -W copyright or --copyright    Prints a brief copyright message.
-W help or --help, -W usage or --usage    Prints all awk options with a short description of each.
-W lint or --lint    Prints warnings about constructs that are dubious or not portable to other awk implementations.
-W lint-old or --lint-old    Prints warnings about constructs that are not portable to the original version of UNIX awk.
-W posix    Turns on compatibility mode with additional restrictions: \x escape sequences are not recognized; when FS is a single space, newline does not act as a field separator; the keyword func is not recognized as a synonym for function; the operators ** and **= cannot be used in place of ^ and ^=; and fflush() is not available.
-W re-interval or --re-interval    Allows interval expressions such as {n,m} in regular expressions (see also the POSIX character classes in the grep notes, for example the bracket expression [[:alpha:]]).
-W source program-text or --source program-text    Uses program-text as awk source code; can be mixed with the -f option.
-W version or --version    Prints version information, which is useful when reporting bugs.
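
The two options used most often, -F and -v, can be combined as in the following sketch. The input lines and the variable name min are invented for the example.

```shell
# -F sets the input field separator; -v assigns a variable before the program starts.
out=$(printf 'alice:x:1000\nbob:x:1001\n' |
      awk -F: -v min=1001 '$3 >= min { print $1 }')
echo "$out"
```

Only the line whose third colon-separated field is at least min is selected, so the sketch prints bob.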



3. Patterns and Actions
An awk script is made up of patterns and actions:
pattern {action}, as in $ awk '/root/' test or $ awk '$3 < 100' test.
Both parts are optional: if there is no pattern, the action is applied to all records; if there is no action, all matching records are printed. By default each input line is a record, but you can choose a different record separator by setting the RS variable.

3.1. Patterns

A pattern can be any one of the following:
    /regular expression/: uses the extended set of metacharacters.
    Relational expression: uses the relational operators from the operator table below and can compare strings or numbers; for example, $2 > $1 selects lines whose second field is greater than the first.
    Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).
    pattern, pattern: specifies a range of lines. This syntax cannot include the BEGIN and END patterns.
    BEGIN: lets you specify actions that run before the first input record is processed; global variables are usually set here.
    END: lets you specify actions that run after the last input record has been read.
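
The pattern kinds listed above can be combined in one program. The following is a minimal sketch on made-up input, not an example from the original article.

```shell
out=$(printf '1 5\n7 2\nstart\n3 3\nstop\n9 10\n' | awk '
BEGIN          { print "begin" }               # runs before the first record
$2 > $1        { print "greater:", $0 }        # relational expression
$1 ~ /^st/     { print "word:", $1 }           # pattern-matching expression
/start/,/stop/ { n++ }                         # range pattern: counts its lines
END            { print "lines in range:", n }  # runs after the last record
')
echo "$out"
```

Every rule is tested against each record in the order written, so a single record can trigger several actions.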

3.2. Actions

An action consists of one or more commands, functions, and expressions, separated by newlines or semicolons and enclosed in curly braces. There are four main kinds:
    variable or array assignment
    output commands
    built-in functions
    control-flow commands
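
A single action can mix all four kinds of statement. The sketch below uses invented input and the hypothetical names qty and name.

```shell
out=$(printf 'apple 3\nkiwi 12\n' | awk '{
    qty[$1] = $2              # array assignment
    name = toupper($1)        # variable assignment using a built-in function
    if (qty[$1] > 10)         # control flow
        print name, "many"    # output command
    else
        print name, "few"
}')
echo "$out"
```

The statements are separated by newlines here; semicolons would work equally well on a single line.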

4. awk built-in variables

Table 1. awk built-in variables

Variable       Description
$n             The nth field of the current record; fields are separated by FS.
$0             The complete input record.
ARGC           The number of command-line arguments.
ARGIND         The position of the current file on the command line (counting from 0).
ARGV           An array that contains the command-line arguments.
CONVFMT        The number conversion format (default %.6g).
ENVIRON        An associative array of environment variables.
ERRNO          A description of the last system error.
FIELDWIDTHS    A space-separated list of field widths.
FILENAME       The name of the current file.
FNR            Same as NR, but relative to the current file.
FS             The field separator (default: any whitespace).
IGNORECASE     If true, matching ignores case.
NF             The number of fields in the current record.
NR             The number of the current record.
OFMT           The output format for numbers (default %.6g).
OFS            The output field separator (default: a space).
ORS            The output record separator (default: a newline).
RLENGTH        The length of the string matched by the match function.
RS             The record separator (default: a newline).
RSTART         The starting position of the string matched by the match function.
SUBSEP         The array subscript separator (default \034).
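
Several of these variables can be watched changing as awk moves through two files. The file names one.txt and two.txt are hypothetical and are created only for this sketch.

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# FILENAME, NR, FNR and NF change as records are read; OFS joins the printed fields.
out=$(cd "$dir" && awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$out"
rm -rf "$dir"
```

NR keeps counting across files while FNR restarts at 1 for two.txt, and $NF refers to the last field of each record.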

5. awk operators
Table 2. Operators

Operator                   Description
= += -= *= /= %= ^= **=    Assignment
?:                         C conditional expression
||                         Logical OR
&&                         Logical AND
~ !~                       Matches and does not match a regular expression
< <= > >= != ==            Relational operators
space                      Concatenation
+ -                        Addition, subtraction
* / %                      Multiplication, division, and remainder
+ - !                      Unary plus, unary minus, and logical NOT
^ **                       Exponentiation
++ --                      Increment and decrement, as prefix or postfix
$                          Field reference
in                         Array membership
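
A few of the less familiar operators are exercised in the sketch below. It sticks to the POSIX spellings (^ instead of **), and the array name seen is made up for the example.

```shell
out=$(awk 'BEGIN {
    seen["a"] = 1
    print 2 ^ 3                              # exponentiation
    print 7 % 3                              # remainder
    print "foo" "bar"                        # concatenation by juxtaposition
    member = ("a" in seen) ? "yes" : "no"    # array membership with ?:
    print member
    hit = ("abc" ~ /^a/); miss = ("abc" !~ /^a/)
    print hit, miss                          # match results are 1 or 0
}')
echo "$out"
```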

6. Records and Fields
6.1. Records

awk treats each line that ends with a newline character as a record.
Record separators: the default input and output record separators are both the newline character; they are stored in the built-in variables RS and ORS.
$0 variable: refers to the entire record. For example, $ awk '{print $0}' test prints every record in the file test.
NR variable: a counter that is incremented by 1 for each record read. For example, $ awk '{print NR, $0}' test prints every record in the file test, preceded by its record number.
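
The effect of NR and of changing RS can be sketched as follows. The input strings are invented; the second command sets RS to a semicolon, so the input no longer needs to be line-oriented.

```shell
# Default: one record per line; NR numbers the records.
numbered=$(printf 'alpha\nbeta\n' | awk '{ print NR, $0 }')

# Setting RS to another single character changes what counts as a record.
split=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n%s\n' "$numbered" "$split"
```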
6.2. Fields
Each word in a record is called a field; by default fields are separated by spaces or tabs. awk keeps track of the number of fields and stores it in the built-in variable NF. For example, $ awk '{print $1, $3}' test prints the first and third whitespace-separated columns (fields) of the file test.
6.3. Field separators
The built-in variable FS holds the input field separator; its default is space or tab. FS can be changed with the -F command-line option. For example, $ awk -F: '{print $1, $5}' test prints the first and fifth columns of a colon-separated file.
Several field separators can be used at once by writing them inside square brackets. For example, $ awk -F'[ :\t]' '{print $1, $3}' test treats space, colon, and tab as separators.
The output field separator OFS is a space by default. In $ awk -F: '{print $1, $5}' test, the comma between $1 and $5 stands for the value of OFS.
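
A bracket expression as field separator, together with a custom OFS, might look like the sketch below. The input line is made up; it mixes colon, space, and tab as separators.

```shell
# Split on colon, space, or tab; join the printed fields with a hyphen.
out=$(printf 'root:x:0 admin\tshell\n' |
      awk -F'[ :\t]' 'BEGIN { OFS = "-" } { print $1, $3, NF }')
echo "$out"
```

The line splits into five fields, and the commas in print are replaced by the value of OFS.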

7. gawk-specific regular expression metacharacters
The common metacharacters are not covered here; see my sed and grep study notes. The following are specific to gawk and do not work in UNIX versions of awk.

\y    Matches the empty string at the beginning or end of a word.
\B    Matches the empty string within a word.
\<    Matches the empty string at the beginning of a word.
\>    Matches the empty string at the end of a word.
\w    Matches a word-constituent character (a letter, digit, or underscore).
\W    Matches a character that is not word-constituent.
\`    Matches the empty string at the beginning of the string.
\'    Matches the empty string at the end of the string.

8. POSIX Character Set
Omitted
9. Match operator (~)
Matches a regular expression against a record or a field. For example, $ awk '$1 ~ /^root/' test prints the lines of the file test whose first column starts with root.
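
The difference between ~ and its negation !~ shows up when both are applied to the same field. The two input lines are invented for this sketch.

```shell
out=$(printf 'root:0\nsysroot:1\n' | awk -F: '
$1 ~ /^root/  { print "match:", $0 }
$1 !~ /^root/ { print "no match:", $0 }
')
echo "$out"
```

Because the expression is anchored with ^ and applied to $1, sysroot does not match even though it contains root.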
10. Comparison expressions
Conditional expression: expression1 ? expression2 : expression3. For example, $ awk '{max = ($1 > $3) ? $1 : $3; print max}' test: if the first field is greater than the third field, $1 is assigned to max; otherwise $3 is assigned to max.
$ awk '$1 + $2 < 100' test: if the sum of the first and second fields is less than 100, the line is printed.
$ awk '$1 > 5 && $2 < 10' test: if the first field is greater than 5 and the second field is less than 10, the line is printed.
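
Run against two made-up lines of numbers, the conditional expression and the && pattern behave as follows. This is a sketch with invented data.

```shell
# Larger of the first and third fields, per line.
max=$(printf '3 9 7\n8 1 2\n' | awk '{ max = ($1 > $3) ? $1 : $3; print max }')

# Lines whose first field exceeds 5 and whose second field is below 10.
both=$(printf '3 9 7\n8 1 2\n' | awk '$1 > 5 && $2 < 10')

printf '%s\n%s\n' "$max" "$both"
```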
11. Range patterns
A range pattern matches all lines from the first occurrence of the first pattern through the first occurrence of the second pattern. If the second pattern never occurs, the match runs to the end of the input; if the first pattern never occurs, nothing is matched. For example, $ awk '/root/,/mysql/' test prints every line from the first occurrence of root through the first occurrence of mysql.
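
Both cases, a closed range and a range that runs to the end of the input, appear in the following sketch. The input is invented for illustration.

```shell
# The first range ends at mysql; the second root has no closing mysql,
# so that range extends to the end of the input.
out=$(printf 'a\nroot\nb\nmysql\nc\nroot\nd\n' | awk '/root/,/mysql/')
echo "$out"
```

The lines a and c fall outside both ranges and are not printed.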
12. An example that validates the passwd file
$ cat /etc/passwd | awk -F: '
# Report records that do not have exactly 7 colon-separated fields.
NF != 7 {printf("line %d, does not have 7 fields: %s\n", NR, $0)}
# Report user names that contain no letter or digit.
$1 !~ /[A-Za-z0-9]/ {printf("line %d, non alpha and numeric user id: %s\n", NR, $0)}
# Report accounts whose password field is an asterisk.
$2 == "*" {printf("line %d, no password: %s\n", NR, $0)}'

cat pipes the file to awk, and awk sets the field separator to a colon.
If the number of fields (NF) is not equal to 7, the action that follows is executed:
printf prints the string "line ??, does not have 7 fields" followed by the record.
If the first field contains no letters or digits, printf prints "non alpha and numeric user id" together with the record number and the record.
If the second field is an asterisk, the string "no password" is printed together with the record number and the record itself.
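
To try the three checks without touching the real /etc/passwd, the same kind of program can be fed a few invented records. The account names and paths below are hypothetical.

```shell
out=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'broken:x:1:1' \
    '---:x:2:2:odd:/:/bin/sh' \
    'guest:*:3:3:guest:/home/guest:/bin/sh' |
awk -F: '
NF != 7             { printf("line %d, does not have 7 fields: %s\n", NR, $0) }
$1 !~ /[A-Za-z0-9]/ { printf("line %d, non alpha and numeric user id: %s\n", NR, $0) }
$2 == "*"           { printf("line %d, no password: %s\n", NR, $0) }
')
echo "$out"
```

Each of the last three sample records triggers exactly one of the checks; the first record passes all of them and produces no output.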

13. Running multiple commands
Write the pattern-action pairs one after another; unlike sed, no -e option is needed. For example:

awk '/[1-9]\.[0-9][0-9]$/ {print $0, "*"} /0\.[1-9][1-9]/ {print;}' zdd.txt
Fruits priced above $1 are marked with a trailing * to draw attention. The command above contains two pattern-action pairs.

14. Formatted printing

The %s conversion prints a string. A width can be given, and shorter strings are padded with spaces; a positive width right-aligns and a negative width left-aligns. %3s means a width of 3 columns, right-aligned; if the string is longer than 3 characters, its actual width is used.

File name left-aligned, size left-aligned:
ls -l | awk '{printf "%-16s\t%-16s\n", $9, $5;}'
File name left-aligned, size right-aligned:
ls -l | awk '{printf "%-16s\t%16s\n", $9, $5;}'
File name right-aligned, size left-aligned:
ls -l | awk '{printf "%16s\t%-16s\n", $9, $5;}'
File name right-aligned, size right-aligned:
ls -l | awk '{printf "%16s\t%16s\n", $9, $5;}'
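
The padding rules are easier to see with visible delimiters than with ls output. A minimal sketch, with square brackets added only to show where each field begins and ends:

```shell
# Left-aligned width 6, right-aligned width 6, and a width smaller than the string.
out=$(awk 'BEGIN { printf "[%-6s][%6s][%3s]\n", "ab", "ab", "abcdef" }')
echo "$out"
```

The third field shows that a string longer than the requested width is printed in full, not truncated.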

15. Several examples

$ awk '{print $3}' test    Prints the contents of the third field (column).
$ awk '/^(no|so)/' test    Prints all lines that begin with no or so.
$ awk '/^[ns]/ {print $0}' test    If the record begins with n or s, the record is printed.
$ awk '$1 ~ /[0-9][0-9]$/ {print $0}' test    If the first field ends with two digits, the record is printed.
$ awk '$1 == 100 || $2 < 50' test    If the first field equals 100 or the second field is less than 50, the line is printed.
$ awk '$1 != 10' test    If the first field is not equal to 10, the line is printed.
$ awk '/test/ {print $1 + 10}' test    If the record matches the regular expression test, 10 is added to the first field and the result is printed.
$ awk '{print ($1 > 5 ? "ok " $1 : "error " $1)}' test    If the first field is greater than 5, the expression after the question mark is printed; otherwise the expression after the colon is printed.
$ awk '/^root/,/^mysql/' test    Prints all records from a record that begins with root through the next record that begins with mysql. If another record beginning with root is found later, printing resumes until the next record that begins with mysql or until the end of the file.
