"Linux Tools--awk"

Source: Internet
Author: User
Tags posix

1. Introduction to Awk

Awk is a programming language for processing text and data on Linux/UNIX. Its input can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/UNIX. It can be used directly on the command line, but it is more often used in scripts. awk scans its input line by line, from the first line to the last, looks for lines that match a given pattern, and performs the actions you specify on those lines. If no action is specified, the matching lines are printed to standard output (the screen); if no pattern is specified, the action is applied to every line.

The name awk comes from the first letters of its three authors' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan. Gawk is the GNU version of awk, which provides some Bell Labs and GNU extensions. The awk described below is GNU gawk; on many Linux systems awk is already linked to gawk, so the rest of this article simply calls it awk.

2. awk command format and options

2.1. awk has two forms of syntax

• awk [options] 'script' var=value file(s)

• awk [options] -f scriptfile var=value file(s)
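The two forms can be compared side by side. The following is a minimal sketch: the file names people.txt and show.awk, the variable prefix, and the sample data are all made up for illustration.

```shell
# Hypothetical sample data and script file, created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'alice 30\nbob 25\n' > "$tmpdir/people.txt"
printf '{ print prefix $1 }\n' > "$tmpdir/show.awk"

# Form 1: the program is given inline, followed by var=value and the file.
inline_out=$(awk '{ print prefix $1 }' prefix='user:' "$tmpdir/people.txt")

# Form 2: the same program is read from a script file with -f.
file_out=$(awk -f "$tmpdir/show.awk" prefix='user:' "$tmpdir/people.txt")

printf '%s\n' "$inline_out"
rm -r "$tmpdir"
```

Both forms should produce the same output, because only the place where the program text lives has changed.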

2.2. Command Options

-F fs or --field-separator fs

Specifies the input field separator. fs is a string or a regular expression, such as -F:.

-v var=value or --assign var=value

Assigns a value to a user-defined variable.

-f scriptfile or --file scriptfile

Reads the awk program from the script file.

-mf nnn and -mr nnn

Set memory limits to the value nnn: -mf limits the maximum number of fields, and -mr limits the maximum record size. These two options come from the Bell Labs version of awk and are not part of standard awk; gawk ignores them, since it has no predefined limits.

-W compat or --compat, -W traditional or --traditional

Runs gawk in compatibility mode: gawk behaves exactly like traditional UNIX awk, and none of the GNU extensions are recognized.

-W copyleft or --copyleft, -W copyright or --copyright

Prints a short copyright message.

-W help or --help, -W usage or --usage

Prints all awk options with a short description of each.

-W lint or --lint

Prints warnings about constructs that are dubious or not portable to other awk implementations.

-W lint-old or --lint-old

Prints warnings about constructs that are not portable to the original version of UNIX awk.

-W posix or --posix

Turns on compatibility mode with the following additional restrictions: \x escape sequences are not recognized; when FS is a single space, only space and tab separate fields (newline does not); the keyword func is not recognized as a synonym for function; the operators ** and **= cannot be used in place of ^ and ^=; and fflush() is not available.

-W re-interval or --re-interval

Allows interval expressions in regular expressions, such as a{2,3}. For POSIX character classes in bracket expressions, such as [[:alpha:]], see the grep notes.

-W source program-text or --source program-text

Uses program-text as awk source code. This option can be mixed with -f.

-W version or --version

Prints the version information, which is useful when filing bug reports.
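The two most commonly used options, -F and -v, can be tried with a short sketch like the one below. The passwd-like sample data and the variable name label are assumptions made for this example only.

```shell
# Hypothetical colon-separated records (passwd-like, but not real data).
data='root:x:0:0
daemon:x:1:1'

# -F sets the input field separator; -v assigns a variable before BEGIN runs.
opts_out=$(printf '%s\n' "$data" | awk -F: -v label='uid' '
BEGIN { print "column: " label }
{ print $1, $3 }')

printf '%s\n' "$opts_out"
```

Because -v assignments happen before the program starts, label is already set inside the BEGIN block.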

3. Patterns and actions

awk scripts are made up of patterns and actions:

pattern {action}, for example $ awk '/root/' test or $ awk '$3 < 100' test.

Both parts are optional. If there is no pattern, the action is applied to every record; if there is no action, every matching record is printed. By default each input line is a record, but you can choose a different record separator by setting the RS variable.
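A small sketch of the three cases described above: a pattern without an action, an action without a pattern, and a custom record separator. The sample records are invented for illustration.

```shell
sample='root 0
mysql 27
www 33'

# Pattern only: matching records are printed unchanged.
pattern_only=$(printf '%s\n' "$sample" | awk '/root/')

# Action only: the action runs for every record.
action_only=$(printf '%s\n' "$sample" | awk '{ print $2 }')

# RS changes what counts as a record; here records are separated by ";".
rs_out=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$pattern_only" "$action_only" "$rs_out"
```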

3.1. Patterns

A pattern can be any of the following:

• /regular expression/: uses an extended set of metacharacters.

• Relational expression: uses the relational operators listed in the operator table below and compares strings or numbers. For example, $2 > $1 selects lines whose second field is greater than the first.

• Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).

• pattern, pattern: specifies a range of lines. This syntax cannot include the BEGIN and END patterns.

• BEGIN: lets you specify actions that run before the first input record is processed. This is usually where global variables are set.

• END: lets you specify actions that run after the last input record has been read.
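Several of these pattern types can be combined in one program. The sketch below uses made-up score data; each rule is labeled with the pattern type it illustrates.

```shell
scores='alice 82 90
bob 95 70
carol 60 65'

pattern_out=$(printf '%s\n' "$scores" | awk '
BEGIN     { print "name higher" }        # runs before the first record
$2 > $3   { print $1, $2 }               # relational expression
$1 ~ /^c/ { print $1, "starts with c" }  # pattern-matching expression
END       { print NR " records" }        # runs after the last record
')

printf '%s\n' "$pattern_out"
```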

3.2. Actions

An action consists of one or more statements, functions, and expressions, separated by newlines or semicolons and enclosed in curly braces. Actions fall into four main groups:

• Variable or array assignments

• Output statements

• Built-in functions

• Control-flow statements
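As a rough illustration, a single action can contain all four kinds of statement. The word list and the array name len are assumptions for this sketch.

```shell
words='apple banana
kiwi'

action_out=$(printf '%s\n' "$words" | awk '{
    for (i = 1; i <= NF; i++) {      # control flow
        len[$i] = length($i)         # array assignment + built-in function
        if (len[$i] > 4)
            print $i, len[$i]        # output statement
    }
}')

printf '%s\n' "$action_out"
```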

4. awk's built-in variables

Table 1. awk's built-in variables

Variable      Description
$n            The nth field of the current record; fields are separated by FS.
$0            The complete input record.
ARGC          The number of command-line arguments.
ARGIND        The index in ARGV of the file currently being processed; ARGV starts at index 0 (gawk only).
ARGV          An array that contains the command-line arguments.
CONVFMT       The number conversion format (default: %.6g).
ENVIRON       An associative array of environment variables.
ERRNO         A description of the last system error (gawk only).
FIELDWIDTHS   A space-separated list of field widths (gawk only).
FILENAME      The name of the current input file.
FNR           Same as NR, but relative to the current file.
FS            The input field separator (default: a single space, which splits on runs of spaces and tabs).
IGNORECASE    If true, matching ignores case (gawk only).
NF            The number of fields in the current record.
NR            The number of records read so far.
OFMT          The output format for numbers (default: %.6g).
OFS           The output field separator (default: a space).
ORS           The output record separator (default: a newline).
RLENGTH       The length of the string matched by the match function.
RS            The input record separator (default: a newline).
RSTART        The starting position of the string matched by the match function.
SUBSEP        The array subscript separator (default: \034).
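A few of these variables can be watched as awk moves through two files. This is a sketch with hypothetical files one.txt and two.txt created in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/one.txt"
printf 'f\n' > "$tmpdir/two.txt"

# NR keeps counting across files, FNR restarts for each file,
# NF is the field count, and $NF is the last field of the record.
vars_out=$(cd "$tmpdir" && awk 'BEGIN { OFS = "|" }
{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)

printf '%s\n' "$vars_out"
rm -r "$tmpdir"
```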

5. awk operators

Table 2. Operators

Operator                   Description
= += -= *= /= %= ^= **=    Assignment
?:                         C-style conditional expression
||                         Logical OR
&&                         Logical AND
~ !~                       Matches and does not match a regular expression
< <= > >= != ==            Relational operators
(space)                    String concatenation
+ -                        Addition, subtraction
* / %                      Multiplication, division, remainder
+ - !                      Unary plus, unary minus, logical NOT
^ **                       Exponentiation
++ --                      Increment and decrement, as prefix or postfix
$                          Field reference
in                         Array membership
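Some of the less obvious operators are easier to understand when run. The sketch below needs no input file because everything happens in BEGIN; the variable and array names are made up.

```shell
ops_out=$(awk 'BEGIN {
    name = "a" "wk"                        # a space between strings concatenates
    print name
    print 2 ^ 10, 7 % 3                    # exponentiation and remainder
    print (name ~ /^aw/ ? "match" : "no")  # ~ combined with ?:
    seen["x"] = 1
    has_x = ("x" in seen)                  # in tests array membership
    has_y = ("y" in seen)
    print has_x, has_y
}')

printf '%s\n' "$ops_out"
```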

6. Records and fields

6.1. Records

awk treats each line that ends with a newline character as a record.

Record separators: the default input and output record separators are both newlines, stored in the built-in variables RS and ORS.

The variable $0 refers to the entire record. For example, $ awk '{print $0}' test prints every record in the file test.

The variable NR is a counter that is incremented by 1 for each record read. For example, $ awk '{print NR, $0}' test prints every record in the file test, preceded by its record number.
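The record counter and the output record separator can be seen in a short sketch like this one, which uses two invented input lines.

```shell
# NR numbers each record as it is read.
rec_out=$(printf 'first\nsecond\n' | awk '{ print NR, $0 }')

# ORS replaces the newline that print normally appends to each record.
ors_out=$(printf 'first\nsecond\n' | awk 'BEGIN { ORS = ";" } { print $0 }')

printf '%s\n' "$rec_out" "$ors_out"
```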

6.2. Fields

Each word in a record is called a field; by default fields are separated by spaces or tabs. awk tracks the number of fields and stores it in the built-in variable NF. For example, $ awk '{print $1, $3}' test prints the first and third whitespace-separated columns (fields) of the file test.

6.3. Field separators

The built-in variable FS holds the input field separator, which defaults to spaces or tabs. The -F command-line option changes the value of FS. For example, $ awk -F: '{print $1, $5}' test prints the first and fifth columns of a colon-separated file.

To use several field separators at once, write them inside square brackets. For example, $ awk -F'[: \t]' '{print $1, $3}' test treats space, colon, and tab as separators.

The output field separator defaults to a space and is stored in OFS. In $ awk -F: '{print $1, $5}' test, the comma between $1 and $5 is output as the value of OFS.
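The effect of a bracketed separator list and of OFS can be checked with a sketch like the following. The input record is hypothetical and mixes a colon, a space, and a tab on purpose.

```shell
# Fields are split on colon, space, or tab; the commas in print become OFS.
fs_out=$(printf 'root:x 0\t/bin/sh\n' | awk -F'[: \t]' -v OFS='-' '{ print $1, $3, $4 }')

# Without a comma the values are concatenated and OFS is not used.
cat_out=$(printf 'root:x 0\t/bin/sh\n' | awk -F'[: \t]' '{ print $1 $3 }')

printf '%s\n' "$fs_out" "$cat_out"
```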

7. gawk-specific regular expression metacharacters

The common metacharacters are not covered here; see my sed and grep learning notes. The following metacharacters are specific to gawk and do not work in UNIX versions of awk.

\y    Matches the empty string at the beginning or end of a word.
\B    Matches the empty string within a word.
\<    Matches the empty string at the beginning of a word (anchors the start).
\>    Matches the empty string at the end of a word (anchors the end).
\w    Matches any word-constituent character (letter, digit, or underscore).
\W    Matches any character that is not word-constituent.
\`    Matches the empty string at the beginning of the string.
\'    Matches the empty string at the end of the string.
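Because these metacharacters are GNU extensions, the sketch below only runs when a gawk binary is installed; that guard and the sample text are assumptions of this example. It uses \y to replace cat only where it stands as a whole word.

```shell
# \y is a GNU extension, so this sketch is skipped when gawk is not installed.
word_hits=''
if command -v gawk >/dev/null 2>&1; then
    # gsub returns the number of replacements; "concat" is left alone
    # because there is no word boundary before its "cat".
    word_hits=$(printf 'cat concat cat\n' | gawk '{ n = gsub(/\ycat\y/, "X"); print n, $0 }')
    printf '%s\n' "$word_hits"
fi
```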

8. POSIX character classes

See my grep learning notes.

9. Match operator (~)

The match operator tests a record or field against a regular expression.

For example, $ awk '$1 ~ /^root/' test prints the lines of the file test whose first column starts with root.
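A quick sketch of ~ and its negation !~ on a single field, using invented colon-separated records:

```shell
users='root:0
rooter:5
daemon:1'

# ~ keeps records whose first field matches; !~ keeps those that do not.
match_out=$(printf '%s\n' "$users" | awk -F: '$1 ~ /^root/ { print $1 }')
nomatch_out=$(printf '%s\n' "$users" | awk -F: '$1 !~ /^root/ { print $1 }')

printf '%s\n' "$match_out" "$nomatch_out"
```

Note that /^root/ also matches rooter, since the expression only anchors the start of the field.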

10. Comparison expressions

Conditional expression: expression1 ? expression2 : expression3

For example:

$ awk '{max = ($1 > $3) ? $1 : $3; print max}' test

If the first field is greater than the third field, $1 is assigned to max; otherwise $3 is assigned to max.

$ awk '$1 + $2 < 100' test

If the sum of the first and second fields is less than 100, the line is printed.

$ awk '$1 > 5 && $2 < 10' test

If the first field is greater than 5 and the second field is less than 10, the line is printed.
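These three kinds of expression can be run against the same input to see which lines each one selects. The numbers below are sample data chosen for this sketch.

```shell
nums='3 40 9
8 7 2
60 50 70'

# Conditional expression: keep the larger of the first and third fields.
max_out=$(printf '%s\n' "$nums" | awk '{ max = ($1 > $3) ? $1 : $3; print max }')

# Arithmetic comparison used as a pattern.
sum_out=$(printf '%s\n' "$nums" | awk '$1 + $2 < 100')

# Two comparisons joined with logical AND.
and_out=$(printf '%s\n' "$nums" | awk '$1 > 5 && $2 < 10')

printf '%s\n' "$max_out" "$sum_out" "$and_out"
```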

11. Range patterns

A range pattern matches all lines from the first occurrence of the first pattern through the first occurrence of the second pattern. If the second pattern never appears, the range extends to the end of the input. For example, $ awk '/root/,/mysql/' test prints every line from the first occurrence of root through the first occurrence of mysql.
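The sketch below runs a range pattern over a few invented lines. It also shows that a range can open again later in the input and, if the closing pattern is never seen, runs to the end.

```shell
lines='a
root
b
mysql
c
root
d'

# The first range is root..mysql; the second root has no closing mysql,
# so that range continues to the last line.
range_out=$(printf '%s\n' "$lines" | awk '/root/,/mysql/')

printf '%s\n' "$range_out"
```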

12. An example: validating a passwd file

$ cat /etc/passwd | awk -F: '
NF != 7 {
printf("line %d, does not have 7 fields: %s\n", NR, $0)}
$1 !~ /[A-Za-z0-9]/ {printf("line %d, non alpha and numeric user id: %s\n", NR, $0)}
$2 == "*" {printf("line %d, no password: %s\n", NR, $0)}'

cat pipes the file to awk, and awk sets the field separator to a colon.

If the number of fields (NF) is not equal to 7, the action that follows is executed.

printf prints "line N, does not have 7 fields:" followed by the record, where N is the record number.

If the first field contains no letters or digits, printf prints "non alpha and numeric user id" together with the record number and the record.

If the second field is an asterisk, the string "no password" is printed together with the record number and the record itself.
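To try the checks without touching the real /etc/passwd, the same three rules can be run on a few made-up records. The sample lines below are hypothetical and were written so that each rule fires exactly once.

```shell
sample='root:x:0:0:root:/root:/bin/sh
broken:x:1
---:x:2:2:odd:/:/bin/false
nopw:*:3:3:nopw:/:/bin/false'

check_out=$(printf '%s\n' "$sample" | awk -F: '
NF != 7             { printf("line %d, does not have 7 fields: %s\n", NR, $0) }
$1 !~ /[A-Za-z0-9]/ { printf("line %d, non alpha and numeric user id: %s\n", NR, $0) }
$2 == "*"           { printf("line %d, no password: %s\n", NR, $0) }')

printf '%s\n' "$check_out"
```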

13. Several examples

• $ awk '{print $3}' test ----- extracts the third field (column).

• $ awk '/^(no|so)/' test ----- prints all lines that begin with no or so.

• $ awk '/^[ns]/ {print $0}' test ----- prints the record if it begins with n or s.

• $ awk '$1 ~ /[0-9][0-9]$/ {print $0}' test ----- prints the record if the first field ends with two digits.

• $ awk '$1 == 100 || $2 < 50' test ----- prints the line if the first field equals 100 or the second field is less than 50.

• $ awk '$1 != 10' test ----- prints the line if the first field is not equal to 10.

• $ awk '/test/ {print $1 + 10}' test ----- if the record matches the regular expression test, adds 10 to the first field and prints the result.

• $ awk '{print ($1 > 5 ? "ok " $1 : "error " $1)}' test ----- if the first field is greater than 5, prints the expression after the question mark; otherwise prints the expression after the colon.

• $ awk '/^root/,/^mysql/' test ----- prints all records from one that begins with root through one that begins with mysql. If another record beginning with root is found later, printing resumes until the next record that begins with mysql or the end of the file.
