Text Analysis Tool: awk

Last Update: 2015-01-02 Source: Internet

Author: User

I. Introduction to AWK

Awk is a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is especially strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.
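A minimal sketch of this line-by-line, field-by-field processing. The two input lines (names and ages) are made up for illustration and are piped in with printf instead of being read from a real file:

```shell
# Each input line is one record; awk splits it on whitespace into $1, $2, ...
# The sample data below is hypothetical.
swapped=$(printf 'alice 30\nbob 25\n' | awk '{print $2, $1}')
echo "$swapped"
```

Each line is handled independently, so the output is "30 alice" followed by "25 bob".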

There are three main versions of awk: awk, nawk, and gawk. gawk is the GNU version of AWK, and on many Linux systems the awk command refers to gawk.

The name AWK comes from the initials of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

II. Usage

awk 'pattern {action}' filename(s)

However complex a program gets, its syntax always follows this form. pattern specifies what awk searches for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) do not always appear in a program, but when they do, they group the series of commands that belong to a particular pattern. A pattern is often a regular expression enclosed in slashes, such as /error/.
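A short sketch of the pattern/action combinations, run against a few made-up log lines (the log contents are hypothetical):

```shell
# Hypothetical log data for illustration.
log='INFO start
ERROR disk full
INFO done
ERROR timeout'

# Pattern plus action: the action runs only on lines matching /ERROR/.
errors=$(printf '%s\n' "$log" | awk '/ERROR/ {print NR ": " $0}')
echo "$errors"

# Pattern only: the default action prints the whole matching line.
matches=$(printf '%s\n' "$log" | awk '/ERROR/')
echo "$matches"

# Action only: with no pattern, the action runs on every line.
levels=$(printf '%s\n' "$log" | awk '{print $1}')
echo "$levels"
```

The second form shows why braces are optional: a pattern with no action simply prints the lines it matches.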

The most basic job of awk is to scan a file or string and extract information according to rules you specify. Once awk has extracted that information, it can perform further text operations on it. A complete awk script is typically used to format the information in a text file, for example into a report.

In general, awk processes a file one line at a time: each time it reads a line, it runs the corresponding commands against that text.

III. Ways to Invoke awk

There are three ways to invoke awk:

1. Command Line

`awk [-F field-separator] 'commands' input-file(s)`

Here, commands are the actual awk commands, [-F field-separator] is optional, and input-file(s) are the files to be processed.

In awk, each line of a file is a record, and each record is split into fields by the field separator. If -F is not specified, the default field separator is whitespace.
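A small sketch of how the default whitespace separator differs from a single-character separator given with -F. The input strings are made up for illustration:

```shell
# Default separator: runs of spaces and tabs count as one separator,
# and leading whitespace is ignored, so $1 is "alice" and $2 is "30".
default_split=$(printf '  alice \t  30\n' | awk '{print $1 "|" $2}')
echo "$default_split"

# -F: with a single character, every colon separates fields, so two
# adjacent colons produce an empty field between them (NF is 3).
colon_fields=$(printf 'alice::30\n' | awk -F: '{print NF}')
echo "$colon_fields"
```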

2. Shell script

Put all the awk commands in a file and make the file executable, with the awk interpreter named on the first line of the script. The script can then be run simply by typing its name.

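This is analogous to the first line of a shell script, #!/bin/sh, which becomes #!/bin/awk -f. The path to awk varies between systems (it is often /usr/bin/awk), and the -f option is needed so that awk reads its program from the script file.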
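A minimal sketch of an executable awk script. It assumes a writable temporary directory that allows execution, and it looks up the local awk path rather than hard-coding one; the script's contents are hypothetical:

```shell
# Locate awk; the shebang path differs between systems.
awk_path=$(command -v awk)

# Write a small awk script whose first line is "#!<awk path> -f".
# The kernel appends the script's file name, and -f tells awk to
# read its program from that file.
script=$(mktemp)
printf '#!%s -f\n{ print NR ": " $0 }\n' "$awk_path" > "$script"
chmod +x "$script"

# Run the script by name, like any other executable.
numbered=$(printf 'alpha\nbeta\n' | "$script")
echo "$numbered"
rm -f "$script"
```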

3. Put all the awk commands in a separate file, and then invoke it with:

awk -f awk-script-file input-file(s)

Here, the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
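A sketch of the -f form, using a hypothetical program file that sums the second column of some made-up input. Unlike the shell-script method, the program file does not need to be executable:

```shell
# Hypothetical program file: add up column 2, print the total at the end.
prog=$(mktemp)
cat > "$prog" <<'EOF'
{ total += $2 }
END { print "total:", total }
EOF

# -f loads the program from the file; input comes from stdin here.
summary=$(printf 'apples 3\npears 4\n' | awk -f "$prog")
echo "$summary"
rm -f "$prog"
```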

IV. Introduction to Basic awk Commands

Options:

-F fs: specifies the input field separator (for example, -F: uses a colon)

-v var=value: assigns a value to a built-in or user-defined variable
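A brief sketch of -v with a user-defined variable and with a built-in one. The variable name and input are hypothetical:

```shell
# -v assignments take effect before the BEGIN block runs,
# so the value is already available there.
greeting=$(awk -v name=world 'BEGIN {print "hello, " name}')
echo "$greeting"

# -v can also set a built-in variable, here the output field separator.
dashed=$(printf 'a b\n' | awk -v OFS=- '{print $1, $2}')
echo "$dashed"
```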

Example 1: Use a colon as the field separator and print the first and third fields of each line (the user name and UID):

    # gawk -F: '{print $1,$3}' /etc/passwd
    root 0
    bin 1
    daemon 2
    ... (remaining output omitted)

Without a comma between them, the two fields are concatenated with no separator, because the comma is what inserts the output field separator:

    # gawk -F: '{print $1$3}' /etc/passwd
    root0
    bin1
    daemon2
    ... (remaining output omitted)

Both are examples of an action with no pattern: the action {print $1,$3} is executed for every line.
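The same two commands can be reproduced without touching the real /etc/passwd by feeding in two made-up lines in the same colon-separated format:

```shell
# Two hypothetical lines in /etc/passwd format (user:password:UID:GID:...).
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# With a comma, print inserts the output field separator (a space).
with_comma=$(printf '%s\n' "$sample" | awk -F: '{print $1,$3}')
echo "$with_comma"

# Without a comma, the two fields are simply concatenated.
concatenated=$(printf '%s\n' "$sample" | awk -F: '{print $1$3}')
echo "$concatenated"
```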

V. awk Output Commands: print and printf

Both print and printf are provided in awk.

5.1 The print command

Command usage:

`print item1, item2, ...`

Usage tips:

1. Items are separated by commas; in the output, they are separated by the output field separator (OFS).

2. Each item can be a string, a number, a field of the current record ($n), a variable, or any awk expression. Numeric values are implicitly converted to strings for output.

3. If no items follow print, it is equivalent to print $0 (the entire line is printed). To print a blank line, use print "".
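A quick sketch of these rules on a single made-up input line:

```shell
# Items: a string, a variable (NR), a field ($2), and an expression.
# They are joined with OFS (a space); the number 3 prints as "3".
mixed=$(echo 'a b' | awk '{print "line", NR, $2, 1.5 * 2}')
echo "$mixed"

# Bare print outputs $0; print "" outputs an empty line.
whole=$(echo 'a b' | awk '{print; print ""; print "end"}')
echo "$whole"
```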

5.2 The printf command

Command Format:

`printf format, item1, item2, ...`

Usage tips:

1. The format string is required.

2. printf does not add a newline automatically; you must add one yourself (\n).

3. The format must contain one format specifier for each item that follows it.

Format specifiers start with % followed by a character:

%c: displays a single character (given a number, the character with that code);

%i, %d: displays a decimal integer;

%e, %E: displays a number in scientific notation;

%f: displays a floating-point number;

%g, %G: displays a number in scientific notation or floating-point format;

%s: displays a string;

%u: displays an unsigned integer;

%%: displays a literal %

Modifiers:

#[.#]: the first # sets the display width and the optional .# sets the precision (for numbers, the digits after the decimal point), for example %30s or %10.2f;

-: left-justifies the output;

+: always displays the sign of a number.
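A sketch combining several specifiers and modifiers. The values are arbitrary and chosen only to make the padding visible:

```shell
# %-8s: string left-justified in 8 columns; %5.2f: float, 5 columns wide,
# 2 decimals; %+d: integer with an explicit sign. The \n must be explicit.
row=$(awk 'BEGIN {printf "%-8s|%5.2f|%+d\n", "disk", 3.14159, 42}')
echo "$row"

# %c with a number prints the character with that code; %% prints a %.
misc=$(awk 'BEGIN {printf "%c %d%%\n", 65, 90}')
echo "$misc"
```

The first line prints "disk    | 3.14|+42": four spaces pad "disk" to eight columns, and one leading space pads "3.14" to five.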

VI. awk Variables

6.1 Built-in Variables

Record: one line of input (by default).

Field: one part of a record, as split by the field separator.

FS: the input field separator; whitespace by default.

# awk -v FS=":" '{print $1,$3}' /etc/passwd

OFS: the output field separator, placed between the items that print separates with commas. The default is a single space.

# awk 'BEGIN {FS=":"; OFS="="} {print $1,$3}' /etc/passwd
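The same FS/OFS program, sketched against two made-up passwd-style lines. FS is set in BEGIN so that it is already in effect when the first line is split:

```shell
# Hypothetical passwd-style input.
sample='root:x:0:0
bin:x:1:1'

# FS splits the input on ":"; OFS joins the printed items with "=".
pairs=$(printf '%s\n' "$sample" | awk 'BEGIN {FS=":"; OFS="="} {print $1, $3}')
echo "$pairs"
```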

RS: the input record separator; the default is a newline.

Example: use a colon as the record separator, so each colon-separated piece of the file is printed as its own record:

# awk -v RS=":" '{print $0}' /etc/passwd

ORS: the output record separator, which print appends after each record.

The default is a newline, but it can be customized, for example to #.

The following example reads records separated by ":" and outputs them separated by "#":

# awk 'BEGIN {RS=":"; ORS="#"} {print $0}' /etc/passwd
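A small sketch of RS and ORS on a made-up string. The input deliberately has no trailing newline; with one, the last record would be "c" followed by that newline, since only ":" separates records here:

```shell
# RS=":" makes each colon-separated piece a record;
# ORS="#" is appended after each printed record instead of a newline.
joined=$(printf 'a:b:c' | awk 'BEGIN {RS=":"; ORS="#"} {print $0}')
echo "$joined"
```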

NF: the number of fields in the current record.

Print the number of fields on each line of /etc/issue:

# awk '{print NF}' /etc/issue

Note: NF is referenced like any other variable, without a $. $NF, by contrast, is the last field of the record (the field at position NF).
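A sketch contrasting NF with $NF on a made-up line. $(NF-1) is included as a further illustration that any expression can follow $:

```shell
line='one two three'
count=$(echo "$line" | awk '{print NF}')             # number of fields
last=$(echo "$line" | awk '{print $NF}')             # last field
second_last=$(echo "$line" | awk '{print $(NF-1)}')  # field before the last
echo "$count $last $second_last"
```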

NR: the number of input records read so far, i.e. the current line number.

When there are multiple input files, NR keeps counting across all of them.

FNR: unlike NR, FNR is the record number within the current file; it restarts at 1 for each new file.
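A sketch showing NR and FNR side by side over two small files, with FILENAME printed too. The directory and file names are hypothetical, created only for this example:

```shell
# Two small files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$dir" && awk '{print FILENAME, NR, FNR}' one.txt two.txt)
echo "$counts"
rm -rf "$dir"
```

A common idiom built on this difference is the condition NR == FNR, which is true only while the first file is being read.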

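ARGV: an array holding the command-line arguments. For awk '{print $0}' file1 file2, ARGV[0] typically holds awk, ARGV[1] holds file1, and ARGV[2] holds file2. The program text itself is not stored in ARGV.

ARGC: the number of elements in ARGV, including ARGV[0];

For example, the following command has three arguments (awk, /etc/fstab, and /etc/issue), so it prints 3: awk 'BEGIN {print ARGC}' /etc/fstab /etc/issue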

FILENAME: the name of the current input file.

IGNORECASE: when set to a nonzero value, makes regular-expression matching and string comparisons case-insensitive (a gawk extension).
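A sketch that prints ARGC and the file arguments in ARGV. Because the program has only a BEGIN block, awk never opens its input files, so the hypothetical names file1 and file2 need not exist. ARGV[0] is skipped because its exact value varies between awk implementations:

```shell
args=$(awk 'BEGIN {print ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i]}' file1 file2)
echo "$args"
```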

6.2 Custom Variables

User-defined variables can be used directly, without being declared. Variable names are case-sensitive. A variable can be defined in two ways:

1. In the program itself

2. On the command line, with the option -v var=value

For example, to define the variable file on the command line:

# awk -v file="passwd" '{print file, $1}' /etc/passwd
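A sketch of both ways of defining a variable, on two made-up passwd-style lines. Unlike the command above, it adds -F: so that $1 is just the user name; the variable name file follows the example, and the check for File is purely illustrative:

```shell
sample='root:x:0:0
bin:x:1:1'

# 1. Defined on the command line with -v.
from_option=$(printf '%s\n' "$sample" | awk -F: -v file="passwd" '{print file, $1}')

# 2. Defined inside the program, in a BEGIN block.
from_program=$(printf '%s\n' "$sample" | awk -F: 'BEGIN {file = "passwd"} {print file, $1}')

# Names are case-sensitive: File is a different, unset variable.
case_check=$(awk -v file="passwd" 'BEGIN {print "[" File "]"}')

echo "$from_option"
echo "$from_program"
echo "$case_check"
```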

This article is an English version of an article originally written in Chinese on aliyun.com.