Text Analysis Tool-awk

Source: Internet
Author: User

I. Introduction to AWK

Awk is a powerful text analysis tool. Where grep is built for searching and sed for editing, awk is strongest at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.
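The line-by-line reading and default field splitting described above can be sketched as follows. The two input lines are made-up sample data and the variable name fields is only illustrative; note that runs of several spaces still count as a single separator.

```shell
# Feed two sample lines with irregular spacing to awk and
# print the first and third whitespace-separated fields of each line.
fields=$(printf 'alice   30 london\nbob 25   paris\n' | awk '{ print $1, $3 }')
echo "$fields"
```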

There are three versions of awk: awk, nawk, and gawk. Unless stated otherwise, awk generally means gawk, the GNU version of AWK.

Awk takes its name from the initials of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

II. Usage

awk '{pattern + action}' {filenames}

However complex the operation, the syntax always has this form: pattern is what awk looks for in the data, and action is the series of commands executed when a match is found. Curly braces ({}) do not have to appear in every program; they group the series of commands that belongs to a particular pattern. A pattern is typically a regular expression enclosed in slashes.
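A minimal sketch of a pattern with an action, and of a pattern on its own, assuming a few made-up log lines as input. The variable names matches and whole are illustrative only. When the braces are left out, awk falls back to its default action of printing the whole matching line.

```shell
# Sample input (hypothetical log lines).
log='ok start
error disk full
ok done
error timeout'

# Pattern plus action: for lines matching /error/, print the second field.
matches=$(printf '%s\n' "$log" | awk '/error/ { print $2 }')
echo "$matches"

# Pattern only, no braces: matching lines are printed unchanged.
whole=$(printf '%s\n' "$log" | awk '/timeout/')
echo "$whole"
```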

The most basic function of the awk language is to browse and extract information based on specified rules in a file or string. Only after awk extracts information can other text operations be performed. A complete awk script is usually used to format information in a text file.

In general, awk processes a file one line at a time: each time it receives a line, it executes the corresponding commands on that line.

III. Ways to call awk

There are three ways to call awk:

1. Command Line

awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, [-F field-separator] is optional, and input-file(s) is the file or files to be processed.

In awk, each item on a line that is delimited by the field separator is called a field. If no separator is given with -F, the default field separator is whitespace.

2. Shell script

Put all the awk commands into a file and make that file executable. The awk interpreter is named on the first line of the script, and the program is then run by typing the script name.

This is the counterpart of the first line of a shell script: #!/bin/sh

For an awk script it becomes: #!/bin/awk -f (the -f is needed so that awk reads its program from the script file; the path to awk can differ between systems)

3. Put all the awk commands into a separate file, and then call:

awk -f awk-script-file input-file(s)

Here the -f option loads the awk program from awk-script-file, and input-file(s) is the same as above.

IV. Introduction to basic awk commands

Options:

-F[:]: specifies the input field separator (here a colon)

-v var=value: assigns a value to a built-in or custom variable

 

Example 1: Use a colon as the field separator to print the first and third fields of each line (the user name and UID)

# gawk -F: '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
(remaining output omitted)

Without the comma, the two fields are joined directly; it is the comma that inserts the output field separator.

# gawk -F: '{print $1$3}' /etc/passwd
root0
bin1
daemon2
(remaining output omitted)

This is an example of awk + action: the action {print $1, $3} is executed for every line.

V. awk output commands: print and printf

Awk provides both print and printf.

5.1. print command:

Command usage:

print item1, item2, ...
 

Usage tips:

1. Items are separated by commas, and the output field separator is placed between them on output.

2. Each output item can be a string, a number, a field of the current record ($n), a variable, or an awk expression. Numeric values are implicitly converted to strings for output.

3. If the items after print are omitted, the command is equivalent to print $0 (the entire line is output). print "" outputs an empty line.
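The three tips can be seen together in one short sketch, assuming two made-up input lines. A bare print echoes the line, print "" emits an empty line, and an expression such as $2 * 10 is evaluated and converted to text before output. The variable name lines is illustrative only.

```shell
# For each input line: print it unchanged, print an empty line,
# then print the first field followed by ten times the second field.
lines=$(printf 'x 1\ny 2\n' | awk '{ print; print ""; print $1, $2 * 10 }')
echo "$lines"
```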

5.2. printf command:

Command format:

printf format, item1, item2, ...

Usage tips:

1. The format string must be given.

2. printf does not add a line break automatically; add the line terminator (\n) yourself.

3. The format must contain one format specifier for each item that follows it.

Format specifiers: each starts with %, followed by a character

%c: displays the character with the given ASCII code;

%d, %i: displays a decimal integer;

%e, %E: displays a number in scientific notation;

%f: displays a floating-point number;

%g, %G: displays a number in scientific notation or floating-point format, whichever is shorter;

%s: displays a string;

%u: displays an unsigned integer;

%%: displays % itself

Modifiers:

#[.#]: the first # is the display width, for example %30s; the second (.#) is the precision after the decimal point

-: left alignment

+: displays the sign of a numeric value
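A small sketch of how the specifiers and modifiers combine, assuming made-up item names and prices as input. The format uses a left-aligned 8-character string (%-8s), a number 7 characters wide with 2 decimal places (%7.2f), and a signed integer (%+d); the variable names report and ch are illustrative only.

```shell
# One formatted row per input line; the \n has to be written explicitly.
report=$(printf 'apple 1.5\nkiwi 12.25\n' |
  awk '{ printf "%-8s|%7.2f|%+d\n", $1, $2, NR }')
echo "$report"

# %c turns a character code into the character, and %% prints a literal %.
ch=$(awk 'BEGIN { printf "%c %d%%\n", 65, 50 }')
echo "$ch"
```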

 

 

 

VI. awk variables

6.1. Built-in variables

Some built-in variables are record-related (line-related) and others are field-related.

FS: input field separator. The default is whitespace.

# awk -v FS=":" '{print $1, $3}' /etc/passwd

 

OFS: output field separator, the delimiter placed between fields on output. The default is a space.

# awk 'BEGIN {FS=":"; OFS="="} {print $1, $3}' /etc/passwd

RS: input record separator, the delimiter between input records. The default is a newline.

Example: use a colon as the record separator and print every record

# awk -v RS=":" '{print $0}' /etc/passwd

ORS: output record separator, the delimiter written after each output record.

The default is a newline; it can be customized.

The following reads records separated by ":" and writes them out separated by "#":

# awk 'BEGIN {RS=":"; ORS="#"} {print $0}' /etc/passwd
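Because the contents of /etc/passwd differ from system to system, the four separators are easier to follow on a fixed input. This is a minimal sketch that assumes a couple of made-up passwd-style records; the variable names pairs and records are illustrative only.

```shell
# FS splits the input on ":", OFS joins the printed fields with "=".
pairs=$(printf 'root:x:0\nbin:x:1\n' |
  awk 'BEGIN { FS = ":"; OFS = "=" } { print $1, $3 }')
echo "$pairs"

# RS makes ":" the record boundary, ORS ends each printed record with "#".
# The input has no trailing newline so that the last record is exactly "0".
records=$(printf 'root:x:0' |
  awk 'BEGIN { RS = ":"; ORS = "#" } { print $0 }')
echo "$records"
```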

NF: number of fields in the current record

Count the number of fields on each line of the /etc/issue file:

# awk '{print NF}' /etc/issue

Note: NF is a variable and is referenced without a $. $NF refers to the field at position NF, that is, the last field of the record.
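The difference between NF and $NF shows up clearly on lines with different field counts. A short sketch, assuming two made-up input lines and an illustrative variable name nf:

```shell
# NF is the field count; $NF is the content of the last field.
nf=$(printf 'a b c\nd e\n' | awk '{ print NF, $NF }')
echo "$nf"
```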

 

NR: number of input records read so far, that is, the current line number.

With multiple input files, the count continues across all of them.

FNR: unlike NR, FNR counts records within the current file only; it restarts at 1 for each new input file.
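NR and FNR only diverge when more than one file is read. The sketch below assumes two small temporary files with the hypothetical names a.txt and b.txt and prints both counters next to each line.

```shell
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.txt"
printf 'b1\nb2\n' > "$workdir/b.txt"

# NR keeps counting across files; FNR starts again at 1 in the second file.
counts=$(awk '{ print NR, FNR, $0 }' "$workdir/a.txt" "$workdir/b.txt")
echo "$counts"
rm -rf "$workdir"
```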

 

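ARGV: an array holding the command-line arguments. For awk '{print $0}' file1 file2, ARGV[0] holds awk, ARGV[1] holds file1, and ARGV[2] holds file2; options and the program text are not stored.

ARGC: the number of command-line arguments, counting the command name itself but not options or the program text.

A command that names /etc/fstab and /etc/issue as input files has three arguments: awk, /etc/fstab, and /etc/issue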
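ARGC and ARGV can be inspected from a BEGIN block. In this sketch the program consists of BEGIN only, so awk never opens the named files and they serve purely as argument strings. ARGV[0] is skipped because the interpreter name it holds (awk, gawk, mawk, and so on) depends on the installation; the variable name args is illustrative.

```shell
# Print the argument count, then each argument after the command name.
args=$(awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i] }' \
  /etc/fstab /etc/issue)
echo "$args"
```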

 

FILENAME: name of the current input file

IGNORECASE: a gawk extension; when set to a nonzero value, pattern matching ignores case.

 

6.2. Custom variables

Custom variables can be used directly, without declaration. Variable names are case sensitive.

1. Variables can be defined in the program

2. Variables can be defined with an option: -v var=value

For example, defining the variable with an option:

# awk -v file="passwd" '{print file, $1}' /etc/passwd
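The text says a custom variable can be set either inside the program or through an option. A minimal sketch of the two forms side by side, assuming one made-up passwd-style line as input and adding -F: (not present in the command above) so that $1 is just the user name. The in-program form with a BEGIN block is an assumption about what the equivalent example would look like.

```shell
# Variable set through the -v option.
with_option=$(printf 'root:x:0\n' |
  awk -F: -v file="passwd" '{ print file, $1 }')

# Same variable set inside the program, in a BEGIN block.
in_program=$(printf 'root:x:0\n' |
  awk -F: 'BEGIN { file = "passwd" } { print file, $1 }')

echo "$with_option"
echo "$in_program"
```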
