Introduction to the powerful Linux Command Awk in 20 minutes

Source: Internet
Author: User

What is Awk?

Awk is a small programming language and a command-line tool. (The name comes from the surname initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.) It is particularly well suited to log processing on servers, mainly because Awk operates on files that are usually structured as lines of human-readable text.

I say it is useful on servers because log files, dump files, and whatever other text formats servers end up writing to disk tend to grow large, and you will have many of them on each server. If you ever have to analyze several GB of files on 50 different servers without tools like Splunk or its equivalents, downloading all these files locally just to analyze them will feel pretty bad.

I have personally been in this situation, when some Erlang nodes die and leave behind a crash dump ranging from megabytes to 4 GB, or when I need to search the logs on a small personal server (say, a VPS) for a recurring pattern.

In any case, Awk does more than find data (otherwise, grep or ack would be enough): it also lets you process and transform that data.

Code structure

The code structure of an Awk script is very simple. It is a series of patterns and actions:

    # comment
    Pattern1 { ACTIONS; }

    # comment
    Pattern2 { ACTIONS; }

    # comment
    Pattern3 { ACTIONS; }

    # comment
    Pattern4 { ACTIONS; }

Each line of the scanned document is tested against each of the patterns, one pattern at a time. Suppose I provide a file containing the following content:

This is line 1

This is line 2

The line "This is line 1" is first tested against Pattern1. If it matches, ACTIONS is executed. The same line is then tested against Pattern2. If that does not match, Awk moves on to Pattern3, and so on.

Once all the patterns have been tried, "This is line 2" goes through the same process. The same applies to every other line until the entire file has been read.

In short, this is Awk's execution model.
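The execution model described above can be sketched with three rules and two input lines. This is a minimal illustration assuming a POSIX-compatible awk; the rules and their messages are made up for the example.

```shell
# Every input line is tested against every rule, in the order the rules appear.
out=$(printf 'this is line 1\nthis is line 2\n' | awk '
/line/   { print "rule 1 matched: " $0 }   # matches both lines
/line 2/ { print "rule 2 matched: " $0 }   # matches only the second line
/this/   { print "rule 3 matched: " $0 }   # matches both lines
')
echo "$out"
```

The first line triggers rules 1 and 3, and only then does the second line go through rules 1, 2, and 3.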

Data Types

Awk has only two main data types: strings and numbers. Even then, Awk converts freely between them. A string can be interpreted as a number, and if it does not look like a number its numeric value is 0.

Both can be assigned to variables with the = operator in the ACTIONS part of your code. Variables can be declared and used anywhere, at any time, and uninitialized variables can be used too: their default value is the empty string "".
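A small sketch of these conversions, assuming a POSIX-compatible awk. The variable names (a, b, never_set) are hypothetical, and the BEGIN block is used only so that the program runs without any input.

```shell
out=$(awk 'BEGIN {
    a = "23"                   # a string that looks like a number
    b = "abc"                  # a string that does not
    print a + 1                # the string is converted to the number 23
    print b + 1                # the string counts as 0
    print never_set + 1        # an uninitialized variable counts as 0 here
    print "[" never_set "]"    # and as the empty string here
}')
echo "$out"
```

This prints 24, 1, 1, and [] on separate lines.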

Finally, Awk has arrays: they are dynamic, one-dimensional associative arrays. Their syntax is simply var[key] = value. Awk can simulate multidimensional arrays, but that is a big hack anyway.
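As a rough sketch of the array syntax, assuming a POSIX-compatible awk: the array names hits and grid are hypothetical. The last lines show the simulated multidimensional form, where awk joins the subscripts into a single string key behind the scenes.

```shell
out=$(awk 'BEGIN {
    hits["GET"] = 2            # keys are strings; values are strings or numbers
    hits["POST"] = 1
    hits["GET"]++              # entries can be updated in place
    print "GET:", hits["GET"]
    print "POST:", hits["POST"]

    grid[1, 2] = "x"           # simulated two-dimensional key
    if ((1, 2) in grid) {
        print "grid[1,2] =", grid[1, 2]
    }
}')
echo "$out"
```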

Patterns

There are three types of patterns available: regular expressions, Boolean expressions, and special patterns.

Regular expressions and Boolean expressions

The regular expressions you use with Awk are the usual ones. They are not PCRE under awk (gawk supports some fancier features, but it depends on the implementation; on implementations that accept it, check with awk --version). Still, they are sufficient for most uses:

    /admin/ { ... }       # any line that contains 'admin'
    /^admin/ { ... }      # lines that begin with 'admin'
    /admin$/ { ... }      # lines that end with 'admin'
    /^[0-9.]+ / { ... }   # lines beginning with series of numbers and periods
    /(POST|PUT|DELETE)/   # lines that contain specific HTTP verbs

Note that patterns cannot capture specific groups to make them available in the ACTIONS part of the code. They exist only to match content.
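To see these regular-expression patterns at work, here is a sketch run against four made-up log lines, assuming a POSIX-compatible awk. The sample lines and the messages are assumptions for illustration only.

```shell
out=$(printf '%s\n' \
    '10.0.0.1 GET /index.html' \
    'admin logged in' \
    '10.0.0.2 DELETE /users/7' \
    'password reset for admin' | awk '
/^admin/            { print "starts with admin: " $0 }
/admin$/            { print "ends with admin: " $0 }
/^[0-9.]+ /         { print "starts with digits and periods: " $0 }
/(POST|PUT|DELETE)/ { print "write verb: " $0 }
')
echo "$out"
```

The third input line matches two patterns, so it is reported twice.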

Boolean expressions are similar to those in PHP or JavaScript. Specifically, you can use the && ("and"), || ("or"), and ! ("not") operators, which you will find in almost all C-like languages. They operate on any regular data.

Another feature that resembles PHP and JavaScript is the comparison operator ==, which performs loose matching: the string "23" compares equal to the number 23, so the expression "23" == 23 is true. The != operator is also available, along with the other common ones: >, <, >=, and <=.
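The loose comparison can be checked directly. This is a minimal sketch assuming a POSIX-compatible awk; the BEGIN block is used only so that no input is needed.

```shell
out=$(awk 'BEGIN {
    if ("23" == 23) { print "equal" } else { print "not equal" }
    if (23 != 24)   { print "different" }
    if (3 >= 2)     { print "greater or equal" }
}')
echo "$out"
```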

You can also mix them: Boolean expressions can be combined with regular expressions. The pattern /admin/ || debug == 1 is valid and matches when the line contains the word "admin" or when the variable debug is set to 1. (Awk has no true keyword; a bare word such as true is just another variable.)

Note: if you have a specific string or variable that must be matched against a regular expression, ~ and !~ are the operators you want. Use them as string ~ /regex/ and string !~ /regex/.
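Combining the two kinds of pattern might look like the sketch below, assuming a POSIX-compatible awk. The variables debug and user are hypothetical and are passed in with the standard -v option; the input lines are made up.

```shell
out=$(printf '%s\n' \
    'GET /admin.html' \
    'GET /index.html' \
    'POST /login' | awk -v debug=0 -v user="root" '
/admin/ || debug == 1       { print "admin or debug: " $0 }
!/admin/ && user ~ /^ro/    { print "no admin, user matches: " $0 }
user !~ /^guest/ && /POST/  { print "POST by non-guest: " $0 }
')
echo "$out"
```

Running it again with -v debug=1 would make the first rule fire on every line.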

Note that all patterns are optional. An Awk script containing only the following:

    { ACTIONS }

simply executes ACTIONS for every line of input.
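A pattern-less rule can be tried in one line. This is a minimal sketch assuming a POSIX-compatible awk, with made-up input.

```shell
# No pattern: the action runs for every line.
out=$(printf 'one\ntwo\n' | awk '{ print "seen: " $0 }')
echo "$out"
```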

Special patterns

There are a few special patterns in Awk, but not many.

The first is BEGIN, which matches only before any line of input has been read. This is the main place to initialize the variables and any other state your script needs.

The other is END. As you may have guessed, it matches after all the input has been processed. This lets you clean up and produce some final output before exiting.
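BEGIN and END are commonly paired to set up state and report on it. The sketch below counts input lines, assuming a POSIX-compatible awk; the counter name n is hypothetical.

```shell
out=$(printf 'a\nb\nc\n' | awk '
BEGIN { n = 0; print "starting" }     # runs once, before the first line is read
      { n++ }                         # runs for every line
END   { print "lines read: " n }      # runs once, after the last line
')
echo "$out"
```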

The last kind of pattern is hard to classify. It sits halfway between variables and special values. These are called fields, and they deserve a section of their own.

Fields

Fields are best explained with a visual example:

    # According to the following line
    #
    # $1         $2    $3
    # 00:34:23   GET   /foo/bar.html
    # \_____________  _____________/
    #               $0

    # Hack attempt?
    /admin.html$/ && $2 == "DELETE" {
      print "Hacker Alert!";
    }

Fields are (by default) separated by whitespace. The $0 field represents the whole line as a string. $1 is the first piece (before any whitespace), $2 is the one after it, and so on.
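The field variables can be tried on two made-up log lines. This is a sketch assuming a POSIX-compatible awk and the default whitespace separator; the second input line is an assumption added so that the alert rule has something to match.

```shell
out=$(printf '%s\n' \
    '00:34:23 GET /foo/bar.html' \
    '00:34:25 DELETE /admin.html' | awk '
{ print "time=" $1 " verb=" $2 " path=" $3 }
/admin.html$/ && $2 == "DELETE" { print "Hacker Alert!" }
')
echo "$out"
```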

An interesting fact (and something to avoid in most cases): you can modify the line by assigning to its fields. For example, if you execute $0 = "haha the line is gone" in one block, the next pattern will operate on the modified line instead of the original. The same applies to the other field variables.
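The effect of assigning to fields can be observed with a sketch like this one, assuming a POSIX-compatible awk. The input line and the messages are made up.

```shell
out=$(printf 'GET /index.html\n' | awk '
       { $0 = "haha the line is gone" }        # overwrite the whole line
/haha/ { print "second rule sees: " $0 }       # later rules see the new text
/GET/  { print "never printed" }               # the original text is gone
       { $2 = "THE"; print $0 }                # assigning to a field rebuilds $0
')
echo "$out"
```

The rule that looks for GET never fires, because by the time it is tested the line no longer contains that word.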

Actions

There are a bunch of possible actions, but the most common and useful ones (in my experience) are:

    { print $0; }   # prints $0. In this case, equivalent to 'print' alone
    { exit; }       # ends the program
    { next; }       # skips to the next line of input
    { a=$1; b=$0 }  # variable assignment
    { c[$1] = $2 }  # variable assignment (array)

    { if (BOOLEAN) { ACTION }
      else if (BOOLEAN) { ACTION }
      else { ACTION }
    }
    { for (i=1; i<x; i++) { ACTION } }
    { for (item in c) { ACTION } }

These will be the main tools in your Awk toolbox. Use them whenever you process files such as logs.
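Several of these actions can be combined in one short script. The sketch below assumes a POSIX-compatible awk and a made-up input of name and value pairs. Because the order of a for (item in c) loop is not specified, the output is piped through sort.

```shell
out=$(printf '%s\n' \
    '# comment' \
    'alice 3' \
    'bob 5' \
    'stop' \
    'carol 9' | awk '
/^#/    { next }              # skip comment lines
/^stop/ { exit }              # stop reading input; the END block still runs
        { c[$1] = $2 }        # remember the value for each name
END {
    for (name in c) {
        if (c[name] > 4) { print name " is high" }
        else             { print name " is low" }
    }
}' | sort)
echo "$out"
```

The carol line is never read, since exit stops the input loop before reaching it.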

All variables in Awk are global. Whatever variable you define in a given block is visible to the other blocks, for every line. This severely limits how large your Awk scripts can grow before they become unmaintainable horrors. Keep them as small as possible.
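The global scope is easy to demonstrate. In this sketch (assuming a POSIX-compatible awk; the flag name seen_error and the input are hypothetical), a variable set in one block changes the behavior of another block, on the same line and on every later line.

```shell
out=$(printf 'x\nERROR y\nz\n' | awk '
/ERROR/ { seen_error = 1 }                            # set in one block
        { if (seen_error) print "after error: " $0 }  # read in another block
')
echo "$out"
```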
