Awk under Linux (reproduced)

Source: Internet
Author: User

What is awk

Awk is a small programming language and command-line tool. (Its name comes from the surname initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.) It is well suited to log processing on servers, mainly because awk operates on files, which are usually structured as lines of human-readable text.

I say it is useful on servers because log files, dump files, or whatever text format servers end up dumping to disk tend to grow large, and you will have many of them on each server. If you have ever had to analyze a few gigabytes of files on 50 different servers without a tool like Splunk or an equivalent, you know how bad it feels to fetch and download all of those files just to analyze them.

I have experienced this situation firsthand: when some Erlang nodes die and leave behind a 700 MB to 4 GB crash dump, or when I need to quickly browse the logs on a small personal server (a VPS) to look for a recurring pattern.

In any case, awk does more than find data (otherwise, grep or ack would be enough): it also lets you process and transform that data.

Code Structure

The structure of an awk script is simple: it is a series of patterns and actions:

# comment
Pattern1 { ACTIONS; }

# comment
Pattern2 { ACTIONS; }

# comment
Pattern3 { ACTIONS; }

# comment
Pattern4 { ACTIONS; }

Every line of the scanned document is tested against each of the patterns, one pattern at a time. So, if I pass in a file that contains the following:

this was line 1
this was line 2

The line this was line 1 is first tested against Pattern1. If the match succeeds, the associated actions are executed. Then this was line 1 is tested against Pattern2. If that match fails, awk skips to Pattern3, and so on.

Once all of the patterns have been tried, this was line 2 goes through the same steps. The same happens for every other line until the entire file has been read.

In short, this is awk's execution model.
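The flow described above can be sketched with a tiny run. This is only an illustration, assuming a POSIX awk is installed; the three rules and their messages are made up, and the shell variable out exists only to hold the result.

```shell
# Hypothetical rules, only to show the order in which patterns are tried.
out=$(printf 'this was line 1\nthis was line 2\n' | awk '
/line/   { print "Pattern1 matched: " $0 }
/line 2/ { print "Pattern2 matched: " $0 }
/^this/  { print "Pattern3 matched: " $0 }
')
printf '%s\n' "$out"
```

The first line triggers the first and third rules, and the second line triggers all three, in the order the rules were written.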

Data Types

Awk has only two main data types: strings and numbers. Even then, awk converts freely between the two. A string can be interpreted as a number, in which case its value is converted to a numeric value. If the string does not contain a number, it is converted to 0.

Both can be assigned to variables in the actions part of your code with the = operator. Variables can be declared and used anywhere, at any time. Uninitialized variables can be used too, and their default value is the empty string: "" (which counts as 0 in a numeric context).
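A small sketch of these conversions, assuming a POSIX awk; the variable name unset is hypothetical and is deliberately never assigned.

```shell
out=$(awk 'BEGIN {
    print "23" + 1         # numeric string used as a number
    print "abc" + 1        # no number in the string, so it counts as 0
    print "3 apples" * 2   # only the leading numeric part is used
    print 4 "2"            # concatenation turns the number into a string
    print unset + 0        # an uninitialized variable acts as 0 ...
    print "[" unset "]"    # ... and as the empty string
}')
printf '%s\n' "$out"
```

Everything runs inside a BEGIN block, so no input file is needed.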

Finally, awk has arrays, and they are dynamic, one-dimensional associative arrays. Their syntax is var[key] = value. Awk can simulate multidimensional arrays, but it is a big hack either way.
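As a rough illustration of associative arrays, here is a sketch that counts words, assuming a POSIX awk. The names verbs, count, and grid are invented for the example. The last lines show the simulated multidimensional form, where awk joins the subscripts into a single key using the built-in SUBSEP separator.

```shell
out=$(awk 'BEGIN {
    n = split("GET POST GET DELETE GET", verbs, " ")
    for (i = 1; i <= n; i++) count[verbs[i]]++   # keys are created on first use
    has_put = ("PUT" in count)                   # membership test, does not create the key
    grid[1, 2] = "x"                             # stored under the single key 1 SUBSEP 2
    has_cell = ((1, 2) in grid)
    print "GET:", count["GET"]
    print "PUT seen:", has_put
    print "grid:", grid[1, 2], has_cell
}')
printf '%s\n' "$out"
```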

Patterns

The patterns that can be used fall into three main categories: regular expressions, Boolean expressions, and special patterns.

Regular Expressions and Boolean expressions

Awk regular expressions are your run-of-the-mill regexes. They are not PCRE under awk (gawk supports fancier features, but it depends on the implementation; check yours with awk --version), though they are plenty for most uses:

/admin/ { ... }     # any line that contains 'admin'
/^admin/ { ... }    # lines that begin with 'admin'
/admin$/ { ... }    # lines that end with 'admin'
/^[0-9.]+ / { ... } # lines beginning with a series of digits and periods
/(POST|PUT|DELETE)/ # lines that contain specific HTTP verbs

Note that patterns cannot capture specific groups to make them available in the actions part of the code. Patterns are only there to match content.

Boolean expressions are similar to what you would find in PHP or JavaScript. Specifically, the operators && ("and"), || ("or"), and ! ("not") are available, as in pretty much every C-like language. They can operate on any regular data.

Another feature shared with PHP and JavaScript is the comparison operator ==, which does loose (fuzzy) matching. The string "23" equals the number 23, so the expression "23" == 23 is true. The != operator is also available, along with the other common ones: >, <, >=, and <=.
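A short sketch of how these comparisons behave, assuming a POSIX awk. The input line is made up; $1 and $2 refer to its first and second whitespace-separated words. Each print shows 1 for true and 0 for false.

```shell
out=$(echo '23 abc' | awk '{
    print ("23" == 23)             # string constant against a number: equal
    print ($1 == 23.0)             # input that looks numeric is compared as a number
    print ($2 == 0)                # "abc" is not numeric, so this compares strings
    print ($1 != 24 && $1 <= 23)   # comparison operators combined with &&
}')
printf '%s\n' "$out"
```

The parentheses around each expression matter: without them, awk would read a bare > after print as output redirection.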

You can also mix them: Boolean expressions can be combined with regular expressions. The pattern /admin/ || debug == true is valid: it matches any line containing the word "admin", and it matches every line whenever the debug variable equals true. (Awk has no built-in true constant, so true here is an ordinary variable that the script has to set itself.)

Note that if you want to match a specific string or variable against a regular expression, ~ and !~ are the operators you want. Use them as string ~ /regex/ and string !~ /regex/.
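The sketch below combines a regular expression with a Boolean operator and uses ~ and !~ on specific strings. It assumes a POSIX awk; the three log lines are invented, and $2 and $3 stand for the second and third whitespace-separated words of each line.

```shell
out=$(printf '%s\n' \
    '00:34:23 GET /admin.html' \
    '00:34:24 POST /foo/bar.html' \
    '00:34:25 DELETE /admin.html' | awk '
/admin/ && !/GET/          { print "regex + boolean: " $0 }
$2 ~ /^(POST|PUT|DELETE)$/ { print "verb match: " $2 }
$3 !~ /admin/              { print "not admin: " $3 }
')
printf '%s\n' "$out"
```

The first line matches none of the rules, the second matches the last two, and the third matches the first two.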

Also note that all patterns are optional. An awk script that contains only the following:

{ ACTIONS }

simply runs ACTIONS for every line of input.

Special Patterns

There are a few special patterns in awk, but not many.

The first is BEGIN, which matches only before any line has been read from the input. This is the main place to initialize your script's variables and any other kind of state.

The other is END. As you might have guessed, it matches after all of the input has been processed. This lets you do cleanup work and some final output before exiting.
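A minimal sketch of BEGIN and END around a per-line rule, assuming a POSIX awk. The variable names total and lines are made up, and the input is three numbers, one per line.

```shell
out=$(printf '3\n4\n5\n' | awk '
BEGIN { total = 0; print "summing..." }        # once, before the first line is read
      { total += $0; lines++ }                 # for every input line
END   { print lines " lines, total " total }   # once, after the last line
')
printf '%s\n' "$out"
```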

The last kind of pattern is hard to classify. It sits somewhere between a variable and a special value, and is usually called a field, aptly enough.

Fields

Fields are best explained with a visual example:

# According to the following line
#
# $1         $2    $3
# 00:34:23   GET   /foo/bar.html
# \_____________  _____________/
#               $0

# Hack attempt?
/admin.html$/ && $2 == "DELETE" {
  print "Hacker alert!";
}

Fields are (by default) separated by whitespace. The field $0 represents the whole line as a string. The field $1 is the first chunk (before any whitespace), $2 is the one after it, and so on.

A fun fact (and something to avoid in most cases): you can modify a line by assigning a value to one of its fields. For example, if you run $0 = "HAHA THE LINE IS GONE" in one block, the following patterns will operate on the modified line instead of the original. The same goes for the other field variables.
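To make the field behaviour concrete, here is a small sketch, assuming a POSIX awk and an invented log line. It reads fields, overwrites one, and shows that later rules see the modified line.

```shell
out=$(echo '00:34:23 DELETE /admin.html' | awk '
{ print "verb:", $2; print "path:", $3 }
{ $2 = "GET" }                            # assigning to a field rebuilds $0
{ print "now:", $0 }                      # later rules see the modified line
$2 == "DELETE" { print "never printed" }  # no longer matches
')
printf '%s\n' "$out"
```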

Actions

There are a bunch of possible actions, but the most common and useful ones (in my experience) are:

{ print $0; }  # prints $0. In this case, equivalent to 'print' alone
{ exit; }      # ends the program
{ next; }      # skips to the next line of input
{ a=$1; b=$0 } # variable assignment
{ c[$1] = $2 } # variable assignment (array)

{ if (BOOLEAN) { ACTION }
  else if (BOOLEAN) { ACTION }
  else { ACTION }
}
{ for (i=1; i<x; i++) { ACTION } }
{ for (item in c) { ACTION } }

These will be the main tools in your awk toolbox, and you can use them whenever you work with files such as logs.
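Several of these actions can be seen working together in the sketch below. It assumes a POSIX awk; the input lines and the names size, sum, and keys are invented for illustration.

```shell
out=$(printf '%s\n' '# comment' 'a 1' 'b 20' 'a 3' | awk '
/^#/ { next }                        # skip comment lines; later rules never see them
{
    if ($2 >= 10)     { size = "big" }
    else if ($2 >= 2) { size = "medium" }
    else              { size = "small" }
    print $1, $2, size
    sum[$1] += $2                    # array assignment keyed by the first field
}
END {
    for (key in sum) keys++          # for-in visits every key, in unspecified order
    print "keys:", keys, "a:", sum["a"], "b:", sum["b"]
}')
printf '%s\n' "$out"
```

Because for-in makes no promise about ordering, the loop here only counts keys instead of printing them.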

Variables in awk are all global. Whatever variable you define in a given block is visible to every other block, for every line. This severely limits how large your awk scripts can grow before they turn into unmaintainable horrors. Keep your scripts as small as possible.

Functions

Functions can be called with the following syntax:

{ somecall($2) }

There is a somewhat limited set of built-in functions available, so I will simply point to the regular documentation for them.

User-defined functions are fairly simple as well:

# function arguments are call-by-value
function name(parameter-list) {
  ACTIONS; # same actions as usual
}

# return is a valid keyword
function add1(val) {
  return val+1;
}
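A quick sketch of a user-defined function in use, assuming a POSIX awk. It reuses the add1 name from above, but the body is changed slightly to show that modifying the parameter does not affect the caller's variable; x and the input are made up.

```shell
out=$(echo '41 7' | awk '
function add1(val) {
    val = val + 1      # changes only the local copy
    return val
}
{
    x = $1
    print add1(x)      # result of the call
    print x            # unchanged: scalars are passed by value
    print add1($2)     # fields can be passed directly
}')
printf '%s\n' "$out"
```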

Special Variables

In addition to regular variables (global, and usable anywhere), there is a set of special variables that act a bit like configuration entries:

BEGIN { # Can be modified by the user
  FS = ",";   # Field Separator
  RS = "\n";  # Record Separator (lines)
  OFS = " ";  # Output Field Separator
  ORS = "\n"; # Output Record Separator (lines)
}
{ # Can't be modified by the user
  NF          # Number of Fields in the current Record (line)
  NR          # Number of Records seen so far
  ARGV / ARGC # Script Arguments
}

I put the modifiable variables in BEGIN because that is where I prefer to override them. However, they can be overridden anywhere in the script, and the change then takes effect on the following lines.
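As a rough illustration of these variables, the sketch below reads comma-separated input and rewrites it with a different output separator. It assumes a POSIX awk; the data and the " | " separator are invented.

```shell
out=$(printf 'alice,30,admin\nbob,25\n' | awk '
BEGIN { FS = ","; OFS = " | " }
{
    $1 = $1                               # touching a field rebuilds $0 using OFS
    print NR ": " NF " fields -> " $0
}')
printf '%s\n' "$out"
```

The $1 = $1 assignment is a common idiom: it changes nothing in the field itself, but forces awk to rejoin the fields with OFS.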

Examples

These are the core elements of the awk language. I don't have many examples here, because I tend to use awk for quick one-off tasks.

But I still have a few script files that I carry around to handle and test things. One of my favorites processes Erlang crash dump files, which look like this:

=erl_crash_dump:0.3
Tue Nov 18 02:52:44 2014
Slogan: init terminating in do_boot ()
System version: Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Compiled: Fri Sep 19 03:23:19 2014
Taints:
Atoms: 12167
=memory
total: 19012936
processes: 4327912
processes_used: 4319928
system: 14685024
atom: 339441
atom_used: 331087
binary: 1367680
code: 8384804
ets: 382552
=hash_table:atom_tab
size: 9643
used: 6949
...
=allocator:instr
option m: false
option s: false
option t: false
=proc:<0.0.0>
State: Running
Name: init
Spawned as: otp_ring0:start/2
Run queue: 0
Spawned by: []
Started: Tue Nov 18 02:52:35 2014
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.3.0>, <0.7.0>, <0.6.0>]
Reductions: 29265
Stack+heap: 1598
OldHeap: 610
Heap unused: 656
OldHeap unused: 468
Memory: 18584
Program counter: 0x00007f42f9566200 (init:boot_loop/2 + 64)
CP: 0x0000000000000000 (invalid)
=proc:<0.3.0>
State: Waiting
...
=port:#Port<0.0>
Slot: 0
Connected: <0.3.0>
Links: <0.3.0>
Port controls linked-in driver: efile
=port:#Port<0.14>
Slot: 112
Connected: <0.3.0>
...

Running the script on such a dump produces the following result:

$ awk -f queue_fun.awk $PATH_TO_DUMP
MESSAGE QUEUE LENGTH: CURRENT FUNCTION
======================================
10641: io:wait_io_mon_reply/2
12646: io:wait_io_mon_reply/2
32991: io:wait_io_mon_reply/2
2183837: io:wait_io_mon_reply/2
730790: io:wait_io_mon_reply/2
80194: io:wait_io_mon_reply/2
...

This is a list of the functions running in Erlang processes whose mailboxes have grown very large. Here is the script:

# Parse Erlang crash dumps and correlate mailbox size to the currently running
# function.
#
# Once in the procs section of the dump, all processes are displayed with
# =proc:<0.M.N> followed by a list of their attributes, which include the
# message queue length and the program counter (what code is currently
# executing).
#
# Run as:
#
#    $ awk -v threshold=$THRESHOLD -f queue_fun.awk $CRASHDUMP
#
# Where $THRESHOLD is the smallest mailbox you want inspected. Default value
# is 1000.
BEGIN {
    if (threshold == "") {
        threshold = 1000 # default mailbox size
    }
    procs = 0 # are we in the =proc entries?
    print "MESSAGE QUEUE LENGTH: CURRENT FUNCTION"
    print "======================================"
}

# Only bother with the =proc: entries. Anything else is useless.
procs == 0 && /^=proc/ { procs = 1 } # entering the =proc entries
procs == 1 && /^=/ && !/^=proc/ { exit 0 } # we're done

# Message queue length: 1210
# 1       2     3       4
/^Message queue length: / && $4 >= threshold { flag=1; ct=$4 }
/^Message queue length: / && $4 < threshold  { flag=0 }

# Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)
# 1       2        3                  4                        5 6
flag == 1 && /^Program counter: / { print ct ":", substr($4,2) }
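To see the rules of this script at work without a real crash dump, here is a sketch that runs them inline on a tiny, made-up dump. It assumes a POSIX awk; the dump contents are hypothetical and heavily shortened, and the header-printing BEGIN block is left out to keep the output to the matches only.

```shell
# Hypothetical, heavily shortened dump; real dumps carry many more attributes.
dump='=memory
total: 19012936
=proc:<0.1.0>
Message queue length: 5
Program counter: 0x00007f42f9566200 (init:boot_loop/2 + 64)
=proc:<0.2.0>
Message queue length: 1500
Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)
=port:#Port<0.0>
Message queue length: 9999'

out=$(printf '%s\n' "$dump" | awk -v threshold=1000 '
procs == 0 && /^=proc/ { procs = 1 }             # entering the =proc entries
procs == 1 && /^=/ && !/^=proc/ { exit 0 }       # first non-proc section: stop
/^Message queue length: / && $4 >= threshold { flag = 1; ct = $4 }
/^Message queue length: / && $4 < threshold  { flag = 0 }
flag == 1 && /^Program counter: / { print ct ":", substr($4, 2) }
')
printf '%s\n' "$out"
```

Only the second process is reported: the first one is below the threshold, and the script exits as soon as it reaches the =port section, so the last line is never examined.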

Did you follow along? If you did, you now understand awk. Congratulations!
