Getting Started with Shell awk

Source: Internet
Author: User

This document is referenced from http://www.cnblogs.com/zhuyp1015/archive/2012/07/11/2586985.html

awk: a useful data-processing tool

awk is also a great data-processing tool. sed is usually used to process an entire line, whereas awk prefers to split a line into several fields (columns) and work on those. This makes awk well suited to small data-processing tasks. awk is typically invoked like this:

$ awk 'condition1 {action1} condition2 {action2} ...' filename

awk is followed by a pair of single quotes, inside which braces {} enclose the action to perform on the data. awk can process the files named after the program, or it can read the standard output of a previous command. As stated earlier, awk mainly works on the fields within each line, and the default field separator is a space or a [tab]. Take the following text as an example, using head to display the data.

File name: log

Apple 6 30
Pear 5 25
Waterlemon 10 2
Orange 2 100

To extract the fruit name (the first column) and the unit price (the third column), joined by ":", the command looks like this:

$ head log | awk '{print $1 ":" $3}'
Apple:30
Pear:25
Waterlemon:2
Orange:100

The example above shows awk's most commonly used action: listing field data with the print function. Input fields are separated by whitespace by default, and here the output is joined with an explicitly specified ":". Because every line is to be processed, no "condition type" is needed.
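As noted earlier, awk can read either a file named on the command line or the output of a previous command. The sketch below runs the same program both ways. The temporary file created with mktemp and the variable names from_file and from_pipe are assumptions made only for illustration.

```shell
# Create a throwaway copy of the sample data (hypothetical temp file).
log=$(mktemp)
printf 'Apple 6 30\nPear 5 25\nWaterlemon 10 2\nOrange 2 100\n' > "$log"

# 1) awk reads the file named as its argument.
from_file=$(awk '{print $1 ":" $3}' "$log")

# 2) awk reads the standard output of the previous command.
from_pipe=$(head "$log" | awk '{print $1 ":" $3}')

printf '%s\n' "$from_file"
rm -f "$log"
```

Both variables should hold the same four lines, since the awk program is identical and only the input source differs.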

The example above also shows that every field in a line has a variable name: $1, $2, and so on. Here the fruit name is $1 because it is the first column, and the unit price is $3 because it is the third column. There is one more variable, $0, which stands for the whole line. In the example above, $0 for the first line is the entire "Apple ..." line. So, for the four lines above, the whole awk process is:

    1. Read the first line and fill the variables $0, $1, $2, ... with its data;
    2. Based on each "condition type", decide whether the following "action" needs to run;
    3. Work through all the actions and condition types;
    4. If there are more lines of data, repeat the steps above until all the data has been read.
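The steps above can be observed directly. In this minimal sketch (the two sample lines and the message text are illustrative assumptions), the first action has no condition and runs for every line, while the second runs only when its condition holds:

```shell
flow=$(printf 'Apple 6 30\nPear 5 25\n' |
  awk '{ print "line " NR ": " $0 }
       $3 < 29 { print "  condition matched for " $1 }')
printf '%s\n' "$flow"
```

Each input line is read, its fields are filled in, both condition/action pairs are checked in order, and only then does awk move on to the next line.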

From these steps you can see that awk processes one line at a time, and that the field is its smallest unit of processing. So how does awk know how many lines the data has, and how many columns each line has? That is where awk's built-in variables come in.

Variable name    Meaning
NF               Total number of fields in the current line ($0)
NR               Number of the line awk is currently processing
FS               Current field separator; the default is whitespace

Let's continue with the head log example above. Suppose I want to:

    • List the fruit name for each line (that is, $1);
    • List the number of the line currently being processed (the NR variable in awk);
    • Show how many columns that line has (the NF variable in awk).

Tips:
  Note that all of awk's actions are enclosed in single quotes. Since both single and double quotes must be paired, any literal (non-variable) text inside the awk program, including the printf format strings mentioned in the previous section, must be enclosed in double quotes, because the single quotes are already used to delimit the awk command itself.

You can do this:

$ head log | awk '{print $1 " lines:" NR " columns:" NF}'
Apple lines:1 columns:3
Pear lines:2 columns:3
Waterlemon lines:3 columns:3
Orange lines:4 columns:3

# Note that awk variables such as NR and NF must be written in uppercase and do not need a leading $.

Is the difference between NR and NF clear now? Next, let's look at the so-called "condition type".

Note: the whole line is represented by $0, and $1 represents the first field.

awk's logical operators

Since conditions are needed, some logical operators are naturally needed as well, such as the following:

Operator    Meaning
>           Greater than
<           Less than
>=          Greater than or equal to
<=          Less than or equal to
==          Equal to
!=          Not equal to

The "==" symbol deserves special attention, because:

    • In a logical operation, that is, a test such as greater than, less than or equal to, equality is written as "==";
    • When you assign a value directly, such as setting a variable, a single "=" is used.
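The difference between the two symbols can be seen in a small sketch. The sample data and the variable name price are assumptions chosen for illustration:

```shell
data='Apple 6 30
Pear 5 25
Orange 2 100'

# "==" compares: print only the line whose second field equals 5.
compared=$(printf '%s\n' "$data" | awk '$2 == 5 {print $1}')

# "=" assigns: store the third field in a variable, then print it.
assigned=$(printf '%s\n' "$data" | awk '{price = $3; print $1, price}')

printf '%s\n' "$compared" "$assigned"
```

Writing "$2 = 5" in the condition position would instead overwrite the second field on every line, which is rarely what is intended.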

Now let's put a logical test to use. Take the log file above as an example, but assume its fields are separated by colons (for example, Apple:6:30). Suppose I want to find the lines whose third column is less than 29 and list only the fruit name and the third column, separated by a colon. You might try the following:

$ head log | awk '{FS=":"} $3 < 29 {print $1 ":" $3}'
Apple:6:30:
Pear:25
Waterlemon:2

But why does the first line show up, and in the wrong format? When awk reads the first line, the variables $1, $2, ... are still split on whitespace by default, so although we set FS=":", it only takes effect from the second line onward. What can we do? We can preset awk's variables with the BEGIN keyword, like this:

$ head log | awk 'BEGIN {FS=":"} $3 < 29 {print $1 ":" $3}'
Pear:25
Waterlemon:2
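The effect of BEGIN can be reproduced without an existing file. The sketch below is a minimal illustration that assumes a colon-separated version of the sample data; the variable names late and early are hypothetical:

```shell
data='Apple:6:30
Pear:5:25
Waterlemon:10:2
Orange:2:100'

# FS is set in an ordinary action, after the first line has already been read.
late=$(printf '%s\n' "$data" | awk '{FS=":"} $3 < 29 {print $1 ":" $3}')

# FS is set in BEGIN, so it is in effect before the first line is read.
early=$(printf '%s\n' "$data" | awk 'BEGIN {FS=":"} $3 < 29 {print $1 ":" $3}')

printf 'without BEGIN:\n%s\nwith BEGIN:\n%s\n' "$late" "$early"
```

With BEGIN, only the Pear and Waterlemon lines should pass the test. The result without BEGIN depends on how the awk implementation splits the first line, which is why the two are printed side by side rather than assumed.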

Besides BEGIN, there is also END. What if you want to use awk for calculations? In the example below, suppose I have a payroll table in a file called pay, which looks something like this:

Name 1st 2nd 3th
Vbird 23000 24000 25000
Dmtsai 21000 20000 23000
Bird2 43000 42000 41000

How can awk calculate each person's total? I also want the output to be formatted. We can reason like this:

    • The first line is only a header, so no total is added for it (handled when NR==1);
    • From the second line onward, the totals are calculated (handled when NR>=2).

$ cat pay | \
> awk 'NR==1 {printf "%10s%10s%10s%10s%10s\n", $1, $2, $3, $4, "Total"}
> NR>=2 {total = $2 + $3 + $4
> printf "%10s%10d%10d%10d%10.2f\n", $1, $2, $3, $4, total}'
      Name       1st       2nd       3th     Total
     Vbird     23000     24000     25000  72000.00
    Dmtsai     21000     20000     23000  64000.00
     Bird2     43000     42000     41000 126000.00

A few important points about the example above:

    • Separating awk commands: when an action, that is, the content inside {}, needs several commands, separate them with a semicolon ";" or simply with the [Enter] key, as in the example above.
    • In a logical operation, "equal to" must be written with two equals signs, "==".
    • When formatting output, be sure to include \n in the printf format string to produce a line break.
    • Unlike bash shell variables, awk variables are used directly, without a $ prefix.
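These points can be tried out in a minimal sketch, assuming pay data that matches the output shown earlier. It separates commands with semicolons on one line, uses the awk variables total and sum without a $ prefix, and ends each printf format with \n. It also uses an END block, which the text mentions but does not demonstrate, to print a grand total after the last line. The names sum, pay and report are illustrative choices, not part of the original.

```shell
pay='Name 1st 2nd 3th
Vbird 23000 24000 25000
Dmtsai 21000 20000 23000
Bird2 43000 42000 41000'

report=$(printf '%s\n' "$pay" |
  awk 'NR >= 2 {total = $2 + $3 + $4; sum += total; printf "%s %d\n", $1, total}
       END {printf "All %d\n", sum}')
printf '%s\n' "$report"
```

The END action runs once, after all input lines have been processed, which makes it a natural place for summary output.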

awk can take care of a lot of routine work for us. Its output is often formatted with printf, so it helps to be somewhat familiar with printf. In addition, awk's action {} also supports if (condition). For example, the command above can be rewritten like this:

$ cat pay | \
> awk '{if (NR==1) printf "%10s%10s%10s%10s%10s\n", $1, $2, $3, $4, "Total"}
> NR>=2 {total = $2 + $3 + $4
> printf "%10s%10d%10d%10d%10.2f\n", $1, $2, $3, $4, total}'

Compare the two commands carefully to see how they differ, and learn both syntaxes. I personally prefer the first one, because it is more consistent.
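One way to convince yourself that the two syntaxes are equivalent is to run both on the same input and compare the results. This is a minimal sketch with a shortened, assumed version of the pay data and simplified print actions; the variable names by_pattern and by_if are hypothetical:

```shell
pay='Name 1st 2nd 3th
Vbird 23000 24000 25000'

# Form 1: the condition is written as a pattern in front of the action.
by_pattern=$(printf '%s\n' "$pay" |
  awk 'NR == 1 {print "header:", $1}
       NR >= 2 {print "row:", $1}')

# Form 2: the condition is written as if (...) inside the action.
by_if=$(printf '%s\n' "$pay" |
  awk '{if (NR == 1) print "header:", $1; if (NR >= 2) print "row:", $1}')

printf '%s\n' "$by_pattern"
```

Both variables should contain the same two lines, so the choice between the forms is a matter of style.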

Reference: VBird's Linux book (鸟哥的 Linux 私房菜), Chapter 12, regular expressions and file format processing
