Small usage: use awk to randomly extract n rows from a file (shell)

Source: Internet
Author: User

Http://www.cnblogs.com/chenhuan001/p/6297615.html


Brief introduction

awk is a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then performs various kinds of analysis on those fields.

awk has three variants: awk, nawk, and gawk. Unless stated otherwise, awk generally refers to gawk, the GNU version of awk.

awk takes its name from the initials of its creators' surnames: Alfred Aho, Peter Weinberger, and Brian Kernighan. awk is in fact a language in its own right, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language." It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

How to use

awk '{pattern + action}' {filenames}

Although the operations can be complex, the syntax is always the same: pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) do not have to appear in every program, but they are used to group a series of instructions that belong to a particular pattern. pattern is the regular expression to be matched, enclosed in slashes.
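A minimal sketch of the three rule shapes just described (pattern only, action only, pattern plus action). The sample lines and the variable names are made up for illustration and are not from the original article:

```shell
# Hypothetical sample input; not taken from a real /etc/passwd.
sample='root:x:0:0
daemon:x:1:1
rootkit:x:2:2'

# Pattern only: lines matching /root/ are printed unchanged (the default action).
pattern_only=$(printf '%s\n' "$sample" | awk '/root/')

# Action only: the action runs for every input line.
action_only=$(printf '%s\n' "$sample" | awk '{print "line"}')

# Pattern + action: the action runs only for lines that start with "daemon".
both=$(printf '%s\n' "$sample" | awk '/^daemon/ {print "found"}')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

Note that /root/ also matches the made-up "rootkit" line, because an unanchored pattern matches anywhere in the line.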

The most basic function of the awk language is to browse and extract information from a file or string according to specified rules; once awk has extracted the information, other text operations can be performed on it. A complete awk script is typically used to format the information in a text file.

Typically, awk works on a file one line at a time. awk receives each line of the file and then executes the corresponding commands to process the text.

Invoking awk

There are three ways to invoke awk:

1. Command line
awk [-F field-separator] 'commands' input-file(s)
Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.
In awk, each item on a line, delimited by the field separator, is called a field. If -F is not given, the default field separator is whitespace.

2. Shell script
Put all the awk commands into a file and make that file executable, with the awk command interpreter named on the first line of the script; the program can then be run by typing the script name.
This is the equivalent of the first line of a shell script: #!/bin/sh
which is replaced by: #!/bin/awk -f

3. Put all the awk commands into a separate file, and then call:
awk -f awk-script-file input-file(s)
Here the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
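As a rough illustration of the first and third forms, the sketch below runs the same one-line program inline and from a script file. The temporary file created with mktemp and the input "a:b:c" are assumptions made for the example:

```shell
# Hypothetical script file; its name comes from mktemp and is illustrative only.
script=$(mktemp)
printf '%s\n' '{ print $2 }' > "$script"

# Form 1, command line: the program is given inline, -F sets the field separator.
inline=$(printf 'a:b:c\n' | awk -F ':' '{ print $2 }')

# Form 3, script file: -f loads the same program from a file.
from_file=$(printf 'a:b:c\n' | awk -F ':' -f "$script")

rm -f "$script"
echo "$inline $from_file"
```

Both invocations print the second field. The second form (an executable script with a #! line) behaves the same way, but the interpreter path differs between systems, so it is left out of this sketch.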

This chapter focuses on the command-line approach.

Getting Started example

Suppose the output of last -n 5 is as follows:

[root@www ~]# last -n 5 <== show only the first five lines
root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in
root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)
root     pts/1   192.168.1.100  Mon Feb  9 11:41 - 18:30  (06:48)
dmtsai   pts/1   192.168.1.100  Mon Feb  9 11:41 - 11:41  (00:00)
root     tty1                   Fri Sep  5 14:09 - 14:10  (00:01)

To display only the accounts of the last five logins:

# last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root

The awk workflow is as follows: read one record, delimited by the '\n' newline character, then split the record by the specified field separator and fill in the fields. $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is a space or a [tab], so $1 is the logged-in user, $3 is the logged-in user's IP, and so on.
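The field splitting described above can be sketched on a single made-up record shaped like a line of last output (the record and variable names are illustrative assumptions):

```shell
# Hypothetical record resembling one line of `last` output.
line='root     pts/1   192.168.1.100  Tue Feb 10 11:21'

whole=$(printf '%s\n' "$line" | awk '{print $0}')   # the entire record
user=$(printf '%s\n' "$line" | awk '{print $1}')    # first field
ip=$(printf '%s\n' "$line" | awk '{print $3}')      # third field
count=$(printf '%s\n' "$line" | awk '{print NF}')   # number of fields

echo "user=$user ip=$ip fields=$count"
```

Runs of spaces count as a single separator under the default splitting, which is why the third field is the IP address even though the columns are padded.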

To show only the accounts in /etc/passwd:

# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys

This is an example of awk + action, where the action {print $1} is executed for every line.

-F specifies the field separator, here ':'.

To display only the /etc/passwd accounts and their corresponding shells, separated by a tab:

# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh

To display the /etc/passwd accounts and their corresponding shells separated by a comma, with a header line "name,shell" before all rows and a final line "blue,/bin/nosh" after the last row:

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

The awk workflow is like this: first BEGIN is executed; then the file is read one record at a time, delimited by the \n newline character; each record is split by the specified field separator and the fields are filled in, with $0 representing the whole record, $1 the first field, and $n the nth field; then the action for the matching pattern is executed. Processing then moves on to the second record, and so on. Once all records have been read, the END block is executed.
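The BEGIN, per-record, END order can be sketched as follows. The two passwd-style lines are made-up sample data so that the result does not depend on the host system:

```shell
# Hypothetical passwd-style input.
data='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

report=$(printf '%s\n' "$data" | awk -F ':' '
    BEGIN { print "start" }                 # runs once, before any input is read
          { print NR ": " $1 }              # runs once per record
    END   { print "end, " NR " records" }   # runs once, after the last record
')
printf '%s\n' "$report"
```

In the END block NR still holds the number of the last record read, which is why it can be used as a total.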

Search /etc/passwd for all lines containing the keyword root:

# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of pattern usage: only lines that match the pattern (here, root) have the action executed. No action is specified, so the default is to print the whole line.

The search supports regular expressions. For example, to find lines that start with root: awk -F: '/^root/' /etc/passwd

Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:

# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash

Here the action {print $7} is specified.

awk Built-in variables

awk has a number of built-in variables that hold environment information. They can be changed; some of the most commonly used ones are listed below.

ARGC               number of command-line arguments
ARGV               array of command-line arguments
ENVIRON            array of system environment variables
FILENAME           name of the file awk is currently reading
FNR                record number within the current file
FS                 input field separator, equivalent to the -F command-line option
NF                 number of fields in the current record
NR                 number of records read so far
OFS                output field separator
ORS                output record separator
RS                 input record separator

In addition, the $0 variable refers to the entire record, $1 is the first field of the current line, $2 is the second field of the current line, and so on.
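A small sketch of how a few of these variables interact, using made-up input. FS is set inside BEGIN here instead of with -F; that choice is only for illustration:

```shell
# Hypothetical colon-separated input with a different field count per line.
data='a:b:c
d:e'

vars=$(printf '%s\n' "$data" | awk '
    BEGIN { FS = ":"; OFS = "|" }   # input separator and output separator
    { print NR, NF, $1, $NF }       # record number, field count, first and last field
')
printf '%s\n' "$vars"
```

$NF is the last field of the record, because NF holds the field count.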

Print statistics for /etc/passwd: the file name, the line number of each line, the number of columns in each line, and the full content of the line:

# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print can make the code more concise and easier to read:

awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

Print and printf

awk provides two output functions, print and printf.

The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. The comma stands for the output field separator, which is a space by default.

The printf function is basically the same as printf in the C language and formats strings. When the output is complex, printf works better and the code is easier to understand.
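The difference between comma-separated arguments, concatenation, and printf can be sketched with a few BEGIN-only programs (no input file is needed; the values are arbitrary):

```shell
# Comma-separated arguments are joined with OFS (a space by default).
with_comma=$(awk 'BEGIN { print "a", "b" }')

# Without commas the values are concatenated directly.
no_comma=$(awk 'BEGIN { print "a" "b" }')

# printf takes a C-style format string and adds no newline of its own.
formatted=$(awk 'BEGIN { printf("%-5s|%03d\n", "ab", 7) }')

printf '%s\n' "$with_comma" "$no_comma" "$formatted"
```

Here %-5s left-justifies the string in a five-character column and %03d pads the number with zeros to three digits.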

AWK programming

Variables and Assignments

In addition to the built-in variables, awk also supports user-defined variables.

The following counts the accounts in /etc/passwd:

awk '{count++; print $0;} END {print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40

count is a user-defined variable. In the earlier examples each action {} contained only one print, but print is just a statement, and an action {} can hold multiple statements separated by semicolons (;).

count is not initialized here. Although it defaults to 0, the proper approach is to initialize it to 0:

awk 'BEGIN {count=0; print "[start]user count is", count} {count=count+1; print $0;} END {print "[end]user count is", count}' /etc/passwd
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40

Count the number of bytes occupied by the files in a folder:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is", size}'
[end]size is 8657198

To display the size in M:


[end]size is 8.25889 M
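The command for the M output does not appear in the text. A minimal sketch, assuming it is the same byte-count program with the total divided by 1024 twice in the END block; the ls -l style listing below is made up so that the result does not depend on a real directory:

```shell
# Hypothetical `ls -l`-style lines; the fifth column is the size in bytes.
listing='-rw-r--r-- 1 user user 1048576 Jan  1 00:00 a.bin
-rw-r--r-- 1 user user 524288 Jan  1 00:00 b.bin'

size_mb=$(printf '%s\n' "$listing" | awk '
    BEGIN { size = 0 }
    { size = size + $5 }                                   # add up the size column
    END { print "[end]size is", size / 1024 / 1024, "M" }  # bytes -> M
')
echo "$size_mb"
```

On a real directory the printf would be replaced by ls -l. The leading "total" line that ls prints has no fifth field, so it adds 0 to the sum.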

Note that this count does not include the contents of subdirectories.

Conditional statement

The conditional statements in awk are borrowed from the C language and take the following forms:

if (expression) {
    statement;
    statement;
    ... ...
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
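As an illustration of the if / else if / else form, the sketch below labels three made-up numbers. The thresholds and label names are arbitrary assumptions:

```shell
# Three hypothetical scores, one per line.
grades=$(printf '%s\n' 95 72 40 | awk '{
    if ($1 >= 90) {
        label = "high"
    } else if ($1 >= 60) {
        label = "mid"
    } else {
        label = "low"
    }
    print $1, label
}')
printf '%s\n' "$grades"
```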

Count the number of bytes occupied by the files in a folder, filtering out entries whose size is 4096 (typically folders):


[end]size is 8.22339 M
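The command for this step is also missing from the text. A minimal sketch, assuming it wraps the addition in an if that skips entries whose size is exactly 4096; the listing is again made-up sample data:

```shell
# Hypothetical `ls -l`-style lines: one directory entry and two files.
listing='drwxr-xr-x 2 user user 4096 Jan  1 00:00 dir
-rw-r--r-- 1 user user 2097152 Jan  1 00:00 a.bin
-rw-r--r-- 1 user user 1048576 Jan  1 00:00 b.bin'

filtered=$(printf '%s\n' "$listing" | awk '
    BEGIN { size = 0 }
    { if ($5 != 4096) { size = size + $5 } }               # skip 4096-byte entries
    END { print "[end]size is", size / 1024 / 1024, "M" }
')
echo "$filtered"
```

Filtering on the size is only a heuristic: a regular file that happens to be 4096 bytes long would be skipped as well.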

Loop statement

The loop statements in awk are also borrowed from the C language. awk supports while, do/while, for, break, and continue, with the same semantics as in C.
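A short sketch exercising each of these constructs inside a BEGIN block (no input is needed; the numbers are arbitrary):

```shell
loops=$(awk 'BEGIN {
    # for: sum the integers 1..5
    sum = 0
    for (i = 1; i <= 5; i++) sum += i

    # while with break and continue: collect the odd numbers below 8
    odds = ""
    n = 0
    while (1) {
        n++
        if (n >= 8) break
        if (n % 2 == 0) continue
        odds = (odds == "" ? n : odds " " n)
    }

    # do/while: the body always runs at least once
    runs = 0
    do { runs++ } while (runs < 0)

    print "sum=" sum
    print "odds=" odds
    print "runs=" runs
}')
printf '%s\n' "$loops"
```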

Array

Because array subscripts in awk can be numbers or strings, a subscript is often called a key. Both keys and values are stored internally in a hash table of key/value pairs. Because a hash table is not stored in order, the contents of an array may not be displayed in the order you expect. Like variables, arrays are created automatically when they are first used, and awk determines automatically whether they hold numbers or strings. In general, arrays in awk are used to gather information from records: to calculate totals, to count words, to track how many times a pattern has been matched, and so on.

Show the accounts in /etc/passwd:

awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;} END {for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......

This uses a for loop to traverse the array.
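Arrays with string subscripts, and the unordered traversal mentioned earlier, can be sketched by counting login shells. The passwd-style lines are made up, and the output is piped through sort because the order of a for (key in array) loop is unspecified:

```shell
# Hypothetical passwd-style input.
data='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

counts=$(printf '%s\n' "$data" | awk -F ':' '
    { shells[$7]++ }                              # string key, created on first use
    END { for (s in shells) print s, shells[s] }  # traversal order is unspecified
' | LC_ALL=C sort)
printf '%s\n' "$counts"
```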

There is much more to awk programming; only simple, common usage is listed here. For more, see http://www.gnu.org/software/gawk/manual/gawk.html
