Common CentOS Shell Skills: awk Programming

Source: Internet
Author: User

Awk programming:

1. Variables:
In awk, variables can be used without being declared; a variable is defined when a value is first assigned to it. A variable can hold a number or a string. The value of an uninitialized variable is 0 or the empty string "", depending on the context in which it is used. The following table lists the assignment operators:

Operator   Meaning       Equivalent form
=          a = 5         a = 5
+=         a = a + 5     a += 5
-=         a = a - 5     a -= 5
*=         a = a * 5     a *= 5
/=         a = a / 5     a /= 5
%=         a = a % 5     a %= 5
^=         a = a ^ 5     a ^= 5

/> awk '$1 ~ /Tom/ {Wage = $2 * $3; print Wage}' filename
This command reads records from the file, finds those whose first field matches Tom, assigns the product of the second and third fields to the user-defined variable Wage, and finally prints the variable.

/> awk '{$5 = 1000 * $3 / $2; print}' filename
In the preceding command, awk evaluates the expression 1000 * $3 / $2 and assigns the result to $5. If the fifth field does not exist, it is created; if it does exist, its original value is overwritten.

You can also define variables on the command line, as follows:
/> awk -F: -f awkscript month=4 year=2011 filename
Here, month and year are user-defined variables assigned the values 4 and 2011 respectively. No spaces are allowed around the = sign in such an assignment. These variables can be used directly in the awk script, just like variables defined in the script itself.
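
One detail of command-line variables is easy to miss: an assignment written as an operand takes effect only when awk reaches it in the argument list, so it is not yet visible in a BEGIN block, whereas the standard -v option assigns before BEGIN runs. The following is a minimal sketch of that difference; the one-line input and the variable names operand_out and v_out are assumptions made for illustration.

```shell
prog='BEGIN { print "BEGIN sees month=[" month "]" }
      { print "main sees month=[" month "]" }'

# Operand assignment: applied when awk reaches it in the argument list,
# which is after the BEGIN block has already run. "-" means standard input.
operand_out=$(printf 'x\n' | awk "$prog" month=4 -)

# -v assignment: applied before the BEGIN block runs.
v_out=$(printf 'x\n' | awk -v month=4 "$prog")

printf '%s\n---\n%s\n' "$operand_out" "$v_out"
```

If a script needs the value inside BEGIN, -v is the safer choice; operand assignments are useful when different input files need different values.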

In addition, awk provides a set of built-in variables (all of their names are in uppercase), as shown in the following list:

Variable      Meaning
ARGC          Number of command-line arguments.
ARGIND        Index in ARGV of the file currently being processed.
ARGV          Array of command-line arguments.
CONVFMT       Conversion format for numbers.
ENVIRON       Array containing the current environment variables, passed in from the shell.
ERRNO         Description of the error when a redirection fails while reading with getline or when calling close.
FIELDWIDTHS   List of field widths, used instead of FS when records are split into fixed-width fields.
FILENAME      Name of the current input file.
FNR           Record number in the current file.
FS            Input field separator. The default is a space.
IGNORECASE    When non-zero, turns off case sensitivity in regular expressions and string operations.
NF            Number of fields in the current record.
NR            Number of records read so far.
OFMT          Output format for numbers.
OFS           Output field separator.
ORS           Output record separator.
RLENGTH       Length of the string matched by the match function.
RS            Input record separator.
RSTART        Offset of the string matched by the match function.
SUBSEP        Subscript separator.

/> cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Mary Black:1683:9/23/44:336500

/> awk -F: '{IGNORECASE = 1}; $1 == "mary adams" {print NR, $1, $2, $NF}' employees2
2 Mary Adams 5346 28765
/> awk -F: '$1 == "mary adams" {print NR, $1, $2, $NF}' employees2
(no output)
When the built-in variable IGNORECASE (a gawk extension) is non-zero, case sensitivity is turned off for string operations and regular expressions. Here, "mary adams" matches the "Mary Adams" record in the file, and the record number and the first, second, and last fields are printed. Note that NF is the number of fields in the current record, so $NF is the value of the last field.

awk also provides BEGIN and END blocks. The BEGIN block is executed before awk processes any line of input. In fact, a BEGIN block can be tested without any input file, because awk does not read any input until the BEGIN block has finished. The BEGIN block is usually used to change the values of built-in variables such as OFS, RS, and FS. It can also be used to initialize user-defined variables or to print an output header.
/> awk 'BEGIN {FS = ":"; OFS = "\t"; ORS = "\n\n"} {print $1, $2, $3}' filename
In the above example, before processing the file awk sets the input field separator (FS) to a colon, the output field separator (OFS) to a tab, and the output record separator (ORS) to two newlines. Multiple statements in the BEGIN action are separated by semicolons.
Unlike BEGIN, the action in the END block is executed after all input has been processed.
/> awk 'END {print "The number of the records is " NR}' filename
After processing the input file, awk executes the action in the END block. In the above example, NR holds the record number of the last record read.

/> awk '/Mary/ {count++} END {print "Mary was found " count " times."}' employees2
Mary was found 2 times.

/> cat testfile
northwest   NW   Charles Main        3.0   .98   3   34
western     WE   Sharon Gray         5.3   .97   5   23
southwest   SW   Lewis Dalsass       2.7   .8    2   18
southern    SO   Suan Chin           5.1   .95   4   15
southeast   SE   Patricia Hemenway   4.0   .7    4   17
eastern     EA   TB Savage           4.4   .84   5   20
northeast   NE   AM Main Jr.         5.1   .94   3   13
north       NO   Margot Weber        4.5   .89   5   9
central     CT   Ann Stephen         5.7   .94   5   13

/> awk '/^north/ {count += 1; print count}' testfile    # If the record starts with north, the variable count is created, incremented by 1, and printed.
1
2
3

# Here, only the first three records are processed. The seventh field is first assigned to the variable x and then decremented, and both values are printed.
/> awk 'NR <= 3 {x = $7--; print "x = " x ", $7 = " $7}' testfile
x = 3, $7 = 2
x = 5, $7 = 4
x = 2, $7 = 1

# Print the records whose NR (record number) value is between 2 and 5.
/> awk 'NR == 2, NR == 5 {print "The record number is " NR}' testfile
The record number is 2
The record number is 3
The record number is 4
The record number is 5

# Print the values of the environment variables USER and HOME. The environment is passed to the awk program by its parent process, the shell.
/> awk 'BEGIN {print ENVIRON["USER"], ENVIRON["HOME"]}'
root /root

# The built-in variable OFS is assigned a new value in the BEGIN block, so the output field separator becomes a tab (\t). The match is tested against the third field, because no record in testfile begins with Sharon.
/> awk 'BEGIN {OFS = "\t"}; $3 ~ /^Sharon/ {print $1, $2, $7}' testfile
western	WE	5

# Count the records in the input file that start with north, and print the count in the END block.
/> awk '/^north/ {count++}; END {print count}' testfile
3

2. Redirection:
The shell's output redirection operator ">" can be used in an action statement to redirect awk output. With ">", the file is truncated when it is first opened and then stays open until it is explicitly closed or the awk program terminates, so the output of later print statements is appended after the earlier output. The operator ">>" opens a file without clearing its existing content; the redirected output is appended to the end of the file.
/> awk '$4 >= 70 {print $1, $2 > "passing_file"}' filename    # Note that the file name must be enclosed in double quotation marks.
# Compare the two cat results to see the difference between > and >>.
/> awk '/north/ {print $1, $3, $4 > "districts"}' testfile
/> cat districts
northwest Charles Main
northeast AM Main
north Margot Weber
/> awk '/south/ {print $1, $3, $4 >> "districts"}' testfile
/> cat districts
northwest Charles Main
northeast AM Main
north Margot Weber
southwest Lewis Dalsass
southern Suan Chin
southeast Patricia Hemenway
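
The truncate-once behaviour of ">" and the append behaviour of ">>" can be checked with a small experiment. This is only a sketch: the scratch file created with mktemp stands in for the districts file, and the single-letter input lines are made up for the demonstration.

```shell
tmp=$(mktemp)    # scratch file standing in for "districts"

# One run, two print statements with ">": the file is truncated once, when it
# is first opened, and both lines are kept.
printf 'a\nb\n' | awk -v out="$tmp" '{ print $1 > out }'
same_run=$(cat "$tmp")

# A second run with ">" reopens the file and discards the old content.
printf 'c\n' | awk -v out="$tmp" '{ print $1 > out }'
truncated=$(cat "$tmp")

# A run with ">>" keeps the old content and appends to it.
printf 'd\n' | awk -v out="$tmp" '{ print $1 >> out }'
appended=$(cat "$tmp")

rm -f "$tmp"
printf '[%s] [%s] [%s]\n' "$same_run" "$truncated" "$appended"
```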


In awk, the getline function is used to redirect input. getline obtains input from standard input, from a pipe, or from a file other than the one currently being processed. It reads the next line of input and, depending on the form used, sets built-in variables such as NF, NR, and FNR. getline returns 1 if a record was read and 0 at end of file. If an error occurs, for example if the file cannot be opened, it returns -1.
/> awk 'BEGIN {"date" | getline d; print d}'
Tue Nov 15 15:31:42 CST 2011
In the BEGIN block of the preceding example, the shell command date is executed and its output is piped to getline, which assigns it to the user-defined variable d; d is then printed.

/> awk 'BEGIN {"date" | getline d; split(d, mon); print mon[2]}'
Nov
In the preceding example, the output of the date command is piped to getline and assigned to the variable d. The built-in function split then splits d into the array mon, and the second element of mon is printed.

/> awk 'BEGIN {while ("ls" | getline) print}'
employees
employees2
testfile
The output of the ls command is passed to getline as input. On each iteration of the loop, getline reads one line of the ls output and prints it to the screen.

/> awk 'BEGIN {printf "What is your name? "; \
    getline name < "/dev/tty"} \
    $1 ~ name {print "Found " name " on line", NR "."} \
    END {print "See ya, " name "."}' employees2
What is your name? Mary
Found Mary on line 2.
Found Mary on line 4.
See ya, Mary.
In the preceding example, awk first prints "What is your name? " and waits for the user to type a reply on /dev/tty, which is assigned to the variable name. It then reads the records of the input file, prints a message for each record whose first field matches name, and finally prints the closing message in the END block.

/> awk 'BEGIN {while ((getline < "/etc/passwd") > 0) lc++; print lc}'
32
awk reads the /etc/passwd file line by line, incrementing the counter lc by 1 for each line until the end of the file is reached, and then prints lc. The value of lc is the number of lines in /etc/passwd.
awk implementations may limit how many pipes can be open at the same time (older versions allowed only one), so a pipe should be closed before the next one is opened. To close a pipe, pass to close the same double-quoted string that appears on the right of the pipe symbol. A pipe that is not closed remains open until awk exits.
/> awk '{print $1, $2, $3 | "sort -4 +1 -2 +0 -1"} END {close("sort -4 +1 -2 +0 -1")}' filename
In the preceding example, the close call in the END block explicitly closes the sort pipe. Note that the string passed to close must be exactly the same as the one used when the pipe was opened; otherwise the pipe is not closed, and any output produced by the END block would be sorted together with the earlier output.
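
The effect of close can also be seen on an input pipe. In the sketch below (the command string "echo hello" and the variable names are chosen only for illustration), a second getline on the same command returns 0 because its output is exhausted; after close with the identical string, awk runs the command again.

```shell
close_out=$(awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first
    r1 = (cmd | getline again)    # 0: the output of cmd is already used up
    close(cmd)                    # must be the same string used to open the pipe
    r2 = (cmd | getline again)    # 1: after close, awk runs the command again
    print first, r1, r2, again
}')
echo "$close_out"
```

Keeping the command in a variable, as cmd does here, is a simple way to make sure the string given to close matches the one used to open the pipe.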


3. Conditional statements:
The conditional statements in awk are borrowed from C. The forms are as follows:
if (expression) {
    statement;
    statement;
    ......
}
/> awk '{if ($6 > 50) print $1 " Too high"}' filename
/> awk '{if ($6 > 20 && $6 <= 50) {safe++; print "OK"}}' filename

if (expression) {
    statement;
} else {
    statement2;
}
/> awk '{if ($6 > 50) print $1 " Too high"; else print "Range is OK"}' filename
/> awk '{if ($6 > 50) {count++; print $3} else {x = 5; print $5}}' filename

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
/> awk '{if ($6 > 50) print "$6 > 50"; else if ($6 > 30) print "$6 > 30"; else print "other"}' filename

4. Loop statements:
The loop statements in awk are also borrowed from C. awk supports while, do/while, for, break, and continue, and these keywords have the same semantics as in C.
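
Since this section gives no example, here is a minimal sketch that exercises each loop keyword inside a BEGIN block, so no input file is needed. The variable names and the numbers are arbitrary.

```shell
loops_out=$(awk 'BEGIN {
    # while: sum the integers 1..5
    i = 1; sum = 0
    while (i <= 5) { sum += i; i++ }
    print "while sum:", sum

    # do/while: the body runs once even though the condition is false
    n = 10
    do { n++ } while (n < 5)
    print "do/while n:", n

    # for with continue and break: collect the odd numbers up to 7
    line = ""
    for (k = 1; k <= 10; k++) {
        if (k % 2 == 0) continue    # skip even numbers
        if (k > 7) break            # stop the loop entirely
        sep = (line == "" ? "" : " ")
        line = line sep k
    }
    print "for:", line
}')
echo "$loops_out"
```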

5. Flow control statements:
The next statement stops processing the current record, reads the next line of input, and restarts the awk script from the first rule.
The exit statement ends the awk program. It stops the processing of records, but the END block is not skipped. If exit is given a value between 0 and 255, for example exit (1), that value is returned to the shell as the exit status of awk, which indicates whether the program succeeded or failed.
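
The following sketch shows next and exit working together. The four input lines are invented for the example: comment lines are skipped with next, and a line whose first field is stop ends the program with status 3, after which the END block still runs.

```shell
status=0
flow_out=$(printf '# comment\nalpha\nstop\nbeta\n' | awk '
    /^#/         { next }      # skip this record; the rules below do not run
    $1 == "stop" { exit 3 }    # stop reading input, but END still runs
                 { print "data:", $1 }
    END          { print "END ran after", NR, "records" }
') || status=$?

echo "$flow_out"
echo "exit status: $status"
```

The record beta is never read, and the shell sees the value given to exit in $?.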

6. Arrays:
Because the subscript of an awk array can be a number or a string, the subscript is usually called a key. Keys and values are stored in an internal hash table. Because a hash table does not keep its entries in order, the contents of an array may not be displayed in the order you expect. Like variables, arrays are created automatically when they are used, and awk automatically determines whether a value is stored as a number or a string. In general, arrays in awk are used to collect information from records: to compute sums, count words, or track how many times a pattern is matched.
/> cat employees
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500

/> awk '{name[x++] = $2}; END {for (i = 0; i < NR; i++) print i, name[i]}' employees
0 Jones
1 Adams
2 Chang
3 Black
In the preceding example, the subscript of the array name is the variable x. awk initializes x to 0, and it is incremented by 1 after each use. The second field of each record read from the file is assigned in turn to the elements of the name array. In the END block, the for loop traverses the array. Because a subscript is just a key, it does not have to start at 0; it can start at any value.

# Here, the built-in variable NR is used as the subscript of the array.
/> awk '{id[NR] = $3}; END {for (x = 1; x <= NR; x++) print id[x]}' employees
4424
5346
1654
1683

awk also provides a special for loop for traversing arrays:
for (item in arrayname) {
    print arrayname[item]
}

/> cat db
Tom Jones
Mary Adams
Sally Chang
Billy Black
Tom Savage
Tom Chung
Reggie Steel
Tommy Tucker

/> awk '/^Tom/ {name[NR] = $1}; END {for (i = 1; i <= NR; i++) print name[i]}' db
Tom



Tom
Tom

Tommy
The output shows that only the first field of the records matching the regular expression is assigned to an element of the array name. Because NR is used as the subscript, the subscripts of the array are not consecutive, so when an ordinary for loop is used in the END block, an empty string is printed for each element that does not exist. Next, let's look at the output of the special for loop.
/> awk '/^Tom/ {name[NR] = $1}; END {for (i in name) print name[i]}' db
Tom
Tom
Tommy
Tom

Next, let's look at an example that uses a string as the subscript (a string literal used as a subscript must be enclosed in double quotation marks):
/> cat testfile2
tom
mary
sean
tom
mary
mary
bob
mary
alex
/> awk '/tom/ {count["tom"]++}; /mary/ {count["mary"]++}; END {print "There are " count["tom"] \
    " Toms and " count["mary"] " Marys in the file."}' testfile2
There are 2 Toms and 4 Marys in the file.
In the preceding example, the count array has two elements, with the subscripts tom and mary. Each element starts at 0; count["tom"] is incremented each time tom is matched, and count["mary"] each time mary is matched. The END block prints the values stored in the array.

/> awk '{count[$1]++}; END {for (name in count) printf "%-5s%d\n", name, count[name]}' testfile2
mary 4
tom  2
alex 1
bob  1
sean 1
In the preceding example, awk uses the first field of each record as the subscript of the array count.

/> awk '{count[$1]++; if (count[$1] > 1) name[$1]++}; END {print "The duplicates were"; for (i in name) print i}' testfile2
The duplicates were
mary
tom
In the preceding example, when count[$1] is greater than 1, that is, when a name appears more than once, an element with that name as its subscript is created in a new array, name. Finally, the subscripts of the name array, which are the repeated names, are printed.

So far we have seen how to add elements to an array and assign values to them. Now let's see how to delete existing elements from an array. This is done with the built-in delete statement, as shown in the following command:
/> awk '{count[$1]++}; \
    END {for (name in count) { \
             if (count[name] == 1) \
                 delete count[name]; \
         } \
         for (name in count) \
             print name}' testfile2
mary
tom
In the above example, the main work is done in the END block. It first traverses the count array and deletes every element whose value is equal to 1, that is, every name that appears only once. It then uses the special for loop to print the subscripts that still exist in the array.

Finally, let's look at how to use the command-line argument array. See the following command:
/> awk 'BEGIN {for (i = 0; i < ARGC; i++) printf("argv[%d] is %s.\n", i, ARGV[i]); printf("The number of arguments, ARGC = %d\n", ARGC)}' testfile "Peter Pan" 12
argv[0] is awk.
argv[1] is testfile.
argv[2] is Peter Pan.
argv[3] is 12.
The number of arguments, ARGC = 4
The output shows that the command-line argument array ARGV is indexed from 0 and that its first element is the command itself (awk). Its usage matches the argv parameter of the main function in C.

/> awk 'BEGIN {name = ARGV[2]; print "ARGV[2] is " ARGV[2]}; $1 ~ name {print $0}' testfile2 "bob"
ARGV[2] is bob
bob
awk: (FILENAME=testfile2 FNR=9) fatal: cannot open file 'bob' for reading (No such file or directory)
First, the meaning of the preceding command: the variable name is assigned the third command-line argument, bob. awk then finds the records in the input file whose first field matches the value of name and prints them.
The last line of the output is an error message from awk. The reason is that awk also treats bob as an input file to be processed, and no such file exists. The following command solves this problem.
/> awk 'BEGIN {name = ARGV[2]; print "ARGV[2] is " ARGV[2]; delete ARGV[2]}; $1 ~ name {print $0}' testfile2 "bob"
ARGV[2] is bob
bob
The output is now what we expected. Note that delete must be called in the BEGIN block, because at that point awk has not yet started reading the files named in the command-line arguments.

7. Built-in functions:
String functions
sub(regular expression, substitution string);
sub(regular expression, substitution string, target string);

/> awk '{sub("Tom", "Tommy"); print}' employees    # Tom is replaced here.
Tommy Jones 4424 5/12/66 543354

# When the regular expression Tom first matches in the first field, it is replaced with the string "Tommy". If the third argument of sub is changed to $2, no replacement takes place.
/> awk '{sub("Tom", "Tommy", $1); print}' employees
Tommy Jones 4424 5/12/66 543354

gsub(regular expression, substitution string);
gsub(regular expression, substitution string, target string);
Unlike sub, if the regular expression in the first argument matches several times in the record, gsub replaces every occurrence, whereas sub replaces only the first.
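
The difference is easiest to see side by side. The sketch below uses a made-up input line; both functions return the number of replacements made, which is printed in front of the modified record. The last command shows the three-argument form restricted to one field.

```shell
line='Tom and Tom met Tommy'

# sub replaces only the first match and returns 1.
sub_out=$(echo "$line" | awk '{ n = sub(/Tom/, "Bob"); print n ": " $0 }')

# gsub replaces every match, including the one inside "Tommy", and returns 3.
gsub_out=$(echo "$line" | awk '{ n = gsub(/Tom/, "Bob"); print n ": " $0 }')

# With a target argument, only that field is searched.
field_out=$(echo 'Tom Tom' | awk '{ sub(/Tom/, "Bob", $2); print }')

printf '%s\n%s\n%s\n' "$sub_out" "$gsub_out" "$field_out"
```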

index(string, substring)
This function returns the position at which the second argument first occurs in the first argument. Positions start at 1.
/> awk 'BEGIN {print index("hello", "el")}'
2

length(string)
Returns the length of the string.
/> awk 'BEGIN {print length("hello")}'
5

substr(string, starting position)
substr(string, starting position, length of string)
This function returns a substring of the first argument. The substring starts at the position given by the second argument (positions start at 1) and has the length given by the third argument. If the third argument is omitted, the substring runs from the starting position to the end of the string.
/> awk 'BEGIN {name = substr("Hello World", 2, 3); print name}'
ell

match(string, regular expression)
This function returns the position in the string at which the regular expression matches, or 0 if no match is found. match sets the built-in variable RSTART to the starting position of the matched substring and RLENGTH to its length in characters.
/> awk 'BEGIN {start = match("Good ole CHINA", /[A-Z]+$/); print start}'
10
In the above example, the regular expression [A-Z]+$ searches for consecutive uppercase letters at the end of the string. The substring "CHINA" is found at position 10 of the string "Good ole CHINA".

/> awk 'BEGIN {start = match("Good ole CHINA", /[A-Z]+$/); print RSTART, RLENGTH}'
10 5
RSTART is the starting position of the match, and RLENGTH is the length of the match.

/> awk 'BEGIN {string = "Good ole CHINA"; start = match(string, /[A-Z]+$/); print substr(string, RSTART, RLENGTH)}'
CHINA
Here match, RSTART, RLENGTH, and substr are combined to extract the matched text.

toupper(string)
tolower(string)
These two functions return the argument converted to uppercase and to lowercase respectively.
/> awk 'BEGIN {print toupper("hello"); print tolower("WORLD")}'
HELLO
world

split(string, array, field separator)
split(string, array)
This function splits the string into an array, using the third argument as the field separator. If the third argument is omitted, the current value of FS is used.
/> awk 'BEGIN {split("11/20/2011", date, "/"); print date[2]}'
20

variable = sprintf("string with format specifiers", expr1, expr2, ...)
The difference between this function and printf is the same as the difference between printf and sprintf in C: the former writes the formatted result to the output stream, while the latter returns it as the value of the function.
/> awk 'BEGIN {line = sprintf("%-15s%6.2f", "hello", 4.2); print line}'
hello            4.20

Time functions:
systime()
This function returns the number of seconds between January 1, 1970 and the current time.
/> awk 'BEGIN {print systime()}'
1321369554

strftime()
The formatting rules of this time-formatting function are the same as those of the strftime function in C. See the following list:

Format   Meaning
%a       Abbreviated weekday name
%A       Full weekday name
%b       Abbreviated month name
%B       Full month name
%c       Date and time representation appropriate for locale
%d       Day of month as decimal number (01-31)
%H       Hour in 24-hour format (00-23)
%I       Hour in 12-hour format (01-12)
%j       Day of year as decimal number (001-366)
%m       Month as decimal number (01-12)
%M       Minute as decimal number (00-59)
%p       Current locale's A.M./P.M. indicator for 12-hour clock
%S       Second as decimal number (00-59)
%U       Week of year as decimal number, with Sunday as first day of week (00-53)
%w       Weekday as decimal number (0-6; Sunday is 0)
%W       Week of year as decimal number, with Monday as first day of week (00-53)
%x       Date representation for current locale
%X       Time representation for current locale
%y       Year without century, as decimal number (00-99)
%Y       Year with century, as decimal number

/> awk 'BEGIN {print strftime("%D", systime())}'
11/15/11
/> awk 'BEGIN {now = strftime("%T"); print now}'
23:17:29

Built-in mathematical functions:

Function      Return value
atan2(y, x)   Arctangent of y/x, in radians
cos(x)        Cosine of x
exp(x)        e raised to the power x
int(x)        x truncated to an integer
log(x)        Natural logarithm of x
sin(x)        Sine of x
sqrt(x)       Square root of x

/> awk 'BEGIN {print 31/3}'
10.3333
/> awk 'BEGIN {print int(31/3)}'
10

User-defined functions:
A user-defined function can be placed anywhere in an awk script that a pattern-action rule can be placed.
function name(parameter1, parameter2, ...) {
    statements
    return expression
}
Scalar arguments are passed by value: the function works on local copies of them. Arrays are passed by reference, so the function can change array elements directly. Any variable used inside a function that is not in the parameter list is global and visible to the whole program; if the function changes it, it changes for the entire program. The only way to give a function local variables is to put them in the parameter list, usually at the end. If the caller does not supply an argument for a parameter, the parameter is initialized to empty. The return statement returns control, and optionally a value, to the caller.
/> cat grades
20 10
30 20
40 30

/> cat add.sc
function add(first, second) {
    return first + second
}
{print add($1, $2)}

/> awk -f add.sc grades
30
50
70
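
The parameter-passing rules described above can be checked with a short sketch. The function bump and all variable names in it are hypothetical; tmp is an extra parameter that serves only as a local variable, and the extra spaces before it in the parameter list are a common convention, not required syntax.

```shell
func_out=$(awk '
    function bump(n, arr,    tmp) {
        tmp = n + 1
        n = tmp          # changes only the local copy of n
        arr["hits"]++    # changes the array owned by the caller
        total += tmp     # total is not a parameter, so it is global
        return tmp
    }
    BEGIN {
        x = 5
        seen["hits"] = 0
        r = bump(x, seen)
        leak = (tmp == "" ? "tmp unset" : "tmp leaked")
        print x, r, seen["hits"], total, leak
    }
')
echo "$func_out"
```

After the call, x is still 5 because scalars are copied, seen["hits"] has become 1 because arrays are shared, total is visible outside the function, and tmp does not exist at the top level.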
