Link: Shell sed awk

Source: Internet
Author: User
Tags: natural logarithm, square root

If you want to quickly and easily understand shell programming, here is also a simple tutorial link:

There is also a somewhat deeper resource: "Thirteen Questions about the Shell", the treasured classic of the CU shell board.

If you want to learn awk quickly and have no time to read the lengthy manual, please refer to the material below. If you want to study awk in depth, there is also a good book:

As for sed, I am still studying it.

Ways to call awk

As mentioned earlier, awk can be called in different ways to meet various needs. They are:

1. awk command line: you can use awk like any common UNIX command, writing awk language statements directly on the command line. Although awk supports multi-line input, typing a long command line and making sure it is correct is a headache, so this method is generally used only for simple problems. Of course, you can also put awk command lines, or even whole awk program scripts, inside shell scripts.

2. Use the -f option to call an awk program: awk allows an awk program to be written into a text file, which is then called and executed with the -f option on the awk command line. The details are described in the awk syntax section.

3. Use the command interpreter to call an awk program: using the command interpreter mechanism supported by UNIX, we can write an awk program into a text file and put the following on its first line:
#!/bin/awk -f
Then grant the text file execute permission. After that, you can call and execute the awk program from the command line in a way similar to the following:

$ awk_script_file_name file_to_be_processed
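
As a minimal sketch of the first two calling methods, the following runs the same one-line program once from the command line and once from a program file loaded with -f. The names data.txt and second.awk are hypothetical and live in a scratch directory created only for this demonstration.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
workdir=$(mktemp -d)
printf 'alpha beta\ngamma delta\n' > "$workdir/data.txt"

# Method 1: the program text is given directly on the command line.
inline_out=$(awk '{ print $2 }' "$workdir/data.txt")

# Method 2: the same program is stored in a file and loaded with -f.
printf '{ print $2 }\n' > "$workdir/second.awk"
file_out=$(awk -f "$workdir/second.awk" "$workdir/data.txt")

echo "$inline_out"
echo "$file_out"
rm -rf "$workdir"
```

Both calls should print the second field of every line, so the two outputs are expected to be identical.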

Awk syntax:

Like other UNIX commands, awk has its own syntax:

awk [-F re] [parameter...] ['prog'] [-f progfile] [in_file...]

Parameter description:

-F re: lets awk change its field separator to re.

parameter: assigns values to variables (in the form variable=value).

'prog': the program statements of awk. They must be enclosed in single quotes (' and ') to prevent interpretation by the shell. The standard format of the program statements is:

'pattern {action}'

The pattern can be any egrep regular expression, written with the syntax /re/ plus some pattern matching operators. As in sed, you can also use "," to separate two patterns and select a range. For details about matching, refer to the appendix. If that is still not clear, find a UNIX book and learn grep and sed (I learned pattern matching while learning ed). The action is always enclosed in braces. It consists of a series of awk statements separated by ";". awk interprets them and performs them on the records that match the given pattern. As in the shell, "#" starts a comment: everything from "#" to the end of the line is ignored during interpretation. You can omit either the pattern or the action, but not both. If the pattern is omitted, no matching is done and the action applies to all lines (records). If the action is omitted, the default action is executed, which is to print the record on standard output.

-f progfile: lets awk call and execute the program file progfile. progfile is a text file that must conform to awk syntax.

in_file: the input file of awk. awk allows multiple input files. Note that awk does not modify its input files. If no input file is specified, awk reads standard input and writes the result to standard output. awk supports input/output redirection.
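
The options in this syntax can be combined in one call. The sketch below assumes a hypothetical comma-separated file of names and scores: -F sets the field separator, and the parameter min=10 assigns a variable before the input file is read. The variable names are invented for the example.

```shell
# Hypothetical input: one "name,score" pair per line.
scores_file=$(mktemp)
printf 'ann,12\nbob,7\ncy,30\n' > "$scores_file"

# -F, makes the comma the field separator; min=10 is a variable assignment
# given as a parameter ahead of the input file.
high_scorers=$(awk -F, '$2 > min { print $1 }' min=10 "$scores_file")

echo "$high_scorers"
rm -f "$scores_file"
```

Only the pattern is doing the selection here; the action simply prints the first field of each record that passes the test.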

Awk records, fields, and built-in variables:

As mentioned above, awk processes data much as a database does. One similarity is that awk works with records and fields. Neither grep nor sed can process fields, which is one of the reasons why awk is better than both. In awk, by default, each line of a text file is treated as a record, and each part of a line is a field of that record. To operate on the fields, awk borrows the shell notation and refers to the fields of a line (record) in order as $1, $2, $3, and so on. In particular, awk uses $0 for the entire line (record). Fields are separated by characters called separators. The default separator is the space. awk allows the separator to be changed with -F on the command line. In fact, awk keeps this separator in a built-in variable, FS. awk has several such built-in variables, for example the record separator RS and the number of the record currently being processed, NR. The appendix at the end of this article lists all the built-in variables. They can be referenced or modified in an awk program. For example, you can use NR in a pattern to restrict the range of records processed, or change the record separator RS so that a special character instead of the newline separates records.

For example, display the first, third, and seventh fields, separated by the character %, of lines 7 through 15 of the text file myfile:

$ awk -F % 'NR==7,NR==15 {print $1, $3, $7}' myfile
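
A smaller, self-contained variant of the same idea, using four made-up records so that the result is easy to check by eye. The range pattern selects records 2 through 3, and NR is printed to show which records were chosen.

```shell
# Four hypothetical records with %-separated fields.
selected=$(printf 'a%%b%%c\nd%%e%%f\ng%%h%%i\nj%%k%%l\n' |
  awk -F% 'NR == 2, NR == 3 { print NR ": " $1, $3 }')

echo "$selected"
```

This should print "2: d f" and "3: g i", one per line.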

Awk built-in functions

One of the reasons why awk has become a good programming language is that it has absorbed many strengths of other excellent programming languages (such as C). One of these strengths is the use of built-in functions. awk defines and supports a series of built-in functions, and they make awk more complete and more powerful. For example, awk has a set of built-in functions for string processing (they look similar to the string functions of C and are used in much the same way), and it is precisely these functions that make awk so strong at handling strings. The appendix at the end of this article lists the built-in functions provided by a typical awk. They may differ from those of your awk version, so before using them it is best to consult the online help on your system.

As an example of a built-in function, we introduce the printf function of awk here, which makes awk output consistent with that of C. In fact, much of awk is borrowed from C. If you are familiar with C, you will remember its printf function and the convenience its powerful formatted output brings. Fortunately, we meet it again in awk. printf in awk is almost the same as in C, so if you know C you can use printf in awk exactly as you would there. For that reason only one example is given here. If you are not familiar with it, please refer to an introductory C book.

For example, display the line number and the first field of each line in the file myfile:

$ awk '{printf "%03d %s\n", NR, $1}' myfile
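
For readers who do not have C at hand, here is a brief sketch of three common printf conversions. The values are arbitrary and chosen only to make the padding and rounding visible.

```shell
# %03d  : integer, zero-padded to width 3
# %-5s  : string, left-justified in a field of width 5
# %.2f  : floating point with two digits after the decimal point
formatted=$(awk 'BEGIN { printf "%03d|%-5s|%.2f\n", 7, "ab", 3.14159 }')

echo "$formatted"
```

The expected output is "007|ab   |3.14".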

Using awk on the command line

By rights, we should now explain awk programming. Before that, however, we will review the previous material with some examples. These examples are all run from the command line, which shows how convenient awk is there. They also pave the way for what follows and introduce ways of solving simple problems: there is no need to solve a simple problem in a complicated way when awk provides a simpler one.

For example, display all lines of the text file mydoc that match (contain) the string "Sun":

$ awk '/Sun/{print}' mydoc

Because displaying the entire record (the whole line) is the default action of awk, the action can be omitted:

$ awk '/Sun/' mydoc

The following is a more complex matching example:

$ awk '/[Ss]un/,/[Mm]oon/ {print}' myfile

It prints to standard output the lines from the first line that matches Sun or sun through the first line that matches Moon or moon.

The following example shows the use of a built-in variable and the built-in function length():

$ awk 'length($0) > 80 {print NR}' myfile

This command line displays the line numbers of all lines longer than 80 characters in the text file myfile. Here, $0 represents the entire record (line), while the built-in variable NR is used without the "$" flag.

Example: as a more practical example, suppose we want to run a security check on UNIX users by examining the passwd file in /etc and checking whether the password field (the second field) is empty. If it is empty, the user has not set a password, and the user name (the first field) is displayed. We can use the following statement:

# awk -F: '$2 == "" {printf("%s no password!\n", $1)}' /etc/passwd

In this example, the field separator of the passwd file is ":", so you must use -F: to change the default field separator. This example also uses the built-in function printf.
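
To try the check without touching the real /etc/passwd, the same statement can be run against a small made-up file. The two accounts below are invented for the example; only the second has an empty password field.

```shell
# A made-up passwd-style file; the real /etc/passwd is not read.
fake_passwd=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/sh\nguest::1001:1001::/home/guest:/bin/sh\n' > "$fake_passwd"

report=$(awk -F: '$2 == "" { printf("%s no password!\n", $1) }' "$fake_passwd")

echo "$report"
rm -f "$fake_passwd"
```

Only the guest account should be reported.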

Awk variables

Like other programming languages, awk allows variables to be set in a program. Providing variables is in fact a basic requirement of any programming language; I have never seen one that does not.

awk provides two kinds of variables. One is the built-in variables described earlier. The other is user-defined variables: awk allows users to define and use their own variables in awk program statements. Of course, such a variable cannot have the same name as a built-in variable or any other awk reserved word. Note that neither kind is referenced with the "$" flag (recall the use of NR above); in awk, "$" selects a field, so $n means the field whose number is stored in n. Unlike C, awk does not require variables to be declared or initialized. awk determines the type of a variable from the form and context in which it is used. A variable that has never been assigned behaves as the empty string in string context and as 0 in numeric context. Here is a tip: if you want the type of a variable to be explicit in your awk program, assign it an initial value in the program. This technique is used in later examples.
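
A short sketch of how variables behave in practice, with invented variable names. It shows that a user-defined variable is used without "$", that "$" applied to a variable selects a field, and that a variable which was never assigned compares equal to both 0 and the empty string.

```shell
var_demo=$(echo 'a b c' | awk '{
  n = 2                         # user-defined variable, no "$" needed
  print n, $n                   # $n is field number n, here the 2nd field
  if (unset == 0 && unset == "")
    print "unset acts as 0 and as the empty string"
}')

echo "$var_demo"
```

The first output line should be "2 b", followed by the message from the if statement.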

Calculation and tests:

As befits a programming language, awk supports a variety of arithmetic operations, which are the same as those provided by C: +, -, *, /, %, and so on. awk also supports C-like operators such as ++, --, +=, and -=, which makes it very convenient for users familiar with C to write awk programs. As an extension of its arithmetic, awk also provides a series of built-in numeric functions (such as log, sqrt, cos, and sin) and some functions for string operations (such as length and substr). These functions greatly improve the computing power of awk.

Relational tests are part of conditional branching in every programming language, and awk is no exception. awk allows a variety of tests, such as the commonly used == (equal to), != (not equal to), > (greater than), < (less than), >= (greater than or equal to), and <= (less than or equal to). For pattern matching it also provides the ~ (matches) and !~ (does not match) tests.

As an extension of testing, awk also supports the logical operators ! (not), && (and), and || (or), and parentheses () for grouping multiple tests, which greatly enhances awk. The appendix of this article lists the operations, tests, and operator precedence allowed by awk.
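
The operators described in this section can be mixed freely in patterns and actions. The following sketch uses three invented records to combine a regular expression match, relational tests, logical operators, the modulo operator, and the sqrt function.

```shell
ops_demo=$(printf 'apple 12\navocado 0\nbanana 40\n' | awk '
  $1 ~ /^a/ && $2 >= 10 { print "match:", $1 }
  $1 !~ /^a/ || $2 < 5  { print "other:", $1, $2 % 7, sqrt($2 + 9) }
')

echo "$ops_demo"
```

The first rule fires only for "apple 12"; the second fires for the other two records, once through the "<" test and once through the "!~" test.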

Awk flow control

Flow control statements are indispensable to any programming language, and any good language has statements that control the flow of execution. awk provides a complete set of flow control statements similar to those of C, which brings great convenience to programming.

1. BEGIN and END:

In awk there are two special expressions, BEGIN and END, both of which can be used as a pattern (refer to the awk syntax above). Their purpose is to give the program an initial state and to let it do some wrap-up work after the scan ends. Any actions listed after BEGIN (within {}) are executed before awk starts scanning the input, and actions listed after END are executed after all the input has been scanned. Therefore, BEGIN is usually used to display and preset (initialize) variables, and END is used to output the final result.

For example, total the sales amounts in the sales file xs (assuming that the sales amount is in the third field of each record):

$ awk \
> 'BEGIN { FS = ":"; print "sales amount statistics"; total = 0 }
> { print $3; total = total + $3 }
> END { printf "total sales amount: %.2f\n", total }' xs
(Note: > is the secondary prompt provided by the shell. To continue a line in a shell program or in an awk statement, add a backslash \ at the end of the line.)

Here, BEGIN presets the built-in variable FS (field separator) and the user-defined variable total, and prints the output heading before the scan. END prints the total after the scan is complete.

2. Flow control statements
awk provides a complete set of flow control statements similar to those of C. We explain them one by one:

2.1 if...else statement:

if (expression)
    statement 1
else
    statement 2

In this format, "statement 1" can be several statements. To make things easier for awk to parse and for you to read, it is best to enclose multiple statements in {}. awk branch structures can be nested, in the following format:

if (expression 1) {
    if (expression 2)
        statement 1
    else
        statement 2
    statement 3
} else {
    if (expression 3)
        statement 4
    else
        statement 5
}
statement 6

Of course, you may never need such a complicated branch structure in practice. It is shown here only to illustrate the form.

2.2 while statement

while (expression)
    statement

2.3 do-while statement

do {
    statement
} while (condition)

2.4 for statement

for (initial expression; termination condition; step expression)
    statement
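
The four statement forms can be seen working together in a small BEGIN-only program. This is an illustrative sketch with invented variable names, not an example from the original author.

```shell
flow_demo=$(awk 'BEGIN {
  # for with if...else: sum the even and odd numbers from 1 to 5
  for (i = 1; i <= 5; i++) {
    if (i % 2 == 0)
      even_sum += i
    else
      odd_sum += i
  }
  # while: build the string "321"
  n = 3
  while (n > 0) {
    countdown = countdown n
    n--
  }
  # do-while: the body always runs at least once
  do {
    tries++
  } while (tries < 2)
  print even_sum, odd_sum, countdown, tries
}')

echo "$flow_demo"
```

The program should print "6 9 321 2".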

You can use the break and continue statements inside the while, do-while, and for statements of awk to control the flow, and the exit statement to leave the program. break interrupts the current loop and jumps out of it to execute the next statement. continue skips the rest of the current iteration and goes back to the start of the loop. exit behaves in one of two ways. When the exit statement is not inside END, an exit in any action behaves as if the end of the input had been reached: all pattern and action processing stops, and the actions of the END pattern are executed. An exit that appears inside END terminates the program.

For example,
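
A minimal sketch of continue, break, and exit, using made-up input (the numbers 1 to 5, one per line). The variable names are invented for the illustration.

```shell
exit_demo=$(printf '1\n2\n3\n4\n5\n' | awk '
  BEGIN {
    for (i = 1; i <= 10; i++) {
      if (i == 2) continue      # skip the rest of this iteration
      if (i == 5) break         # leave the loop entirely
      picked = picked i
    }
  }
  $1 == 4 { exit }              # stop reading input; END still runs
  { sum += $1 }
  END { print picked, sum }
')

echo "$exit_demo"
```

The loop collects 1, 3, and 4, and the sum covers only the records read before exit, so the output should be "134 6".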

Custom functions in awk

Defining and calling your own functions is a feature of almost every high-level language, and awk is no exception. However, the original awk does not provide functions; they are available only in nawk or newer versions of awk.

Using a function involves two parts: the function definition and the function call. The function definition contains the code to be executed (the function itself), and the function call temporarily transfers control from the main program code to the function.

The definition of an awk function is as follows:

function name(parameter list) {
    function body
}

In gawk, function may be abbreviated to func, but other versions of awk do not allow this. The function name must be a valid identifier. The parameter list may be empty (though the parentheses after the function name are still required when calling the function), or it may supply one or more parameters. As in C, awk parameters are passed by value (arrays are the exception: they are passed by reference).

Calling a function in awk is similar to calling one in C, but awk is more flexible than C and does not check the validity of the arguments. In other words, when you call a function you can list more or fewer arguments than the function expects (as declared in its definition). Extra arguments are ignored by awk (though some implementations, such as gawk, reject the call instead), and missing ones are set to the default value 0 or the empty string, depending on how the parameter is used.

An awk function can return in two ways: implicitly and explicitly. When awk reaches the end of the function, it automatically returns to the calling program; this is the implicit return. If you need to leave the function before its end, you can use the return statement to exit early, by writing a statement of the form return value in the function.

For example, the following shows how to use a function. In this example, a function named print_header is defined. It takes two parameters, filename and pagenum. The filename parameter passes the function the name of the file currently in use, and pagenum is the number of the current page. The function prints (displays) the name of the current file and the current page number. When it finishes, it returns the number of the next page.

$ nawk \
> 'BEGIN { pageno = 1; file = ARGV[1]   # FILENAME is not yet set inside BEGIN
> pageno = print_header(file, pageno);   # call the print_header function
> printf("Current page number: %d\n", pageno);
> }
> # define the print_header function
> function print_header(filename, pagenum) {
> printf("%s %d\n", filename, pagenum); pagenum++; return pagenum;
> }' myfile

Executing this program displays the following:

myfile 1
Current page number: 2
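
The rule about missing arguments can be used to give a parameter a default value. The sketch below defines a hypothetical function greet whose second argument is optional; the function and its names are invented for this illustration.

```shell
func_demo=$(awk '
  # greeting is optional: a missing argument arrives as an empty value
  function greet(name, greeting) {
    if (greeting == "")
      greeting = "hello"
    return greeting ", " name
  }
  BEGIN {
    print greet("awk")
    print greet("awk", "goodbye")
  }
')

echo "$func_demo"
```

The first call falls back to the default and the second uses the supplied greeting.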

Awk advanced input and output

1. Read the next record:

The next statement causes awk to read the next record immediately and start pattern matching again from the first pattern, performing the corresponding actions. Normally awk executes the action of every pattern that a record matches; next causes any remaining patterns for the current record to be skipped.
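
A common use of next is to discard unwanted records early so that later rules never see them. In this sketch, with invented input, lines starting with "#" are skipped and the remaining lines are numbered.

```shell
next_demo=$(printf '# comment\nvalue 1\n# another\nvalue 2\n' | awk '
  /^#/ { next }                 # comment line: skip all remaining rules
  { count++; print count ": " $0 }
')

echo "$next_demo"
```

Because of next, the counter only advances for the two data lines.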

2. Simply read a record

The getline statement of awk reads one record. It is particularly useful when one logical data record is spread over two physical records. Plain getline performs the usual field splitting (it sets $0, NF, FNR, and NR). It returns 1 on success and 0 at the end of the file. To simply read a file, you can write the following code:

Example: use of getline

{ while (getline == 1)
  {
    # process the input fields
  }
}

You can also have getline store the input in a variable instead of doing the usual field splitting, using the form getline variable. With this form, $0 and NF are left unchanged, and FNR and NR are incremented.

You can also use getline <"filename" to read from a given file rather than from the files listed on the command line. In this case, getline performs the usual field splitting (it sets $0 and NF). It returns -1 if the file does not exist, 1 on success, and 0 at the end of the file. You can read from a given file into a variable, and you can replace filename with standard input or with a variable containing the file name. Note that FNR and NR are not modified with this form.

Another way to use the getline statement is to accept input from a UNIX command, as in the following example:

Example: accept input from a UNIX command

{ while (("who -u" | getline) > 0)
  {
    # process each line from the who command
  }
}

You can also use the following form:

"command" | getline variable

3. Close a file:

awk allows you to close an input or output file within a program by using the close statement:

close("filename")

filename can be a file opened by getline (it can also be stdin, a variable containing the file name, or the exact command used with getline), or an output file (it can be stdout, a variable containing the file name, or the exact command used with the pipe).
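
The getline forms and close can be tried together in a small BEGIN-only program. This sketch assumes a scratch file created with mktemp and passes its name in with the standard -v option; the variable names are invented.

```shell
list_file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$list_file"

getline_demo=$(awk -v file="$list_file" 'BEGIN {
  # getline variable < file: returns 1 per record, 0 at end of file, -1 on error
  while ((getline line < file) > 0)
    n++
  close(file)

  # "command" | getline variable: read the first line of the command output
  "echo hello" | getline greeting
  close("echo hello")

  print n, line, greeting
}')

echo "$getline_demo"
rm -f "$list_file"
```

Testing the return value with "> 0" rather than using it directly as the loop condition keeps the loop from running forever when getline returns -1. After the loop, line still holds the last record read.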

4. Output to a file:

awk allows output to be written to a file as follows:

printf("Hello world!\n") > "datafile"
or
printf("Hello world!\n") >> "datafile"

The first form creates or overwrites datafile; the second appends to it.

5. Output to a command

awk allows output to be sent to a command as follows:

printf("Hello world!\n") | "sort -t','"
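
The three kinds of output redirection can be compared side by side. The sketch below writes to a scratch file (its name is passed in with -v and is an assumption of the example), appends to it, and then pipes its lines to sort.

```shell
out_file=$(mktemp)

sorted=$(awk -v out="$out_file" 'BEGIN {
  print "pear"  > out           # ">" truncates the file when it is first opened
  print "apple" > out           # same open file, so this line is added after "pear"
  close(out)
  print "fig" >> out            # ">>" appends to the existing content
  close(out)

  while ((getline item < out) > 0)
    print item | "sort"         # pipe each line to the sort command
  close("sort")
}')

echo "$sorted"
rm -f "$out_file"
```

Closing the file between writing and reading ensures the data has been flushed before it is read back.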

Hybrid programming with awk and shell scripts

Because awk can be used as a shell command, it integrates well with shell scripts, which makes mixed awk and shell programming possible. The key to mixed programming is the dialogue between awk and the shell script, that is, the exchange of information between them: awk obtains the information it needs from the shell script (usually the values of variables), awk executes shell command lines, the shell script sends the results of commands to awk for processing, and the shell script reads the results of awk.

1. awk reads shell script variables

In awk, we can read variables of the shell script with the form "'$variable_name'".

For example, in the following shell script we read the variable name, which holds the author of the text myfile, and awk prints that name.

$ cat writename
name="James"
nawk 'BEGIN { name = "'$name'"; \
printf("\t%s\t writer %s\n", ARGV[1], name); } \
{...} END {...}' myfile
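
The quote-splicing form can be compared with the standard -v option, which is not covered in the text above but achieves the same result with less quoting. Both commands below are a sketch using the same hypothetical variable.

```shell
name="James"

# Quote splicing: close the single quote, let the shell expand $name, reopen.
spliced=$(awk 'BEGIN { name = "'"$name"'"; print "writer: " name }')

# Alternative: pass the value with -v, which avoids the quoting juggling.
passed=$(awk -v name="$name" 'BEGIN { print "writer: " name }')

echo "$spliced"
echo "$passed"
```

Wrapping $name in double quotes inside the splice protects values that contain spaces from being split by the shell.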

2. Send the results of shell commands to awk for processing

As one way of passing information, we can send the result of a shell command to awk through a pipe (|) for processing:

Example: awk processes the result of a shell command

$ who -u | awk '{printf("%s is executing %s\n", $2, $1)}'

This command prints one line per logged-in terminal, pairing the terminal name (the second field of the who -u output) with the user name (the first field).

3. The shell script reads the results of awk

To let a shell script read the results of awk, we can use some special techniques. For example, we can store the result of awk in a shell script variable with the form variable=`awk statement`. Of course, the results of awk can also be passed to the shell script through a pipe.

For example, as one of its message passing mechanisms, UNIX provides the command wall (write to all users), which sends a message to all working users (terminals). We can simulate this program with a shell script, wall.shell (in fact, in older versions wall was itself a shell script):

$ cat wall.shell
# @(#) wall.shell: send a message to each logged-in terminal
#
cat > /tmp/$$
# message text entered by the user
who -u | awk '{print $2}' | while read tty
do
    cat /tmp/$$ > /dev/$tty
done

In this program, awk receives the output of the who -u command, which prints information about all logged-in terminals. The second field is the device name of the terminal, so awk extracts the device name, and the while read tty statement reads it in a loop into the shell variable tty, which serves as the destination of the message.

4. Execute shell command lines in awk: the built-in function system()

system() is a built-in function that belongs to neither the string nor the numeric category. It takes the string passed to it as its argument and treats it as a command, that is, executes it as a command line. This lets your awk program run commands or scripts flexibly as needed.

For example, the following program uses the system() built-in function to print a prepared report, which is stored in the file myreport.txt. For simplicity, only the END part is listed:

END { close("myreport.txt"); system("lp myreport.txt"); }

In this example, we first close the file myreport.txt with the close statement, and then use the system() built-in function to send myreport.txt to the printer.
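
system() also returns the exit status of the command it ran, which lets the awk program react to failures. The sketch below uses the standard true and false commands instead of lp so that it can run without a printer.

```shell
status_demo=$(awk 'BEGIN {
  ok = system("true")           # "true" exits with status 0
  bad = system("false")         # "false" exits with a non-zero status
  print ok, (bad != 0)
}')

echo "$status_demo"
```

The output should be "0 1": the first command succeeded, and the second reported a failure.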

Here I have to say goodbye to my friends. To be honest, this material is only an introduction to awk. Computing is a science that always moves forward, and awk is no exception. All this article can do is pave a short stretch at the start of your long journey; the rest of the way you will have to travel yourself. Honestly, if this article brings you some convenience on your way forward, I will be satisfied!

If you have any questions about this article, please email to: or to the homepage


1. Regular expression metacharacters of awk

\                  Escape sequence
^                  Matches at the beginning of the string
$                  Matches at the end of the string
.                  Matches any single character
[ABC]              Matches any one character inside []
[A-Ca-c]           Matches any character in the ranges A-C and a-c (in alphabetical order)
[^ABC]             Matches any character except those inside []
Desk|Chair         Matches either Desk or Chair
[ABC][DEF]         Concatenation: matches any one of A, B, or C followed by any one of D, E, or F
*                  Zero or more occurrences of the preceding item; [ABC]* matches zero or more of A, B, or C
+                  One or more occurrences of the preceding item; [ABC]+ matches one or more of A, B, or C
?                  Zero or one occurrence of the preceding item; [ABC]? matches the empty string or one of A, B, or C
(Blue|Black)berry  Grouping: matches Blueberry or Blackberry

2. awk arithmetic operators

Operator  Usage
x^y       x to the power y
x**y      Same as above
x%y       Remainder of x/y (modulo)
x+y       x plus y
x-y       x minus y
x*y       x multiplied by y
x/y       x divided by y
-y        Negative y (y with its sign flipped); also called unary minus
++y       Add 1 to y, then use y (prefix increment)
y++       Use the value of y, then add 1 (postfix increment)
--y       Subtract 1 from y, then use y (prefix decrement)
y--       Use the value of y, then subtract 1 (postfix decrement)
x=y       Assign the value of y to x
x+=y      Assign the value of x+y to x
x-=y      Assign the value of x-y to x
x*=y      Assign the value of x*y to x
x/=y      Assign the value of x/y to x
x%=y      Assign the value of x%y to x
x^=y      Assign the value of x^y to x
x**=y     Assign the value of x**y to x

3. Tests allowed by awk:

Operator  Meaning
x==y      x is equal to y
x!=y      x is not equal to y
x>y       x is greater than y
x>=y      x is greater than or equal to y
x<y       x is less than y
x<=y      x is less than or equal to y
x~re      x matches the regular expression re
x!~re     x does not match the regular expression re

4. awk operators (sorted by priority in ascending order)

=, +=, -=, *=, /=, %=
||
&&
> >= < <= == != ~ !~
xy (string concatenation: 'x' 'y' becomes "xy")
+ -
* / %
++ --

5. awk built-in variables (predefined variables)

Note: the V column in the table indicates the first tool that supports the variable (the same applies below): A = awk, N = nawk, P = POSIX awk, G = gawk

V  Variable     Meaning                                             Default
N  ARGC         Number of command-line arguments
G  ARGIND       Index in ARGV of the file currently being processed
N  ARGV         Array of command-line arguments
G  CONVFMT      Number conversion format                            %.6g
P  ENVIRON      UNIX environment variables
N  ERRNO        UNIX system error message
G  FIELDWIDTHS  Whitespace-separated string of input field widths
A  FILENAME     Name of the current input file
P  FNR          Record number within the current file
A  FS           Input field separator                               space
G  IGNORECASE   Controls case sensitivity                           0 (case-sensitive)
A  NF           Number of fields in the current record
A  NR           Number of records read so far
A  OFMT         Output format for numbers                           %.6g
A  OFS          Output field separator                              space
A  ORS          Output record separator                             newline
A  RS           Input record separator                              newline
N  RSTART       Index of the first character matched by match()
N  RLENGTH      Length of the string matched by match()
N  SUBSEP       Subscript separator                                 "\034"

6. Built-in functions of awk

V  Function                       Purpose or return value
N  gsub(reg, string, target)      Each time the regular expression reg matches in target, replaces the match with string
N  index(string, search)          Returns the position of search within string
A  length(string)                 Returns the number of characters in string
N  match(string, reg)             Returns the position in string where the regular expression reg matches
N  printf(format, variable)       Formatted output: prints variable in the format given by format
N  split(string, store, delim)    Splits string into elements of the array store, using delim as the separator
N  sprintf(format, variable)      Returns a string formatted according to format; variable is the data to place in the string
G  strftime(format, timestamp)    Returns a date or time string based on format; timestamp is a time as returned by systime()
N  sub(reg, string, target)       The first time the regular expression reg matches in target, replaces the match with string
A  substr(string, position, len)  Returns the substring of length len starting at position
P  tolower(string)                Returns string with its characters converted to lowercase
P  toupper(string)                Returns string with its characters converted to uppercase
A  atan2(x, y)                    Arctangent of x/y (in radians)
N  cos(x)                         Cosine of x (x in radians)
A  exp(x)                         e to the power x
A  int(x)                         Integer part of x
A  log(x)                         Natural logarithm of x
N  rand()                         Random number between 0 and 1
N  sin(x)                         Sine of x (x in radians)
A  sqrt(x)                        Square root of x
A  srand(x)                       Initializes the random number generator; if x is omitted, the current time (as returned by systime()) is used
G  systime()                      Returns the time elapsed since January 1, 1970 (in seconds)
