Awk by Example, Part 3 (End of Series)

Common threads: Awk by example, Part 3

String functions and... checkbooks?

Summary: In the final part of the awk series, Daniel describes awk's important string functions and shows you how to write a complete checkbook-balancing program from scratch. Along the way, you will learn how to write your own functions and how to use awk's multidimensional arrays. By the end of this article, you will have more awk experience and be able to write more powerful scripts.

Formatting output

Although awk's print statement does the job most of the time, sometimes more is needed. For those cases, awk offers two good old friends: printf() and sprintf(). Yes, these functions work like their C counterparts. printf() prints a formatted string to standard output, while sprintf() returns a formatted string that can be assigned to a variable. If you are not familiar with printf() and sprintf(), an introductory C text will bring you up to speed on these two essential functions. On a Linux system, you can also type "man 3 printf" to view the printf() man page.

Here is some sample awk sprintf() and printf() code. As you can see, everything looks almost identical to C:

x = 1
b = "foo"
printf("%s got a %d on the last test\n", "Jim", 83)
myout = sprintf("%s-%d", b, x)
print myout

This code will print:

Jim got a 83 on the last test

foo-1
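As a further illustration, here is a minimal sketch of how width and precision specifiers behave in awk's sprintf() and printf(). The values and the shell variable name out are made up for the example; the format specifiers themselves are the standard C-style ones.

```shell
# Capture the output of a small awk program (the variable name "out" is arbitrary).
out=$(awk 'BEGIN {
    # %-6s left-justifies in 6 columns; %8.2f right-justifies with 2 decimals
    line = sprintf("%-6s|%8.2f|", "food", 30.25)
    print line
    # %03d pads an integer with leading zeros to 3 digits
    printf("%03d\n", 7)
}')
printf '%s\n' "$out"
```

Because sprintf() returns the string instead of printing it, the formatted line can be stored, compared, or concatenated before it is finally printed.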

String Functions

Awk has plenty of string functions, and that's a good thing. In awk, you cannot treat a string as an array of characters the way you can in languages such as C, C++, and Python, so you really need them. For example, if you execute the following code:

mystring = "How are you doing today?"
print mystring[3]

You will get an error like the following:

awk: string.gawk:59: fatal: attempt to use scalar as array

Oh, well. Although they are not as convenient as Python's sequence types, awk's string functions get the job done. Let's take a look at them.

First, we have the basic length() function, which returns the length of a string. Here's how to use it:

print length(mystring)

This code will output the following value:

24

OK, let's keep going. The next string function is called index(). It returns the position of a substring within another string, or 0 if the substring is not found. Using mystring, we can call it this way:

print index(mystring, "you")

Awk prints:

9

Let's move on to two more easy functions, tolower() and toupper(). As you might guess, these functions return the string with all characters converted to lowercase or uppercase, respectively. Note that tolower() and toupper() return a new string and do not modify the original. This code:

print tolower(mystring)
print toupper(mystring)
print mystring

...produces this output:

how are you doing today?
HOW ARE YOU DOING TODAY?
How are you doing today?

So far so good, but how exactly do we extract a substring, or even a single character, from a string? That's what substr() is for. Here's how to call it:

mysub = substr(mystring, startpos, maxlen)

mystring should be either a string variable or a literal string from which you'd like to extract a substring. startpos should be set to the starting character position (counting from 1), and maxlen should contain the maximum length of the substring you'd like to extract. Note that I said maximum length: if length(mystring) is less than startpos + maxlen - 1, the result is cut short at the end of the string. substr() returns a new string and does not modify the original. Here's an example:

print substr(mystring, 9, 3)

Awk will output:

you

If you regularly program in languages that use array indices to access parts of a string, make a mental note that substr() is awk's substitute. You will need it to extract single characters and substrings; because awk is a string-based language, you'll be using it often.
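To make the "substr() instead of indexing" idea concrete, here is a small sketch that reads a single character and then walks the string one character at a time. The counter name count and the shell variable out are hypothetical, chosen only for this example.

```shell
out=$(awk 'BEGIN {
    mystring = "How are you doing today?"
    # awk strings are 1-indexed, so this is the third character
    print substr(mystring, 3, 1)
    # Walk the string one character at a time and count each "o"
    count = 0
    for (i = 1; i <= length(mystring); i++)
        if (substr(mystring, i, 1) == "o")
            count++
    print count
}')
printf '%s\n' "$out"
```

This is the awk equivalent of what mystring[3] or a character loop would do in a language with indexable strings.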

Now let's move on to some meatier functions, the first of which is called match(). match() is a lot like index(), except that instead of searching for a substring, it searches for a regular expression. match() returns the starting position of the match, or 0 if no match is found. In addition, match() sets two variables, RSTART and RLENGTH. RSTART contains the return value (the position of the first match), and RLENGTH holds the length of the matched text (or -1 if no match was found). Using RSTART, RLENGTH, substr(), and a small loop, you can easily iterate through every match in your string. Here's an example match() call:

print match(mystring, /you/), RSTART, RLENGTH

Awk will print:

9 9 3
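The "small loop" mentioned above could look something like the following sketch. It is one possible approach, not the author's code: the variables rest and offset are hypothetical helpers, and the pattern /o[a-z]/ is chosen only to produce several matches of length 2.

```shell
out=$(awk 'BEGIN {
    rest = "How are you doing today?"
    offset = 0          # characters already consumed from the original string
    while (match(rest, /o[a-z]/)) {
        # Report the position in the original string and the matched text
        print offset + RSTART, substr(rest, RSTART, RLENGTH)
        # Skip past this match and search the remainder
        offset += RSTART + RLENGTH - 1
        rest = substr(rest, RSTART + RLENGTH)
    }
}')
printf '%s\n' "$out"
```

Each pass trims the string just after the previous match, so match() always finds the next occurrence. A pattern that can match the empty string would need extra care to avoid looping forever.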

String replacement

Now we're going to look at a pair of string substitution functions, sub() and gsub(). These two differ slightly from the functions we've seen so far in that they actually modify the original string. Here's a template that shows how to call sub():

sub(regexp, replstring, mystring)

When you call sub(), it finds the first sequence of characters in mystring that matches regexp and replaces it with replstring. sub() and gsub() take identical arguments; the only difference is that sub() replaces only the first match (if any), while gsub() performs a global replacement, swapping out every match in the string. Here's an example of a sub() and a gsub() call:

sub(/o/, "O", mystring)
print mystring
mystring = "How are you doing today?"
gsub(/o/, "O", mystring)
print mystring

We had to reset mystring to its original value because the first sub() call modified it directly. When executed, this code makes awk output:

HOw are you doing today?

HOw are yOu dOing tOday?

Of course, more complex regular expressions are possible. I'll leave it to you to test out some complicated regexps.

We wrap up our string function coverage by introducing a function called split(). split()'s job is to chop up a string and place the pieces into an integer-indexed array. Here's an example split() call:

numelements = split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mymonths, ",")

When calling split(), the first argument contains the literal string or string variable to be chopped. The second argument is the name of the array that split() fills with the pieces, and the third argument specifies the separator used to chop the string up. split() returns the number of pieces it produced and assigns each one to an array index starting with 1, so the following code:

print mymonths[1], mymonths[numelements]

...will print:

Jan Dec
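Since split() returns the element count and numbers the pieces from 1, a plain counting loop is a natural way to visit them in order. The sketch below prints the first month of each quarter; the output wording and the variable names n and out are invented for the example.

```shell
out=$(awk 'BEGIN {
    n = split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mymonths, ",")
    # Step through the array three elements at a time
    for (i = 1; i <= n; i += 3)
        printf("Q%d starts in %s\n", (i + 2) / 3, mymonths[i])
}')
printf '%s\n' "$out"
```

A numeric loop is used here instead of "for (x in mymonths)" because awk does not guarantee the order in which a for-in loop visits array indices.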

Special string forms

A quick note: when calling length(), sub(), or gsub(), you can leave off the last argument and awk will use $0 (the entire current line) instead. To print the length of each line in a file, use this awk script:

{
    print length()
}

Financial fun

A few weeks ago, I decided to write my own checkbook-balancing program in awk. I wanted a simple tab-delimited text file in which to enter my most recent deposits and withdrawals. The idea was to hand this data to an awk script that would automatically add up all the amounts and tell me my balance. Here's how I decided to record all my transactions in my "ASCII checkbook":

23 Aug 2000	food	-	-	Y	Jimmy's Buffet	30.25

Every field in this file is separated by one or more tabs. After the date (field 1, $1), there are two fields called "expense category" and "income category". When I enter an expense, as on the line above, I put a four-letter abbreviation in the expense category field and a "-" in the income category field. This indicates that this particular item is a "food expense". Here's what a deposit looks like:

23 Aug 2000	-	inco	-	Y	Boss Man	2001.00

In this case, I put a "-" in the expense category field and "inco" in the income category field. "inco" is my abbreviation for generic (paycheck-style) income. Using category abbreviations allows me to generate a breakdown of income and expenditures by category. As for the rest of the record, all the other fields are fairly self-explanatory. The cleared field ("Y" or "N") records whether the transaction has been posted to my bank account. Beyond that, there is a transaction description and a positive dollar amount.

The algorithm used to compute the current balance isn't too hard. Awk simply needs to read in each line, one by one. If a line has an expense category but no income category (it is "-"), the item is a debit. If a line has an income category but no expense category, the item is a credit. And if a line has both an expense category and an income category, it records a "category transfer": the dollar amount is subtracted from the expense category and added to the income category. Again, all these categories are virtual, but they are very useful for tracking income and expenditures, as well as for budgeting.
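The classification rules just described can be sketched in a few lines. This is only an illustration: the variable kind is hypothetical, and the two sample records assume a seven-field, tab-separated layout inferred from the field numbers the article uses ($2, $3, $5, and $7).

```shell
# Feed two sample records (assumed layout: seven tab-separated fields) to awk.
out=$(printf "23 Aug 2000\tfood\t-\t-\tY\tJimmy's Buffet\t30.25\n23 Aug 2000\t-\tinco\t-\tY\tBoss Man\t2001.00\n" |
awk 'BEGIN { FS = "\t+" }
{
    # Decide the transaction type from the expense ($2) and income ($3) fields
    if ($2 != "-" && $3 == "-")      kind = "debit"
    else if ($2 == "-" && $3 != "-") kind = "credit"
    else if ($2 != "-" && $3 != "-") kind = "transfer"
    else                             kind = "invalid"
    print kind, $7
}')
printf '%s\n' "$out"
```

Because FS is set to one or more tabs, the spaces inside the date and the description do not split those fields.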

Code

It's time to look at the code. We'll start with the first line, followed by the BEGIN block and a function definition:

Balance, part 1

#!/usr/bin/env awk -f
BEGIN {
    FS = "\t+"
    months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
}

function monthdigit(mymonth) {
    return (index(months, mymonth) + 3) / 4
}

Adding the first "#!..." line to any awk script allows it to be executed directly from the shell, provided that you "chmod +x myscript" first. The remaining lines define our BEGIN block, which is executed before awk starts processing our checkbook file. We set FS (the field separator) to "\t+", which tells awk that the fields are separated by one or more tabs. In addition, we define a string called months, which is used by the monthdigit() function that we'll look at next.

The last three lines show you how to define your own awk function. The format is simple: type "function", then the function name, then the parameters separated by commas, inside parentheses. After this, a "{ }" code block contains the code you'd like this function to execute. All functions can access global variables (such as our months variable). In addition, awk provides a "return" statement that allows the function to return a value, and it operates much like the "return" found in C, Python, and other languages. This particular function converts a three-letter month name into its numeric equivalent. For example, this:

print monthdigit("Mar")

...will print this:

3
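The arithmetic works because each month name occupies four characters in the months string (three letters plus a space), so index() returns 1, 5, 9, and so on. The sketch below exercises the function from the listing on a few inputs, including a name that is not a month; "Xyz" is an invented test value.

```shell
out=$(awk '
function monthdigit(mymonth) {
    # index() returns 1, 5, 9, ... so adding 3 and dividing by 4 gives 1, 2, 3, ...
    return (index(months, mymonth) + 3) / 4
}
BEGIN {
    months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
    print monthdigit("Jan"), monthdigit("Aug"), monthdigit("Dec")
    # An unknown name makes index() return 0, so the result is 3/4
    print monthdigit("Xyz")
}')
printf '%s\n' "$out"
```

Note that an unrecognized month does not raise an error; it silently yields 0.75, which a more defensive script might want to check for.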

Now, let's continue to write more functions.

Financial Functions

Here are three more functions that perform the bookkeeping for us. Our main code block, which we'll see soon, processes each line of the checkbook file in order and calls one of these functions so that the appropriate transactions are recorded in an awk array. There are three basic kinds of transactions: credit (doincome), debit (doexpense), and transfer (dotransfer). You'll notice that all three functions accept one argument, called mybalance. mybalance is a placeholder for a two-dimensional array, which we'll pass in as an argument. Up until now, we haven't dealt with two-dimensional arrays; however, as you can see below, the syntax is quite simple. Just separate the dimensions with a comma.

We'll record information in mybalance as follows. The first dimension of the array ranges from 0 to 12 and specifies the month, or 0 for the entire year. The second dimension is a four-letter category, such as "food" or "inco"; this is the actual category we're dealing with. So, to find the entire year's balance for the food category, you'd look in mybalance[0, "food"]. To find June's income, you'd look in mybalance[6, "inco"].
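Under the hood, awk simulates multiple dimensions by joining the comma-separated subscripts into a single string key, using the built-in SUBSEP variable as the glue. The following sketch shows the comma syntax, the "(i, j) in array" membership test, and the equivalence with a hand-built key. The amounts and the variables key and same are made up for illustration.

```shell
out=$(awk 'BEGIN {
    mybalance[0, "food"] -= 30.25      # whole-year total
    mybalance[8, "food"] -= 30.25      # August total
    # Membership test for a two-dimensional subscript
    if ((8, "food") in mybalance)
        print "August food:", mybalance[8, "food"]
    if (!((9, "food") in mybalance))
        print "no September food entry"
    # The same element can be reached through a key joined with SUBSEP
    key = 8 SUBSEP "food"
    same = (key in mybalance) ? "same key" : "different key"
    print same
}')
printf '%s\n' "$out"
```

Using the "in" test rather than reading mybalance[9, "food"] directly avoids creating an empty element as a side effect.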


Balance, part 2

function doincome(mybalance) {
    mybalance[curmonth, $3] += amount
    mybalance[0, $3] += amount
}

function doexpense(mybalance) {
    mybalance[curmonth, $2] -= amount
    mybalance[0, $2] -= amount
}

function dotransfer(mybalance) {
    mybalance[0, $2] -= amount
    mybalance[curmonth, $2] -= amount
    mybalance[0, $3] += amount
    mybalance[curmonth, $3] += amount
}

When doincome() or any of the other functions is called, we record the transaction in two places: mybalance[0, category] and mybalance[curmonth, category], the entire year's category balance and the current month's category balance, respectively. This allows us to easily generate either an annual or a monthly breakdown of income and expenditures later on.

If you look at these functions, you'll notice that the array referenced by mybalance is passed in as an argument. We also refer to several global variables: curmonth, which holds the numeric value of the month of the current record; $2 (the expense category); $3 (the income category); and amount ($7, the dollar amount). When doincome() and friends are called, all these variables have already been set correctly for the current record (line) being processed.
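Passing the array as an argument works because awk hands arrays to functions by reference, while scalar arguments are copied. The sketch below illustrates the difference with a hypothetical deposit() function; unlike the article's functions, it takes the category and amount as parameters purely to make the contrast visible.

```shell
out=$(awk '
# deposit() is a hypothetical helper, not part of the balance script
function deposit(mybalance, category, amount) {
    mybalance[0, category] += amount   # modifies the array owned by the caller
    amount = 0                         # modifies only the local copy
}
BEGIN {
    amount = 2001.00
    deposit(balance, "inco", amount)   # the same function can fill balance...
    deposit(balance2, "inco", 100)     # ...or balance2
    print balance[0, "inco"], balance2[0, "inco"], amount
}')
printf '%s\n' "$out"
```

The global amount is unchanged after the calls, while both arrays keep the values written inside the function. This is what lets one function serve both the balance and balance2 arrays.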

Main code block

The main code block contains the code that parses each line of input data. Remember, because we have set FS correctly, we can refer to the first field as $1, the second field as $2, and so on. When doincome() and friends are called, the functions can access the current values of curmonth, $2, $3, and amount from inside the function. Take a look at the code and meet me on the other side for an explanation.


Balance, part 3

{
    curmonth = monthdigit(substr($1, 4, 3))
    amount = $7

    # record all the categories encountered
    if ($2 != "-")
        globcat[$2] = "yes"
    if ($3 != "-")
        globcat[$3] = "yes"

    # tally up the transaction properly
    if ($2 == "-") {
        if ($3 == "-") {
            print "Error: inc and exp fields are both blank!"
            exit 1
        } else {
            # this is income
            doincome(balance)
            if ($5 == "Y")
                doincome(balance2)
        }
    } else if ($3 == "-") {
        # this is an expense
        doexpense(balance)
        if ($5 == "Y")
            doexpense(balance2)
    } else {
        # this is a transfer
        dotransfer(balance)
        if ($5 == "Y")
            dotransfer(balance2)
    }
}

In the main block, the first two lines set curmonth to an integer between 1 and 12 and set amount to field 7 (to make the code easier to understand). Then we have four interesting lines, where we write values into an array called globcat. globcat, the global categories array, records every category encountered in the file: "inco", "misc", "food", "util", and so on. For example, if $2 == "inco", we set globcat["inco"] to "yes". Later on, we can iterate through our list of categories with a simple "for (x in globcat)" loop.

In the next twenty or so lines, we analyze fields $2 and $3 and record the transaction appropriately. If $2 == "-" and $3 != "-", we have some income, so we call doincome(). If it's the other way around, we call doexpense(); and if both $2 and $3 contain categories, we call dotransfer(). Each time, we pass the "balance" array to these functions so that the appropriate data is recorded there.

You'll also notice several lines that say "if ($5 == "Y"), record that same transaction in balance2". What exactly are we doing here? You'll recall that $5 contains either "Y" or "N" and records whether the transaction has been posted to the account. Because we record a transaction in balance2 only if it has been posted, balance2 will contain the actual account balance, while "balance" will contain all transactions, whether they have been posted or not. You can use balance2 to verify your data entry (since it should match your current account balance according to your bank), and use "balance" to make sure that you don't overdraw your account (since it takes into account any checks you have written that have not yet been cashed).

Generating the report

After the main block has processed each input record, we have a fairly comprehensive record of debits and credits broken down by category and by month. Now, all we need to do is define an END block that generates a report, in this case a modest one:

END {
    bal = 0
    bal2 = 0
    for (x in globcat) {
        bal = bal + balance[0, x]
        bal2 = bal2 + balance2[0, x]
    }
    printf("Your available funds: %10.2f\n", bal)
    printf("Your account balance: %10.2f\n", bal2)
}

This code prints a summary report that looks like the following:

Your available funds:    1174.22

Your account balance:    2399.33

In our END block, we use the "for (x in globcat)" construct to iterate through every category, tallying up a master balance based on all the transactions recorded. We actually tally up two balances, one for available funds and another for the account balance. To execute the program and process your own financial data, which you've entered into a file called "mycheckbook.txt", put all the above code into a text file called "balance", do "chmod +x balance", and then type "./balance mycheckbook.txt". The balance script will then add up all your transactions and print out a two-line balance summary.

Enhancements

I use a more advanced version of this program to manage my personal and business finances. My version (which I couldn't include here due to space limitations) prints out a monthly breakdown of income and expenses, including annual totals, net income, and a bunch of other stuff. Even better, it outputs the data in HTML format so that I can view it in a Web browser. If you find this program useful, I encourage you to add these features to this script. You won't need to configure it to record any additional information; all the information you need is already in balance and balance2. Just upgrade the END block, and you're in business!
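As a starting point for such an upgrade, here is one way a monthly per-category breakdown might be produced. It is only a sketch, not the author's advanced version: the report layout is invented, and the arrays are filled with stand-in values in a BEGIN block so the example runs on its own, whereas in the real script the loop would live in the END block and use the arrays built by the main block.

```shell
out=$(awk 'BEGIN {
    # Stand-in data; in the real script the main block fills these arrays
    globcat["food"] = "yes"; globcat["inco"] = "yes"
    balance[0, "food"] = -30.25; balance[8, "food"] = -30.25
    balance[0, "inco"] = 2001;   balance[8, "inco"] = 2001
    # One line per month and category that actually has an entry
    for (m = 1; m <= 12; m++)
        for (x in globcat)
            if ((m, x) in balance)
                printf("%02d %-4s %10.2f\n", m, x, balance[m, x])
}' | sort)
printf '%s\n' "$out"
```

The output is piped through sort because the order of a "for (x in globcat)" loop is not guaranteed; the zero-padded month number makes a plain text sort put the lines in calendar order.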

I hope you've enjoyed this series. For more information on awk, check out the resources listed below.

 

Resources

  • Read Daniel's earlier installments in the awk series: Awk by example, Part 1 and Part 2 on developerWorks.
  • If you'd like a good old-fashioned book, O'Reilly's sed & awk, 2nd Edition is a wonderful choice.
  • Be sure to check out the comp.lang.awk FAQ. It also contains lots of additional awk links.
  • Patrick Hartigan's awk tutorial is packed with handy awk scripts.

End of Series
