Formatted output
While awk's print statement does the job most of the time, sometimes we need more. For those times, awk offers two good old friends, printf() and sprintf(). Like so many other parts of awk, these functions are equivalent to their C counterparts: printf() prints a formatted string to stdout, while sprintf() returns a formatted string that can be assigned to a variable. If you're not familiar with printf() and sprintf(), an introductory C text will quickly bring you up to speed on these two essential printing functions. On a Linux system, you can type "man 3 printf" to view the printf() man page.
Here is some sample awk sprintf() and printf() code. As you can see, it looks almost identical to C.
x=1
b="foo"
printf("%s got a %d on the last test\n","Jim",83)
myout=sprintf("%s-%d",b,x)
print myout
This code will print:
Jim got a 83 on the last test
foo-1
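To give a feel for what format strings add over plain print, here is a small sketch run from the shell. The values and the column widths are made up for illustration; only printf() and sprintf() themselves come from awk.

```shell
# Illustrative values only; printf() and sprintf() are the real awk built-ins.
result=$(awk 'BEGIN {
    # %-6s left-justifies in 6 columns; %5.1f uses 5 columns with 1 decimal
    printf("%-6s|%5.1f|\n", "Jim", 83)
    # sprintf() hands the formatted text back instead of printing it
    label = sprintf("%s-%03d", "foo", 1)
    print label
}')
printf '%s\n' "$result"
```

The first line comes out as "Jim   | 83.0|" and the second as "foo-001", which shows padding, precision, and zero-filling in one go.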
String functions
Awk has a plethora of string functions, and that's a good thing. In awk, you really need string functions, because you can't treat a string as an array of characters as you can in other languages such as C, C++, and Python. For example, if you execute the following code:
mystring="How are you doing today?"
print mystring[3]
You will get a fatal error, because awk complains that a scalar is being used as an array.
Oh, well. While not as convenient as Python's sequence types, awk's string functions get the job done. Let's take a look at them.
First, there is the basic length() function, which returns the length of a string. With mystring set to "How are you doing today?", length(mystring) returns 24.
OK, let's keep going. The next string function is called index(), and it returns the position at which a substring occurs in another string, or 0 if the substring is not found. Using the same mystring, index(mystring,"you") returns 9, because "you" starts at the ninth character.
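If you want to check these values yourself, a minimal sketch along the following lines should do. The search string "xyz" is just an arbitrary example of something that is not in the string.

```shell
result=$(awk 'BEGIN {
    mystring = "How are you doing today?"
    print length(mystring)          # number of characters
    print index(mystring, "you")    # 1-based position of the first match
    print index(mystring, "xyz")    # 0 when the substring is absent
}')
printf '%s\n' "$result"
```

Note that positions in awk start at 1, not 0, which is why a return value of 0 can safely mean "not found".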
Let's continue by looking at two more simple functions, tolower() and toupper(). As you might guess, these functions return the string with all characters converted to lowercase or uppercase, respectively. Notice that tolower() and toupper() return a new string and don't modify the original. This code:
print tolower(mystring)
print toupper(mystring)
print mystring
...will produce the following output:
how are you doing today?
HOW ARE YOU DOING TODAY?
How are you doing today?
So far so good, but how exactly do we select a substring, or even a single character, from a string? That's where substr() comes in. Here is how you call substr():
mysub=substr(mystring,startpos,maxlen)
mystring should be either a string variable or a literal string from which you'd like to extract a substring. startpos should be set to the starting character position, and maxlen should contain the maximum length of the string you'd like to extract. Notice that I said maximum length: if length(mystring) is shorter than startpos+maxlen, your result will be truncated. substr() won't modify the original string, but returns the substring instead. Here's an example:
print substr(mystring,9,3)
With mystring set to "How are you doing today?", awk will print:
you
If you regularly program in a language that uses array indices to access parts of a string (and who doesn't), make a mental note that substr() is your awk substitute. You'll need it to extract single characters and substrings; because awk is a string-based language, you'll be using it often.
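As a rough illustration of substr() standing in for array subscripts, the sketch below walks a string one character at a time. The variable names word and out and the sample string are my own, chosen only for this example.

```shell
result=$(awk 'BEGIN {
    word = "awk"    # hypothetical sample string
    out = ""
    # substr(word, i, 1) stands in for the word[i] of C or Python
    for (i = 1; i <= length(word); i++)
        out = out (i > 1 ? " " : "") i ":" substr(word, i, 1)
    print out
    # asking for more characters than remain returns whatever is left
    print substr(word, 2, 10)
}')
printf '%s\n' "$result"
```

The second print shows the truncation behavior described above: the request for ten characters quietly yields the two that remain.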
Now, we move on to some meatier functions, the first of which is match(). match() is a lot like index(), except that instead of searching for a substring, it searches for a regular expression. The match() function returns the starting position of the match, or 0 if no match is found. In addition, match() sets two variables called RSTART and RLENGTH. RSTART contains the return value (the location of the first match), and RLENGTH specifies its span in characters (or -1 if no match was found). Using RSTART, RLENGTH, substr(), and a small loop, you can easily iterate through every match in your string. For example, with mystring set to "How are you doing today?", the call print match(mystring,/you/), RSTART, RLENGTH prints 9 9 3.
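The "small loop" mentioned above might look something like the following sketch. It is one possible way to do it, not the only one: it repeatedly matches against the unprocessed remainder of the string and keeps a running offset so that it can report positions in the original string. The names rest and offset and the pattern /o[a-z]/ are assumptions made for the example.

```shell
result=$(awk 'BEGIN {
    rest = "How are you doing today?"
    offset = 0                      # characters already consumed
    while (match(rest, /o[a-z]/)) {
        # RSTART is relative to rest, so add the offset for the true position
        print offset + RSTART, substr(rest, RSTART, RLENGTH)
        offset += RSTART + RLENGTH - 1
        rest = substr(rest, RSTART + RLENGTH)
    }
}')
printf '%s\n' "$result"
```

Each pass prints the position and text of one match, then chops the string just past that match so the next call to match() finds the following one.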
String substitution
Now, we're going to look at a couple of string substitution functions, sub() and gsub(). These differ slightly from the functions we've looked at so far in that they actually modify the original string. A call to sub() takes the form sub(regexp,replstring,mystring).
When you call sub(), it finds the first sequence of characters in mystring that matches regexp and replaces that sequence with replstring. sub() and gsub() take identical arguments; the only difference is that sub() replaces the first regexp match (if any), while gsub() performs a global replace, swapping out all matches in the string. Here's an example sub() and gsub() call:
sub(/o/,"O",mystring)
print mystring
mystring="How are you doing today?"
gsub(/o/,"O",mystring)
print mystring
We had to reset mystring to its original value because the first sub() call modified mystring directly. When executed, this code causes awk to output:
HOw are you doing today?
HOw are yOu dOing tOday?
Of course, more complex regular expressions are possible. I'll leave it up to you to test out some complex regexps.
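As a starting point for that experimenting, here is a hedged sketch that uses a character class and the special "&" in the replacement string, which stands for whatever text the regular expression matched. It also relies on the fact that gsub() returns the number of substitutions it made. The pattern and the angle brackets are arbitrary choices for the example.

```shell
result=$(awk 'BEGIN {
    mystring = "How are you doing today?"
    # & in the replacement stands for the text that matched
    n = gsub(/[aeiou]+/, "<&>", mystring)
    print n
    print mystring
}')
printf '%s\n' "$result"
```

Every run of vowels ends up wrapped in angle brackets, and n tells you how many runs were found.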
We wrap up our string function coverage by introducing a function called split(). split()'s job is to "chop up" a string and place the various parts into an integer-indexed array. Here's an example split() call:
numelements=split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec",mymonths,",")
When calling split(), the first argument contains the literal string or string variable to be chopped. In the second argument, you should specify the name of the array that split() will stuff the chopped parts into. In the third argument, specify the separator that will be used to chop the string up. When split() returns, it returns the number of string elements that were split. split() assigns each piece to an array index starting with one, so the following code:
print mymonths[1],mymonths[numelements]
...will print:
Jan Dec
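Because split() returns the element count and numbers the pieces from 1, a plain counting loop is enough to visit them in order. A minimal sketch, using a shortened month list to keep the output small:

```shell
result=$(awk 'BEGIN {
    n = split("Jan,Feb,Mar", mymonths, ",")
    # the pieces land in mymonths[1] through mymonths[n]
    for (i = 1; i <= n; i++)
        print i, mymonths[i]
}')
printf '%s\n' "$result"
```

A counting loop is used here rather than "for (i in mymonths)" because awk does not promise any particular order for the latter.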
Special string forms
A quick note: when calling length(), sub(), or gsub(), you can drop the last argument and awk will apply the function call to $0 (the entire current line). To print the length of each line in a file, the one-line awk script { print length() } is all you need.
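Here is a short sketch of both shortcuts at work on two made-up input lines, fed in through a pipe rather than a file:

```shell
result=$(printf 'one\nthree\n' | awk '{
    print length()      # no argument: length of $0
    gsub(/e/, "E")      # no third argument: edits $0 in place
    print
}')
printf '%s\n' "$result"
```

For each input line, the script prints the length first and then the line itself after the substitution.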
A financial anecdote
A few weeks ago, I decided to write my own checkbook balancing program in awk. I decided that I'd like to have a simple tab-delimited text file into which I can enter my most recent deposits and withdrawals. The idea was to hand this data to an awk script that would automatically add up all the amounts and tell me my balance. Here's how I decided to record all my transactions in my "ASCII checkbook":
23 Aug 2000	food	-	-	Y	Jimmy's Buffet	30.25
Every field in this file is separated by one or more tabs. After the date (field 1, $1), there are two fields called "expense category" and "income category". When I'm entering an expense like the one above, I put a four-letter nickname in the expense field and a "-" (blank entry) in the income field. This signifies that this particular item is a "food expense". :) A deposit looks much the same, with the two category fields swapped.
For a deposit, I put "-" (blank) in the expense category and "inco" in the income category. "inco" is my nickname for generic (paycheck-style) income. Using category nicknames allows me to generate a breakdown of my income and expenditures by category. As far as the rest of the record goes, the remaining fields are fairly self-explanatory. The "cleared?" field ("Y" or "N") records whether the transaction has been posted to my account; beyond that, there is a transaction description and a positive dollar amount.
The algorithm used to compute the current balance isn't too hard. Awk simply needs to read in each line, one by one. If an expense category is listed but there is no income category (it's "-"), then this item is a debit. If an income category is listed but there is no expense category (it's "-"), then the dollar amount is a credit. And if both an expense and an income category are listed, then the amount is a "category transfer"; that is, the dollar amount is subtracted from the expense category and added to the income category. Again, all these categories are virtual, but they are very useful for tracking income and expenditures, as well as for budgeting.
Code
Now it's time to look at the code. We'll start off with the first part, the BEGIN block and a function definition:
balance, part 1
#!/usr/bin/awk -f
BEGIN {
    FS="\t+"
    months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
}
function monthdigit(mymonth) {
    return (index(months,mymonth)+3)/4
}
Adding the first "#!..." line to any awk script, and then running "chmod +x myscript", allows the script to be executed directly from the shell. The remaining lines define our BEGIN block, which gets executed before awk starts processing our checkbook file. We set FS (the field separator) to "\t+", which tells awk that the fields are separated by one or more tabs. In addition, we define a string called months that is used by the monthdigit() function, which appears next.
The last three lines show you how to define your own awk function. The format is simple: type "function", then the function name, and then the parameters separated by commas, inside parentheses. After this, a "{ }" code block contains the code that you'd like the function to execute. All functions can access global variables (like our months variable). In addition, awk provides a "return" statement that allows the function to return a value, and it operates much like the "return" found in C and other languages. This particular function converts a month name in 3-letter string format into its numeric equivalent. For example, print monthdigit("Mar") will print 3.
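The arithmetic in monthdigit() is worth a second look. Each month name occupies four columns of the months string (three letters plus a space), so the names start at positions 1, 5, 9, and so on; adding 3 and dividing by 4 turns those positions into 1, 2, 3, and so on. The sketch below simply exercises the function from the listing on three sample names.

```shell
result=$(awk '
function monthdigit(mymonth) {
    return (index(months, mymonth) + 3) / 4
}
BEGIN {
    # each name takes 4 columns: Jan starts at 1, Feb at 5, Mar at 9, ...
    months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
    print monthdigit("Jan"), monthdigit("Aug"), monthdigit("Dec")
}')
printf '%s\n' "$result"
```

One thing to keep in mind: a name that is not in the list makes index() return 0, so the function returns 0.75 rather than a valid month number. The script in this article does not check for that case.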
Now, let's discuss some other functions.
Financial functions
Here are three more functions that perform the bookkeeping for us. Our main code block, which we'll see soon, will process each line of the checkbook file sequentially, calling one of these functions so that the appropriate transactions are recorded in an awk array. There are three basic kinds of transactions: credit (doincome), debit (doexpense), and transfer (dotransfer). You'll notice that all three functions accept one argument, called mybalance. mybalance is a placeholder for a two-dimensional array, which we'll pass in as an argument. Up until now, we haven't dealt with two-dimensional arrays; however, as you can see below, the syntax is quite simple. Just separate each dimension with a comma.
We'll record information in mybalance as follows. The first dimension of the array ranges from 0 to 12 and specifies the month, or zero for the entire year. The second dimension is a four-letter category, like "food" or "inco"; this is the actual category we're dealing with. So, to find the entire year's balance for the food category, you'd look in mybalance[0,"food"]. To find June's income, you'd look in mybalance[6,"inco"].
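Before moving on, here is a small sketch of how comma-separated subscripts behave. The array name bal and the amount are invented for the example. It also shows a detail the article does not go into: awk stores each pair of subscripts as a single string key, joined by the built-in variable SUBSEP.

```shell
result=$(awk 'BEGIN {
    # hypothetical amounts: month 8 (Aug) plus the year-to-date slot 0
    bal[8, "food"] -= 30.25
    bal[0, "food"] -= 30.25
    # (i, j) in array tests for a pair without creating it
    if ((8, "food") in bal)
        print "Aug food:", bal[8, "food"]
    if (!((6, "inco") in bal))
        print "no June income yet"
    # the real key is one string: the subscripts joined with SUBSEP
    key = 8 SUBSEP "food"
    if (key in bal)
        print "8 SUBSEP food is the same key"
}')
printf '%s\n' "$result"
```

The "(i, j) in array" test is handy in reports, because merely reading bal[6,"inco"] would create an empty entry for it.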
balance, part 2
function doincome(mybalance) {
    mybalance[curmonth,$3] += amount
    mybalance[0,$3] += amount
}
function doexpense(mybalance) {
    mybalance[curmonth,$2] -= amount
    mybalance[0,$2] -= amount
}
function dotransfer(mybalance) {
    mybalance[0,$2] -= amount
    mybalance[curmonth,$2] -= amount
    mybalance[0,$3] += amount
    mybalance[curmonth,$3] += amount
}
When doincome() or any of the other functions is called, we record the transaction in two places: mybalance[0,category] and mybalance[curmonth,category], the entire year's category balance and the current month's category balance, respectively. This allows us to easily generate either an annual or a monthly breakdown of income and expenditures later on.
If you look at these functions, you'll notice that the array referenced by mybalance is passed in by reference. In addition, we also refer to several global variables: curmonth, which holds the numeric value of the month of the current record, $2 (the expense category), $3 (the income category), and amount ($7, the dollar amount). When doincome() and friends are called, all these variables have already been correctly set for the current record (line) being processed.
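The difference between the two kinds of parameters is easy to demonstrate. In the sketch below, the function bump() and the names totals and n are hypothetical: the array argument is shared with the caller, while the scalar argument is only a copy.

```shell
result=$(awk '
# arr is passed by reference; n is passed by value
function bump(arr, n) {
    arr["count"] += 1
    n += 1
}
BEGIN {
    n = 5
    bump(totals, n)
    bump(totals, n)
    print totals["count"], n
}')
printf '%s\n' "$result"
```

After two calls the array entry has been incremented twice, but the caller's n is still 5. This is what lets doincome() and friends update whichever array is handed to them.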
Main block
Here's the main block that contains the code that parses each line of input data. Remember, because we have set FS correctly, we can refer to the first field as $1, the second field as $2, and so on. When doincome() and friends are called, the functions can access the current values of curmonth, $2, $3, and amount from inside the function. Take a look at the code and meet me on the other side for an explanation.
balance, part 3
{
    curmonth=monthdigit(substr($1,4,3))
    amount=$7
    #record all the categories encountered
    if ( $2 != "-" )
        globcat[$2]="yes"
    if ( $3 != "-" )
        globcat[$3]="yes"
    #tally up the transaction properly
    if ( $2 == "-" ) {
        if ( $3 == "-" ) {
            print "Error: inc and exp fields are both blank!"
            exit 1
        } else {
            #this is income
            doincome(balance)
            if ( $5 == "Y" )
                doincome(balance2)
        }
    } else if ( $3 == "-" ) {
        #this is an expense
        doexpense(balance)
        if ( $5 == "Y" )
            doexpense(balance2)
    } else {
        #this is a transfer
        dotransfer(balance)
        if ( $5 == "Y" )
            dotransfer(balance2)
    }
}
In the main block, the first two lines set curmonth to an integer between 1 and 12 and set amount to field 7 (which makes the code easier to understand). Then we have four interesting lines, where we write values into an array called globcat. globcat, or the global categories array, is used to record all the categories encountered in the file: "inco", "misc", "food", "util", and so on. For example, if $2 == "inco", we set globcat["inco"] to "yes". Later on, we can iterate through our list of categories with a simple "for (x in globcat)" loop.
In the next twenty or so lines, we analyze fields $2 and $3 and record the transaction appropriately. If $2=="-" and $3!="-", we have some income, so we call doincome(). If the situation is reversed, we call doexpense(); and if both $2 and $3 contain categories, we call dotransfer(). Each time, we pass the "balance" array to these functions so that the appropriate data is recorded there.
You'll also notice several lines that say "if ( $5 == "Y" ), record that same transaction in balance2". What exactly are we doing here? You'll recall that $5 contains either a "Y" or an "N", and records whether the transaction has been posted to the account. Because we record the transaction in balance2 only if it has been posted, balance2 will contain the actual account balance, while "balance" will contain all transactions, whether they have been posted or not. You can use balance2 to verify your data entry (since it should match your current account balance according to your bank), and use "balance" to make sure that you don't overdraw your account (since it takes into account any checks you have written that have not yet been cashed).
Generating reports
After the main block has processed each input record in turn, we have a fairly comprehensive record of debits and credits broken down by category and by month. Now, all we need to do is define an END block that generates a report, in this case a modest one:
balance, part 4
END {
    bal=0
    bal2=0
    for (x in globcat) {
        bal=bal+balance[0,x]
        bal2=bal2+balance2[0,x]
    }
    printf("Your available funds: %10.2f\n", bal)
    printf("Your account balance: %10.2f\n", bal2)
}
This report prints out a summary that looks something like this:
Your available funds:    1174.22
Your account balance:    2399.33
In our END block, we use the "for (x in globcat)" construct to iterate through every category, tallying up a master balance based on all the transactions recorded. We actually tally up two balances, one for available funds and another for the account balance. To execute the program and process the financial data that you've entered into a file called "mycheckbook.txt", put all the above code into a text file called "balance", do a "chmod +x balance", and then type "./balance mycheckbook.txt". The balance script will then add up all your transactions and print out a two-line balance summary for you.
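To try the whole thing without touching your real records, something like the following sketch can be pasted into a shell. It assembles the four parts of the script (slightly condensed, and run with "awk -f" instead of a "#!" line) in a temporary directory and feeds it a three-line checkbook. The transactions, descriptions, and amounts are invented for the test; only the field layout follows the article.

```shell
tmpdir=$(mktemp -d)

# Hypothetical checkbook: date, expense, income, unused, posted, description, amount
printf '23 Aug 2000\tfood\t-\t-\tY\tBuffet\t30.25\n'     >  "$tmpdir/mycheckbook.txt"
printf '23 Aug 2000\t-\tinco\t-\tY\tPaycheck\t2001.00\n' >> "$tmpdir/mycheckbook.txt"
printf '24 Aug 2000\tutil\t-\t-\tN\tPower bill\t50.00\n' >> "$tmpdir/mycheckbook.txt"

cat > "$tmpdir/balance" <<'EOF'
BEGIN {
    FS = "\t+"
    months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
}
function monthdigit(mymonth) {
    return (index(months, mymonth) + 3) / 4
}
function doincome(mybalance) {
    mybalance[curmonth,$3] += amount
    mybalance[0,$3] += amount
}
function doexpense(mybalance) {
    mybalance[curmonth,$2] -= amount
    mybalance[0,$2] -= amount
}
function dotransfer(mybalance) {
    mybalance[0,$2] -= amount
    mybalance[curmonth,$2] -= amount
    mybalance[0,$3] += amount
    mybalance[curmonth,$3] += amount
}
{
    curmonth = monthdigit(substr($1, 4, 3))
    amount = $7
    # record all the categories encountered
    if ($2 != "-") globcat[$2] = "yes"
    if ($3 != "-") globcat[$3] = "yes"
    # tally up the transaction; balance2 only sees posted items
    if ($2 == "-") {
        if ($3 == "-") {
            print "Error: inc and exp fields are both blank!"
            exit 1
        } else {
            doincome(balance)
            if ($5 == "Y") doincome(balance2)
        }
    } else if ($3 == "-") {
        doexpense(balance)
        if ($5 == "Y") doexpense(balance2)
    } else {
        dotransfer(balance)
        if ($5 == "Y") dotransfer(balance2)
    }
}
END {
    bal = 0
    bal2 = 0
    for (x in globcat) {
        bal += balance[0,x]
        bal2 += balance2[0,x]
    }
    printf("Your available funds: %10.2f\n", bal)
    printf("Your account balance: %10.2f\n", bal2)
}
EOF

result=$(awk -f "$tmpdir/balance" "$tmpdir/mycheckbook.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample data the unposted power bill lowers the available funds (1920.75) but not the account balance (1970.75), which is exactly the distinction between balance and balance2.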
Upgrades
I use a more advanced version of this program to manage my personal and business finances. My version (which I couldn't include here due to space limitations) prints out a monthly breakdown of income and expenses, including annual totals, net income, and a bunch of other stuff. Even better, it outputs the data in HTML format, so that I can view it in a Web browser. :) If you find this program useful, I encourage you to add these features to the script. You won't need to configure it to record any additional information; all the information you need is already in balance and balance2. Just upgrade the END block, and you're in business.
I hope you've enjoyed this series. For more information on awk, check out the resources listed below.
Read Daniel's earlier articles in the awk series on developerWorks: Awk by example, Part 1 and Part 2. If you'd like a good old-fashioned book, O'Reilly's sed & awk, 2nd Edition is a wonderful choice. Be sure to check out the comp.lang.awk FAQ; it also contains lots of additional awk links. Patrick Hartigan's awk tutorial also includes handy awk scripts. Thompson's TAWK Compiler compiles awk scripts into fast binary executables; versions are available for Windows, OS/2, DOS, and UNIX. The GNU Awk User's Guide is available for online reference.
About the author
Daniel Robbins lives in Albuquerque, New Mexico. He is the President/CEO of Gentoo Technologies, Inc., and the creator of Gentoo Linux (an advanced Linux for the PC) and the Portage system (a next-generation ports system for Linux). He has also served as a contributing author for the Macmillan books Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved with computers since the second grade, when he was first exposed to the Logo programming language and to a potentially dangerous dose of Pac-Man. This probably explains why he has since served as a Lead Graphic Artist at SONY Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife, Mary, and their new baby daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.