Linux and other UNIX-like systems come with a large number of tools that perform a wide range of functions, from the obvious to the incredible. Much of the success of the UNIX-like programming environment is owed to the high quality and variety of these tools, and to the ease with which they can be connected to one another.
As a developer, you may find that the existing utilities do not always solve your problem. While many problems can be solved easily by combining existing utilities, others require at least some real programming work. These latter tasks are often candidates for new utilities, and combining existing utilities into a new one can frequently solve the problem with minimal effort. This article examines the qualities of a good utility and the process you go through when designing one.
What are the qualities of a good utility?
Kernighan and Pike's book The UNIX Programming Environment contains a fascinating discussion of this question. A good utility is one that does its job as well as possible. It has to play well with other utilities; it must be easy to use in combination with them. A program that cannot be used in combination with other utilities is not a utility but an application.
Utilities should let you build one-off applications cheaply and easily from the materials at hand. Many people think of utilities as being like tools in a toolbox: the goal when designing one is not to have a single tool that does everything, but to have a set of tools, each of which does one thing as well as possible.
Some utilities are reasonably useful on their own, while others only make sense in combination with other utilities. Examples of the former include sort and grep. xargs, on the other hand, is rarely used except with other utilities, most often find.
What language should you write a utility in?
Most UNIX system utilities are written in C. The examples in this article use Perl and sh. Use the right tool for the job: if you will use a utility often enough, the cost of writing it in a compiled language may be repaid by the performance gain. On the other hand, in the fairly common case where a program's workload is light, a scripting language may offer faster development.
If you are not sure, use the language you know best. At least while you are prototyping a utility, or figuring out how useful it is, programmer efficiency matters more than performance tuning. Most UNIX system utilities are written in C because they are used heavily enough that efficiency matters more than development cost. Perl and sh (or ksh) can be good languages for rapid prototyping. For utilities that mainly glue together other programs, a shell may be easier than a more traditional programming language. On the other hand, C may be the best choice when you need to manipulate raw bytes.
Designing a utility
A good rule of thumb is to start thinking about the design of a utility the second time you have to solve a given problem. Don't regret the one-off solution you wrote the first time; think of it as a prototype. The second time, compare the functionality you need now with what you needed the first time. Around the third time, you should start thinking about taking the time to write a general-purpose utility. Even purely repetitive tasks can motivate the development of utilities; for example, the frustration of trying to rename files in bulk has produced a number of generic file-renaming programs.
Here are some design goals for utilities; each is covered in its own section below.
Do one thing well; don't do many things badly. Perhaps the best example of doing one thing well is sort. No utility other than sort does sorting. The basic idea is simple: if you solve a given problem only once, you can afford to take the time to solve it well.
Imagine how frustrating it would be if most programs sorted, but some supported only lexicographic order, others only numeric order, and a few supported selecting a key rather than sorting the whole line. It would be annoying at best.
When you find a problem to solve, try to break it into parts, and don't duplicate parts for which utilities already exist. The more your utility focuses on working with existing tools, the more likely it is to remain useful.
Maybe you will need to write more than one program. The best way to accomplish a specialized task is often to write one or two utilities and glue them together, rather than to write a single program to solve the whole thing. A 20-line shell script that combines your new utility with existing tools is fine. If you try to solve the whole problem at once, the first change in requirements may force you to rethink everything.
I occasionally need to produce two- or three-column output from a database. It is generally more efficient to write a program that produces the output in a single column and then combine it with a program that columnates its input. The shell script that combines the two utilities is itself a throwaway; the individual utilities will outlive it.
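The glue script might look something like the following sketch. The single-column generator, report-names, is a hypothetical placeholder for whatever program you wrote; the columnation here is done with the standard pr utility.
#!/bin/sh
# Throwaway glue: run a hypothetical one-column report generator and
# columnate its output with pr (-3 = three columns, -t = no page headers).
report-names | sort | pr -3 -t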
Some utilities serve very narrow needs. If the output of ls for a large directory scrolls off the screen very quickly, it may be because one of the files has a very long name, forcing ls to use a single column for its output. Paging through the output with more takes time. Why not just sort the lines by length and pipe the result through tail, as follows?
Listing 1. Possibly the world's smallest useful utility, sl
#!/usr/bin/perl -w
print sort { length $a <=> length $b } <>;
The script in Listing 1 does exactly one thing. It takes no options, because it needs none: it cares only about the length of each line. Thanks to Perl's convenient <> idiom, this little utility works both on standard input and on files named on the command line.
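Assuming the script is saved as sl somewhere in your path and marked executable, the scenario above becomes a one-liner:
$ ls | sl | tail -3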
Be a filter
Almost all utilities are best thought of as filters, although a few very useful ones do not fit this model. (For example, a program that does counting can be quite useful even though it makes a poor filter.) Programs that take only command-line arguments as input, and that potentially produce complicated output, can be useful too. Still, most utilities should work as filters. By convention, filters operate on lines of text, and most filters should support multiple input files.
Remember that a utility needs to run both on the command line and in scripts. Sometimes the ideal behavior differs slightly between the two. For example, most versions of ls automatically arrange their output into multiple columns when writing to a terminal, and the default behavior of grep is to print the name of the file in which a match was found only when more than one file was specified. Such differences should follow from what the user is likely to want, and from nothing else. For example, older versions of GNU bc displayed a mandatory copyright notice at startup. Please don't do that; let your utility do only its job.
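If your utility needs to make that kind of distinction, here is a minimal sketch of how a shell script can detect it; test -t 1 asks whether standard output is a terminal.
#!/bin/sh
# Vary behavior depending on whether output goes to a terminal or a pipe.
if [ -t 1 ]
then
    echo "stdout is a terminal: columnated or paged output may be welcome"
else
    echo "stdout is a pipe or file: keep output plain, one record per line"
fi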
Utilities like to live in pipelines. A pipeline lets a utility concentrate on its own job rather than on where its data comes from or where its output goes. To live in a pipeline, a utility needs to read its data from standard input and write its output to standard output. If you want to deal in records, it's best to make each line a record; existing programs such as sort and join already think that way. They will thank you for it.
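As a sketch of nothing more than the conventions just described, a line-oriented filter in sh might be shaped like this:
#!/bin/sh
# Skeleton of a line-oriented filter: read lines from standard input or from
# any files named on the command line, write one processed line per input line.
cat "$@" | while IFS= read -r line
do
    printf '%s\n' "$line"    # replace with the real per-line processing
done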
I occasionally use a utility that repeatedly invokes another program over a tree of files. This takes advantage of the standard UNIX filter model, but it only works with filters that read input and write output; it cannot be used with utilities that operate in place or that take input and output file names.
Most programs that can run on standard input can also usefully be run on a single file or a group of files. Note that this arguably violates the rule against duplication; obviously the same effect can be had by feeding cat's output to the next program in the chain. Nevertheless, it seems reasonable in practice.
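A rough sketch of such a tree-walking wrapper follows; it is an illustration, not the author's actual utility. FILTER stands for any command that reads standard input and writes standard output.
#!/bin/sh
# Apply a stdin/stdout filter to every regular file under a directory tree,
# replacing each file with the filtered version via a temporary file.
# (File names containing newlines are not handled by this sketch.)
FILTER=${FILTER:-"tr a-z A-Z"}
find "${1:-.}" -type f | while IFS= read -r f
do
    tmp="$f.tmp.$$"
    eval "$FILTER" < "$f" > "$tmp" && mv "$tmp" "$f"
done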
Some programs may legitimately read records in one format but produce entirely different output. One example is a utility that columnates its input: it might treat each input line as a record but produce several records per output line.
Not every utility fits this model exactly. For example, xargs takes not records but file names as input, and all of the actual processing is done by some other program.
Generalize
Try to think of your actual task as a special case of something more general; if you can find a good description of that general task, it may be worth writing the utility to match the description. For example, if you find yourself sorting text lexicographically one day and numerically another day, it probably makes sense to consider writing a general-purpose sort utility.
Generalizing functionality sometimes leads you to discover that what looked like a single utility is really two utilities working together. That's fine: writing two well-designed utilities can be easier than writing one ugly or complicated one.
Doing one thing well doesn't mean doing exactly one thing; it means handling a consistent but useful problem space. Lots of people use grep, but much of its utility lies in its ability to perform related tasks. grep's various options do the jobs of many small utilities; if each of those jobs were done by a separate small utility, the result would be a great deal of shared, duplicated code.
This rule, and the rule about doing one thing well, are both corollaries of an underlying principle: avoid duplicated code whenever possible. If you write half a dozen programs, each of which sorts lines, you may end up fixing six similar bugs six times instead of relying on one better-maintained sort program.
This is the part of writing a utility that adds the most work to finishing it. You may not have time to generalize a utility fully at first, but the effort pays off as you keep using it.
Sometimes it is useful to add related functionality to a program even if it isn't used for exactly the same task. For example, a program that displays raw binary data may be more useful if, when run on a terminal device, it can also put the terminal into raw mode. That makes it much easier to investigate problems with keyboard mappings, new keyboards, and so on. Not sure why you get a tilde (~) when you press the Delete key? This is an easy way to find out what is actually being sent. It's not quite the same task, but it is similar enough to make a plausible added feature.
The errno utility in Listing 2 is a good example of generalization, because it supports both numeric and symbolic names.
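A rough approximation of that trick with standard tools, not the program described above: put the terminal into raw mode, capture one read's worth of bytes from a keypress, and display them.
$ stty raw -echo; dd bs=16 count=1 2>/dev/null | od -c; stty sane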
Be robust
Robustness matters for utilities. A utility that crashes easily or cannot handle real data is not a useful utility. Utilities should handle lines of any length, giant files, and so on. It may be tolerable for a utility to fail on a data set larger than memory, but some utilities don't fail even then; for example, sort, by using temporary files, can generally sort data sets much larger than available memory.
Try to be sure you know what data your utility might plausibly be asked to operate on. Don't simply ignore the possibility of data you can't handle; check for it and diagnose it. The more specific your error messages, the more helpful you are being to your users. Try to give the user enough information to know what happened and how to fix it. When processing data files, identify the bad data as precisely as you can. When trying to parse a number, don't just give up: tell the user what you got instead and, if possible, which line of the input stream it was on.
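As a small illustration of that last piece of advice (a sketch, not taken from any of the utilities discussed here), a filter that sums the second column of its input might diagnose non-numeric data like this; /dev/stderr is supported by the common awk implementations.
awk '$2 !~ /^-?[0-9]+$/ {
    # Report exactly what was found and on which input line, then skip it.
    printf("line %d: expected a number in field 2, got \"%s\"\n", NR, $2) > "/dev/stderr"
    next
}
{ total += $2 }
END { print total }'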
For a good example of the difference this makes, consider two implementations of dc. If you run dc /home, one implementation says "Cannot use directory as input!" The other returns silently, with no error message and no unusual exit code. Which would you rather have in your path when you mistype a cd command? Likewise, which would you rather have if you fed it a stream of data from a directory (say, by running dc < /home)?
Security holes are often rooted in programs that are not robust in the face of unexpected data. It is important to remember that a good utility may well find itself being run as root in a shell script. A buffer overflow in a program such as find puts a great many systems at risk.
The better a program handles unexpected data, the more likely it is to adapt to changing circumstances. Often, the effort of making a program more robust leads you to understand its role better and makes it more general.
Be novel
One of the worst kinds of utility to write is the one you already have. I wrote a wonderful utility called count. It let me perform just about any counting task. It's an excellent utility, but there is already a standard BSD utility called jot that does the same thing. Likewise, my flexible program for turning data into columns duplicates an existing utility, rs, also found on BSD systems, except that rs is more flexible and better designed. See Resources below for more about jot and rs.
If you are about to start writing a utility, take a moment to browse around a few systems to see whether it already exists. Don't be afraid to borrow BSD utilities on Linux, or Linux utilities on BSD; one of the joys of utility code is that almost all utilities are very portable.
Don't forget to consider whether combining existing programs can give you the utility you need. It is possible, in theory, that such a combination will be too slow, but it is rare for writing a new utility to be faster than waiting for a slightly slow pipeline.
An example utility
In one respect, this program is a counterexample: it is never useful as a filter. It works very well as a command-line utility, however.
This program does only one thing: it hunts through /usr/include/sys/errno.h for errno lines and prints them in a reasonably pretty format. For example:
$ errno 22
EINVAL [22]:          Invalid argument
Listing 2. errno Finder
#!/bin/sh
usage() {
    echo >&2 "usage: errno [numbers or error names]"
    exit 1
}

for i
do
    case "$i" in
    [0-9]*)
        awk '/^#define/ && $3 == '$i' {
            for (i = 5; i < NF; ++i) {
                foo = foo " " $i;
            }
            printf("%-22s%s\n", $2 " [" $3 "]:", foo);
            foo = ""
        }' < /usr/include/sys/errno.h
        ;;
    E*)
        awk '/^#define/ && $2 == "'$i'" {
            for (i = 5; i < NF; ++i) {
                foo = foo " " $i;
            }
            printf("%-22s%s\n", $2 " [" $3 "]:", foo);
            foo = ""
        }' < /usr/include/sys/errno.h
        ;;
    *)
        echo >&2 "errno: can't figure out whether \"$i\" is a name or a number."
        usage
        ;;
    esac
done
Has this program been generalized? Yes, reasonably well. It supports both numeric and symbolic names. On the other hand, it knows nothing about other files that might have the same format, such as /usr/include/sys/signal.h. It could easily be extended to handle them, but for a convenience utility like this, it was easier simply to make a copy called signal that reads signal.h and uses "SIG*" as the pattern for matching names.
Although this is only a little more convenient than grepping through the system header files, it is much less error prone. It doesn't produce useless results from ill-considered arguments. On the other hand, it produces no diagnostic when a given name or number is not found in the header, and it doesn't bother to validate its input very carefully. Since it is a command-line utility that was never intended for automated use, these shortcomings are acceptable.
Another example is a program to unsort its input (see Resources for a link to this utility). It is simple enough: read the input files, store the lines somehow, then generate a random order in which to write them out. It is a utility with almost unlimited applications, and it is much easier to write than a sorting program; for example, you don't need to decide which keys you're not sorting on, or whether to sort alphabetically, lexicographically, or numerically. The tricky part is reading lines that might be very long. In fact, the version provided cheats: it assumes there are no null bytes in the lines it reads. Fixing that is much harder, and I was too lazy to bother when I wrote it.
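A rough approximation of the idea, built from standard tools under my own assumptions rather than taken from the author's unsort: tag each line with a random number, sort on the tag, then strip it off again.
#!/bin/sh
# Sketch of "unsort": prefix each line with a random number, sort on the
# prefix, and remove it. Reads standard input or the files named as arguments.
# Like the version discussed above, it assumes the input contains no null bytes.
awk 'BEGIN { srand() } { printf "%.8f\t%s\n", rand(), $0 }' "$@" |
    sort -n |
    cut -f2-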
Conclusion
If you find yourself performing a task repeatedly, consider writing a program to do it. If the program turns out to be useful more generally, generalize it, and you will have written a utility.
Don't design the utility the first time you need it; wait until you have some experience. Feel free to write a prototype or two; a good utility justifies the investment of time much better than a bad one. Don't regret it if what seemed like a great idea for a utility turns out to be useless once you've written it. If you find yourself frustrated by your new program's shortcomings, you just need another prototyping phase. If it turns out to be useless, well, that happens sometimes.
What you're looking for is a program that finds general application beyond your initial usage pattern. I wrote unsort because I wanted an easy way to get a random sequence of colors out of the old X11 rgb.txt file. Since then I've used it for an amazing number of tasks, not least generating test data for debugging and benchmarking sort routines.
Good utilities can repay the time you spend on all the less successful pieces of work. The next step is to make your utility available to other people so they can experiment with it. Make your failed attempts available too; perhaps someone else has a use for a utility you don't need. More importantly, your failed utility may become the prototype for someone else's wonderful one.