The art of writing Linux utilities

Source: Internet
Author: User

Linux and other UNIX-like systems always come with a large number of tools that perform a wide range of functions, from the obvious to the bizarre. The success of the UNIX-like programming environment is largely due to the high quality and wide selection of these tools, and to how easily they can be connected.

As a developer, you may find that existing utilities don't always solve your problem. Many problems are easy to solve by combining existing utilities, but others require at least some real programming work. These latter tasks are often good candidates for a new utility, which, combined with existing utilities, lets you solve the problem with the least amount of work. This article describes the qualities of good utilities and the process of designing them.

What are the qualities of excellent utilities?

Kernighan & Pike's book The UNIX Programming Environment contains a wonderful discussion of this issue. A good utility is one that does its job as well as possible. It has to play nicely with other utilities; it has to be easy to combine with them. A program that can't be combined with other utilities isn't a utility; it's an application.

Utilities should let you build one-off applications cheaply and easily from the materials at hand. Many people think of utilities as tools in a toolbox. The goal in designing a utility is not to have a single tool that does everything, but to have a set of tools, each of which does one thing as well as possible.

Some utilities are quite useful on their own, while others only make sense in combination with other utilities. Examples of the former include sort and grep. On the other hand, xargs is rarely used except together with other utilities (most commonly find).
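As a rough illustration of xargs doing nothing on its own, the sketch below pairs it with find. The function name files_mentioning and the *.c pattern are hypothetical choices for this example: find supplies the file names, xargs turns them into command-line arguments, and grep -l does the actual searching.

```sh
#!/bin/sh
# files_mentioning WORD DIR: hypothetical helper that lists the .c files
# under DIR containing WORD. find generates the file names, xargs passes
# them to grep as arguments, and grep -l prints only matching file names.
files_mentioning() {
    find "$2" -type f -name '*.c' -print | xargs grep -l -- "$1"
}
```

For example, files_mentioning main src would list the C files under src that mention main. This simple form breaks on file names containing whitespace; GNU and BSD systems provide find -print0 together with xargs -0 for that case.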

What language is used to write utilities?

Most UNIX system utilities are written in C. The examples in this article use Perl and sh. Use the right tool for the job. If you use a utility often enough, the cost of writing it in a compiled language may be repaid by better performance. On the other hand, a scripting language may allow faster development when the program's workload is light.

If you aren't sure, use the language you know best. At least while you are prototyping a utility or figuring out how useful it is, programmer efficiency takes precedence over performance tuning. Most UNIX system utilities are written in C because they are used often enough that efficiency matters more than development cost. Perl and sh (or ksh) can be good languages for rapid prototyping. For utilities that work with other programs, the shell may be easier to write in than a more traditional programming language. On the other hand, when you need to work with raw bytes, C may be the best choice.

Designing utilities

A good rule of thumb is to start thinking about designing a utility the second time you have to solve a problem. Don't mourn the one-off hack you wrote the first time; think of it as a prototype. The second time, compare the functionality you need with what you needed the first time. Around the third time, you should consider taking the time to write a general utility. Even purely repetitive tasks can benefit from a utility. For example, many general-purpose file-renaming programs have been written because people were frustrated trying to rename files in general ways.
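A general-purpose renamer can start very small. The following is a hypothetical sketch, not any of the existing rename programs: resuffix OLD NEW FILE... swaps one filename suffix for another, and refuses to overwrite a file that already exists.

```sh
#!/bin/sh
# resuffix OLD NEW FILE...: hypothetical renamer that changes the suffix
# OLD to NEW on each named file (for example .htm -> .html).
resuffix() {
    [ $# -ge 2 ] || { echo "usage: resuffix OLD NEW FILE..." >&2; return 2; }
    old=$1 new=$2
    shift 2
    status=0
    for f
    do
        case "$f" in
        *"$old")
            target=${f%"$old"}$new
            if [ -e "$target" ]; then
                # Refuse to clobber an existing file
                echo "resuffix: $target already exists; skipping $f" >&2
                status=1
            else
                mv -- "$f" "$target" || status=1
            fi
            ;;
        *)
            echo "resuffix: $f does not end in $old; skipping" >&2
            status=1
            ;;
        esac
    done
    return "$status"
}
```

resuffix .htm .html *.htm renames every .htm file in the current directory. Accepting patterns rather than fixed suffixes would be the natural next step in generalizing it.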

The following are some design goals for utilities; each is described in a separate section below.

Do one thing well

Do one thing well rather than many things badly. Perhaps the best example of doing one thing well is sort: no utility other than sort offers general-purpose sorting. The basic idea is simple: if you solve only one problem at a time, you can take the time to solve it well.

Imagine how frustrating it would be if most programs had their own sorting feature, but some supported only lexical sorting, others only numeric sorting, and a few supported sorting on selected keys rather than whole lines. At the very least, it would be annoying.
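To see why one shared sort beats many half-featured ones, here is a small demonstration of the lexical, numeric, and key-based orderings described above, all from the same program. The sample data is made up for illustration.

```sh
#!/bin/sh
# Made-up records: a number and a word on each line
data='2 pear
10 apple
9 fig'

printf '%s\n' "$data" | sort        # lexical: 10 apple, 2 pear, 9 fig
printf '%s\n' "$data" | sort -n     # numeric: 2 pear, 9 fig, 10 apple
printf '%s\n' "$data" | sort -k 2   # by second field: apple, fig, pear
```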

When you find a problem that needs solving, try to break it down into parts. Don't duplicate parts that already exist in other utilities. The more attention you pay to working with existing tools, the more useful your utility will be.

You may need to write more than one program. The best way to accomplish a specialized task is often to write one or two utilities and tie them together with a little glue, rather than writing a single program to solve the whole thing. A 20-line shell script that combines your new utility with existing tools is ideal. If you try to solve the whole problem at once, the first change that comes along may force you to rethink everything.

I occasionally need to produce two- or three-column output from a database. It is usually more efficient to write a program that produces single-column output and then use another program to split that output into columns. The shell script combining these two utilities is throwaway; the individual utilities outlive it.
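One way to do the column split with an existing tool is paste, which POSIX specifies: given the operand - several times, it reads standard input in rotation. In this sketch, to_columns is a hypothetical wrapper, and printf stands in for the single-column database report.

```sh
#!/bin/sh
# to_columns N: hypothetical helper that turns single-column input into
# N tab-separated columns, filling row by row.
to_columns() {
    n=$1
    args=
    i=0
    while [ "$i" -lt "$n" ]; do
        args="$args -"
        i=$((i + 1))
    done
    # $args is deliberately unquoted so it splits into N "-" operands
    paste $args
}
```

printf '%s\n' a b c d e f | to_columns 3 prints two rows of three tab-separated fields.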

Some utilities serve very specific needs. In a directory with many files, ls output sometimes scrolls off the screen very quickly because one file has a very long name, which forces ls to use a single column. Paging through the output with more takes time. Why not sort the lines by length, as follows, and pipe the result through tail?

Listing 1. sl, perhaps the world's smallest utility

#!/usr/bin/perl
print sort { length $a <=> length $b } <>;

The script in Listing 1 does exactly one thing. It takes no options, because it needs none; it only cares about line lengths. Thanks to Perl's convenient <> operator, this little utility works both on standard input and on files named on the command line.
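For comparison, the same idea can be expressed without Perl. This is a sketch using awk, sort, and cut, not the author's version: it tags each line with its length, sorts on that tag, and strips it again. Ties between lines of equal length are broken by sort comparing the lines themselves, so their order may differ from Listing 1's output.

```sh
#!/bin/sh
# sl: the Listing 1 idea written with standard tools instead of Perl.
# Prefix each line with its length and a tab, sort numerically on that
# prefix, then cut the prefix off. Reads named files or standard input.
sl() {
    awk '{ print length($0) "\t" $0 }' "$@" | sort -n | cut -f 2-
}
```

ls | sl | tail then shows the longest names last, which is where the troublemaker from the ls example would turn up.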

Be a filter

Almost all utilities are best thought of as filters, although some very useful utilities don't fit this model. (For example, a program that counts can be very useful even though it doesn't work well as a filter. Programs that take only command-line arguments as input, and potentially produce complex output, can also be very useful.) Still, most utilities should work as filters. By convention, filters operate on lines of text. Most filters should support multiple input files.
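In sh, the "named files or else standard input" convention rarely has to be coded by hand, because tools such as sed and awk already implement it. In this hypothetical filter, trimws, passing "$@" straight through to sed is enough to get both behaviors.

```sh
#!/bin/sh
# trimws [FILE...]: hypothetical filter that strips trailing whitespace
# from every line. With no arguments, "$@" expands to nothing and sed
# reads standard input; otherwise sed reads each named file in turn.
trimws() {
    sed 's/[[:space:]]*$//' "$@"
}
```

trimws notes.txt todo.txt processes two files in sequence, while some-command | trimws works in a pipeline.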

Remember that a utility needs to run both from the command line and in scripts. Sometimes the ideal behavior differs slightly between the two. For example, most versions of ls automatically arrange their output in multiple columns when writing to a terminal. When given multiple files, grep by default prints the name of the file in which each match was found. Such differences should relate to how users want the utility to work, not to anything else. For example, older versions of GNU bc displayed a mandatory copyright banner at startup. Don't do that. Let your utility do what it is supposed to do.
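In a shell script, the usual way to tell whether output is going to a terminal is the test -t operator. The function show_list below is a hypothetical sketch of ls-style behavior: columns for a person at a terminal, one item per line for a pipe or file.

```sh
#!/bin/sh
# show_list ITEM...: hypothetical helper that adapts its layout to where
# its output goes, much as many versions of ls do.
show_list() {
    if [ -t 1 ]; then
        # Standard output is a terminal: arrange items three per row
        printf '%s\n' "$@" | paste - - -
    else
        # Pipe or file: one item per line, easy for other tools to parse
        printf '%s\n' "$@"
    fi
}
```

Running show_list a b c d at a prompt prints rows, while show_list a b c d | cat prints one item per line.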

Utilities like to live in pipelines. A pipeline lets each utility concentrate on its own job rather than on incidental details. To live in a pipeline, a utility needs to read data from standard input and write data to standard output. If you are processing records, it is best to make each line a record. Existing programs such as sort and join already assume this; they will thank you for doing so.
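Here is one sketch of line-per-record data flowing through sort and join. join_on_key is a hypothetical wrapper; it sorts both inputs first because join requires its input sorted on the join field, and it relies on mktemp, which is common on Linux and BSD systems.

```sh
#!/bin/sh
# join_on_key FILE1 FILE2: hypothetical wrapper that joins two
# "key value..." files on their first field. join reads FILE2 from the
# pipe via the "-" operand; output is key, FILE1 fields, FILE2 fields.
join_on_key() {
    sorted=$(mktemp) || return 1
    sort "$1" > "$sorted"
    sort "$2" | join "$sorted" -
    status=$?
    rm -f "$sorted"
    return "$status"
}
```

Given a file of "name age" lines and a file of "name city" lines, join_on_key prints "name age city" for each name found in both.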

I occasionally use a utility that repeatedly invokes other programs on a tree of files. It takes full advantage of the standard UNIX filter model, but that model only works for utilities that read input and write output; it can't be used with utilities that operate in place or that take input and output file names.
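The tree-walking approach can be sketched with find. This is a hypothetical filter_tree, not the author's utility: the filter is any command string that reads standard input and writes standard output, and each file is rewritten through a temporary file. Tools that only edit in place, or that insist on explicit input and output file names, don't fit this scheme, as noted above.

```sh
#!/bin/sh
# filter_tree DIR FILTER: hypothetical sketch. Runs FILTER, a command
# string that reads standard input and writes standard output, over every
# regular file under DIR and replaces each file with the filter output.
# Note: rewritten files get default permissions, not the original ones.
filter_tree() {
    dir=$1 filter=$2
    find "$dir" -type f -exec sh -c '
        f=$2
        tmp=$f.filter_tmp
        # Replace the file only if the filter succeeded
        if sh -c "$1" < "$f" > "$tmp"; then
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
            echo "filter_tree: filter failed on $f" >&2
        fi
    ' sh "$filter" {} \;
}
```

For example, filter_tree docs 'tr -d "\r"' would strip carriage returns from every file under docs.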

Most programs that can run on standard input can also run on a single file or a group of files. Arguably, this violates the rule against duplicating work; after all, the same effect could be had by feeding cat's output to the next program in the series. In practice, however, it seems justified.

Some programs read records in one format but produce entirely different output. An example is a utility that arranges its input into columns. Such a utility might treat each input line as a record, yet put several records on each output line.

Not every utility fits this model fully. For example, xargs doesn't take records as input but file names, and all the actual processing is done by other programs.

Generalize

Try to think of tasks similar to the one you are actually doing. If you can find a general description of those tasks, try to write a utility that fits that description. For example, if you find yourself sorting text lexically one day and numerically another, it may make sense to write a general sorting utility.

Generalizing functionality sometimes reveals that what looked like a single utility is really two utilities used together. That's fine. Writing two well-designed utilities may be easier than writing one ugly or complex one.

Doing one thing well doesn't mean doing only one thing. It means handling a coherent but useful problem space. Lots of people use grep, but much of its usefulness lies in its ability to perform related tasks. grep's various options do the work of many small utilities; if that work were done by separate small utilities, the result would be a great deal of shared, duplicated code.
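A few of grep's standard options show how one program covers what might otherwise be several small utilities. The sample lines are invented for illustration.

```sh
#!/bin/sh
# Invented sample input
lines='Error: disk full
warning: low memory
error: timeout
ok'

printf '%s\n' "$lines" | grep -i error      # case-insensitive matching
printf '%s\n' "$lines" | grep -c error      # count matches instead of printing: 1
printf '%s\n' "$lines" | grep -i -c error   # combined options: 2
printf '%s\n' "$lines" | grep -v -i error   # inverted: the lines that do not match
```

Each option replaces what would otherwise be a separate program sharing almost all of grep's matching code.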

This rule and the rule of doing one thing well are both consequences of a fundamental principle: avoid duplicating code whenever possible. If you write half a dozen programs that each sort lines, you may end up fixing six similar bugs six times, instead of maintaining one better sort program.

This is the part of writing a utility that adds the most work to finishing it. You may not have time to fully generalize a utility at first, but the effort pays off as you keep using it.

Sometimes it is useful to add related functionality to a program, even if it isn't quite the same task. For example, a program that pretty-prints raw binary data might be more useful if, when run on a terminal device, it put the terminal into raw mode. This makes it much easier to test problems involving keymaps, new keyboards, and the like. Not sure why you get tildes (~) when you press the Delete key? This is an easy way to find out what is actually being sent. It isn't exactly the same task, but it is close enough to be a reasonable additional feature.
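The raw-mode idea can be sketched in sh with stty and od. showbytes is a hypothetical name. When standard input is a terminal, it switches to raw mode, reads one keystroke's worth of bytes (assuming, as is usual, that one read returns a whole escape sequence), restores the terminal, and dumps the bytes with od -c. Otherwise it simply dumps its input.

```sh
#!/bin/sh
# showbytes: hypothetical sketch that shows exactly which bytes arrive.
showbytes() {
    if [ -t 0 ]; then
        saved=$(stty -g)          # remember the current terminal settings
        stty raw -echo            # pass keystrokes through unprocessed
        tmp=$(mktemp) || { stty "$saved"; return 1; }
        # Usually one read returns all the bytes of a single keystroke
        dd bs=16 count=1 of="$tmp" 2>/dev/null
        stty "$saved"             # restore the terminal before printing
        od -c "$tmp"
        rm -f "$tmp"
    else
        od -c                     # not a terminal: just dump standard input
    fi
}
```

Run showbytes at a shell prompt and press Delete to see what your terminal really sends; od -c shows control characters as escapes or octal codes.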

The errno utility in Listing 2 is a good example of generalization, because it accepts both numbers and symbolic names.

Be robust

Robustness is very important in a utility. A utility that crashes easily or can't handle real data isn't a useful utility. Utilities should be able to handle lines of any length, huge files, and so on. It may be tolerable for a utility to fail on data sets larger than its memory, but some utilities don't have that limitation; for example, sort uses temporary files and can generally sort data sets much larger than memory.

Make sure you know what data your utility might be asked to operate on. Don't simply ignore the possibility of data you can't handle; check for it and diagnose it. The clearer your error messages, the more helpful you are to users. Try to give users enough information to understand what happened and how to fix it. When processing data files, identify bad data as precisely as possible. When parsing a number fails, don't just give up; tell the user what you got and, if possible, on which line of the input stream it appeared.
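As an illustration of diagnosing bad input precisely, here is a hypothetical utility, sumcol, that sums the first field of each line. When it meets something that isn't a number, it reports the value, the input source, and the line number, and exits with a nonzero status instead of silently guessing.

```sh
#!/bin/sh
# sumcol [FILE...]: hypothetical utility that prints the sum of the first
# field of each line. Unparseable values are reported on standard error
# with the input name and line number, and the exit status becomes 1.
sumcol() {
    awk '
        NF == 0 { next }                       # ignore blank lines
        $1 !~ /^[-+]?[0-9]+([.][0-9]+)?$/ {
            src = (FILENAME == "" || FILENAME == "-") ? "stdin" : FILENAME
            printf("sumcol: %s: line %d: not a number: \"%s\"\n",
                   src, FNR, $1) | "cat 1>&2"
            bad = 1
            next
        }
        { total += $1 }
        END { print total + 0; exit bad }
    ' "$@"
}
```

printf '1\nx\n3\n' | sumcol prints 4 and reports that line 2 of stdin contained "x".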

As an example, consider the difference between two implementations of dc. If you run dc /home, one implementation reports "Cannot use directory as input!" The other returns silently, with no error message and no unusual exit code. Which implementation would you rather have in your path the next time you mistype a cd command? Similarly, if you feed it a stream of data from a directory (perhaps by running dc

Security holes are often rooted in programs that aren't robust enough in the face of unexpected data. Remember that a good utility may end up running as root in shell scripts. A buffer overflow in a program such as find could put a huge number of systems at risk.

The better a program handles unexpected data, the more likely it is to adapt to a changing environment. Often, trying to make a program more robust leads you to understand its role better and to make it more general.

Don't reinvent the wheel

One of the worst kinds of utility to write is one you already have. I wrote a wonderful utility called count. It let me perform almost any counting task. It was an excellent utility, but there is already a standard BSD utility called jot that does the same thing. Likewise, a flexible program of mine for converting data into columns duplicated the functionality of an existing utility, rs, also found on BSD systems, and rs is more flexible and better designed. For more information about jot and rs, see the references below.

If you are about to write a utility, take a moment to look around various systems to see whether it already exists. Don't be afraid to borrow Linux utilities on BSD, or BSD utilities on Linux. One of the pleasures of utility code is that nearly all utilities are quite portable.
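Checking whether a tool already exists on a given system can itself be scripted. The have function below is a hypothetical helper built on command -v, which POSIX specifies (unlike which); the tool names in the loop are only examples.

```sh
#!/bin/sh
# have NAME: succeed if a command called NAME is already available.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Look for a few existing tools before writing your own
for tool in jot rs seq shuf; do
    if have "$tool"; then
        echo "$tool: already installed"
    else
        echo "$tool: not found on this system"
    fi
done
```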

Don't forget to consider combining existing programs to form a utility. In theory, a program built by combining existing programs might not run fast enough, but writing a new utility is rarely faster than waiting for a slow pipeline.
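As an example of a utility that is nothing but existing programs combined, here is a word-frequency counter in the classic pipeline style. The name wordfreq is hypothetical; every stage is a standard tool.

```sh
#!/bin/sh
# wordfreq [FILE...]: word-frequency counter assembled entirely from
# existing tools. Prints a count and a word per line, most frequent first.
wordfreq() {
    cat -- "$@" |
        tr -cs '[:alpha:]' '\n' |     # one word per line
        tr '[:upper:]' '[:lower:]' |  # fold case
        sort |
        uniq -c |                     # count runs of identical words
        sort -rn                      # most frequent first
}
```

wordfreq report.txt | head lists the ten most common words in report.txt, with no new code beyond the glue.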

An example utility

In a sense, this program is an exception, because it would never be of any use as a filter. However, it works very well as a command-line utility.

This program does only one thing: it prints the matching errno definitions from /usr/include/sys/errno.h in a nearly perfect output format. For example:

$ errno 22
EINVAL [22]: Invalid argument

Listing 2. errno Finder

#!/bin/sh
# errno: print the errno.h entry for each error number or name given.
usage() {
    echo >&2 'usage: errno [numbers or error names]'
    exit 1
}

[ $# -gt 0 ] || usage

for i
do
    case "$i" in
    [0-9]*)
        # Match the number: third field of "#define ENAME num /* text */"
        awk -v want="$i" '/^#define/ && $3 == want {
            for (j = 5; j < NF; ++j)
                foo = foo " " $j
            printf("%-22s%s\n", $2 " [" $3 "]:", foo)
            foo = ""
        }' < /usr/include/sys/errno.h
        ;;
    E*)
        # Match the symbolic name: second field
        awk -v want="$i" '/^#define/ && $2 == want {
            for (j = 5; j < NF; ++j)
                foo = foo " " $j
            printf("%-22s%s\n", $2 " [" $3 "]:", foo)
            foo = ""
        }' < /usr/include/sys/errno.h
        ;;
    *)
        echo >&2 "errno: can't figure out whether '$i' is a name or a number."
        usage
        ;;
    esac
done


Is this program general? Yes, ideally so: it supports both numbers and symbolic names. On the other hand, it knows nothing about other files that might use the same format, such as /usr/include/sys/signal.h. It could easily be extended to handle them, but for a convenience utility like this it is simpler to make a copy called "signal" that reads signal.h and uses "SIG*" as the pattern for matching names.
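For comparison with copying the script, here is a sketch of the "easily extended" version: a hypothetical lookup_define that takes the header file as its first argument and accepts either a number or a name. It assumes the header uses the same #define NAME NUMBER /* text */ layout as Listing 2 expects. On current Linux systems, for example, the errno values live in files such as /usr/include/asm-generic/errno-base.h rather than in sys/errno.h itself.

```sh
#!/bin/sh
# lookup_define HEADER ARG...: hypothetical generalization of Listing 2.
# Looks up each ARG, by number or by name, among the
# "#define NAME NUM /* text */" lines of HEADER.
lookup_define() {
    header=$1
    shift
    for arg
    do
        awk -v want="$arg" '
            /^#define/ && ($2 == want || $3 == want) {
                text = ""
                for (j = 5; j < NF; ++j)
                    text = text " " $j
                printf("%-22s%s\n", $2 " [" $3 "]:", text)
            }' < "$header"
    done
}
```

lookup_define /usr/include/asm-generic/errno-base.h 22 EPERM would look up both an errno number and a name, assuming that file is present.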

It is easier to use than grepping the system header file, and less error-prone: it doesn't produce useless results because of a vague argument. On the other hand, it produces no diagnostic if a given name or number isn't found in the header, and it doesn't bother to correct some input errors. Since this command-line utility was never intended for use in automated contexts, however, these shortcomings are forgivable.

Another example is a program that unsorts its input (see the references for a link to this utility). It is quite simple: read the input lines, store them somehow, then output them in random order. It is a utility with almost unlimited applications. It is also much easier to write than a sort program; for example, you don't need to specify which keys you aren't sorting on, or whether you want random order alphabetically, lexically, or numerically. The tricky part is reading lines that may be very long. In fact, the version referenced above cheats: it assumes there are no null bytes in the lines it reads. Fixing that is much harder, and I was too lazy to bother when I wrote it.
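A minimal unsort can be written in awk. This sketch (not the version referenced above) reads every line into memory and prints them after a Fisher-Yates shuffle. It shares the limitation described above, taking no special care with null bytes, and srand() with no argument seeds from the time of day, so two runs within the same second produce the same order.

```sh
#!/bin/sh
# unsort [FILE...]: hypothetical line shuffler. Stores all input lines,
# shuffles them with Fisher-Yates, then prints them in the new order.
unsort() {
    awk 'BEGIN { srand() }
         { line[NR] = $0 }
         END {
             for (i = NR; i > 1; i--) {
                 j = int(rand() * i) + 1      # uniform index in 1..i
                 t = line[i]; line[i] = line[j]; line[j] = t
             }
             for (i = 1; i <= NR; i++)
                 print line[i]
         }' "$@"
}
```

GNU coreutils now includes shuf, which does the same job; as discussed earlier, it is worth checking for existing tools before writing your own.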


If you find yourself performing a task repeatedly, write a program to do it. If the program turns out to be more general than you expected, generalize it further, and you will have written a utility.

Don't design a utility the first time you need it. Wait until you have some experience before you start designing. Feel free to write a prototype or two; a good utility repays the time and research invested in it far better than a bad one. Don't be sorry if a utility you designed turns out to be useless once written. If you find yourself frustrated by a new program's shortcomings, you simply need another round of prototyping. And if it proves useless, don't be surprised; that happens sometimes.

What you are looking for is a program that finds general applications beyond the usage that prompted it. I wrote unsort because I wanted an easy way to get a random sequence of colors from the old X11 rgb.txt file. Since then, I have used it for an incredible number of tasks, not least generating test data for debugging and benchmarking routines.

A good utility repays you for all the time spent on the less successful ones. The next step is to make it available to others so they can try it out. Make your failed attempts available too; perhaps others have a use for a utility that you didn't need. More importantly, your failed utility may be someone else's prototype, and lead to a wonderful utility for everyone.
