[Translator's note] A translation of Eric Raymond's comments on several major development languages
was posted recently and aroused heated discussion among netizens. It touched on Eric Raymond's
criticism of OO, which caused controversy. For this reason, I have translated a related passage.
Please read it and think it over.
Modularity: keep it clean, keep it simple
Programmers face ever-increasing complexity, and the methods of dividing up code have evolved naturally in response. In the beginning, software was just one large lump of machine code. The earliest procedural languages introduced the idea of dividing code into subroutines. Next we invented libraries to provide common services to different programs. Then we invented separate address spaces and inter-process communication. Today, we routinely distribute program systems across interconnected hosts thousands of miles apart.
Early Unix developers were among the pioneers of software modularity. Before them, modularity was an idea in computer science but not an engineering practice. Their central tenet: the only way to develop complex software that works correctly is to build the whole system out of many simple modules connected by well-defined interfaces. Only then can most problems be fixed or optimized locally without breaking the whole.
Today, all programmers are taught to modularize at the subroutine level. Some fortunate ones master modularity at the level of abstract data types (ADTs) and are already considered good designers. Today's design-pattern movement is an attempt to raise this level further and discover design abstractions that help organize the large-scale structure of programs.
Encapsulation and optimum module size
The first important property of modular code is encapsulation. Well-encapsulated modules never expose their internals to each other. They do not peek into each other's implementations and do not share global data casually; they communicate with each other through APIs.
APIs between modules play a dual role. At the implementation level, an API guards the connection points between modules and prevents internals from leaking out. At the design level, the APIs are what actually define your system architecture.
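As a minimal sketch of the idea, assuming a hypothetical `WordCounter` module that is not part of the original text: callers use only `add` and `count`, while the dictionary that stores the data stays an internal detail that could be replaced without touching any caller.

```python
class WordCounter:
    """Counts words; the internal dictionary is a private detail."""

    def __init__(self):
        self._counts = {}  # internal state, not part of the API

    def add(self, text):
        """Record every whitespace-separated word in text."""
        for word in text.lower().split():
            self._counts[word] = self._counts.get(word, 0) + 1

    def count(self, word):
        """Return how many times word has been seen so far."""
        return self._counts.get(word.lower(), 0)


counter = WordCounter()
counter.add("keep it clean keep it simple")
print(counter.count("keep"))  # prints 2
```

Python enforces the boundary only by convention (the leading underscore), so the discipline of going through the API still rests with the programmer.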
The finer the decomposition, the smaller each module becomes and the more important the API definitions are. Overall complexity and the likelihood of errors also decrease.
However, if decomposition goes too far and modules become too small, something unexpected happens. The figure below comes from Hatton's 1997 statistics; the curve of defect density against module size is U-shaped.
Hatton's data is credible because it holds across different languages and platforms. It shows that modules of between 200 and 400 logical lines of code work best.
Compactness and orthogonality
Code is not the only part of software with a so-called "optimal chunk size". Languages and APIs are also bounded by human cognitive capacity and cannot escape Hatton's U-shaped curve.
As a result, in the course of thinking hard about the design of APIs, command sets, protocols, and similar things, Unix programmers have identified two further qualities of modular design: compactness and orthogonality.
Compactness
Compactness is the degree to which a design is "easy for the human brain to grasp and hold".
Compact software tools have many of the virtues of hand tools that fit the hand well. They are pleasant and easy to use, they greatly improve your productivity, and they are safe and reliable, unlike complex tools that can hurt you at the slightest slip.
Compact does not mean weak. A design built on easily understood abstractions can be very powerful and flexible while remaining compact. Nor does compact mean easy to learn: you may have to grasp the underlying conceptual model before the design feels easy.
Very few software designs are compact in an absolute sense, but many are compact in a looser sense: they have a compact working set, a subset of features that covers 80% or more of what expert users need day to day.
For example, the Unix system-call API is compact, while the C standard library is not. Among Unix tools, make(1) is compact, while autoconf(1) and automake(1) are not. Among markup languages, HTML is compact, but DocBook is not. The man(7) macros are compact, and troff(1) is not. Among general-purpose languages, C and Python are compact; Perl, Java, Emacs Lisp, and shell are not. C++ is "anti-compact": the language's designer admits that he does not expect anyone to fully understand it.
This does not mean that a non-compact design is evil or bad. Some problem domains are simply too complex for a compact design. The point of emphasizing compactness is not to make it an absolute requirement, but to have readers do what Unix programmers do: value it, design for it whenever possible, and never give it up lightly.
Orthogonality
Orthogonality helps make even complex designs compact, which is why it matters so much. In a purely orthogonal design, operations have no side effects: each action changes exactly one thing without affecting others, and there is exactly one way to change each property of the system.
A computer monitor is a good example of orthogonality. You can adjust the brightness without affecting the saturation, and the color-balance controls are likewise independent of each other. Imagine how much trouble you would have if they were not!
Software is full of non-orthogonal designs. For example, the author of a format-conversion function will often, without much thought, make it take the path name of a source file as a parameter. But the input frequently comes from a file handle that is already open, so the function should not have the side effect of opening a file. Written that way, the function can later process data streams from any source.
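The point about file handles can be sketched as follows. The function name `to_upper` and the upper-casing "conversion" are hypothetical stand-ins for a real format converter; what matters is that the function accepts already-open stream objects and opens nothing itself.

```python
import io


def to_upper(src, dst):
    """Copy text from src to dst, upper-casing it.

    src and dst are any file-like objects; this function opens nothing,
    so it has no "open file" side effect.
    """
    for line in src:
        dst.write(line.upper())


# The caller decides where the data comes from; here, in-memory streams.
source = io.StringIO("one\ntwo\n")
sink = io.StringIO()
to_upper(source, sink)
result = sink.getvalue()
```

The same function would work unchanged with a file returned by `open()`, with `sys.stdin`, or with any other object that yields lines of text.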
Doug McIlroy's famous advice to "do one thing well" is usually read as a maxim about simplicity. In fact, it places at least as much weight on orthogonality.
The main problem with non-orthogonal design is that side effects upset the mental models of programmers and users and are easily forgotten, with results ranging from inconvenient to disastrous.
The Unix API is a good example of orthogonal design, which is why C libraries on other platforms do their best to imitate it. It is worth studying even if you are not a Unix programmer.
The SPOT rule
The book The Pragmatic Programmer points out a particularly important kind of orthogonality, the "Don't Repeat Yourself" rule: every piece of knowledge must have a single, unambiguous, authoritative representation within a system. In this book, I follow Brian Kernighan's advice and call this the Single Point Of Truth rule, or SPOT for short.
Duplication leads to inconsistency and makes code subtly dangerous, because when you change one copy of the duplicated information you must remember to change every other copy. It is also a sign that you have not thought through how to organize your code.
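One way to picture the rule, as a sketch under assumed names: the record layout `FIELDS` below is a hypothetical fixed-width format, defined exactly once, and both the writer and the reader are driven by it. Changing a field width means changing one line, not two functions.

```python
# Hypothetical record layout: the single point of truth for names and widths.
FIELDS = [("name", 10), ("city", 8), ("zip", 5)]


def pack(record):
    """Format a dict as one fixed-width line, driven by FIELDS."""
    return "".join(
        str(record[name]).ljust(width)[:width] for name, width in FIELDS
    )


def unpack(line):
    """Parse a fixed-width line back into a dict, driven by the same FIELDS."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record


row = {"name": "Ada", "city": "London", "zip": "12345"}
line = pack(row)
```

Had the widths been written out separately inside `pack` and `unpack`, the two copies could drift apart silently, which is the hazard the rule warns against.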
Software is multi-layered
Broadly speaking, when designing a hierarchy of functions or objects, you can proceed in one of two directions, and your choice has a major impact on how the code is layered.
Top-down, bottom-up
One direction is bottom-up, from the concrete to the abstract: you start from the specific operations that the problem domain will require and work upward. For example, if you are developing firmware for a disk drive, the bottom layer might have primitives such as "move the head to a physical block", "read a physical block", "write a physical block", "toggle the LED", and so on.
The other direction is top-down, from the abstract to the concrete: you start from the top-level program or a specification of the overall logic and work down to individual operations. For example, someone designing a mass-storage controller that can drive several different media might start with abstract operations such as "seek to a logical block", "read a logical block", "write a logical block", and "toggle the indicator". These are quite different from the hardware-level operations mentioned above.
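The two vantage points can be put side by side in a small sketch. Everything here is hypothetical: `FakeDrive` simulates the hardware primitives in memory, and `BlockStore` offers the logical-block operations, with a fixed offset standing in for whatever logical-to-physical mapping a real controller would use.

```python
class FakeDrive:
    """Bottom layer: hypothetical hardware primitives, simulated in memory."""

    def __init__(self, blocks):
        self._platter = [b""] * blocks
        self._head = 0

    def seek(self, physical_block):
        """Move the head to a physical block."""
        self._head = physical_block

    def read_physical(self):
        """Read the physical block under the head."""
        return self._platter[self._head]

    def write_physical(self, data):
        """Write the physical block under the head."""
        self._platter[self._head] = data


class BlockStore:
    """Top layer: logical-block operations, independent of the medium."""

    def __init__(self, drive, offset=0):
        self._drive = drive
        self._offset = offset  # assumed fixed logical-to-physical mapping

    def write_block(self, logical_block, data):
        self._drive.seek(logical_block + self._offset)
        self._drive.write_physical(data)

    def read_block(self, logical_block):
        self._drive.seek(logical_block + self._offset)
        return self._drive.read_physical()


store = BlockStore(FakeDrive(blocks=8), offset=2)
store.write_block(3, b"hello")
```

Any object providing `seek`, `read_physical`, and `write_physical` could replace `FakeDrive`, which is what lets the upper layer stay generic across media.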
A larger example is a Web browser. Its top-down design starts from a specification: which URL types are accepted, which image formats can be displayed, how Java and JavaScript are supported. The implementation layer that corresponds to this top-down view is the application's main event loop.
At the same time, a Web browser must call a large number of domain primitives: establishing a network connection, sending data, and receiving responses; GUI-related operations; HTML parsing operations.
Where you start matters, because your starting point can constrain where you end up. If you work purely top-down, you may find that the primitives your logic requires cannot actually be implemented. If you work purely bottom-up, you may find that you have done a lot of work that is irrelevant to the program.
Since the 1960s, novice programmers have been taught that programs should be written "top-down, by stepwise refinement". Top-down is good practice when three conditions hold: (a) you can specify the application's requirements in advance, (b) the specification is unlikely to change during implementation, and (c) at the bottom layer, you have full freedom to choose how the work gets done.
The higher in the program hierarchy you are, the more easily these conditions are met. Yet even in application development at the highest level, they often do not hold.
In self-defense, programmers try to work from both ends. On one side, they express the abstract specification as top-down application logic; on the other, they capture the domain primitives in functions and libraries that can be reused when the high-level design changes.
Unix programmers work mostly in systems programming, so they tend to develop bottom-up.
In general, bottom-up development is more attractive. It lets you develop in an exploratory way and gives you time to refine a vague specification. It also suits programmers' natural laziness: when something goes wrong, the amount of code that has to be thrown away is usually much smaller.
Real code, however, is usually a mix of top-down and bottom-up. Both are often used within a single project, which leads directly to the emergence of glue layers.
Glue layers
When the top-down and bottom-up drives collide, the result is usually a mess. The application logic at the top and the domain primitives at the bottom have to be joined by a layer of glue.
Over the decades, Unix programmers have learned that glue is nasty stuff, and that the thinner the glue layer, the better. This is vitally important! Glue should stick things together, not paper over the conflicts and cracks between layers.
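A toy illustration of thin glue, with all three function names being hypothetical: `parse_pairs` and `render_table` are independent primitives, and `show_config` is the glue, a single expression that hands the output of one to the input of the other.

```python
def parse_pairs(text):
    """Primitive: parse 'key=value' lines into a list of (key, value) tuples."""
    return [
        tuple(line.split("=", 1)) for line in text.splitlines() if "=" in line
    ]


def render_table(rows):
    """Primitive: render (key, value) rows as two aligned text columns."""
    width = max((len(key) for key, _ in rows), default=0)
    return "\n".join(f"{key.ljust(width)}  {value}" for key, value in rows)


def show_config(text):
    """Glue: connects the parser's output to the renderer's input, no more."""
    return render_table(parse_pairs(text))


output = show_config("host=example.org\nport=8080\n")
```

The glue stays thin here because the two primitives already agree on a plain data shape (a list of tuples). If they disagreed, the conversion code would have to live in the glue, and the glue would start to thicken.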
In the browser example above, the glue layer includes the code that renders the document object parsed from HTML into a bitmap in the display buffer. This code is notoriously hard to write and riddled with bugs. Errors and defects in the HTML parser and the GUI libraries all surface in this layer.
The browser's glue layer must mediate not only between the specification and the domain primitives, but also between several different external specifications: the behavior of the HTTP network protocol, the structure of HTML documents, various graphics and multimedia formats, and the behavior of users at the GUI.
A single glue layer is error-prone enough, but that is not the worst case. If a designer, aware that the glue layer exists, tries to organize it into a middle layer built around its own set of data structures or objects, the result is two more glue layers: one above the middle layer and one below it. Programmers who are bright but inexperienced often jump right into this trap. They make each fundamental set of classes (application logic, middle layer, and primitives) as beautiful as a textbook example, and then find that the glue needed to hold all that beautiful code together grows thicker and thicker until it becomes unmanageable.
The C language itself is considered a good example of a thin glue layer.
UNIX and object-oriented languages
Since the mid-1980s, new languages have claimed to provide direct support for object-oriented (OO) programming.
OO design concepts have proven valuable in graphics systems, GUI systems, and simulations. However, history has shown that outside these areas OO has not brought obvious benefits, which has surprised and disappointed many people. It is worth trying to understand why.
There is some conflict and tension between the traditional modularity techniques of Unix and the models that have grown up around OO languages. Unix programmers are more skeptical of OO than most. One reason is the rule of diversity: OO is too often promoted as the only correct solution to software complexity, which is a suspect claim. But there are deeper reasons.
As mentioned earlier, thin glue is an important principle of traditional Unix modularity: the fewer layers of abstraction between the top-level program objects and the underlying hardware, the better.
This is partly due to the influence of C. Simulating true objects in C is laborious, so piling up many layers of abstraction is exhausting work. As a result, object hierarchies in C tend to be flat and transparent. Over time, Unix programmers have grown used to thin glue and shallow layering in other languages as well.
OO languages make abstraction easy, perhaps too easy. They encourage architectures with thick, elaborate glue layers. This can be a good thing when the problem domain is genuinely complex and demands a lot of abstraction. But it can also be a bad thing, because programmers end up doing simple things in very complicated ways just because they can.
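A deliberately contrived contrast shows what "doing simple things in complicated ways" can look like. The class names are hypothetical and the example is exaggerated for effect; both versions compute the same total.

```python
from abc import ABC, abstractmethod


class Operation(ABC):
    """Over-layered version: an abstract base class for a single operation."""

    @abstractmethod
    def apply(self, values):
        ...


class SumOperation(Operation):
    """The only concrete subclass there will ever be."""

    def apply(self, values):
        return sum(values)


class OperationFactory:
    """A factory whose only job is to hide one constructor call."""

    def create(self):
        return SumOperation()


def total_layered(values):
    """Three layers deep to reach a call to sum()."""
    return OperationFactory().create().apply(values)


def total_flat(values):
    """Flat version: the same behavior with nothing to look through."""
    return sum(values)
```

A reader has to look through three classes to learn that `total_layered` adds numbers, while `total_flat` says so directly. The layers would only pay for themselves if the problem really called for several interchangeable operations.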
All OO languages tend to lure programmers into the trap of excessive layering. Object frameworks and object browsers are no substitute for good design and documentation, but they are often treated as if they were. Too many layers destroy transparency: it becomes hard to see down through them and hard to form a clear model of what the code actually does. Simplicity, clarity, and transparency are all lost. The resulting code is full of obscure bugs and causes serious maintenance problems.
The situation keeps getting worse. Many training courses teach thick layering as a good thing: you have a lot of classes, and that is taken to be a way of embedding knowledge in your data. The problem is that the "smart data" in the glue layers is too often not about any natural entity the program manipulates; it is just about being glue. (A sure sign of this is a proliferation of abstract subclasses and so-called "mixins".)
Unix programmers have an instinct for these problems. This is probably one of the reasons why, on Unix, OO languages have failed to displace non-OO languages such as C, Perl (which supports OO, though the support is rarely used), and shell. Criticism of OO is also sharper in the Unix world than elsewhere. Unix programmers know when not to use OO, and when they do use it, they try to keep their object designs as simple as possible. As Michael Padlipsky said: "If you know what you're doing, three layers is enough; if you don't, even seventeen levels won't help."
The reason OO has succeeded in GUIs, simulation, and graphics may be that in these fields it is relatively easy to decide what the types should be. In GUIs and graphics systems, for example, there is a natural mapping between classes and visual objects. If you find yourself adding classes that do not map directly to something visible, you will probably also find that the glue has become very thick.