Chen Shuo (giantchen_at_gmail)
Blog.csdn.net/solstice
Abstract: As the author of a C++ dynamic library, you should avoid using virtual functions as the library interface. Doing so makes binary compatibility very hard to maintain: you end up adding many unnecessary interfaces and eventually repeat the mistakes of COM.
This article mainly discusses the Linux x86 platform, and continues to use Windows/COM as the negative example.
This article is a continuation of the previous one, "C++ Engineering Practice (4): Binary Compatibility". When writing that article I assumed everyone already agreed on the harm of using virtual functions as interfaces, so I covered it only briefly. That appears not to be the case, so it deserves a fuller treatment.
"Interface" has a broad sense and a narrow sense. In this article, "interface" normally has the broad sense, that is, the code-level interface of a library. The narrow sense is an interface class, that is, a class that contains only virtual functions. Such a class generally has no data members, and Java has a dedicated keyword, interface, for it.
The environment a C++ library author lives in
Suppose you maintain a shared library that is used by two or three other teams in your company. You discover a security vulnerability, or a bug that can cause crashes, and it must be fixed urgently. After fixing it, can you deploy the library's binary directly? Does the fix affect binary compatibility? Will it break executables that other teams have already compiled and deployed to production? Do you have to force the other teams to recompile, relink, and release new executables? Will that disrupt their release cycles? These are everyday questions in engineering.
If you are about to write a new C++ library, you have to make the following decisions:
- In what form will it be released? As a dynamic library or a static library? (This article does not consider releasing source code, which is in practice similar to releasing a static library.)
- In what way will the library interface be exposed? The options are: global (including namespace-level) functions, non-virtual member functions of a class, or virtual functions (an interface class).
(Java programmers do not have to think about any of this. They simply write class member functions and at most decide whether to mark a method or class final. Nor is there a choice between dynamic and static libraries: everything is a .jar file.)
Before making these two decisions, accept two basic assumptions:
- Code has bugs, and libraries are no exception. Bug fixes will have to be released in the future.
- There will be new feature requirements. Writing code is not a one-off deal: there are always new requirements, and programmers will have to add things to the library. This is a good thing, since it keeps programmers employed.
(If your code is perfect at its first release and will never need to change, do whatever you like and stop reading here.)
In other words, when designing a library you must consider how it will be upgraded in the future.
With these two assumptions in place, the decisions can be made. The first one is easy: if hot fixes are required, only a dynamic library will do; otherwise, static libraries are much easier to deploy in a distributed system, as discussed previously. (The argument that dynamic libraries save memory compared with static libraries matters little today.)
The rest of this article assumes that you or your boss have chosen to release a dynamic library, that is, a .so or .dll file, and looks at how to make the second decision. (Put differently, if you can ship a static library, you will have far fewer troubles later.)
The second decision is not so easy. The key is to choose an extensible interface style that makes upgrading the library easy. "Upgrade" has two meanings:
- For bug-fix-only upgrades, the replacement library binary must be compatible with the existing executables. Binary compatibility was discussed in the previous article.
- For upgrades that add features, the library should be friendly to client code. After the upgrade, using a new feature should cost the client little: include the new header file (not even that, if the new feature was added to an existing header) and write the new code. Moreover, the upgrade should not leave garbage behind in client code; what counts as garbage is discussed later in this article.
Before discussing the drawbacks of virtual functions as an interface, let us look at how they are commonly used.
Two major uses of virtual functions as library interfaces
Virtual functions are used as an interface in roughly two ways, plus a mixture of both:
- Call. The functionality the library provides (for example, drawing graphics) is exposed to client code through an interface class. Client code generally does not inherit from this interface; it simply calls its member functions. This is said to help separate interface from implementation; I think it is completely superfluous.
- Callback. That is, event notification, such as "connection established", "data arrived", and "connection closed" in a network library. Client code generally inherits from this interface, registers an object instance with the library, and waits for the library to call back. The client normally does not call these member functions itself, except in unit tests that simulate the library's behaviour.
- Hybrid. A class that client code can both inherit from to receive callbacks and call directly. Frankly, I see no benefit in this, but some object-oriented C++ libraries are designed this way.
For the callback style, modern C++ has a better approach: boost::function + boost::bind; see reference [4]. All muduo callbacks use this approach; see "muduo network programming examples: preface". This article does not further consider the outdated practice of using virtual functions for callbacks.
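The callback style based on function objects can be sketched as follows. This is only an illustration under stated assumptions: it uses std::function and std::bind from C++11 (the standardised descendants of boost::function and boost::bind), and the Connection and Logger classes with all their members are hypothetical names, not muduo's API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical library class: it stores a callable instead of requiring
// clients to inherit from an interface class with virtual functions.
class Connection
{
 public:
  typedef std::function<void(const std::string&)> MessageCallback;

  void setMessageCallback(MessageCallback cb) { callback_ = std::move(cb); }

  // Stands in for the library noticing that data has arrived.
  void receive(const std::string& data)
  {
    if (callback_) callback_(data);
  }

 private:
  MessageCallback callback_;
};

// Hypothetical client class: an ordinary class with no base class.
class Logger
{
 public:
  void onMessage(const std::string& data) { last_ = data; ++count_; }
  const std::string& last() const { return last_; }
  int count() const { return count_; }

 private:
  std::string last_;
  int count_ = 0;
};

// Usage: register a member function of a client object as the callback.
int demoCallback()
{
  Logger logger;
  Connection conn;
  conn.setMessageCallback(
      std::bind(&Logger::onMessage, &logger, std::placeholders::_1));
  conn.receive("hello");
  conn.receive("world");
  assert(logger.last() == "world");
  return logger.count();  // number of callbacks delivered
}
```

Because the library only stores a callable, adding a new kind of event later means adding a new setter, not a new virtual function.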
For the call style, consider a fictitious graphics library that can draw lines, rectangles, and arcs:
    struct Point
    {
      int x;
      int y;
    };

    class Graphics
    {
      virtual void drawLine(int x0, int y0, int x1, int y1);
      virtual void drawLine(Point p0, Point p1);

      virtual void drawRectangle(int x0, int y0, int x1, int y1);
      virtual void drawRectangle(Point p0, Point p1);

      virtual void drawArc(int x, int y, int r);
      virtual void drawArc(Point p, int r);
    };
Many irrelevant details are omitted here: the constructor and destructor of Graphics, the fact that the draw*() functions should be public, that Graphics should be non-copyable, that Graphics might use pure virtual functions, and so on. None of this affects the discussion.
The library is easy to use; client code looks like this:
    Graphics* g = getGraphics();
    g->drawLine(0, 0, 100, 200);
    releaseGraphics(g); g = NULL;
Everything seems fine, sunny even, and in line with "object-oriented principles". But once you consider upgrades, things immediately become complicated.
Drawbacks of using virtual functions as interfaces
The trouble with virtual functions as an interface is binary compatibility: "once published, it cannot be modified".
Suppose I want to add several drawing functions to Graphics while maintaining binary compatibility. The new functions take floating-point coordinates. My ideal new interface is:
    --- old/graphics.h  13:12:44.000000000 +0800
    +++ new/graphics.h  13:13:30.000000000 +0800
    @@ ... @@
     class Graphics
     {
       virtual void drawLine(int x0, int y0, int x1, int y1);
    +  virtual void drawLine(double x0, double y0, double x1, double y1);
       virtual void drawLine(Point p0, Point p1);

       virtual void drawRectangle(int x0, int y0, int x1, int y1);
    +  virtual void drawRectangle(double x0, double y0, double x1, double y1);
       virtual void drawRectangle(Point p0, Point p1);

       virtual void drawArc(int x, int y, int r);
    +  virtual void drawArc(double x, double y, double r);
       virtual void drawArc(Point p, int r);
     };
Unfortunately, C++'s binary compatibility constraints do not allow this. The underlying problem is that C++ implements a virtual call as vtable[offset], where offset is determined implicitly by the position at which the virtual function is declared. This is what makes the scheme fragile. Adding drawLine(double x0, double y0, double x1, double y1) in the middle changes the layout of the vtable, so existing executables, which still use the old offsets, no longer call the correct functions.
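The fragility can be modelled without relying on any particular compiler's vtable layout. In the sketch below a plain array of function pointers stands in for the vtable, and an already-compiled client is modelled as code that remembers only a slot index. All names (Slot, kOldTable, kNewTable, callSlot, kDrawLinePointSlotV1) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef std::string (*Slot)();

std::string drawLineInt()    { return "drawLine(int)"; }
std::string drawLineDouble() { return "drawLine(double)"; }
std::string drawLinePoint()  { return "drawLine(Point)"; }

// Version 1 of the library: slots follow declaration order.
const Slot kOldTable[] = { drawLineInt, drawLinePoint };

// Version 2 inserts the double overload in the middle, which silently
// moves drawLine(Point) from slot 1 to slot 2.
const Slot kNewTable[] = { drawLineInt, drawLineDouble, drawLinePoint };

// What an already-compiled client effectively does: "call slot N".
std::string callSlot(const Slot* table, std::size_t index)
{
  return table[index]();
}

// The slot index an old client baked in for drawLine(Point).
const std::size_t kDrawLinePointSlotV1 = 1;
```

With the old table, slot 1 reaches drawLine(Point) as intended; with the new table the same index reaches drawLine(double), and nothing reports an error at link time or load time.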
What can be done? One dangerous and ugly approach is to put the new virtual functions at the end of the interface:
    --- old/graphics.h  13:12:44.000000000 +0800
    +++ new/graphics.h  13:58:22.000000000 +0800
    @@ ... @@
     class Graphics
     {
       virtual void drawLine(int x0, int y0, int x1, int y1);
       virtual void drawLine(Point p0, Point p1);

       virtual void drawRectangle(int x0, int y0, int x1, int y1);
       virtual void drawRectangle(Point p0, Point p1);

       virtual void drawArc(int x, int y, int r);
       virtual void drawArc(Point p, int r);
    +
    +  virtual void drawLine(double x0, double y0, double x1, double y1);
    +  virtual void drawRectangle(double x0, double y0, double x1, double y1);
    +  virtual void drawArc(double x, double y, double r);
     };
It is ugly because the new drawLine(double x0, double y0, double x1, double y1) is no longer next to the original drawLine() functions, which makes the header harder to read. It is dangerous because, if Graphics has been inherited from, the newly appended virtual functions shift the vtable offsets of the virtual functions declared in the derived class, so this is not binary compatible either.
There are two other approaches that look safe, and they are the ones COM adopts:
1. Extend the existing interface through chain inheritance, for example:
    --- graphics.h  13:12:44.000000000 +0800
    +++ graphics2.h  13:58:35.000000000 +0800
    @@ ... @@
     class Graphics
     {
       virtual void drawLine(int x0, int y0, int x1, int y1);
       virtual void drawLine(Point p0, Point p1);

       virtual void drawRectangle(int x0, int y0, int x1, int y1);
       virtual void drawRectangle(Point p0, Point p1);

       virtual void drawArc(int x, int y, int r);
       virtual void drawArc(Point p, int r);
     };
    +
    +class Graphics2 : public Graphics
    +{
    +  using Graphics::drawLine;
    +  using Graphics::drawRectangle;
    +  using Graphics::drawArc;
    +
    +  // added in version 2
    +  virtual void drawLine(double x0, double y0, double x1, double y1);
    +  virtual void drawRectangle(double x0, double y0, double x1, double y1);
    +  virtual void drawArc(double x, double y, double r);
    +};
If more functions are added in the future, there will be class Graphics3 : public Graphics2, then class Graphics4 : public Graphics3, and so on. This is just as ugly as the previous approach, because the new drawLine(double x0, double y0, double x1, double y1) lives in the derived Graphics2 interface instead of sitting next to the original drawLine() functions, which fragments the reading.
2. Extend the existing interface through multiple inheritance. For example, define a Graphics2 with the same members as class Graphics, and let the implementation inherit from both interfaces:
    --- graphics.h  13:12:44.000000000 +0800
    +++ graphics2.h  13:16:45.000000000 +0800
    @@ ... @@
     class Graphics
     {
       virtual void drawLine(int x0, int y0, int x1, int y1);
       virtual void drawLine(Point p0, Point p1);

       virtual void drawRectangle(int x0, int y0, int x1, int y1);
       virtual void drawRectangle(Point p0, Point p1);

       virtual void drawArc(int x, int y, int r);
       virtual void drawArc(Point p, int r);
     };
    +
    +class Graphics2
    +{
    +  virtual void drawLine(int x0, int y0, int x1, int y1);
    +  virtual void drawLine(double x0, double y0, double x1, double y1);
    +  virtual void drawLine(Point p0, Point p1);
    +
    +  virtual void drawRectangle(int x0, int y0, int x1, int y1);
    +  virtual void drawRectangle(double x0, double y0, double x1, double y1);
    +  virtual void drawRectangle(Point p0, Point p1);
    +
    +  virtual void drawArc(int x, int y, int r);
    +  virtual void drawArc(double x, double y, double r);
    +  virtual void drawArc(Point p, int r);
    +};
    +
    +// the implementation uses multiple interface inheritance
    +class GraphicsImpl : public Graphics,  // version 1
    +                     public Graphics2  // version 2
    +{
    +  // ...
    +};
To COM users, such versioned interfaces look perfectly normal: binary compatibility is preserved, and client source code is unaffected.
In my view, versioned interfaces are really ugly, because every change introduces a new interface class, which makes client code hard to manage in the long run. Suppose some new client code uses a feature of Graphics3: should all existing uses of Graphics2 be replaced with Graphics3?
- If not, one program depends on several versions of Graphics at once and carries that historical burden forever. As the number of Graphics versions it depends on keeps growing, how is that to be managed?
- If so, why should unrelated code (existing code that runs perfectly well with Graphics2) be modified just because Graphics3 is used somewhere else?
Both dilemmas are caused purely by using virtual functions as the library interface. If class Graphics could be extended in place, none of this would arise; see the "Recommended practices" section of this article.
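To make the client-side burden concrete, here is a minimal sketch of what code written against versioned interfaces tends to look like. It assumes chain inheritance as in approach 1; the drawing functions return strings only so the behaviour can be observed, and OldGraphicsImpl, GraphicsImpl, drawPrecise, and demoVersioned are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <string>

// Version 1 interface, as published.
class Graphics
{
 public:
  virtual ~Graphics() {}
  virtual std::string drawLine(int x0, int y0, int x1, int y1) = 0;
};

// Version 2 interface, added by chain inheritance.
class Graphics2 : public Graphics
{
 public:
  using Graphics::drawLine;
  virtual std::string drawLine(double x0, double y0,
                               double x1, double y1) = 0;
};

// Hypothetical implementation shipped with an old library build.
class OldGraphicsImpl : public Graphics
{
 public:
  std::string drawLine(int, int, int, int) override { return "int"; }
};

// Hypothetical implementation shipped with a new library build.
class GraphicsImpl : public Graphics2
{
 public:
  std::string drawLine(int, int, int, int) override { return "int"; }
  std::string drawLine(double, double, double, double) override
  {
    return "double";
  }
};

// Client code: existing code passes Graphics* around, so each use of a
// version 2 feature must ask for the newer interface and handle failure.
std::string drawPrecise(Graphics* g)
{
  if (Graphics2* g2 = dynamic_cast<Graphics2*>(g))
    return g2->drawLine(0.5, 0.5, 1.5, 1.5);
  return g->drawLine(0, 0, 1, 1);  // fallback for a version 1 object
}

std::string demoVersioned(bool libraryIsV2)
{
  OldGraphicsImpl v1;
  GraphicsImpl v2;
  Graphics* g = libraryIsV2 ? static_cast<Graphics*>(&v2)
                            : static_cast<Graphics*>(&v1);
  return drawPrecise(g);
}
```

The version check and the fallback branch are the kind of leftover that stays in client code for as long as both interface versions exist.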
If Linux system calls were implemented as COM interfaces
Perhaps the Graphics example is too simple to fully expose the drawbacks of using virtual functions as interfaces. Let us look at a real case: the Linux kernel.
The Linux kernel has grown from 67 system calls in 0.10 to 340 in 2.6.37. The kernel interface keeps being extended while maintaining good compatibility. The way it maintains compatibility is simple and effective: every system call is assigned a permanent number, which amounts to pinning down the layout of the virtual function table. Comparing the system call tables of the two versions shows that fork() is number 2 in both Linux 0.10 and Linux 2.6.37. (System call numbers depend on the hardware platform; here we are looking at 32-bit x86.)
Imagine what a spectacle it would be if Linus had chosen the chain-inheritance style of COM interfaces to describe the system calls. To avoid cluttering the page, that code, with nearly a hundred levels of inheritance, is kept on a separate page. (The inheritance relationships and version numbers are not necessarily 100% accurate; I used git blame to check, and the listing only covers 0.01 to 2.5.31, which should be enough to show the drawbacks of the COM approach.)
Do not assume that "once published, an interface can never change" is a law of nature. It is merely an inherent drawback of using C++ virtual functions as the interface. Step outside that box and a C++ library interface can easily do better.
Why can it not be changed? Precisely because C++ virtual functions are used as the interface. A Java interface can have new functions added; a C library can add new global functions; a C++ class can add new non-virtual member functions, and a namespace can gain new non-member functions. All of these extend the original interface without inheriting a new one. A COM interface, by contrast, cannot be extended in place; it can only work around the problem through inheritance, producing a pile of versioned interfaces. Some people say COM is a positive example of binary compatibility; I would say COM achieves "binary compatibility" in the ugliest possible way. Fragile and rigid: that is the fate of using C++ virtual functions as interfaces.
Linux system calls, by contrast, are pinned down by compile-time constants that stay unchanged for years, which solves the problem with ease. In other object-oriented languages (Java/C#), I have never seen the odd practice of incrementing an interface's version number every time it changes.
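The idea of fixing each entry with an explicit compile-time constant can be sketched in a few lines. This is a toy model, not kernel code: only the number 2 for fork comes from the text above, and the other numbers, the handler functions, and the names CallNumber, lookup, and dispatch are illustrative assumptions.

```cpp
#include <cassert>
#include <string>

// Each call gets an explicit number that is never reused or renumbered.
enum CallNumber
{
  kExit = 1,
  kFork = 2,
  kRead = 3,
  kLaterCall = 100  // hypothetical call added in a later release
};

typedef std::string (*Handler)();

std::string doExit()      { return "exit"; }
std::string doFork()      { return "fork"; }
std::string doRead()      { return "read"; }
std::string doLaterCall() { return "later"; }

// The mapping is explicit, so the order of declarations in the source
// file no longer decides which number reaches which function.
Handler lookup(int number)
{
  switch (number)
  {
    case kExit:      return doExit;
    case kFork:      return doFork;
    case kRead:      return doRead;
    case kLaterCall: return doLaterCall;
    default:         return nullptr;
  }
}

// What a caller compiled long ago does: pass the number it was built with.
std::string dispatch(int number)
{
  Handler h = lookup(number);
  return h ? h() : std::string("ENOSYS");
}
```

An old caller that passes 2 keeps reaching fork no matter how many new calls are added, because new entries take new numbers instead of shifting existing ones.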
To borrow from The Zen of Python: explicit is better than implicit, and flat is better than nested.
Recommended practices for dynamic library interfaces
There are two approaches, depending on how widely the dynamic library is used.
If the library is used in a narrow scope, for example by two or three programs within the team, the users are under control and coordinating the release of new versions is easy. In that case there is no need to go to great lengths: it is enough to manage release versions and to use rpath in the executables to pin the full path of the library.
For example, suppose the Graphics library has released versions 1.1.0 and 1.2.0. These two versions need not be binary compatible. Clients upgrading from 1.1.0 to 1.2.0 must recompile, but they have to recompile anyway in order to use the new features. If a patch is to be applied in place, 1.1.1 must be binary compatible with 1.1.0, and 1.2.1 with 1.2.0. If you want to add new features in a way that is incompatible with 1.2.0, release them as 1.3.0.
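The versioning rule in the previous paragraph can be written down as a small predicate. This is a sketch of the convention only, under the assumption that the first two components identify a binary-compatible series and the third is the patch level; Version and canReplaceInPlace are hypothetical names.

```cpp
#include <cassert>

// Hypothetical version triple of a shared library release.
struct Version
{
  int majorNo;
  int minorNo;
  int patchNo;

  Version(int a, int b, int c) : majorNo(a), minorNo(b), patchNo(c) {}
};

// An executable built against `built` may have the library replaced in
// place by `installed` only when both belong to the same x.y series;
// crossing to another series requires recompiling the client.
bool canReplaceInPlace(const Version& built, const Version& installed)
{
  return built.majorNo == installed.majorNo
      && built.minorNo == installed.minorNo;
}
```

A deployment script or a startup check could use such a predicate to refuse dropping 1.2.0 onto a program that was built against 1.1.0.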
To make binary compatibility easier to check, consider separating how much of the library's code is exposed. muduo's header files and classes are deliberately divided into user-visible and user-invisible parts. For the user-visible part, pay attention to binary compatibility when upgrading and choose version numbers accordingly; for the user-invisible part, there is no such concern. Besides, muduo is designed to be released as a static library, so binary compatibility is not a major consideration for it.
If the library is used widely, with many users on different release cycles, the recommended approach is the pimpl technique [2, item 43], together with non-member non-friend functions in a namespace [1, item 23] [2, items 44 and 57] as the interface. The basic pimpl approach is illustrated here with the earlier Graphics example.
1. The exposed interface has no virtual functions, and sizeof(Graphics) == sizeof(Graphics::Impl*).
    #include <boost/scoped_ptr.hpp>

    class Graphics
    {
     public:
      Graphics();   // outline ctor
      ~Graphics();  // outline dtor

      void drawLine(int x0, int y0, int x1, int y1);
      void drawLine(Point p0, Point p1);

      void drawRectangle(int x0, int y0, int x1, int y1);
      void drawRectangle(Point p0, Point p1);

      void drawArc(int x, int y, int r);
      void drawArc(Point p, int r);

     private:
      class Impl;                     // defined only inside the library
      boost::scoped_ptr<Impl> impl;   // the only data member
    };
2. In the library's implementation, forward each call to the implementation class Graphics::Impl. This code lives in the .so/.dll and changes as the library is upgraded.
    #include <graphics.h>

    class Graphics::Impl
    {
     public:
      void drawLine(int x0, int y0, int x1, int y1);
      void drawLine(Point p0, Point p1);

      void drawRectangle(int x0, int y0, int x1, int y1);
      void drawRectangle(Point p0, Point p1);

      void drawArc(int x, int y, int r);
      void drawArc(Point p, int r);
    };

    Graphics::Graphics()
      : impl(new Impl)
    {
    }

    Graphics::~Graphics()
    {
    }

    void Graphics::drawLine(int x0, int y0, int x1, int y1)
    {
      impl->drawLine(x0, y0, x1, y1);
    }

    void Graphics::drawLine(Point p0, Point p1)
    {
      impl->drawLine(p0, p1);
    }

    // ...
3. To add new functionality, there is no need to extend by inheritance; the class can be modified in place while remaining binary compatible. First modify the header file:
    --- old/graphics.h  15:34:06.000000000 +0800
    +++ new/graphics.h  15:14:12.000000000 +0800
    @@ ... @@
     class Graphics
     {
      public:
       Graphics();   // outline ctor
       ~Graphics();  // outline dtor

       void drawLine(int x0, int y0, int x1, int y1);
    +  void drawLine(double x0, double y0, double x1, double y1);
       void drawLine(Point p0, Point p1);

       void drawRectangle(int x0, int y0, int x1, int y1);
    +  void drawRectangle(double x0, double y0, double x1, double y1);
       void drawRectangle(Point p0, Point p1);

       void drawArc(int x, int y, int r);
    +  void drawArc(double x, double y, double r);
       void drawArc(Point p, int r);

      private:
       class Impl;
       boost::scoped_ptr<Impl> impl;
     };
Then add the forwarding functions to the implementation file. This does not affect binary compatibility, because adding non-virtual functions does not affect existing executables.
    --- old/graphics.cc  15:15:20.000000000 +0800
    +++ new/graphics.cc  15:15:26.000000000 +0800
    @@ ... @@
     #include <graphics.h>

     class Graphics::Impl
     {
      public:
       void drawLine(int x0, int y0, int x1, int y1);
    +  void drawLine(double x0, double y0, double x1, double y1);
       void drawLine(Point p0, Point p1);

       void drawRectangle(int x0, int y0, int x1, int y1);
    +  void drawRectangle(double x0, double y0, double x1, double y1);
       void drawRectangle(Point p0, Point p1);

       void drawArc(int x, int y, int r);
    +  void drawArc(double x, double y, double r);
       void drawArc(Point p, int r);
     };

     Graphics::Graphics()
       : impl(new Impl)
     {
     }

     Graphics::~Graphics()
     {
     }

     void Graphics::drawLine(int x0, int y0, int x1, int y1)
     {
       impl->drawLine(x0, y0, x1, y1);
     }

    +void Graphics::drawLine(double x0, double y0, double x1, double y1)
    +{
    +  impl->drawLine(x0, y0, x1, y1);
    +}
    +
     void Graphics::drawLine(Point p0, Point p1)
     {
       impl->drawLine(p0, p1);
     }
Pimpl adds one level of forwarding; in return it buys extensibility and binary compatibility, which is usually a good trade. Pimpl acts as a compilation firewall.
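For readers who want to try the technique in a single file, the following sketch condenses the header and implementation into one translation unit. It departs from the listings above in two labelled ways: it uses std::unique_ptr from C++11 in place of boost::scoped_ptr, and the drawing functions return a string purely so that the forwarding can be observed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- what would be the public header ----
class Graphics
{
 public:
  Graphics();
  ~Graphics();  // defined out of line, where Impl is a complete type

  std::string drawLine(int x0, int y0, int x1, int y1);
  // Added in a later release; existing executables are unaffected
  // because non-virtual functions are found by name.
  std::string drawLine(double x0, double y0, double x1, double y1);

 private:
  class Impl;                   // clients never see the definition
  std::unique_ptr<Impl> impl_;  // the only data member
};

// ---- what would be compiled into the shared library ----
class Graphics::Impl
{
 public:
  std::string drawLine(int, int, int, int) { return "int line"; }
  std::string drawLine(double, double, double, double)
  {
    return "double line";
  }
};

Graphics::Graphics() : impl_(new Impl) {}
Graphics::~Graphics() {}

std::string Graphics::drawLine(int x0, int y0, int x1, int y1)
{
  return impl_->drawLine(x0, y0, x1, y1);
}

std::string Graphics::drawLine(double x0, double y0, double x1, double y1)
{
  return impl_->drawLine(x0, y0, x1, y1);
}
```

On mainstream standard library implementations a std::unique_ptr with the default deleter is the size of one raw pointer, so sizeof(Graphics) stays fixed however many members Impl gains; the language standard does not strictly guarantee this.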
Pimpl is not limited to C++ classes; C libraries can use it too, with the same binary compatibility benefits. For example, struct event_base in libevent2 is an opaque pointer: clients cannot see its members and must go through libevent functions to work with it, which makes it easier for new library versions to stay binary compatible.
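The opaque pointer idiom in a C-style interface might look like the sketch below. It does not reproduce libevent's API: the counter type and the counter_* functions are hypothetical, and the implementation is written in C++ behind extern "C" declarations only to keep the example in one language.

```cpp
#include <cassert>
#include <new>

extern "C" {

// ---- public header: clients see only a forward declaration ----
struct counter;  // opaque, members are not visible to clients
struct counter* counter_new(void);
void counter_add(struct counter* c, int n);
int counter_value(const struct counter* c);
void counter_free(struct counter* c);

}  // extern "C"

// ---- library implementation: free to change between releases ----
struct counter
{
  int value;
  int calls;  // a field added in a later release; clients cannot tell
};

extern "C" struct counter* counter_new(void)
{
  return new (std::nothrow) counter();  // zero-initialised, NULL on failure
}

extern "C" void counter_add(struct counter* c, int n)
{
  c->value += n;
  ++c->calls;
}

extern "C" int counter_value(const struct counter* c)
{
  return c->value;
}

extern "C" void counter_free(struct counter* c)
{
  delete c;
}

// Client usage: everything goes through the functions and the handle.
int demoCounter()
{
  struct counter* c = counter_new();
  if (!c) return -1;
  counter_add(c, 2);
  counter_add(c, 40);
  int v = counter_value(c);
  counter_free(c);
  return v;
}
```

Since clients never allocate a counter themselves or touch its fields, the library can change the struct layout without breaking already-compiled programs.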
Why are non-virtual functions more robust than virtual functions? Because a virtual function is bound by vtable offset, whereas a non-virtual function is bound by name: when the program starts, the loader resolves symbols and links the executable to the dynamic library by mangled name. It is like using an Internet domain name instead of an IP address: the name adapts to change much better.
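Binding by name can be modelled with a lookup table keyed by mangled name. This is a toy model only: a real loader works on ELF symbol tables, not on a std::map. The keys below are the names the Itanium C++ ABI (used by GCC on Linux) produces for the corresponding members of a class called Graphics; SymbolTable, exportsV1, exportsV2, and resolve are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::string (*Func)();

std::string drawLineInt()    { return "drawLine(int)"; }
std::string drawLineDouble() { return "drawLine(double)"; }
std::string drawArcInt()     { return "drawArc(int)"; }

// Stand-in for the exported symbols of a shared library.
typedef std::map<std::string, Func> SymbolTable;

// Version 1 exports two functions.
SymbolTable exportsV1()
{
  SymbolTable t;
  t["_ZN8Graphics8drawLineEiiii"] = drawLineInt;
  t["_ZN8Graphics7drawArcEiii"] = drawArcInt;
  return t;
}

// Version 2 adds an overload; existing entries keep their names.
SymbolTable exportsV2()
{
  SymbolTable t = exportsV1();
  t["_ZN8Graphics8drawLineEdddd"] = drawLineDouble;
  return t;
}

// Conceptually what happens for each symbol an executable imports.
Func resolve(const SymbolTable& lib, const std::string& name)
{
  SymbolTable::const_iterator it = lib.find(name);
  return it == lib.end() ? nullptr : it->second;
}
```

An executable built against version 1 asks for the same two names when it runs against version 2 and gets the same functions, regardless of where the new overload was declared in the header.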
What about cross-language access? Simple: expose a C interface. Java has JNI for calling C code; the interpreters of Python, Perl, Ruby, and others are written in C, so calling C functions from them is no trouble at all. C functions are the universal interface, and C is the greatest systems programming language.
This article has only discussed using a class as the interface. In fact, free functions are sometimes better. For example, muduo/base/Timestamp.h defines not only class Timestamp but also free functions such as muduo::timeDifference(). This is one respect in which C++ is better than Java and other purely object-oriented languages. More on that another time.
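A minimal sketch of a class whose interface includes free functions in the same namespace follows. It is not muduo's actual code: the namespace demo, the microsecond representation, and the exact signatures are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace demo
{

// A small value type with a deliberately minimal set of members.
class Timestamp
{
 public:
  explicit Timestamp(std::int64_t microSecondsSinceEpoch)
    : microSecondsSinceEpoch_(microSecondsSinceEpoch)
  {
  }

  std::int64_t microSecondsSinceEpoch() const
  {
    return microSecondsSinceEpoch_;
  }

 private:
  std::int64_t microSecondsSinceEpoch_;
};

// Free functions in the same namespace are part of the type's interface,
// yet they need no access to private members and add nothing to the class.
inline double timeDifference(Timestamp high, Timestamp low)
{
  std::int64_t diff =
      high.microSecondsSinceEpoch() - low.microSecondsSinceEpoch();
  return static_cast<double>(diff) / 1000000.0;  // seconds
}

inline bool operator<(Timestamp lhs, Timestamp rhs)
{
  return lhs.microSecondsSinceEpoch() < rhs.microSecondsSinceEpoch();
}

}  // namespace demo
```

New free functions can be added to the namespace in later releases without touching the class definition at all, which is the most conservative form of extension discussed in this article.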
References
[1] Scott Meyers, Effective C++, 3rd edition. Item 35: Consider alternatives to virtual functions; Item 23: Prefer non-member non-friend functions to member functions.
[2] Herb Sutter and Andrei Alexandrescu, C++ Coding Standards. Item 39: Consider making virtual functions nonpublic, and public functions nonvirtual; Item 43: Pimpl judiciously; Item 44: Prefer writing nonmember nonfriend functions; Item 57: Keep a type and its nonmember function interface in the same namespace.
[3] Meng Yan, "The salvation of function/bind (part 1)", and the "four and a half abstractions" in "Reply to several questions".
[4] Chen Shuo, "Replacing virtual functions with boost::function and boost::bind", and "Plain and simple C++ design".
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license.