Chen Shuo (giantchen_AT_gmail)
Blog.csdn.net/Solstice
This article mainly discusses the Linux x86/x86-64 platform, and occasionally uses Windows as a negative example.
Binary compatibility in C/C++ has several meanings. This article mainly discusses whether existing executable files are affected when header files and library files are upgraded; I call this the ABI (application binary interface) of a library (mainly a shared library, that is, a dynamic link library). The ABI of the compiler and the operating system is left for the next article, on C++ standards and practice.
What is binary compatibility?
Before explaining the definition, let's look at a historical problem in Unix/C: the values of the flags parameter of open. The prototype of the open(2) function is
int open(const char *pathname, int flags);
Here flags takes one of three access-mode values: O_RDONLY, O_WRONLY, and O_RDWR.
Contrary to what most people would expect, these values are not related by bitwise OR; that is, O_RDONLY | O_WRONLY != O_RDWR. To open a file for both reading and writing you must use O_RDWR, not (O_RDONLY | O_WRONLY). Why? Because the values of O_RDONLY, O_WRONLY, and O_RDWR are 0, 1, and 2 respectively, and 0 | 1 is 1, not 2.
So why has C never corrected this flaw? For example, O_RDONLY, O_WRONLY, and O_RDWR could be defined as 1, 2, and 3 respectively, so that O_RDONLY | O_WRONLY == O_RDWR holds as intuition suggests. Moreover, the three values are macros, so no existing source code would need to change; only the system header file would.
The reason is that this would break binary compatibility. In an already compiled executable, the arguments passed to open(2) are hard-coded, and changing the header file does not change the compiled executable. For example, such an executable calls open(path, 1) to write a file. Under the new definitions, 1 would mean read-only, so the program would malfunction.
The example above shows that if a library is provided as a shared library, its header files and library files cannot be easily modified; otherwise it is easy to break existing binary executables, or other libraries that use this shared library. An operating system's system calls can be seen as the interface between the kernel and user space; in this sense the kernel can also be regarded as a shared library. You can upgrade the kernel from 2.6.30 to 2.6.35 without recompiling all user-space programs.
So-called "binary compatibility" means that when a library file is upgraded (or receives a bug fix), you do not have to recompile the executables or other libraries that use it, and the program's functionality is not broken.
See the relevant entry in the Qt FAQ: http://developer.qt.nokia.com/faq/answer/you_frequently_say_that_you_cannot_add_this_or_that_feature_because_it_woul
On Windows there is the notorious DLL Hell. For example, MFC comes with a whole pile of DLLs: mfc40.dll, mfc42.dll, mfc71.dll, mfc80.dll, and mfc90.dll. This is an inherent problem of dynamic link libraries and cannot be blamed on MFC.
Which changes break a library's ABI?
How do we determine whether a change is binary compatible? This is directly tied to how C++ is implemented. Although the C++ standard does not specify an ABI, almost every mainstream platform has an explicit or de facto ABI standard. For example, ARM has the EABI, Intel Itanium has http://www.codesourcery.com/public/cxx-abi/abi.html, and x86-64 has an ABI modeled on the Itanium one; all of these are explicitly specified. x86 is the exception: it has only de facto ABIs. On Windows it is that of Visual C++, and on Linux it is that of G++ (G++'s ABI has had several versions; the latest is the one from G++ 3.4). Intel's C++ compiler must also generate code according to the Visual C++ or G++ ABI; otherwise it would not be compatible with the rest of the system.
The main contents of a C++ ABI:
How function parameters are passed; for example, Linux x86-64 passes the first six integer parameters of a function in registers (the Windows x64 convention uses four)
How virtual functions are called: usually through vptr/vtbl, with the call made via vtbl[offset]
Memory layout of struct and class; data members are accessed by offset
Name mangling
RTTI and exception handling implementation (exception handling is not considered in the rest of this article)
C/C++ exposes how a dynamic library is used through header files. These "usage instructions" are mainly for the compiler: the compiler generates binary code accordingly, and then at run time the loader binds the executable and the dynamic library together. Whether a change is binary compatible depends on whether the "usage instructions" exposed by the old header file are still compatible with the actual usage of the new dynamic library. The new library necessarily comes with a new header file, but existing binary executables still call the dynamic library according to the old header file.
Here are some examples of changes that are source compatible but binary incompatible.
Adding a default parameter to a function. Existing executables have no way to pass this extra argument.
Adding a virtual function, which changes the order of entries in the vtbl. (Do not count on "adding it at the end" being safe, because your class may already have been inherited from.)
Adding a default template type parameter, for example changing Foo<T> to Foo<T, Alloc = alloc<T> >. This changes the name mangling.
Changing the value of an enumerator, for example changing enum Color { Red = 3 }; to Red = 4. Old binaries and the new library then disagree about what each value means. And because enum assigns values automatically, adding an enumerator is also unsafe unless it is added at the end.
Adding data members to class Bar, which makes sizeof(Bar) larger and changes the offsets of its existing data members. Is this safe? Usually not, but there are exceptions.
If client code contains new Bar, it is definitely unsafe, because the number of bytes allocated by the old new expression is not enough for the new Bar. Conversely, if the library returns Bar* from a factory (and destroys objects through the factory), or returns shared_ptr directly, the client never needs sizeof(Bar), so it may be safe.
If client code contains Bar* pBar; pBar->memberA = xx;, it is definitely unsafe, because the offset of memberA in the new Bar may have changed. Conversely, if data members are accessed only through member functions, the client never uses data member offsets, so it may be safe.
If the client calls pBar->setMemberA(xx); and Bar::setMemberA() is an inline function, it is definitely unsafe, because the offset has already been inlined into the client's binary code. If setMemberA() is an out-of-line function, its implementation lives in the shared library and is updated together with Bar, so it may be safe.
Is it safe to use only header-only libraries? Not necessarily. If your program uses boost 1.36.0 while a library you depend on was compiled against 1.33.1, your program and that library cannot work together properly, because the template parameters of boost::function differ between 1.36.0 and 1.33.1: one of them has an extra allocator.
There is a blacklist: everything on it is definitely binary incompatible, and changes not on it may still be binary incompatible. See the KDE documentation: http://techbase.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B
Which changes are mostly safe?
I said earlier that a library "cannot be easily modified", which implies that some changes are mostly safe. Here is a whitelist; additions are welcome.
A change is safe as long as it does not affect the correctness of the binary code in existing executables. We can then deploy the new library and let existing binaries benefit from it.
Adding a new class
Adding a non-virtual member function
Renaming a data member, because the generated binary code accesses it by offset. Of course, this breaks source compatibility.
There are many more; I will not list them all here.
Additions are welcome.
A negative example: COM
In C++, using virtual functions as the interface basically means giving up binary compatibility. Specifically, this refers to using a class that contains only virtual functions (a so-called interface class) as the interface of a library. Such an interface is rigid: once released, it cannot be modified.
Take Microsoft's COM as an example. Both DirectX and MSXML are released as COM components. Let's look at their versioned interfaces:
IDirect3D7, IDirect3D8, IDirect3D9, ID3D10 *, ID3D11 *
IXMLDOMDocument, IXMLDOMDocument2, IXMLDOMDocument3
In other words, every new release introduces a new interface class instead of extending the existing interface. This is not compatible with existing code: client code must be rewritten to use the new interface.
Now look at the C language. C/POSIX has gradually added many new functions over the years, while existing code keeps running well without modification. If you want to use the new functions, you just use them; existing code does not have to change. By contrast, to use the features of IXMLDOMDocument3 in COM you have to upgrade all existing code from IXMLDOMDocument to IXMLDOMDocument3, which is ironic.
Tip: if you are considering interface-oriented programming in C++, test the design against binary compatibility first.
Solutions
Use static linking
This is the best approach. In distributed systems, static linking also makes deployment easier: once the executable is copied to a machine it can run, with no need to worry about the libraries it depends on. muduo currently uses static linking.
Control compatibility through version management of dynamic libraries
This requires carefully checking the binary compatibility of every change and planning releases accordingly. For example, the 1.0.x releases are binary compatible with one another, and the 1.1.x releases are binary compatible with one another, but 1.0.x and 1.1.x are binary incompatible. The discussion of .so file naming and binary compatibility in "Programmer self-cultivation" is worth reading.
Use the pimpl technique as a compilation firewall
Expose only non-virtual interfaces in the header file, and fix the size of the class at sizeof(Impl*). This way the library file can be updated without affecting existing executables. Of course, this adds an extra level of indirection, which may cost some performance. See the relevant items in Exceptional C++ and C++ Coding Standards 101.
How does Java cope?
Java in effect postpones the linking step of C/C++ until class loading, so there are no problems such as "cannot add virtual functions" or "cannot modify data members". Interface-oriented programming is far more common and natural in Java than in C++, and the rigid-interface problem described above does not arise.