This article explains how the G++ compiler lays out objects in memory under virtual inheritance. It also covers the essential difference between dynamic_cast and static_cast and the format of virtual function tables, topics that many C++ programmers are unsure about and that the original article handles very well. Below is my translation of the original article (by Edsko de Vries, January 2006). For the original, see:
Http://www.alidata.org/archives/878
This is a technical article about C++. It assumes that the reader has a solid understanding of C++ and some familiarity with assembly.
This article explains how GCC lays out objects in memory under multiple inheritance and virtual inheritance. Ideally, a C++ programmer would not need to know these compiler internals. In practice, the way the compiler implements multiple inheritance (and virtual inheritance in particular) has consequences for how C++ code must be written, for example when downcasting pointers, when using pointers to pointers, and in the order in which virtual base class constructors are called. If you understand how multiple inheritance is implemented, you can anticipate these consequences and deal with them in your code. Understanding the cost of virtual inheritance also helps if you care about runtime efficiency. Finally, poking around in the internals is fun :)
Multiple Inheritance
First, let's consider simple (non-virtual) multiple inheritance. Take a look at the C++ class hierarchy below.
class Top
{
public:
    int a;
};

class Left : public Top
{
public:
    int b;
};

class Right : public Top
{
public:
    int c;
};

class Bottom : public Left, public Right
{
public:
    int d;
};
As a UML diagram: Left and Right each derive from their own Top, and Bottom derives from both Left and Right.
Note that Top is actually inherited twice (Eiffel calls this repeated inheritance). This means that a Bottom object contains two attributes named a, which can be accessed as bottom.Left::a and bottom.Right::a.
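The duplication can be checked directly. The following is a minimal sketch, assuming the non-virtual hierarchy above; the helper name has_two_copies_of_a is made up for this illustration.

```cpp
#include <cassert>

class Top    { public: int a; };
class Left   : public Top { public: int b; };
class Right  : public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };

// Hypothetical helper: writes different values through the two qualified
// names and reports whether the two copies of Top::a are really distinct.
bool has_two_copies_of_a()
{
    Bottom bottom;
    bottom.Left::a = 1;   // the Top subobject reached through Left
    bottom.Right::a = 2;  // the Top subobject reached through Right
    return &bottom.Left::a != &bottom.Right::a
        && bottom.Left::a == 1
        && bottom.Right::a == 2;
}
```

Calling has_two_copies_of_a() should return true: the second write does not overwrite the first, because each qualified name refers to its own int.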
How are Left, Right, and Bottom laid out in memory? Let's start with the simple cases, Left and Right. A Left object consists of Top::a followed by Left::b.
[The layout of Right is the same as that of Left, with Right::c in place of Left::b, so it is not drawn separately.]
Note that the first attribute of each class is the one inherited from Top. This means that after the following two assignments:
Left* left = new Left();
Top* top = left;
left and top point to exactly the same address, so we can treat the Left object as a Top object (and, likewise, a Right object as a Top object). But what about a Bottom object?
GCC lays it out as follows: Left::Top::a, Left::b, Right::Top::a, Right::c, Bottom::d.
Now what happens when we upcast a Bottom pointer?
Bottom* bottom = new Bottom();
Left* left = bottom;
This code works correctly. The memory layout chosen by GCC lets us treat a Bottom object as a Left object, because the start of a Bottom object is laid out exactly like a Left object. But what happens if we upcast the Bottom pointer to Right?
Right* right = bottom;
For this code to work, the compiler has to adjust the pointer so that it points to the Right part of the Bottom object.
After the adjustment, we can access the Bottom object through the right pointer as if it were a Right object. Note, however, that bottom and right now point to different memory addresses. Finally, consider:
Top* top = bottom;
There is no result: the statement is ambiguous, and the compiler reports error: 'Top' is an ambiguous base of 'Bottom'. The two possibilities can be told apart with the following statements:
Top* topL = (Left*) bottom;
Top* topR = (Right*) bottom;
After these two assignments, topL and left point to the same address, and topR and right point to the same address.
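The pointer adjustments in this section can be observed by comparing raw addresses. This is a sketch rather than a definitive test: the helper names are invented here, and the concrete offsets are a property of the compiler's layout (GCC places Left first), not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

class Top    { public: int a; };
class Left   : public Top { public: int b; };
class Right  : public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };

// Raw byte distance between two addresses inside one object.
std::ptrdiff_t byte_distance(const void* from, const void* to)
{
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// Offset applied by the implicit Bottom* -> Left* conversion.
std::ptrdiff_t left_adjustment()
{
    Bottom bottom;
    Left* left = &bottom;
    return byte_distance(&bottom, left);
}

// Offset applied by the implicit Bottom* -> Right* conversion.
std::ptrdiff_t right_adjustment()
{
    Bottom bottom;
    Right* right = &bottom;
    return byte_distance(&bottom, right);
}

// True if going through Left or Right selects the Top stored in that base.
bool tops_match_their_bases()
{
    Bottom bottom;
    Left* left = &bottom;
    Right* right = &bottom;
    Top* topL = static_cast<Left*>(&bottom);
    Top* topR = static_cast<Right*>(&bottom);
    return byte_distance(topL, left) == 0
        && byte_distance(topR, right) == 0
        && topL != topR;
}
```

With GCC, left_adjustment() is expected to be 0 and right_adjustment() a positive number of bytes (the size of the Left part).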
Virtual Inheritance
To avoid inheriting Top more than once, Left and Right must inherit from Top virtually:
class Top
{
public:
    int a;
};

class Left : virtual public Top
{
public:
    int b;
};

class Right : virtual public Top
{
public:
    int c;
};

class Bottom : public Left, public Right
{
public:
    int d;
};
This yields a class hierarchy in which Left and Right share a single Top (quite possibly what you wanted in the first place).
For the programmer this hierarchy is simpler and clearer, but for the compiler it is considerably more complicated. Consider the memory layout of Bottom again. One possibility would be: Left::Top::a, Left::b, Right::c, Bottom::d.
The advantage of this layout is that its first part is identical to the layout of Left, so a Bottom object can easily be accessed through a Left pointer. But now consider Right:
Right* right = bottom;
Which address should we assign to right? After this assignment, we should be able to use right as if it pointed to a real Right object. But that is impossible here! The memory layout of a real Right object is completely different from the Right part of the Bottom object, so the upcast Bottom object can no longer be used as a real Right object. Worse, no other simple layout of Bottom fixes this. We first look at the layout GCC actually uses and then explain why it works. In order, a Bottom object contains: vptr.Left, Left::b, vptr.Right, Right::c, Bottom::d, Top::a.
Two things are worth noting. First, the order of the members is completely different (in fact, it is roughly reversed: the base class member Top::a now comes last). Second, the object contains new vptr pointers. The compiler inserts them wherever they are needed (when a class uses virtual inheritance or has virtual functions) and also generates the code in the class constructors that initialises them. Each vptr points to a "virtual table". Here, Bottom contains one vptr for each base that inherits virtually from Top: vptr.Left and vptr.Right. To see how the virtual table is used, consider the following code.
Bottom* bottom = new Bottom();
Left* left = bottom;
int p = left->a;
The second assignment makes left point to the same address as bottom (that is, to the "top" of the Bottom object). Now consider the third assignment. It compiles to the following assembly:
movl  left, %eax        # %eax = left
movl  (%eax), %eax      # %eax = left.vptr.Left
movl  (%eax), %eax      # %eax = virtual base offset
addl  left, %eax        # %eax = left + virtual base offset
movl  (%eax), %eax      # %eax = left.a
movl  %eax, p           # p = left.a
In words, we use left to index the virtual table and read the virtual base offset (vbase offset) from it. Adding this offset to left gives the start of the Top part of the Bottom object. For left, the virtual base offset is 20: assuming every field of Bottom is 4 bytes, left plus 20 bytes is exactly the address of the member a.
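The effect of the virtual base offset can be measured from C++ without reading assembly. The sketch below assumes the virtual hierarchy above; the function names are invented for this example. The value 20 in the text corresponds to a target with 4-byte pointers, so on a 64-bit build the numbers will be larger, and the code therefore only compares offsets instead of hard-coding them.

```cpp
#include <cassert>
#include <cstddef>

class Top    { public: int a; };
class Left   : virtual public Top { public: int b; };
class Right  : virtual public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };

// Distance in bytes from a Left subobject to the Top subobject it refers to.
// The implicit Left* -> Top* conversion is where the compiler consults the
// virtual base offset stored in the virtual table.
std::ptrdiff_t left_to_top_offset(Left* left)
{
    Top* top = left;
    return reinterpret_cast<char*>(top) - reinterpret_cast<char*>(left);
}

// Offset when the Left subobject is part of a Bottom object.
std::ptrdiff_t offset_inside_bottom()
{
    Bottom bottom;
    return left_to_top_offset(&bottom);
}

// Offset when the object really is just a Left.
std::ptrdiff_t offset_in_plain_left()
{
    Left left;
    return left_to_top_offset(&left);
}
```

The same function returns different offsets for the two objects, which is why the offset has to be looked up at run time rather than fixed at compile time.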
We can access the Right part of Bottom in the same way:
Bottom* bottom = new Bottom();
Right* right = bottom;
int p = right->a;
Here right points to the Right part of the Bottom object, that is, to vptr.Right.
The assignment to p compiles to the same code as for left above. The only difference is the vptr: the one we read now points into a different part of the virtual table, and the virtual base offset we obtain is 12. To summarise: in a Bottom object, the entry reached through vptr.Left holds a virtual base offset of 20, and the entry reached through vptr.Right holds 12.
The key point, of course, is that we want to access a real Right object in exactly the same way as a Bottom object upcast to Right. So the layout of Right (and of Left) contains a vptr as well.
With this design we can finally access a Bottom object through a Right pointer. It comes at a considerable cost, though: virtual tables have to be introduced, objects grow by one or more virtual pointers, and what used to be a simple member access now needs two indirections through the virtual table (although compiler optimisations can reduce that cost somewhat).
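The size cost mentioned above can be made visible with sizeof. This is only a sketch: the namespaces plain and shared and the function name are invented here so that both hierarchies can live in one file, and the exact number of extra bytes depends on the target's pointer size and padding.

```cpp
#include <cassert>
#include <cstddef>

namespace plain {   // non-virtual: two copies of Top, no vptrs
    class Top    { public: int a; };
    class Left   : public Top { public: int b; };
    class Right  : public Top { public: int c; };
    class Bottom : public Left, public Right { public: int d; };
}

namespace shared {  // virtual: one copy of Top, plus compiler-inserted vptrs
    class Top    { public: int a; };
    class Left   : virtual public Top { public: int b; };
    class Right  : virtual public Top { public: int c; };
    class Bottom : public Left, public Right { public: int d; };
}

// Extra bytes per Bottom object paid for virtual inheritance.
constexpr std::ptrdiff_t virtual_inheritance_overhead()
{
    return static_cast<std::ptrdiff_t>(sizeof(shared::Bottom))
         - static_cast<std::ptrdiff_t>(sizeof(plain::Bottom));
}
```

Although the virtual version stores one int fewer (a single Top::a), the two vptrs more than make up for it, so the overhead is expected to be positive with GCC.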
Downcasting
As we have seen, converting a pointer from a derived class to a base class (upcasting) may involve adding an offset to the pointer. One might guess that downcasting simply subtracts the same offset. That is indeed the case for non-virtual inheritance, but virtual inheritance introduces a further complication. Let us extend the example above with one more class:
class AnotherBottom : public Left, public Right
{
public:
    int e;
    int f;
};
In the resulting hierarchy, both Bottom and AnotherBottom derive from Left and Right, which share the virtual base Top.
Now consider the following code:
Bottom* bottom1 = new Bottom();
AnotherBottom* bottom2 = new AnotherBottom();
Top* top1 = bottom1;
Top* top2 = bottom2;
Left* left = static_cast<Left*>(top1);
In the memory layouts of Bottom and AnotherBottom, the Top part comes last in both, but at different offsets: top1 points 20 bytes into the Bottom object, while top2 points 24 bytes into the AnotherBottom object.
Now consider the static_cast from top1 to left. At compile time it is not known whether top1 points into a Bottom object or an AnotherBottom object, so the offset to subtract from top1 (20 for Bottom, 24 for AnotherBottom) cannot be determined, and the cast cannot be compiled at all. The compiler reports: error: cannot convert from base 'Top' to derived type 'Left' via virtual base 'Top'. Since runtime information is needed, we have to use dynamic_cast instead:
Left* left = dynamic_cast<Left*>(top1);
However, the compiler is still not satisfied: error: cannot dynamic_cast 'top1' (of type 'class Top*') to type 'class Left*' (source type is not polymorphic). The problem is that dynamic_cast (like typeid) needs runtime type information about the object pointed to. Looking back at the layout, however, top1 points at nothing but an integer (the member a). The compiler did not include a vptr for Top in Bottom because it did not consider one necessary. To force the compiler to include it, we can add a virtual destructor to Top:
class Top
{
public:
    virtual ~Top() {}
    int a;
};
This change makes the compiler add a vptr to Top. In the new memory layout of Bottom, the Top part now consists of vptr.Top followed by Top::a.
(The other derived classes, Left and Right, get a vptr.Top in the same way.) For the dynamic_cast, the compiler now generates a call to a library function:
left = __dynamic_cast(top1, typeinfo_for_Top, typeinfo_for_Left, -1);
__dynamic_cast is defined in libstdc++ (the corresponding header file is cxxabi.h). Armed with the type information for Top, Left, and Bottom (reached through vptr.Top), it can carry out the cast. The argument -1 indicates that the relationship between Left and Top is not known at this point. For details, see the implementation in tinfo.cc.
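Putting the pieces of this section together, a downcast through the virtual base might look like the following sketch. It assumes the hierarchy above with the virtual destructor added to Top; the helper functions to_left, downcast_finds_left_in_both, and downcast_rejects_plain_top are hypothetical names introduced only for this example.

```cpp
#include <cassert>

class Top    { public: virtual ~Top() {} int a; };
class Left   : virtual public Top { public: int b; };
class Right  : virtual public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };
class AnotherBottom : public Left, public Right { public: int e; int f; };

// Downcast through the virtual base. Yields a null pointer if the object
// behind 'top' contains no Left subobject.
Left* to_left(Top* top)
{
    return dynamic_cast<Left*>(top);
}

// The same cast works for both derived classes, although the distance
// between their Top and Left parts differs.
bool downcast_finds_left_in_both()
{
    Bottom bottom;
    AnotherBottom another;
    Left* fromBottom = to_left(&bottom);    // Bottom* -> Top* is implicit
    Left* fromAnother = to_left(&another);
    return fromBottom == static_cast<Left*>(&bottom)
        && fromAnother == static_cast<Left*>(&another);
}

// A plain Top object has no Left part, so the cast must fail.
bool downcast_rejects_plain_top()
{
    Top top;
    return to_left(&top) == nullptr;
}
```

Replacing dynamic_cast with static_cast in to_left should reproduce the compile-time error quoted earlier, since Top is a virtual base of Left.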
Concluding Remarks
Finally, we tie up a few loose ends.
Pointers to pointers
This issue is confusing at first, but obvious once you think it through. Take the class hierarchy from the Downcasting section above and consider:
Bottom* b = new Bottom();
Right* r = b;
(When the value of b is assigned to r, the compiler adds 8 bytes to it, so that r points to the Right part of the Bottom object.) So a value of type Bottom* can be assigned to a variable of type Right*. But what about Bottom** and Right**?
Bottom** bb = &b;
Right** rr = bb;
Does the compiler accept these two statements? No, it reports: error: invalid conversion from 'Bottom**' to 'Right**'
Why? Suppose for a moment that bb could be assigned to rr. Then bb and rr would both point to b, while b and r point to their respective parts of the Bottom object. Now consider what happens when we assign to *rr:
*rr = b;
Note that *rr has type Right* (a single-level pointer), so this assignment is valid!
It is essentially the same as the assignment to r above (*rr and r both have type Right*), so the compiler implements it in the same way: it adds 8 bytes to the value of b and stores the result in *rr. But *rr is b itself, so b now points 8 bytes into the Bottom object!
As long as we access the Bottom object through rr, everything works. But as soon as we access it through b, every member reference is off by 8 bytes, which is obviously wrong.
In short, even if *a and *b are related by inheritance, nothing follows for **a and **b.
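Both halves of this rule can be checked at compile time with the standard std::is_convertible trait. The sketch below also shows one safe alternative, taking the address of an already converted copy; the constant and function names are invented for this illustration.

```cpp
#include <cassert>
#include <type_traits>

class Top    { public: int a; };
class Left   : virtual public Top { public: int b; };
class Right  : virtual public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };

// Bottom* converts to Right* implicitly (with a pointer adjustment)...
constexpr bool single_level_converts =
    std::is_convertible<Bottom*, Right*>::value;

// ...but Bottom** does not convert to Right**.
constexpr bool double_level_converts =
    std::is_convertible<Bottom**, Right**>::value;

// A safe alternative: convert the pointer first and take the address of the
// converted copy, so that b itself is never overwritten through a Right**.
bool b_survives_write_through_copy()
{
    Bottom bottom;
    Bottom* b = &bottom;
    Right* r = b;      // adjusted copy of b
    Right** rr = &r;   // points at the copy, not at b
    *rr = b;           // adjusts b's value again and stores it in r
    return b == &bottom && *rr == static_cast<Right*>(&bottom);
}
```

Because rr points at r rather than at b, the write through *rr cannot corrupt b, which is exactly the situation the compiler's error protects against.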
Constructors of virtual base classes
The compiler must guarantee that all virtual pointers of an object are properly initialised. In particular, it makes sure that the constructor of every virtual base class is called, and called only once. If you do not call the constructors of the virtual base classes explicitly, the compiler inserts a call to their default constructors. This can lead to surprising results. Take the class hierarchy above again, extended with constructors:
class Top
{
public:
    Top() { a = -1; }
    Top(int _a) { a = _a; }
    int a;
};

class Left : public Top
{
public:
    Left() { b = -2; }
    Left(int _a, int _b) : Top(_a) { b = _b; }
    int b;
};

class Right : public Top
{
public:
    Right() { c = -3; }
    Right(int _a, int _c) : Top(_a) { c = _c; }
    int c;
};

class Bottom : public Left, public Right
{
public:
    Bottom() { d = -4; }
    Bottom(int _a, int _b, int _c, int _d) : Left(_a, _b), Right(_a, _c)
    {
        d = _d;
    }
    int d;
};
First consider the case without virtual inheritance. What does the following code print?
Bottom bottom(1, 2, 3, 4);
printf("%d %d %d %d %d\n", bottom.Left::a, bottom.Right::a, bottom.b, bottom.c, bottom.d);
You may guess the result is as follows:
1 1 2 3 4
However, in the virtual case, where Left and Right inherit virtually from Top, we get the following result:
-1 -1 2 3 4
As mentioned at the beginning of this section, the compiler inserts a call to the default constructor of Top into the constructor of Bottom, and this call runs before all other constructors. When Left then tries to call its base class constructor, Top has already been constructed, so that call is skipped. Tracing the constructors shows:
Top::Top()
Left::Left(1,2)
Right::Right(1,3)
Bottom::Bottom(1,2,3,4)
To avoid this, call the constructor of the virtual base class explicitly:
Bottom(int _a, int _b, int _c, int _d) : Top(_a), Left(_a, _b), Right(_a, _c)
{
    d = _d;
}
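The difference between the two constructors can be reproduced side by side. In this sketch, BottomImplicit and BottomExplicit are hypothetical stand-ins for the two versions of Bottom discussed above, with Left and Right inheriting virtually from Top.

```cpp
#include <cassert>

class Top
{
public:
    Top() { a = -1; }
    Top(int _a) { a = _a; }
    int a;
};

class Left : virtual public Top
{
public:
    Left(int _a, int _b) : Top(_a) { b = _b; }
    int b;
};

class Right : virtual public Top
{
public:
    Right(int _a, int _c) : Top(_a) { c = _c; }
    int c;
};

// Relies on the compiler-inserted call to Top::Top().
class BottomImplicit : public Left, public Right
{
public:
    BottomImplicit(int _a, int _b, int _c, int _d)
        : Left(_a, _b), Right(_a, _c) { d = _d; }
    int d;
};

// Initialises the virtual base explicitly from the most derived class.
class BottomExplicit : public Left, public Right
{
public:
    BottomExplicit(int _a, int _b, int _c, int _d)
        : Top(_a), Left(_a, _b), Right(_a, _c) { d = _d; }
    int d;
};

int a_with_implicit_top() { BottomImplicit bottom(1, 2, 3, 4); return bottom.a; }
int a_with_explicit_top() { BottomExplicit bottom(1, 2, 3, 4); return bottom.a; }
```

a_with_implicit_top() yields -1, because the Top(_a) initialisers in Left and Right are ignored when they are not the most derived class, while a_with_explicit_top() yields 1.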
Conversion to void*
dynamic_cast<void*>(b);
Finally, consider what happens when a pointer is cast to void*. The compiler must adjust the pointer so that it points to the start of the object. This is easy to do by consulting the vtable: it contains an entry called offset to top, which is the offset from the vptr to the top of the object. Note that you must use dynamic_cast for the vtable to be consulted.
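A small sketch of this behaviour, assuming the polymorphic hierarchy (Top with a virtual destructor); the function names are made up for this example. dynamic_cast to void* requires a polymorphic type, which is why the virtual destructor is needed here.

```cpp
#include <cassert>

class Top    { public: virtual ~Top() {} int a; };
class Left   : virtual public Top { public: int b; };
class Right  : virtual public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };

// Returns the address of the most derived object that 'r' is a part of.
void* start_of_complete_object(Right* r)
{
    return dynamic_cast<void*>(r);
}

// True if a pointer to the Right part is mapped back to the start of Bottom.
bool finds_start_of_bottom()
{
    Bottom bottom;
    Right* r = &bottom;  // points into the middle of the Bottom object
    return start_of_complete_object(r) == static_cast<void*>(&bottom);
}

// For contrast: static_cast keeps the address of the Right part unchanged,
// which with GCC's layout is not the start of the object.
bool static_cast_keeps_subobject_address()
{
    Bottom bottom;
    Right* r = &bottom;
    return static_cast<void*>(r) != static_cast<void*>(&bottom);
}
```

The contrast between the two functions is the point of the section: only dynamic_cast consults the vtable and recovers the start of the complete object.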
Pointer comparison
Taking the Bottom hierarchy as an example again, will the following code print Equal!?
Bottom* b = new Bottom();
Right* r = b;

if (r == b)
    printf("Equal!\n");
Clearly, the two pointers hold different addresses: r points 8 bytes past the address held in b. These internal details, however, should be hidden from the C++ programmer. So when r and b are compared, the compiler first compensates for the 8-byte difference (in effect, b is converted to Right*, the same adjustment as in the assignment above), and the code prints "Equal!".
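The two views of the same pointers, typed comparison versus raw addresses, can be put next to each other. This is a sketch; ComparisonResult and compare_right_and_bottom are names introduced only for this example, and the raw addresses differ because of the layout GCC chooses.

```cpp
#include <cassert>

class Top    { public: int a; };
class Left   : virtual public Top { public: int b; };
class Right  : virtual public Top { public: int c; };
class Bottom : public Left, public Right { public: int d; };

struct ComparisonResult
{
    bool pointers_compare_equal;  // result of r == b
    bool raw_addresses_equal;     // result of comparing the stored addresses
};

ComparisonResult compare_right_and_bottom()
{
    Bottom bottom;
    Bottom* b = &bottom;
    Right* r = b;

    ComparisonResult result;
    // b is implicitly converted to Right* before the comparison.
    result.pointers_compare_equal = (r == b);
    // Casting each pointer to void* first bypasses that conversion.
    result.raw_addresses_equal =
        (static_cast<void*>(r) == static_cast<void*>(b));
    return result;
}
```

With GCC the first field is expected to be true and the second false, matching the explanation above.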
References
[1] CodeSourcery, in particular the C++ ABI Summary and the Itanium C++ ABI (despite the name, this document is referenced in a platform-independent context; in particular, the structure of the vtables is detailed there). The libstdc++ implementation of dynamic casts, as well as RTTI and name unmangling/demangling, is defined in tinfo.cc.
[2] The libstdc++ website, in particular the section on the C++ Standard Library API.
[3] C++: Under the Hood by Jan Gray.
[4] Chapter 9, "Multiple Inheritance", of Thinking in C++ (Volume 2) by Bruce Eckel. The author has made the book available for download.