Virtual inheritance of diamond structure
This time we look at virtual inheritance in a diamond structure. Virtual inheritance was introduced to solve the problems that arise in complex inheritance hierarchies. The previous article discussed virtual inheritance using a simple inheritance structure, just to pave the way.
Consider the classes below. This is a typical diamond inheritance structure: C100 and C101 share the same parent class, C041, through virtual inheritance, and C110 derives from both C100 and C101.
struct C041
{
    C041() : c_(0x01) {}
    virtual void foo() { c_ = 0x02; }
    char c_;
};
struct C100 : public virtual C041
{
    C100() : c_(0x02) {}
    char c_;
};
struct C101 : public virtual C041
{
    C101() : c_(0x03) {}
    char c_;
};
struct C110 : public C100, public C101
{
    C110() : c_(0x04) {}
    char c_;
};
Run the following code:
PRINT_SIZE_DETAIL(C110)
Result:
The size of C110 is 16
The detail of C110 is 28 C3 45 00 02 1C C3 45 00 03 04 18 C3 45 00 01
As in the previous article, we can draw the memory layout of the object.
| C100, 5            | C101, 5           | C110, 1 | C041, 5        |
| ospt, 4, 11 | m, 1 | ospt, 4, 6 | m, 1 | m, 1    | vtpt, 4 | m, 1 |
(Note: the abbreviations keep the rows aligned. ospt is an offset value pointer, m is a member variable, and vtpt is a virtual table pointer. The first number is the size of the region in bytes. Only an offset value pointer has a second number, which is the offset value that the pointer refers to.)
We can see that there is only one C041 in the memory layout of the object; that is, there is only one grandfather-class part, and it is placed at the end. This is diamond inheritance. Comparing with the earlier discussions, we know that without the virtual inheritance mechanism, two C041 parts would appear in the memory layout of a C110 object, which is also called V-type inheritance. The corresponding object layout would be C041 + C100 + C041 + C101 + C110. With V-type inheritance you cannot convert directly from C110, the grandson class, to C041, the grandfather class, because the object layout contains two grandfather parts, one from C100 and the other from C101. The conversion is ambiguous to the compiler: it does not know which of the two to use. You can work around this by converting first to a parent class and then to the grandfather class. With that approach, however, if a member variable of the grandfather class is modified, nothing at run time keeps the states of the two grandfather parts in sync, so semantic errors may result.
Let's analyze the memory layout above. With ordinary inheritance, the top-level class comes first in the layout, and with multiple inheritance the parents are arranged from left to right. The inheritance from C100 and C101 to C110 is ordinary inheritance, so following this rule the left parent class comes first, then the right parent class, then the child class. Virtual inheritance requires the shared parent class to be placed at the end of the whole object layout. (This holds even when the virtual parent class is not actually shared, as with the C020 class in the previous article. I don't know whether optimization switches change this.) That is why the grandfather class in the example above is placed at the end.
Now let's look at member access. Run the following code and examine the corresponding assembly code.
C110 c110;
c110.c_ = 0x51;
c110.C100::c_ = 0x52;
c110.C101::c_ = 0x52;
c110.C041::c_ = 0x53;
c110.foo();
The corresponding assembly code is:
01 00423993 push 1
02 00423995 lea ecx, [ebp+fffff7f0h]
03 0042399b call 0041de60
04 004239a0 mov byte ptr [ebp+fffff7fah], 51h
05 004239a7 mov byte ptr [ebp+fffff7f4h], 52h
06 004239ae mov byte ptr [ebp+fffff7f9h], 52h
07 004239b5 mov eax, dword ptr [ebp+fffff7f0h]
08 004239bb mov ecx, dword ptr [eax+4]
09 004239be mov byte ptr [ebp+ecx+fffff7f4h], 53h
10 004239c6 mov eax, dword ptr [ebp+fffff7f0h]
11 004239cc mov ecx, dword ptr [eax+4]
12 004239cf lea ecx, [ebp+ecx+fffff7f0h]
13 004239d6 call 0041df32
The first three rows initialize the object by calling its constructor. Rows 4, 5, and 6 assign values to the member variables of the child class and of the left and right parent classes. These are written directly, because this level of inheritance is ordinary inheritance. Rows 7, 8, and 9 assign a value to the grandfather class's member variable. As discussed in the previous article, it is accessed indirectly through the offset value that the offset value pointer refers to. The last four rows call the member function. The address of the called function is given directly (the last row), because we call it through an object, and in that case even a virtual function call is not polymorphic. The this pointer, however, is obtained indirectly, in rows 10, 11, and 12. Because the function is defined in the grandfather class, the data members it operates on belong to the grandfather class, so the compiler must adjust the this pointer. The grandfather class is inherited virtually, so the adjustment has to go through the offset value that the offset value pointer refers to.
Next, compare rows 9 and 12. The computed addresses differ. Row 9 assigns a value to the grandfather class's member variable, and the grandfather class has a virtual table pointer. So after obtaining the starting address of the grandfather part, the compiler adds a further 4-byte offset to skip the virtual table pointer. The address actually computed is [ebp+ecx+fffff7f0h+4h]; the compiler folds the final addition into the constant when it generates the code.
Virtual inheritance of diamond structure (2)
Let's look at another example. Its inheritance structure is the same as in the previous example: it is also a diamond. The difference is that every class overrides the virtual function declared by the top-level class. The code is as follows:
struct C041
{
    C041() : c_(0x01) {}
    virtual void foo() { c_ = 0x02; }
    char c_;
};
struct C140 : public virtual C041
{
    C140() : c_(0x02) {}
    virtual void foo() { c_ = 0x11; }
    char c_;
};
struct C141 : public virtual C041
{
    C141() : c_(0x03) {}
    virtual void foo() { c_ = 0x12; }
    char c_;
};
struct C150 : public C140, public C141
{
    C150() : c_(0x04) {}
    virtual void foo() { c_ = 0x21; }
    char c_;
};
First, run the following code to check their memory layout.
PRINT_SIZE_DETAIL(C041)
PRINT_SIZE_DETAIL(C140)
PRINT_SIZE_DETAIL(C141)
PRINT_SIZE_DETAIL(C150)
Result:
The size of C041 is 5
The detail of C041 is F0 C2 45 00 01
The size of C140 is 14
The detail of C140 is 48 C3 45 00 02 00 00 00 00 44 C3 45 00 01
The size of C141 is 14
The detail of C141 is 58 C3 45 00 03 00 00 00 00 54 C3 45 00 01
The size of C150 is 20
The detail of C150 is 74 C3 45 00 02 68 C3 45 00 03 04 00 00 00 00 64 C3 45 00 01
Unlike the previous layout, there are 4 bytes of zero between the shared part and the non-shared part that precedes it. Only the shared part has a virtual table pointer, because the derived classes define no new virtual functions of their own; they only override the virtual function of the top-level class. Let's analyze the C150 object layout.
| C140, 5            | C141, 5            | C150, 1 | zero, 4 | C041, 5        |
| ospt, 4, 15 | m, 1 | ospt, 4, 10 | m, 1 | m, 1    | 4       | vtpt, 4 | m, 1 |
(Note: the abbreviations keep the rows aligned. ospt is an offset value pointer, m is a member variable, and vtpt is a virtual table pointer. The first number is the size of the region in bytes. Only an offset value pointer has a second number, which is the offset value that the pointer refers to.)
Let's look at the function call:
C150 obj;
PRINT_OBJ_ADR(obj)
obj.foo();
The output object address is:
obj's address is: 0012f624
The assembly code corresponding to the final function call is:
00423f74 lea ecx, [ebp+fffff757h]
00423f7a call 0041da3
Stepping one instruction, we see that the value of ecx is 0x0012f633, which is the starting address of the grandfather-class part in the layout of obj. From the layout analysis above, we know that the offset value pointer at the start of C150 refers to the value 15, which is the offset from the start of the object to the shared part (the grandfather-class part). The start address of obj printed above, 0x0012f624, plus 15 (decimal) is exactly the value 0x0012f633 in ecx.
Because the function is called on an object, the call instruction on the second line jumps directly to the function's address.
The puzzling part is this. We know that ecx is used to pass the this pointer. Earlier we analyzed the call to foo on a C110 object. In that example foo is a virtual function defined in the top-level class and not overridden by any derived class, so when it is called through a subclass object, the compiler generates code that computes the starting address of the grandfather class from the offset value referred to by the offset pointer at the start of the subclass, and passes that address as the this pointer. In C150, however, foo is no longer inherited from the grandfather class; it is overridden by the subclass. In that case the this pointer ought to point to the start of the subclass part, 0x0012f62e, rather than the value 0x0012f633 in ecx.
Let's look at the assembly code of C150::foo() and see how it locates the subclass's member variable from a this pointer that points to the grandfather class.
01 00426c00 push ebp
02 00426c01 mov ebp, esp
03 00426c03 sub esp, 0cch
04 00426c09 push ebx
05 00426c0a push esi
06 00426c0b push edi
07 00426c0c push ecx
08 00426c0d lea edi, [ebp+ffffff34h]
09 00426c13 mov ecx, 33h
10 00426c18 mov eax, 0cccccccch
11 00426c1d rep stos dword ptr [edi]
12 00426c1f pop ecx
13 00426c20 mov dword ptr [ebp-8], ecx
14 00426c23 mov eax, dword ptr [ebp-8]
15 00426c26 mov byte ptr [eax-5], 21h
16 00426c2a pop edi
17 00426c2b pop esi
18 00426c2c pop ebx
19 00426c2d mov esp, ebp
20 00426c2f pop ebp
21 00426c30 ret
Sure enough, because this points not to the start of the child-class part but to the start of the grandfather-class part, the function locates the member variable by subtracting an offset. Look at row 15: eax holds the value of this, and the write goes to [eax-5]. Combined with the object layout and the memory dump above, we can see that this (which points to the start of the grandfather class C041) minus 5 bytes (4 bytes of zero and 1 byte of member variable) is exactly the address of the member variable of the subclass C150.
Why does it locate the child class indirectly through the parent class's address rather than directly through the child class's address? The answer involves the internal implementation constraints of the compiler and requires an understanding of the system as a whole. It is hard to find by analyzing observed behavior alone.
Now let's call it through a pointer.
C150 * pt = &obj;
pt->foo();
The assembly instructions corresponding to the second line of code are:
01 00423f8b mov eax, dword ptr [ebp+fffff73ch]
02 00423f91 mov ecx, dword ptr [eax]
03 00423f93 mov edx, dword ptr [ecx+4]
04 00423f96 mov eax, dword ptr [ebp+fffff73ch]
05 00423f9c mov ecx, dword ptr [eax]
06 00423f9e mov eax, dword ptr [ebp+fffff73ch]
07 00423fa4 add eax, dword ptr [ecx+4]
08 00423fa7 mov ecx, dword ptr [ebp+fffff73ch]
09 00423fad mov edx, dword ptr [ecx+edx]
10 00423fb0 mov esi, esp
11 00423fb2 mov ecx, eax
12 00423fb4 call dword ptr [edx]
13 00423fb6 cmp esi, esp
14 00423fb8 call 0041ddf2
Ouch, this is even worse. The code is very inefficient and contains obviously redundant instructions: rows 1, 4, and 6 load the same value, as do rows 2 and 5. With optimization enabled, the code would probably be much more efficient.
Row 9 loads the virtual table pointer of the grandfather class, and row 12 calls the function through its first entry. Row 11 again places the starting address of the grandfather class, 0x0012f633, in ecx as the this pointer.
Finally, let's do a dynamic cast on the pointer and call again:
C141 * pt1 = dynamic_cast<C141 *>(pt);
pt1->foo();
The assembly instructions corresponding to the first line of code are as follows:
01 00423fbd cmp dword ptr [ebp+fffff73ch], 0
02 00423fc4 je 00423fd7
03 00423fc6 mov eax, dword ptr [ebp+fffff73ch]
04 00423fcc add eax, 5
05 00423fcf mov dword ptr [ebp+fffff014h], eax
06 00423fd5 jmp 00423fe1
07 00423fd7 mov dword ptr [ebp+fffff014h], 0
08 00423fe1 mov ecx, dword ptr [ebp+fffff014h]
09 00423fe7 mov dword ptr [ebp+fffff730h], ecx
Here the code first checks whether pt is null. Row 4 then offsets the address held in pt by 5 bytes, and the result is finally assigned to pt1. pt1 thus points to the right parent class, that is, to the start of the C141 part.
The assembly instructions corresponding to the second line of code are:
01 00423fed mov eax, dword ptr [ebp+fffff730h]
02 00423ff3 mov ecx, dword ptr [eax]
03 00423ff5 mov edx, dword ptr [ecx+4]
04 00423ff8 mov eax, dword ptr [ebp+fffff730h]
05 00423ffe mov ecx, dword ptr [eax]
06 00424000 mov eax, dword ptr [ebp+fffff730h]
07 00424006 add eax, dword ptr [ecx+4]
08 00424009 mov ecx, dword ptr [ebp+fffff730h]
09 0042400f mov edx, dword ptr [ecx+edx]
10 00424012 mov esi, esp
11 00424014 mov ecx, eax
12 00424016 call dword ptr [edx]
13 00424018 cmp esi, esp
14 0042401a call 0041ddf2
Because the call goes through the offset value pointer, ecx and edx end up with the same values as in the previous call through pt, so the call is again correctly polymorphic.
Virtual inheritance of diamond structure (3)
Finally, based on the previous example, let's see what happens when the subclass and the left and right parent classes each define virtual functions of their own.
struct C140 : public virtual C041
{
    C140() : c_(0x02) {}
    virtual void foo() { c_ = 0x11; }
    char c_;
};
struct C160 : public virtual C041
{
    C160() : c_(0x02) {}
    virtual void foo() { c_ = 0x12; }
    virtual void f160() { c_ = 0x12; }
    char c_;
};
struct C161 : public virtual C041
{
    C161() : c_(0x03) {}
    virtual void foo() { c_ = 0x13; }
    virtual void f161() { c_ = 0x13; }
    char c_;
};
struct C170 : public C160, public C161
{
    C170() : c_(0x04) {}
    virtual void foo() { c_ = 0x14; }
    virtual void f170() { c_ = 0x14; }
    char c_;
};
First, run the following code to check the memory layout.
PRINT_SIZE_DETAIL(C041)
PRINT_SIZE_DETAIL(C160)
PRINT_SIZE_DETAIL(C161)
PRINT_SIZE_DETAIL(C170)
Result:
The size of C041 is 5
The detail of C041 is F0 B2 45 00 01
The size of C160 is 18
The detail of C160 is 84 B3 45 00 88 B3 45 00 02 00 00 00 00 80 B3 45 00 01
The size of C161 is 18
The detail of C161 is 98 B3 45 00 9C B3 45 00 03 00 00 00 00 94 B3 45 00 01
The size of C170 is 28
The detail of C170 is B0 B3 45 00 C8 B3 45 00 02 AC B3 45 00 BC B3 45 00 03 04 00 00 00 00 A8 B3 45 00 01
The C170 object layout is:
| C160, 9                  | C161, 9                  | C170, 1 | zero, 4 | C041, 5      |
| vp, 4 | op, 4, 19 | m, 1 | vp, 4 | op, 4, 10 | m, 1 | m, 1    | 4       | vp, 4 | m, 1 |
(Note: the abbreviations keep the rows aligned. op is an offset value pointer, m is a member variable, and vp is a virtual table pointer. The first number is the size of the region in bytes. Only an offset value pointer has a second number, which is the offset value that the pointer refers to.)
Because the left and right parent classes define new virtual functions of their own, each has its own virtual table pointer. The odd thing is that although the subclass also defines a new virtual function of its own, the layout above shows that it has no virtual table pointer of its own. It must share a virtual table with the top-level class or with one of its parent classes. We will trace a call later to find the answer.
Another odd thing is that the offset value referred to by the offset value pointers in the left and right parent classes is not the offset of the grandfather class but the offset of the 4-byte zero value in front of it. Also, as mentioned in the eighth article, at the address an offset value pointer refers to, the first four bytes used to be zero and the next four bytes hold the real offset. In this example the first four bytes are no longer 0 but 0xfffffffc, that is, the integer -4.
Now let's make the call through an object.
C170 obj;
PRINT_OBJ_ADR(obj);
obj.foo();
Result:
obj's address is: 0012f54c
The assembly instructions corresponding to the last line, the call, are:
004245b8 lea ecx, [ebp+fffff687h]
004245be call 0041d122
The value in ecx (the this pointer) is 0x0012f563, which again points to the start of the grandfather-class part, as before. Likewise, the instructions inside the function locate the correct member variable at this minus 5 bytes. The assembly code of the function is not listed here.
Now let's call the subclass's own new virtual function.
obj.f170();
The corresponding assembly instructions are:
004245c3 lea ecx, [ebp+fffff670h]
004245c9 call 0041d127
To my surprise, the this pointer is 0x0012f54c, the same as the object address printed earlier; it points to the start of the whole object. This is very strange: for two virtual functions called on the same object, the compiler passes different this pointers.
Let's step into the function to see how it gets the correct member variable address.
01 00366f80 push ebp
02 00366f81 mov ebp, esp
03 00366f83 sub esp, 0cch
04 00366f89 push ebx
05 00366f8a push esi
06 00366f8b push edi
07 00366f8c push ecx
08 00366f8d lea edi, [ebp+ffffff34h]
09 00366f93 mov ecx, 33h
10 00366f98 mov eax, 0cccccccch
11 00366f9d rep stos dword ptr [edi]
12 00366f9f pop ecx
13 00366fa0 mov dword ptr [ebp-8], ecx
14 00366fa3 mov eax, dword ptr [ebp-8]
15 00366fa6 mov byte ptr [eax+12h], 14h
16 00366faa pop edi
17 00366fab pop esi
18 00366fac pop ebx
19 00366fad mov esp, ebp
20 00366faf pop ebp
21 00366fb0 ret
Row 15 shows that 18 bytes (12h in hexadecimal) are added directly to the this pointer to locate the subclass's member variable.
Because the instructions inside the function locate the subclass's member variable this way, calling through a pointer can only change how the function address is found; the value of the this pointer will certainly stay the same. Let's verify that.
C170 * pt = &obj;
pt->f170();
The assembly instructions corresponding to the second line of code are as follows:
01 004245da mov eax, dword ptr [ebp+fffff664h]
02 004245e0 mov edx, dword ptr [eax]
03 004245e2 mov esi, esp
04 004245e4 mov ecx, dword ptr [ebp+fffff664h]
05 004245ea call dword ptr [edx+4]
06 004245ed cmp esi, esp
07 004245ef call 0041ddf2
Row 1 puts the starting address of the whole object in eax. Row 2 treats eax as a pointer and loads the address it points to into edx. The start of the object is also the virtual table pointer of the left parent class. The call in row 5 takes the address in edx, moves 4 bytes further, and uses the value found there as the function address. This answers the earlier question: the subclass has no virtual table of its own; its virtual table is in effect merged into that of the left parent class. The left parent class defines its own virtual function, which occupies the first entry of the virtual function table, and the subclass's virtual function occupies the second entry, so 4 bytes must be added when addressing it. The this pointer in ecx is the starting address of the whole object, just as we predicted.
Finally, let's look at how the address of the grandfather class is obtained.
pt->C041::c_ = 0x33;
The corresponding assembly instructions are:
01 004245f4 mov eax, dword ptr [ebp+fffff664h]
02 004245fa mov ecx, dword ptr [eax+4]
03 004245fd mov edx, dword ptr [ecx+4]
04 00424600 mov eax, dword ptr [ebp+fffff664h]
05 00424606 mov byte ptr [eax+edx+8], 33h
Row 1 loads the starting address of the object into eax. Row 2 loads into ecx the pointer stored at eax plus 4 bytes, which is the offset value pointer. Row 3 then reads the value 4 bytes past that address into edx. At this point edx is 13h. That should be the offset to the grandfather-class area, but it is really the offset of the 4-byte zero value shown in the object layout, 4 bytes in front of the real start of the grandfather class. We mentioned this when discussing the object layout of C170. So row 5, when locating the member variable, adds 8 bytes to skip both the 4-byte zero value and the grandfather class's 4-byte virtual table pointer, instead of adding only 4 bytes to skip the virtual table pointer. In the C150 object, by contrast, the offset value skips the 4-byte zero value and leads directly to the start of the grandfather class.
We have never clearly explained the meaning of the 4 bytes of zero in front of the grandfather class, or of the first 4 bytes at the address the offset value pointer refers to. They may exist for compatibility, or they may give the compiler some bookkeeping information. Moreover, the topology of object inheritance with virtual inheritance can be far more complex than the diamond structure discussed here, and these two values may also serve to handle more complex inheritance structures. It is too hard to work out the motives behind them from observed behavior alone.
(To be continued)