The C++ Object Model in Detail: The Memory Layout of C++ Objects
The earlier article in this series, the overview of the C++ object model and the memory layout of C++ objects, analyzed in detail how the member variables and member functions of a class without any inheritance affect the memory layout of its objects, and explained how to traverse an object's memory, including its virtual function table. If you have not read that overview, read it first. This article discusses how inheritance affects the memory layout of objects: the layout of the members of a derived class object, the effect of inheritance on the virtual function table, how the virtual function mechanism is implemented, and runtime type identification. Because inheritance relationships in C++ can be complex, this article covers the following cases: 1) single inheritance 2) multiple inheritance 3) repeated inheritance 4) single virtual inheritance 5) diamond virtual inheritance
In addition, when a class is used as a base class, its destructor should be virtual so that code such as Base *p = new Derived; ... delete p; runs correctly. In the examples of this article, however, every function in the virtual function table is traversed and called in order to verify the table's contents. If the destructor were virtual, it would be called during the traversal and would destroy the object, so the calls that follow would fail. The purpose of this article is to analyze and verify the memory layout of C++ objects, not to design software, so leaving the destructors non-virtual does not affect the analysis: a virtual destructor occupies an entry in the table like any other virtual function and differs only in what it does. In the examples, therefore, the destructors are not virtual.
To make them easy to call, every virtual function takes no parameters and returns void.
Note: the test environment for the following examples is 32-bit Ubuntu 14.04 with g++ 4.8.2. Results may differ in other environments.
1. Traversing the virtual function table through the pointer to it (vptr)
Examining an object's memory requires traversing its virtual function table to determine the table's contents, so that step is factored out into the following function:
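#include <iostream>
using namespace std;

void visitVtbl(int **vtbl, int count)
{
    cout << vtbl << endl;
    // The entry just before the first function address holds the type information.
    cout << "\t[-1]: " << (long)vtbl[-1] << endl;

    typedef void (*FuncPtr)();
    // Check i against count first so that vtbl[count] is never read.
    for (int i = 0; i < count && vtbl[i]; ++i)
    {
        cout << "\t[" << i << "]: " << vtbl[i] << " -> ";
        FuncPtr func = (FuncPtr)vtbl[i];
        func();
    }
}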
Code explanation: the parameter vtbl is the address of the first element of the virtual function table, that is, the value of the vptr stored in the object. The parameter count is the number of virtual functions in the table. It is needed because the table holds more than just virtual function addresses, and not every virtual function table ends with a NULL entry, so the loop cannot rely on NULL alone to find the end.
The virtual function table stores function pointers. Treating the table as an array, a pointer to a pointer is needed to point at it, hence the parameter type int **vtbl. The value of a function pointer, that is, of an array element, is then obtained with vtbl[i].
An object's type information is reachable through the virtual function table. To make it easy to find, it is stored at index -1, immediately before the address of the first virtual function. The examples below show that all the virtual function tables of a class hold the same value at index -1, so a pointer to the base class and a pointer to the derived class can both identify the actual type of the object they point to from the virtual function table.
2. Single inheritance
The code of the classes is as follows:
class Base
{
public:
    Base()
    {
        mBase1 = 101;
        mBase2 = 102;
    }
    virtual void func1() { cout << "Base::func1()" << endl; }
    virtual void func2() { cout << "Base::func2()" << endl; }

private:
    int mBase1;
    int mBase2;
};

class Derived : public Base
{
public:
    Derived() : Base()
    {
        mDerived1 = 1001;
        mDerived2 = 1002;
    }
    virtual void func2() { cout << "Derived::func2()" << endl; }
    virtual void func3() { cout << "Derived::func3()" << endl; }

private:
    int mDerived1;
    int mDerived2;
};
Use the following code for testing:
int main()
{
    Derived d;
    int *p = (int*)&d;

    visitVtbl((int**)*(int**)p, 3);   // vptr
    ++p; cout << *p << endl;          // mBase1
    ++p; cout << *p << endl;          // mBase2
    ++p; cout << *p << endl;          // mDerived1
    ++p; cout << *p << endl;          // mDerived2
    return 0;
}
Code explanation: the hardest part of the test code is the argument in the statement visitVtbl((int**)*(int**)p, 3);. The int pointer p points to the vptr in the object. Because the vptr is itself a pointer, p is really a pointer to a pointer, and dereferencing it (*p) should yield the value of the vptr. On a given system these pointers all occupy the same amount of memory (generally 4 bytes on 32-bit systems and 8 bytes on 64-bit systems), so the vptr value can be obtained with the expression (int**)*(int**)p.
This expression does three things:
1) It converts the int pointer p to int **: (int**)p.
2) It obtains the value of the vptr with the dereference operator *: *(int**)p. The result has type int *. The vptr is in fact a pointer to a pointer, but because all pointers occupy the same amount of memory, this step does not truncate the address.
3) It converts the result back to a pointer to a pointer, since that is what the vptr really is: (int**)*(int**)p.
Note: many articles treat the entries of the virtual function table as integers. This article does not. Pointer width differs between 32-bit and 64-bit systems, so to keep the code usable on both, the entries of the virtual function table are treated as pointers here.
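The same idea can be sketched without treating the object as an array of int. The fragment below is only an illustration under stated assumptions: the classes Shape and Circle and the helper vptrOf are hypothetical names, and the code assumes (as g++ does for a polymorphic class without base classes) that the vptr is the first pointer-sized word of the object, which the C++ standard does not guarantee.

```cpp
#include <cstring>

// Hypothetical classes, used only for this illustration.
struct Shape {
    virtual int sides() const { return 0; }
};

struct Circle : Shape {
    virtual int sides() const { return 1; }
};

// Copy the first pointer-sized word of the object into a void*.
// Assumption: the implementation stores the vptr at offset 0
// (true for g++ here, but not required by the standard).
inline const void *vptrOf(const Shape *obj)
{
    const void *vptr = 0;
    std::memcpy(&vptr, static_cast<const void *>(obj), sizeof vptr);
    return vptr;
}

// Two objects of the same class are expected to share one table.
inline bool sameClassSharesTable()
{
    Shape a, b;
    return vptrOf(&a) == vptrOf(&b);
}

// An object of a derived class is expected to use a different table.
inline bool derivedUsesOtherTable()
{
    Shape a;
    Circle c;
    return vptrOf(&a) != vptrOf(&c);
}
```

Because memcpy copies exactly sizeof(void*) bytes, the sketch behaves the same way on 32-bit and 64-bit builds, which is the point of treating table entries as pointers rather than integers.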
Similar code in later examples works on the same principle and is not explained again.
The running result is as follows:
Based on the test output, the memory layout of the Derived-class object can be obtained as follows:
Based on this, the following conclusions can be drawn for single inheritance:
1) The vptr is at the very start of the object.
2) Non-static member variables follow the vptr, ordered by inheritance order and then by declaration order.
3) The derived class inherits the virtual functions declared by the base class: the addresses of the base class's virtual functions are copied into the corresponding entries of the derived class's virtual function table.
4) Virtual functions newly added by the derived class follow the inherited ones. In this example, func3, added by the subclass, is placed after func2.
5) If the subclass overrides a virtual function of its parent class, the corresponding entry in the subclass's virtual function table is updated to the address of the new function. In this example, the subclass overrides func2, so the func2 entry in its table holds the address of Derived::func2.
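Conclusions 3) and 5) can also be observed without reading raw memory. The following is a minimal sketch, assuming a variant of the Base and Derived classes in which each virtual function returns its qualified name instead of printing it; the helper callThroughBase is a hypothetical name introduced for the illustration.

```cpp
#include <string>

// Variant of the Base/Derived classes: each virtual function
// returns its qualified name instead of printing it.
class Base {
public:
    virtual std::string func1() { return "Base::func1"; }
    virtual std::string func2() { return "Base::func2"; }
};

class Derived : public Base {
public:
    virtual std::string func2() { return "Derived::func2"; }  // overrides Base::func2
    virtual std::string func3() { return "Derived::func3"; }  // new virtual function
};

// Call func1 and func2 through a Base pointer that really points to a Derived.
inline std::string callThroughBase()
{
    Derived d;
    Base *p = &d;
    return p->func1() + " " + p->func2();
}
```

The inherited entry still reaches Base::func1, while the overridden entry reaches Derived::func2, so callThroughBase() returns "Base::func1 Derived::func2".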
3. Multiple inheritance
The code of the classes is as follows:
class Base1
{
public:
    Base1() { mBase1 = 101; }
    virtual void funcA() { cout << "Base1::funcA()" << endl; }
    virtual void funcB() { cout << "Base1::funcB()" << endl; }

private:
    int mBase1;
};

class Base2
{
public:
    Base2() { mBase2 = 102; }
    virtual void funcA() { cout << "Base2::funcA()" << endl; }
    virtual void funcC() { cout << "Base2::funcC()" << endl; }

private:
    int mBase2;
};

class Derived : public Base1, public Base2
{
public:
    Derived() : Base1(), Base2() { mDerived = 1001; }
    virtual void funcD() { cout << "Derived::funcD()" << endl; }
    virtual void funcA() { cout << "Derived::funcA()" << endl; }

private:
    int mDerived;
};
Use the following code for testing:
int main()
{
    Derived d;
    int *p = (int*)&d;

    visitVtbl((int**)*(int**)p, 3);         // vptr shared with Base1
    ++p; cout << *p << endl;                // mBase1
    ++p; visitVtbl((int**)*(int**)p, 3);    // vptr of the Base2 sub-object
    ++p; cout << *p << endl;                // mBase2
    ++p; cout << *p << endl;                // mDerived
    return 0;
}
The running result is as follows:
Based on the test output, the memory layout of the Derived-class object can be obtained as follows:
Based on this, the following conclusions can be drawn for multiple inheritance:
1) Under multiple inheritance, a subclass has n-1 additional virtual function tables, where n is the number of direct base classes; in other words, the derived class has n virtual function tables in total. One is the primary instance, shared with the first base class (Base1 in this example); the others are secondary instances, one for each of the remaining base classes (Base2 in this example).
2) Virtual functions declared by the subclass are placed in the virtual function table of the primary instance. In this example, funcD, declared by the subclass, is placed in the table shared with Base1.
3) The sub-object of each parent class is kept intact inside the subclass object, and the sub-objects are arranged in the order in which the base classes are declared.
4) If the subclass overrides a virtual function, every virtual function with the same signature in the parent classes is overridden. In this example, the subclass overrides funcA, so the funcA entries in both virtual function tables are updated to the address of the subclass's function. As a result, the right function is called no matter which parent class's pointer refers to the subclass instance.
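Conclusions 3) and 4) can be checked with ordinary pointer conversions. The sketch below assumes simplified versions of Base1, Base2 and Derived whose funcA returns a string; the helpers basesAtDifferentAddresses and callFuncAViaBothBases are hypothetical names for this illustration.

```cpp
#include <string>

class Base1 {
public:
    Base1() : mBase1(101) {}
    virtual std::string funcA() { return "Base1::funcA"; }
protected:
    int mBase1;
};

class Base2 {
public:
    Base2() : mBase2(102) {}
    virtual std::string funcA() { return "Base2::funcA"; }
protected:
    int mBase2;
};

class Derived : public Base1, public Base2 {
public:
    virtual std::string funcA() { return "Derived::funcA"; }  // overrides both
};

// The two base sub-objects of one Derived object live at different
// addresses, so at least one of the conversions must adjust the pointer.
inline bool basesAtDifferentAddresses()
{
    Derived d;
    Base1 *b1 = &d;
    Base2 *b2 = &d;   // the compiler adjusts the address to the Base2 sub-object
    return static_cast<void *>(b1) != static_cast<void *>(b2);
}

// funcA reached through either base pointer resolves to the override.
inline std::string callFuncAViaBothBases()
{
    Derived d;
    Base1 *b1 = &d;
    Base2 *b2 = &d;
    return b1->funcA() + " " + b2->funcA();
}
```

Both calls reach Derived::funcA, which matches the observation that the funcA entry is updated in both virtual function tables.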
4. Repeated inheritance
Repeated inheritance means that a parent class is inherited indirectly more than once.
The code of the class is as follows:
class Base
{
public:
    Base() { mBase = 11; }
    virtual void funcA() { cout << "Base::funcA()" << endl; }
    virtual void funcX() { cout << "Base::funcX()" << endl; }

protected:
    int mBase;
};

class Base1 : public Base
{
public:
    Base1() : Base() { mBase1 = 101; }
    virtual void funcA() { cout << "Base1::funcA()" << endl; }
    virtual void funcB() { cout << "Base1::funcB()" << endl; }

private:
    int mBase1;
};

class Base2 : public Base
{
public:
    Base2() : Base() { mBase2 = 102; }
    virtual void funcA() { cout << "Base2::funcA()" << endl; }
    virtual void funcC() { cout << "Base2::funcC()" << endl; }

private:
    int mBase2;
};

class Derived : public Base1, public Base2
{
public:
    Derived() : Base1(), Base2() { mDerived = 1001; }
    virtual void funcD() { cout << "Derived::funcD()" << endl; }
    virtual void funcA() { cout << "Derived::funcA()" << endl; }

private:
    int mDerived;
};
Use the following code for testing:
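int main()
{
    Derived d;
    int *p = (int*)&d;

    visitVtbl((int**)*(int**)p, 4);         // vptr of the Base1 sub-object
    ++p; cout << *p << endl;                // mBase (copy inside Base1)
    ++p; cout << *p << endl;                // mBase1
    ++p; visitVtbl((int**)*(int**)p, 3);    // vptr of the Base2 sub-object
    ++p; cout << *p << endl;                // mBase (copy inside Base2)
    ++p; cout << *p << endl;                // mBase2
    ++p; cout << *p << endl;                // mDerived
    return 0;
}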
The running result is as follows:
Based on the test output, the memory layout of the Derived-class object can be obtained as follows:
Based on this, the following conclusions can be drawn for repeated inheritance:
1) With repeated inheritance, the class Base at the top of the hierarchy is inherited by Base1 and by Base2, and both of those are inherited by Derived. A Derived object therefore contains a Base1 sub-object and a Base2 sub-object, each of which contains its own Base sub-object, so there are two copies of the Base sub-object (and of its member mBase) in a Derived object.
2) This is a source of ambiguity. Because the subclass object holds two copies of the parent class's members, the statement mBase = 1; inside the Derived class is ambiguous: the object contains two variables named mBase, and the compiler cannot tell which one is meant. To access a member of Base, use the scope resolution operator to say which copy is meant, for example Base1::mBase = 1;
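The two copies of mBase can be made visible with a short sketch. It assumes stripped-down versions of the four classes (no virtual functions, since only the data member matters here); the accessor functions and the helper copiesAreIndependent are hypothetical names added for the illustration.

```cpp
class Base {
public:
    Base() : mBase(11) {}
protected:
    int mBase;
};

class Base1 : public Base {};
class Base2 : public Base {};

class Derived : public Base1, public Base2 {
public:
    // Writing "mBase = v;" here would not compile: the name is ambiguous
    // because a Derived object holds two Base sub-objects.
    void setViaBase1(int v) { Base1::mBase = v; }
    void setViaBase2(int v) { Base2::mBase = v; }
    int getViaBase1() const { return Base1::mBase; }
    int getViaBase2() const { return Base2::mBase; }
};

// Write through the Base1 path and check that the Base2 copy is untouched.
inline bool copiesAreIndependent()
{
    Derived d;
    d.setViaBase1(1);
    return d.getViaBase1() == 1 && d.getViaBase2() == 11;
}
```

After the write through Base1, the copy reached through Base2 still holds the value 11 set by the Base constructor, which confirms that the two members are separate storage.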
Repeated inheritance may not be what we want. C++ provides virtual inheritance to solve this problem, which is described in detail below.
5. Single virtual inheritance
The code is as follows (the implementation of the classes is the same as in the repeated inheritance example, except that Base1 now inherits from Base virtually):
class Base { ...... };
class Base1 : virtual public Base { ...... };
Use the following code for testing:
int main()
{
    Base1 b1;
    int *p = (int*)&b1;

    visitVtbl((int**)*(int**)p, 3);         // vptr of Base1
    ++p; cout << *p << endl;                // mBase1
    ++p; visitVtbl((int**)*(int**)p, 3);    // vptr of the virtual base Base
    ++p; cout << *p << endl;                // mBase
    return 0;
}
The running result is as follows:
Based on the test output, the memory layout of a Base1 object can be obtained as follows:
Comparing this with ordinary single inheritance shows that the memory layout under single virtual inheritance is clearly different, in the following respects:
1) Order of members. Under ordinary single inheritance, the members of the base class come before the members of the derived class. Under single virtual inheritance, the members of ordinary (non-virtual) base classes come first, then the members of the derived class, and finally the members of the virtual base class.
2) Number of vptrs. Under ordinary single inheritance, a derived class has one virtual function table, so its objects have one vptr. Under single virtual inheritance, the derived class has n additional virtual function tables (n being the number of virtual base classes), that is, n + 1 virtual function tables in total.
3) The virtual function table that belongs to the derived class itself does not contain the virtual functions of the virtual base class, but a virtual function overridden by the derived class is updated in all of the virtual function tables. In this example, the first virtual function table does not contain the address of Base::funcX.
Note: in the test code, count is passed as 3, yet the output shows only two functions being called. In this case it was a NULL entry in the table, not count, that ended the traversal.
A class that contains one or more virtual base class sub-objects, such as Base1, is divided into two regions: an invariant region and a shared region. Data in the invariant region always sits at a fixed offset from the start of the object, no matter what classes are later derived from it, so it can be accessed directly. The shared region holds the virtual base class sub-object.
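The cost and placement of the virtual base can be probed with sizeof and a pointer conversion. This is only a sketch under stated assumptions: PlainDerived, VirtualDerived and the two helper functions are hypothetical names, and the results depend on the compiler's ABI; the comments describe what mainstream compilers such as g++ are expected to produce, not something the standard guarantees.

```cpp
#include <cstddef>

struct Base {
    virtual void funcA() {}
    int mBase;
};

struct PlainDerived : public Base {            // ordinary inheritance
    virtual void funcB() {}
    int mDerived;
};

struct VirtualDerived : virtual public Base {  // virtual inheritance
    virtual void funcB() {}
    int mDerived;
};

// The extra vptr needed for the virtual base is expected to make
// the object larger than with ordinary inheritance.
inline bool virtualInheritanceCostsSpace()
{
    return sizeof(VirtualDerived) > sizeof(PlainDerived);
}

// Distance in bytes from the start of the object to its Base sub-object.
// With g++ the virtual base is placed after the derived class's own
// members, so the offset is expected to be greater than zero.
inline std::ptrdiff_t virtualBaseOffset()
{
    VirtualDerived v;
    const char *whole = reinterpret_cast<const char *>(&v);
    const char *base  = reinterpret_cast<const char *>(static_cast<Base *>(&v));
    return base - whole;
}
```

The nonzero offset corresponds to the shared region sitting at the end of the object, after the invariant region.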
6. Diamond virtual inheritance
The code is as follows (the implementation of the classes is the same as in the repeated inheritance example, except that Base1 and Base2 now inherit from Base virtually):
class Base { ...... };
class Base1 : virtual public Base { ...... };
class Base2 : virtual public Base { ...... };
class Derived : public Base1, public Base2 { ...... };
Use the following code to test the object memory layout:
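int main()
{
    Derived d;
    int *p = (int*)&d;

    visitVtbl((int**)*(int**)p, 3);         // vptr shared with Base1
    ++p; cout << *p << endl;                // mBase1
    ++p; visitVtbl((int**)*(int**)p, 2);    // vptr of the Base2 sub-object
    ++p; cout << *p << endl;                // mBase2
    ++p; cout << *p << endl;                // mDerived
    ++p; visitVtbl((int**)*(int**)p, 2);    // vptr of the virtual base Base
    ++p; cout << *p << endl;                // mBase
    return 0;
}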
The running result is as follows:
Based on the test output, the memory layout of the Derived-class object can be obtained as follows:
With virtual inheritance, an object of the derived class contains only one Base sub-object, which removes the ambiguity. Because of multiple inheritance and the virtual base class, the Derived class has three virtual function tables and its objects have three vptrs. As the layout shows, the first virtual function table is the primary instance, shared with the first base class (Base1); the second is a secondary instance, shared with the other base class (Base2); and the third is the virtual function table of the virtual base class.
The members of a Derived object are arranged in the same way as those of a Base1 object: first the members of the ordinary base classes, in declaration order, then the members of the Derived class, and finally the members of the virtual base class.
The virtual function tables that belong to the classes derived from the virtual base class do not contain the virtual functions of the virtual base class. A virtual function overridden by the derived class is updated in all of the virtual function tables.
In a Derived object, the Base (virtual base class) sub-object is the shared region, and the rest is the invariant region.
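That only one Base sub-object exists can be confirmed by reaching it along both inheritance paths. A minimal sketch, assuming simplified classes with a public mBase; the helper sharesOneBase is a hypothetical name.

```cpp
class Base {
public:
    Base() : mBase(11) {}
    virtual void funcX() {}
    int mBase;
};

class Base1 : virtual public Base {};
class Base2 : virtual public Base {};
class Derived : public Base1, public Base2 {};

// Reach the Base sub-object through Base1 and through Base2
// and check that both paths arrive at the same object.
inline bool sharesOneBase()
{
    Derived d;
    Base *viaBase1 = static_cast<Base1 *>(&d);  // Derived -> Base1 -> Base
    Base *viaBase2 = static_cast<Base2 *>(&d);  // Derived -> Base2 -> Base
    d.mBase = 1;                                // unambiguous: only one mBase exists
    return viaBase1 == viaBase2
        && viaBase1->mBase == 1
        && viaBase2->mBase == 1;
}
```

Unlike the repeated inheritance case, the plain name d.mBase compiles here, and a write through it is visible along both paths.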
7. A note on virtual destructors
The examples above do not define virtual destructors, so that the test programs run normally, but this does not mean that virtual destructors fall outside the scope of this article.
If the base class declares a virtual destructor, the destructor of a derived class replaces the destructor entries in all of the derived class's virtual function tables with the address of the derived class's destructor. When the base class destructor is virtual and the user does not provide a destructor for the derived class, the compiler synthesizes one. So if the base class declares a virtual destructor, the derived class always has a virtual destructor as well, and that destructor is the one recorded in the virtual function tables.
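The effect of the synthesized destructor can be seen in a short sketch. The names gLog, Member and deleteThroughBase are hypothetical and exist only for this illustration; Derived deliberately declares no destructor of its own.

```cpp
#include <string>

// Records the order in which destructors run (hypothetical helper).
static std::string gLog;

struct Member {
    ~Member() { gLog += "~Member "; }
};

class Base {
public:
    virtual ~Base() { gLog += "~Base"; }
};

class Derived : public Base {
    Member mMember;   // no user-declared destructor: the compiler synthesizes one
};

// Delete a Derived object through a Base pointer and return the log.
inline std::string deleteThroughBase()
{
    gLog.clear();
    Base *p = new Derived;
    delete p;          // dispatches to the synthesized Derived destructor
    return gLog;
}
```

The call returns "~Member ~Base": the delete through the base pointer reaches the synthesized Derived destructor, which destroys the member and then runs the Base destructor.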
8. Type information
In C++, the keyword typeid can be used to obtain the type information of an object, as in the following code: Base *p; ...... cout << typeid(*p).name() << endl;
Since p is a pointer, it may point to a Base object or to an object of a class derived from Base. How, then, is the type of the object that p points to determined?
The output of the examples in sections 2 to 6 shows that, however many virtual function tables a class has, the entry at index -1 (the type information, printed as an integer in those examples) has the same value in all of them. The type information of an object can therefore be obtained equally well through a pointer to one of its base classes or through a pointer to the derived class. Consider the following test code (the classes and their relationships are those of the diamond virtual inheritance example in section 6):
#include <typeinfo>   // needed for typeid

int main()
{
    Derived d;
    Base *basePtr = &d;
    Base1 *base1Ptr = &d;
    Base2 *base2Ptr = &d;
    Derived *derivedPtr = &d;

    cout << typeid(*basePtr).name() << endl;
    cout << typeid(*base1Ptr).name() << endl;
    cout << typeid(*base2Ptr).name() << endl;
    cout << typeid(*derivedPtr).name() << endl;
    return 0;
}
The output result is as follows:
The output shows that, for an object of a derived class, typeid reports the actual type of the object no matter which base class pointer refers to it.
Explanation of the result: to understand it, one needs to know what happens when the address of a derived class object is assigned to a base class pointer. When a base class pointer is made to point to a derived class object, the compiler inserts code that adjusts the pointer so that it points to the start of the corresponding base class sub-object inside the derived class object.
The pointer assignments in the test code therefore produce the following result:
basePtr points to the start of the Base sub-object in object d, that is, to Base::vptr
base1Ptr points to the start of the Base1 sub-object in object d, that is, to Base1::vptr
base2Ptr points to the start of the Base2 sub-object in object d, that is, to Base2::vptr
derivedPtr points to the start of object d, that is, to Base1::vptr
Each of these pointers thus points to a sub-object of the corresponding type, and each sub-object begins with a vptr. The type information can be obtained from entry -1 of the virtual function table that the vptr points to; that entry is the same in every table and identifies the object as a Derived object, so the type is reported correctly.
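The same behavior can be checked with comparisons instead of printed names, which avoids depending on the implementation-defined strings returned by name(). The sketch assumes simplified versions of the diamond classes; the two helper functions are hypothetical names. It also uses dynamic_cast<void *>, which returns the address of the most derived object and so undoes the pointer adjustment described above.

```cpp
#include <typeinfo>

class Base {
public:
    virtual void funcX() {}
    int mBase;
};
class Base1 : virtual public Base {
public:
    virtual void funcB() {}
    int mBase1;
};
class Base2 : virtual public Base {
public:
    virtual void funcC() {}
    int mBase2;
};
class Derived : public Base1, public Base2 {
public:
    int mDerived;
};

// typeid applied through any base pointer reports the dynamic type.
inline bool allPointersReportDerived()
{
    Derived d;
    Base  *basePtr  = &d;
    Base1 *base1Ptr = &d;
    Base2 *base2Ptr = &d;
    return typeid(*basePtr)  == typeid(Derived)
        && typeid(*base1Ptr) == typeid(Derived)
        && typeid(*base2Ptr) == typeid(Derived);
}

// dynamic_cast<void *> recovers the start of the complete object
// from a pointer to any polymorphic sub-object.
inline bool allPointersFindWholeObject()
{
    Derived d;
    Base  *basePtr  = &d;
    Base2 *base2Ptr = &d;
    return dynamic_cast<void *>(basePtr)  == static_cast<void *>(&d)
        && dynamic_cast<void *>(base2Ptr) == static_cast<void *>(&d);
}
```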
9. How virtual function calls work
In C++, calling a virtual function through a pointer or reference to an object produces polymorphic behavior. Take the following code fragment: Base *p; ... p->vfunc(); // vfunc is a virtual function declared in Base
Since the pointer p can point to a Base object or to an object of a class derived from Base, the compiler does not know at compile time what kind of object p really points to. How, then, is the function to call determined?
The memory layouts examined above show that, although an address in the virtual function table may be replaced (when the derived class overrides a virtual function of the base class) and new entries may be added (when the derived class declares new virtual functions), the index of a virtual function with a given signature never changes. So whether p points to a Base object or to an object of a class derived from Base, the index of vfunc in the virtual function table is the same (assumed here to be 1 in both cases).
With this knowledge of the memory layout of C++ objects, the question is easy to answer. At compile time, the compiler does not need to know what object p points to. It simply calls the function through the virtual function table of the Base sub-object of whatever object p points to. The compiler may turn the virtual call into something like the following pseudo code: (*p->vptr[1])(p); // assume that the index of vfunc in the virtual function table is 1; the argument p is the this pointer
If p points to a Base object, the function at index 1 of the Base virtual function table is called. If p points to an object of a class derived from Base, the function at index 1 of the virtual function table of that object's Base sub-object is called. This is how polymorphism is achieved. The call goes through the virtual function table of the object that p points to; since that object cannot be determined at compile time, neither can the function to call. It is determined dynamically, at run time, by the object that p points to. Virtual functions are therefore bound dynamically at run time rather than statically at compile time.
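The pseudo code can be turned into ordinary C++ by building the tables by hand. This is a sketch of the idea only, not what any particular compiler emits: the struct Object, the Slot type, the two tables and the helper functions are all hypothetical names, and index 1 for vfunc follows the assumption made in the pseudo code.

```cpp
#include <string>

struct Object;                                // forward declaration
typedef std::string (*Slot)(Object *self);    // every entry receives "this"

struct Object {
    const Slot *vptr;                         // hand-written vptr
};

std::string baseFunc0(Object *)    { return "Base::func0"; }
std::string baseVfunc(Object *)    { return "Base::vfunc"; }
std::string derivedVfunc(Object *) { return "Derived::vfunc"; }

// vfunc sits at index 1 in both tables; only the address stored there differs.
const Slot kBaseVtbl[]    = { baseFunc0, baseVfunc };
const Slot kDerivedVtbl[] = { baseFunc0, derivedVfunc };

// Equivalent of p->vfunc(): fetch entry 1 from whatever table p carries.
inline std::string callVfunc(Object *p)
{
    return (*p->vptr[1])(p);
}

// An object whose vptr refers to the "Base" table.
inline std::string demoBase()
{
    Object o = { kBaseVtbl };
    return callVfunc(&o);
}

// An object whose vptr refers to the "Derived" table.
inline std::string demoDerived()
{
    Object o = { kDerivedVtbl };
    return callVfunc(&o);
}
```

The body of callVfunc is identical for both objects; which function runs depends only on the table the object carries at run time, which is the essence of dynamic binding.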