C++ virtual function table and object layout

Source: Internet
Author: User

[Reprint]

Each class containing virtual functions has a virtual function table (vtbl). Each entry in the table holds the address of a virtual function, and the table is implemented as an array of function pointers.

The virtual function table reflects both inheritance and polymorphism. The vtbl of a derived class inherits the layout of the vtbl of its base class: if the base class vtbl contains an entry, the derived class vtbl contains the same entry at the same position, although the values of the two entries may differ. If the derived class overrides the corresponding virtual function, the entry in the derived class vtbl points to the overriding function. If it does not override it, the base class value is used.

In the memory layout of a class object, the vtbl pointer typically comes first, followed by the object's data. When a virtual function is called through an object pointer, the code generated by the compiler first fetches the vtbl pointer from the object and then calls through the corresponding entry in the vtbl. At compile time, the compiler cannot tell whether the pointer refers to a base class object or to an object of some derived class. At run time, however, when the call statement executes, the same compiled call code fetches the correct vtbl from the actual object and calls the correct virtual function, which is how polymorphism is achieved. The essence of the problem is that a call through an object pointer lacks information at compile time. The information is available at run time, but by then compilation, and with it ordinary binding, is over. How do we bridge the two? Record the information needed for binding in a common data structure that is associated with the object. At compile time the call is bound abstractly to that data structure; at run time it resolves to the real function. That data structure is the vtbl. In other words, the abstraction and polymorphism that users want require late binding, and the compiler implements late binding through the vtbl.

Next, multiple inheritance. If two base classes of a multiply-inherited class themselves derive from the same class, the derived class effectively inherits that class twice (assuming non-virtual inheritance), and its vtbl is inherited twice as well. In the object layout there are two copies of that class's data and two vtbl pointers, each pointing to one of the two inherited vtbls. However, when the derived class overrides a virtual function of that class, it can override it only once. Which of the two vtbls holds the address of the overriding function? From test programs I wrote, I believe it appears at the corresponding position in both inherited vtbls, though this needs further verification.

Speaking of the virtual function mechanism, the type conversion of object pointers should also be clarified, along with a question about the this pointer. A virtual function call must also pass the this pointer. That is not surprising, but it hides a problem: this must be consistent with the virtual function actually called, so in a sense the this pointer must be polymorphic too. With multiple inheritance this is not so simple. For details, see [The Design and Evolution of C++, p. 203].

------------------------------

Deep analysis of virtual function tables

 

I really enjoyed Mr. Peng's C++ lecture yesterday, but I came away with questions about the virtual function table mechanism, and I did not get a chance to discuss them with Mr. Peng in class.

My questions are as follows:
1: How does a virtual function table work? Is it per class or per object?
2: If it is per class, do the base class and the derived class share one table, or does each have its own (physically)?
3: If one table is shared, each class would overwrite the previous function addresses. Wouldn't that easily cause confusion?

With these three questions in mind, I searched for articles that disassemble the virtual function table and found several, all of them for the VC compiler.
Preliminary conclusions:
1: Virtual tables (virtual function tables) are per class.
2: The base class and the derived class each have their own table, that is, the tables are at separate physical addresses. The only link between the virtual tables of a base class and a derived class is this: when the derived class does not override a base-class virtual function, the corresponding entry in the derived class's table is simply filled with the address of the base-class function.
3: No entry in the virtual table of a class may be empty when the class is instantiated, which is why a class with pure virtual functions cannot be instantiated.
4: A class with a virtual table stores a pointer to the class's virtual table in the object during construction. Here I call it the VP.
5: For the VC and BC compilers only: if the class has a virtual table, the VP is stored at the very start of the object, so the address held by this is also the address at which the VP is stored.

Here I use the IDE Borland C++ Builder 6.0 SP4 with the Borland C++ 5.5 compiler to verify these conclusions:

First, open BCB6, create a console program, and write the following classes:
#include <conio.h>
#include <stdio.h>
#pragma hdrstop
#pragma argsused

class A
{
public:
    __stdcall A()
    {
    }
    virtual void __stdcall output()
    {
        printf("class A\n");
    }
    virtual void __stdcall output2()
    {
    }
};

class B : public A
{
public:
    void __stdcall output()
    {
        printf("class B\n");
    }
};

class C : public A
{
public:
    void __stdcall output()
    {
        printf("class C\n");
    }
};

The classes are simple: B and C are both derived from A.

Next, let's write a small main program to verify that the virtual table exists:
int main(int argc, char *argv[])
{
    B b;
    printf("%d", (int)sizeof(b));
}
The result is 8.

I removed the virtual keyword from both member functions of class A and ran it again.
The result is 4.

This shows that an object of the class with virtual functions is 32 bits larger than one without. On Win32, 32 bits is exactly the size of an address, so this address should be the one that points to the virtual table.

So the virtual table really does exist. When is the virtual table pointer set up? Let me change the main function.
int main(int argc, char *argv[])
{
    A *pA;
    B b;
    C c;
    A a;
    pA = &b;
    pA->output();
    getch();
    return 0;
}
This should be the classic textbook example of polymorphism: with virtual, the output is class B; without virtual, the output is class A.

Now let's look at the disassembly of this code. I turned on full debug mode in BCB6 and set a breakpoint at B b;.

We can see that after B executes the base class constructor, the following appears:
mov edx, 0x0040c114
mov [ebp-0x0x], edx
I verified that these two instructions do not appear when the virtual keyword is absent. Let's remember the address 0x0040c114 for now.
[ebp-0x0x] is the this pointer. Our current guess is that this code writes the address of the virtual table to the location that this points to.

Let's look at the disassembly for C c;.
mov eax, 0x0040c0f8
mov [ebp-0x14], eax
It seems that different classes have different virtual table addresses, that is, the tables of different classes are physically separate.

Let's discuss how the virtual table works.
Let's compare pA->output() with and without the virtual modifier.
mov eax, [ebp-0x04]
push eax
mov edx, [eax]
call dword ptr [edx]
This is the virtual case.

push dword ptr [ebp-0x04]
call A::output()
This is the non-virtual case.

We can analyze the asm code to see how the virtual table is used: first the virtual table address is read from the address that this points to, then the function pointer stored in the table entry is read, and the corresponding function is called through it. If there are several virtual functions, a call to the Nth virtual function changes the call instruction above to the form call dword ptr [edx-4*(N-1)].

These are, for now, speculations based on the disassembly; we will verify them further below.

Reading the disassembly carefully, we find that the disassembly for A a; does not seem to contain a step that initializes the VP. I checked other articles on the disassembly results of the VC compiler and found that the VC output does have a VP initialization step, similar to:
004010e8 mov dword ptr [eax], offset Derive::`vftable' (0042201c)

My conclusion for now is that in the BC compiler, the constructor of the base class has very likely been optimized so that the this pointer already refers to the virtual table address by default, which is why we do not see such disassembly.

I also found that VC and BC handle class constructors differently.
If we do not write a constructor in a class, VC automatically adds one for us. For example, for
class Base {
public:
    void __stdcall output() {
        printf("class Base\n");
    }
};
we get the following disassembly:
004010d9 pop ecx
004010da mov dword ptr [ebp-4], ecx
004010dd mov ecx, dword ptr [ebp-4]
004010e0 call @ILT+30(Base::Base) (00401023)
The address of the automatically generated constructor is visible.

In BC, however, we see no such code.
After deleting the constructor from class A above, this is the disassembly for A a;:
mov edx, 0x0040c0f0
mov [ebp-0x04], ecx
There is no trace of a constructor. I guess this, too, is an optimization the compiler applies to constructors.

I will not pass judgment on the two compilers here. Let's go back to the topic and verify our conclusions.
By our estimate, 0x0040c114 is the virtual table address.

If so, can we call output by reading the first function address stored at the virtual table address, given that the virtual table address is stored at the this address? I wrote another main function to find out.

int main(int argc, char *argv[])
{
    A *pA;
    B b;
    C c;
    A a;
    // pA = &b;
    // pA->output();
    // printf("%d", sizeof(b));

    typedef void (__stdcall *PF)(void);
    void *pthis = &b;
    PF pF = (PF)(*(unsigned int *)pthis);
    printf("%x", (unsigned int)pF);
    printf("\n");
    pF = (PF)(*(unsigned int *)pF);
    pF();

    getch();
    return 0;
}

Let's explain the code.
typedef void (__stdcall *PF)(void);
declares a function pointer type matching output.
void *pthis = &b;
obtains the this address of b, which is where the virtual table address is stored.
PF pF = (PF)(*(unsigned int *)pthis);
reads the content at the this address, that is, the virtual table address.
Then we print the virtual table address.
pF = (PF)(*(unsigned int *)pF);
reads the content of the first entry in the virtual table, that is, the address of output (the address of the table's first entry is the address of the table itself).
pF(); calls the function.

Let's look at the results.

Success!!!
Although we never wrote a call to output() in the code, the program prints what output() prints.
In addition, the printed virtual table address is 0x0040c114, exactly the virtual table address we guessed earlier!!!

Now I change the code. By our speculation, offsetting the table's first address by 32 bits should give the table's second entry, and the second entry should be the address of output2. Let's verify:

typedef void (__stdcall *PF)(void);
void *pthis = &b;
PF pF = (PF)(*(unsigned int *)pthis);
printf("%x", (unsigned int)pF);
printf("\n");
pF = (PF)(*(unsigned int *)((unsigned int)pF - 0x04));
pF();

As expected, the output is Class A output2.

At this point, the virtual table mechanism should be clear. Each class has its own virtual table, and every object of a class stores the address of that class's virtual table at its this address. If a class does not override a virtual function of its base class, the corresponding entry in its virtual table is filled with the content of the base class's entry. When a virtual function is called, the appropriate offset is applied to the virtual table address to obtain the address of the corresponding virtual function, which is then called.

Next, I will continue the analysis by modifying virtual table entries and hooking the virtual table.

 
