Analysis of virtual function calls in C++ and the internal layout of objects (using assembly code to understand how C++ virtual functions are implemented)

Source: Internet
Author: User

In my article "This pointer in C++", I examined how the this pointer is implemented by studying the assembly code the compiler generates from C++ source. This time I use the same approach to show how virtual function calls are implemented in C++ and, along the way, to explain the internal layout of objects. All the assembly code below was generated by Visual C++ 2005 (vc2005). Different compilers may generate different code and lay out objects differently, because the C++ standard does not prescribe a particular mechanism. In practice, however, mainstream compilers use a very similar virtual table scheme, so the generated code and the object layout are broadly comparable.
First, here are two classes with a simple inheritance relationship:

class cbase
{
public:
    virtual void vfun1() = 0;
    virtual void vfun2() = 0;
    void fun1();
};

// Only the generated assembly code matters here, so the function body is empty.
void cbase::fun1()
{
}

class cderived : public cbase
{
public:
    virtual void vfun1();
    virtual void vfun2();
    void fun2();
private:
    int m_ivalue1;
    int m_ivalue2;
};

// Only the generated assembly code matters here, so the function body is empty.
void cderived::vfun1()
{
}

// Only the generated assembly code matters here, so the function body is empty.
void cderived::vfun2()
{
}

// Used to analyze the internal layout of the object, so it only assigns values to the member variables.
void cderived::fun2()
{
    m_ivalue1 = 13;
    m_ivalue2 = 13;
}

The following code calls the member functions:

cderived derived;

// Call virtual functions through the object
derived.vfun1();
derived.vfun2();
// Call non-virtual functions through the object
derived.fun1();
derived.fun2();

// Call virtual functions through a base-class pointer to the derived object, which gives polymorphism
cbase* Ptest = &derived;
Ptest->vfun1();
Ptest->vfun2();

The following is the assembly code that vc2005 generates for the code above:

cderived derived;
0041195e lea ecx, [derived]
00411961 call cderived::cderived (411177h)

// Code segment 1
derived.vfun1();
00411966 lea ecx, [derived]
00411969 call cderived::vfun1 (411078h)
derived.vfun2();
0041196e lea ecx, [derived]
00411971 call cderived::vfun2 (4111b8h)
derived.fun1();
00411976 lea ecx, [derived]
00411979 call cbase::fun1 (411249h)
derived.fun2();
0041197e lea ecx, [derived]
00411981 call cderived::fun2 (4111bdh)

// Code segment 2
cbase* Ptest = &derived;
00411986 lea eax, [derived]
00411989 mov dword ptr [Ptest], eax
Ptest->vfun1();
0041198c mov eax, dword ptr [Ptest] // Row 1
0041198f mov edx, dword ptr [eax] // Row 2
00411991 mov esi, esp
00411993 mov ecx, dword ptr [Ptest]
00411996 mov eax, dword ptr [edx] // Row 3
00411998 call eax // Row 4
0041199a cmp esi, esp
0041199c call @ILT+495(__RTC_CheckEsp) (4111f4h)
Ptest->vfun2();
004119a1 mov eax, dword ptr [Ptest]
004119a4 mov edx, dword ptr [eax]
004119a6 mov esi, esp
004119a8 mov ecx, dword ptr [Ptest]
004119ab mov eax, dword ptr [edx + 4] // Row 5
004119ae call eax
004119b0 cmp esi, esp
004119b2 call @ILT+495(__RTC_CheckEsp) (4111f4h)

Code segment 1 shows that when a member function is called through an object, the generated code is the same for virtual and non-virtual functions: the address of the object is loaded into ecx as the this pointer, and the function is called directly at a fixed address. (For an analysis of the assembly code that calls a member function, see my article "This pointer in C++".) In other words, a call made through an object is bound at compile time and is not polymorphic.
The rest of this section analyzes code segment 2, which implements polymorphism.
Row 1: Load the value of the pointer Ptest, that is, the address of the object, into the eax register.
Row 2: Load the first 4 bytes of the object (the size of a pointer on a 32-bit system) from the address in eax into the edx register.
Row 3: Load the 4 bytes at the address in edx into the eax register.
Row 4: Call the function whose address is in the eax register.
The two instructions between Row 2 and Row 3 save esp in esi for the stack check that follows the call, and load Ptest into ecx as the this pointer.
This still does not make it very clear how the virtual function vfun1() of the object derived ends up being called. Let's take a look at the figure below:

The figure shows the assumed layout of the object derived in memory. The pointer Ptest points to the object derived, and the first four bytes of the object hold a virtual table pointer, which points to the virtual function table.
With this figure in mind, the assembly code above becomes much clearer:
Row 1: Load the address of the object from Ptest into the eax register.
Row 2: Read the virtual table pointer from the first four bytes of the object into the edx register.
Row 3: Read the first entry of the virtual table, the address of vfun1(), into the eax register.
Row 4: Call the function whose address is in the eax register.
Row 5 confirms the assumption the figure makes about the virtual function table: the entry for the second virtual function vfun2() lies 4 bytes (the size of a pointer on a 32-bit system) after the entry for the first virtual function vfun1().
From this analysis we can summarize how a virtual function is called in C++: first, read the virtual table pointer from the object; then, use it to locate the virtual table; finally, read the function address at the function's offset in the table and call it.
Next, the non-virtual member function fun2() of class cderived is analyzed to confirm that the virtual table pointer shown in the figure really exists.

 

void cderived::fun2()
{
004118f0 push ebp
004118f1 mov ebp, esp
004118f3 sub esp, 0CCh
004118f9 push ebx
004118fa push esi
004118fb push edi
004118fc push ecx
004118fd lea edi, [ebp-0CCh]
00411903 mov ecx, 33h
00411908 mov eax, 0CCCCCCCCh
0041190d rep stos dword ptr es:[edi]
0041190f pop ecx
00411910 mov dword ptr [ebp-8], ecx
m_ivalue1 = 13;
00411913 mov eax, dword ptr [this] // Row 6
00411916 mov dword ptr [eax + 4], 0Dh // Row 7
m_ivalue2 = 13;
0041191d mov eax, dword ptr [this]
00411920 mov dword ptr [eax + 8], 0Dh
}
00411927 pop edi
00411928 pop esi
00411929 pop ebx
0041192a mov esp, ebp
0041192c pop ebp
0041192d ret

 

The above is the assembly code of the non-virtual member function fun2() of class cderived. Row 6 loads the this pointer into the eax register, and Row 7 stores the value 13 at the address this + 4 (for a detailed analysis, see "This pointer in C++"). That address holds the first member variable of class cderived. We know that the this pointer points to the start of the object, so why is the first member variable stored four bytes further on? The answer is that the first four bytes of the object are used to store the virtual table pointer.
The following is the C++ code of a class without virtual functions, taken from "This pointer in C++", together with its compiled assembly code:

 

class ctest
{
public:
    void setvalue();

private:
    int m_ivalue1;
    int m_ivalue2;
};

void ctest::setvalue()
{
    m_ivalue1 = 13;
    m_ivalue2 = 13;
}

void ctest::setvalue()
{
004117e0 push ebp
004117e1 mov ebp, esp
004117e3 sub esp, 0CCh
004117e9 push ebx
004117ea push esi
004117eb push edi
004117ec push ecx
004117ed lea edi, [ebp-0CCh]
004117f3 mov ecx, 33h
004117f8 mov eax, 0CCCCCCCCh
004117fd rep stos dword ptr es:[edi]
004117ff pop ecx
00411800 mov dword ptr [ebp-8], ecx
m_ivalue1 = 13;
00411803 mov eax, dword ptr [this] // Row 8
00411806 mov dword ptr [eax], 0Dh // Row 9
m_ivalue2 = 13;
0041180c mov eax, dword ptr [this]
0041180f mov dword ptr [eax + 4], 0Dh
}
00411816 pop edi
00411817 pop esi
00411818 pop ebx
00411819 mov esp, ebp
0041181b pop ebp
0041181c ret

 

Comparing Rows 8 and 9 with Rows 6 and 7 shows that in an object of class ctest the first member variable occupies the first four bytes, whereas in an object of class cderived the first member variable starts at offset 4 (the fifth byte), because the first four bytes hold the virtual table pointer. This confirms the object layout shown in the figure above.

PS:

This article is a sequel to "This pointer in C++". To close, I would like to explain why I analyze C++ in this way, which is also a reply to the comments readers left on "This pointer in C++".
Dch4890164 suggested that I read Inside the C++ Object Model, while hacker47 wrote: "Kong Yiji said: there are three ways to write the character hui, did you know?" The most direct was wengch, who simply asked me: "Is there any point in using assembly to analyze C++?" What I want to say is that I have read Inside the C++ Object Model. It is indeed a very good book on how C++ works underneath. However, since I seldom think about the underlying implementation when writing C++ code, the book did not leave a deep impression on me. I started using assembly code to analyze C++ quite by accident: "This pointer in C++" mentions that a member function can be called through a null pointer to a class. I found that my C++ knowledge could not explain that behavior, so while debugging the code I switched to the disassembly view to look for the answer. Later I wrote up my findings as "This pointer in C++". To be honest, this was my first real contact with assembly language on Windows, and the analysis in this article relies on reference material I looked up along the way. Some people may think this method is not worth mentioning, but it has taught me more about the underlying implementation of C++. If readers feel they have gained something from it, I will be satisfied. Haha ~~
