Exploring the Implementation of C++ Virtual Functions in g++ (Dynamic Polymorphism): Anatomy of the Virtual Function Table

Source: Internet
Author: User

Exploring the implementation of C++ virtual functions in g++

This article records what I learned while tracking down a strange core dump problem. The company project background and the specific circumstances have been removed; only the general knowledge about how C++ virtual functions are implemented is recorded here.

Before we begin, forgive me for borrowing a picture to poke fun at C++:

"Invincible" C++

If you also write C++, be careful... At the very least, you should understand what g++ is actually generating when you write virtual functions.

Let's write an example.

To explore how C++ virtual functions are implemented, we first write a few classes for testing. The code is as follows:

C++

#include <iostream>
using namespace std;

class Base1 {
public:
    virtual void f() {
        cout << "Base1::f()" << endl;
    }
};

class Base2 {
public:
    virtual void g() {
        cout << "Base2::g()" << endl;
    }
};

class Derived : public Base1, public Base2 {
public:
    virtual void f() {
        cout << "Derived::f()" << endl;
    }
    virtual void g() {
        cout << "Derived::g()" << endl;
    }
    virtual void h() {
        cout << "Derived::h()" << endl;
    }
};

int main(int argc, char *argv[]) {
    Derived ins;
    Base1 &b1 = ins;   // reference through the first base class
    Base2 &b2 = ins;   // reference through the second base class
    Derived &d = ins;

    b1.f();
    b2.g();
    d.f();
    d.g();
    d.h();
}

The code uses multiple inheritance in order to better expose the essence of g++'s implementation. The inheritance relationship, sketched in UML:

Sample Code UML diagram

The output matches expectations: the virtual functions are overridden as intended. The program prints:

Derived::f()
Derived::g()
Derived::f()
Derived::g()
Derived::h()

Start analysis!

The main point of this article is to explain how the g++ compiler implements virtual function overriding and dynamic binding under the hood, so I assume you already understand the basic concepts of virtual functions, the virtual function table (vtbl) and the virtual function table pointer (vptr), and the roles they play in implementing inheritance. If you are unsure about these concepts, I suggest reviewing them before reading on; Chen Hao's "C++ Virtual Function Table Analysis" series is a good choice.

In this article I will try to answer three questions:

    1. How does g++ implement dynamic binding of virtual functions?

    2. When is the vtbl created? When is the vptr initialized?

    3. For a C++ program running on Linux, where in virtual memory do the vptr and vtbl reside?

Let's start with the first question:

How does g++ implement dynamic binding of virtual functions?

At first glance this question is simple: we all know it is done through the vptr and vtbl. So let's dig deeper and look at exactly how g++ uses the vptr and vtbl to do it.

The first step is to dump the class memory layout that g++ generates, using the -fdump-class-hierarchy option (GCC 8 and later spell this option -fdump-lang-class):

Vtable for Base1
Base1::_ZTV5Base1: 3u entries
0     (int (*)(...))0
4     (int (*)(...))(& _ZTI5Base1)
8     Base1::f

Class Base1
   size=4 align=4
   base size=4 base align=4
Base1 (0xb6acb438) 0 nearly-empty
    vptr=((& Base1::_ZTV5Base1) + 8u)

Vtable for Base2
Base2::_ZTV5Base2: 3u entries
0     (int (*)(...))0
4     (int (*)(...))(& _ZTI5Base2)
8     Base2::g

Class Base2
   size=4 align=4
   base size=4 base align=4
Base2 (0xb6acb474) 0 nearly-empty
    vptr=((& Base2::_ZTV5Base2) + 8u)

Vtable for Derived
Derived::_ZTV7Derived: 8u entries
0     (int (*)(...))0
4     (int (*)(...))(& _ZTI7Derived)
8     Derived::f
12    Derived::g
16    Derived::h
20    (int (*)(...))-0x000000004
24    (int (*)(...))(& _ZTI7Derived)
28    Derived::_ZThn4_N7Derived1gEv

Class Derived
   size=8 align=4
   base size=8 base align=4
Derived (0xb6b12780) 0
    vptr=((& Derived::_ZTV7Derived) + 8u)
  Base1 (0xb6acb4b0) 0 nearly-empty
      primary-for Derived (0xb6b12780)
  Base2 (0xb6acb4ec) 4 nearly-empty
      vptr=((& Derived::_ZTV7Derived) + 28u)

If you cannot make sense of this messy output, that's fine (all the better if you can). Drawn as a diagram, the output above becomes clear:

Vptr and VTBL

A few points are particularly worth noting:

    1. The test machine is a 32-bit machine, so every vptr takes 4 bytes, and each function pointer in a vtbl also takes 4 bytes.

    2. The primary vptr of each class is placed at the start of the class's memory (since I declared no member variables, this may not be obvious here).

    3. With multiple inheritance, the vptrs for the base classes are laid out in the object in inheritance order, and the subclass shares a vptr with its first base class.

    4. Virtual functions declared in the subclass, besides overriding the corresponding function pointers in each base class's part of the vtbl, are all added to the vtbl reached through the first base class's vptr (this is what sharing means).

With the memory layout in hand, let's look at how g++ performs dynamic binding on top of it.

For a pointer or reference to an object of any class, when the function being called is declared virtual in that class, g++ uses the vptr at the start address held by the pointer or reference to find the vtbl and fetch the function's address. A virtual function declared in a parent class and not overridden by the subclass is likewise addressed through the vptr of the corresponding parent class.

First, let's verify this by using objdump -S to get the assembly for b1.f():

Assembly (x86)

    b1.f();
 8048734:       8b 44 24 24             mov    0x24(%esp),%eax    # get the address of the Base1 object
 8048738:       8b 00                   mov    (%eax),%eax        # dereference the vptr at the start of the object to get the vtbl address
 804873a:       8b 10                   mov    (%eax),%edx        # dereference the vtbl to get the address of the first virtual function
 804873c:       8b 44 24 24             mov    0x24(%esp),%eax
 8048740:       89 04 24                mov    %eax,(%esp)
 8048743:       ff d2                   call   *%edx              # call the function

The process matches our analysis exactly. But the sharp-eyed reader may have noticed: what about b2? The vptr at the start of a Derived instance does not belong to Base2! The answer lies in what g++ does for the reference initialization Base2 &b2 = ins:

Assembly (x86)

    Derived ins;
 804870d:       8d 44 24 1c             lea    0x1c(%esp),%eax
 8048711:       89 04 24                mov    %eax,(%esp)
 8048714:       e8 c3 01 00 00          call   80488dc <_ZN7DerivedC1Ev>
    Base1 &b1 = ins;
 8048719:       8d 44 24 1c             lea    0x1c(%esp),%eax
 804871d:       89 44 24 24             mov    %eax,0x24(%esp)
    Base2 &b2 = ins;
 8048721:       8d 44 24 1c             lea    0x1c(%esp),%eax    # get the address of the ins instance
 8048725:       83 c0 04                add    $0x4,%eax          # add an offset to the pointer
 8048728:       89 44 24 28             mov    %eax,0x28(%esp)    # initialize the reference
    Derived &d = ins;
 804872c:       8d 44 24 1c             lea    0x1c(%esp),%eax
 8048730:       89 44 24 2c             mov    %eax,0x2c(%esp)

Although they all refer to the same instance, g++ assigns different addresses to the references depending on their type. b2, for example, receives the instance address plus an offset, which guarantees that it sees the correct vptr.

PS: This also shows that, under the hood, a C++ reference is really a pointer...

Now for the second question:

When is the vtbl created? When is the vptr initialized?

Now that we know how g++ uses the vptr and vtbl to work the magic of virtual functions, when are the vptr and vtbl created?

The vptr is the easier one to reason about. It clearly belongs to an instance, so the assignment to the vptr should happen in the class's constructor. For every class that has virtual functions, g++ implicitly adds code to the constructor that assigns the vptr; it runs after the base-class constructors return (with an empty constructor body, as here, it is the last thing the constructor does).

This too can be verified from the generated assembly:

Assembly (x86)

class Derived : public Base1, public Base2 {
 80488dc:       55                      push   %ebp
 80488dd:       89 e5                   mov    %esp,%ebp
 80488df:       83 ec 18                sub    $0x18,%esp
 80488e2:       8b 45 08                mov    0x8(%ebp),%eax
 80488e5:       89 04 24                mov    %eax,(%esp)
 80488e8:       e8 d3 ff ff ff          call   80488c0 <_ZN5Base1C1Ev>
 80488ed:       8b 45 08                mov    0x8(%ebp),%eax
 80488f0:       83 c0 04                add    $0x4,%eax
 80488f3:       89 04 24                mov    %eax,(%esp)
 80488f6:       e8 d3 ff ff ff          call   80488ce <_ZN5Base2C1Ev>
 80488fb:       8b 45 08                mov    0x8(%ebp),%eax
 80488fe:       c7 00 48 8a 04 08       movl   $0x8048a48,(%eax)
 8048904:       8b 45 08                mov    0x8(%ebp),%eax
 8048907:       c7 40 04 5c 8a 04 08    movl   $0x8048a5c,0x4(%eax)
 804890e:       c9                      leave
 804890f:       c3                      ret

As the code shows, the constructor of the Derived class assigns initial values to the instance's two vptrs, and both initial values are immediates. Immediates! This means the vtbl is not generated at run time: its contents were already fixed at compile time and stored at these two addresses.

Unsurprisingly, these addresses belong to .rodata (the read-only data section). Use objdump -s -j .rodata to dump the corresponding memory:

 80489e0 03000000 01000200 00000000 42617365  ............Base
 80489f0 313a3a66 28290042 61736532 3a3a6728  1::f().Base2::g(
 8048a00 29004465 72697665 643a3a66 28290044  ).Derived::f().D
 8048a10 65726976 65643a3a 67282900 44657269  erived::g().Deri
 8048a20 7665643a 3a682829 00000000 00000000  ved::h()........
 8048a30 00000000 00000000 00000000 00000000  ................
 8048a40 00000000 a08a0408 34880408 68880408  ........4...h...
 8048a50 94880408 fcffffff a08a0408 60880408  ............`...
 8048a60 00000000 c88a0408 08880408 00000000  ................
 8048a70 00000000 d88a0408 dc870408 37446572  ............7Der
 8048a80 69766564 00000000 00000000 00000000  ived............
 8048a90 00000000 00000000 00000000 00000000  ................
 8048aa0 889f0408 7c8a0408 00000000 02000000  ....|...........
 8048ab0 d88a0408 02000000 c88a0408 02040000  ................
 8048ac0 35426173 65320000 a89e0408 c08a0408  5Base2..........
 8048ad0 35426173 65310000 a89e0408 d08a0408  5Base1..........

Because the machine running the program is little-endian, a simple conversion shows that the first word the first vptr points to (the word at 0x8048a48) is 0x08048834. Interpreting that value as a function address and looking it up in the disassembly gives:

Assembly (x86)

08048834 <_ZN7Derived1fEv>:
};

class Derived : public Base1, public Base2 {
public:
    virtual void f() {
 8048834:       55                      push   %ebp
 8048835:       89 e5                   mov    %esp,%ebp
 8048837:       83 ec 18                sub    $0x18,%esp

Bingo! g++ determines the contents of each class's vtbl at compile time and adds code to the constructor that points the vptr at the address of the already-filled vtbl.

This also answers the third question for us:

For a C++ program running on Linux, where in virtual memory do the vptr and vtbl reside?

Look directly at the picture:

The location of virtual functions in virtual memory

The gray parts of the diagram should be familiar, while the colored parts and their arrows describe the process of a virtual function call. (The diagram shows an instance created on the heap with new, which differs from the sample code; a small slip, please don't mind.) When a virtual function is called, the pointer on the stack is first used to find the instance on the heap; the vptr at the start of the instance is then used to locate the vtbl in the .rodata section; the offset selects the address of the function to call; and finally execution jumps to that address in the code segment to run the target function.

Summary

I looked into these questions because our company's code showed some very odd behavior, which was eventually traced to a problem in the virtual function table. That gave me the chance to explore the implementation of virtual functions thoroughly.

You might think: even without understanding these underlying principles, I can use virtual functions just fine and still write good object-oriented code, can't I?

There is nothing wrong with that. But C++, as the most complex programming language in the universe, offers extremely powerful features, much like the dragon-slaying saber of martial arts novels. A novice with poor kung fu who swings that saber around at random can easily injure himself. Only by understanding the basic principles and mechanisms of C++ can we wield this saber with ease, produce more dazzling moves, and become true martial arts masters.
