Seeing is believing (1): Implementation of the basic concepts of C++ in the compiler

Source: Internet
Author: User
Document directory
  • 1. Object space and virtual functions
  • 1.2 vptr and vtable
  • 2. Construction and destruction
  • 3. Different implementations
  • Appendix 1 Incremental linking and ILT
  • Appendix 2 C++ name mangling/demangling
  • Appendix 3 thunk
  • Appendix 4 Generating a map file in g++
Most programmers have heard a great deal about the C++ object model. This article uses a simple example to show how some basic C++ concepts are implemented in the compiler, in the spirit of seeing is believing.

The demo program for this article (http://www.fmddlmyy.cn/cpptest.zip) can be downloaded from my personal homepage. The package contains projects for vc6, vc7, BCB, Dev-C++, and MinGW. The printed output and assembly code below come mainly from the vc6 environment.

1. Object space and virtual functions

1.1 Object space

When we allocate a space for an object, for example:

CChild1 *pChild = new CChild1();

What is in this space?

When CChild1 has no virtual functions, the CChild1 object space stores the non-static members of its base class followed by its own non-static members. An object without any non-static members occupies one placeholder byte.

If CChild1 has virtual functions, the vc6 compiler adds a pointer at the beginning of the object space. This is the virtual function table pointer (vptr). Consider this code:

#include <stdio.h>

class CMember1 {
public:
    CMember1() { a = 0x5678; printf("construct CMember1\n"); }
    ~CMember1() { printf("destruct CMember1\n"); }
    int a;
};

class CParent1 {
public:
    CParent1() { parent_data = 0x1234; printf("construct CParent1\n"); }
    virtual ~CParent1() { printf("destruct CParent1\n"); }
    virtual void test() { printf("call CParent1::test()\n"); }
    void real() { printf("call CParent1::real()\n"); }
    int parent_data;
};

class CChild1 : public CParent1 {
public:
    CChild1() { printf("construct CChild1\n"); }
    virtual ~CChild1() { printf("destruct CChild1\n"); }
    virtual void test() { printf("call CChild1::test()\n"); }
    void real() { printf("call CChild1::real()\n"); }
    CMember1 member;
    static int b;   // declaration only; see section 1.3
};

How large is a CChild1 object? The demo program prints the following:

----> Derived class Object
Object address 0x00370fe0
Object size 12
Object content
00370fe0: 00410104 00001234 00005678
Vptr content
00410104: 004016a0 00401640 00401f70

A CChild1 object is 12 bytes: the vptr, the base class member variable parent_data, and the derived class member variable member. The vtable that the vptr points to is an array of virtual function addresses.
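The effect of the vptr on object size can be checked with sizeof. The following is a minimal sketch; the class names (PlainChild, VirtualChild, Empty) and the helper vptr_overhead are hypothetical and not part of the demo, and the exact sizes depend on the compiler and on pointer width.

```cpp
#include <cstddef>

// Hypothetical classes for illustration; they are not part of the demo.
struct PlainParent { int parent_data; };
struct PlainChild : PlainParent { int member; };      // no virtual functions

struct VirtualParent {
    virtual ~VirtualParent() {}                       // forces a vptr
    int parent_data;
};
struct VirtualChild : VirtualParent { int member; };

struct Empty {};                                      // no non-static members

// Extra bytes occupied by the polymorphic object compared with the plain
// one: room for the vptr plus any alignment padding the compiler adds.
std::size_t vptr_overhead() {
    return sizeof(VirtualChild) - sizeof(PlainChild);
}
```

With a 32-bit compiler such as vc6 one would expect sizeof(VirtualChild) to be 12, matching the demo output; on 64-bit targets the overhead is typically larger because the vptr and padding are wider.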

1.2 vptr and vtable

If we use the dumpbin tool that ships with VC to disassemble the debug build of the program:

dumpbin /disasm test_vc6.exe>a.txt

You can find the following in a.txt:

?test@CChild1@@UAEXXZ:
    00401640: 55 push ebp
...
??_ECChild1@@UAEPAXI@Z:
    004016A0: 55 push ebp

The first two addresses in the vtable point to the CChild1 destructor and the CChild1 member function test respectively. These are the two virtual functions of CChild1. If you print the contents of two CChild1 objects, you will find that their vptrs are identical: each class with virtual functions has one vtable, and the vptr of every object of that class points to it.

Do the function names here look a little strange? Appendix 2 briefly introduces C++ name mangling.

1.3 static member variables

In C++, a static member variable of a class is equivalent to a global variable with access control added; it does not occupy object space. Its address is fixed at compile and link time. For example, if we select "generate mapfile" in the Link settings of the project, the generated map file contains the following after a build:

0003:00002e18 ?b@CChild1@@2HA 00414e18 test1.obj

The demo's print output shows that the address of CChild1::b is 0x00414e18. In fact, the declaration of b inside the class definition is only a declaration. If we do not define the variable outside the class definition (at global scope), it does not exist at all.
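A short sketch of the same point: a static member adds nothing to sizeof and has a single address shared by all objects. The class Counter and the helper function are hypothetical names chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical class for illustration; it is not part of the demo.
struct Counter {
    int value;
    static int shared;      // declaration only, takes no object space
};

int Counter::shared = 7;    // the definition; without it the link step fails
                            // as soon as the variable is used

// Every object refers to the one and only Counter::shared.
bool same_static_for_all_objects() {
    Counter a, b;
    return &a.shared == &b.shared && &a.shared == &Counter::shared;
}
```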

1.4 call virtual functions

By setting breakpoints in the VC debugging environment and switching to the Assembly display mode, we can see the assembly code that calls the virtual function:

16: pChild->test();
(1) mov edx,dword ptr [pChild]
(2) mov eax,dword ptr [edx]
(3) mov esi,esp
(4) mov ecx,dword ptr [pChild]
(5) call dword ptr [eax+4]

Statement (1) loads the object address into register edx. Statement (2) loads the vptr stored at the object address into register eax. Statement (5) calls the address in the second entry of the vtable that the vptr points to, that is, the member function test.

Statement (4) loads the object address into register ecx; this is the implicit this pointer passed to non-static member functions. A non-static member function uses the this pointer to access non-static member variables.
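The instruction sequence above can be imitated in portable C++ with a hand-written table of function pointers. This is only a sketch of the idea, not what the compiler literally emits: VTable, Object, call_test and dispatch are hypothetical names, and the table has a single slot instead of the destructor and test entries of the real vtable.

```cpp
#include <string>

struct Object;                               // forward declaration

// Hand-written stand-in for a vtable: a table of function addresses.
struct VTable {
    std::string (*test)(Object* self);       // the single "virtual" slot
};

// Hand-written stand-in for an object whose first word is the vptr.
struct Object {
    const VTable* vptr;
    int parent_data;
};

std::string parent_test(Object*) { return "CParent1::test"; }
std::string child_test(Object*)  { return "CChild1::test"; }

const VTable parent_vtable = { parent_test };
const VTable child_vtable  = { child_test };

// Rough equivalent of `p->test()`: load the vptr, pick the slot,
// and pass the object address as the explicit `this` argument.
std::string call_test(Object* p) {
    return p->vptr->test(p);
}

// Builds an object whose vptr points at the chosen table, then dispatches.
std::string dispatch(bool as_child) {
    Object obj = { as_child ? &child_vtable : &parent_vtable, 0x1234 };
    return call_test(&obj);
}
```

The caller never names the target function; which one runs depends only on the table the object's vptr points to, which is the behavior statements (1), (2) and (5) implement.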

1.5 virtual and non-virtual functions

In the demo, we printed the member function address:

printf("CParent1::test address 0x%08p\n", &CParent1::test);
printf("CChild1::test address 0x%08p\n", &CChild1::test);
printf("CParent1::real address 0x%08p\n", &CParent1::real);
printf("CChild1::real address 0x%08p\n", &CChild1::real);

The following output is displayed:

CParent1::test address 0x004018f0
CChild1::test address 0x004018f0
CParent1::real address 0x00401460
CChild1::real address 0x00401670

The addresses of two non-virtual functions are easy to understand. They can be found in the output of dumpbin:

?real@CParent1@@QAEXXZ:
    00401460: 55 push ebp
...
?real@CChild1@@QAEXXZ:
    00401670: 55 push ebp

Why are the "addresses" of the two virtual functions the same? In fact, the address of the thunk code is printed here. By viewing the output of dumpbin, we can see that:

??_9@$B3AE:
(6) mov eax,dword ptr [ecx]
(7) jmp dword ptr [eax+4]

If the object address is placed in register ecx before jumping to this code, statement (6) loads the vptr stored at the object address into register eax, and statement (7) jumps to the address in the second entry of the vtable that the vptr points to, that is, the member function test. The virtual functions appear in the same order in the base class vtable and the derived class vtable, so the two classes can share one piece of thunk code.

This thunk code exists so that virtual functions can be called through function pointers. If we never take the address of the function, the compiler does not generate it. Do not confuse the thunk code in this section with the virtual function addresses in the vtable. The thunk code decides which function to call based on the object pointer passed in, whereas a virtual function address in the vtable is the real function address.

1.6 pointer to a virtual function

Let's try calling a virtual function through a pointer. A pointer to a non-static member function must be called through an object pointer:

typedef void (Parent::*PMem)();
printf("\n----> call through function pointer\n");
PMem pm = &Parent::test;
printf("function pointer 0x%08p\n", pm);
(pParent->*pm)();

The following output is displayed:

----> call through function pointer
function pointer 0x004018f0
call CChild1::test()

Again we copy the assembly code from the VC debugging environment:

13: (pParent->*pm)();
(8) mov esi,esp
(9) mov ecx,dword ptr [pParent]
(10) call dword ptr [pm]

Statement (9) loads the object pointer into register ecx. Statement (10) calls the thunk code that the function pointer points to, that is, statements (6) and (7) in section 1.5. What happens next has already been described above.

1.7 Implementation of Polymorphism

After the analysis above, the implementation of polymorphism should be obvious. When a virtual function is called through a base class pointer that points to a derived class object, the vptr of the derived class object points to the vtable of the derived class, so the derived class function is called.

A call to a virtual function through a function pointer also has to look up the virtual function address in the vtable, so it is polymorphic as well: the virtual function in the vtable of the current object is called.
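The difference between virtual and non-virtual functions when called through a member function pointer can be observed without looking at assembly. The sketch below uses hypothetical stand-ins (Parent, Child, call_through) rather than the demo classes, and returns strings instead of printing so the result is easy to check.

```cpp
#include <string>

// Hypothetical stand-ins for CParent1 and CChild1.
struct Parent {
    virtual ~Parent() {}
    virtual std::string test() { return "Parent::test"; }
    std::string real() { return "Parent::real"; }      // non-virtual
};

struct Child : Parent {
    virtual std::string test() { return "Child::test"; }
    std::string real() { return "Child::real"; }       // hides Parent::real
};

typedef std::string (Parent::*PMem)();

// Calls the member function that pm designates on a Child object,
// but through a pointer whose static type is Parent*.
std::string call_through(PMem pm) {
    Child child;
    Parent* p = &child;
    return (p->*pm)();
}
```

Passing &Parent::test yields "Child::test" because the call is resolved through the object's vtable, while passing &Parent::real yields "Parent::real" because a non-virtual function is bound at compile time.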

2. Construction and destruction

2.1 Constructor

The following statement:

printf("----> construct a derived class object\n");
CChild1 *pChild = new CChild1();

produces the following output:

----> construct a derived class object
construct CParent1
construct CMember1
construct CChild1

The compiler adds some code to the user-defined constructor: it first calls the base class constructor, then constructs each member object, and finally runs the constructor code written in the program (hereinafter called the user code). The following assembly code is the CChild1 constructor as modified by the compiler:

??0CChild1@@QAE@XZ:

004014D0    push ebp
            ...
(11)        call CParent1::CParent1 (004013b0)
            ...
(12)        call CMember1::CMember1 (00401550)
(13)        mov eax,dword ptr [this]
(14)        mov dword ptr [eax],offset CChild1::`vftable' (00410104)
(15)        push offset string "\xb9\xb9\xd4\xec CChild1\n" (004122a0)
            call printf (004022e0)
            ...
            ret

Statement (11) calls the base class constructor, statement (12) constructs the member object, and statement (15) begins the user code. Statements (13) and (14) also deserve attention: statement (13) loads the object address into register eax, and statement (14) stores the address of the CChild1 vtable at the object address (eax). Together they set up the vptr of the object.

If the object is created with the new operator, the compiler first calls operator new to allocate the object space and then calls the constructor above.
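Both the construction order and the moment at which the vptr is set can be observed from ordinary C++. In the sketch below, which uses hypothetical names (g_log, Member, Parent, Child, construct_child), each constructor records which override of name() it sees. Because each constructor installs its own class's vtable before running user code, the base class constructor still sees the base class version.

```cpp
#include <string>

std::string g_log;                       // hypothetical trace buffer

struct Member {
    Member() { g_log += "Member;"; }
};

struct Parent {
    // During this constructor the vptr still refers to Parent's vtable.
    Parent() { g_log += "Parent sees " + name() + ";"; }
    virtual ~Parent() {}
    virtual std::string name() { return "Parent"; }
};

struct Child : Parent {
    Member member;
    // By the time the user code runs, the vptr refers to Child's vtable.
    Child() { g_log += "Child sees " + name() + ";"; }
    virtual std::string name() { return "Child"; }
};

// Constructs one Child and returns the recorded trace.
std::string construct_child() {
    g_log.clear();
    Child c;
    return g_log;
}
```

The trace lists the base class first, then the member object, then the derived class user code, the same order as statements (11), (12) and (15).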

2.2 destructor

Deleting a base class pointer that points to a derived class object produces the following output:

----> delete a base class pointer that points to a derived class object
destruct CChild1
destruct CMember1
destruct CParent1

The compiler also adds some code to the user-defined destructor: the user code runs first, then each member object is destroyed, and finally the base class destructor is called. The following assembly code is the CChild1 destructor as modified by the compiler:

??1CChild1@@UAE@XZ:
00401590    push ebp
            ...
            push offset string "\xce\xf6\xb9\xb9 CChild1\n" (004122c0)
            call printf (004022e0)
            ...
(16)        call CMember1::~CMember1 (00401610)
            ...
(17)        call CParent1::~CParent1 (004013f0)
            ...
            ret

The code before statement (16) is the user code. Statement (16) calls the destructor of the member object, and statement (17) calls the base class destructor. Attentive readers will notice that the address of this destructor differs from the destructor address in the vtable shown earlier. In fact their names differ too; they are two separate functions:

??_ECChild1@@UAEPAXI@Z:
004016A0    push ebp
            ...
(18)        call CChild1::~CChild1 (00401590)
            ...
(19)        call operator delete (004023a0)
            ...
            ret 4

In the debugger (or after demangling with the dem tool), the second destructor is named CChild1::`scalar deleting destructor' and the first is named CChild1::~CChild1. CChild1::`scalar deleting destructor' calls the first destructor in statement (18) and calls operator delete in statement (19) to release the object space.

When an object is deleted through a pointer with delete, the object space must be released after destruction, so the compiler synthesizes the second destructor. A destructor call made through the vtable is always the result of deleting an object pointer, so it is the second destructor that is placed in the vtable. When an object on the stack is destroyed, only the first destructor needs to be called.

2.3 virtual destructor

Do not confuse destruction with virtual dispatch. Whether or not the destructor is virtual, the compiler synthesizes the destructor as described in section 2.2. The destructor is made virtual so that the derived class destructor is called when a derived class object is deleted through a base class pointer. If the destructor is not virtual, the call does not go through the vptr of the derived class object, and the compiler calls the base class destructor (decided at compile time).

In that case the user code in the derived class destructor is not executed, and the destructors of the derived class's member objects are not called. In the implementations examined here the object space of the derived class is still released correctly, because the heap manager knows how much space was allocated for the object. The C++ standard nevertheless leaves this case undefined, so code should not rely on it.
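The destruction order with a virtual destructor can be traced in the same way as the demo output. This sketch uses hypothetical names (g_trace, Member, Parent, Child, delete_through_base) and covers only the virtual case, since deleting a derived object through a base pointer with a non-virtual destructor is undefined behavior and cannot be demonstrated reliably.

```cpp
#include <string>

std::string g_trace;                     // hypothetical trace buffer

struct Member {
    ~Member() { g_trace += "~Member;"; }
};

struct Parent {
    virtual ~Parent() { g_trace += "~Parent;"; }
};

struct Child : Parent {
    Member member;
    virtual ~Child() { g_trace += "~Child;"; }
};

// Deletes a Child through a Parent* and returns the recorded order.
std::string delete_through_base() {
    g_trace.clear();
    Parent* p = new Child();
    delete p;                            // dispatched through the vtable
    return g_trace;
}
```

The trace shows the derived class user code first, then the member object, then the base class, matching statements (16) and (17) in section 2.2.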

3. Different implementations

The purpose of this article is to deepen the understanding of basic C++ concepts through a moderate look at the compiler's internal implementation. Our code should not depend on internal mechanisms that may change. In fact, different compilers implement the same mechanism in quite different ways. For example, there are several options for where to place the vptr:

  1. The VC compiler places the vptr at the head of the object.
  2. The BCB compiler places the vptr at the head of the first object in the inheritance hierarchy that has a vptr.
  3. The Dev-C++ compiler formerly placed the vptr at the end of the first object in the inheritance hierarchy that has a vptr.

The latest version of Dev-C++ (4.9.9.2) also places the vptr at the head of the object. Option 1 actually has a small problem: if the base class object has no vptr and the derived class object does, then when a base class pointer points to a derived class object the compiler has to adjust the pointer so that it points to the non-static members of the base class, which come after the vptr. If the derived class object is later deleted through the base class pointer, an error occurs because the address passed to delete differs from the address that was allocated. Readers can find code for studying this problem in the demo program (it was originally a question from a user on csdn). Placing the vptr at either of the other two positions avoids the problem, because the base class pointer does not need to be adjusted.
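The pointer adjustment described above can be measured directly. The following sketch uses hypothetical classes (PlainBase, VirtualDerived) and helpers; it is not the code from the demo. The offset it reports depends on the compiler: a nonzero value means the base class pointer was adjusted past the vptr, and zero means the base subobject sits at the start of the object.

```cpp
#include <cstddef>

// Base class without virtual functions, so it has no vptr of its own.
struct PlainBase {
    int base_data;
    PlainBase() : base_data(0x1234) {}
};

// The derived class introduces the first virtual function in the hierarchy.
struct VirtualDerived : PlainBase {
    int derived_data;
    VirtualDerived() : derived_data(0x5678) {}
    virtual ~VirtualDerived() {}
};

// Byte distance from the start of the derived object to its base subobject.
std::ptrdiff_t base_offset() {
    VirtualDerived d;
    PlainBase* b = &d;                   // the compiler may adjust the address
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(&d);
}

// Whatever the offset is, member access through the base pointer still works.
int read_through_base_pointer() {
    VirtualDerived d;
    PlainBase* b = &d;
    return b->base_data;
}
```

The sketch deliberately does not delete through the base pointer: PlainBase has no virtual destructor, so doing so would be exactly the failing case the paragraph above describes.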

The program generated by the g++ compiler (v3.4.2) prints the following for the virtual function addresses:

CParent1::test address 0x00000009
CChild1::test address 0x00000009

When the function is called through the function pointer, the compiler uses the number 9 to locate the virtual function test in the object's virtual function table. This is consistent with the Itanium C++ ABI convention that g++ follows, in which a pointer to a virtual member function holds 1 plus the byte offset of the function's entry in the vtable.

Appendix 1 incremental links and ILT

To keep the presentation simple, the "link incrementally" option is turned off in the vc6 project settings (Debug) of the demo program. If the option is enabled, the compiler calls functions indirectly through an array called ILT. Each element of the ILT array is a 5-byte jmp instruction, for example:

@ILT+170(?test@CChild2@@QAEXXZ):
    004010AF: E9 1C 10 00 00 jmp ?test@CChild2@@QAEXXZ

When the compiler calls a function, it emits:

call @ILT+170(?test@CChild2@@QAEXXZ)

The call then jumps to the actual address of the function through the ILT. That way, when a function address changes, only the ILT needs to be updated rather than every instruction that references the function. ILT is a name chosen by the compiler developers; according to cody2k3, it may be short for Incremental Linking Table.

Appendix 2 Name mangling/demangling of C ++

The C++ compiler converts the variable names and function names in a program into internal names. This process is called name mangling, and the reverse process is called name demangling. The internal name carries more information about the variable or function. For example, when the compiler sees ?g_var@@3HA, it knows that this is:

int g_var

"3H" indicates a global variable of type int. When the compiler sees ?test@CChild2@@QAEXXZ, it knows that this is:

public: void __thiscall CChild2::test(void)

Compiler vendors generally do not publish their mangling rules, because the rules may change as needed. Microsoft does, however, provide a demangling function, UnDecorateSymbolName. I used it to write a small tool called "dem" that recovers the declaration of a variable or function from its internal name. Readers can download the tool (http://www.fmddlmyy.cn/dem.zip) from my personal homepage.

For more information about "C ++ name mangling/demangling", see http://www.kegel.com/mangle.html.
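For readers using g++ rather than VC, demangling can be tried with abi::__cxa_demangle from <cxxabi.h>. Note that this is a different scheme from the Microsoft one discussed above: g++ and Clang use Itanium C++ ABI names such as _ZN7CChild24testEv, and the header is not available in VC. The wrapper name demangle is a hypothetical helper for this sketch.

```cpp
#include <cxxabi.h>   // GCC/Clang only: declares abi::__cxa_demangle
#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium ABI symbol name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // Passing null for the buffer makes the function allocate with malloc.
    char* text = abi::__cxa_demangle(mangled, 0, 0, &status);
    if (status != 0 || text == 0) {
        return mangled;
    }
    std::string result(text);
    std::free(text);                     // the caller owns the returned buffer
    return result;
}
```

Here _ZN7CChild24testEv is the Itanium ABI counterpart of ?test@CChild2@@QAEXXZ: a nested name made of the length-prefixed parts 7CChild2 and 4test, followed by v for an empty parameter list.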

Appendix 3 thunk

It is said that an Algol-60 programmer first used the word "thunk", and that it derives from "thought of (thunked)". Its main meaning is "address translation and substitution code"; it generally refers to a small piece of assembly code that transfers control to another function. The caller believes it is calling a function when it calls the thunk code, and the thunk code hands control to a function of its own choosing. For example, each element of the ILT array described in Appendix 1 is a small thunk.

Appendix 4 generate mapfile in G ++

When the linker ld is invoked indirectly through gcc/g++, every ld option must be prefixed with "-Wl,". So to make g++ generate a map file, add the option "-Wl,-Map,mapfile".
