Function calls seen through assembly

Source: Internet
Author: User

In another article, I mentioned the need to uncover the truth about virtual function calls by reading the generated assembly. Now we set out on this difficult but interesting journey. Without further ado, let's get to the point. The C++ code used in this article:

#include "stdafx.h"
#include <iostream>

class cbase {
public:
    virtual void callme();
};

class cderived : public cbase {
public:
    virtual void callme();
};

void cbase::callme() {
    std::cout << "Hello, I'm cbase." << std::endl;
}

void cderived::callme() {
    std::cout << "Hello, I'm cderived." << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
    cderived dobj;
    cbase bobj = dobj;
    cbase* pbase = &dobj;
    pbase->callme();
    dobj.callme();
    ((cbase)dobj).callme();
    (*pbase).callme();
    (*(cderived*)pbase).callme();
    return 0;
}

I will analyze every C ++ statement one by one, and each c ++ statement will be followed by the corresponding assembly code for easy comparison.

Statement 1:

; 29 : cderived dobj;

    lea     ecx, DWORD PTR _dobj$[ebp]
    call    ??0cderived@@QAE@XZ

From the syntax point of view, this calls dobj's default (no-argument) constructor.
From the assembly point of view, two points deserve attention:
1. The function to be called is determined at compile time and is called directly by its address.
2. Member functions of a class are called using the "thiscall" calling convention, and the implicit this pointer is passed in the ecx register.
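
The second point can be pictured in ordinary C++. The sketch below is only an illustration, not what the compiler literally emits: the names counter, add, counter_add, and demo are hypothetical, and the register that carries the hidden argument is not visible at this level. It shows that a member function call behaves like a call to a plain function that receives the object's address as an extra first argument.

```cpp
#include <cassert>

// Hypothetical class used only for illustration.
struct counter {
    int value;
    // Member function: the object's address arrives implicitly as `this`.
    int add(int n) { value += n; return value; }
};

// Roughly the same function with the hidden parameter written out.
int counter_add(counter* self, int n) {
    self->value += n;
    return self->value;
}

void demo() {
    counter a{10};
    counter b{10};
    int r1 = a.add(5);             // address of a is passed implicitly
    int r2 = counter_add(&b, 5);   // address of b is passed explicitly
    assert(r1 == 15);
    assert(r2 == 15);
}
```

Calling demo() from main runs both forms on separate objects and checks that they produce the same result.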

Statement 2:

; 31 : cbase bobj = dobj;

    lea     eax, DWORD PTR _dobj$[ebp]
    push    eax
    lea     ecx, DWORD PTR _bobj$[ebp]
    call    ??0cbase@@QAE@ABV0@@Z

From the syntax point of view, a base class object is constructed from a derived class object, so the copy constructor of the base class is called.
From the assembly point of view, there is one more point to note: the constructor's explicit argument, the address of the object to be copied, is loaded into eax and then pushed onto the stack, while the this pointer of the object being constructed is still passed in ecx.
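
A consequence of constructing a base class object from a derived class object is that the copy is a pure base object. The following sketch illustrates this under stated assumptions: the classes base and derived and the function name() are hypothetical stand-ins for the article's classes, and they return strings instead of printing so the result can be checked.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the article's classes.
struct base {
    virtual ~base() = default;
    virtual std::string name() const { return "base"; }
};

struct derived : base {
    std::string name() const override { return "derived"; }
};

void demo() {
    derived d;
    base b = d;    // base's copy constructor runs; only the base part is copied
    base& r = d;   // a reference copies nothing and still refers to d
    assert(b.name() == "base");
    assert(r.name() == "derived");
}
```

The copy b answers as a base even though it was built from d, while the reference r still reaches the derived override.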

Statement 3:

; 36 : pbase->callme();

    mov     eax, DWORD PTR _pbase$[ebp]
    mov     edx, DWORD PTR [eax]
    mov     esi, esp
    mov     ecx, DWORD PTR _pbase$[ebp]
    call    DWORD PTR [edx]
    cmp     esi, esp
    call    __RTC_CheckEsp

From the syntax point of view, a virtual function is called through a base class pointer. Which function actually runs depends on the virtual function table pointer stored in the object that the base class pointer refers to. This is an example of using a base class pointer to call the virtual function of a derived class.
From the assembly point of view, all the secrets of virtual function calls are here:
1. The value of pbase (the address of the object) is loaded into eax, and the first DWORD of the object is then read into edx. What edx really holds is the "virtual function table pointer", that is, the address of the virtual function table.
2. The indirect call through [edx] reads the first element of the virtual function table, which is the address of the virtual function to be called, and calls it with the this pointer in ecx. In this example there is only one virtual function, so the first element in the table is the one we need. If the table held several virtual functions, calling, say, the third one would require adding an offset to the table's start address to find the corresponding entry.
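
The two steps above can be imitated by hand with a table of function pointers. This is a simplified model, not the layout any particular compiler guarantees: the names object, method, vptr, call_virtual, and the two tables are all hypothetical.

```cpp
#include <cassert>
#include <string>

struct object;                                  // forward declaration
using method = std::string (*)(object* self);   // type of one table entry

// Hypothetical object layout: the first field points to the class's table.
struct object {
    const method* vptr;
};

std::string base_callme(object*)    { return "cbase"; }
std::string derived_callme(object*) { return "cderived"; }

// One table per "class"; entry 0 plays the role of callme.
const method base_vtable[]    = { base_callme };
const method derived_vtable[] = { derived_callme };

// Read the table pointer from the object, read one entry, call it.
std::string call_virtual(object* self, int slot) {
    const method* table = self->vptr;   // step 1: first field of the object
    method target = table[slot];        // step 2: entry at an offset in the table
    return target(self);                // self plays the role of this
}

void demo() {
    object b{ base_vtable };
    object d{ derived_vtable };
    assert(call_virtual(&b, 0) == "cbase");
    assert(call_virtual(&d, 0) == "cderived");
}
```

The caller never names base_callme or derived_callme; which one runs depends only on the table the object points to, which is the essence of the dynamic call.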

Statement 4:

; 34 : dobj.callme();

    lea     ecx, DWORD PTR _dobj$[ebp]
    call    ?callme@cderived@@UAEXXZ        ; cderived::callme

From the syntax point of view, a member function is called through a class object, and this member function is declared virtual.
From the assembly point of view, the call is the same as in the first statement: the "thiscall" calling convention is used, the address of the called function is determined at compile time, and the function is called directly.

Statement 5:

; 35 : ((cbase)dobj).callme();

    lea     eax, DWORD PTR _dobj$[ebp]
    push    eax
    lea     ecx, DWORD PTR $T1758[ebp]
    call    ??0cbase@@QAE@ABV0@@Z
;-----------------------------------
    mov     DWORD PTR tv72[ebp], eax
    mov     ecx, DWORD PTR tv72[ebp]
    mov     edx, DWORD PTR [ecx]
    mov     esi, esp
    mov     ecx, DWORD PTR tv72[ebp]
    call    DWORD PTR [edx]
    cmp     esi, esp
    call    __RTC_CheckEsp

From the syntax point of view, dobj is converted to the base class and the virtual function is called.
From the assembly point of view, this is a little more complicated than we might guess. On closer analysis, there are two steps:
1. Use dobj to construct a temporary base class object.
2. Use the address of the temporary base class object to call the virtual function.
Therefore, the above C++ code can be rewritten in this form:

cbase tempbobj = dobj;
cbase* ptempbase = &tempbobj;
ptempbase->callme();
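
The hidden temporary can be made visible by counting copy-constructor calls. A minimal sketch, assuming hypothetical classes base and derived with a static counter named copies; it contrasts the value cast with a reference cast, which creates no temporary.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the article's classes.
struct base {
    static int copies;                  // counts copy-constructor calls
    base() = default;
    base(const base&) { ++copies; }
    virtual ~base() = default;
    virtual std::string name() const { return "base"; }
};
int base::copies = 0;

struct derived : base {
    std::string name() const override { return "derived"; }
};

void demo() {
    derived d;
    base::copies = 0;

    // Value cast: a temporary base is copy-constructed, then called.
    std::string viaValue = ((base)d).name();
    assert(base::copies == 1);
    assert(viaValue == "base");

    // Reference cast: no object is created, so d's override still runs.
    std::string viaRef = static_cast<base&>(d).name();
    assert(base::copies == 1);
    assert(viaRef == "derived");
}
```

The value cast costs one copy and loses the derived behavior; the reference cast costs nothing and keeps it.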

Statement 6:

; 37 : (*pbase).callme();

    mov     eax, DWORD PTR _pbase$[ebp]
    mov     edx, DWORD PTR [eax]
    mov     esi, esp
    mov     ecx, DWORD PTR _pbase$[ebp]
    call    DWORD PTR [edx]
    cmp     esi, esp
    call    __RTC_CheckEsp

From the syntax point of view, the virtual function is called through the object that pbase points to.
From the assembly point of view, this is identical to the standard virtual function call through a pointer.

Statement 7:

; 38 : (*(cderived*)pbase).callme();

    mov     eax, DWORD PTR _pbase$[ebp]
    mov     edx, DWORD PTR [eax]
    mov     esi, esp
    mov     ecx, DWORD PTR _pbase$[ebp]
    call    DWORD PTR [edx]
    cmp     esi, esp
    call    __RTC_CheckEsp

From the syntax point of view, pbase is converted to a pointer to the derived class, and the virtual function is called through the object it points to.
From the assembly point of view, this is again identical to the standard virtual function call through a pointer.
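
One way to see why a call through a converted pointer still has to go through the virtual table is to add a third level to the hierarchy. In this sketch the classes base, derived, and most are hypothetical; the pointer is converted to derived*, yet the object is really a most.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-level hierarchy.
struct base {
    virtual ~base() = default;
    virtual std::string name() const { return "base"; }
};

struct derived : base {
    std::string name() const override { return "derived"; }
};

struct most : derived {
    std::string name() const override { return "most"; }
};

void demo() {
    most m;
    base* p = &m;
    // The conversion changes only the pointer's static type.
    // The function is still chosen from the object's own table.
    assert((*(derived*)p).name() == "most");
    assert((*p).name() == "most");
}
```

Because the compiler cannot know at the call site that the pointed-to object is exactly a derived, it has to emit the same indirect call as for the unconverted pointer.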

Comparing the function calls above, we can conclude:
1. A function declared virtual is not always called through the virtual mechanism. A virtual function is essentially the same as an ordinary member function: it has a definite address and can be called directly.
2. For a virtual function call to be truly "virtual", there is essentially only one way: use the object's address to fetch the object's virtual function table pointer, use that to locate the virtual function table, and from the table indirectly obtain the address of the virtual function to call.

This study also left me with these impressions:
1. Big secrets are sometimes hidden behind seemingly simple syntax.
2. Behind seemingly complex syntax, the generated code is sometimes unbelievably simple and "clean".
3. The better we understand the underlying layer and assembly, the deeper our understanding of the syntax becomes and the better we can control it.

In all the function calls covered in this article, the caller and the called function are in the same module. In later articles I will discuss function calls across modules; the most common example is an application calling into a DLL. Consider this an advance notice.

Note:
1. In the VC environment, the assembly code of a program can be obtained as follows:
Open the project, right-click the project to be compiled in Solution Explorer, select "Properties" > "C/C++" > "Output Files" > "Assembler Output", and choose "Assembly With Source Code". Build the project, and a file ending in .asm appears in the project's output directory. This is the assembly code corresponding to the C/C++ source code.

2. When C++ is compiled, function names, variable names, and other symbol names are "decorated". This is called "name mangling". The result is that the symbol names become hard for us to recognize. For example, it is hard to tell which function ??0cderived@@QAE@XZ refers to. The VC toolset provides a small tool that can "undecorate" these symbol names. It is located at: VS installation directory/VC/bin/undname.exe. With this tool, we can undecorate a name with this command:

undname.exe ??0cderived@@QAE@XZ

The result is:

Undecoration of :- "??0cderived@@QAE@XZ"
is :- "public: __thiscall cderived::cderived(void)"

V1.1

Since this article was published, I have seen several questions about virtual function calls on the CSDN forum. When I tried to analyze those new cases using the two conclusions summarized above, I found that the conclusions still hold up rather well (to my satisfaction...). However, some content was missing, and it is added here.

Static call of virtual functions
Above, I spent most of the time analyzing how virtual functions are called dynamically, but I also pointed out clearly (the first of the two conclusions) that a virtual function is in essence no different from an ordinary function. It has its own address, which means that a virtual function can be called statically like any other function. A static call means that the function to be called is determined at compile time and an instruction that calls that function directly is generated. From the earlier analysis we know that a dynamic call of a virtual function must satisfy two conditions, "pointer (or reference) + virtual table", so any virtual function call that does not satisfy both conditions is a static call. In C++, there are two ways to call a virtual function statically:
1. Call the virtual function directly on a class object. This was already shown in the earlier example:

cderived dobj;
dobj.callme();

2. Call the virtual function through a pointer (or reference) to a class object, but instead of looking the function up through the virtual table, name the function to be called explicitly. At first this looks superfluous: if you do not need the function to be called dynamically, why declare it virtual at all? Wouldn't declaring it as an ordinary member function solve the problem? To answer that doubt, consider this question: how can the virtual function of a derived class call the corresponding virtual function of the base class? See the following code:

void cderived::callme() {
    // How to call cbase::callme() here?
    // this->callme(); // Error
    std::cout << "Hello, I'm cderived." << std::endl;
}

Here, "this->callme();" calls "cderived::callme()" and results in infinite recursion. Without some extra mechanism, we could not call the base class's virtual function from inside the derived class's virtual function. C++ anticipated this and provides a mechanism that solves the problem neatly: when calling a virtual function through a pointer, you can qualify the call with a class name to call that class's version of the virtual function. This lets the call skip the virtual table lookup and go straight to the intended function, which makes it a static call of a virtual function. The code can be written as follows:

void cderived::callme() {
    this->cbase::callme(); // or cbase::callme();
    std::cout << "Hello, I'm cderived." << std::endl;
}

The corresponding assembly code is:

// this->cbase::callme();
    mov     ecx, DWORD PTR _this$[ebp]
    call    ?callme@cbase@@UAEXXZ           ; cbase::callme
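
The effect of a class-qualified call can also be checked at the source level. The sketch below is an illustration under stated assumptions: base, derived, and the global trace vector are hypothetical, and trace records which function bodies run instead of printing.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> trace;   // hypothetical log of which bodies ran

struct base {
    virtual ~base() = default;
    virtual void callme() { trace.push_back("base"); }
};

struct derived : base {
    void callme() override {
        this->base::callme();     // qualified: bound at compile time
        trace.push_back("derived");
    }
};

void demo() {
    derived d;
    base* p = &d;

    trace.clear();
    p->callme();                  // dynamic call: derived runs, then chains to base
    assert((trace == std::vector<std::string>{"base", "derived"}));

    trace.clear();
    p->base::callme();            // static call through a pointer: only base runs
    assert((trace == std::vector<std::string>{"base"}));
}
```

The qualified form works both inside the derived class's override and from outside through a pointer; in both cases the derived override is bypassed.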

Pure virtual functions
We know that adding "= 0" after the declaration of a virtual function makes it a pure virtual function, and the class thereby becomes an "abstract base class":

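class cbase {
public:
    // VC accepts a body written directly after "= 0";
    // standard C++ requires the body to be defined outside the class.
    virtual void foobar() = 0 {
        std::cout << "Hello, I'm abstract virtual function." << std::endl;
    };
    virtual void callme();
};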

The question is: what does the compiler do for a pure virtual function, that is, when it sees "= 0" after a virtual function declaration? From the discussion on the forum and my own understanding and analysis, I have come to the following conclusions:
1. A pure virtual function can have an implementation, just like an ordinary function or an ordinary virtual function. Because C++ does not allow abstract base classes to be instantiated, we often assume (or are misled by some books and articles into believing) that a pure virtual function cannot have an implementation. That is incorrect.
2. The "= 0" in a pure virtual function declaration means only this: the entry for the pure virtual function in the virtual table does not hold the address of the pure virtual function itself, but conceptually 0 (null). In the VC environment, the compiler actually fills that entry with the address of a function it supplies. When the pure virtual function is called through the virtual table, that function is what gets called, and it reports a "pure virtual function call" error.
3. We know that a class derived from an abstract base class must implement the pure virtual function; otherwise the derived class is still an abstract class. If several derived classes share a common part, that part can be placed in the body of the pure virtual function, and the derived classes' implementations of the virtual function can call it:

class cderived : public cbase {
public:
    virtual void foobar() {
        cbase::foobar();
        printf("Hello, I'm abstract virtual function in cderived class.");
    }
    virtual void callme();
};

From this we can see that pure virtual functions are not mysterious; they are no different from ordinary virtual functions. The only difference is that the virtual table does not store the address of the pure virtual function itself but the address of another, special function.
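
For compilers other than VC, the same idea can be written in a portable form by defining the body of the pure virtual function outside the class. A minimal sketch; the classes shape and circle and the function describe() are hypothetical and stand in for the article's cbase, cderived, and foobar().

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical abstract base class.
struct shape {
    virtual ~shape() = default;
    virtual std::string describe() const = 0;   // pure virtual
};

// A pure virtual function may still have a body, defined outside the class.
std::string shape::describe() const { return "shape"; }

struct circle : shape {
    std::string describe() const override {
        // Static, class-qualified call to the shared part in the base.
        return shape::describe() + "/circle";
    }
};

void demo() {
    static_assert(std::is_abstract<shape>::value, "shape cannot be instantiated");
    static_assert(!std::is_abstract<circle>::value, "circle implements describe");
    circle c;
    const shape& s = c;
    assert(s.describe() == "shape/circle");
}
```

The base class stays abstract, yet its describe() body is reachable through the qualified call, which is the sharing technique described in point 3.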

History
10/15/2006 V1.0
First official version of the original article
01/05/2007 V1.1
Added sections on static calls of virtual functions and on pure virtual functions to the original text.
