Discussion on function calls and parameter transfer mechanisms in C Language (zz)

Source: Internet
Author: User
Functions: I believe many people know how important they are. A program file is usually made up of one or more functions. However, many people may not know some of the deeper details of how functions are called, so I wrote this article. First, I wrote it at the request of a good friend. Second, I hope some readers can learn about the function calling mechanism from it. Not everyone will be able to follow everything, though. To fully understand this article, I think you need three things:
1. A certain understanding of the C language, or at least a rough overall picture of it;
2. The ability to read the AT&T-syntax assembly used on Unix/Linux; AT&T syntax differs considerably from Intel syntax. Some readers may not meet this condition, but by reading this article you may still get a rough understanding of the function call mechanism;
3. Patience with such a long article, and faith that it will be of some help.
Okay, enough chatter. On to the topic.
I. Basic knowledge:
This part covers some basics, mainly about the stack. You should only read on once you understand the basics of the stack.
1. Concepts:
The so-called stack is a region of memory that a program uses to store some of its data. I once published an article on this forum about the difference between the heap and the stack. The term people often use loosely for both really refers to the stack, not the heap, and the same is true here. Note that this is also different from the stack taught as a data structure. Do not mix them up.
2. How the stack works:
How do we normally store data in memory? We start at a low address and store items one after another toward higher addresses, according to the number of bytes each one occupies. The stack is different. It works by pushing data onto the stack area and popping data off it. That is the summary. Specifically:
In Unix/Linux, the stack grows from high addresses toward low addresses. The important thing here is the stack pointer, ESP. What is the stack pointer? It always points to the top of the stack (which, in terms of address values, is the lowest address in use). Is the word "top" a bit confusing? Put it this way: if you push a 4-byte item, ESP moves down by 4 bytes, so ESP now holds a lower address. You can think of the stack as a cup whose bottom is at a high address and whose rim is at a low address. When you pour water in, the water line rises (toward lower addresses); when you pour water out, the water line falls. Pushing and popping work the same way. It doesn't matter if this hasn't sunk in yet. Just draw a picture and compare it carefully. I'm lazy, so I won't draw one.
3. Introduction to the push and pop instructions:
Push instruction: pushx source
Here 'x' can be 'w' (word) or 'l' (long word); source can be an immediate value, a register, or a memory location.
Pop instruction: popx dest
Similarly, 'x' can be 'w' or 'l', and dest can be a register or a memory location.
That covers most of the basics. Of course there are other basic things that I leave for you to look up. This part only covers what is closely related to this article.
II. How a function uses the stack:
This part is the theory of how function calls and parameter passing are implemented with the stack. It is very important: you can only analyze the examples after understanding it. It is divided into several parts:
1. Passing parameters through stack operations:
As mentioned above, the basic stack operations are push and pop, and parameter passing is implemented with them. ESP always points to the top of the stack. If an int is pushed, ESP moves down four bytes and still points to the top of the stack (note that this new top is at a lower address than before). If an int is popped, ESP moves up four bytes; it again points to the top of the stack, but the address is now four bytes higher. Therefore, if a function takes parameters, the caller has to put them on the stack before calling it. I will discuss this in detail later, so it doesn't matter if you don't understand it yet.
2. Typical assembly instructions for a function call:
A function uses a handful of assembly instructions to set up and tear down its stack frame. I list them below in their usual order:
# ASM code

function:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    #...
    movl %ebp, %esp
    popl %ebp
    ret

The following describes the meaning and purpose of these instructions.
pushl %ebp # Pushes register %ebp onto the stack. Why? Look at the next instruction:
movl %esp, %ebp
# The value of %esp is copied into %ebp. As mentioned above, %esp points to the top of the stack, so now %ebp points there too. Looking back at the first instruction, its job is to save whatever was in %ebp before we overwrite it. Why copy %esp into %ebp? Here comes the clever part. While the function runs, it may push more data onto the stack, and every push moves %esp. If %esp has drifted and we no longer know where the top of the stack was when the function started, unexpected problems will occur when the stack needs to be cleaned up. Stack cleanup? Leave that term alone for now; it is explained below. So the second instruction makes sure that one register keeps pointing at the place where the top of the stack was on entry. From now on %ebp stays fixed at that address, and %esp can move freely without fear of losing track of it.
subl $8, %esp
# Why subtract 8 from %esp for no apparent reason? It moves %esp down 8 bytes. What are those 8 bytes for? They are reserved for local (temporary) variables. Note that the amount reserved depends on the size of the function's local variables, so it is not necessarily 8 bytes; it may be 24 or 36 or even more. Having too many local variables is not a good thing, though.
movl %ebp, %esp # Copies %ebp back into %esp. Why? So that %esp points again to where the top of the stack was right after %ebp was saved, which discards the local variables and lets the stack be cleaned up when the function returns.
popl %ebp # Restores the caller's %ebp that the first instruction saved.
ret # The return instruction. It pops the return address (the IP value that the call instruction pushed onto the stack) and jumps to it. A detailed analysis follows below.
That is the basic theory of how a function does its work through the stack. It doesn't matter if you don't understand all of it yet; the next part works through examples to give you a deeper understanding.
III. Worked examples of function calls and parameter passing:
This is the practical part of the article, where we deepen our understanding through examples. I will list the C code first, then the corresponding assembly code, and analyze the assembly alongside the C code. I will try to cover various kinds of function calls and parameter types, which may seem tedious. Do you mind? If you are ready, let's get started:
1. Function prototype: void function(void);
// C code

void function(void)
{
    return;
}

int main(void)
{
    function();

    return 0;
}

Now the assembly code. Below is the code obtained with GCC under Linux (note: this is the output on my machine):

function:
    pushl %ebp
    movl %esp, %ebp
    popl %ebp
    ret

Let's look at it. Since function does nothing, it returns immediately. These instructions are basically the same as the code in part II, section 2, only simpler; refer to the earlier analysis.
Now the assembly code of main. It is a little more complicated:
main:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    andl $-16, %esp
    movl $0, %eax
    addl $15, %eax
    addl $15, %eax
    shrl $4, %eax
    sall $4, %eax
    subl %eax, %esp
    call function       # function call instruction
    movl $0, %eax
    leave
    ret

Look at the function call instruction: call function. There are a lot of instructions in front of it. What are they for? Let me analyze them:

    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp

I won't analyze these three; they are the same as in part II, section 2, so look back if you have forgotten. This also shows something: main is really an ordinary function like any other, only with a slightly higher status.
    andl $-16, %esp
This one may scare some people. andl is the bitwise AND instruction, and -16 is 0xfffffff0 in two's complement. Why AND %esp with -16? Do not underestimate this instruction. %esp points to the top of the stack, and this instruction forces the value of %esp down to a multiple of 16. Why a multiple of 16? Here is a piece of background: on Linux, GCC by default keeps the stack 16-byte aligned. Why align at all? Alignment lets the CPU access memory more efficiently; just remember that here.

    movl $0, %eax
    addl $15, %eax
    addl $15, %eax
    shrl $4, %eax
    sall $4, %eax

These may scare even more people. Why so many operations on %eax? Honestly, I don't think they are necessary. Look closely: %eax starts at 0; after the two addl instructions it is 30, which is 11110 in binary (0x1e); the shift right by 4 followed by the shift left by 4 then clears the lowest four bits, leaving 16. Note that this is only the code produced on my machine. Other machines may handle this differently, but the point of these instructions is to make sure the lowest four bits of %eax are 0. Now look at the next instruction:

    subl %eax, %esp

This subtracts %eax from %esp and stores the result in %esp. After this instruction %esp is still a multiple of 16, which is why %eax had to be a multiple of 16.

    call function
    movl $0, %eax

This is simple. Call function, then set %eax to 0 (the return value of main) so that main can finish. This is the simplest kind of function call. It involves no parameter passing, so it is very easy. Next we move on to parameter passing. Having analyzed this example, the ones that follow will be much easier.

2. Function prototype: int function(int i);

Now there is a parameter and a return value, which is more complicated. Here we need to track how the value of %esp changes, otherwise it is hard to analyze the problem clearly. If you want a visual aid, draw a diagram and follow my numbers on it. Here is a simple piece of C code:

// C code

int function(int i)
{
    return 2 * i;
}

int main(void)
{
    int j = function(10);
    return 0;
}

The code is kept this simple only to make the analysis easier. Once the principle is understood, more complicated code can be understood too. Let's start with main:

main:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    andl $-16, %esp
    movl $0, %eax
    addl $15, %eax
    addl $15, %eax
    shrl $4, %eax
    sall $4, %eax
    subl %eax, %esp     # up to here it is basically the same as the previous example, so we will not analyze it again
    movl $10, (%esp)
    call function
    movl %eax, -4(%ebp)
    movl $0, %eax
    leave
    ret

Look at the assembly code above; most of it needs no new analysis. One instruction is different: subl $24, %esp. main now has the local variable j and also needs room for the argument it passes to function, so 24 bytes are set aside on the stack. Now look at the following code:

    movl $10, (%esp)    #===> %esp = 800, (800) = 10
Here 800 is an address value we made up, and (800) means the contents of address 800. (%esp) means the contents of the address held in %esp.
We just assumed that %esp is 800, so the contents of address 800 are now 10. Next the call is executed. Note that call first pushes the address of the instruction that follows it, that is, the IP value of the next instruction, so at this point %esp = 796. Why push the address of the next instruction? If the IP value were not pushed, how would we find our way back to the call site after the function finishes? In other words, without the pushed IP value the program could not return to where it left off once the function is done, and execution would go wrong!
Here is the assembly code of function:

function:
    pushl %ebp
    movl %esp, %ebp
    movl 8(%ebp), %eax
    addl %eax, %eax
    popl %ebp
    ret

pushl %ebp: after this instruction %esp is reduced by 4, so %esp is 792. Next:

    movl %esp, %ebp     #===> %ebp = 792, %esp = 792, (792) = the caller's %ebp; (792) means the contents of address 792.

    movl 8(%ebp), %eax  #===> %eax = 10

Many people may not understand this one. What is 8(%ebp)? It means (%ebp + 8). Note that %ebp + 8 is an address, and the parentheses mean the contents stored at that address. So 8(%ebp) is the value at address 800, and the value at address 800 is exactly 10! This instruction copies 10 into %eax.

    addl %eax, %eax     #===> %eax = 20

This is equivalent to 2 * %eax, so %eax is now 20, which implements (2 * i) from the C code.

    popl %ebp           #===> restores %ebp; %esp is now 796

    ret                 #===> returns from the function by popping the saved IP value off the stack. After this instruction, %esp = 800

800! %esp was also 800 when we made the call! This is what "cleaning up the stack" means: the part of the stack used by the called function has been released.

That completes the analysis of function's assembly code. Now let's go back to main and look at the instruction after the call:

    movl %eax, -4(%ebp)

What is stored in %eax? From function's code we know it is the value of (2 * i), so the return value is passed back in %eax! It is then copied to -4(%ebp), that is, (%ebp - 4). What is -4(%ebp)? In the C code the return value is assigned to the variable j. Could -4(%ebp) be j? The answer is yes! Consider the value of %ebp first. From main's assembly code we can see that %ebp points to the base of main's stack frame. And remember that subl $24, %esp, mentioned above, reserved space for local variables? -4(%ebp) lies in that local variable area. It is the variable j.
3. Function prototype: int function(int i, int j);
Now there are two parameters, not one. How are the two handled? Again, let's look at the C program and the corresponding assembly code:
// C code

int function(int i, int j)
{
    return (i + j);
}

int main(void)
{
    function(1, 2);
}

Only the relevant part of main's assembly code is listed below, because the rest is the same as code already analyzed and won't be repeated.

main:
    #......
    movl $2, 4(%esp)
    movl $1, (%esp)
    call function

See? First 2 is placed on the stack, then 1. Compare the call in the C code: function(1, 2); 2 is on the right and 1 on the left. So when there are several parameters, they are placed on the stack from right to left (here with movl into space that is already reserved rather than with push instructions, but the resulting layout is the same). Once all parameters are on the stack, the function is called.

function:
    pushl %ebp
    movl %esp, %ebp
    movl 12(%ebp), %eax
    addl 8(%ebp), %eax
    popl %ebp
    ret

Look at function's assembly code: movl 12(%ebp), %eax. Do you know where the 12 comes from? Draw a picture and combine it with the earlier analysis! When the function is called, the IP value is pushed first and then %ebp is pushed, so %esp has dropped by 8 in total by the time it is copied into %ebp. The first argument i is therefore at 8(%ebp) and the second argument j at 12(%ebp). That's all!

4. Function prototype: char *function(char *s);

With a string parameter the idea is much the same, and I think it is even simpler. See the code:

// C code

char *function(char *s)
{
    return s;
}

int main(void)
{
    char *p = function("ABCD");
}
Here is the assembly code with analysis:
main:
    #......
    movl $.LC0, (%esp)
    call function

You may ask what $.LC0 is. Look at the following definition:
.LC0:
    .string "ABCD"
.LC0 is just a label marking where the string "ABCD" is stored, so this is really very simple: movl $.LC0, (%esp) stores the address of the string at the location %esp points to.

Now the assembly code of the called function:
function:
    #......
    movl 8(%ebp), %eax
    #.....
8(%ebp) is the pointer to the string, and it is copied into %eax as the return value. I recommend drawing a picture yourself. Once you draw it, everything becomes clear.

5. Function prototype: struct text function(int n);
Now the return type is a struct, and a difference appears. But the principle is still the same. I will only say a few words about structs here.
// C code
struct text {
    int a;
};

struct text function(int n)
{
    struct text s;
    s.a = n;
    return s;
}

int main(void)
{
    struct text t = function(10);
    return 0;
}
The examples here are very simple; our aim is to analyze how different kinds of function calls and parameter passing are implemented. Look at the assembly code:
main:
    #......
    leal -20(%ebp), %eax
    movl $10, 4(%esp)
    movl %eax, (%esp)
    call function

For this example we only analyze the most important instructions; you can try to analyze the rest. At the start of main, GCC on my machine reserved 40 bytes:
    subl $40, %esp
I was a little surprised; I don't know why it leaves so much space. But we won't worry about the number. After all, it is not the most important thing.
    leal -20(%ebp), %eax
This instruction is very important. It puts the address (%ebp - 20) into %eax. That address is where the struct variable lives in main (and therefore also the address of its only member), and it is passed to function at (%esp), ahead of the argument 10 at 4(%esp). The code below confirms it.
Here is the assembly code of function:
function:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl 8(%ebp), %eax
    movl 12(%ebp), %edx
    movl %edx, -4(%ebp)
    movl -4(%ebp), %edx
    movl %edx, (%eax)

subl $16, %esp reserves space for the local struct variable in function.
The instructions after it use the address passed in by the caller, 8(%ebp), to assign the struct member: n is loaded from 12(%ebp), stored in the local struct at -4(%ebp), and then copied to the address held in %eax. If you are interested, work through the details yourself; let me be lazy here.
IV. Other notes:
In fact, many cases have not been covered, for example a floating-point return value or floating-point parameters. Floating point was left out because it is complicated: supporting floating-point numbers and related functions (such as math functions) needs additional instructions and registers, which are provided by what is called the floating-point unit (FPU). Floating-point numbers and operations, with their own instructions and registers, would take an article of their own.
Recursion was not covered either. The principle is the same as before: the function simply calls itself repeatedly, pushing a new frame onto the stack each time, and the frames are then cleaned up one by one starting from the innermost call. There are also variadic functions. Based on the analysis above, they are actually quite simple to analyze: the arguments are still placed on the stack from right to left, and you only need to know how a variadic function gets at them internally. There is a neat trick here. If you are interested, look at the header file that defines the macros for variable arguments; they are extremely clever!
V. Postscript:
My head is swimming a bit by now; this took several hours to write. I had meant to say more about structs, variadic functions and so on, but I have run out of energy. Sorry. I also don't feel like proofreading the article.
Because it was written in a hurry, there are bound to be places in the article that are unclear or even wrong. I hope you will point them out. Finally, I hope this article helps some readers understand the function call and parameter passing mechanisms.
