After boot/head.s, the Linux kernel's startup assembly code, has performed basic initialization, it jumps to the init/main.c program. How does head.s hand execution control over to init/main.c? In other words, how does an assembly program call and execute a C program? We first describe the calling mechanism and control transfer method of C functions, and then describe how head.s jumps to the C program.
A function call involves two-way data transfer and a transfer of execution control between one piece of code and another. Data is transferred through function parameters and return values. In addition, storage for the function's local variables must be allocated on entry and reclaimed on exit. The Intel 80x86 CPU provides simple instructions for control transfer, while data transfer and the allocation and release of local variable storage are implemented with stack operations.
1. Stack frame structure and control transfer method
Most programs on this CPU use the stack to support function calls. The stack is used to pass function parameters, store return information, temporarily save register values for later restoration, and hold local data. The portion of the stack used by a single function call is called a stack frame. The general structure of a stack frame is shown in Figure 3-4. The two ends of a stack frame are marked by two pointers: register EBP is normally used as the frame pointer, and ESP as the stack pointer. While a function executes, ESP moves as data is pushed and popped, so the function accesses most of its data relative to the frame pointer EBP.
Figure 3-4 Frame structure in the stack
When function A calls function B, the parameters passed to B are contained in A's stack frame. When A calls B, the return address (the address of the instruction in A at which execution continues after the call returns) is pushed onto the stack; its position marks the end of A's stack frame. B's stack frame begins with the next stack location, that is, where the saved frame pointer (EBP) is shown in the figure. It is followed by any saved register values and the function's temporary values.
Function B also uses the stack for local variables that cannot be kept in registers. This happens, for example, when the CPU's limited set of registers cannot hold all of the function's local data, or when a local variable is an array or structure and must therefore be accessed through array or structure references. In addition, when the C address-of operator "&" is applied to a local variable, an address must be generated for it, so the variable must be allocated space in memory. Finally, B uses its stack frame to hold the parameters for any functions it calls in turn.
The stack grows toward lower addresses, and ESP points to the element currently at the top of the stack. The push and pop instructions push data onto the stack and pop data off it. Space for data without an initial value can be allocated simply by decreasing the stack pointer by the appropriate amount; likewise, increasing the stack pointer releases space previously allocated on the stack.
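The stack behaviour described above can be modelled in a few lines of C. This is only an illustrative sketch, not kernel code: the struct stack_model type and the functions stack_init, push32, pop32, reserve, and release are hypothetical names introduced here, and the 16-word stack size is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: a tiny 32-bit stack that grows toward lower addresses. */
struct stack_model {
    uint32_t mem[STACK_WORDS]; /* mem[i] models the 4-byte word at byte address 4*i */
    uint32_t esp;              /* byte address of the current top of the stack */
};

/* An empty stack: ESP sits just past the highest word. */
void stack_init(struct stack_model *s)
{
    s->esp = STACK_WORDS * 4;
}

/* Models pushl: decrease ESP by 4, then store the value at the new top. */
void push32(struct stack_model *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Models popl: read the value at the top, then increase ESP by 4. */
uint32_t pop32(struct stack_model *s)
{
    assert(s->esp <= (STACK_WORDS - 1) * 4);
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}

/* Models "subl $n,%esp": allocate n bytes without initializing them. */
void reserve(struct stack_model *s, uint32_t nbytes)
{
    assert(nbytes % 4 == 0 && s->esp >= nbytes);
    s->esp -= nbytes;
}

/* Models "addl $n,%esp": release n bytes. */
void release(struct stack_model *s, uint32_t nbytes)
{
    assert(nbytes % 4 == 0 && s->esp + nbytes <= STACK_WORDS * 4);
    s->esp += nbytes;
}
```

Pushing two values and then popping twice returns them in reverse order, and ESP ends up where it started.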
The call and ret instructions handle function calls and returns. call pushes the return address onto the stack and jumps to the start of the called function. The return address is the address of the instruction immediately following the call instruction, so when the called function returns, execution continues from that point. ret pops the address at the top of the stack and jumps to it. Before ret is executed, the stack must be adjusted so that the stack pointer points at the return address saved by the earlier call instruction. In addition, if the return value is an integer or a pointer, register EAX is used by default to pass it back.
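As a rough illustration of what call and ret do to ESP and the instruction pointer, the following C sketch models the two instructions. The struct cpu_model type, the functions do_call and do_ret, and the addresses used with them are hypothetical; the instruction length is passed in as a parameter rather than decoded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the CPU state that call and ret touch. */
struct cpu_model {
    uint32_t mem[16]; /* mem[i] models the 4-byte stack word at byte address 4*i */
    uint32_t esp;     /* stack pointer (byte address) */
    uint32_t eip;     /* address of the instruction being executed */
};

/* Models call: push the address of the instruction after the call, then jump.
   insn_len is the encoded length of the call instruction itself. */
void do_call(struct cpu_model *c, uint32_t target, uint32_t insn_len)
{
    assert(c->esp >= 4);
    uint32_t return_addr = c->eip + insn_len;
    c->esp -= 4;
    c->mem[c->esp / 4] = return_addr;
    c->eip = target;
}

/* Models ret: pop the word at the top of the stack into EIP. */
void do_ret(struct cpu_model *c)
{
    assert(c->esp <= 15 * 4);
    c->eip = c->mem[c->esp / 4];
    c->esp += 4;
}
```

If the callee leaves ESP exactly where it found it, do_ret resumes execution at the instruction that follows the call, which is why the stack must be balanced before ret runs.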
Although only one function executes at any given moment, we must still ensure that when one function (the caller) calls another (the callee), the callee does not modify or overwrite register contents that the caller will need later. For this reason a uniform register usage convention is adopted on Intel CPUs, and all functions must follow it. The convention specifies that the contents of registers EAX, EDX, and ECX are saved by the caller. When function B is called by function A, B may therefore use these registers without saving them first and without destroying any data that A needs. The contents of registers EBX, ESI, and EDI, on the other hand, must be preserved by the callee B. If the callee needs to use any of these registers, it must first save their contents on the stack and restore them before returning, because caller A (or some higher-level function) does not save them itself yet may still need their original values later. Registers EBP and ESP must also follow this second convention.
2. Function call example
As an example, consider how function calls are handled in the following C program, exch.c. The program exchanges the values of two variables and returns their difference.
    void swap(int *a, int *b)
    {
        int c;
        c = *a; *a = *b; *b = c;
    }

    int main()
    {
        int a, b;
        a = 16; b = 32;
        swap(&a, &b);
        return (a - b);
    }
The swap() function exchanges the values of two variables. The main program main() is itself a function (as described below); it returns the result after calling swap(). The stack frames of the two functions are shown in Figure 3-5. As the figure shows, swap() obtains its parameters from the stack frame of its caller, main(). The positions in the figure are relative to the frame pointer in register EBP, and the numbers to the left of the stack frame give the address offset from the frame pointer. In a debugger such as GDB, these values are displayed in two's complement form: for example, -4 is shown as 0xfffffffc and -12 as 0xfffffff4.
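The two's complement values quoted above can be checked with a short C sketch. The function names offset_as_hex and check_figure_offsets are hypothetical; the sketch relies only on the fact that converting a negative int32_t to uint32_t is defined modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a signed 32-bit frame offset as the unsigned 32-bit value
   that would be printed in hexadecimal. The conversion is modulo 2^32,
   which yields the two's complement bit pattern. */
uint32_t offset_as_hex(int32_t offset)
{
    return (uint32_t)offset;
}

/* Checks the two examples given in the text. */
void check_figure_offsets(void)
{
    assert(offset_as_hex(-4) == 0xfffffffcu);
    assert(offset_as_hex(-12) == 0xfffffff4u);
}
```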
The stack frame of the caller main() contains the storage for local variables a and b, at offsets -4 and -8 from the frame pointer. Because addresses must be generated for these two variables, they have to be stored on the stack rather than simply kept in registers.
Figure 3-5 Stack frame structure of the calling function main() and of swap()
Running the command "gcc -Wall -S -o exch.s exch.c" generates the assembly program exch.s for this C program, shown below (several lines of directives irrelevant to the discussion have been removed).
    .text
    _swap:
        pushl %ebp              # Save the original EBP value, then set the frame pointer of this function.
        movl %esp,%ebp
        subl $4,%esp            # Allocate stack space for local variable c.
        movl 8(%ebp),%eax       # Fetch the 1st parameter, a pointer to an integer value.
        movl (%eax),%ecx        # Fetch the value the pointer refers to and save it in local variable c.
        movl %ecx,-4(%ebp)
        movl 8(%ebp),%eax       # Fetch the 1st parameter again, then the 2nd parameter.
        movl 12(%ebp),%edx
        movl (%edx),%ecx        # Copy the value the 2nd parameter points to into the location the 1st points to.
        movl %ecx,(%eax)
        movl 12(%ebp),%eax      # Fetch the 2nd parameter again.
        movl -4(%ebp),%ecx      # Store the value of local variable c at the location this pointer refers to.
        movl %ecx,(%eax)
        leave                   # Restore the original EBP and ESP values (i.e. movl %ebp,%esp; popl %ebp).
        ret
    _main:
        pushl %ebp              # Save the original EBP value, then set the frame pointer of this function.
        movl %esp,%ebp
        subl $8,%esp            # Allocate stack space for the integer local variables a and b.
        movl $16,-4(%ebp)       # Assign initial values to the local variables (a = 16, b = 32).
        movl $32,-8(%ebp)
        leal -8(%ebp),%eax      # Prepare to call swap(): take the address of local variable b
        pushl %eax              # and push it as a call parameter, i.e. the 2nd parameter goes first.
        leal -4(%ebp),%eax      # Take the address of local variable a and push it as the 1st parameter.
        pushl %eax
        call _swap              # Call the function swap().
        movl -4(%ebp),%eax      # Take the value of the 1st local variable a and subtract the value of b.
        subl -8(%ebp),%eax
        leave                   # Restore the original EBP and ESP values (i.e. movl %ebp,%esp; popl %ebp).
        ret
Each of the two functions can be divided into three parts: "setup", which initializes the stack frame; "body", which performs the function's actual computation; and "finish", which restores the stack state and returns from the function. Line numbers below count the lines of the exch.s listing, with .text as line 1. For swap(), the setup code is lines 3 to 5. The first two of these lines save the caller's frame pointer and set the frame pointer of this function; line 5 allocates space for local variable c by moving the stack pointer ESP down 4 bytes. Lines 6 to 15 are the body of swap(). Lines 6 to 8 fetch the caller's 1st parameter &a, use it as an address to load the stored value into register ECX, and then save that value in the space allocated for the local variable (-4(%ebp)). Lines 9 to 12 fetch the 2nd parameter &b and copy the value at that address to the address given by the 1st parameter. Lines 13 to 15 store the value held in the temporary local variable c at the address given by the 2nd parameter. Lines 16 and 17 are the finish part of the function. The leave instruction prepares the stack for the return; it is equivalent to the following two instructions:
    movl %ebp,%esp          # Restore the original ESP value (pointing to the start of the stack frame).
    popl %ebp               # Restore the original EBP value (usually the caller's frame pointer).
This code restores ESP and EBP to the values they had on entry to swap(), after which the return instruction ret is executed.
Lines 19 to 21 are the setup part of main(). After saving the old frame pointer and setting the new one, main() allocates stack space for local variables a and b. Lines 22 and 23 assign values to the two variables. Lines 24 to 28 show how main() calls swap(). First, the leal (load effective address) instruction obtains the addresses of variables b and a, which are pushed onto the stack in turn; then swap() is called. The order in which the addresses are pushed is exactly the reverse of the parameter order in the function declaration: the last parameter is pushed first, and the 1st parameter is pushed last, immediately before the call instruction. Lines 29 and 30 subtract the two exchanged values and leave the result in register EAX as the return value.
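To make the frame offsets concrete, the sketch below replays the steps of _main and _swap on a simulated stack written in C. It is a simplified model under stated assumptions: the array mem, the variables esp and ebp, the helpers load, store, push, and pop, the functions sim_swap and sim_main, and the placeholder return address 0x1234 are all hypothetical, and addresses are plain byte offsets into the array rather than real memory addresses.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 32                /* size of the simulated stack, in 4-byte words */

static uint32_t mem[WORDS];     /* mem[i] models the word at byte address 4*i */
static uint32_t esp, ebp;       /* simulated ESP and EBP (byte addresses) */

static uint32_t load(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    return mem[addr / 4];
}

static void store(uint32_t addr, uint32_t value)
{
    assert(addr % 4 == 0 && addr / 4 < WORDS);
    mem[addr / 4] = value;
}

static void push(uint32_t value) { esp -= 4; store(esp, value); }
static uint32_t pop(void) { uint32_t v = load(esp); esp += 4; return v; }

/* Mirrors _swap: the parameters are found at 8(%ebp) and 12(%ebp). */
static void sim_swap(void)
{
    push(ebp);                                   /* pushl %ebp */
    ebp = esp;                                   /* movl %esp,%ebp */
    esp -= 4;                                    /* subl $4,%esp (room for c) */
    store(ebp - 4, load(load(ebp + 8)));         /* c = *a */
    store(load(ebp + 8), load(load(ebp + 12)));  /* *a = *b */
    store(load(ebp + 12), load(ebp - 4));        /* *b = c */
    esp = ebp;                                   /* leave: movl %ebp,%esp */
    ebp = pop();                                 /* leave: popl %ebp */
    (void)pop();                                 /* ret: drop the return address */
}

/* Mirrors _main and returns a - b. */
int sim_main(void)
{
    esp = WORDS * 4;             /* empty stack */
    ebp = 0;                     /* placeholder for the caller's frame pointer */
    push(ebp);                   /* pushl %ebp */
    ebp = esp;                   /* movl %esp,%ebp */
    esp -= 8;                    /* subl $8,%esp (room for a and b) */
    store(ebp - 4, 16);          /* a = 16 */
    store(ebp - 8, 32);          /* b = 32 */
    push(ebp - 8);               /* the 2nd parameter, &b, is pushed first */
    push(ebp - 4);               /* the 1st parameter, &a, is pushed last */
    push(0x1234);                /* call: placeholder return address */
    sim_swap();
    int result = (int)(load(ebp - 4) - load(ebp - 8));
    esp = ebp;                   /* leave */
    ebp = pop();
    return result;
}
```

Because &a is pushed last, it sits just above the return address and the saved EBP, which is why swap() finds its 1st parameter at offset 8 and its 2nd at offset 12.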
From this analysis we can see that C places copies of the argument values on the stack when it calls a function. In other words, C passes arguments by value, and the called function has no direct way to modify the caller's variables. To make such modifications possible, a pointer to the variable (that is, its address) must be passed to the function.
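The difference can be seen by comparing a swap that receives values with one that receives pointers. This is a small sketch; swap_by_value, swap_by_pointer, and the two diff_after_* helpers are hypothetical names that do not appear in exch.c.

```c
#include <assert.h>
#include <stddef.h>

/* Receives copies of the caller's values; swapping the copies has no
   effect on the caller's variables. */
void swap_by_value(int a, int b)
{
    int c = a;
    a = b;
    b = c;
    (void)a;
    (void)b;
}

/* Receives the variables' addresses, so it can modify the caller's storage. */
void swap_by_pointer(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    int c = *a;
    *a = *b;
    *b = c;
}

/* Returns a - b after calling the by-value variant: the values are unchanged. */
int diff_after_value_swap(void)
{
    int a = 16, b = 32;
    swap_by_value(a, b);
    return a - b;
}

/* Returns a - b after calling the pointer variant: the values are exchanged. */
int diff_after_pointer_swap(void)
{
    int a = 16, b = 32;
    swap_by_pointer(&a, &b);
    return a - b;
}
```

With a = 16 and b = 32, the by-value variant leaves the difference at -16, while the pointer variant yields 16, the same result main() returns in exch.c.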