Program Memory Layout: Key Points of the Function Call Stack and the Memory Layout of Function Calls



[Note] This article is a summary of the book "Programmer's Self-Cultivation: Linking, Loading and Libraries", mixed with some personal understanding. If anything here is wrong, corrections are welcome.


Program memory Layout


Modern applications run in a virtual memory space. On a 32-bit system this space provides 4 GB of addressable memory, and the application can address it directly with 32-bit addresses. The whole space is a single flat address space, so a 32-bit pointer can access any location in it.

Different address ranges of the process space serve different purposes. By default, Windows reserves the high 2 GB of the address space for the kernel, while Linux reserves the high 1 GB. The user-space layout is roughly as follows:


(1) Code area: holds the binary machine code that is loaded for execution; the processor fetches and executes instructions from this area.
(2) Data area: holds global variables and constants.
(3) Heap area: the process can dynamically request a block of memory of a certain size from the heap and return it when it is no longer needed. Dynamic allocation and release are what characterize the heap.
(4) Stack area: dynamically records the calling relationships between functions, so that when a called function returns, execution resumes in the caller.
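The following is a minimal sketch (variable names are mine, not from the original) that prints one address from each region; casting a function pointer to void * is a common but non-standard idiom, and exact values and ordering depend on the platform, though typically the code address is lowest and the stack address is highest.

#include <stdio.h>
#include <stdlib.h>

int g_initialized = 1;      /* data area: initialized global */
int g_uninitialized;        /* data area (.bss): uninitialized global */

int main(void)
{
    int local = 0;                           /* stack area: local variable */
    int *dynamic = malloc(sizeof *dynamic);  /* heap area: dynamically requested */

    printf("code : %p\n", (void *)main);
    printf("data : %p, %p\n", (void *)&g_initialized, (void *)&g_uninitialized);
    printf("heap : %p\n", (void *)dynamic);
    printf("stack: %p\n", (void *)&local);

    free(dynamic);                           /* return the block to the heap */
    return 0;
}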


A program written in a high-level language is compiled and linked into an executable file; once the executable is loaded and run, it becomes a process.
The machine code contained in the executable's code segment is loaded into the code area of memory (.text);
the processor fetches instructions and operands from this area one by one and sends them to the arithmetic and logic units for execution;
if the code requests dynamic memory, a suitable block is allocated in the heap area and handed back to the code in the code area for use;
when a function is called, information such as the calling relationship is recorded dynamically on the stack, so that the processor can return to the caller after the called function's code has finished executing.

If we think of the computer as an orderly factory, we get the following analogy:
* The CPU is the worker.
* The data area, heap area, and stack area store raw materials, semi-finished goods, finished products, and so on.
* The instructions stored in the code area tell the CPU what to do, how to do it, where to get the raw materials, which tools to use, and which warehouse to put the finished products in.
 


Stack

In a classic operating system, the stack always grows downward. The top of the stack is located by the esp register: a push decreases the stack-top address, and a pop increases it.
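A minimal sketch of this behavior (assuming a typical compiler and OS; exact addresses vary): each nested call places its locals at a lower address than the caller's, so the printed addresses normally decrease from main to inner.

#include <stdio.h>

void inner(void)
{
    int b;
    printf("inner's local: %p\n", (void *)&b);  /* deepest frame: lowest address */
}

void outer(void)
{
    int a;
    printf("outer's local: %p\n", (void *)&a);
    inner();
}

int main(void)
{
    int m;
    printf("main's  local: %p\n", (void *)&m);  /* shallowest frame: highest address */
    outer();
    return 0;
}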


What happens when a function is called?

For example:

void foo(int a, int b, int c);

int main(void)
{
    foo(1, 2, 3);
    return 0;
}

When main needs to call foo, the standard sequence of events is:

1. In main's stack frame, push foo's arguments onto the stack from right to left.
2. Push the address of the instruction following the current one in main (the return address) onto the stack. (This is done implicitly by the call instruction.)
3. Use the call instruction to jump into the body of foo.


Note that the above three steps happen in main's stack frame, whose frame base is held in ebp and whose top is held in esp.
Next, inside foo:

1. push ebp: push the current value of ebp onto the stack to save it.
2. mov ebp, esp: copy esp into ebp; this establishes foo's stack frame.
3. [Optional] sub esp, XXX: reserve XXX bytes of scratch space on the stack (moving the stack top to a lower address). The compiler chooses XXX based on the total size of the function's local variables.
4. [Optional] push XXX: save the values of some registers.

[Note: pushing the register values can happen either before or after the scratch space is allocated; "Programmer's Self-Cultivation" places it after the space for local variables has been reserved.]
(The compiler maps each local variable name to a location in this scratch space.)

When foo finishes, the reverse operations are performed:

1. Save the return value: by convention, the function's return value is placed in the eax register.
2. [Optional] pop the previously saved register values.
3. mov esp, ebp: restore esp, reclaiming the local-variable space (the original stack top).
4. pop ebp: load the value at the top of the stack into ebp, i.e. restore the frame base of main's stack frame (the original stack bottom).
5. ret: pop the previously saved return address off the stack and jump to it to continue execution.
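Putting the prologue and epilogue together, here is a hedged sketch: the C function below, compiled for 32-bit x86 without optimization, typically produces something along the lines of the assembly shown in the comment (the exact instructions vary by compiler and settings).

int foo(int a, int b, int c)
{
    int local = a + b + c;
    return local;
}

/* Typical unoptimized 32-bit x86 output (illustrative only):
 *
 *   push ebp              ; save the caller's frame base
 *   mov  ebp, esp         ; establish foo's own frame
 *   sub  esp, 4           ; reserve space for "local"
 *   mov  eax, [ebp + 8]   ; a  (1st argument)
 *   add  eax, [ebp + 12]  ; + b (2nd argument)
 *   add  eax, [ebp + 16]  ; + c (3rd argument)
 *   mov  [ebp - 4], eax   ; store into "local"
 *   mov  eax, [ebp - 4]   ; return value travels in eax
 *   mov  esp, ebp         ; release the local space
 *   pop  ebp              ; restore the caller's frame base
 *   ret                   ; pop the return address and jump back
 */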


main first pushes the arguments that foo needs onto the stack, and then ebp is updated as execution enters foo's stack frame.

Therefore, to access those arguments inside foo, you start from the current value of ebp and offset toward higher addresses, because the higher addresses belong to main's stack frame.

In other words, ebp + 8 holds foo's 1st argument, ebp + 12 holds its 2nd argument, and so on. What is stored at ebp + 4? The return address: the address of the instruction in main that should execute once foo returns.
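As a hedged illustration (not from the original article), GCC and Clang provide the builtins __builtin_frame_address and __builtin_return_address, which can be used to observe this layout. The sketch below assumes a 32-bit x86 build with frame pointers enabled (for example, -m32 -fno-omit-frame-pointer); none of this is guaranteed by the C standard.

#include <stdio.h>

void show_frame(void)
{
    void *frame = __builtin_frame_address(0);   /* value of ebp for this frame */
    void *ret   = __builtin_return_address(0);  /* return address pushed by the caller's call */

    /* With frame pointers, the return address sits just above the saved ebp,
       i.e. at [ebp + 4] on 32-bit x86, so the last two lines should match. */
    void *ret_on_stack = *(void **)((char *)frame + sizeof(void *));

    printf("ebp            = %p\n", frame);
    printf("return address = %p\n", ret);
    printf("[ebp + 4]      = %p\n", ret_on_stack);
}

int main(void)
{
    show_frame();
    return 0;
}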

[Note]
If the called function needs to preserve callee-saved registers such as esi and edi, the compiler may save them right after saving ebp, or delay this until after the local-variable space has been allocated. The stack frame does not prescribe a standard location for the area where the called function saves registers.

[Note: several related registers (for details, see Wang Shuang's assembly-language book)]
(1) esp: the extended stack pointer; it always points to the top of the stack frame at the top of the system stack.
(2) ebp: the extended base pointer; it always points to the bottom of the stack frame at the top of the system stack. (ebp stays fixed within the current stack frame, so the function accesses most of its data relative to ebp.)
(3) eip: the extended instruction pointer; it always points to the address of the next instruction to be executed. Control the contents of eip and you control the process: wherever eip points is where the CPU fetches its next instruction. eip is changed implicitly by instructions such as jmp, call, and ret (in fact it changes all the time); the ret instruction pops the return address saved at the top of the current stack into eip.

The size of a stack frame is not fixed; it mainly depends on the number and size of the function's local variables, and it can change while the function runs.


Call Convention

The caller and the callee of a function must follow the same conventions for the call to work correctly. Such conventions are called **calling conventions**.

* Order and method of passing function parameters
A calling convention specifies the order in which parameters are pushed onto the stack: left to right, or right to left. Some calling conventions also allow parameters to be passed in registers to improve performance.
* Stack maintenance method
(Who is responsible for popping the parameters?)
When the called function returns, the parameters that were pushed onto the stack must be popped so that the stack is consistent before and after the call. This can be done either by the caller or by the called function itself.
* Name-mangling rules
To distinguish calling conventions at link time, each convention mangles function names differently.


| Calling convention | Who pops the parameters | Parameter push order | Name mangling |
| --- | --- | --- | --- |
| cdecl | Caller | Right to left | Underscore + function name |
| stdcall | Callee | Right to left | Underscore + function name + @ + parameter bytes |
| pascal | Callee | Left to right | Complicated |
| fastcall | Callee | First two parameters in registers, the rest right to left | @ + function name + @ + parameter bytes |

__cdecl

__cdecl stands for "C declaration" and is the default calling convention of the C language: all parameters are pushed onto the stack from right to left, and the stack is cleaned up by the caller, which is why it is called manual stack cleanup. The called function does not force the caller to pass a fixed number of parameters; passing too many, too few, or differently typed arguments does not produce a compilation error. (printf is a typical example.)

__stdcall

__stdcall is short for "standard call" and is the calling convention used by the Win32 API. All parameters are pushed onto the stack from right to left, and the parameters are cleared from the stack by the called function when it returns, using the instruction retn X, where X is the number of bytes the parameters occupy; the CPU pops X bytes of stack space after the return. This is called automatic stack cleanup. The number of parameters must be known at compile time, and the caller must pass exactly that many arguments, no more and no fewer; otherwise an error occurs after the return.
Almost every Windows API function is declared __stdcall. Because different compilers may set up the stack differently, a caller might not be able to clean up the stack correctly; with __stdcall the called function does the cleanup itself, which avoids this problem. That is why __stdcall (often appearing as WINAPI) is used for calls that cross compiler or language boundaries.
However, for a variadic function such as printf(), the called function cannot know in advance how many bytes of arguments were pushed, so it cannot perform the cleanup afterwards. In that case only __cdecl can be used. The conclusion: if your function does not take a variable number of arguments, __stdcall is the better choice.
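To make the difference concrete, here is a small hedged sketch assuming a 32-bit Windows compiler such as MSVC or MinGW, where the __cdecl, __stdcall, and __fastcall keywords are available (the function names are mine); the comments show the decorated names that the mangling rules in the table produce.

/* cdecl: the caller cleans the stack; decorated as _add_c */
int __cdecl add_c(int a, int b)
{
    return a + b;
}

/* stdcall: the callee cleans the stack with "retn 8"; decorated as _add_s@8 */
int __stdcall add_s(int a, int b)
{
    return a + b;
}

/* fastcall: first two arguments travel in ecx/edx; decorated as @add_f@8 */
int __fastcall add_f(int a, int b)
{
    return a + b;
}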

 


Function return value transfer

In general, the eax register is the channel for passing the return value: the function stores its return value in eax, and the caller then reads eax.

However, eax is only 4 bytes wide. How is a return value larger than 4 bytes passed?

Values of 5 to 8 bytes are generally returned in the eax/edx pair: eax holds the low 4 bytes of the return value and edx holds the high 4 bytes.
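A minimal sketch (function and parameter names are mine): compiled for 32-bit x86, a 64-bit return value like this one typically comes back with its low half in eax and its high half in edx.

unsigned long long combine64(unsigned int high, unsigned int low)
{
    /* low 32 bits -> eax, high 32 bits -> edx on a 32-bit x86 build */
    return ((unsigned long long)high << 32) | low;
}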

For return types larger than 8 bytes:

typedef struct big_thing
{
    char buf[128];
} big_thing;

big_thing return_test();

//---------------------------

int main(void)
{
    big_thing n = return_test();
}

big_thing return_test()
{
    big_thing b;
    b.buf[0] = 0;
    return b;
}
Analyzing this code:
In main there is a 128-byte variable n, and in the called function return_test there is a 128-byte variable b.
How, then, does the called function return a 128-byte value? Could it simply copy b directly into n? That would mean directly modifying a variable in main, which does not seem to fit the pass-by-value semantics of a return value.

So how does the compiler actually arrange the transfer of a large return value?
* main reserves extra space in the local-variable area of its own stack frame and uses part of it as a temporary object, temp, for receiving the return value.
* The address of temp is passed to return_test as a hidden parameter.
* return_test copies its result into temp and returns temp's address in eax.
* After return_test returns, main copies the contents of temp into n.



(return_test has no declared parameters; the only thing passed in by the caller is this hidden "pseudo-parameter".)
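The sketch below spells out this "as-if" transformation (the names return_test_lowered and temp are mine; a real compiler uses its own conventions and may optimize away one of the copies):

#include <string.h>

typedef struct big_thing
{
    char buf[128];
} big_thing;

/* return_test() behaves roughly as if it took a hidden pointer parameter: */
void return_test_lowered(big_thing *hidden_result)
{
    big_thing b;
    b.buf[0] = 0;
    *hidden_result = b;          /* copy #1: local object -> caller-provided temp */
}

int main(void)
{
    big_thing temp;              /* temporary object in main's stack frame */
    big_thing n;

    return_test_lowered(&temp);  /* temp's address is passed as the hidden argument */
    memcpy(&n, &temp, sizeof n); /* copy #2: temp -> n */
    return 0;
}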

[Summary]
Function return value transfer: values of up to 8 bytes are returned in **registers**; values larger than 8 bytes go through an intermediate variable temp of the same size, newly allocated in the calling function's frame, which acts as a relay.

For large return types, C uses a temporary area on the stack as a relay, so the returned object is copied twice. For this reason, avoid returning large objects by value.


C++ function return value transfer

C++ handles large return values slightly differently. It may behave like C: copy the object into a temporary on the stack once, and then copy the temporary into the object that receives the return value.
However, some compilers apply Return Value Optimization (RVO) to the return value. This removes one copy: there is no temporary object temp, and the result is copied directly into the corresponding object in the calling function.
For example:

#include <iostream>
using namespace std;

struct cpp_obj
{
    cpp_obj() { cout << "ctor\n"; }
    cpp_obj(const cpp_obj &c) { cout << "copy ctor\n"; }
    cpp_obj &operator=(const cpp_obj &rhs)
    {
        cout << "operator=\n";
        return *this;
    }
    ~cpp_obj() { cout << "dtor\n"; }
};

cpp_obj foo()
{
    cpp_obj b;
    cout << "before foo return\n";
    return b;
}

int main()
{
    cpp_obj n;
    n = foo();
    cout << "before main return\n";
    return 0;
}

// --------- running result ---------
// ctor
// ctor
// before foo return
// operator=
// dtor
// before main return
// dtor
This example was compiled and run with g++. No temporary variable temp is created; the called function's local object is copied (via operator=) directly into the object in main.

NRV

C++ has an even more aggressive optimization for return values: NRV (Named Return Value) optimization.
With this optimization, even the local variable in the called function is no longer needed! The function operates directly on the object in the calling function (through the reference passed in as the hidden parameter).
Two points to note about NRV (corrections welcome if anything here is wrong):

1. In the called function foo, it is the declaration of the local variable that invokes the default constructor of the object in main; defining the object in main merely reserves space and does not call the constructor at that point.
2. Why does CObj obj = foo(); in the calling function trigger NRV optimization, while writing it separately as CObj obj; obj = foo(); does not?

Because:
The class must define a copy constructor for NRV optimization to be triggered; otherwise the program takes the slower path. (The second form does not involve the copy constructor, so NRV optimization is not triggered.)
That said, even if a class has no user-defined copy constructor, current compilers can still perform NRV optimization, but only when the sub-function call initializes the object.
(Without NRV optimization, the called function creates a local object and copies it directly into the corresponding object in the calling function, without generating a C-style temporary.)

If the call in the example above is changed to cpp_obj n = foo();,
NRV optimization is triggered, and the output becomes:

// cpp_obj n = foo(); is effectively rewritten as: foo(n);
// and foo itself is effectively rewritten as:
void foo(cpp_obj &_result)
{
    // call _result's default constructor
    _result.cpp_obj::cpp_obj();
    // ... work directly on _result ...
    return;
}

// --------- running result with NRV ---------
// ctor
// before foo return
// before main return
// dtor
(Note: only calls of the form CObj obj = foo(); get NRV optimization!)

For more details on NRV optimization, see Inside the C++ Object Model.

 

Heap

The heap is a huge region of memory that usually occupies the majority of the virtual address space. A program can request a contiguous block of memory from it and use that block freely; the block remains valid until the program voluntarily gives it up. In C we request heap space with the malloc function.

How malloc is implemented:
The operating system kernel manages the process's address space and exposes system calls for it, so malloc could simply issue a system call every time memory is requested or released.
However, that performs poorly, because every request and release would need a system call, and system calls are expensive: each one switches between kernel mode and user mode.
A better approach is to request a suitably large chunk of heap space from the operating system up front and then let the program manage that space itself. The heap space is usually managed by the program's runtime library (typically a shared library shipped with the system).
malloc is essentially a wrapper around the allocation routines in this runtime library.

"Wholesale-retail" analogy:
The runtime database is equivalent to wholesale a large heap space to the operating system, and then retail it to the program. The Runtime Library must manage a program retail space and cannot sell a space twice.
When the space is insufficient, the runtime database is then wholesale to the Operating System (calling the corresponding OS system call ).
Note: This Runtime Library is generally provided to us by the operating system or language. It contains the heap space management algorithm and runs in the user mode.
(We can also implement this allocation algorithm by ourselves, but the commonly used allocation algorithms have been implemented countless times by various systems and libraries, and there is no need to reinvent the wheel)
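A toy sketch of the wholesale-retail idea, assuming a POSIX system where sbrk is available (all names are mine; this toy never frees anything, and real runtime libraries are far more sophisticated):

#include <stddef.h>
#include <unistd.h>

#define CHUNK (1 << 20)          /* "wholesale" 1 MB from the OS at a time */

static char  *pool;              /* current wholesale chunk */
static size_t pool_used;

void *toy_malloc(size_t size)
{
    size = (size + 7) & ~(size_t)7;              /* keep 8-byte alignment */
    if (size == 0 || size > CHUNK)
        return NULL;                             /* this toy cannot serve such requests */
    if (pool == NULL || pool_used + size > CHUNK)
    {
        pool = sbrk(CHUNK);                      /* one system call buys a big block */
        if (pool == (void *)-1)
        {
            pool = NULL;                         /* stay in a clean state on failure */
            return NULL;
        }
        pool_used = 0;
    }
    void *p = pool + pool_used;                  /* "retail" a slice with no system call */
    pool_used += size;
    return p;
}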
Every process gets a default heap when it is created; it is set up at process startup and exists until the process ends. On Windows, the default heap size is 1 MB.
(Note: on Windows, the heap does not necessarily grow upward.)

Q: Is the space returned by malloc contiguous?
A: If "space" means virtual address space, then yes: the block returned by each malloc call can be treated as a contiguous range of addresses. (A process may contain several heaps, but the largest single allocation is limited by the largest heap.)
If "space" means physical memory, then not necessarily, because a contiguous range of virtual addresses may be pieced together from several non-contiguous physical pages.

Heap space management algorithm

* 1. Free-list method
Link the free blocks in the heap into a linked list; when the user requests memory, walk the list to find a block of a suitable size.

* 2. Bitmap (a neat idea)
Divide the entire heap into a large number of blocks of the same size, and satisfy each request with a whole number of consecutive blocks. The bits of an integer array record the allocation state of every block; a sketch follows this list.
(Each block has only three states: header, used, and free, so two bits are enough per block, which is why this is called a bitmap. The header state marks the boundary of each allocation.)
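Below is a hedged, self-contained sketch of the bitmap scheme with two bits of state per block (the block size, pool size, and all names are my own choices, not from the book):

#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64            /* every allocation uses a whole number of 64-byte blocks */
#define NBLOCKS    1024

static unsigned char pool[BLOCK_SIZE * NBLOCKS];
static uint8_t       bitmap[(NBLOCKS * 2 + 7) / 8];   /* 2 bits of state per block */

enum { B_FREE = 0, B_USED = 1, B_HEAD = 2 };          /* free / body of an allocation / start of an allocation */

static int get_state(size_t i)
{
    return (bitmap[i / 4] >> ((i % 4) * 2)) & 3;
}

static void set_state(size_t i, int s)
{
    bitmap[i / 4] = (uint8_t)((bitmap[i / 4] & ~(3 << ((i % 4) * 2))) | (s << ((i % 4) * 2)));
}

void *bitmap_alloc(size_t bytes)
{
    size_t need = (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;   /* blocks required */
    size_t run = 0;

    if (need == 0 || need > NBLOCKS)
        return NULL;
    for (size_t i = 0; i < NBLOCKS; i++)
    {
        run = (get_state(i) == B_FREE) ? run + 1 : 0;      /* length of the current free run */
        if (run == need)
        {
            size_t start = i + 1 - need;
            set_state(start, B_HEAD);                      /* the header marks where the allocation begins */
            for (size_t j = start + 1; j <= i; j++)
                set_state(j, B_USED);
            return pool + start * BLOCK_SIZE;
        }
    }
    return NULL;                                           /* no free run is long enough */
}

void bitmap_free(void *p)
{
    size_t i = (size_t)((unsigned char *)p - pool) / BLOCK_SIZE;

    if (get_state(i) != B_HEAD)
        return;                                            /* not the start of an allocation */
    set_state(i, B_FREE);
    for (i++; i < NBLOCKS && get_state(i) == B_USED; i++)
        set_state(i, B_FREE);                              /* free the body until the next header or free block */
}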



