[Dynamic stack memory allocation] inside alloca

Source: Internet
Author: User

Ah, I started cleaning when I got home from work, and it wasn't done until the early morning. I'm really tired. But I planned to finish this article today, so I can't go to sleep until it's written. Let's get straight into the subject!

We often use functions or operators such as malloc or new to allocate memory dynamically. That memory is heap memory, and the programmer must release it manually: malloc pairs with free, and new pairs with delete. As for mixing them, beyond keeping the logic correct you have to stay within the limits of the specifications, and the C++ standard does not allow memory obtained with new to be released with free, or memory from malloc to be released with delete. I'll leave that question aside here. Personally, I think that once you fully understand things with similar characteristics, their differences become very clear. On that basis, whichever one you use, you only need to weigh some external factors, such as the specifications mentioned above.

This article discusses dynamic memory allocation on the stack, that is, the memory allocated is stack memory. Stack memory has one convenient property: we do not need to release it manually. Stack memory is allocated and reclaimed by moving a stack pointer. The stack grows from high addresses toward low addresses: when it grows, the stack pointer moves toward lower addresses and its value decreases; when memory is reclaimed, the stack pointer moves back toward higher addresses and its value increases. Allocating and reclaiming stack memory is therefore just subtraction and addition on a pointer, which is faster than heap allocation. With these properties in mind, it is also easier to see why this memory is called a "stack".
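To make the pointer-arithmetic point concrete, here is a minimal sketch in C of a toy downward-growing stack carved out of an ordinary buffer. It is only a model, not how the hardware stack is managed: the type and function names (toy_stack, toy_init, toy_push, toy_pop) are hypothetical, and the buffer stands in for the thread's stack region. Allocation subtracts from the top pointer and release adds back to it, the two operations described above.

```c
#include <stddef.h>

/* Toy model of a downward-growing stack (hypothetical names).
   "Allocating" moves top toward lower addresses; "releasing" moves it back. */
typedef struct {
    unsigned char *base;   /* lowest usable address of the region */
    unsigned char *top;    /* current top of stack, playing the role of esp */
} toy_stack;

static void toy_init(toy_stack *s, unsigned char *buf, size_t len)
{
    s->base = buf;
    s->top = buf + len;    /* an empty stack starts at the high end */
}

/* Reserve n bytes by subtracting from top; NULL if it would run past base. */
static void *toy_push(toy_stack *s, size_t n)
{
    if ((size_t)(s->top - s->base) < n)
        return NULL;       /* the toy equivalent of a stack overflow */
    s->top -= n;
    return s->top;
}

/* Release n bytes by adding back to top. */
static void toy_pop(toy_stack *s, size_t n)
{
    s->top += n;
}
```

With a 64-byte buffer, a 16-byte push returns buf + 48 and a following 8-byte push returns buf + 40: each new block sits at a lower address than the previous one, and popping simply moves the pointer back up.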

We all know that before the C99 standard, C did not support variable-length arrays. If you wanted to allocate stack memory dynamically to get the effect of a variable-length array, you had to rely on the alloca function. In fact, GCC's back end also implements C99 variable-length arrays with alloca-style dynamic stack allocation. Of course, it is not literally a call to an alloca function, since alloca may be optimized and expanded inline (though you could still loosely call that "calling" it). We won't dwell on that here; it is not the focus of this article. In practice, alloca is not recommended because it has a number of safety problems, which we will also set aside for now. The purpose of this article is to understand how it works, get a clear picture of it, and leave nothing about it opaque.
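For comparison, a C99 variable-length array gives the same kind of run-time-sized stack storage without an explicit alloca call. The sketch below assumes a compiler with VLA support such as GCC; MSVC, the compiler examined in this article, does not implement VLAs. The function name sum_of_squares is just for illustration.

```c
/* Sum of the squares 1..n, stored first in a C99 variable-length array.
   The array's size is only known at run time, yet it lives in this
   function's stack frame and is released automatically on return. */
long sum_of_squares(int n)
{
    if (n <= 0)
        return 0;              /* a VLA must have a positive size */
    long squares[n];           /* run-time-sized stack storage */
    long total = 0;
    for (int i = 0; i < n; i++) {
        squares[i] = (long)(i + 1) * (i + 1);
        total += squares[i];
    }
    return total;
}
```

sum_of_squares(4) is 1 + 4 + 9 + 16 = 30, and nothing has to be freed afterwards.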

Compilers generally ship with a C runtime (CRT) library; Visual C++, for example, has had many versions of it. CRT versions differ considerably, and newer ones usually add stricter checks and security mechanisms. This article uses VS2008 as the example. It provides _alloca as the implementation of alloca, and the compiler turns a call to it into a call to _alloca_probe_16. That function lives in the assembly source file vc_dir\VC\crt\src\intel\alloca16.asm, part of the assembly-language CRT sources Microsoft provides. The file contains two versions: _alloca_probe_16, which aligns to 16 bytes, and _alloca_probe_8, which aligns to 8 bytes. The code is as follows:

.xlist
        include cruntime.inc
.list

extern  _chkstk:near

; size of a page of memory

        CODESEG

page

public  _alloca_probe_8

_alloca_probe_16 proc                   ; 16 byte aligned alloca

        push    ecx
        lea     ecx, [esp] + 8          ; TOS before entering this function
        sub     ecx, eax                ; New TOS
        and     ecx, (16 - 1)           ; Distance from 16 bit align (align down)
        add     eax, ecx                ; Increase allocation size
        sbb     ecx, ecx                ; ecx = 0xFFFFFFFF if size wrapped around
        or      eax, ecx                ; cap allocation size on wraparound
        pop     ecx                     ; Restore ecx
        jmp     _chkstk

alloca_8:                               ; 8 byte aligned alloca
_alloca_probe_8 = alloca_8

        push    ecx
        lea     ecx, [esp] + 8          ; TOS before entering this function
        sub     ecx, eax                ; New TOS
        and     ecx, (8 - 1)            ; Distance from 8 bit align (align down)
        add     eax, ecx                ; Increase allocation Size
        sbb     ecx, ecx                ; ecx = 0xFFFFFFFF if size wrapped around
        or      eax, ecx                ; cap allocation size on wraparound
        pop     ecx                     ; Restore ecx
        jmp     _chkstk

_alloca_probe_16 endp

        end

The 16-byte version is the one used by default, so let's look at it closely. `lea ecx, [esp] + 8` computes esp as it was before this function was entered and stores it in ecx. The reason for adding 8 is straightforward: the first four bytes hold the saved ecx and the next four hold the return address, so adding 8 gives the esp of the calling function. No argument is pushed on the stack; it is passed in a register. So the value in ecx can be treated as a fixed value (one that is at least 4-byte aligned). In the next three instructions, eax is the number of bytes of stack memory requested by the caller, which is always a multiple of 4. So after `sub ecx, eax`, the new top of stack in ecx is 4-byte aligned but not necessarily 16-byte aligned. `and ecx, (16 - 1)` yields how far it lies above the next lower 16-byte boundary, and `add eax, ecx` grows the request by that distance, so the new top of stack ends up aligned down to a 16-byte boundary. The size in eax itself, however, need not be a multiple of 16. For the 8-byte version, you can work out for yourself whether eax can end up not 8-byte aligned; it is not a difficult question.
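The rounding can be checked with a small C model. This is a sketch, not Microsoft's code: it assumes 32-bit unsigned arithmetic mirrors the registers, `tos` stands for the value `lea ecx, [esp] + 8` produces, `size` for the request in eax, and the function name probe16_size is hypothetical. The carry check at the end models the `sbb`/`or` pair that caps the size on wraparound.

```c
#include <stdint.h>

/* Model of the size adjustment in _alloca_probe_16 (hypothetical name).
   tos  : caller's esp before the call, i.e. lea ecx, [esp] + 8
   size : requested byte count passed in eax
   Returns the adjusted size that is handed on to _chkstk. */
uint32_t probe16_size(uint32_t tos, uint32_t size)
{
    uint32_t new_tos  = tos - size;            /* sub ecx, eax              */
    uint32_t slack    = new_tos & (16u - 1u);  /* and ecx, (16 - 1)         */
    uint32_t adjusted = size + slack;          /* add eax, ecx              */
    if (adjusted < size)                       /* carry out of the add:     */
        adjusted = 0xFFFFFFFFu;                /* sbb ecx, ecx / or eax, ecx */
    return adjusted;
}
```

With a caller esp of 0x0012FF70 and a 20-byte request, the model returns 32, and 0x0012FF70 - 32 = 0x0012FF50 is 16-byte aligned. With a caller esp of 0x0012FF74 and the same request, it returns 20 unchanged, because 0x0012FF74 - 20 = 0x0012FF60 is already aligned: only the resulting top of stack is guaranteed to be aligned, not the size.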

Notice that this function does not actually allocate any stack memory: esp (the stack pointer mentioned above, i.e. the top-of-stack pointer, TOS in the comments) is never changed by eax (the requested size). After `pop ecx` restores ecx (the function needs ecx as a scratch register, so it pushes it first and pops it at the end), there is a `jmp` to _chkstk. The name says what it does: check stack, that is, check whether the stack would overflow. The compiler normally inserts this function at the start of functions whose frames exceed a certain size, so that the overflow check runs on entry. For example, if you define a large array in a function, the compiler inserts a call to _chkstk to perform the check (this describes VC; other compilers do not necessarily do the same).

So we can infer that _alloca_probe_16 is only responsible for computing the number of bytes to allocate after alignment and leaving it in eax, because _chkstk also takes its argument in eax; again, the parameter is passed in a register. We can also see that _alloca_probe_16 is closely tied to _chkstk: it jumps straight into it with jmp.

Now let's look at _chkstk. It is in the same directory as alloca16.asm, also as an assembly source file: chkstk.asm. The code is as follows:

.xlist
        include cruntime.inc
.list

; size of a page of memory

_PAGESIZE_      equ     1000h

        CODESEG

page

public  _alloca_probe

_chkstk proc

_alloca_probe    =  _chkstk

        push    ecx

; Calculate new TOS.

        lea     ecx, [esp] + 8 - 4      ; TOS before entering function + size for ret value
        sub     ecx, eax                ; new TOS

; Handle allocation size that results in wraparound.
; Wraparound will result in StackOverflow exception.

        sbb     eax, eax                ; 0 if CF==0, ~0 if CF==1
        not     eax                     ; ~0 if TOS did not wrapped around, 0 otherwise
        and     ecx, eax                ; set to 0 if wraparound

        mov     eax, esp                ; current TOS
        and     eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary

cs10:
        cmp     ecx, eax                ; Is new TOS
        jb      short cs20              ; in probed page?
        mov     eax, ecx                ; yes.
        pop     ecx
        xchg    esp, eax                ; update esp
        mov     eax, dword ptr [eax]    ; get return address
        mov     dword ptr [esp], eax    ; and put it at new TOS
        ret

; Find next lower page and probe
cs20:
        sub     eax, _PAGESIZE_         ; decrease by PAGESIZE
        test    dword ptr [eax],eax     ; probe page.
        jmp     short cs10

_chkstk endp

        end

This function is a little more complex than the previous one, but the code is clear and easy to follow. Start with `lea ecx, [esp] + 8 - 4`. After `push ecx`, [esp] holds the saved ecx and [esp + 4] holds the return address, so esp + 8 is once again the caller's esp before the call. Control reaches _chkstk through a jmp rather than a call, but _alloca_probe_16 keeps the stack balanced and leaves esp unchanged before jumping, so the return address pushed by the caller's `call _alloca_probe_16` is still in place and the layout is exactly the same as it was inside _alloca_probe_16. The extra - 4 is what the source comment calls "size for ret value": the return address will later be copied to the new top of stack and popped by ret, so room for it is reserved below the requested block. ecx therefore holds the address of the return-address slot. Then, as in _alloca_probe_16, ecx minus eax (the requested size, already aligned by _alloca_probe_16) gives the new top of stack in ecx. If the stack does not overflow, esp will be set to this value (and ret then pops the return address placed there), so the stack memory is allocated successfully.

Continuing the analysis, the next three instructions are `sbb eax, eax`, `not eax` and `and ecx, eax`. sbb is subtract with borrow. If the preceding `sub ecx, eax` borrowed (ecx was less than eax), eax becomes 0xFFFFFFFF after sbb; `not eax` turns it into 0; and `and ecx, eax` then makes ecx 0, meaning the new esp value is 0. Without a borrow, the mask is 0xFFFFFFFF and ecx is unchanged. Keep this in mind; we will come back to it below. Why would `sub ecx, eax` borrow in the first place? Doesn't _alloca_probe_16 check the requested size? Indeed, it does not care how many bytes you ask for. It only cooperates with _chkstk so that _chkstk can tell when the request is too large, and in that case _chkstk detects it and raises an exception. To see how _alloca_probe_16 works together with _chkstk's check, we have to go back to the _alloca_probe_16 assembly source and look at these three instructions:

        add     eax, ecx                ; Increase allocation Size
        sbb     ecx, ecx                ; ecx = 0xFFFFFFFF if size wrapped around
        or      eax, ecx                ; cap allocation size on wraparound

Here eax is the requested size and ecx is the distance from the new top of stack (computed by `sub ecx, eax`) down to the 16-byte boundary. Read these three instructions together with the _chkstk code. If eax is very large (too much space requested), `add eax, ecx` overflows, i.e. CF is set to 1. The next instruction, `sbb ecx, ecx`, then computes ecx = ecx - ecx - CF = 0 - 1 = -1 = 0xFFFFFFFF, and `or eax, ecx` makes eax 0xFFFFFFFF, which is the size handed to _chkstk. From the analysis of _chkstk above, a size of 0xFFFFFFFF is certain to make its `sub` borrow, so ecx (the new esp) ends up 0. In the other case, eax in _alloca_probe_16 is larger than ecx (the caller's esp), so `sub ecx, eax` borrows. After `and ecx, (16 - 1)` and `add eax, ecx`, assume no carry occurs this time. Then sbb leaves ecx at 0 and `or eax, ecx` does not change eax, but eax is still larger than the esp value. When that eax reaches _chkstk, its `sub` borrows, and just as with 0xFFFFFFFF, ecx (the new esp) becomes 0. So in both cases _alloca_probe_16 and _chkstk work together; you could also say that _alloca_probe_16 is written to fit _chkstk's checking scheme.
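The clamp at the start of _chkstk can be modeled the same way. In this sketch (hypothetical name chkstk_new_tos), `slot` stands for the value of `lea ecx, [esp] + 8 - 4` and `size` for the already-adjusted eax; the conditional mask reproduces what `sbb eax, eax`, `not eax` and `and ecx, eax` compute without a branch, assuming 32-bit unsigned arithmetic matches the registers.

```c
#include <stdint.h>

/* Model of the start of _chkstk (hypothetical name).
   slot : [esp] + 8 - 4, the address of the return-address slot
   size : the request in eax, already adjusted by _alloca_probe_16
   Returns the new top of stack, or 0 if the subtraction borrowed. */
uint32_t chkstk_new_tos(uint32_t slot, uint32_t size)
{
    uint32_t new_tos = slot - size;   /* sub ecx, eax (wraps on borrow) */
    /* sbb eax, eax yields ~0 on borrow and 0 otherwise; not eax inverts it,
       so the mask is 0 exactly when the subtraction borrowed. */
    uint32_t mask = (size > slot) ? 0u : 0xFFFFFFFFu;
    return new_tos & mask;            /* and ecx, eax */
}
```

For a slot at 0x0012FF6C, a 32-byte request gives 0x0012FF4C, and once ret pops the return address esp is 0x0012FF50, 16-byte aligned. Both failure cases described above, a capped size of 0xFFFFFFFF and a size of 0x00200000 that exceeds the stack pointer, produce 0.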

Let's continue with _chkstk and look at the next two instructions. `mov eax, esp` copies the current esp into eax. Note that this is esp after _chkstk's own `push ecx`, the same esp the initial `lea ecx, [esp] + 8 - 4` started from, so it is 4 bytes below where esp was when _chkstk was entered. Next comes `and eax, not (_PAGESIZE_ - 1)`. _PAGESIZE_ is 1000h, i.e. 4096 bytes (4 KB), the page size Windows uses on x86. This clears the low 12 bits of esp, rounding it down to the base (lowest address) of the page it is in. That page is already in use by the stack, so it is known to be accessible, which makes it the starting point for the stack overflow check that follows.
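The rounding step is a single mask. A minimal C sketch, assuming a 4 KB page as in chkstk.asm; MODEL_PAGESIZE and page_base are hypothetical names:

```c
#include <stdint.h>

#define MODEL_PAGESIZE 0x1000u   /* _PAGESIZE_ equ 1000h: 4 KB */

/* and eax, not (_PAGESIZE_ - 1): clear the low 12 bits so the address is
   rounded down to the base (lowest address) of the page containing it. */
uint32_t page_base(uint32_t addr)
{
    return addr & ~(MODEL_PAGESIZE - 1u);
}
```

page_base(0x0012FF6C) gives 0x0012F000, and an address that is already a page base is returned unchanged.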

Then come the two labels cs10 and cs20. cs10 checks whether ecx (the new top of stack) is below eax, which at this point holds a page boundary. If it is, execution jumps to cs20. The code at cs20 is simple: it lowers eax by one page and then executes `test dword ptr [eax], eax`, which reads memory at eax. If that address cannot be accessed, an exception is raised, which is exactly the effect wanted here. If no exception occurs, it jumps back to cs10 and compares again; while ecx is still lower, it steps down another page and touches it, until ecx is greater than or equal to eax or an exception is raised. Touching the pages one at a time from the top down matters because Windows commits stack memory on demand through a guard page just below the committed region; probing in order lets the guard page move down page by page instead of being skipped. Now recall the analysis above: if the requested space is too large, ecx is 0, so at cs20 it is always less than eax and eax keeps dropping by 4 KB. It never actually reaches 0: once eax passes the end of the stack's reserved region, `test dword ptr [eax], eax` raises a stack overflow exception.
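The cs10/cs20 loop can be modeled to count how many pages a request touches. This is a sketch under a loud assumption: the real loop has no lower bound and relies on Windows raising an exception when a probe lands outside the stack's reserved region, so the hypothetical `limit` parameter stands in for that region's end and a return value of -1 stands for the exception. `page` is the rounded-down page of esp and `new_tos` is the clamped value computed earlier.

```c
#include <stdint.h>

#define CHK_PAGESIZE 0x1000u   /* _PAGESIZE_ in chkstk.asm */

/* Model of the cs10/cs20 probe loop (hypothetical name and parameters).
   page    : esp rounded down to its page base (the starting eax)
   new_tos : the clamped new top of stack (ecx), 0 on overflow
   limit   : ASSUMED lowest accessible stack address; the real loop has no
             such bound and instead faults when it touches memory below it
   Returns the number of pages probed, or -1 where the real code would
   raise a stack overflow exception. */
long probe_pages(uint32_t page, uint32_t new_tos, uint32_t limit)
{
    long probes = 0;
    while (new_tos < page) {        /* cs10: cmp ecx, eax / jb short cs20 */
        page -= CHK_PAGESIZE;       /* cs20: sub eax, _PAGESIZE_          */
        if (page < limit)           /* test dword ptr [eax], eax faults   */
            return -1;
        probes++;                   /* probe succeeded; jmp short cs10    */
    }
    return probes;                  /* new TOS lies within a probed page  */
}
```

For page 0x0012F000 and a new top of stack at 0x0012D010, roughly 8 KB lower, the model probes two pages before falling through to the normal path; with new_tos = 0 it keeps stepping down until it crosses the limit and reports -1.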

If execution is continued after that, an access violation occurs. If the requested size does not overflow the stack, then either ecx is already greater than or equal to eax on the first comparison, or it becomes so after eax has been lowered by some number of pages, and the normal allocation path is taken:

        mov     eax, ecx                ; yes.
        pop     ecx
        xchg    esp, eax                ; update esp
        mov     eax, dword ptr [eax]    ; get return address
        mov     dword ptr [esp], eax    ; and put it at new TOS
        ret

The first line copies ecx (the new, verified top of stack) into eax; the second restores ecx; the third swaps esp and eax. esp now holds the new value after the allocation, which is necessarily lower than the value now in eax (the stack grows toward lower addresses). At this point eax holds esp as it was after `pop ecx`, which is where the return address pushed by the call to _alloca_probe_16 sits. So after the fourth line, eax holds that return address, and that is where we need to return to. The "upper layer" here is not _alloca_probe_16, since _chkstk was reached with jmp rather than call and has no return address of its own, but the function that called _alloca_probe_16. The fifth line stores eax in the memory esp now points to, because the next instruction, ret, pops that address and returns to it. For the principle, see my article "Inline Hook (monitor any function)", which uses the same technique.
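The return-address shuffle is easier to follow with a toy memory. In this sketch every name is hypothetical and memory is modeled as an array of 4-byte words, so an index stands for a byte address divided by 4 (`[esp] + 8 - 4` becomes esp + 1). On entry, mem[*esp] holds the ecx saved by _chkstk and mem[*esp + 1] holds the return address pushed by the caller's call; the function walks through the six instructions and returns the address ret would jump to.

```c
#include <stdint.h>

/* Toy model of the _chkstk epilogue (all names hypothetical).
   mem[] is word-addressed: index i stands for byte address i * 4.
   On entry: mem[*esp]     = ecx saved by _chkstk,
             mem[*esp + 1] = return address pushed by the caller's call,
             new_tos       = verified new top of stack (ecx at cs10).
   Returns the address that ret would jump to. */
uint32_t chkstk_epilogue(uint32_t mem[], uint32_t *esp, uint32_t *ecx,
                         uint32_t new_tos)
{
    uint32_t eax = new_tos;      /* mov eax, ecx  ; eax = new TOS            */
    *ecx = mem[*esp];            /* pop ecx       ; restore the caller's ecx */
    *esp += 1;

    uint32_t old = *esp;         /* xchg esp, eax ; esp = new TOS, and eax = */
    *esp = eax;                  /*   old esp, which points at the slot      */
    eax = old;                   /*   holding the return address             */

    eax = mem[eax];              /* mov eax, dword ptr [eax] ; get return address */
    mem[*esp] = eax;             /* mov dword ptr [esp], eax ; put it at new TOS  */

    uint32_t target = mem[*esp]; /* ret: pop the return address ...    */
    *esp += 1;                   /* ... leaving esp just above it      */
    return target;
}
```

Starting with the caller's esp at word 12 (return address at word 11, saved ecx at word 10) and a three-word request, new_tos is 11 - 3 = 8. After the simulated ret, esp is 9, exactly three words below where the caller started, and the return address has been moved to word 8.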

That is the whole process. In fact, alloca is used in many real-world C projects. Personally, I think that whatever its strengths and weaknesses, once you understand these characteristics you can avoid its weaknesses and make use of its strengths. Allocating a suitable amount of stack space dynamically can indeed bring some performance benefit. This article only sets out to explain its principles and details, and does not get into the controversy around it.

Using alloca is very simple, as shown below:

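#include <malloc.h>   /* the VC CRT declares alloca / _alloca here */

void func( void )
{
    int* p = ( int* )alloca( sizeof( int ) );   /* stack memory, no free needed */
    *p = 100;
}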

You do not need to release the memory yourself: when the function returns, esp is restored and the space goes with it. The compiler may also adjust what it emits according to the requested size. For example, when the requested size is small, alloca is compiled directly into a call to _chkstk instead of _alloca_probe_16, a small optimization. As another example, in VS2003 alloca is always compiled into _chkstk regardless of the requested size, because the VS2003 CRT does not provide _alloca_probe_16.
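Because the release is automatic, alloca suits short-lived scratch buffers. The sketch below assumes alloca is declared in <malloc.h> on MSVC (where, as noted below, it is a macro for _alloca) and in <alloca.h> on glibc and other Unix-like C libraries; is_palindrome is a hypothetical example function. alloca reports no error, so a request that is too large ends in a crash (on Windows, the stack overflow exception analyzed above), and the input should be kept small.

```c
#include <string.h>
#if defined(_MSC_VER)
#include <malloc.h>   /* VC CRT: alloca is a macro for _alloca */
#else
#include <alloca.h>   /* glibc, macOS and other Unix-like C libraries */
#endif

/* Returns 1 if s reads the same backwards. The reversed copy lives in
   stack memory from alloca, so there is nothing to free: the buffer
   disappears when this function returns. */
int is_palindrome(const char *s)
{
    size_t n = strlen(s);
    char *rev = alloca(n + 1);        /* run-time-sized stack buffer */
    for (size_t i = 0; i < n; i++)
        rev[i] = s[n - 1 - i];
    rev[n] = '\0';
    return strcmp(s, rev) == 0;
}
```

is_palindrome("level") returns 1 and is_palindrome("alloca") returns 0; neither call leaves anything to clean up.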

The alloca used above is actually a macro in the VC CRT: `#define alloca _alloca`. The CRT has other related macros as well, such as _malloca, which adds one more layer of wrapping. In debug builds _malloca calls malloc; in release builds it calls alloca when the requested size is below a threshold and malloc otherwise. You must therefore release the memory with _freea. _freea uses a marker stored in front of the block to tell whether it came from malloc or alloca: heap memory from malloc is passed to free, while stack memory from alloca is left alone. The code is as follows:

// _malloca definition
#if defined(_DEBUG)
#if !defined(_CRTDBG_MAP_ALLOC)
#undef _malloca
#define _malloca(size) \
__pragma(warning(suppress: 6255)) \
        _MarkAllocaS(malloc((size) + _ALLOCA_S_MARKER_SIZE), _ALLOCA_S_HEAP_MARKER)
#endif
#else
#undef _malloca
#define _malloca(size) \
__pragma(warning(suppress: 6255)) \
    ((((size) + _ALLOCA_S_MARKER_SIZE) <= _ALLOCA_S_THRESHOLD) ? \
        _MarkAllocaS(_alloca((size) + _ALLOCA_S_MARKER_SIZE), _ALLOCA_S_STACK_MARKER) : \
        _MarkAllocaS(malloc((size) + _ALLOCA_S_MARKER_SIZE), _ALLOCA_S_HEAP_MARKER))
#endif

// _freea definition
_CRTNOALIAS __inline void __CRTDECL _freea(_Inout_opt_ void * _Memory)
{
    unsigned int _Marker;
    if (_Memory)
    {
        _Memory = (char*)_Memory - _ALLOCA_S_MARKER_SIZE;
        _Marker = *(unsigned int *)_Memory;
        if (_Marker == _ALLOCA_S_HEAP_MARKER)   // check whether it carries the heap marker
        {
            free(_Memory);
        }
#if defined(_ASSERTE)
        else if (_Marker != _ALLOCA_S_STACK_MARKER)
        {
            _ASSERTE(("Corrupted pointer passed to _freea", 0));
        }
#endif
    }
}

// _MarkAllocaS definition
__inline void *_MarkAllocaS(_Out_opt_ __crt_typefix(unsigned int*) void *_Ptr, unsigned int _Marker)
{
    if (_Ptr)
    {
        *((unsigned int*)_Ptr) = _Marker;   // mark: _ALLOCA_S_STACK_MARKER or _ALLOCA_S_HEAP_MARKER
        _Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
    }
    return _Ptr;
}
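The marker scheme itself is portable and can be sketched in C. Everything here is a hypothetical stand-in: DEMO_STACK_MARKER, DEMO_HEAP_MARKER and DEMO_MARKER_SIZE play the roles of _ALLOCA_S_STACK_MARKER, _ALLOCA_S_HEAP_MARKER and _ALLOCA_S_MARKER_SIZE, but their values are made up, and demo_freea returns whether it called free purely so the behavior is visible. A caller-owned array stands in for alloca'd memory.

```c
#include <stdlib.h>

/* Hypothetical marker values and header size (not the CRT's real values). */
#define DEMO_STACK_MARKER 0xCCCCu
#define DEMO_HEAP_MARKER  0xDDDDu
#define DEMO_MARKER_SIZE  16   /* header size; keeps the user pointer as
                                  aligned as the block it came from */

/* Like _MarkAllocaS: write the marker at the start of the block and hand
   out the bytes after the header. NULL passes through unchanged. */
static void *demo_mark(void *p, unsigned int marker)
{
    if (p) {
        *(unsigned int *)p = marker;
        p = (char *)p + DEMO_MARKER_SIZE;
    }
    return p;
}

/* Like _freea: step back to the header and free only heap blocks.
   Returns 1 if free() was called, 0 otherwise. */
static int demo_freea(void *mem)
{
    if (!mem)
        return 0;
    mem = (char *)mem - DEMO_MARKER_SIZE;
    if (*(unsigned int *)mem == DEMO_HEAP_MARKER) {
        free(mem);
        return 1;
    }
    return 0;   /* stack block: reclaimed automatically when its frame ends */
}
```

The design point is that the header travels with the block, so the release function needs no side table: a single read just in front of the user pointer decides whether free is called.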

[Extension]

Here is a playful extension. Suppose a C program has several functions that take the same number of parameters, all pointers, but of different pointer types. C++ could cover them with a template, but C has no templates. To get something similar, we can use alloca to set up the argument space ourselves and then call the function. The code is as follows:

#include <stdio.h>
#include <malloc.h>

/* Only meaningful for 32-bit x86 built with VC: relies on __cdecl,
   4-byte pointers and inline __asm */

void func( char* p )
{
    printf( "%s\n", p );
}

void chk( void* arg )
{
    if( ( void** )arg - &arg != 1 ) // check that the parameter slot sits right next to the memory arg points to
        __asm int 3                 // if it does, esp points at the space allocated by alloca once chk returns,
}                                   // so that space supplies the argument when fun is called

typedef void ( *functor )( void );

int main( void )
{
    char* str = "12345";
    int* arg = ( int* )alloca( 4 );
    functor fun = ( functor )func;
    *arg = ( int )str;
    chk( arg );
    ( *fun )();
    return 0;
}

This is just a simple example. The space allocated by alloca is reclaimed when main returns, and the call through the fun pointer pushes no arguments, so no `add esp` follows it. func uses the __cdecl calling convention and does not clean up the stack itself, so the whole stack frame stays balanced.

PS: This example is purely for fun. I only know the principle behind it; it is fragile, and I have not tested it thoroughly or studied it in depth.

Before I knew it, it was already half past three in the morning. Readers familiar with assembly may find this article tedious and can skip the analysis. I should go to bed! Comments are welcome!

***************** If you need to reprint this article, please indicate the source: Region *****************


 
