Analyzing C procedure calls from the perspective of assembly

Source: Internet
Author: User


Definition of basic terms

1. The system stack is a memory area located at the end of the process address space.

2. When data is pushed, the stack grows from top to bottom (toward lower addresses). This memory area provides storage for a function's local variables and also supports passing arguments when a function is called.

3. When a nested procedure is called, the stack grows further downward to accommodate a new activation record, which holds all the data that procedure needs.

4. The activation record of the currently executing procedure is delimited by the frame pointer, which marks its top, and the stack pointer, which marks its bottom.

5. While the procedure is executing, the top boundary is fixed, but the bottom boundary can be extended (when more memory is required).

Analyzing stack frames

The second stack frame is analyzed as follows:

1. At the top of the stack frame are the return address and the saved old frame pointer. The return address specifies the memory address at which execution continues when the current procedure ends, while the old frame pointer is the frame pointer of the previous activation record. When the current procedure ends, this saved value can be used to reconstruct the stack frame of the calling procedure, which is important when backtracking through the call stack during debugging.

2. The main part of the activation record is the memory allocated for the procedure's local variables. In C, such variables are also called automatic variables.

3. When a function is called, the values passed to it as arguments are stored at the bottom of the stack.

4. All common computer architectures provide the following two stack instructions:

    • The push instruction places a value on the stack and decrements the stack pointer esp by the number of bytes the value occupies. The end of the stack moves down to a lower address;

    • The pop instruction removes a value from the stack and increments the stack pointer esp by the same amount, which means that the end of the stack moves up.

5. Common architectures also provide two instructions for calling and leaving functions (automatically returning to the calling procedure), both of which manipulate the stack automatically:

    • The call instruction pushes the current value of the instruction pointer onto the stack and jumps to the start address of the called function. In AT&T assembly, call foo (foo is a label) is conceptually equivalent to the following instructions (%eip cannot actually be named as an operand): pushl %eip; movl foo, %eip;

    • The ret instruction pops the return address from the stack and jumps to that address. A procedure must end with ret as its last instruction. The return address placed on the stack by call is at the bottom of the stack (strictly speaking, at the bottom of the previous activation record, which is the top of the current one). In AT&T assembly, ret is conceptually equivalent to: popl %eip
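The four instructions above can be mimicked with a toy model in C. This is only a sketch under stated assumptions: the Machine type, the 64-byte stack, and all function names are hypothetical, and the return address is passed in explicitly because the model has no real instruction stream.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u            /* size of the toy stack (an assumption) */

/* Hypothetical model of the registers and memory the text talks about. */
typedef struct {
    uint8_t  mem[STACK_BYTES];     /* stack memory, byte offsets 0..63 */
    uint32_t esp;                  /* stack pointer: offset of the lowest used byte */
    uint32_t eip;                  /* instruction pointer */
} Machine;

void machine_init(Machine *m, uint32_t start_eip)
{
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;          /* empty stack: esp sits just past the top */
    m->eip = start_eip;
}

/* pushl: decrement esp by 4, then store the value at the new esp. */
void pushl(Machine *m, uint32_t value)
{
    assert(m->esp >= 4);           /* refuse to overflow the toy stack */
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &value, 4);
}

/* popl: load the value at esp, then increment esp by 4. */
uint32_t popl(Machine *m)
{
    uint32_t value;
    assert(m->esp + 4 <= STACK_BYTES);
    memcpy(&value, &m->mem[m->esp], 4);
    m->esp += 4;
    return value;
}

/* call: push the address of the instruction after the call, then jump. */
void call(Machine *m, uint32_t return_addr, uint32_t target)
{
    pushl(m, return_addr);
    m->eip = target;
}

/* ret: pop the return address into eip. */
void ret(Machine *m)
{
    m->eip = popl(m);
}
```

After machine_init, each pushl or call lowers esp by 4 and each popl or ret raises it by 4, so a matched call and ret leave esp where it started and put the saved return address back into eip.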

The two steps of a procedure call

1. Build the argument list on the stack. The first argument passed to the called function is pushed last (arguments are pushed from right to left). This is what allows C to pass a variable number of arguments, which can then be popped from the stack one after another.

2. Execute call, which pushes the current value of the instruction pointer (the address of the instruction following call) and transfers control to the called function. The called procedure is responsible for managing the frame pointer ebp, which requires the following steps:

    • Push the previous frame pointer onto the stack, so the stack pointer moves down.

    • Copy the current value of the stack pointer into the frame pointer, marking the start of the stack area of the currently executing function.

    • Execute the code of the current function.

    • At the end of the function, the old frame pointer is at the bottom of the stack. Its value is popped from the stack into the frame pointer register (ebp), which then points to the start of the stack area of the previous function again. The return address that call pushed for the current function is now at the bottom of the stack.

    • Execute ret, which pops the return address from the stack. The CPU branches to the return address, and control returns to the calling function.
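The steps above can be traced in a small C model, which also shows why the saved frame pointer makes call-stack backtracking possible. It is a sketch only: the Cpu type, the 128-byte stack, the use of 0 as the "no caller" marker, and every function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128u                 /* toy stack size (an assumption) */

typedef struct {
    uint32_t mem[STACK_BYTES / 4];       /* one element per 4-byte stack slot */
    uint32_t esp, ebp, eip;              /* esp and ebp are byte offsets into mem */
} Cpu;

void cpu_init(Cpu *c)
{
    for (uint32_t i = 0; i < STACK_BYTES / 4; i++)
        c->mem[i] = 0;
    c->esp = STACK_BYTES;                /* empty stack */
    c->ebp = 0;                          /* 0 marks "no caller frame" in this model */
    c->eip = 0;
}

/* Translate a byte offset into a pointer to its 4-byte slot. */
uint32_t *slot(Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &c->mem[addr / 4];
}

void push(Cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }
uint32_t pop(Cpu *c) { uint32_t v = *slot(c, c->esp); c->esp += 4; return v; }

/* Step 1 and call: push arguments right to left, push the return address, jump. */
void call_with_args(Cpu *c, uint32_t target, uint32_t return_addr,
                    uint32_t arg1, uint32_t arg2)
{
    push(c, arg2);
    push(c, arg1);
    push(c, return_addr);
    c->eip = target;
}

/* Callee entry: save the old frame pointer, then start a new frame. */
void prologue(Cpu *c) { push(c, c->ebp); c->ebp = c->esp; }

/* Argument n (0-based) sits above the saved ebp and the return address. */
uint32_t arg(Cpu *c, uint32_t n) { return *slot(c, c->ebp + 8 + 4 * n); }

/* Callee exit: drop any locals, restore the old frame pointer, then ret. */
void epilogue(Cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
    c->eip = pop(c);
}

/* Follow the chain of saved frame pointers and collect return addresses. */
unsigned walk_frames(Cpu *c, uint32_t *out, unsigned max)
{
    unsigned n = 0;
    uint32_t fp = c->ebp;
    while (fp != 0 && n < max) {
        out[n++] = *slot(c, fp + 4);     /* return address sits at ebp+4 */
        fp = *slot(c, fp);               /* saved ebp of the caller sits at ebp+0 */
    }
    return n;
}
```

In this model the arguments stay on the stack after epilogue returns, which matches a convention in which the caller removes them.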

A concrete C example

At first glance, this approach seems a bit confusing, so let's start with a simple C example:
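The listing that belongs here did not survive in this copy. A minimal sketch consistent with the walkthrough that follows (two int locals a = 3 and b = 4, a call to add(), and a third local named ret) might look like the code below. The function name caller is a hypothetical stand-in for main(), and the printf line is an assumption.

```c
#include <stdio.h>

/* Callee: adds its two int parameters and returns the sum. */
int add(int a, int b)
{
    return a + b;
}

/* Stand-in for main() in the walkthrough (the name is hypothetical). */
int caller(void)
{
    int a, b;
    a = 3;
    b = 4;
    int ret = add(a, b);           /* return value is stored in a third local */
    printf("Result: %d\n", ret);
    return ret;
}
```

Compiling such a file with gcc -S on an IA-32 target produces an assembly file of the kind analyzed in this article, although the exact instructions and offsets depend on the compiler version and options.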

On IA-32 systems, the assembly code shown here is written in AT&T notation.

The following five rules summarize enough of the assembly syntax.

1. Registers are referenced by prefixing their names with a percent sign (%). Example: to use the eax register, write %eax in the assembly code. (In inline assembly within C, the C code must specify two percent signs so that a single percent sign appears in the output passed to the assembler.)

2. The source operand is always specified before the destination operand. For example, in a mov statement, mov a, b copies the value in register a to register b.

3. The operand size is specified by a suffix on the instruction: b stands for byte, w for word, and l for long. On IA-32, moving a long from the eax register to the ebx register is written movl %eax, %ebx.

4. An indirect memory reference (pointer dereference) requires the register to be enclosed in parentheses. Example: movl (%eax), %ebx copies the long stored at the memory address held in eax into the ebx register.

5. offset(register) specifies that the offset is added to the value of the register to form the address. Example: 8(%eax) specifies that eax+8 is used as the operand address. This notation is used mainly for memory access, such as specifying an offset from the stack pointer or frame pointer to reach particular local variables.
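Rules 4 and 5 correspond closely to pointer operations in C. The sketch below illustrates that correspondence; the function names are hypothetical, and the C parameter eax simply plays the role of the register holding an address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* movl (%eax), %ebx : load the 32-bit value at the address held in eax. */
uint32_t load_indirect(const unsigned char *eax)
{
    uint32_t ebx;
    memcpy(&ebx, eax, sizeof ebx);          /* memcpy avoids alignment and aliasing issues */
    return ebx;
}

/* movl offset(%eax), %ebx : load the 32-bit value at address eax + offset.
   The offset is in bytes and may be negative, as in -8(%ebp). */
uint32_t load_with_offset(const unsigned char *eax, ptrdiff_t offset)
{
    uint32_t ebx;
    memcpy(&ebx, eax + offset, sizeof ebx);
    return ebx;
}
```

For an array of four 32-bit values, a byte offset of 8 from the start selects the third element, which is the same arithmetic the compiler performs when it addresses a local variable relative to ebp.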

Let's analyze the main.s assembly code:

1. Start the analysis with the main function. On IA-32, the ebp register is used as the frame pointer (the top of the stack frame). pushl %ebp pushes the value of the ebp register onto the lowest position of the system stack, which moves the stack pointer down by 4 bytes, because a pointer occupies 4 bytes on IA-32 (the l suffix in pushl denotes a long in AT&T assembly).

2. Line 3: movl %esp, %ebp copies the value of the esp (stack pointer) register into the ebp (frame pointer) register, so the current stack pointer becomes the frame pointer of this function.

3. Line 4: subl $24, %esp subtracts 0x18 bytes from the stack pointer, moving it down and growing the stack by 0x18 = 24 bytes;

    • This adjusts the stack pointer to reserve space for local variables. Local variables must be placed on the stack. In the C code, the two local variables a and b are both integers, each requiring 4 bytes of memory.

    • Because the first 4 bytes of the frame hold the old value of the frame pointer (from the previous activation record), the compiler assigns two of the following 4-byte slots to the two local variables.

    • ebp-0xc holds local variable a, with value 3; ebp-0x8 holds local variable b, with value 4 (b, declared later, sits at the higher address, mirroring the right-to-left order in which arguments are pushed).

4. Lines 5 and 6: movl $0x3, -0xc(%ebp) and movl $0x4, -0x8(%ebp). To store initial values in the allocated memory (corresponding to the initialization of the local variables in C), the compiler uses the processor's pointer dereference facility. The first instruction tells the processor to treat "frame pointer minus 12" as a location in memory and uses mov to write the value 3 to that location.

    • The compiler then handles the second local variable in the same way, at a slightly higher position on the stack, ebp-0x8 (ebp minus 8 bytes), with a value of 4.

5. Lines 7 and 8 set up the second argument (b); lines 9 and 10 set up the first argument (a): movl -8(%ebp), %eax; movl %eax, 4(%esp); movl -12(%ebp), %eax; movl %eax, (%esp)

    • Local variables a and b must be passed as arguments to the add procedure that is about to be called. The compiler builds the argument list by placing the appropriate values at the end of the stack.

    • As mentioned earlier, the first argument is at the lowest position. The stack pointer is used to find the end of the stack.

    • The corresponding memory locations are addressed by pointer dereference. The values of the two local variables are read from the stack into the eax register, and the value of eax is then written to the corresponding position in the argument list.

6. Consider the state of the stack before and after the call to add(). The add() function can now be invoked with the call instruction. call pushes eip (the instruction pointer register) onto the stack, and execution resumes at the beginning of the add routine.

    • Following the calling convention, the routine first pushes the previous frame pointer (ebp) onto the stack and then assigns the stack pointer (esp) to the frame pointer (ebp).

    • The procedure's parameters can be located relative to the frame pointer (ebp). The compiler knows that the parameters lie at the end of the calling function's activation record, and that two 4-byte values (the return address and the old frame pointer) are stored at the beginning of the current activation record. The parameters can therefore be accessed at ebp+8 and ebp+12.

    • The add instruction performs the addition, using the eax register as the workspace. The result is left in that register so that it can be passed back to the calling function (here, main()).

    • Returning to the calling function requires two actions: (a) pop restores the saved frame pointer from the stack into the ebp register, so the top of the stack frame reverts to main()'s setting; (b) ret pops the return address from the stack into the eip (instruction pointer) register, and control transfers to that address.

7. Because main() uses another local variable (ret) to store the return value of add(), the value of the eax register must be copied to ret's location on the stack.
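The whole walkthrough can be replayed in C on a toy stack, with one statement per instruction. This is a sketch under assumptions: the 256-byte stack, the placeholder return address 0x1234, the two instructions in the body of add(), and the choice of ebp-4 as the slot for ret are not given in the text; the other offsets are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_TOP = 256 };                 /* toy stack: byte offsets 0..255 */

uint32_t mem[STACK_TOP / 4];              /* one element per 4-byte slot */
uint32_t esp, ebp, eax;                   /* modeled registers */

/* Translate a byte offset into a pointer to its 4-byte slot. */
uint32_t *at(uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_TOP);
    return &mem[addr / 4];
}

void push(uint32_t v) { esp -= 4; *at(esp) = v; }
uint32_t pop(void) { uint32_t v = *at(esp); esp += 4; return v; }

/* Model of add(): prologue, body, epilogue. */
void sim_add(void)
{
    push(ebp);                            /* pushl %ebp */
    ebp = esp;                            /* movl %esp, %ebp */
    eax = *at(ebp + 12);                  /* load b (assumed: movl 12(%ebp), %eax) */
    eax += *at(ebp + 8);                  /* add a  (assumed: addl 8(%ebp), %eax) */
    ebp = pop();                          /* popl %ebp */
    (void)pop();                          /* ret: the popped address would go to eip */
}

/* Model of main() up to the point where ret is stored. */
uint32_t sim_main(void)
{
    esp = STACK_TOP;
    ebp = 0;
    push(ebp);                            /* pushl %ebp */
    ebp = esp;                            /* movl %esp, %ebp */
    esp -= 24;                            /* subl $24, %esp */
    *at(ebp - 12) = 3;                    /* movl $0x3, -0xc(%ebp)   a */
    *at(ebp - 8) = 4;                     /* movl $0x4, -0x8(%ebp)   b */
    eax = *at(ebp - 8);                   /* movl -8(%ebp), %eax */
    *at(esp + 4) = eax;                   /* movl %eax, 4(%esp)      second argument */
    eax = *at(ebp - 12);                  /* movl -12(%ebp), %eax */
    *at(esp) = eax;                       /* movl %eax, (%esp)       first argument */
    push(0x1234);                         /* call add: push placeholder return address */
    sim_add();
    *at(ebp - 4) = eax;                   /* store result in ret (slot ebp-4 is assumed) */
    return *at(ebp - 4);
}
```

Calling sim_main() returns 7, and afterwards esp and ebp hold the values main() had before the call, which is the invariant the prologue and epilogue exist to maintain.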

Summary


enter instruction
In AT&T assembly, enter is equivalent to the following instructions:
pushl %ebp        # push %ebp onto the stack
movl %esp, %ebp   # save %esp in %ebp; this is the standard start of a function
leave instruction
In AT&T assembly, leave is equivalent to the following instructions:
movl %ebp, %esp
popl %ebp
call instruction
In AT&T assembly, call foo (foo is a label) is conceptually equivalent to the following instructions:
pushl %eip
movl foo, %eip
ret instruction
In AT&T assembly, ret is conceptually equivalent to the following instruction:
popl %eip

(Personal understanding) Assembly can be summed up in one sentence: it moves data between registers, or between registers and memory. In other words, data flows back and forth between memory and registers, and the more frequent the flow, the more complex the program, as in large software such as Office.

Analysis from the C language level:

ebp-xx is generally a local variable.

ebp+xx is generally a parameter.

ebp+4 is the return address. It is a strategic high ground: many attacks target this location, and antivirus software scans it closely.

The space allocated on a C function's stack is not zeroed, so local variables must be initialized explicitly when writing C code.

How parameters are passed, and in what order, is not fixed (it depends on the function calling convention).

The difference between registers and memory:

Registers are located inside the CPU and are fast, but more expensive.

Memory is relatively slow but cheap, so its capacity can be very large.

There is no essential difference between registers and memory: both are containers for storing data, and both have fixed widths.

The 8 general-purpose registers: eax, ecx, edx, ebx, esp, ebp, esi, edi.

Several units of measurement commonly used in computing: byte = 8 bits; word = 16 bits; dword (double word) = 32 bits.

Memory is so large that its units cannot each be given a name, so they are identified by numbers (addresses) instead.

We describe a CPU as 32-bit or 64-bit. Many books say that a computer is called 32-bit because its registers are 32 bits wide. This is not accurate, because many registers are wider than 32 bits.

Original Link: http://blog.tingyun.com/web/article/detail/1132
