This article examines the assembly code that compilers generate for the main function. For simplicity, the main function has an empty body. (Note: the following content is from a Sina Blog post.)
The experiment works as follows. First, compile the same source code in different environments and collect the resulting executables. Then disassemble them with IDA Pro (version 5.5; a nod here to the powerful IDA!). Finally, examine and compare the assembly code of the main function (the Windows listings use Intel syntax).
This article focuses on the most basic concepts, so that readers become familiar with the assembly code generated in various environments and can do binary analysis more effectively. Note that at the C language level main is the program's entry point, but in an actual executable the first instruction the CPU executes is usually not the first instruction of main: the C runtime's startup code runs first and then calls main. Here we analyze only the assembly code of main and ignore the rest of the executable.
The simple main function:
int main(){ return 0;}
Under Win7 + VS2008 + Release
--- d:\coding\helloworld\testc\main.c ------------------------------------------
int main(){
    return 0;
00231000  xor         eax,eax
}
00231002  ret
As the listing shows, the Release-mode main is very simple.
The first instruction, xor eax, eax, XORs EAX with itself. Since any value XORed with itself is 0, this is a common idiom for setting a register to 0 (it is also compact: in the listing it occupies only the 2 bytes from 00231000 to 00231002, whereas mov eax, 0 takes 5). By convention a function returns its value in EAX (AX in 16-bit code), so this instruction implements the return 0; statement. The second instruction, ret (which IDA displays as retn), is a near return: it pops the return address from the stack into EIP. By contrast, the far return retf pops EIP first and then CS. (In fact, on modern operating systems each process has its own logical address space; the segment registers are set and fixed by the operating system, so far returns and related instructions are rarely used.) In MASM source, a plain RET inside a procedure declared with the PROC directive is assembled as a near or a far return according to the procedure's type (the directive itself, of course, is not visible in the executable).
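The two instructions can be modeled in plain C. The following is a toy simulation under stated assumptions, not real CPU code: the `Cpu` struct, the `cpu_push`/`cpu_pop` helpers, and the instruction functions are hypothetical names, and the stack is a small array that grows downward, with ESP holding a byte offset into it.

```c
#include <stdint.h>

/* Toy x86-32 state (hypothetical): ESP is a byte offset into mem[],
   and the stack grows toward lower offsets, as on x86. */
typedef struct {
    uint32_t eax, eip, esp;
    uint16_t cs;
    uint32_t mem[64];                 /* 256 bytes of simulated stack */
} Cpu;

static void cpu_push(Cpu *c, uint32_t v) {
    c->esp -= 4;                      /* make room, then store */
    c->mem[c->esp / 4] = v;
}

static uint32_t cpu_pop(Cpu *c) {
    uint32_t v = c->mem[c->esp / 4];  /* load, then release the slot */
    c->esp += 4;
    return v;
}

/* xor eax, eax: x ^ x == 0 for every x, so EAX becomes 0 */
static void xor_eax_eax(Cpu *c) { c->eax ^= c->eax; }

/* retn (near return): pop the return address into EIP */
static void retn(Cpu *c) { c->eip = cpu_pop(c); }

/* retf (far return): pop EIP first, then CS (a far call pushed CS, then EIP) */
static void retf(Cpu *c) {
    c->eip = cpu_pop(c);
    c->cs = (uint16_t)cpu_pop(c);     /* in 32-bit mode CS occupies a 4-byte slot */
}
```

For example, pushing a return address and then calling `retn` leaves that address in `eip` and ESP back where it started, while `retf` consumes two stack slots instead of one.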
Under Win7 in Debug mode, the situation is much more complicated.
--- d:\coding\helloworld\testc\main.c ------------------------------------------
int main(){
00331370  push        ebp
00331371  mov         ebp,esp
00331373  sub         esp,0C0h
00331379  push        ebx
0033137a  push        esi
0033137b  push        edi
0033137c  lea         edi,[ebp-0C0h]
00331382  mov         ecx,30h
00331387  mov         eax,0CCCCCCCCh
0033138c  rep stos    dword ptr es:[edi]
    return 0;
0033138e  xor         eax,eax
}
00331390  pop         edi
00331391  pop         esi
00331392  pop         ebx
00331393  mov         esp,ebp
00331395  pop         ebp
00331396  ret
--- No source file -------------------------------------------------------------
VC2010 Debug
Next, let us look at the assembly code of the executable generated in VC2010 Debug mode. As you can see, it is much more complicated than the Release version. The difference arises because a Debug build is unoptimized and adds debugging aids, whereas a Release build optimizes the code.
The following briefly explains what the code does:
Because main is also a function, it goes through the same steps as any other function: the arguments are passed before the call (there are none in this example), space for local variables is allocated on entry, and that space is released on exit. This is where the concept of a stack frame (also called an activation record) comes in: it is the stack space reserved for the passed arguments, the subroutine's return address, the local variables, and the saved registers. A stack frame is bounded at its two ends by two pointers. The register EBP acts as the frame pointer and marks the bottom (base) of the stack frame; it is loaded from the stack pointer at function entry and stays unchanged while the function runs, so when the call ends the frame can be released using its value. The register ESP is the stack pointer of the running stack; it marks the top of the stack frame and can change at run time.
The first instruction, push ebp, saves the value of EBP, because EBP is about to be used as the frame pointer. The second, mov ebp, esp, copies the current stack pointer into EBP, making it the frame pointer; from this point on, EBP serves as the base address for accessing the subroutine's parameters. The third, sub esp, 0C0h, grows the stack (and thus the stack frame) by 0C0h bytes without filling in any content yet. This is normally done to reserve space for local variables. But there are no local variables here, so what are the 0C0h bytes used for? I will explain that later.
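The three prologue instructions can be replayed on a simulated stack. This is a minimal sketch under stated assumptions: `Frame`, `push32`, `load32`, and `prologue` are hypothetical names, ESP and EBP are byte offsets into an array, and the caller's `call` is modeled as a single push of the return address.

```c
#include <stdint.h>

/* Hypothetical toy model of the prologue; ESP/EBP are byte offsets into mem[]. */
typedef struct {
    uint32_t ebp, esp;
    uint32_t mem[256];                /* 1 KB simulated stack */
} Frame;

static void push32(Frame *f, uint32_t v) {
    f->esp -= 4;
    f->mem[f->esp / 4] = v;
}

static uint32_t load32(const Frame *f, uint32_t addr) { return f->mem[addr / 4]; }

/* push ebp / mov ebp, esp / sub esp, local_size */
static void prologue(Frame *f, uint32_t local_size) {
    push32(f, f->ebp);                /* save the caller's frame pointer */
    f->ebp = f->esp;                  /* EBP now marks the base of this frame */
    f->esp -= local_size;             /* reserve the area; its contents are not written */
}
```

After `prologue(&f, 0xC0)` the model has EBP - ESP = 0C0h, the saved EBP at [ebp], and the return address at [ebp+4]; an argument pushed by the caller would therefore sit at [ebp+8].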
By convention, EAX, EDX, and ECX are caller-saved, which means a function may use these three registers freely. EBX, ESI, and EDI are callee-saved: a function must save their original values on the stack before using them. That is why the next three instructions push these three registers onto the stack.
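The callee-saved rule can be illustrated with a small model. This sketch assumes hypothetical names (`Regs`, `push32`, `pop32`, `callee`) and tracks only EBX, ESI, EDI, and ESP; it shows why the pops must run in the reverse order of the pushes.

```c
#include <stdint.h>

/* Hypothetical model of a callee that follows the convention above. */
typedef struct {
    uint32_t ebx, esi, edi, esp;
    uint32_t mem[16];                 /* 64-byte simulated stack */
} Regs;

static void push32(Regs *r, uint32_t v) { r->esp -= 4; r->mem[r->esp / 4] = v; }
static uint32_t pop32(Regs *r) { uint32_t v = r->mem[r->esp / 4]; r->esp += 4; return v; }

static void callee(Regs *r) {
    /* entry: push ebx / push esi / push edi */
    push32(r, r->ebx);
    push32(r, r->esi);
    push32(r, r->edi);

    /* body: free to overwrite the saved registers */
    r->ebx = r->esi = r->edi = 0xFFFFFFFFu;

    /* exit: pop edi / pop esi / pop ebx, the reverse order (LIFO) */
    r->edi = pop32(r);
    r->esi = pop32(r);
    r->ebx = pop32(r);
}
```

However the body uses the three registers, the caller sees its original values and an unchanged ESP once `callee` returns.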
The next instructions exist for debugging. lea edi, [ebp-0C0h] (shown by IDA as lea edi, [ebp+var_C0]) loads an address into EDI: the lowest address of the 0C0h-byte area just reserved. Then ECX is set to 30h and EAX to 0CCCCCCCCh. Finally, rep stosd repeats stosd ECX (30h) times. Each stosd copies the value of EAX (0CCCCCCCCh) to the memory at ES:EDI and then advances EDI by 4. Taken together, these instructions set every byte of the ECX * 4 bytes starting at ES:EDI to 0CCh, that is, they fill the whole 0C0h-byte area (30h * 4 = 0C0h).
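The effect of rep stosd can be emulated with a loop. This is a sketch assuming the direction flag is clear (so EDI moves upward); the function name `rep_stosd` is hypothetical, and EDI is represented by a byte pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emulation of "rep stosd" with the direction flag clear:
   while ECX != 0, store EAX at [EDI], add 4 to EDI, subtract 1 from ECX. */
static uint8_t *rep_stosd(uint8_t *edi, uint32_t eax, uint32_t ecx) {
    while (ecx != 0) {
        memcpy(edi, &eax, sizeof eax);  /* 0xCCCCCCCC is CC CC CC CC in any byte order */
        edi += 4;
        ecx--;
    }
    return edi;                         /* final EDI: one past the filled area */
}
```

Calling `rep_stosd(area, 0xCCCCCCCCu, 0x30)` writes exactly 30h * 4 = 0C0h bytes of 0CCh, the same size reserved by sub esp, 0C0h.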
This helps debugging: 0CCh is the machine code of the INT 3 instruction, which invokes the interrupt 3 handler and thereby produces a breakpoint. To see the effect in practice, try the following code:
int main()
{
    /* MSVC 32-bit (x86) inline assembly; x64 MSVC does not support __asm */
    __asm int 3;
    return 0;
}
The point of filling a large area with breakpoints is this: if the program has a bug, it may mistakenly execute the contents of this area, and because the area holds 0CCh bytes, execution immediately hits a breakpoint and an error is reported at run time, which makes the bug easier to find. Put simply, traps are laid around the useful data on the stack; a correctly running program never steps into them.
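The trap idea can be sketched with a toy fetch loop. This is an illustration under loud assumptions, not a real x86 decoder: `run_until_int3` is a hypothetical function that treats every byte as a one-byte instruction and stops at the first 0CCh. It shows that a stray jump landing anywhere in a 0CCh-filled area faults on the very first byte fetched.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy fetch loop (hypothetical, not a real x86 decoder): starting at offset
   `eip`, treat every byte as a one-byte instruction and stop at the first
   0xCC, the opcode of INT 3. Returns the trap offset, or -1 if none. */
static long run_until_int3(const uint8_t *mem, size_t len, size_t eip) {
    for (; eip < len; eip++) {
        if (mem[eip] == 0xCC)
            return (long)eip;   /* breakpoint: a debugger would take over here */
    }
    return -1;                  /* ran off the end without trapping */
}
```

With a 0C0h-byte buffer filled with 0CCh, a jump to offset 40 traps at offset 40 itself; in a buffer without 0CCh bytes the loop runs off the end, which mirrors how a bad jump into unfilled memory can go unnoticed for a while.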
After this comes the xor eax, eax introduced earlier. If main contained other statements, their compiled code would sit between rep stosd and xor eax, eax. Then the values of the EDI, ESI, and EBX registers are restored, in the reverse of the order in which they were pushed.
At this point the function has essentially finished, and the stack frame opened earlier has served its purpose. mov esp, ebp releases the frame by moving ESP back to where the saved EBP is stored; pop ebp then restores the caller's EBP; finally ret pops the return address, leaving ESP as it was before the call, and the whole process is over.
In addition, because stack frames are created and released so often, Intel provides two shorthand instructions, enter and leave. enter imm, 0 is equivalent to push ebp; mov ebp, esp; sub esp, imm, and leave is equivalent to mov esp, ebp; pop ebp.
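The whole call, from the caller's push of the return address to the final ret, can be replayed on a toy machine to check that the epilogue undoes the prologue. This is a sketch under stated assumptions: `Cpu`, `push32`, `pop32`, and `call_debug_main` are hypothetical names, the 0CCh fill is omitted, and the values of EBX/ESI/EDI are not tracked.

```c
#include <stdint.h>

/* Hypothetical toy machine; ESP/EBP/EIP are plain integers, the stack is mem[]. */
typedef struct {
    uint32_t ebp, esp, eip;
    uint32_t mem[128];                  /* 512-byte simulated stack */
} Cpu;

static void push32(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
static uint32_t pop32(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }

/* Replays the Debug-build main from the listing, minus the 0CCh fill. */
static void call_debug_main(Cpu *c, uint32_t return_addr) {
    push32(c, return_addr);             /* call main */
    push32(c, c->ebp);                  /* push ebp */
    c->ebp = c->esp;                    /* mov ebp, esp */
    c->esp -= 0xC0;                     /* sub esp, 0C0h */
    push32(c, 0); push32(c, 0); push32(c, 0);       /* push ebx / esi / edi (values not tracked) */
    (void)pop32(c); (void)pop32(c); (void)pop32(c); /* pop edi / esi / ebx */
    c->esp = c->ebp;                    /* mov esp, ebp: drop the whole frame at once */
    c->ebp = pop32(c);                  /* pop ebp: caller's frame pointer */
    c->eip = pop32(c);                  /* ret: return address into EIP */
}
```

After `call_debug_main` returns, ESP and EBP hold exactly the values they had before the call, and EIP holds the return address, however large the reserved area was.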
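Under GCC
Below is the assembly that GCC generates for the same simple main function, produced with:
[lrao@localhost UNIX]$ gcc test5.c -S -o t.s
Note that gcc -S emits AT&T syntax by default: registers carry a % prefix and the source operand comes first, so movl %esp, %ebp is the same instruction as the Intel-syntax mov ebp, esp.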
    .file   "test5.c"
    .text
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    movl    $0, %eax
    popl    %ebp
    ret
    .size   main, .-main
    .ident  "GCC: (GNU) 4.4.6 20120305 (Red Hat 4.4.6-4)"
    .section    .note.GNU-stack,"",@progbits
Summary
This article has shown the assembly code of the main function in several common environments and analyzed it briefly. In fact, once you fully understand stack frames, you will be able to cut through the clutter of whatever function assembly code you meet in the future and find the key parts, which is exactly the purpose of this article. After further study, I will try to fully dissect the contents of executable files on each operating system, not just an empty main function.
I found a Windows tool on the Internet and used it to look up the instruction corresponding to the binary code.
Reference: http://blog.sina.com.cn/s/blog_55a8a96d0100lib3.html