On the Linux platform, you can use GDB for disassembly and debugging.
2. Analyzing a simple C program
To keep the problem simple, analyze the assembly code generated from the simplest possible C program:
# vi test1.c
int main()
{
    return 0;
}
Compile the program to generate a binary file:
# gcc test1.c -o test1
# file test1
test1: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped
test1 is a 32-bit, little-endian executable in ELF format. It is dynamically linked, and its symbol table has not been stripped.
This is the typical executable file format on Unix/Linux platforms.
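The "ELF 32-bit LSB" part of the `file` output comes from the identification bytes at the very start of the file. A minimal sketch of that check, using the e_ident index and value constants from the ELF specification (EI_CLASS at index 4, EI_DATA at index 5); `elf_kind` is a hypothetical helper, not part of file(1) or any library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* e_ident indexes and values as defined by the ELF specification. */
enum {
    EI_CLASS_IDX = 4,   /* EI_CLASS: file class (32-bit or 64-bit) */
    EI_DATA_IDX  = 5,   /* EI_DATA: data encoding (byte order)      */
    CLASS_32     = 1,   /* ELFCLASS32                               */
    CLASS_64     = 2,   /* ELFCLASS64                               */
    DATA_LSB     = 1,   /* ELFDATA2LSB: little endian               */
    DATA_MSB     = 2    /* ELFDATA2MSB: big endian                  */
};

/* Summarize the first bytes of a file roughly the way `file` does.
   Returns a string literal; "not ELF" if the magic number is missing. */
const char *elf_kind(const unsigned char *ident, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(ident != NULL);
    if (len < 6 || memcmp(ident, magic, 4) != 0)
        return "not ELF";

    int is32 = ident[EI_CLASS_IDX] == CLASS_32;
    int is64 = ident[EI_CLASS_IDX] == CLASS_64;
    int lsb  = ident[EI_DATA_IDX] == DATA_LSB;
    int msb  = ident[EI_DATA_IDX] == DATA_MSB;

    if (is32 && lsb) return "ELF 32-bit LSB";
    if (is32 && msb) return "ELF 32-bit MSB";
    if (is64 && lsb) return "ELF 64-bit LSB";
    if (is64 && msb) return "ELF 64-bit MSB";
    return "ELF (unknown class/encoding)";
}
```

Feeding it the first bytes of test1 (for example, read with fread) should yield "ELF 32-bit LSB".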
Disassembling it with MDB shows the generated assembly code:
# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis  ; disassemble the main function. The MDB command format is <address>::dis
main: pushl %ebp  ; push the contents of EBP, i.e., save the stack base address of main's caller
main+1: movl %esp, %ebp  ; copy ESP into EBP, setting up main's stack base address
main+3: subl $8, %esp
main+6: andl $0xf0, %esp
main+9: movl $0, %eax
main+0xe: subl %eax, %esp
main+0x10: movl $0, %eax  ; set the function return value to 0
main+0x15: leave  ; copy EBP into ESP, then pop the caller's saved stack base address into EBP, restoring the original stack base
main+0x16: ret  ; main returns to its caller
>
Note: The assembly syntax shown here is very different from the syntax used in Intel's manuals, because Unix/Linux tools use the AT&T assembly syntax.
For details about AT&T assembly, refer to the Linux AT&T Assembly Language Development Guide.
Q: Who calls the main function?
At the C language level, main is the program's initial entry point. In fact, the entry point of an ELF executable is not main but _start.
MDB can also disassemble _start:
> _start::dis  ; disassemble starting from the address of _start
_start: pushl $0
_start+2: pushl $0
_start+4: movl %esp, %ebp
_start+6: pushl %edx
_start+7: movl $0x80504b0, %eax
_start+0xc: testl %eax, %eax
_start+0xe: je +0xf <_start+0x1d>
_start+0x10: pushl $0x80504b0
_start+0x15: call -0x75 <atexit>
_start+0x1a: addl $4, %esp
_start+0x1d: movl $0x8060710, %eax
_start+0x22: testl %eax, %eax
_start+0x24: je +7 <_start+0x2b>
_start+0x26: call -0x86 <atexit>
_start+0x2b: pushl $0x80506cd
_start+0x30: call -0x90 <atexit>
_start+0x35: movl +8(%ebp), %eax
_start+0x38: leal +0x10(%ebp,%eax,4), %edx
_start+0x3c: movl %edx, 0x8060804
_start+0x42: andl $0xf0, %esp
_start+0x45: subl $4, %esp
_start+0x48: pushl %edx
_start+0x49: leal +0xc(%ebp), %edx
_start+0x4c: pushl %edx
_start+0x4d: pushl %eax
_start+0x4e: call +0x152 <_init>
_start+0x53: call -0xa3 <__fpstart>
_start+0x58: call +0xfb <main>  ; main is called here
_start+0x5d: addl $0xc, %esp
_start+0x60: pushl %eax
_start+0x61: call -0xa1 <exit>
_start+0x66: pushl $0
_start+0x68: movl $1, %eax
_start+0x6d: lcall $7, $0
_start+0x74: hlt
>
Q: Why is the eax register used to hold the function return value?
In fact, IA-32 itself does not specify which register holds a return value. However, if you disassemble Solaris/Linux binaries, you will find that function return values are placed in eax.
This is not accidental: it is determined by the operating system's ABI (Application Binary Interface).
The ABI of the Solaris/Linux operating systems is the System V ABI.
Concept: SFP (stack frame pointer)
To understand the SFP correctly, you need to understand:
The concept of the IA-32 stack
The roles of the 32-bit ESP and EBP registers in the CPU
How the push/pop instructions affect the stack
How call, ret, leave and similar instructions affect the stack
As we know:
1) The IA-32 stack stores temporary data and is LIFO (last in, first out). The stack grows from high addresses toward low addresses and is byte-addressed.
2) EBP is the stack base pointer and always points to the base (high-address end) of the current function's stack frame; ESP is the stack pointer and always points to the top of the stack (low-address end).
3) Pushing a 32-bit long value decrements ESP by 4 and stores the value byte by byte, from the most significant byte to the least significant byte, at addresses ESP-1, ESP-2, ESP-3 and ESP-4 (relative to the old ESP).
4) Popping a long value is the reverse of push: the 4 bytes at ESP, ESP+1, ESP+2 and ESP+3 are loaded into a 32-bit register and ESP is incremented by 4.
5) The call instruction calls a function or procedure. It pushes the address of the next instruction onto the stack so that execution can resume there when the callee returns.
6) The ret instruction returns from a function or procedure. It pops the return address saved by the preceding call into the EIP register, so the program continues with the instruction that follows the call.
7) enter creates the stack frame of the current function; enter 0, 0 is equivalent to the following two instructions:
pushl %ebp
movl %esp, %ebp
8) leave releases the stack frame of the current function or procedure; it is equivalent to the following two instructions:
movl %ebp, %esp
popl %ebp
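The eight rules above can be checked against a toy model. A minimal sketch, assuming a 256-byte simulated stack and a CPU with only ESP, EBP and EIP; the `Cpu` type and the function names are hypothetical, and instruction addresses are plain numbers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256   /* simulated stack memory, addresses 0..255 */

/* Teaching model only: no general registers, no real instructions. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp, ebp, eip;
} Cpu;

void cpu_init(Cpu *c) { memset(c, 0, sizeof *c); c->esp = MEM_SIZE; }

/* push: ESP -= 4, then store the value little endian
   (least significant byte at the lowest address). */
void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);                    /* stack overflow check  */
    c->esp -= 4;
    for (int i = 0; i < 4; i++)
        c->mem[c->esp + i] = (uint8_t)(v >> (8 * i));
}

/* pop: load the 4 bytes at ESP..ESP+3, then ESP += 4. */
uint32_t pop(Cpu *c)
{
    assert(c->esp + 4 <= MEM_SIZE);         /* stack underflow check */
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)c->mem[c->esp + i] << (8 * i);
    c->esp += 4;
    return v;
}

/* call: push the address of the next instruction, then jump. */
void call(Cpu *c, uint32_t next_insn, uint32_t target)
{
    push(c, next_insn);
    c->eip = target;
}

/* ret: pop the saved return address into EIP. */
void ret(Cpu *c) { c->eip = pop(c); }

/* enter 0, 0  ==  pushl %ebp; movl %esp, %ebp */
void enter0(Cpu *c) { push(c, c->ebp); c->ebp = c->esp; }

/* leave  ==  movl %ebp, %esp; popl %ebp */
void leave(Cpu *c) { c->esp = c->ebp; c->ebp = pop(c); }
```

Running call, enter0, a manual `c.esp -= 8` (the `subl $8, %esp` seen in main), then leave and ret restores ESP, EBP and EIP to the caller's values, which is the invariant the prologue/epilogue pair relies on.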
If you disassemble a function, you will often find assembly statements like the following at function entry and return:
pushl %ebp  ; push EBP, i.e., save the stack base address of the calling function
movl %esp, %ebp  ; copy ESP into EBP, setting up this function's stack base address
..........  ; the two instructions above are equivalent to enter 0, 0
...........
leave  ; copy EBP into ESP, then pop the caller's saved stack base address into EBP, restoring the original stack base
ret  ; return to the calling function
These instructions create and release the stack frame of a function or procedure.
The compiler automatically inserts them at function entry and exit.
When a function is called:
1) The saved EIP and EBP form the boundary of the new function's stack frame
When a function is called, the return address (EIP) is pushed onto the stack first. When the stack frame is then created, the caller's EBP is pushed as well; together with the saved EIP, it forms the boundary of the new function's stack frame.
2) EBP becomes the stack frame pointer (SFP), which marks the boundary of the new function's stack frame.
Once the stack frame is established, the stack slot that EBP points to holds the caller's EBP. Consequently, by following EBP you can walk the stack frames of every function in the call chain; debuggers use this property to implement the backtrace feature.
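Following the chain of saved EBP values is how a frame-pointer backtrace can be produced. A minimal sketch over a simulated stack, assuming "addresses" are word indexes, `mem[ebp]` holds the caller's saved EBP, `mem[ebp + 1]` holds the return address pushed by call, and a saved EBP of 0 terminates the chain; `backtrace_walk` is a hypothetical name, not a debugger API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the frame chain starting at the current EBP.
   For each frame, record the return address stored one word above the
   saved EBP, then follow the saved EBP to the caller's frame.
   Returns the number of return addresses written to out[]. */
size_t backtrace_walk(const uint32_t *mem, size_t mem_words,
                      uint32_t ebp, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (ebp != 0 && n < max) {
        assert((size_t)ebp + 1 < mem_words);  /* stay inside the buffer    */
        out[n++] = mem[ebp + 1];              /* this frame's return addr  */
        ebp = mem[ebp];                       /* caller's saved EBP        */
    }
    return n;
}
```

For the A → B → C chain of Figure 1-1, starting from C's EBP yields the return address into B, then into A, then into A's caller, innermost frame first.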
3) ESP always points to the top of the stack and, as the stack pointer, is used to allocate stack space
Space for a function's local variables is usually allocated by subtracting a constant from ESP; for example, allocating one 4-byte integer means ESP - 4 (subl $4, %esp).
4) Function parameters and local variables are accessed through the SFP, i.e., EBP.
Because the stack frame pointer always points to the stack base of the current function, parameters and local variables are usually accessed in the following forms:
+8+xx(%ebp)  ; access to function parameters
-xx(%ebp)  ; access to local variables
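The two addressing forms follow from the frame layout: 0(%ebp) holds the saved EBP and 4(%ebp) the return address, so the first parameter lands at 8(%ebp). A small sketch, assuming every parameter and local is a 4-byte word and that locals are placed consecutively in declaration order (compilers are free to choose a different order); the helper names are hypothetical:

```c
#include <assert.h>

/* Offset from EBP of the i-th parameter, counting from 0:
   skip the saved EBP (4 bytes) and the return address (4 bytes). */
int param_offset(int i)
{
    assert(i >= 0);
    return 8 + 4 * i;
}

/* Offset from EBP of the j-th 4-byte local, counting from 0:
   locals sit below the saved EBP, growing toward lower addresses. */
int local_offset(int j)
{
    assert(j >= 0);
    return -4 * (j + 1);
}
```

Under these assumptions, the third parameter is `16(%ebp)` and the second local is `-8(%ebp)`.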
If function A calls function B and function B calls function C, the stack frames and call relationships are as shown in Figure 1-1:
+----------------------------------------+ ----> high address
| EIP (return address of A's caller)     |
+----------------------------------------+ ---+
| EBP (EBP of A's caller)                |    | <-- EBP of function A (A's SFP)  [a]
+----------------------------------------+    |
| local variables of A                   |    |     A's locals: -offset A(%ebp), via [a]
| ......                                 |    | <-- ESP of A after allocating its locals
+----------------------------------------+    |
| arg n (parameter n of B)               |    |
| ......                                 |    | frame of A
| arg 1 (parameter 1 of B)               |    |
| arg 0 (parameter 0 of B)               |    |     B's parameters: +offset B(%ebp), via [b]
+----------------------------------------+    |
| EIP (return address into A)            |    |
+----------------------------------------+ ---+
| EBP (A's EBP, points to [a])           |    | <-- EBP of function B (B's SFP)  [b]
+----------------------------------------+    |
| local variables of B                   |    |
| ......                                 |    | <-- ESP of B after allocating its locals
+----------------------------------------+    |
| arg n (parameter n of C)               |    |
| ......                                 |    | frame of B
| arg 1 (parameter 1 of C)               |    |
| arg 0 (parameter 0 of C)               |    |
+----------------------------------------+    |
| EIP (return address into B)            |    |
+----------------------------------------+ ---+
| EBP (B's EBP, points to [b])           |    | <-- EBP of function C (C's SFP)
+----------------------------------------+    |
| local variables of C                   |    | frame of C
| ......                                 |    | <-- ESP of C after allocating its locals
+----------------------------------------+ ---+ ----> low address
Figure 1-1
Now analyze the meaning of the remaining instructions in the test1 disassembly:
# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis  ; disassemble the main function
main: pushl %ebp
main+1: movl %esp, %ebp  ; create the stack frame
main+3: subl $8, %esp  ; allocate 8 bytes of stack space (ESP - 8)
main+6: andl $0xf0, %esp  ; align ESP down to a 16-byte boundary (the imm8 0xf0 is sign-extended to 0xfffffff0)
main+9: movl $0, %eax  ; apparently meaningless
main+0xe: subl %eax, %esp  ; apparently meaningless
main+0x10: movl $0, %eax  ; set main's return value
main+0x15: leave  ; tear down the stack frame
main+0x16: ret  ; main returns
>
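The ESP arithmetic of this prologue can be replayed in C. A small sketch, assuming a hypothetical ESP value on entry to main (ESP pointing at the return address pushed by call); `main_prologue_esp` is an illustrative name. It also shows why the movl/subl pair does nothing: subtracting EAX = 0 leaves ESP unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Replay main's prologue from the -O0 listing on a 32-bit ESP value.
   Returns ESP after the last instruction; *ebp_out receives the new EBP. */
uint32_t main_prologue_esp(uint32_t esp_at_entry, uint32_t *ebp_out)
{
    uint32_t esp = esp_at_entry;
    uint32_t eax;

    esp -= 4;              /* pushl %ebp: room for the saved EBP         */
    *ebp_out = esp;        /* movl %esp, %ebp: EBP = new frame base      */
    esp -= 8;              /* subl $8, %esp: reserve 8 bytes             */
    esp &= 0xfffffff0u;    /* andl $0xf0, %esp: round down to 16 bytes   */
    eax = 0;               /* movl $0, %eax                              */
    esp -= eax;            /* subl %eax, %esp: subtracting 0 is a no-op  */

    assert(esp % 16 == 0); /* the stack is now 16-byte aligned           */
    return esp;
}
```

For an entry ESP of 0x08047ff4 the sketch gives EBP = 0x08047ff0 and a final ESP of 0x08047fe0.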
The following two instructions seem meaningless. Are they really?
movl $0, %eax
subl %eax, %esp
Recompile test1.c with GCC's -O2 optimization level:
# gcc -O2 test1.c -o test1
# mdb test1
> main::dis
main: pushl %ebp
main+1: movl %esp, %ebp
main+3: subl $8, %esp
main+6: andl $0xf0, %esp
main+9: xorl %eax, %eax  ; set main's return value: XORing eax with itself sets eax to 0
main+0xb: leave
main+0xc: ret
>
The new disassembly is more concise than the original: the two instructions we considered useless have been optimized away, which confirms that they were indeed meaningless.
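The offsets in the two listings also show how much smaller the optimized code is. A small sketch that derives each instruction's length from consecutive MDB offsets, assuming the final ret is 1 byte (opcode 0xc3); `insn_lengths` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* len[i] = offs[i + 1] - offs[i]; the last instruction has no successor,
   so its length is passed in as last_len. */
void insn_lengths(const unsigned *offs, size_t n, unsigned last_len,
                  unsigned *len)
{
    assert(n > 0);
    for (size_t i = 0; i + 1 < n; i++)
        len[i] = offs[i + 1] - offs[i];
    len[n - 1] = last_len;
}
```

Applied to the two main listings, it gives 5 bytes for `movl $0, %eax` versus 2 bytes for `xorl %eax, %eax`, and a total of 23 bytes for main at the default level versus 13 bytes at -O2. The xorl form works because any value XORed with itself is 0.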