Address: http://www.cppblog.com/liu1061/articles/53762.html
If you are on a Linux platform, you can use GDB for disassembly and debugging. (Reposted)
2. Analysis of the simplest C code
To keep things simple, we analyze the assembly code generated from the simplest possible C program:
# vi test1.c
int main()
{
    return 0;
}
Compile the program to generate a binary file:
# gcc test1.c -o test1
# file test1
test1: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped
test1 is a 32-bit little-endian executable in ELF format. It is dynamically linked, and its symbol table has not been stripped.
This is the typical executable file format on Unix/Linux platforms.
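The "32-bit LSB" part of the file output comes from the first bytes of the ELF header. The following is a minimal sketch, assuming the header bytes are already in memory; the function name elf_is_32bit_lsb and the constant names are made up for this illustration, while the byte offsets and values follow the ELF format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Positions inside the ELF identification bytes (hypothetical names). */
enum { IDX_CLASS = 4, IDX_DATA = 5 };
/* Values at those positions: 1 means 32-bit, 1 means little-endian. */
enum { CLASS_32 = 1, DATA_LSB = 1 };

/* Returns 1 if buf starts with an ELF header that describes a 32-bit
 * little-endian object, and 0 otherwise. */
int elf_is_32bit_lsb(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    if (len < 6 || memcmp(buf, magic, sizeof magic) != 0)
        return 0;
    return buf[IDX_CLASS] == CLASS_32 && buf[IDX_DATA] == DATA_LSB;
}
```

Passing the first six bytes of test1 to this function should return 1, matching what file reports.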
Disassembling with mdb shows the generated assembly code:
# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis ; disassemble the main function. The mdb command format is <address>::dis
main: pushl %ebp ; push EBP onto the stack, saving the stack base address of the upper-level function that called main
main+1: movl %esp,%ebp ; copy ESP into EBP, setting the stack base address of main
main+3: subl $8,%esp
main+6: andl $0xf0,%esp
main+9: movl $0,%eax
main+0xe: subl %eax,%esp
main+0x10: movl $0,%eax ; set the function return value to 0
main+0x15: leave ; copy EBP into ESP, then pop the saved base address of the upper-level function's stack into EBP, restoring the original stack base
main+0x16: ret ; main returns to its caller
>
Note: the assembly syntax shown here differs considerably from the syntax in Intel's manuals. Unix/Linux tools use the AT&T assembly syntax.
For details about AT&T assembly, see the article: Linux AT&T Assembly Language Development Guide
Q: Who calls the main function?
At the C language level, main is the entry point of a program. In fact, the entry point of an ELF executable is not main but _start.
mdb can also disassemble _start:
> _start::dis ; disassemble starting from the address of _start
_start: pushl $0
_start+2: pushl $0
_start+4: movl %esp,%ebp
_start+6: pushl %edx
_start+7: movl $0x80504b0,%eax
_start+0xc: testl %eax,%eax
_start+0xe: je +0xf <_start+0x1d>
_start+0x10: pushl $0x80504b0
_start+0x15: call -0x75 <atexit>
_start+0x1a: addl $4,%esp
_start+0x1d: movl $0x8060710,%eax
_start+0x22: testl %eax,%eax
_start+0x24: je +7 <_start+0x2b>
_start+0x26: call -0x86 <atexit>
_start+0x2b: pushl $0x80506cd
_start+0x30: call -0x90 <atexit>
_start+0x35: movl +8(%ebp),%eax
_start+0x38: leal +0x10(%ebp,%eax,4),%edx
_start+0x3c: movl %edx,0x8060804
_start+0x42: andl $0xf0,%esp
_start+0x45: subl $4,%esp
_start+0x48: pushl %edx
_start+0x49: leal +0xc(%ebp),%edx
_start+0x4c: pushl %edx
_start+0x4d: pushl %eax
_start+0x4e: call +0x152 <_init>
_start+0x53: call -0xa3 <__fpstart>
_start+0x58: call +0xfb <main> ; main is called here
_start+0x5d: addl $0xc,%esp
_start+0x60: pushl %eax
_start+0x61: call -0xa1 <exit>
_start+0x66: pushl $0
_start+0x68: movl $1,%eax
_start+0x6d: lcall $7,$0
_start+0x74: hlt
>
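Read as C, the listing amounts to roughly "compute argc, argv, and envp, call main, and hand its return value to exit". The sketch below is only an approximation under that reading: start_sketch and demo_main are hypothetical names, the atexit, _init, and __fpstart steps are left out, and the status is returned instead of being passed to exit.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*main_fn)(int argc, char **argv, char **envp);

/* Hypothetical stand-in for _start. Returns the status that the real
 * startup code would push and pass to exit. */
int start_sketch(main_fn program_main, int argc, char **argv)
{
    assert(program_main != NULL && argv != NULL);

    /* The environment pointers begin right after argv's terminating
     * NULL, which is what leal +0x10(%ebp,%eax,4),%edx computes. */
    char **envp = &argv[argc + 1];

    /* The real code also registers cleanup handlers with atexit and
     * calls _init and __fpstart before reaching this point. */
    int status = program_main(argc, argv, envp);   /* call main */

    return status;   /* real code: pushl %eax; call exit */
}

/* Hypothetical program entry used to exercise the sketch: returns argc
 * when it can see at least one environment string, and -1 otherwise. */
int demo_main(int argc, char **argv, char **envp)
{
    (void)argv;
    return (envp != NULL && envp[0] != NULL) ? argc : -1;
}
```

Calling start_sketch(demo_main, 2, vec) with vec laid out as two arguments, a NULL, one environment string, and a final NULL yields 2, the value main returned.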
Q: Why is the eax register used to hold the function return value?
In fact, IA-32 itself does not specify which register holds the return value. However, if you disassemble Solaris/Linux binaries, you will find that function return values are always placed in eax.
This is no accident. It is determined by the operating system's ABI (Application Binary Interface).
The ABI of the Solaris/Linux operating systems is the System V ABI.
Concept: SFP (Stack Frame Pointer)
To understand the SFP correctly, you must understand:
the IA-32 stack concept
the role of the 32-bit registers ESP and EBP in the CPU
how the push/pop instructions affect the stack
how call, ret, leave, and related instructions affect the stack
As we know:
1) The IA-32 stack stores temporary data and is LIFO (last in, first out). The stack grows from high addresses toward low addresses, in units of bytes.
2) EBP is the stack base pointer and points to the bottom (high address) of the current function's stack; ESP is the stack pointer and always points to the top of the stack (low address).
3) When a long (32-bit) value is pushed, its bytes are stored from the most significant to the least significant in the address units ESP-1, ESP-2, ESP-3, and ESP-4, and ESP is decreased by 4.
4) Popping a long value reverses the push: the bytes stored at ESP-4, ESP-3, ESP-2, and ESP-1 (addresses relative to ESP before the push) are read from the stack in turn and placed in a 32-bit register, and ESP is increased by 4.
5) The call instruction calls a function or procedure. It pushes the address of the next instruction onto the stack so that execution can resume there when the callee returns.
6) The ret instruction returns from a function or procedure. It pops the return address saved by call from the stack into the EIP register, and the program continues with the instruction that follows the call.
7) enter creates the stack frame of the current function; with operands 0, 0 it is equivalent to the following two instructions:
pushl %ebp
movl %esp,%ebp
8) leave releases the stack frame of the current function or procedure; it is equivalent to the following two instructions:
movl %ebp,%esp
popl %ebp
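The eight rules above can be tried out on a toy model. The following is a minimal sketch, assuming a 64-byte simulated memory; struct machine and every function name here are invented for the illustration and do not come from any real tool.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u

/* A toy IA-32 machine with just enough state to model the stack. */
struct machine {
    uint8_t  mem[STACK_BYTES];   /* simulated memory, addresses 0..63 */
    uint32_t esp, ebp, eip;
};

/* pushl: lower ESP by 4 and store the value little-endian, so the most
 * significant byte ends up at the highest of the four addresses. */
void push32(struct machine *m, uint32_t v)
{
    assert(m->esp >= 4 && m->esp <= STACK_BYTES);
    m->esp -= 4;
    for (unsigned i = 0; i < 4; i++)
        m->mem[m->esp + i] = (uint8_t)(v >> (8 * i));
}

/* popl: read the four bytes at ESP, then raise ESP by 4. */
uint32_t pop32(struct machine *m)
{
    uint32_t v = 0;

    assert(m->esp + 4 <= STACK_BYTES);
    for (unsigned i = 0; i < 4; i++)
        v |= (uint32_t)m->mem[m->esp + i] << (8 * i);
    m->esp += 4;
    return v;
}

/* call: push the address of the next instruction, then jump. */
void do_call(struct machine *m, uint32_t target, uint32_t next_eip)
{
    push32(m, next_eip);
    m->eip = target;
}

/* ret: pop the saved return address into EIP. */
void do_ret(struct machine *m) { m->eip = pop32(m); }

/* Prologue: pushl %ebp; movl %esp,%ebp (same effect as enter 0, 0). */
void do_enter(struct machine *m) { push32(m, m->ebp); m->ebp = m->esp; }

/* leave: movl %ebp,%esp; popl %ebp. */
void do_leave(struct machine *m) { m->esp = m->ebp; m->ebp = pop32(m); }
```

Starting with ESP at 64, a do_call followed by do_enter, one push32 for a local, do_leave, and do_ret brings ESP back to 64 and EIP back to the address passed as next_eip.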
If you disassemble a function, you will often find assembly statements like the following at the function entry and return:
pushl %ebp ; push EBP onto the stack, saving the stack base address of the upper-level function that called this function
movl %esp,%ebp ; copy ESP into EBP, setting the stack base address of this function
........... ; the two instructions above are equivalent to enter 0, 0
...........
leave ; copy EBP into ESP, then pop the saved base address of the upper-level function's stack into EBP, restoring the original stack base
ret ; the function returns to its caller
These statements create and release the stack frame of a function or procedure.
The compiler automatically inserts the instructions that create and release the stack frame at the function entry and exit.
When a function is called:
1) EIP and EBP mark the boundary of the new function's stack frame
When a function is called, the return EIP is pushed onto the stack first. When the stack frame is created, the EBP of the upper-level function is pushed as well; together with the EIP it forms the boundary of the new function's stack frame.
2) EBP becomes the stack frame pointer (SFP), which marks the boundary of the new function's stack frame
After the stack frame is established, the stack location that EBP points to holds the EBP of the upper-level function. Through EBP you can therefore walk the stack frames of the calling functions layer by layer; debuggers use this property to implement the backtrace feature.
3) ESP always points to the top of the stack and is used as the stack pointer to allocate stack space
Stack space for a function's local variables is usually allocated by subtracting a constant from ESP; for example, allocating one integer is ESP-4.
4) Function parameters and local variables can be accessed through the SFP, that is, EBP
Because the stack frame pointer always points to the stack base address of the current function, parameters and local variables are usually accessed in the following forms:
+8+xx(%ebp) ; access to a function parameter
-xx(%ebp) ; access to a local variable
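The two addressing forms reduce to simple arithmetic on EBP. Here is a small sketch, assuming every parameter and local variable occupies exactly 4 bytes; the helper names arg_address and local_address are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the i-th 4-byte parameter. The saved EBP sits at 0(%ebp)
 * and the return address at 4(%ebp), so parameters start at 8(%ebp). */
uint32_t arg_address(uint32_t ebp, uint32_t i)
{
    return ebp + 8 + 4 * i;
}

/* Address of the k-th 4-byte local variable, where k = 0 is the first
 * slot directly below EBP, that is -4(%ebp). */
uint32_t local_address(uint32_t ebp, uint32_t k)
{
    assert(ebp >= 4 * (k + 1));
    return ebp - 4 * (k + 1);
}
```

With EBP equal to 0x1000, parameter 0 is at 0x1008 and the first local variable is at 0xffc.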
If function A calls function B and function B calls function C, the function stack frames and call relationships are as shown in the figure below:
            +----------------------------------------+ ----> High address
            | EIP (return address in A's caller)     |
        +-- +----------------------------------------+
        |   | EBP (saved EBP of A's caller)          | <---- EBP of function A (the SFP)
        |   +----------------------------------------+
        |   | Local variables of A                   |
        |   | ......                                 | <---- ESP after A allocates its locals; A reaches them at EBP - offset A
        |   +----------------------------------------+
Frame   |   | Arg n (nth parameter of function B)    |
of A    |   +----------------------------------------+
        |   | Arg ... (other parameters of B)        |
        |   +----------------------------------------+
        |   | Arg 1 (1st parameter of function B)    |
        |   +----------------------------------------+
        |   | Arg 0 (0th parameter of function B)    | <---- B reaches its parameters at EBP + offset B
        |   +----------------------------------------+
        |   | EIP (return address in function A)     |
        +-- +----------------------------------------+
        |   | EBP (saved EBP of function A)          | <---- EBP of function B (the SFP)
        |   +----------------------------------------+
        |   | Local variables of B                   |
        |   | ......                                 | <---- ESP after B allocates its locals
        |   +----------------------------------------+
Frame   |   | Arg n (nth parameter of function C)    |
of B    |   +----------------------------------------+
        |   | Arg ... (other parameters of C)        |
        |   +----------------------------------------+
        |   | Arg 1 (1st parameter of function C)    |
        |   +----------------------------------------+
        |   | Arg 0 (0th parameter of function C)    |
        |   +----------------------------------------+
        |   | EIP (return address in function B)     |
        +-- +----------------------------------------+
        |   | EBP (saved EBP of function B)          | <---- EBP of function C (the SFP)
Frame   |   +----------------------------------------+
of C    |   | Local variables of C                   |
        |   | ......                                 | <---- ESP after C allocates its locals
        |   +----------------------------------------+ ----> Low address
Figure 1-1
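The chain of saved EBP values in Figure 1-1 is what a backtrace follows. The sketch below walks such a chain in a simulated memory of 32-bit words. It assumes that a saved EBP of 0 marks the outermost frame, which matches the two pushl $0 instructions at the top of _start but is still an assumption of this example; backtrace_sketch and MEM_WORDS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 32u   /* simulated memory, one 32-bit word per slot */

/* Follows the saved-EBP chain starting at the byte address ebp.
 * Stores up to max return addresses in out, innermost first, and
 * returns how many were stored. */
size_t backtrace_sketch(const uint32_t *mem, uint32_t ebp,
                        uint32_t *out, size_t max)
{
    size_t n = 0;

    while (ebp != 0 && n < max) {
        assert(ebp % 4 == 0 && ebp / 4 + 1 < MEM_WORDS);
        out[n++] = mem[ebp / 4 + 1];   /* return address at 4(%ebp) */
        ebp = mem[ebp / 4];            /* caller's EBP at 0(%ebp)   */
    }
    return n;
}
```

For three nested frames A, B, and C, starting from C's EBP yields the return address into B, then the one into A, then the one into A's caller.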
The meaning of the remaining statements in the test1 disassembly can now be analyzed:
# mdb test1
Loading modules: [ libc.so.1 ]
> main::dis ; disassemble the main function
main: pushl %ebp
main+1: movl %esp,%ebp ; create the stack frame
main+3: subl $8,%esp ; allocate 8 bytes of stack space by ESP-8
main+6: andl $0xf0,%esp ; align the stack address to 16 bytes
main+9: movl $0,%eax ; no effect
main+0xe: subl %eax,%esp ; no effect
main+0x10: movl $0,%eax ; set the return value of main
main+0x15: leave ; tear down the stack frame
main+0x16: ret ; main returns
>
The following two statements appear to do nothing. Is that really the case?
movl $0,%eax
subl %eax,%esp
Recompile test1.c with GCC's -O2 optimization level:
# gcc -O2 test1.c -o test1
# mdb test1
> main::dis
main: pushl %ebp
main+1: movl %esp,%ebp
main+3: subl $8,%esp
main+6: andl $0xf0,%esp
main+9: xorl %eax,%eax ; set main's return value, using the xorl (exclusive OR) instruction to set eax to 0
main+0xb: leave
main+0xc: ret
>
The new disassembly is more concise than the original. The statements previously thought useless have been optimized away, which confirms the earlier guess.
Tip: some statements generated by the compiler may be useless with respect to the program's actual semantics. Optimization options can remove them.
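Q: Why is xorl used to set the value of eax?
Note that in the optimized code, the return value in eax is set with xorl %eax,%eax instead of movl $0,%eax. Among IA-32 instructions, xorl is the cheaper choice here: besides being regarded as faster than movl, its encoding is 2 bytes instead of 5, as the instruction offsets in the two listings show.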
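The size difference can be checked against the standard IA-32 encodings of the two instructions. The byte values below are those encodings; the array and function names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* movl $0,%eax: opcode B8 followed by a 32-bit immediate. */
const unsigned char MOV_EAX_ZERO[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 };

/* xorl %eax,%eax: opcode 31 followed by the ModRM byte C0. */
const unsigned char XOR_EAX_EAX[] = { 0x31, 0xc0 };

/* Number of code bytes saved by choosing the xorl form. */
size_t bytes_saved(void)
{
    assert(sizeof MOV_EAX_ZERO > sizeof XOR_EAX_EAX);
    return sizeof MOV_EAX_ZERO - sizeof XOR_EAX_EAX;
}
```

The 5-byte and 2-byte lengths agree with the listings: movl spans main+0x10 to main+0x15, while xorl spans main+9 to main+0xb.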
Concept: Stack Alignment
So what do the following statements do?
subl $8,%esp
andl $0xf0,%esp ; use andl to clear the lowest 4 bits, ensuring 16-byte alignment of the stack address
On the surface, the most direct effect of the andl statement is to make the lowest 4 bits of the ESP address 0, that is, 16-byte alignment. Why do this?
It turns out that some instructions of IA-32 CPUs run faster when their data is aligned on 4-, 8-, or 16-byte boundaries. Therefore, to make the generated code run faster on IA-32, GCC generates code with 16-byte stack alignment by default.
The purpose of andl $0xf0,%esp is clear. What about subl $8,%esp?
Assume the stack is 16-byte aligned before entering main. After entering main, EIP and EBP have been pushed, so the lowest 4 bits of the stack address must be 1000; ESP-8 then makes the lowest 4 bits exactly 0000. This, too, appears to be for 16-byte stack alignment.
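The arithmetic in this argument can be replayed step by step. The following is a minimal sketch with made-up function names and a made-up starting address; it assumes that the $0xf0 shown by mdb is an 8-bit immediate sign-extended to 0xfffffff0, so that only the low 4 bits of ESP are cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds esp down to a multiple of 2^n bytes; n = 4 gives 16 bytes. */
uint32_t align_down(uint32_t esp, unsigned n)
{
    assert(n < 32);
    return esp & ~((UINT32_C(1) << n) - 1);
}

/* ESP after main's prologue, given ESP just before the call to main. */
uint32_t esp_after_prologue(uint32_t esp_before_call)
{
    uint32_t esp = esp_before_call;

    esp -= 4;                    /* call pushes the return EIP */
    esp -= 4;                    /* pushl %ebp                 */
    esp -= 8;                    /* subl $8,%esp               */
    return align_down(esp, 4);   /* andl $0xf0,%esp            */
}
```

If ESP is 0x8047e40 before the call, the two pushes leave it at 0x8047e38 (low bits 1000), and subl $8 brings it to 0x8047e30, which the andl then leaves unchanged.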
If you check the GCC manual, you will find the option that sets stack alignment:
-mpreferred-stack-boundary=n ; try to keep the stack aligned to a 2^n byte boundary, where n ranges from 2 to 12
By default, n is 4; that is, GCC aligns the stack to 16 bytes by default to suit most IA-32 instructions.
Let's use -mpreferred-stack-boundary=2 to remove the stack alignment instructions:
# gcc -mpreferred-stack-boundary=2 test1.c -o test1
> main::dis
main: pushl %ebp
main+1: movl %esp,%ebp
main+3: movl $0,%eax
main+8: leave
main+9: ret
As you can see, the stack alignment instructions are gone: the IA-32 stack is already 4-byte aligned by nature, so no extra instructions are needed.
So, is the stack frame pointer SFP required?
# gcc -mpreferred-stack-boundary=2 -fomit-frame-pointer test1.c -o test
> main::dis
main: movl $0,%eax
main+5: ret
As we can see, -fomit-frame-pointer removes the SFP.
Question: what are the disadvantages of removing the SFP?
1) It makes debugging harder
Because the debugger's backtrace command relies on the SFP, it can no longer be used once the SFP is removed.
2) It reduces the readability of the assembly code
Without EBP, function parameters and local variables can only be accessed in the form +xx(%esp), which makes the two hard to tell apart and reduces the readability of the program.
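The readability point can be made concrete by comparing offsets. In this sketch the helper offset_from is a hypothetical name and all addresses are invented; it only computes the displacement that would appear in an operand such as +8(%ebp) or +8(%esp).

```c
#include <assert.h>
#include <stdint.h>

/* Displacement of a stack address relative to a base register value,
 * as it would appear in an operand like disp(%ebp) or disp(%esp). */
int64_t offset_from(uint32_t base_reg, uint32_t address)
{
    return (int64_t)address - (int64_t)base_reg;
}
```

With a frame pointer at 0x1000, a parameter at 0x1008 has offset +8 and a local at 0xffc has offset -4, so the sign alone tells them apart. Without a frame pointer and with ESP at 0xffc, a parameter at 0x1004 has offset +8 and the local at 0xffc has offset 0: both are non-negative offsets from ESP.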
Question: what are the advantages of removing the SFP?
1) It saves stack space
2) The code is simpler, because the instructions that create and tear down the stack frame are gone
3) EBP is freed for use as a general-purpose register, increasing the number of general-purpose registers
4) The three points above make the program run faster
Concept: Calling Convention and ABI (Application Binary Interface)
How does a function find its parameters?
How does a function return its result?
Where does a function store its local variables?
Which hardware registers are scratch space?
Which hardware registers must be preserved?
A calling convention specifies the answers to these questions. The calling convention is also part of the ABI.
Operating systems that comply with the same ABI specification can therefore achieve binary code interoperability with one another.
For example, because both Solaris and Linux comply with the System V ABI, Solaris 10 provides the ability to run Linux binaries directly.
For details, see the article: Note: 10 new changes in Solaris 10
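The first two questions can be illustrated with a toy model of the convention seen in the listings of this article: the caller pushes the arguments so that argument 0 ends up at the lowest address, the callee reads them relative to EBP, the result comes back in eax, and the caller removes the arguments (as _start does with addl $0xc,%esp). This is a sketch only; struct cpu and all function names are invented, and the stack is modeled in 4-byte words instead of bytes.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 32u

/* Toy CPU: sp and bp are word indexes into stack, not byte addresses. */
struct cpu {
    uint32_t stack[WORDS];
    uint32_t sp, bp, eax;
};

static void push(struct cpu *c, uint32_t v)
{
    assert(c->sp > 0 && c->sp <= WORDS);
    c->stack[--c->sp] = v;
}

/* Callee: sums three arguments found at 8(%ebp), 12(%ebp), 16(%ebp),
 * which are word indexes bp+2, bp+3, bp+4 in this model. */
void callee_sum3(struct cpu *c)
{
    push(c, c->bp);                 /* pushl %ebp                 */
    c->bp = c->sp;                  /* movl %esp,%ebp             */
    c->eax = c->stack[c->bp + 2] + c->stack[c->bp + 3]
           + c->stack[c->bp + 4];   /* result goes into eax       */
    c->sp = c->bp;                  /* leave: movl %ebp,%esp      */
    c->bp = c->stack[c->sp];        /*        popl %ebp           */
    c->sp += 1;
    c->sp += 1;                     /* ret: pop the return address */
}

/* Caller: pushes the arguments last to first, calls, then cleans up. */
uint32_t caller(struct cpu *c, uint32_t a0, uint32_t a1, uint32_t a2)
{
    push(c, a2);
    push(c, a1);
    push(c, a0);
    push(c, 0xdeadbeefu);   /* call: placeholder return address */
    callee_sum3(c);
    c->sp += 3;             /* addl $0xc,%esp                   */
    return c->eax;
}
```

After caller returns, sp is back at its starting value, which shows that in this model the caller, not the callee, is responsible for removing the arguments.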
3. Summary
This article introduced the following concepts through the simplest C program:
SFP (Stack Frame Pointer)
Stack Alignment
Calling Convention and ABI (Application Binary Interface)
We will explore these concepts further through more experiments. Mastering them makes it possible to debug a program's core dump at the assembly level and to apply advanced C debugging techniques.
End