Disassembly Analysis of a Simple C Program under Linux

Source: Internet
Author: User

Hann
Original work; please credit the source when reposting.
"Linux Kernel Analysis" MOOC course: http://mooc.study.163.com/course/USTC-1000029000

A note up front: this article was written as a homework assignment for a MOOC course. If there are any mistakes or omissions, please point them out.

I chose the Linux Kernel Analysis course, and reading kernel code inevitably involves reading some AT&T assembly code. So here I walk through the disassembly of a simple C command-line program, partly to complete the assignment and partly as practice. Let's begin:

1. Write a small C program

We use a simple example; the code is as follows:

#include <stdio.h>

int exG(int x)
{
    return x + 5;
}

int exF(int x)
{
    return exG(x);
}

int main(void)
{
    return exF(10) + 2;
}

Write the code above in an editor such as Vim, save it as main.c, and then generate the assembly source file with the following command:
On an x86 (32-bit) system:
$ gcc -S -o main.s main.c
On an x64 system:
$ gcc -m32 -S -o main.s main.c
Because we use a 32-bit platform as the example, on an x64 machine we need to add -m32 so that GCC generates 32-bit assembly.

2. Processing source files

After running the command above there will be a main.s file in the current directory. Open it in Vim and delete the lines we do not need (the directive lines that start with "."), which leaves the following assembly code:

exG:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %eax
    addl    $5, %eax
    popl    %ebp
    ret
exF:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   8(%ebp)
    call    exG
    addl    $4, %esp
    leave
    ret
main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   $10
    call    exF
    addl    $4, %esp
    addl    $2, %eax
    leave
    ret

As you can see, this file contains the assembly code that GCC generated for us. A word on the AT&T and Intel formats is needed here. GCC can generate either one; to get Intel-format assembly, just add the -masm=intel option. On Linux, however, the default is AT&T format, and the assembly code in the Linux kernel is written in AT&T format too, so we need to gradually get used to AT&T syntax. The most important differences between the AT&T and Intel formats are:

In AT&T syntax the source operand comes first and the destination operand second; Intel syntax is the reverse. For example:
AT&T format: movl %eax, %edx
Intel format: mov edx, eax
Both mean that the contents of the EAX register are copied into the EDX register. Note also that AT&T syntax writes register names with a % prefix, and the l in movl indicates that the instruction's operands are 32 bits wide; similarly, movb, movw and movq operate on 8-bit, 16-bit and 64-bit operands respectively. For more details on AT&T assembly syntax, search online or consult a relevant book.
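As a small illustration of the suffix rule, the sketch below maps each AT&T size suffix to the width of the matching fixed-width C type. The function name att_suffix_bytes is hypothetical and exists only for this example; neither GCC nor the assembler provides such an API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each AT&T size suffix corresponds to one fixed-width C integer type. */
static_assert(sizeof(uint8_t) == 1, "b suffix: 8-bit operands");
static_assert(sizeof(uint16_t) == 2, "w suffix: 16-bit operands");
static_assert(sizeof(uint32_t) == 4, "l suffix: 32-bit operands");
static_assert(sizeof(uint64_t) == 8, "q suffix: 64-bit operands");

/* Hypothetical helper: operand width in bytes for the size suffix at the end
 * of a mnemonic such as movb, movw, movl or movq. Returns 0 otherwise. */
size_t att_suffix_bytes(char suffix)
{
    switch (suffix) {
    case 'b': return sizeof(uint8_t);  /* movb */
    case 'w': return sizeof(uint16_t); /* movw */
    case 'l': return sizeof(uint32_t); /* movl */
    case 'q': return sizeof(uint64_t); /* movq */
    default:  return 0;                /* not a size suffix */
    }
}
```

For example, att_suffix_bytes('l') gives 4, matching the 32-bit movl used throughout the listing above.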

3. Assembly Code Analysis

Now we begin analyzing the assembly code. When the program starts, the C runtime performs a series of preparations and then makes EIP point to our main function so that it begins executing, so we start the analysis from main:

First, enter the GDB debugging environment.
Run the following commands to build an ELF executable with debugging information and open it in GDB:
$ gcc -m32 -g -o main main.c
$ gdb main -tui -q
Inside GDB, enter layout asm to switch to the disassembly view, set a breakpoint at main, and start the program:
(gdb) layout asm
(gdb) b main
(gdb) run
Note that b main stops after main's prologue; to stop on its very first instruction (pushl %ebp), use b *main instead.

Then use
(gdb) si
to execute one instruction at a time and watch how the registers change.

For the main function:

Execute it one instruction at a time:
(gdb) si

pushl   %ebp
movl    %esp, %ebp
...

These two instructions are the prologue: they save the caller's stack frame so that the function can return correctly, and set up a new stack frame for the current function. Their effect is to push the current value of EBP onto the stack and then copy ESP (as it is after the push) into EBP. At this point ESP and EBP point to the same memory address.
Pushing and popping need some explanation here. On Intel's x86 architecture the stack grows from high addresses toward low addresses, so:

A push is equivalent to: 1. move ESP down (by 4 bytes for a 32-bit value) to reserve space; 2. store the value in the space just reserved.

A pop is equivalent to: 1. read the value from the memory ESP points to; 2. move ESP up (by 4 bytes) to release the space.
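The two steps of a push and a pop can be modeled in a few lines of C. This is a toy sketch under simplifying assumptions: the "stack" is a small array, addresses are byte offsets into that array rather than real memory addresses, and the names toy_stack, toy_push and toy_pop are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64 /* size of the toy stack in bytes */

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* 4-byte cells, like the figures */
    uint32_t esp;                  /* byte offset of the current top */
} toy_stack;

/* Empty stack: esp starts at the highest address, because the stack grows down. */
void toy_init(toy_stack *s)
{
    s->esp = STACK_BYTES;
}

/* pushl value: 1) move esp down by 4, 2) store the value at the new esp. */
void toy_push(toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4); /* no room left: a real stack would overflow */
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* popl: 1) read the value at esp, 2) move esp up by 4. */
uint32_t toy_pop(toy_stack *s)
{
    assert(s->esp + 4 <= STACK_BYTES); /* popping an empty stack */
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}
```

Pushing 10 and then 20 moves esp from 64 to 56; popping twice returns 20 and then 10 and brings esp back to 64.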

The stack now looks like this [from here on, each cell represents 4 bytes of memory]:

Figure 1

Continue executing one instruction at a time:

pushl   $10
call    exF
addl    $4, %esp
...

pushl $10 decrements ESP by 4 and then stores the 4-byte value 10 in the memory ESP now points to.

call exF is the function call instruction. It first pushes the current value of EIP onto the stack (EIP now points to the next instruction, addl $4, %esp, which is the return address), and then jumps to the first instruction of exF.
The stack now looks like this:
Figure 2
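To make the effect of call concrete, the sketch below models it as "push the address of the next instruction, then jump". It assumes a hypothetical toy_cpu type whose stack is an array indexed by byte offset; the code addresses 0x1000 and 0x2000 in the example are made up, and the 5-byte instruction length corresponds to a call with a 32-bit relative displacement (call rel32).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_CELLS 16 /* 16 cells x 4 bytes = 64 bytes of toy stack */

typedef struct {
    uint32_t mem[STACK_CELLS]; /* stack memory, one 4-byte cell per entry */
    uint32_t esp;              /* byte offset of the top of the stack */
    uint32_t eip;              /* address of the instruction being executed */
} toy_cpu;

/* pushl: move esp down by 4, then store the value there. */
static void push32(toy_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

/* call target: push the address of the instruction after the call
 * (eip + call_len), then continue at target. */
void toy_call(toy_cpu *c, uint32_t call_len, uint32_t target)
{
    c->eip += call_len; /* eip now refers to the instruction after the call */
    push32(c, c->eip);  /* this becomes the return address */
    c->eip = target;    /* jump to the callee's first instruction */
}
```

For example, starting with esp = 64 and eip = 0x1000, calling push32(&c, 10) and then toy_call(&c, 5, 0x2000) leaves the argument 10 at offset 60 and the return address 0x1005 at offset 56, with esp = 56 and eip = 0x2000: the same layout as Figure 2.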

For the exF function:

Execute it one instruction at a time:
(gdb) si

pushl   %ebp
movl    %esp, %ebp
pushl   8(%ebp)
call    exG
addl    $4, %esp
...

The first two instructions are the same as the first two in main: they save the caller's stack frame and set up a new stack frame for exF.

pushl 8(%ebp) uses the value in EBP plus 8 as a memory address and pushes the value stored there, 10, onto the stack. [Comparing with Figure 2 shows that this is the argument passed in by the caller, now pushed again as the argument to exG.]
call exG is again a function call: it pushes the current EIP (the return address) and jumps to the first instruction of exG.
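Why is the argument found at 8(%ebp)? After the call and the prologue, the three 4-byte cells starting at EBP hold the saved EBP, the return address and the argument, in that order. The sketch below records that layout as a C struct and checks the offsets at compile time. The struct frame_view and the helper arg_offset_from_ebp are hypothetical, and the helper assumes every argument is a 4-byte value pushed right to left, as in the 32-bit code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 4-byte cells at and above EBP right after "pushl %ebp; movl %esp, %ebp"
 * in a function that was called with one 4-byte argument. */
struct frame_view {
    uint32_t saved_ebp; /* 0(%ebp): caller's EBP, pushed by the prologue */
    uint32_t ret_addr;  /* 4(%ebp): return address, pushed by call       */
    uint32_t arg0;      /* 8(%ebp): first argument, pushed by the caller */
};

/* Compile-time checks that the struct offsets match the EBP offsets. */
static_assert(offsetof(struct frame_view, saved_ebp) == 0, "saved EBP at 0(%ebp)");
static_assert(offsetof(struct frame_view, ret_addr) == 4, "return address at 4(%ebp)");
static_assert(offsetof(struct frame_view, arg0) == 8, "first argument at 8(%ebp)");

/* Hypothetical helper: EBP offset of the n-th (0-based) argument, assuming
 * every argument is 4 bytes and arguments are pushed right to left. */
int arg_offset_from_ebp(int n)
{
    assert(n >= 0);
    return 8 + 4 * n;
}
```

Under these assumptions a second argument, if there were one, would sit at 12(%ebp).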

The stack now looks like this:

Figure 3

For the exG function:

Execute it one instruction at a time:
(gdb) si

pushl   %ebp
movl    %esp, %ebp
movl    8(%ebp), %eax
addl    $5, %eax
popl    %ebp
ret

Once again the first two instructions are the function prologue, which saves the caller's stack frame and sets up a new one.

The stack now looks like this:

Figure 4

movl 8(%ebp), %eax uses the value in EBP plus 8 as a memory address and loads the value stored there, 10, into the EAX register. [Comparing with Figure 4 shows that this is the argument passed in by the caller.]
addl $5, %eax: in AT&T syntax, a $ followed by a number denotes an immediate value. This instruction adds 5 to the value in EAX and stores the result back in EAX, so EAX is now 15.
popl %ebp pops the caller's saved EBP value off the stack and puts it back in the EBP register. [There is no movl %ebp, %esp here because ESP was not changed inside this function and still points to the memory holding the saved EBP value.]
ret is equivalent to popping into EIP: it takes the value at the top of the stack (the memory ESP points to), loads it into EIP, and execution continues at that address.
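The last two instructions of exG can be modeled the same way. In this sketch (hypothetical names, byte offsets into a small array instead of real addresses), popl %ebp and ret are both plain pops; they differ only in which register receives the value.

```c
#include <assert.h>
#include <stdint.h>

#define CELLS 16 /* 16 cells x 4 bytes = 64 bytes of toy stack */

typedef struct {
    uint32_t mem[CELLS];    /* toy stack, one 4-byte cell per entry */
    uint32_t esp, ebp, eip; /* esp/ebp are byte offsets into mem */
} toy_cpu;

static uint32_t pop32(toy_cpu *c)
{
    assert(c->esp + 4 <= CELLS * 4); /* nothing left to pop */
    uint32_t value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}

/* popl %ebp: restore the caller's frame pointer from the top of the stack. */
void toy_popl_ebp(toy_cpu *c)
{
    c->ebp = pop32(c);
}

/* ret: has the effect of a "popl %eip" (x86 has no such instruction):
 * the popped value becomes the address of the next instruction. */
void toy_ret(toy_cpu *c)
{
    c->eip = pop32(c);
}
```

Starting with ESP equal to EBP (exG never moved ESP), the two calls leave ESP pointing at the argument that exF pushed, which is why exF still has to run addl $4, %esp afterwards.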
The stack now looks like this:
Figure 5

At this point exG has returned, and its return value, 15, is held in the EAX register.

Back in exF:

...
addl    $4, %esp
leave
ret

Execution continues with the instructions above.
addl $4, %esp reclaims stack space: the stack shrinks by 4 bytes, discarding the argument that was pushed for exG.
leave is equivalent to the following two instructions:
movl %ebp, %esp
popl %ebp
That is, it is the function epilogue, which releases the stack space used by exF. The stack now looks like this:
Figure 6
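The equivalence between leave and the pair movl %ebp, %esp followed by popl %ebp can be sketched the same way. The sketch (hypothetical names; toy byte offsets instead of real addresses) shows the property that makes leave convenient: it restores ESP from EBP, so it works no matter how far ESP has moved below EBP inside the function.

```c
#include <assert.h>
#include <stdint.h>

#define CELLS 16 /* 16 cells x 4 bytes = 64 bytes of toy stack */

typedef struct {
    uint32_t mem[CELLS]; /* toy stack, one 4-byte cell per entry */
    uint32_t esp, ebp;   /* byte offsets into mem */
} toy_cpu;

static uint32_t pop32(toy_cpu *c)
{
    assert(c->esp + 4 <= CELLS * 4); /* nothing left to pop */
    uint32_t value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}

/* leave == movl %ebp, %esp ; popl %ebp */
void toy_leave(toy_cpu *c)
{
    c->esp = c->ebp;   /* movl %ebp, %esp: drop everything below the frame base */
    c->ebp = pop32(c); /* popl %ebp: restore the caller's frame pointer */
}
```

After toy_leave, the top of the stack is the return address, so the following ret returns to the caller.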

Next comes the ret instruction. After it executes, exF returns and the program goes back to main to continue executing. The stack now looks like this:

Figure 7

At this point exF's return value, 15, is held in EAX.

Back in main, execution continues:

...
addl    $4, %esp
addl    $2, %eax
leave
ret

addl $4, %esp shrinks the stack by 4 bytes, reclaiming the space used by the argument 10.
addl $2, %eax: EAX holds 15, the value returned by exF; this instruction adds 2 to it and stores the result back in EAX, so after it executes EAX is 17.
leave is the function epilogue. After it executes, EBP holds the value shown as the black "old ebp" in Figure 7, and ESP points to the cell just above that "old ebp" cell (at the next higher address). That cell holds the address of the instruction immediately following the C runtime's call to main.
ret: main returns.

4. Summary

The work of a computer is really a loop of "fetch an instruction, execute the instruction". When a program runs, it is loaded into memory, and the computer reads instructions from some location in memory, following the program's control flow, until the program ends. During execution, memory is set aside for the program's various parts as needed (for example, the stack and the heap). Where the computer fetches its next instruction from is determined entirely by the value in the CPU's instruction pointer register (EIP); the CPU itself does not distinguish which part of memory holds code and which holds data.
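The fetch-execute loop can be illustrated by interpreting the main.s listing above. The sketch below is a toy interpreter, not how a real CPU works: it supports only the eleven instruction forms that appear in the listing, keeps code and stack in separate arrays (unlike real memory), uses instruction indexes instead of byte addresses for EIP, and treats a made-up sentinel CRT_RETURN as the address the C runtime would return to. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes: exactly the instruction forms used in main.s. */
typedef enum {
    PUSHL_EBP,    /* pushl %ebp         */
    MOVL_ESP_EBP, /* movl %esp, %ebp    */
    MOVL_ARG_EAX, /* movl 8(%ebp), %eax */
    PUSHL_ARG,    /* pushl 8(%ebp)      */
    PUSHL_IMM,    /* pushl $imm         */
    ADDL_IMM_EAX, /* addl $imm, %eax    */
    ADDL_IMM_ESP, /* addl $imm, %esp    */
    POPL_EBP,     /* popl %ebp          */
    CALL,         /* call target        */
    LEAVE,        /* leave              */
    RET           /* ret                */
} opcode;

typedef struct {
    opcode op;
    uint32_t imm; /* immediate value or call target; unused otherwise */
} insn;

enum { EXG = 0, EXF = 6, MAIN = 13 }; /* index of each function's first insn */

static const insn program[] = {
    /* exG */
    {PUSHL_EBP}, {MOVL_ESP_EBP}, {MOVL_ARG_EAX}, {ADDL_IMM_EAX, 5}, {POPL_EBP}, {RET},
    /* exF */
    {PUSHL_EBP}, {MOVL_ESP_EBP}, {PUSHL_ARG}, {CALL, EXG}, {ADDL_IMM_ESP, 4}, {LEAVE}, {RET},
    /* main */
    {PUSHL_EBP}, {MOVL_ESP_EBP}, {PUSHL_IMM, 10}, {CALL, EXF}, {ADDL_IMM_ESP, 4},
    {ADDL_IMM_EAX, 2}, {LEAVE}, {RET},
};
#define PROGRAM_LEN (sizeof program / sizeof program[0])

#define STACK_BYTES 256u
#define CRT_RETURN 0xFFFFFFFFu /* made-up "return address" into the C runtime */

typedef struct {
    uint32_t stack[STACK_BYTES / 4]; /* 4-byte cells; addresses are byte offsets */
    uint32_t eax, ebp, esp, eip;     /* eip is an index into program[] */
} machine;

static void push(machine *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    m->stack[m->esp / 4] = v;
}

static uint32_t pop(machine *m)
{
    assert(m->esp + 4 <= STACK_BYTES);
    uint32_t v = m->stack[m->esp / 4];
    m->esp += 4;
    return v;
}

static uint32_t load(const machine *m, uint32_t addr)
{
    assert(addr % 4 == 0 && addr + 4 <= STACK_BYTES);
    return m->stack[addr / 4];
}

/* Runs main until its ret jumps to CRT_RETURN; returns EAX, main's result. */
uint32_t run_program(machine *m)
{
    m->eax = 0;
    m->ebp = 0;
    m->esp = STACK_BYTES; /* empty stack: the top is the highest address */
    push(m, CRT_RETURN);  /* what the runtime's "call main" would push */
    m->eip = MAIN;

    while (m->eip != CRT_RETURN) {
        assert(m->eip < PROGRAM_LEN);
        insn cur = program[m->eip++]; /* fetch, and point eip at the next insn */
        switch (cur.op) {             /* execute */
        case PUSHL_EBP:    push(m, m->ebp); break;
        case MOVL_ESP_EBP: m->ebp = m->esp; break;
        case MOVL_ARG_EAX: m->eax = load(m, m->ebp + 8); break;
        case PUSHL_ARG:    push(m, load(m, m->ebp + 8)); break;
        case PUSHL_IMM:    push(m, cur.imm); break;
        case ADDL_IMM_EAX: m->eax += cur.imm; break;
        case ADDL_IMM_ESP: m->esp += cur.imm; break;
        case POPL_EBP:     m->ebp = pop(m); break;
        case CALL:         push(m, m->eip); m->eip = cur.imm; break;
        case LEAVE:        m->esp = m->ebp; m->ebp = pop(m); break;
        case RET:          m->eip = pop(m); break;
        }
    }
    return m->eax;
}
```

Running it (for example machine m = {{0}}; run_program(&m);) yields 17 in EAX, that is exG(10) + 2 = 15 + 2, and ESP and EBP end up back at their starting values. This matches the real program: main returns 17, so running ./main and then echo $? in the shell prints 17.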
