From C to Assembly: The Stack Is the Foundation of How Computers Work
Author: r1ce. Original work; when reprinting, please credit the source. "Linux Kernel Analysis" MOOC course: http://mooc.study.163.com/course/USTC-1000029000

How a computer works is a question that is easy to summarize but hard to explain. The defining feature of the von Neumann architecture is that instructions and data are stored in the same memory, and the CPU, like a manager, has to find the instructions to execute and the data they use somewhere in that vast memory. How does the CPU tell instructions from data, and how does it determine the order in which instructions execute?

Let's first look at the computer as a whole. Ordinary users run software, which programmers write, generally in high-level languages such as Python, C, and Java. These languages are easy for people to read and write, but the computer cannot execute them directly: Python needs an interpreter, and C needs a compiler to produce an executable file. At the lowest level the hardware is circuitry that recognizes only 0 and 1, and that is also what an executable file really is: a long sequence of 0s and 1s.

The gap between high-level languages and 0s and 1s is wide, which is why assembly language matters as an intermediary between the two. Looking upward, programs in high-level languages can be expressed in assembly language. Looking downward, each assembly instruction corresponds to a binary encoding that the CPU recognizes. Once you understand how assembly language operates, you understand how the computer actually works.

What is assembly language? To understand it, you first need to understand what a computer is made of. To keep things simple, assume the computer consists only of a CPU and memory. The CPU is the processor, and the memory stores instructions and data.
The processor is like a butler: it fetches instructions from memory and executes them, reading data and writing data back. For every program it executes, the CPU has to answer three questions: 1. Where is the data to be processed? 2. How should the data be processed? 3. Where should the result be stored? To answer them, the CPU relies on a set of tools, the registers. Assembly language is essentially about operating on these registers. In plain terms, it moves data back and forth between registers and memory and performs operations such as copying, adding, and subtracting. Assembly language is in fact not hard to learn: you only need to remember a dozen or so instructions, the registers, and how the stack is used. In this article we compile a simple C program to assembly code and analyze that code to understand the foundation of how a computer works. The C program is as follows:
int a(int x)
{
    return x + 5;
}

int b(int x)
{
    return a(x);
}

int main(void)
{
    return b(5) - 2;
}
The program consists almost entirely of function calls and returns. Why write it this way? Because function calls are the key to how a program runs on the computer, and analyzing how a call is actually implemented helps explain how the computer operates. Save the code above in the file main.c. Then use
gcc -S -o main.s main.c -m32
to generate the assembly code in main.s. The -m32 option tells gcc to generate 32-bit code.
We only need to look at the key parts of the assembly code. After deleting every line that starts with a dot (these are assembler directives), the following instructions remain.
 1  a:
 2
 3      pushl   %ebp
 4      movl    %esp, %ebp
 5      movl    8(%ebp), %eax
 6      addl    $5, %eax
 7      popl    %ebp
 8      ret
 9
10  b:
11
12      pushl   %ebp
13      movl    %esp, %ebp
14      subl    $4, %esp
15      movl    8(%ebp), %eax
16      movl    %eax, (%esp)
17      call    a
18      leave
19      ret
20
21  main:
22
23      pushl   %ebp
24      movl    %esp, %ebp
25      subl    $4, %esp
26      movl    $5, (%esp)
27      call    b
28      subl    $2, %eax
29      leave
30      ret
Next we analyze how the C code corresponds to the assembly code and how the assembly code runs. Start with the C program. The main function returns the result of b(5) minus 2. Function b returns the result of function a, and function a adds 5 to the parameter x passed in from b. The final value of the program is therefore 5 + 5 - 2 = 8.

Now the assembly code, starting from main. The line numbers below are those of the assembly listing. The pushl on line 23 tells us immediately that the stack is involved. ebp is the base pointer of the current stack frame, and esp is the stack pointer, which always points to the top of the stack. The stack grows downward, from high addresses to low addresses, and is last-in, first-out. Pushing ebp really means two things: first esp is decreased by 4, then the value of ebp is stored at the address esp now points to. Line 24 copies esp into ebp, so ebp now points to the same place as esp. Line 25 subtracts 4 from esp. Line 26 stores 5 at the address esp points to. Line 27 calls function b, which is equivalent to two operations. The first pushes the current eip onto the stack; at this moment eip holds the address of the next instruction, subl $2, %eax, which we record as 28. The second puts the address of function b into eip, so execution continues from line 10.

In b, line 12 is pushl %ebp again, and together with lines 13 and 14 it works as already described. Line 15, movl 8(%ebp), %eax, loads the value stored at address ebp + 8 into eax. That slot holds the argument main stored before the call (ebp + 4 holds the return address), so eax = 5. Line 16 stores eax at the address esp points to. Line 17 calls function a in the same way as before: it pushes the return address, recorded as 18, and jumps to line 1. In a, lines 3 to 5 repeat the same pattern and are skipped here. Line 6 adds 5 to eax, giving 10.
Line 7, popl %ebp, loads the value esp points to into ebp and then adds 4 to esp, so ebp once again points to the base of b's stack frame. Line 8 is ret, which is equivalent to popl %eip: eip becomes 18, so execution returns to function b and resumes at leave. The leave instruction combines two instructions: movl %ebp, %esp and popl %ebp. The ret on line 19 returns to main, and execution resumes at line 28. Line 28 subtracts 2 from eax, giving 8. Lines 29 and 30 are leave and ret again, after which the stack is back in its initial position. This completes the analysis of the assembly code.

From this process we can see that the basic principle of computer operation is to process stored data, save the result, and then go on to process more data, with instructions specifying how the data is processed. Concretely, the CPU uses its registers and the stack in memory to operate on data according to agreed-upon steps. The computer is actually very simple and completely rigid: as long as you specify what to do at each step, it carries out the steps exactly, without compromise. In that sense, dealing with computers is much easier than dealing with people.