Author: cook702

Many of you who have studied assembly are probably also familiar with C, because in the current university curriculum it is likely to be the first programming language you encountered. Without much understanding of how computers work, you inevitably ran into problems while learning it. Some could be solved by thinking them through, but many more seemed impossible to reason about, as if C were simply born that way and you just had to memorize it. That way of learning is mechanical and uncreative. Only when you really understand C can you control it. Otherwise there will always be a sheet of window paper between you and the language: thin as it is, you never break through it.

Why? The principle is simple. Suppose a company has a ready-made code library. When some programmers hit a problem, the only thing they do is call the functional modules in the library; once the call works, they are done. Other programmers prefer to implement the function themselves when they have time. Even when there is no time and they have to call the library, they still think about how the function is implemented and whether the library module has shortcomings, and so they keep improving it. This may be an important difference between the professional and the non-professional. Programmers who do not like to think may end up as the "code workers" we talk about.

So how can we really understand C? The answer is assembly. Assembly instructions are mnemonics for machine instructions. A program in any high-level language must be converted into individual machine instructions before a computer can execute it, and those machine instructions correspond one to one with assembly instructions. By analyzing the assembly instructions, we can see how C actually operates inside the computer. Only then can we really master C.

However, many readers are not familiar with analyzing C through assembly instructions, or do not know how to do it. In some people's minds, C is executed directly by the CPU and has nothing to do with assembly. And with only a smattering of assembly, they have the will but not the strength. Here I will use assembly language to unveil the true face of parameter passing in C.

First, let's write the simplest possible C source file, t.c:

    main(){}

Then we generate the executable t.exe in the Turbo C integrated development environment and load it with the debug command. Looking at the assembly code, we find:

    C:/c> debug t.exe
    -u
    0C1C:0000 BA720C    MOV DX,0C72
    0C1C:0003 2E        CS:
    0C1C:0004 8916F801  MOV [01F8],DX
    0C1C:0008 B430      MOV AH,30
    0C1C:000A CD21      INT 21
    ...

Although we know what each instruction means, we do not understand why t.c was compiled into this particular code. This is actually the work of the Turbo C integrated development environment. To produce an EXE file, a C source program goes through two phases: compiling and linking. In Turbo C the compiler and linker are tcc.exe and tlink.exe in the Turbo C root directory. If, as we learned to do with assembly, we run tcc -c t.c (the -c option requests compilation only; without it the linker is called automatically) and then tlink t.obj, t.exe is generated as well, but it does not return correctly at run time. The Turbo C integrated development environment solves this problem for us, and solving it means adding the corresponding code. That is why the program above is not easy to understand. But it does not matter.

In fact, a function call in C is equivalent to the call instruction in assembly, and the function name represents the offset address of the function in memory. So we only need to print the value of the function name in hexadecimal. In C, %x prints in hexadecimal. The code of t.c is as follows:

    main()
    {
        printf("%x", main);
    }

The result displayed after execution is the offset address of the main() function in memory. On my computer it prints 1FA. So after loading the program with debug, we run the u command as u 1fa and get:

    -u 1fa
    0C1C:01FA B8FA01  MOV AX,01FA
    0C1C:01FD 50      PUSH AX
    0C1C:01FE B89401  MOV AX,0194
    0C1C:0201 50      PUSH AX
    0C1C:0202 E8B708  CALL 0ABC
    0C1C:0205 59      POP CX
    0C1C:0206 59      POP CX
    0C1C:0207 C3      RET

Someone may ask two questions:

1: If 1FA was printed the first time, is main() guaranteed to be at 1FA the next time the program is loaded?
2: Why are these assembly instructions hard to understand?

For question 1, you will find after several trials that the result is the same each time. This is related to the memory management of the operating system. For now we only need to remember the method; since our current goal is to analyze C parameter passing through assembly, we will not discuss this further.

For question 2, the real reason is that we called the library function printf. We do not know how that function is implemented, so we cannot tell the specific reason for these four instructions:

    0C1C:01FA B8FA01  MOV AX,01FA
    0C1C:01FD 50      PUSH AX
    0C1C:01FE B89401  MOV AX,0194
    0C1C:0201 50      PUSH AX

To analyze a problem we must start from the simplest case. So we need to write the simplest function that illustrates the problem and, as far as possible, avoid calling library functions.

My t.c is as follows:

    int add(int, int);

    main()
    {
        int a;
        int b;
        int c;
        a = 4;
        b = 5;
        c = add(a, b);
    }

    int add(int a, int b)
    {
        return a + b;
    }

It contains two functions: the main function and an addition function. Pressing F9 in the Turbo C integrated development environment, which we are already familiar with, generates the executable. As mentioned above, the offset address of the main function in memory is 1FA (that is the result on my machine; it may be a different value on other machines). We load the program into memory with debug and jump directly to the start of main() with u 1fa. The corresponding assembly code is:

    C:/C> debug t2.exe
    -u 1fa                            ; main()
    0C1C:01FA 55        PUSH BP
    0C1C:01FB 8BEC      MOV BP,SP
    0C1C:01FD 83EC02    SUB SP,+02
    0C1C:0200 56        PUSH SI
    0C1C:0201 57        PUSH DI
    0C1C:0202 BE0400    MOV SI,0004
    0C1C:0205 BF0500    MOV DI,0005
    0C1C:0208 57        PUSH DI
    0C1C:0209 56        PUSH SI
    0C1C:020A E80B00    CALL 0218
    0C1C:020D 59        POP CX
    0C1C:020E 59        POP CX
    0C1C:020F 8946FE    MOV [BP-02],AX
    0C1C:0212 5F        POP DI
    0C1C:0213 5E        POP SI
    0C1C:0214 8BE5      MOV SP,BP
    0C1C:0216 5D        POP BP
    0C1C:0217 C3        RET
    -u                                ; int add(int a, int b)
    0C1C:0218 55        PUSH BP
    0C1C:0219 8BEC      MOV BP,SP
    0C1C:021B 8B4604    MOV AX,[BP+04]
    0C1C:021E 034606    ADD AX,[BP+06]
    0C1C:0221 EB00      JMP 0223
    0C1C:0223 5D        POP BP
    0C1C:0224 C3        RET

Looking at the code, we find that the 4 and 5 in MOV SI,0004 and MOV DI,0005 correspond exactly to the 4 and 5 in t.c.

Let's go through the code step by step from the beginning and watch how the contents of the stack change:

push bp
    The stack now holds: BP

mov bp, sp
    BP saves the current top of the stack, that is, it points to the element BP in the stack.

sub sp, +02
    SP is decreased by 2. This is equivalent to a push, except that the "pushed" element is whatever word of residual data is already in that stack space. In effect it opens up one word of space. The stack now holds: BP-residual data

push si
    The stack now holds: BP-residual data-SI

push di
    The stack now holds: BP-residual data-SI-DI

mov si, 0004
    Puts 4 into register SI.

mov di, 0005
    Puts 5 into register DI.

push di
    The stack now holds: BP-residual data-SI-DI-5

push si
    The stack now holds: BP-residual data-SI-DI-5-4

call 0218
    Calls the subroutine; this corresponds to c = add(a, b) and is a near transfer within the segment. The first thing CALL does is push IP onto the stack so that the subroutine can return correctly. After execution the stack holds: BP-residual data-SI-DI-5-4-020D

push bp
    The first instruction of add(). BP is pushed again so that it can be restored later. The stack now holds: BP-residual data-SI-DI-5-4-020D-BP

mov bp, sp
    Assigns the current top-of-stack position to BP.

mov ax, [bp+04]
    Note that the default segment register for [BP+idata] is SS. SS:[SP] is the BP at the top of the stack and BP = SP, so SS:[BP+2] is the return address 020D and SS:[BP+4] is the 4 in the stack.

add ax, [bp+06]
    By the same reasoning, SS:[BP+6] is the 5 in the stack. This corresponds to return a + b; the sum is left in AX.

jmp 0223
    Jumps to offset 0223, where the instruction is pop bp. We will not consider why this jump is generated.

pop bp
    Restores BP. After execution the stack holds: BP-residual data-SI-DI-5-4-020D

ret
    Returns to the calling function (here, main). After execution the stack holds: BP-residual data-SI-DI-5-4, and IP is 020D.

pop cx
    After execution the stack holds: BP-residual data-SI-DI-5, and CX = 4.

pop cx
    After execution the stack holds: BP-residual data-SI-DI, and CX = 5.

mov [bp-02], ax
    The code for main() began by saving the initial top-of-stack offset in BP, so SS:[BP] is the element BP in the stack and SS:[BP-2] is the residual data. The instruction add ax, [bp+06] in add() left the sum in AX, so the sum of the two values is now stored in the space that held the residual data. This location is where variable c of t.c is stored.

pop di
    Restores DI. After execution the stack holds: BP-residual data-SI

pop si
    Restores SI. After execution the stack holds: BP-residual data

mov sp, bp
    Restores SP. After execution the stack holds: BP. Because of the first two instructions of main(), push bp and mov bp, sp, SS:[BP] is the element BP in the stack, so after SP is restored, SS:[SP] again points at that element.

pop bp
    Restores BP. After execution the stack is empty.

ret
    main() returns to its caller.

From this analysis we can see that the assembly program produced from C by compiling and linking is not complicated; every instruction in it is one we have already learned. We also learn that both parameter passing between functions and the allocation of local variables inside a function are done through the stack.

If you have followed this far, perhaps it suddenly becomes clear: so this is how C is ultimately executed. We can use the same method to analyze global variables, pointers, structs, arrays, and other topics in C. The C language is then laid bare before us; in our eyes it is no longer mysterious, and by this point we may already have the ability to control it.

From: "The world in the eyes of assembly programmers", http://www.asmedu.net/news.jsp?Indexed=198