[Disclaimer: All rights reserved. Reprinting is welcome, but not for commercial purposes. Contact email: feixiaoxing@163.com]
When we look at the C++ language from the perspective of assembly, the first problem to solve is how to read assembly code. To be honest, assembly is not difficult. We only need to understand the following questions:
(1) What kind of language is assembly?
(2) What does assembly language mainly consist of?
(3) How does assembly language correspond to the actual C/C++ code, statement by statement?
(1) What kind of language is assembly?
Assembly language is essentially a set of mnemonics for the CPU's machine instructions. Different CPUs have different instruction sets. The CPUs in ordinary PCs generally come from AMD or Intel and use the x86 instruction set, which is the one discussed here. Other CPU families include PowerPC, used mainly in the switches and routers of telecom companies; ARM, used mainly in smart terminals and instruments; and Sun's SPARC, used mainly in Sun servers. Because assembly instructions and binary machine code correspond almost one-to-one, assembly language helps us understand both the hardware of the machine and how a program actually runs on it.
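To make the "almost one-to-one" point concrete, here is a minimal C++ sketch that decodes the seven machine-code bytes of a single x86 instruction, mov dword ptr [ebp-4], 0Ah. The byte values are the standard 32-bit encoding of that instruction; the names MovToFrame, decode_mov_ebp_imm32, and kMovM10 are made up for this illustration, and the decoder deliberately understands only this one instruction form.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for "mov dword ptr [ebp+disp], imm32".
struct MovToFrame {
    bool ok;              // true if the bytes matched the expected form
    int displacement;     // signed offset from ebp
    std::uint32_t value;  // 32-bit immediate operand
};

// Decodes only the form: C7 45 <disp8> <imm32>
//   C7 = opcode for "mov r/m32, imm32"
//   45 = ModRM byte meaning "memory at [ebp + 8-bit displacement]"
inline MovToFrame decode_mov_ebp_imm32(const std::uint8_t* code, std::size_t size) {
    if (size < 7 || code[0] != 0xC7 || code[1] != 0x45) {
        return {false, 0, 0};
    }
    // The displacement is a signed byte: 0xFC means -4.
    int disp = static_cast<std::int8_t>(code[2]);
    // The immediate is stored little-endian (lowest byte first).
    std::uint32_t imm = static_cast<std::uint32_t>(code[3]) |
                        (static_cast<std::uint32_t>(code[4]) << 8) |
                        (static_cast<std::uint32_t>(code[5]) << 16) |
                        (static_cast<std::uint32_t>(code[6]) << 24);
    return {true, disp, imm};
}

// Machine code for: mov dword ptr [ebp-4], 0Ah
constexpr std::array<std::uint8_t, 7> kMovM10 = {0xC7, 0x45, 0xFC, 0x0A, 0x00, 0x00, 0x00};
```

Calling decode_mov_ebp_imm32(kMovM10.data(), kMovM10.size()) yields a displacement of -4 and a value of 10: each field of the assembly text maps to specific bytes of the binary.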
(2) What does assembly language consist of?
Assembly language covers a lot of ground, but only a small part of it matters for C/C++. In general, you only need to know about registers, segments, the stack, and the basic operations and memory-addressing forms that work with registers.
(3) How does assembly language correspond to the actual source code, statement by statement?
Let's start with an example. In general, a single statement is translated into several assembly instructions. For example:
int m = 10;
int n = 20;
int p = m + n;
Let's assume that m, n, and p are all declared inside a function, so all three are local (temporary) variables. On entry to the function, ebp and esp are adjusted to reserve stack space for these locals. The three statements are then compiled as follows:
43: int m = 10;
004012E8 mov dword ptr [ebp-4], 0Ah
44: int n = 20;
004012EF mov dword ptr [ebp-8], 14h
45: int p = m + n;
004012F6 mov eax, dword ptr [ebp-4]
004012F9 add eax, dword ptr [ebp-8]
004012FC mov dword ptr [ebp-0Ch], eax
The code above shows the correspondence between the assembly instructions and the C statements directly. The first statement assigns 10 to m, which lives in memory just below ebp, at [ebp-4]. The second statement works the same way for n. The third is a little more involved. First the CPU loads the value of m from the stack, that is, from address [ebp-4], into register eax. Next it locates n the same way, at [ebp-8], and adds it directly to eax. Finally it stores eax at address [ebp-0Ch], which is p. Any local variable inside a function will appear in this form: it is accessed at an offset from ebp.
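The same steps can be mimicked in plain C++ by treating the stack as a byte array and ebp as an index into it. This is only a toy model under simplifying assumptions (a 64-byte "stack", a made-up Frame type, and a hypothetical run_statements function); it is not what the compiler generates, just a way to see the loads and stores at ebp-relative offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model: a block of "stack" memory plus an ebp value that indexes into it.
struct Frame {
    std::vector<std::uint8_t> memory;
    int ebp;  // index into memory, standing in for the real ebp address

    Frame() : memory(64, 0), ebp(32) {}

    // Models: mov dword ptr [ebp+offset], value
    void store(int offset, std::int32_t value) {
        std::memcpy(&memory[static_cast<std::size_t>(ebp + offset)], &value, sizeof value);
    }

    // Models: mov reg, dword ptr [ebp+offset]
    std::int32_t load(int offset) const {
        std::int32_t value = 0;
        std::memcpy(&value, &memory[static_cast<std::size_t>(ebp + offset)], sizeof value);
        return value;
    }
};

// Replays the five instructions from the listing and returns the value of p.
inline std::int32_t run_statements(Frame& f) {
    f.store(-4, 10);                // int m = 10;   mov dword ptr [ebp-4], 0Ah
    f.store(-8, 20);                // int n = 20;   mov dword ptr [ebp-8], 14h
    std::int32_t eax = f.load(-4);  //               mov eax, dword ptr [ebp-4]
    eax += f.load(-8);              //               add eax, dword ptr [ebp-8]
    f.store(-0xC, eax);             // int p = m+n;  mov dword ptr [ebp-0Ch], eax
    return f.load(-0xC);
}
```

With a default-constructed Frame, run_statements returns 30, and m and n can still be read back at offsets -4 and -8.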
What happens if p is a global variable instead?
45: int m = 10;
004012E8 mov dword ptr [ebp-4], 0Ah
46: int n = 20;
004012EF mov dword ptr [ebp-8], 14h
47: p = m + n;
004012F6 mov eax, dword ptr [ebp-4]
004012F9 add eax, dword ptr [ebp-8]
004012FC mov [p (0042b0b4)], eax
In the code above, the assignments to m and n have not changed. What has changed is the last instruction: the value of eax is now stored to the absolute address 0x42b0b4. This shows that once the program is loaded into memory, a global variable has its own fixed address and does not move as the stack grows and shrinks.
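The difference can also be observed from C++ itself. The sketch below records the address of a global p and of a local m at two recursion depths; the Observation struct and the observe function are hypothetical names used only for this illustration. The exact addresses depend on the compiler and platform, so only their relationship is meaningful.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

int p = 0;  // global: one fixed address for the whole run of the program

// Hypothetical record of the addresses seen in one call.
struct Observation {
    std::uintptr_t global_address;
    std::uintptr_t local_address;
};

// Records the addresses visible at each recursion depth.
void observe(int depth, std::vector<Observation>& out) {
    int m = depth;  // local: lives in this call's own stack frame
    out.push_back({reinterpret_cast<std::uintptr_t>(&p),
                   reinterpret_cast<std::uintptr_t>(&m)});
    if (depth > 0) {
        observe(depth - 1, out);
    }
    p += m;  // m is used after the nested call, so both frames' locals are alive at once
}
```

After observe(1, seen), the two entries in seen report the same address for p but different addresses for m, because each call has its own frame.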
As mentioned earlier, the local variables of a function are stored in the stack space between ebp and esp. How does the code set this up? Let's look at the following assembly code:
41: void process ()
42: {
004012D0 push ebp
004012D1 mov ebp, esp
004012D3 sub esp, 4Ch
004012D6 push ebx
004012D7 push esi
004012D8 push edi
004012D9 lea edi, [ebp-4Ch]
004012DC mov ecx, 13h
004012E1 mov eax, 0CCCCCCCCh
004012E6 rep stos dword ptr [edi]
43: int m = 10;
004012E8 mov dword ptr [ebp-4], 0Ah
44: int n = 20;
004012EF mov dword ptr [ebp-8], 14h
45: int p = m + n;
004012F6 mov eax, dword ptr [ebp-4]
004012F9 add eax, dword ptr [ebp-8]
004012FC mov dword ptr [ebp-0Ch], eax
46: }
The listing above is the complete code of the function. Before the first local variable m is touched, the function does a fair amount of preparation, with two main purposes: (1) reserving space for the local variables; (2) saving the registers the function will use. Registers are shared by all functions, so if their original values were not saved, the caller would find them overwritten after the function returns and could not continue its computation correctly. There are 10 instructions between address 0x4012D0 and address 0x4012E6. The first pushes ebp onto the stack. The second copies esp into ebp. The third subtracts 0x4C from esp; this size is usually determined by the local variables defined in the function. The fourth, fifth, and sixth push ebx, esi, and edi. The seventh through tenth fill the 0x4C bytes starting at [ebp-4Ch] with the byte 0xCC: edi holds the start address, ecx holds the repeat count 0x13, eax holds the fill value, and dword means four bytes are written per step, so 0x13 × 4 = 0x4C bytes in total.
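The fill performed by the last four instructions can be sketched in C++ as a loop. This is a model only: the constants come from the listing, while fill_frame and the use of a std::array as the frame are assumptions made for illustration, and the loop assumes the direction flag is clear so that edi moves upward.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFrameBytes = 0x4C;  // sub esp, 4Ch
constexpr std::size_t kDwordCount = 0x13;  // mov ecx, 13h
static_assert(kDwordCount * sizeof(std::uint32_t) == kFrameBytes,
              "0x13 dwords cover exactly 0x4C bytes");

// Models "rep stos dword ptr [edi]": store eax into ecx consecutive dwords.
inline std::array<std::uint8_t, kFrameBytes> fill_frame(std::uint32_t eax) {
    std::array<std::uint8_t, kFrameBytes> frame{};
    std::size_t edi = 0;  // lea edi, [ebp-4Ch]: start of the reserved area
    for (std::size_t ecx = kDwordCount; ecx != 0; --ecx) {
        std::memcpy(&frame[edi], &eax, sizeof eax);  // store one dword
        edi += sizeof eax;                           // edi advances by 4 per step
    }
    return frame;
}
```

fill_frame(0xCCCCCCCC) returns 76 bytes, every one of them 0xCC, which is the pattern a debugger shows for uninitialized locals in this kind of build.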
So what does the function do before it returns?
46: }
004012FF pop edi
00401300 pop esi
00401301 pop ebx
00401302 mov esp, ebp
00401304 pop ebp
00401305 ret
The work done before returning is simple. The first three instructions pop edi, esi, and ebx, in the reverse order of the pushes at function entry. The last three are particularly important: ebp is copied back into esp, the saved ebp is popped, and ret returns to the caller, so the registers and the stack frame are restored to the state they were in before the function was entered.
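The symmetry between entry and exit can be checked with a small simulation. The Cpu struct below is a toy under stated assumptions: the stack is a vector of dwords, esp and ebp are indices counted in dwords rather than byte addresses, the initial register values are arbitrary, and ret is not modeled. The names Cpu, prologue, body, and epilogue are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy CPU state. esp and ebp are indices into `stack`, counted in dwords.
struct Cpu {
    std::uint32_t ebx = 1, esi = 2, edi = 3;  // arbitrary caller values
    std::uint32_t ebp = 64, esp = 64;         // empty stack: esp at the top
    std::vector<std::uint32_t> stack = std::vector<std::uint32_t>(64, 0);

    void push(std::uint32_t v) { stack[--esp] = v; }  // stack grows downward
    std::uint32_t pop() { return stack[esp++]; }
};

inline void prologue(Cpu& c) {
    c.push(c.ebp);  // push ebp
    c.ebp = c.esp;  // mov ebp, esp
    c.esp -= 0x13;  // sub esp, 4Ch (0x4C bytes = 0x13 dwords)
    c.push(c.ebx);  // push ebx
    c.push(c.esi);  // push esi
    c.push(c.edi);  // push edi
}

// Stands in for a function body that overwrites the saved registers.
inline void body(Cpu& c) { c.ebx = c.esi = c.edi = 0xDEAD; }

inline void epilogue(Cpu& c) {
    c.edi = c.pop();  // pop edi (reverse order of the pushes)
    c.esi = c.pop();  // pop esi
    c.ebx = c.pop();  // pop ebx
    c.esp = c.ebp;    // mov esp, ebp: discard the locals
    c.ebp = c.pop();  // pop ebp: restore the caller's frame pointer
}
```

Running prologue, body, and epilogue in sequence on a fresh Cpu leaves ebx, esi, edi, ebp, and esp with their starting values, even though body overwrote three of them.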
So how are arguments handled in a function call?
53: process(20);
0040EFA4 push 14h
0040EFA6 call @ILT+40(process) (0040102d)
0040EFAB add esp, 4
The code above shows a call to process when it takes one parameter. The argument 20 (14h) is pushed onto the stack before the call. After the call returns, the caller adds 4 to esp to restore the stack; the adjustment is 4 because the parameter occupies 4 bytes. The following figure shows the stack layout when a function is called:
| Function parameters |
| Return address |
| Saved ebp | <------------------------ ebp
| Local variables |
| Saved registers | <------------------------- esp (stack top)
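The caller's side of the call can be followed by tracking esp alone. The sketch below records esp after each step of push 14h, call, ret, and add esp, 4; trace_call is a hypothetical helper, the starting address passed to it is arbitrary, and the callee is assumed to leave esp balanced, as the prologue and epilogue shown earlier do.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the value of esp (as a byte address) after each step of a call
// with one 4-byte argument, starting from the given esp.
inline std::vector<std::uint32_t> trace_call(std::uint32_t esp) {
    std::vector<std::uint32_t> trace;
    trace.push_back(esp);  // before the call sequence
    esp -= 4;              // push 14h: the argument goes on the stack
    trace.push_back(esp);
    esp -= 4;              // call: the return address is pushed
    trace.push_back(esp);
    esp += 4;              // ret: the return address is popped
    trace.push_back(esp);
    esp += 4;              // add esp, 4: the caller removes the argument
    trace.push_back(esp);
    return trace;
}
```

For any starting value, the first and last entries of the trace are equal, and the lowest point is 8 bytes below the start: 4 bytes for the argument and 4 for the return address.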
Other notes:
(1) The CPU has several general-purpose registers, such as eax, ebx, ecx, and edx. What we usually call ax, bx, cx, and dx are their low 16 bits.
(2) The segment registers identify the program's code segment, data segment, and stack segment. The code segment holds all of the program's code, the data segment holds the global data variables, and the stack segment holds the entire stack space.
(3) The VC compiler supports inline assembly for 32-bit x86 targets. If you are interested, you can try it inside a function. The following code is just an example:
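void process(int *q)
{
    __asm {
        push eax        ; save the registers used below
        push ebx
        push ecx
        mov eax, 0x10
        mov ebx, 0x15
        add eax, ebx    ; eax = 0x10 + 0x15
        mov ecx, q      ; ecx = address held in q
        mov [ecx], eax  ; *q = eax
        pop ecx         ; restore in reverse order
        pop ebx
        pop eax
    }
}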
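For comparison, here is a portable C++ function that computes the same result as the inline-assembly version above, with each line annotated by the instruction it mirrors. The name process_portable is made up for this sketch, and the local variables only imitate the registers; a real compiler is free to generate different instructions.

```cpp
#include <cassert>

// Portable counterpart of process(): stores 0x10 + 0x15 through q.
inline void process_portable(int* q) {
    int eax = 0x10;  // mov eax, 0x10
    int ebx = 0x15;  // mov ebx, 0x15
    eax += ebx;      // add eax, ebx
    *q = eax;        // mov ecx, q  followed by  mov [ecx], eax
}
```

After calling process_portable(&result) on an int result, result holds 0x25 (37 in decimal). The push and pop pairs of the assembly version have no counterpart here because the compiler manages register saving on its own.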
(Complete)