[Disclaimer: All Rights Reserved. You are welcome to reprint it. Do not use it for commercial purposes. Contact Email: feixiaoxing @ 163.com]
How to read assembly code from the perspective of C/C++ is a problem we need to solve. To be honest, assembly is not difficult. We only need to understand the following questions:
(1) What kind of language is assembly?
(2) What does assembly mainly consist of?
(3) How does assembly correspond, statement by statement, to actual C/C++ code?
(1) What kind of language is assembly?
In fact, assembly language is a set of mnemonics for CPU instruction codes. Different CPUs have different instruction sets. The CPUs in ordinary PCs generally come from AMD or Intel and use the x86 instruction set we are talking about today. Other CPU families include PowerPC, which is mainly used in the switches and routers of telecom companies; ARM, which is mainly used in smart terminals and embedded devices; and SPARC, which is mainly used in Sun servers. Because CPU instructions and binary machine code correspond almost one-to-one, assembly language not only helps us quickly understand the machine's hardware, but also helps us understand how a program actually runs on the device.
(2) What does assembly mainly consist of?
Assembly language covers a lot, but not much of it relates to C/C++. In general, you only need to understand registers, segment addresses, the stack, and the basic register operations and memory accesses.
(3) How does assembly correspond, statement by statement, to actual C/C++ code?
Let us start with an example. Generally, one C statement is split into several assembly statements. For example:
```cpp
int m = 10;
int n = 20;
int p = m + n;
```
Let us assume that m, n, and p are all defined inside a function, so all three are local (temporary) variables. On entering the function, EBP and ESP must first be adjusted to reserve stack space for them. The three statements are translated as follows:
```
43:   int m = 10;
004012e8   mov   dword ptr [ebp-4], 0Ah
44:   int n = 20;
004012ef   mov   dword ptr [ebp-8], 14h
45:   int p = m + n;
004012f6   mov   eax, dword ptr [ebp-4]
004012f9   add   eax, dword ptr [ebp-8]
004012fc   mov   dword ptr [ebp-0Ch], eax
```
The listing above shows the correspondence between the assembly statements and the C statements directly. In the first statement, m is assigned 10, and it is stored in memory just below EBP, at [ebp-4]. The second statement is similar. The third is a little more complicated, so let us analyze it. First, the CPU loads m from the stack, that is, the data at address [ebp-4], into register EAX. Then it finds n the same way, at [ebp-8], and adds it directly to EAX. The last step is simple: it stores EAX at address [ebp-0Ch]. Every local variable inside a function appears in this form: local variables are accessed through an offset from EBP.
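To make the EBP-relative addressing concrete, here is a minimal C sketch that uses a toy model. A small byte array stands in for the stack, and `ebp` is an offset into it. The names `Frame`, `load32`, `store32`, and `run_locals` are hypothetical. The function simply replays the five instructions above in order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one stack frame: a small byte array plus two registers.
   "Addresses" are offsets into mem, so ebp - 4 means "4 bytes below EBP". */
#define FRAME_BYTES 256

typedef struct {
    uint8_t  mem[FRAME_BYTES];
    uint32_t ebp;                 /* frame base (offset into mem) */
    uint32_t eax;                 /* accumulator register */
} Frame;

/* Read "dword ptr [addr]": 4 bytes, little-endian as on x86. */
static uint32_t load32(const Frame *f, uint32_t addr) {
    assert(addr + 4 <= FRAME_BYTES);
    return (uint32_t)f->mem[addr]
         | (uint32_t)f->mem[addr + 1] << 8
         | (uint32_t)f->mem[addr + 2] << 16
         | (uint32_t)f->mem[addr + 3] << 24;
}

/* Write "dword ptr [addr]". */
static void store32(Frame *f, uint32_t addr, uint32_t value) {
    assert(addr + 4 <= FRAME_BYTES);
    for (uint32_t i = 0; i < 4; i++)
        f->mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Replays the listing for: int m = 10; int n = 20; int p = m + n; */
static void run_locals(Frame *f) {
    store32(f, f->ebp - 4, 0x0A);        /* mov dword ptr [ebp-4], 0Ah    ; m */
    store32(f, f->ebp - 8, 0x14);        /* mov dword ptr [ebp-8], 14h    ; n */
    f->eax = load32(f, f->ebp - 4);      /* mov eax, dword ptr [ebp-4]       */
    f->eax += load32(f, f->ebp - 8);     /* add eax, dword ptr [ebp-8]       */
    store32(f, f->ebp - 0x0C, f->eax);   /* mov dword ptr [ebp-0Ch], eax  ; p */
}
```

Suppose `ebp` is set to 128. Calling `run_locals` then leaves 10, 20, and 30 at offsets 124, 120, and 116, which are exactly where [ebp-4], [ebp-8], and [ebp-0Ch] point.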
Have you ever wondered what happens if p is a global variable?
```
45:   int m = 10;
004012e8   mov   dword ptr [ebp-4], 0Ah
46:   int n = 20;
004012ef   mov   dword ptr [ebp-8], 14h
47:   p = m + n;
004012f6   mov   eax, dword ptr [ebp-4]
004012f9   add   eax, dword ptr [ebp-8]
004012fc   mov   [p (0042b0b4)], eax
```
In the code above, the way m and n are assigned has not changed. What has changed is that the final value in register EAX is stored to the absolute address 0x42b0b4. This shows that once the program is loaded into memory, a global variable has its own fixed address, which does not move as the stack grows and shrinks.
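A small C experiment can show the same property from the language side. This sketch should work with any hosted C compiler. `probe` and `DEPTH` are hypothetical names. The code records the address of a local variable and of a global `p` at each level of a recursion. Every active call gets its own stack frame, so the local's address changes from level to level, while the global's address does not.

```c
#include <assert.h>
#include <stdint.h>

int p;             /* global: fixed address in the data (bss) section */

#define DEPTH 4

/* Records &local and &p at each recursion depth into the two arrays. */
static int probe(int depth, uintptr_t locals[], uintptr_t globals[]) {
    int local = depth;                 /* lives in this call's stack frame */
    assert(depth >= 0 && depth < DEPTH);
    locals[depth]  = (uintptr_t)&local;
    globals[depth] = (uintptr_t)&p;
    if (depth > 0)
        local += probe(depth - 1, locals, globals);  /* this frame stays live */
    return local;
}
```

Call `probe(DEPTH - 1, locals, globals)` and you should see one shared global address and four distinct local addresses. The exact numbers depend on the compiler and OS. With address space layout randomization the global may sit at a different absolute address on each run, but it stays fixed for the lifetime of the process.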
As we said before, all local variables of a function are stored in the stack space between EBP and ESP. How does the code set this up? Look at the following assembly code:
```
41:   void Process()
42:   {
004012d0   push  ebp
004012d1   mov   ebp, esp
004012d3   sub   esp, 4Ch
004012d6   push  ebx
004012d7   push  esi
004012d8   push  edi
004012d9   lea   edi, [ebp-4Ch]
004012dc   mov   ecx, 13h
004012e1   mov   eax, 0CCCCCCCCh
004012e6   rep stos dword ptr [edi]
43:   int m = 10;
004012e8   mov   dword ptr [ebp-4], 0Ah
44:   int n = 20;
004012ef   mov   dword ptr [ebp-8], 14h
45:   int p = m + n;
004012f6   mov   eax, dword ptr [ebp-4]
004012f9   add   eax, dword ptr [ebp-8]
004012fc   mov   dword ptr [ebp-0Ch], eax
46:   }
```
Here is the complete code of the function from before. Before the local variable m is touched, the function has already done a lot of preparation. It has two main purposes: (1) to reserve space for the local variables; (2) to save the registers the function will use. Registers are resources shared by all functions. If their original values were not saved, they would be lost when the function returns, and the caller could not continue computing correctly from its original state. There are 10 instructions between address 0x4012d0 and address 0x4012e6:
- The first pushes EBP onto the stack.
- The second copies ESP into EBP.
- The third subtracts 4Ch from ESP. The size is generally determined by the number of local variables defined in the function.
- The fourth, fifth, and sixth push EBX, ESI, and EDI onto the stack.
- The seventh through tenth fill the 0x4C bytes starting at [ebp-4Ch] with 0xCC. EDI holds the start address, ECX holds the repeat count of 0x13, and dword means 4 bytes are written each time (0x13 * 4 = 0x4C).
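The ten prologue instructions can be replayed in a toy model to check the arithmetic. In particular, 0x13 dwords of 4 bytes each should cover exactly the 0x4C bytes reserved by `sub esp, 4Ch`. The model makes three assumptions: the stack is an array of dwords, ESP and EBP are array indices (one step = 4 bytes), and the stack grows toward index 0. `Cpu`, `push`, and `prologue` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the stack is an array of dwords and ESP/EBP are indices
   into it, so one index step is 4 bytes. The stack grows toward index 0. */
#define STACK_DWORDS 64

typedef struct {
    uint32_t mem[STACK_DWORDS];
    uint32_t eax, ebx, ecx, esi, edi, esp, ebp;
} Cpu;

static void push(Cpu *c, uint32_t value) {
    assert(c->esp > 0);                 /* no stack overflow in the model */
    c->mem[--c->esp] = value;
}

/* Replays the ten prologue instructions of Process(). */
static void prologue(Cpu *c) {
    push(c, c->ebp);                    /* push ebp                 */
    c->ebp = c->esp;                    /* mov ebp, esp             */
    assert(c->esp >= 0x4C / 4);
    c->esp -= 0x4C / 4;                 /* sub esp, 4Ch (19 dwords) */
    push(c, c->ebx);                    /* push ebx                 */
    push(c, c->esi);                    /* push esi                 */
    push(c, c->edi);                    /* push edi                 */
    c->edi = c->ebp - 0x4C / 4;         /* lea edi, [ebp-4Ch]       */
    c->ecx = 0x13;                      /* mov ecx, 13h             */
    c->eax = 0xCCCCCCCCu;               /* mov eax, 0CCCCCCCCh      */
    while (c->ecx != 0) {               /* rep stos dword ptr [edi] */
        c->mem[c->edi++] = c->eax;      /* store one dword, advance EDI */
        c->ecx--;                       /* one fewer repetition left    */
    }
}
```

After `prologue` runs, every dword from [ebp-4Ch] up to EBP holds 0xCCCCCCCC. EDI ends just past the filled area (at EBP), and the saved EDI, ESI, and EBX sit at the new top of the stack. Filling locals with 0xCC is typical of VC debug builds, which makes uninitialized locals easy to spot as 0xCCCCCCCC.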
So what does the function do before it returns?
```
46:   }
004012ff   pop   edi
00401300   pop   esi
00401301   pop   ebx
00401302   mov   esp, ebp
00401304   pop   ebp
00401305   ret
```
In fact, the return sequence is very simple. The first three instructions pop EDI, ESI, and EBX off the stack, in the reverse order of the earlier pushes. The last three are especially important. EBP is copied back into ESP, which discards the local variables. Then the saved EBP is popped, and ret returns to the caller. Everything is thus restored to its state before the call, except for the arguments, which the caller removes.
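Pairing the prologue with this epilogue in a toy model shows that the stack and the saved registers really do come back unchanged. This is a hypothetical sketch. The stack is a dword array indexed by ESP/EBP, the 0xCC fill is left out, and `run_function` stands in for a function body that clobbers EBX, ESI, and EDI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: stack of dwords, ESP/EBP are indices (1 step = 4 bytes). */
#define STACK_DWORDS 64

typedef struct {
    uint32_t mem[STACK_DWORDS];
    uint32_t ebx, esi, edi, esp, ebp;
} Cpu;

static void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c) { assert(c->esp < STACK_DWORDS); return c->mem[c->esp++]; }

static void prologue(Cpu *c) {
    push(c, c->ebp);            /* push ebp                           */
    c->ebp = c->esp;            /* mov ebp, esp                       */
    c->esp -= 0x4C / 4;         /* sub esp, 4Ch (0xCC fill omitted)   */
    push(c, c->ebx);            /* push ebx                           */
    push(c, c->esi);            /* push esi                           */
    push(c, c->edi);            /* push edi                           */
}

static void epilogue(Cpu *c) {
    c->edi = pop(c);            /* pop edi                            */
    c->esi = pop(c);            /* pop esi                            */
    c->ebx = pop(c);            /* pop ebx                            */
    c->esp = c->ebp;            /* mov esp, ebp : discard the locals  */
    c->ebp = pop(c);            /* pop ebp      : caller's frame base */
    /* ret would now pop the return address; this model pushed none. */
}

/* Prologue, a body that scribbles on the saved registers, then the epilogue. */
static void run_function(Cpu *c) {
    prologue(c);
    c->ebx = c->esi = c->edi = 0xDEADBEEFu;
    epilogue(c);
}
```

After `run_function`, EBX, ESI, EDI, ESP, and EBP all hold the values they had before the call. This holds even though the body overwrote three of them.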
So how are input parameters handled in a function call?
```
53:   Process(20);
0040efa4   push  14h
0040efa6   call  @ILT+40(Process) (0040102d)
0040efab   add   esp, 4
```
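The code above calls a Process function that takes one parameter. The caller pushes the argument 14h (20) and then calls the function. After the call returns, it executes add esp, 4 to restore the stack. The adjustment is 4 because the parameter occupies 4 bytes. Having the caller remove its own arguments is the behavior of the default __cdecl calling convention. The following figure shows the stack layout during a function call: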
```
| Function parameters |   (higher addresses)
| Return address      |
| Saved EBP           | <------------ EBP
| Local variables     |
| Saved registers     |
| Top of stack        | <------------ ESP   (lower addresses)
```
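The caller and callee sides of `Process(20)` can be sketched together in a toy dword-indexed stack model. The model is an assumption for illustration, and `process_body` and `call_process` are hypothetical names. The callee finds its argument at [ebp+8] because [ebp] holds the saved EBP and [ebp+4] holds the return address, as in the figure above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: stack of dwords, ESP/EBP are indices (1 step = 4 bytes). */
#define STACK_DWORDS 64

typedef struct {
    uint32_t mem[STACK_DWORDS];
    uint32_t esp, ebp;
} Cpu;

static void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c) { assert(c->esp < STACK_DWORDS); return c->mem[c->esp++]; }

/* Callee: sets up a frame and reads its single argument via [ebp+8]. */
static uint32_t process_body(Cpu *c) {
    push(c, c->ebp);                        /* push ebp                      */
    c->ebp = c->esp;                        /* mov ebp, esp                  */
    /* [ebp] = saved EBP, [ebp+4] = return address, [ebp+8] = argument     */
    uint32_t arg = c->mem[c->ebp + 8 / 4];
    c->esp = c->ebp;                        /* mov esp, ebp                  */
    c->ebp = pop(c);                        /* pop ebp                       */
    return arg;
}

/* Caller side of Process(20): push 14h / call / add esp, 4. */
static uint32_t call_process(Cpu *c, uint32_t arg, uint32_t return_address) {
    push(c, arg);                           /* push 14h                      */
    push(c, return_address);                /* call pushes the return address */
    uint32_t seen = process_body(c);
    uint32_t back = pop(c);                 /* ret pops the return address   */
    assert(back == return_address);
    c->esp += 4 / 4;                        /* add esp, 4: drop the argument */
    return seen;
}
```

Calling `call_process` with 0x14 returns 20 and leaves ESP and EBP exactly as they were. The final `add esp, 4` is what keeps the stack balanced.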
Further notes:
(1) The CPU has several general-purpose registers, such as EAX, EBX, ECX, and EDX. What we usually call AX, BX, CX, and DX are their low 16 bits.
(2) The segment registers (CS, DS, SS) identify the program's code segment, data segment, and stack segment. The code segment holds all of the program's code, the data segment holds the global data variables, and the stack segment is the stack space.
(3) The VC compiler supports embedded (inline) assembly through the __asm keyword in 32-bit x86 code. If you are interested, you can try it inside a function. The following code is just an example:
```cpp
void Process(int* q)
{
    __asm {
        push eax
        push ebx
        push ecx
        mov  eax, 0x10
        mov  ebx, 0x15
        add  eax, ebx
        mov  ecx, q
        mov  [ecx], eax
        pop  ecx
        pop  ebx
        pop  eax
    }
}
```
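MSVC accepts `__asm` blocks only when compiling for 32-bit x86. The x64 and ARM compilers do not support inline assembly. For comparison, here is a plain C sketch that does the same work as the block above. Each statement is annotated with the instruction it mirrors. `process_c` is a hypothetical name, and a compiler is free to generate quite different code for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C counterpart of the __asm block in Process(int* q). */
static void process_c(int *q) {
    assert(q != NULL);
    int eax = 0x10;   /* mov eax, 0x10                */
    int ebx = 0x15;   /* mov ebx, 0x15                */
    eax += ebx;       /* add eax, ebx   (0x25)        */
    *q = eax;         /* mov ecx, q / mov [ecx], eax  */
}
```

The push/pop pairs have no counterpart here. In C, the compiler decides which registers to use and save.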
(Complete)
[Note: the next post covers assembly language and pointers.]