C++ from the perspective of assembly (x86 Assembly)

Last Update:2013-12-08 Source: Internet

Author: User


When we look at C++ from the perspective of assembly, the first problem to solve is how to read assembly code. In fact, assembly is not difficult. We only need to understand the following points:

(1) What language is assembly?

(2) What are the main contents in assembly?

(3) How does assembly language correspond to actual C/C++ code?

(1) What kind of language is assembly?

Assembly language is a set of mnemonics for the CPU's instruction codes. Different CPUs have different instruction sets. CPUs in ordinary PCs generally come from AMD or Intel and use the x86 instruction set, which is the subject of this article. Other CPU families include PowerPC, used mainly in the switches and routers of telecom companies; ARM, used mainly in smart terminals and instruments; and SPARC, used mainly in Sun servers. Because assembly instructions map almost one-to-one onto binary machine code, assembly language helps us understand both the hardware of the machine and how a program actually runs on it.
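To make the near one-to-one mapping concrete, here is a minimal sketch that decodes the machine bytes of a single instruction. The byte sequence is an assumption: it is the standard 32-bit x86 encoding of `mov dword ptr [ebp-4], 0Ah` (opcode C7, ModRM 45, an 8-bit displacement, a 32-bit little-endian immediate), not bytes copied from this article. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of decoding "mov dword ptr [ebp+disp8], imm32" (hypothetical helper type).
struct MovEbpImm {
    bool ok;            // true if the bytes matched this one instruction form
    std::int8_t disp;   // signed displacement from ebp
    std::uint32_t imm;  // 32-bit immediate value
};

// Decodes exactly one instruction form: C7 45 <disp8> <imm32, little-endian>.
// Anything else is rejected; this is not a general x86 decoder.
inline MovEbpImm decode_mov_ebp_imm(const std::uint8_t* bytes, std::size_t len) {
    MovEbpImm r{false, 0, 0};
    if (len < 7) return r;
    if (bytes[0] != 0xC7) return r;  // opcode: MOV r/m32, imm32
    if (bytes[1] != 0x45) return r;  // ModRM: mod=01, reg=000, rm=101 -> [ebp+disp8]
    r.disp = static_cast<std::int8_t>(bytes[2]);
    r.imm = static_cast<std::uint32_t>(bytes[3]) |
            (static_cast<std::uint32_t>(bytes[4]) << 8) |
            (static_cast<std::uint32_t>(bytes[5]) << 16) |
            (static_cast<std::uint32_t>(bytes[6]) << 24);
    r.ok = true;
    return r;
}

// Assumed encoding of "mov dword ptr [ebp-4], 0Ah".
static const std::uint8_t kMovM10[7] = {0xC7, 0x45, 0xFC, 0x0A, 0x00, 0x00, 0x00};
```

Calling `decode_mov_ebp_imm(kMovM10, sizeof kMovM10)` recovers the displacement and the immediate from the raw bytes. In the listings later in this article this instruction starts at 004012E8 and the next one at 004012EF, which is consistent with a seven-byte encoding.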

(2) What are the main contents of assembly language?

Assembly language covers a lot of ground, but not much of it matters for C/C++. In general, you only need to know the registers, segment addresses, the stack, and the basic operations and memory addressing that use registers.

(3) How does assembly language correspond to actual code?

Let's start with an example. A single C/C++ statement is generally split into several assembly instructions. For example:

int m = 10;

int n = 20;

int p = m + n;

Assume that m, n, and p are all declared inside a function, so all three are local (temporary) variables. On entry to the function, ebp and esp are set up to reserve stack space for them. The three statements are then compiled as follows:

43: int m = 10;

004012E8 mov dword ptr [ebp-4], 0Ah

44: int n = 20;

004012EF mov dword ptr [ebp-8], 14h

45: int p = m + n;

004012F6 mov eax, dword ptr [ebp-4]

004012F9 add eax, dword ptr [ebp-8]

004012FC mov dword ptr [ebp-0Ch], eax

The code above shows the correspondence between the assembly instructions and the C statements directly. The first statement assigns 10 to m, which lives in memory just below ebp, at [ebp-4]. The second statement is similar, with n at [ebp-8]. The third is a little more involved: the CPU first loads m from the stack, that is, the data at address [ebp-4], into register eax; it then finds n in the same way and adds it directly to eax; finally it stores eax to the address [ebp-0Ch], which is p. Every local variable inside a function follows this pattern: it is accessed at an offset from ebp.
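The walk-through above can be replayed in a toy model, where a byte array stands in for stack memory and an index stands in for ebp. This is only a sketch: `ToyFrame` and `run_m_plus_n` are hypothetical names, and the model ignores everything about a real CPU except the five loads and stores in the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A toy stack frame: raw bytes plus an "ebp" index into them (hypothetical model).
struct ToyFrame {
    std::uint8_t mem[64] = {};
    int ebp = 32;  // frame base somewhere in the middle of mem

    // Store a 32-bit value at [ebp + offset].
    void store(int offset, std::int32_t value) {
        std::memcpy(&mem[ebp + offset], &value, sizeof value);
    }
    // Load a 32-bit value from [ebp + offset].
    std::int32_t load(int offset) const {
        std::int32_t value;
        std::memcpy(&value, &mem[ebp + offset], sizeof value);
        return value;
    }
};

// Mirrors the five instructions of the listing, one line per instruction.
inline std::int32_t run_m_plus_n(ToyFrame& f) {
    f.store(-4, 10);                 // mov dword ptr [ebp-4], 0Ah
    f.store(-8, 20);                 // mov dword ptr [ebp-8], 14h
    std::int32_t eax = f.load(-4);   // mov eax, dword ptr [ebp-4]
    eax += f.load(-8);               // add eax, dword ptr [ebp-8]
    f.store(-12, eax);               // mov dword ptr [ebp-0Ch], eax
    return f.load(-12);              // the value now stored in p
}
```

After `run_m_plus_n` runs on a fresh `ToyFrame`, the slots at offsets -4, -8, and -12 hold m, n, and p respectively.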

What happens if p is a global variable?

45: int m = 10;

004012E8 mov dword ptr [ebp-4], 0Ah

46: int n = 20;

004012EF mov dword ptr [ebp-8], 14h

47: p = m + n;

004012F6 mov eax, dword ptr [ebp-4]

004012F9 add eax, dword ptr [ebp-8]

004012FC mov [p (0042b0b4)], eax

In the code above, the assignments to m and n have not changed. What changes is that the final value in eax is stored to an absolute address, 0x42b0b4. This shows that once the program is loaded into memory, a global variable has its own fixed address and does not move as the stack grows and shrinks.

As we said, all local variables of a function are stored in the stack space between ebp and esp. How does the code set this up? Let's look at the following assembly code:
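The difference between a fixed global address and a stack-relative local address can also be observed from C++ itself. The sketch below is an illustration under assumptions: `g_p`, `Seen`, and `record` are hypothetical names, and the function calls itself once so that two copies of the local variable exist at the same time.

```cpp
#include <cassert>
#include <vector>

int g_p = 0;  // global: one fixed location for the whole run

// Addresses observed during one call (hypothetical helper type).
struct Seen {
    const void* global_addr;
    const void* local_addr;
};

// Records where the global and the local live at each call depth.
void record(int depth, std::vector<Seen>& out) {
    int m = depth;  // local: lives in this call's own stack frame
    out.push_back({&g_p, &m});
    if (depth > 0) record(depth - 1, out);
    g_p += m;       // keep m in use after the nested call
}
```

Calling `record(1, seen)` yields two entries. The global's address is the same in both, while the two live copies of `m` necessarily sit at different addresses, one per stack frame.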

41: void process ()

42: {

004012D0 push ebp

004012D1 mov ebp, esp

004012D3 sub esp, 4Ch

004012D6 push ebx

004012D7 push esi

004012D8 push edi

004012D9 lea edi, [ebp-4Ch]

004012DC mov ecx, 13h

004012E1 mov eax, 0CCCCCCCCh

004012E6 rep stos dword ptr [edi]

43: int m = 10;

004012E8 mov dword ptr [ebp-4], 0Ah

44: int n = 20;

004012EF mov dword ptr [ebp-8], 14h

45: int p = m + n;

004012F6 mov eax, dword ptr [ebp-4]

004012F9 add eax, dword ptr [ebp-8]

004012FC mov dword ptr [ebp-0Ch], eax

46: }

This is the complete code of the function. Before the first operation on the local variable m, the function does a fair amount of preparation, with two main purposes: (1) reserve space for the local variables; (2) save the registers the function is going to use. Registers are a resource shared by all functions; if their original values were not saved, the caller would find them overwritten after the function returns and could not continue its computation correctly. There are ten instructions between address 0x4012D0 and address 0x4012E6. The first pushes ebp onto the stack. The second copies esp to ebp. The third subtracts 0x4C from esp; this size generally depends on the local variables defined in the function. The fourth, fifth, and sixth push ebx, esi, and edi. The seventh through tenth fill the 0x4C bytes starting at [ebp-4Ch] with 0xCC: edi is the start address, eax holds the fill value, ecx is the repeat count 0x13, and dword means four bytes are written each time (0x13 * 4 = 0x4C).
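The fill loop at the end of the prologue is easy to model. The following sketch imitates `rep stos dword ptr [edi]` with a plain loop; the function name is hypothetical, and a vector stands in for the memory that edi points to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models "rep stos dword ptr [edi]": store eax into ecx consecutive dwords.
inline std::vector<std::uint32_t> rep_stos_dword(std::uint32_t eax, std::size_t ecx) {
    std::vector<std::uint32_t> area;
    area.reserve(ecx);
    while (ecx != 0) {        // rep: repeat until ecx reaches zero
        area.push_back(eax);  // stos: store eax, then advance edi by 4 bytes
        --ecx;
    }
    return area;
}
```

With `eax = 0xCCCCCCCC` and `ecx = 0x13`, the result covers 0x13 dwords, that is 0x4C bytes, which is exactly the space reserved by `sub esp, 4Ch`.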

So what does the function do before it returns?

46: }

004012FF pop edi

00401300 pop esi

00401301 pop ebx

00401302 mov esp, ebp

00401304 pop ebp

00401305 ret

What the function does before returning is simple. The first instruction pops edi, the second pops esi, and the third pops ebx, the reverse of the order in which they were pushed. The last three instructions are the important ones: ebp is copied back to esp, ebp is popped, and the function returns, so everything is restored to its state before the call.
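To see that the epilogue really undoes the prologue, the two can be run back to back on a toy machine. This is a sketch under assumptions: `ToyCpu`, `make_toy_cpu`, and `run_process` are hypothetical names, the starting register values are arbitrary, and the `ret` instruction and the return address are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy 32-bit machine: a few registers and a downward-growing stack of dwords.
struct ToyCpu {
    std::uint32_t ebp, esp, ebx, esi, edi;
    std::vector<std::uint32_t> stack;  // one slot per dword, indexed by esp / 4

    void push(std::uint32_t v) { esp -= 4; stack[esp / 4] = v; }
    std::uint32_t pop() { std::uint32_t v = stack[esp / 4]; esp += 4; return v; }
};

// Arbitrary starting state with a 0x200-byte stack.
inline ToyCpu make_toy_cpu() {
    return ToyCpu{0x1000, 0x200, 0x11, 0x22, 0x33,
                  std::vector<std::uint32_t>(0x200 / 4)};
}

// Runs the prologue and epilogue of process() around a register-clobbering body.
inline void run_process(ToyCpu& c) {
    c.push(c.ebp);   // push ebp
    c.ebp = c.esp;   // mov ebp, esp
    c.esp -= 0x4C;   // sub esp, 4Ch
    c.push(c.ebx);   // push ebx
    c.push(c.esi);   // push esi
    c.push(c.edi);   // push edi

    c.ebx = c.esi = c.edi = 0xDEADBEEF;  // stand-in for a body that uses them

    c.edi = c.pop(); // pop edi
    c.esi = c.pop(); // pop esi
    c.ebx = c.pop(); // pop ebx
    c.esp = c.ebp;   // mov esp, ebp
    c.ebp = c.pop(); // pop ebp
}
```

After `run_process` returns, every register in the model holds the value it had before the call, even though the body overwrote ebx, esi, and edi.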

So how are arguments handled in a function call?

53: process(20);

0040EFA4 push 14h

0040EFA6 call @ILT+40(process) (0040102d)

0040EFAB add esp, 4

The code above shows a call to process when it takes one parameter. The argument is pushed before the call, and after the function returns, add esp, 4 restores the stack. The adjustment is 4 because the argument occupies 4 bytes. The following diagram shows the stack when a function is called:

| Function parameters |

| Return address |

| Saved ebp | <------------------------ ebp

| Temporary variables |

| Saved registers |

| Stack top | <------------------------- esp
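The call sequence and the layout in the diagram above can be traced in a toy model. This is a sketch under assumptions: `ToyStack`, `toy_process`, and `toy_call_process` are hypothetical names, and the argument offset [ebp+8] is not shown in the article's listings; it follows from the `push ebp` / `mov ebp, esp` prologue, which leaves the saved ebp at [ebp] and the return address at [ebp+4].

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stack: esp, ebp, and one memory slot per dword (hypothetical model).
struct ToyStack {
    std::uint32_t esp;
    std::uint32_t ebp;
    std::vector<std::uint32_t> mem;  // indexed by address / 4

    void push(std::uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    std::uint32_t pop() { std::uint32_t v = mem[esp / 4]; esp += 4; return v; }
    std::uint32_t load(std::uint32_t addr) const { return mem[addr / 4]; }
};

// Callee: set up the frame, read the first argument, tear the frame down.
inline std::uint32_t toy_process(ToyStack& s) {
    s.push(s.ebp);                          // push ebp
    s.ebp = s.esp;                          // mov ebp, esp
    std::uint32_t arg = s.load(s.ebp + 8);  // argument sits above the return address
    s.esp = s.ebp;                          // mov esp, ebp
    s.ebp = s.pop();                        // pop ebp
    return arg;
}

// Caller side of "process(20)".
inline std::uint32_t toy_call_process(ToyStack& s) {
    s.push(0x14);        // push 14h
    s.push(0x0040EFAB);  // call: pushes the address of the next instruction
    std::uint32_t arg = toy_process(s);
    s.pop();             // ret: pops the return address
    s.esp += 4;          // add esp, 4: the caller discards the argument
    return arg;
}
```

In this model the caller both pushes the argument and removes it afterwards, which is why esp ends up exactly where it started.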

Other notes:

(1) The CPU has several general-purpose registers, such as eax, ebx, ecx, and edx. The names ax, bx, cx, and dx refer to their low 16 bits.

(2) The segment registers identify the program's code segment, data segment, and stack segment. The code segment holds all program code, the data segment holds global data, and the stack segment holds the entire stack.

(3) The VC compiler supports inline assembly. If you are interested, you can try it inside a function. The following code is just an example:

void process(int* q)

{

    __asm {

        push eax

        push ebx

        push ecx

        mov eax, 0x10

        mov ebx, 0x15

        add eax, ebx

        mov ecx, q

        mov [ecx], eax

        pop ecx

        pop ebx

        pop eax

    }

}
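For comparison, here is a portable sketch of what the inline assembly computes. `__asm` blocks are specific to the VC compiler on 32-bit x86, so this version uses plain C++ and lets the compiler pick the registers; the name `process_portable` is hypothetical.

```cpp
#include <cassert>

// Portable equivalent of the inline-assembly example: adds 0x10 and 0x15
// and stores the sum through q. No registers are named explicitly.
void process_portable(int* q) {
    assert(q != nullptr);
    int a = 0x10;  // mov eax, 0x10
    int b = 0x15;  // mov ebx, 0x15
    a += b;        // add eax, ebx
    *q = a;        // mov ecx, q / mov [ecx], eax
}
```

Calling `process_portable(&out)` leaves 0x25 in `out`, the same value the assembly version writes through q.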

(Complete)

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
