How the Increment (++) Operator in C Behaves under Different Compilers

Last Update:2013-11-25 Source: Internet

Author: User


This memo came out of a question a classmate sent me. I honestly cannot remember whether I ran into this problem when I was learning C. I rarely write code like this, and I have never written purely numerical code in C; MATLAB is better suited to that.

C is actually an exquisite language, and I find it the most comfortable one to use, but it has no object-oriented features. C++ extends it at the cost of much more complex syntax, and its various libraries are painful too; MFC is already a thing of the past. I don't know how Objective-C fares, but it seems to work well enough for Apple. It would be nice to write my own object-oriented superset of C one day; I have even thought of a name for the language (that being the easiest part). Unfortunately I don't have the skills, since I never learned compiler principles properly.

Enough chatter; let's get started.

The content is simple, but I have not touched assembly for many years and am quite rusty. Please don't laugh; this is just a memo for myself.

Still, the question is a really good one, so I looked into it.

The program is as follows; it is very simple:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int a = 1, b = 3, c = 0;
    /* b is modified three times in one expression: undefined behavior */
    a = (++b) + (++b) + (++b);
    printf("a = %d\nb = %d\n", a, b);
    return 0;
}

Strictly speaking, this program exists only to probe how the increment operator behaves. It is actually very bad code, even if it is marginally more compact, because b is modified several times within a single expression and nothing orders those modifications. The C standard leaves the behavior of such an expression undefined, so different compilers are free to interpret it differently.

I was startled when I first saw the result my classmate got under VC. I must have come across similar programs when I was learning with TC, but I never noticed anything unusual then, and I had never run one in VC.

So I ran it in GCC and got the same result as VC, even though the two are different compilers.

Run in C#, the same expression gives 15, the result most people would expect.

And GCC? The answer is 16. The final value of b is the same in every case.

Intuition says 4 + 5 + 6 = 15, so why is it 16 under GCC? VC also gives 16, and TC gives 18.

I also tried Python just now and found that ++ there is not an increment operator at all (++b is simply two unary plus signs applied to b).

I puzzled over it for a long time without understanding what was going on. Never mind; let's look at the assembly code.

To be honest, I have never used assembly on Linux; what I learned was 8086 assembly in Intel syntax. Assembly is tied closely to the hardware, and both the syntax and the pseudo-instructions differ between platforms.

There is a lot of assembly code. It can also be viewed in VC, where the listing is much more concise, mainly because some low-level details are hidden.

As we know, a piece of C code goes through preprocessing, syntax analysis, compilation, and linking before it becomes an executable file. In memory, besides the code you wrote, there are other structures such as the stack segment; we can ignore them here.

The key part is: a = (++b) + (++b) + (++b);

First, a word about the assembly syntax. After some reading I learned that Linux tools use AT&T syntax (which is why it looked strange to me at first). It differs from Intel syntax, although most of the pseudo-instructions are the same.

For addition, move, and similar instructions, the destination operand is on the right and the source operand is on the left, the opposite of Intel.

addl: this confused me at first. Add to the left? It is simply add; the "l" suffix says that the operand is a 32-bit long. Noted.

$0x3: 0x means hexadecimal, but what does the dollar sign mean? An address? I looked it up later: $ marks an immediate value. In Intel syntax the immediate is written with no prefix, as a plain 0x3 operand of mov.

%esp: esp is a register. The % in front is, again, just a marker; in AT&T syntax every register name starts with %. There are eight 32-bit general-purpose registers in all, esp and edx among them.

Even with my limited ability I can explain the section for a = (++b) + (++b) + (++b):

1. addl $0x1, 0x1c(%esp): add 1 to the operand on the right. 0x1c(%esp) is the memory location at offset 0x1c from esp, that is, the stack slot holding b (0x1c is a displacement, not a label).

2. The same statement again: b is incremented a second time.

3. mov: load the twice-incremented value of b from the stack slot into the eax register.

4. add: add eax to itself.

5. addl: increment b by 1 once more.

6. add: add the new value of b to eax.

7. mov: store the value in eax into the variable a.

This shows how the expression is executed. First, b is incremented twice! Then b is added to itself, b is incremented once more, and the new value of b is added to the previous sum to give the final result.

Why is it incremented twice first? We know that ++b means "increment, then use". The key is how to understand the word "use".

a = (++b) + (++b) + (++b);

In C, lexical analysis follows the maximal-munch principle: characters are read from left to right and each token is made as long as possible. That rule only settles how the source is split into tokens; it does not fix the order in which the operands are evaluated. What follows is one way to picture what GCC does.

For (++b) + (++b) + (++b), the parentheses come first. The left (++b) is read and b is incremented. Then the "+" in the middle is read, but a left parenthesis follows it, so reading continues and the addition is not performed yet. The second (++b) is then evaluated. The question is whether the operands are now 4 and 5. The compiler increments the variable in place, so b is now 5, and the addition starts only after the parenthesized operand to the right of "+" has been evaluated; that is the "use". It is not 4 + 5 but 5 + 5, because both operands are read from b, which already holds 5. In other words, the compiler takes both operands to be b after the increments performed so far. The third (++b) then makes b equal to 6, so the expression is evaluated as 5 + 5 + 6 = 16!

Could it be understood like this: (++b) + (++b) counts as one "use", since it does, after all, perform an addition?

That is, (++b) + (++b) is one operation, computed as 5 + 5, after which b is incremented again from 5, giving 5 + 5 + 6 = 16.

Unfortunately, that is not right either. Take this example: a = (++b) + (b++). By the logic above it should be 4 + 4 + 5 = 13: once (++b) + (b++) is finished it counts as used, so b++ takes effect and b becomes 4 + 1 = 5. But the answer is 12. So the compiler takes the whole expression as the unit of "use". Yet that reading does not seem to explain a = (++b) + (++b) + (++b): if the whole expression were the unit, all the increments should be completed first and the additions afterwards (explaining it from a human point of view), giving 6 + 6 + 6 = 18. That is not what GCC does, but it is exactly how the TC compiler understands it!

Let's look at what happens for a = (++b) + (b++):

From the assembly we can clearly see the execution process.

Now things start to make sense: it all comes down to the compiler!

Suppose we modify the program as follows:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int a = 1, b = 3, c = ++b;   /* c keeps the value 4 */
    a = c + (++b);
    printf("a = %d\nb = %d\n", a, b);
    return 0;
}

This is in fact what most people have in mind. Saving the value in c keeps b = 4 from being lost to the later increment. It only matters once there are three operands, as explained at length above.

We now have part of the answer. Let's look at the result of a = (b++) + (b++).

Pretty striking, isn't it!

Let's look at the assembly:

The three increment operations are all performed at the end!

That is, it is equivalent to a = 1 + 1 + 1, followed by the three increments.

Let's try one more: what is the result of a = (++b) + (b++) + (++b)?

The first two operands look easy:

4 + 4 = 8, right. And what happens after that? Are the remaining increments all done first, or one at a time? As mentioned above, C reads with "maximal munch", and each addition is performed as soon as its right operand has been evaluated (this is GCC's rule).

So after 8 is computed, the "+" is read, then the (++b) on its right, giving 8 + 5 = 13; then b + 1 = 6, and the final result is 13 + 6 = 19!

And b = ?

I first said 6, but in fact b = 7. Why? Don't forget the b++: its increment is applied at the very end.

Take, for example, a = (++b) + (b++) + (++b); what a perverse expression! And yet I managed to write it.

The result under GCC is a = 37, b = 9! The main thing is to understand the first two increments, (++b) + (++b). Note that the first ++b does not contribute 4. People tend to think that the first one is 4 and then compute 4 + 5, but the computer does not keep the 4 anywhere: once the next (++b) has run, b = 5, and it computes b + b = 10. Do you see? A human naturally stores the 4 separately, which is exactly the form c = ++b; a = c + (++b); that I demonstrated above.

Now let's look at how the TC compiler understands it:

In TC, with b = 3, what is a = (++b) + (++b) + (++b)? The answer is 18.

So the TC compiler first completes all the increments to obtain the final value of b and only then performs the additions.

I disassembled the TC output, but the listing is hard to read. After half a day of searching, I found the key part:

* Referenced by a CALL at Address:
|:0001.011A
|
:0001.01FA 55          push bp          ; save the caller's base pointer
:0001.01FB 8BEC        mov bp, sp       ; copy the stack pointer into bp
:0001.01FD 56          push si
:0001.01FE 57          push di
:0001.01FF BF0100      mov di, 0001     ; a = 1
:0001.0202 BE0300      mov si, 0003     ; b = 3
:0001.0205 46          inc si           ; ++b
:0001.0206 46          inc si           ; ++b
:0001.0207 46          inc si           ; ++b
:0001.0208 8BFE        mov di, si
:0001.020A 03FE        add di, si
:0001.020C 03FE        add di, si
:0001.020E 56          push si
:0001.020F 57          push di
:0001.0210 B89401      mov ax, 0194
:0001.0213 50          push ax
:0001.0214 E8B206      call 08C9
:0001.0217 83C406      add sp, 0006
:0001.021A E85410      call 1271
:0001.021D 33C0        xor ax, ax
:0001.021F EB00        jmp 0221

If I am not mistaken, the si register holds b = 3. It is first incremented three times, to 6. Then di is loaded from si and si is added to it twice, so the sum ends up in di. This differs from what GCC does. I have not used assembly for about six years; I am very rusty and have forgotten a lot, so I will have to review it.

Summary:

Code should be written with efficiency in mind, but it must also avoid ambiguity and confusing expressions and remain readable. After all, the code you write will have to be maintained later.

With increment operations, pay attention to the context in which they are used. Sometimes they let you write less code and gain a little efficiency, but they can also cause unexpected errors. The problem is that different compilers optimize differently, so the actual order of execution may not be what you expect. Surely nobody writes code like this in a production environment. This article explains the process only from the point of view of the assembly code. I have seen articles that explain it in terms of operator associativity and precedence, but in essence it comes down to the choices made by the compiler.

I tried to go deeper and found that I had given much of what I learned back to my teachers. Sorry; I will find time to review.

This article is for reference only.

From DesignLab
