Outline of undefined behaviors in C Language

Last Update:2015-10-15 Source: Internet

Author: User


Christopher Cole: A Glimpse of Undefined Behavior in C

A few weeks ago, one of my colleagues came to my desk with a programming question. Lately we had been quizzing each other on our knowledge of C, so I smiled and braced myself for whatever hell was coming.

He wrote a few lines of code on the whiteboard and asked what the program would output.

#include <stdio.h>

int main(){
    int i = 0;
    int a[] = {10,20,30};

    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    printf("%d\n", r);
    return 0;
}

It looked simple and clear enough. I explained operator precedence: the postfix operators bind more tightly than multiplication, multiplication binds more tightly than addition, and both multiplication and addition associate from left to right. So I grabbed the marker and started writing out the calculation.

int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
//    =    a[0]    + 2 * a[1]  + 3 * a[2];
//    =     10     +     40    +    90;
//    = 140

After I proudly wrote down the answer, my colleague replied with a simple "no". I thought for a few minutes and was still stuck. I couldn't quite remember the associativity of the postfix operators. Besides, I knew that associativity would not even change the order of evaluation here, since associativity rules only apply between operators at the same precedence level. Still, I figured I should try working through the expression on the assumption that the postfix operators were all evaluated from right to left. It looked simple enough.

int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
//    =    a[2]    + 2 * a[1]  + 3 * a[0];
//    =     30     +     40    +    30;
//    = 100

Once again my colleague replied that the answer was wrong. At this point I had to admit defeat and asked him what the answer was. The short sample had been cut down from a larger piece of code he had written. He had compiled and run the larger code to check it and was surprised to find that it did not behave as expected. He stripped out the unnecessary steps until he arrived at the sample above. He compiled the sample with gcc 4.7.3, and it printed a surprising result: "60".

By now I was intrigued. I remembered that in C the order in which function arguments are evaluated is not specified, so we thought that perhaps the postfix operators were simply being evaluated in some arbitrary order rather than from left to right. We were still convinced that the postfix operators had higher precedence than multiplication, so we quickly proved to ourselves that there was no order of evaluating the i++ operations in which the three array elements would multiply and add up to 60.

Now I was hooked. My first thought was to look at the disassembly and try to track down what was actually happening. I compiled the sample with debugging symbols, and after a quick objdump I had the annotated x86_64 disassembly.

Disassembly of section .text:

0000000000000000 <main>:
#include <stdio.h>

int main(){
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
    int i = 0;
   8:   c7 45 e8 00 00 00 00    movl   $0x0,-0x18(%rbp)
    int a[] = {10,20,30};
   f:   c7 45 f0 0a 00 00 00    movl   $0xa,-0x10(%rbp)
  16:   c7 45 f4 14 00 00 00    movl   $0x14,-0xc(%rbp)
  1d:   c7 45 f8 1e 00 00 00    movl   $0x1e,-0x8(%rbp)
    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
  24:   8b 45 e8                mov    -0x18(%rbp),%eax
  27:   48 98                   cltq
  29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
  2d:   8b 45 e8                mov    -0x18(%rbp),%eax
  30:   48 98                   cltq
  32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
  36:   01 c0                   add    %eax,%eax
  38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx
  3b:   8b 45 e8                mov    -0x18(%rbp),%eax
  3e:   48 98                   cltq
  40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
  44:   89 d0                   mov    %edx,%eax
  46:   01 c0                   add    %eax,%eax
  48:   01 d0                   add    %edx,%eax
  4a:   01 c8                   add    %ecx,%eax
  4c:   89 45 ec                mov    %eax,-0x14(%rbp)
  4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
  53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
  57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
    printf("%d\n", r);
  5b:   8b 45 ec                mov    -0x14(%rbp),%eax
  5e:   89 c6                   mov    %eax,%esi
  60:   bf 00 00 00 00          mov    $0x0,%edi
  65:   b8 00 00 00 00          mov    $0x0,%eax
  6a:   e8 00 00 00 00          callq  6f <main+0x6f>

    return 0;
  6f:   b8 00 00 00 00          mov    $0x0,%eax
}
  74:   c9                      leaveq
  75:   c3                      retq

The first few and last few instructions only set up the stack frame, initialize the variables, call printf, and return from main. So we only need to care about the instructions from 0x24 through 0x57. That is where the interesting behavior happens. Let's look at a few instructions at a time.

  24:   8b 45 e8                mov    -0x18(%rbp),%eax
  27:   48 98                   cltq
  29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx

The first three instructions are as expected. They load the value of i (0) into the eax register, sign-extend it to 64 bits, and then load a[0] into the edx register. The multiplication by 1 (1 *) has obviously been optimized away by the compiler, but everything looks normal. The next few instructions start out much the same.

  2d:   8b 45 e8                mov    -0x18(%rbp),%eax
  30:   48 98                   cltq
  32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
  36:   01 c0                   add    %eax,%eax
  38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx

The first mov loads the value of i (still 0) into eax, the value is sign-extended to 64 bits, and a[0] is loaded into eax. Something interesting has happened here. We expected i++ to have run before these three instructions, but perhaps the last two instructions would work some compiler magic to get the expected result (2 * a[1]). Those two instructions add eax to itself, which effectively computes 2 * a[0], then add that to the previous result and store the sum in ecx. At this point the code has computed a[0] + 2 * a[0]. Things are starting to look a little strange, but again, maybe some compiler magic is going on.

  3b:   8b 45 e8                mov    -0x18(%rbp),%eax
  3e:   48 98                   cltq
  40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
  44:   89 d0                   mov    %edx,%eax

These instructions are starting to look familiar. They load the value of i (still 0), sign-extend it to 64 bits, load a[0] into edx, and then copy edx into eax. Well, let's look a little further:

  46:   01 c0                   add    %eax,%eax
  48:   01 d0                   add    %edx,%eax
  4a:   01 c8                   add    %ecx,%eax
  4c:   89 45 ec                mov    %eax,-0x14(%rbp)

Here a[0] is added to itself and then added once more, giving 3 * a[0]. The previous result is added in, and the total is stored in the variable "r". Now for the incredible part: r contains a[0] + 2 * a[0] + 3 * a[0]. Sure enough, that is the program's output: "60". But what happened to the postfix operators? They are all at the end:

  4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
  53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
  57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)

It looked as if our compiled code was just plain wrong! Why had the postfix increments been pushed all the way to the bottom, after everything else had already been done? With my faith in reality waning, I decided to go straight to the source. No, not the compiler's source code (that is just an implementation). I grabbed the C11 language specification.

The problem lies in the details of the postfix operators. In our case, we applied the postfix increment to the array index three times in a single expression. When a postfix operator is evaluated, it yields the original value of the variable. Storing the new value back into the variable is a side effect. It turns out that side effects are only guaranteed to be complete by the next sequence point. See section 5.1.2.3 of the standard, which defines sequence points in detail. Our expression has no sequence point between the three increments of i, so it exhibits undefined behavior. It is entirely up to the compiler when the side effect of storing the new value takes place relative to the rest of the expression.

In the end, we both learned a little more about C. It is well known that the best practice is to avoid building complex expressions out of prefix and postfix operators, and this is an excellent example of why.

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
