Outline of undefined behaviors in C Language


Christopher Cole: A Glimpse of Undefined Behavior in C

 

 

A few weeks ago, a colleague came to my desk with a programming question. We had recently been quizzing each other on C trivia, so I smiled and braced myself for what was coming.

He wrote a few lines of code on the whiteboard and asked what the program would output:

 

    #include <stdio.h>

    int main(){
        int i = 0;
        int a[] = {10,20,30};
        int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
        printf("%d\n", r);
        return 0;
    }
 
It looked simple enough. I explained operator precedence: postfix operators bind more tightly than multiplication, multiplication binds more tightly than addition, and both multiplication and addition associate from left to right. So I picked up the marker and started writing out the arithmetic.

 

 

    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    //    =    a[0]    + 2 * a[1]  + 3 * a[2];
    //    =     10     +     40    +    90;
    //    = 140

After I proudly wrote down the answer, my colleague responded with a simple "no". I thought for a few minutes and was still stumped. I couldn't quite remember the associativity of the postfix operators. Besides, I knew associativity wouldn't even change the order of evaluation here, because associativity rules only apply between operators of the same precedence level. Still, I figured I should try working through the expression as if all the postfix operators were evaluated from right to left. It looked simple enough.

 

 

    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    //    =    a[2]    + 2 * a[1]  + 3 * a[0];
    //    =     30     +     40    +    30;
    //    = 100

Once again my colleague replied that the answer was wrong. At this point I had to admit defeat and ask him what the answer was. It turned out that this short sample had been reduced from a larger piece of code he had written. To check his question, he had compiled and run the larger code and was surprised to find that it did not behave as he expected. He stripped out the unnecessary steps until he arrived at the sample above. He compiled it with gcc 4.7.3, and it printed a surprising result: "60".

 

At this point I was intrigued. I remembered that in C the order in which function arguments are evaluated is not fixed by the language (strictly speaking, it is unspecified), so we guessed that the postfix operators were simply being evaluated in some arbitrary order rather than from left to right. We were still confident that postfix increment had higher precedence than multiplication, so we quickly convinced ourselves that no ordering of the three i++ evaluations could make the weighted sum of the three array elements come out to 60.

Now I was hooked. My first thought was to look at the disassembly and try to work out what was actually happening. I compiled the sample with debugging symbols, and after running objdump I had the annotated x86_64 disassembly.

 

    Disassembly of section .text:

    0000000000000000 <main>:
    #include <stdio.h>

    int main(){
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 20             sub    $0x20,%rsp
        int i = 0;
       8:   c7 45 e8 00 00 00 00    movl   $0x0,-0x18(%rbp)
        int a[] = {10,20,30};
       f:   c7 45 f0 0a 00 00 00    movl   $0xa,-0x10(%rbp)
      16:   c7 45 f4 14 00 00 00    movl   $0x14,-0xc(%rbp)
      1d:   c7 45 f8 1e 00 00 00    movl   $0x1e,-0x8(%rbp)
        int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
      24:   8b 45 e8                mov    -0x18(%rbp),%eax
      27:   48 98                   cltq
      29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      2d:   8b 45 e8                mov    -0x18(%rbp),%eax
      30:   48 98                   cltq
      32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
      36:   01 c0                   add    %eax,%eax
      38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx
      3b:   8b 45 e8                mov    -0x18(%rbp),%eax
      3e:   48 98                   cltq
      40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      44:   89 d0                   mov    %edx,%eax
      46:   01 c0                   add    %eax,%eax
      48:   01 d0                   add    %edx,%eax
      4a:   01 c8                   add    %ecx,%eax
      4c:   89 45 ec                mov    %eax,-0x14(%rbp)
      4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
        printf("%d\n", r);
      5b:   8b 45 ec                mov    -0x14(%rbp),%eax
      5e:   89 c6                   mov    %eax,%esi
      60:   bf 00 00 00 00          mov    $0x0,%edi
      65:   b8 00 00 00 00          mov    $0x0,%eax
      6a:   e8 00 00 00 00          callq  6f <main+0x6f>
        return 0;
      6f:   b8 00 00 00 00          mov    $0x0,%eax
    }
      74:   c9                      leaveq
      75:   c3                      retq
   
  
 
The first and last few instructions only set up the stack frame, initialize the variables, call printf, and return from main. So we only need to care about the instructions from 0x24 through 0x57, which is where the interesting behavior is. Let's go through them a few at a time.

 

 

      24:   8b 45 e8                mov    -0x18(%rbp),%eax
      27:   48 98                   cltq
      29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx

These first three instructions are as expected. They load the value of i (0) into eax, sign-extend it to 64 bits, and then load a[0] into edx. The multiplication by 1 (1 *) has obviously been optimized away by the compiler, but everything looks normal. The next few instructions start out much the same.

 

 

      2d:   8b 45 e8                mov    -0x18(%rbp),%eax
      30:   48 98                   cltq
      32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
      36:   01 c0                   add    %eax,%eax
      38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx

The first mov loads the value of i (still 0) into eax, which is then sign-extended to 64 bits, and a[0] is loaded into eax. Here is where it gets interesting. We expected i++ to have run before these three instructions, but perhaps the last two instructions would use some compiler magic to arrive at the expected result (2 * a[1]). Those two instructions add eax to itself, effectively computing 2 * a[0], then add that to the previous result and store the sum in ecx. At this point the code has computed a[0] + 2 * a[0]. It looks a bit strange, but again, maybe some compiler magic is going on.

 

 

      3b:   8b 45 e8                mov    -0x18(%rbp),%eax
      3e:   48 98                   cltq
      40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      44:   89 d0                   mov    %edx,%eax

These instructions are starting to look quite familiar. They load the value of i (still 0), sign-extend it to 64 bits, load a[0] into edx, and then copy edx into eax. Well, let's look a little further:

 

 

      46:   01 c0                   add    %eax,%eax
      48:   01 d0                   add    %edx,%eax
      4a:   01 c8                   add    %ecx,%eax
      4c:   89 45 ec                mov    %eax,-0x14(%rbp)

Here a[0] is tripled: eax is doubled, and then edx (a[0]) is added once more. The result is added to the previous sum in ecx and stored in the variable "r". And now for the incredible part: r contains a[0] + 2 * a[0] + 3 * a[0]. Sure enough, that is the program's output: "60". But what happened to the postfix operators? They are all at the end:

 

 

      4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)

It looks as if our compiled code is completely wrong! Why were the postfix increments pushed to the bottom, after all the other work had been done? With my faith in reality dwindling, I decided to go straight to the source. No, not the compiler's source code, which is just an implementation: I grabbed the C11 language specification.

 

The problem lies in the details of the postfix operators. In our case, we applied postfix increment to the array index three times in a single expression. When a postfix operator is evaluated, it yields the variable's original value. Storing the new value back into the variable is a side effect. It turns out that side effects are only guaranteed to be complete at sequence points. See section 5.1.2.3 of the standard, which defines sequence points in detail. Our expression has no sequence point between the three increments of i, so the modifications are unsequenced relative to one another and the expression has undefined behavior (C11 section 6.5, paragraph 2). It is entirely up to the compiler when, relative to the rest of the expression, the side effect of storing the new value into the variable takes place.

In the end, we both learned a little more about C. It is well known that best practice is to avoid complex expressions built from prefix and postfix operators, and this is an excellent example of why.

