An example of undefined behavior in the C language

Source: Internet
Author: User

A few weeks ago, one of my colleagues came to my desk with a programming question. Lately we had been quizzing each other on our knowledge of C, so I smiled and braced myself for the hell to come.

He wrote a few lines of code on the whiteboard and asked what the program would output.

 
 
    #include <stdio.h>

    int main(){
        int i = 0;
        int a[] = {10,20,30};

        int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
        printf("%d\n", r);
        return 0;
    }

It looked simple enough. I explained operator precedence: postfix operators bind more tightly than multiplication, multiplication binds more tightly than addition, and both multiplication and addition associate from left to right. So I took the marker and began writing out the calculation.

 
 
    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    //    =    a[0]    + 2 * a[1]  + 3 * a[2];
    //    =     10     +     40    +    90;
    //    = 140

After I proudly wrote down the answer, my colleague replied with a simple "no". I thought for a few minutes and was still stuck. I could not quite remember the associativity of the postfix operators. I also knew that associativity would not change the order of evaluation here, because associativity only applies among operators at the same precedence level. Still, I decided to try evaluating the expression on the assumption that all the postfix operators are evaluated from right to left. That looked simple enough too.

 
 
    int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
    //    =    a[2]    + 2 * a[1]  + 3 * a[0];
    //    =     30     +     40    +    30;
    //    = 100

My colleague again replied that the answer was wrong. At this point I had to admit defeat and ask him what the answer was. The short sample had been extracted from a larger piece of code he had written. He had compiled and run the larger code and was surprised to find that it did not behave as expected. He stripped out the unnecessary steps until he was left with the sample above. Compiled with gcc 4.7.3, it printed a surprising result: "60".

Now I was intrigued. I remembered that in C the order in which function arguments are evaluated is unspecified, so we guessed that the postfix operators were simply being applied in some arbitrary order rather than from left to right. We were still convinced that postfix increment and multiplication have higher precedence than addition, so we soon proved to ourselves that there is no order of applying the three i++ operations in which multiplying and adding up the three array elements gives 60.

My first thought was to look at the disassembly and find out what was actually happening. I compiled the sample with debugging symbols, ran objdump on it, and quickly had annotated x86_64 disassembly.

 
 
    Disassembly of section .text:

    0000000000000000 <main>:
    #include <stdio.h>

    int main(){
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   48 83 ec 20             sub    $0x20,%rsp
        int i = 0;
       8:   c7 45 e8 00 00 00 00    movl   $0x0,-0x18(%rbp)
        int a[] = {10,20,30};
       f:   c7 45 f0 0a 00 00 00    movl   $0xa,-0x10(%rbp)
      16:   c7 45 f4 14 00 00 00    movl   $0x14,-0xc(%rbp)
      1d:   c7 45 f8 1e 00 00 00    movl   $0x1e,-0x8(%rbp)
        int r = 1 * a[i++] + 2 * a[i++] + 3 * a[i++];
      24:   8b 45 e8                mov    -0x18(%rbp),%eax
      27:   48 98                   cltq
      29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      2d:   8b 45 e8                mov    -0x18(%rbp),%eax
      30:   48 98                   cltq
      32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
      36:   01 c0                   add    %eax,%eax
      38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx
      3b:   8b 45 e8                mov    -0x18(%rbp),%eax
      3e:   48 98                   cltq
      40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
      44:   89 d0                   mov    %edx,%eax
      46:   01 c0                   add    %eax,%eax
      48:   01 d0                   add    %edx,%eax
      4a:   01 c8                   add    %ecx,%eax
      4c:   89 45 ec                mov    %eax,-0x14(%rbp)
      4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
      57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
        printf("%d\n", r);
      5b:   8b 45 ec                mov    -0x14(%rbp),%eax
      5e:   89 c6                   mov    %eax,%esi
      60:   bf 00 00 00 00          mov    $0x0,%edi
      65:   b8 00 00 00 00          mov    $0x0,%eax
      6a:   e8 00 00 00 00          callq  6f <main+0x6f>
        return 0;
      6f:   b8 00 00 00 00          mov    $0x0,%eax
      74:   c9                      leaveq
      75:   c3                      retq

The first and last few instructions only set up the stack frame, initialize the variables, call printf, and return from main, so we only need to look at the instructions from 0x24 to 0x57. That is where the interesting behavior is. Let us go through them a few at a time.

 
 
    24:   8b 45 e8                mov    -0x18(%rbp),%eax
    27:   48 98                   cltq
    29:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx

The first three instructions are as expected. They load the value of i (0) into the eax register, sign-extend it to 64 bits, and then load a[0] into the edx register. The multiplication by 1 has obviously been optimized away by the compiler, but everything looks normal. The next group starts out much the same.

 
 
    2d:   8b 45 e8                mov    -0x18(%rbp),%eax
    30:   48 98                   cltq
    32:   8b 44 85 f0             mov    -0x10(%rbp,%rax,4),%eax
    36:   01 c0                   add    %eax,%eax
    38:   8d 0c 02                lea    (%rdx,%rax,1),%ecx

The first mov loads the value of i (still 0) into eax, cltq sign-extends it to 64 bits, and the next mov loads a[0] into eax. This is where it gets interesting. We expected an i++ to have taken effect before these three instructions ran, but perhaps the last two instructions would use some compiler magic to arrive at the expected result (2 * a[1]). Instead, the add doubles eax, which computes 2 * a[0], and the lea adds that to the previous result and stores the sum in ecx. At this point the code has computed a[0] + 2 * a[0]. It seemed a bit strange, but again, maybe some compiler magic was still to come.

 
 
    3b:   8b 45 e8                mov    -0x18(%rbp),%eax
    3e:   48 98                   cltq
    40:   8b 54 85 f0             mov    -0x10(%rbp,%rax,4),%edx
    44:   89 d0                   mov    %edx,%eax

The next instructions look very familiar. They load the value of i (still 0), sign-extend it to 64 bits, load a[0] into the edx register, and then copy edx into eax. Let us look at a few more:

 
 
    46:   01 c0                   add    %eax,%eax
    48:   01 d0                   add    %edx,%eax
    4a:   01 c8                   add    %ecx,%eax
    4c:   89 45 ec                mov    %eax,-0x14(%rbp)

Here eax is doubled and edx is added once more, giving 3 * a[0]. The previous result in ecx is then added, and the total is stored in the variable r. Incredibly, r now contains a[0] + 2 * a[0] + 3 * a[0]. Sure enough, that is the output of the program: "60". But what happened to the postfix increments? They are all at the end:

 
 
    4f:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
    53:   83 45 e8 01             addl   $0x1,-0x18(%rbp)
    57:   83 45 e8 01             addl   $0x1,-0x18(%rbp)

It seemed that the compiled code was completely wrong! Why were the postfix increments pushed to the very end, after all the other work had been done? With my faith in reality fading, I decided to go straight to the source. Not the source code of the compiler, which is only one implementation: I grabbed the C11 language specification.

The problem lies in the details of the postfix operators. In our case, we applied postfix increment to the array index three times in a single expression. When a postfix increment is evaluated, it yields the original value of the variable; storing the new value back into the variable is a side effect. It turns out that side effects are only guaranteed to be complete by the next sequence point. See section 5.1.2.3 of the standard, which defines side effects and sequence points. Our expression contains no sequence point between the three modifications of i, and section 6.5, paragraph 2 says that when two side effects on the same scalar object are unsequenced relative to each other, the behavior is undefined. So our expression exhibits undefined behavior: it is entirely up to the compiler when the side effect of storing the new value happens relative to the evaluation of the rest of the expression.

In the end, we both learned something new about C. The well-known best practice is to avoid building complex expressions out of prefix and postfix operators, and this is an excellent example of why.

Http://blog.jobbole.com/53211/.
