We know that there are several frequently used optimization options when compiling C code, namely -O0, -O1, -O2, -O3, and -Os. I had always assumed that, being optimization options, they mostly tidy up the logic to improve efficiency or reduce program size, and I rarely considered that they could affect a program's final result. That changed recently, when I tracked down a bug in a program on the ARM platform and found that these options are sometimes less intelligent than I thought. Or perhaps it is the ARM platform that is not so intelligent.
First, let's look at this program, a simple one I wrote to reproduce the problem:
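#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Size assumed; it only has to cover the 4-byte write at buffer + 7. */
    char buffer[16] = {0, 1, 2, 3, 4, 5, 6, 7};
    int iTest = 0x12345678;
    int *p = (int *)(buffer + 7);   /* misaligned int pointer */

    memcpy(p, &iTest, sizeof(int));
    printf("%x\n%x\n", buffer[6], buffer[9]);
    return 0;
}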
At first glance, the program looks fine. Save it as point.c, then compile it with the cross-compilation toolchain as follows:
arm-xxx-linux-gcc point.c -o point0 -O0
arm-xxx-linux-gcc point.c -o point1 -O1
arm-xxx-linux-gcc point.c -o point2 -O2
Finally, I ran the three programs one after another, and the results were a little surprising:
./point0
6
34
./point1
34
0
./point2
6
0
Only the unoptimized -O0 build produces the expected result. On the x86 platform, however, the problem does not appear at any level.
So I used the following commands to generate the assembly code under each optimization option and find out what went wrong on the ARM platform:
arm-xxx-linux-gcc point.c -o point0.s -O0 -S
arm-xxx-linux-gcc point.c -o point1.s -O1 -S
arm-xxx-linux-gcc point.c -o point2.s -O2 -S
Comparing the three assembly listings shows that the problem lies in memcpy.
In point0.s, the program honestly calls memcpy, which copies 0x12345678 byte by byte to the buffer + 7 position.
In point1.s, the program does not call memcpy at all, but uses the following statement instead:
str r3, [sp, #7]
Here r3 holds 0x12345678. The ARM platform I use is 32-bit, and a word store on it has to be 4-byte aligned: the low two bits of the address are effectively ignored, so this statement actually writes to sp + 4. The final result is that the data from buffer + 4 to buffer + 7 is overwritten, rather than the data from buffer + 7 to buffer + 10.
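The effect of that store can be reproduced on any machine with a small simulation. The sketch below is only an illustration: simulate_word_store is a hypothetical helper, and it assumes a little-endian word store whose low two address bits are dropped, with the buffer starting at a 4-byte-aligned address (as it does at sp).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulates a little-endian 32-bit word store on a
 * core that ignores the low two address bits. `offset` is relative to
 * `base`, which is assumed to be 4-byte aligned. */
static void simulate_word_store(unsigned char *base, size_t offset,
                                uint32_t value)
{
    size_t aligned = offset & ~(size_t)3;   /* e.g. 7 becomes 4 */

    for (size_t i = 0; i < 4; i++) {
        /* Least significant byte goes to the lowest address. */
        base[aligned + i] = (unsigned char)(value >> (8 * i));
    }
}
```

Calling simulate_word_store(buffer, 7, 0x12345678) on a buffer initialised to 0, 1, ..., 7 leaves 0x34 in buffer[6] and 0 in buffer[9], which matches the output of point1.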
In point2.s, the instructions appear to have been rescheduled for the pipeline, so the execution order changes: the stores that assign the initial values to some of the buffer locations now come after str r3, [sp, #7]. That is why the data at buffer + 6 is the correct value 6 again.
After this analysis, some people may ask: if such a simple program gives different results under different optimization options, do we dare to use memcpy at all?
In fact, as long as you have good programming habits, you will not run into this problem. Take the following program, for example:
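#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Size assumed; it only has to cover the 4-byte write at buffer + 7. */
    char buffer[16] = {0, 1, 2, 3, 4, 5, 6, 7};
    int iTest = 0x12345678;
    char *p = buffer + 7;           /* byte pointer, no alignment implied */

    memcpy(p, &iTest, sizeof(int));
    printf("%x\n%x\n", buffer[6], buffer[9]);
    return 0;
}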
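This program only changes the type of p, and that is enough to make the results identical at every optimization level. When p is declared as int *, the compiler is entitled to assume that it is 4-byte aligned and may replace memcpy with a single word store; converting a misaligned address to int * is undefined behavior in C to begin with. With a char * it can make no such assumption. It can be seen how important good programming habits are.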
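One way to make the habit hard to forget is to hide the copy behind small helpers that only ever see untyped pointers. This is a minimal sketch, not part of the original program: store_u32 and load_u32 are hypothetical names, and the values are stored in the host's native byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a 32-bit value to an address of any
 * alignment. Because dst is a void *, the compiler has no alignment
 * information to exploit when it expands memcpy. */
static void store_u32(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical helper: reads a 32-bit value from an address of any
 * alignment, in the same native byte order. */
static uint32_t load_u32(const void *src)
{
    uint32_t value;

    memcpy(&value, src, sizeof value);
    return value;
}
```

With these, store_u32(buffer + 7, 0x12345678) touches only buffer + 7 to buffer + 10, and load_u32(buffer + 7) reads the same value back, without a misaligned int * ever being formed.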