In the demonstration before this morning's embedded systems class, I mentioned an optimization for summing in a loop (in fact, it was just something I had come across while searching online the day before). The example in the demo was as follows:
int sum = 0;
for (int i = 0; i < 100; i++) {
    sum += array[i];
}
**************************
int sum1 = 0, sum2 = 0;
for (int i = 0; i < 100; i += 2) {
    sum1 += array[i];
    sum2 += array[i + 1];
}
int sum = sum1 + sum2;
At the time, what I had read online said that the second method is better, for two reasons. First, the two additions in the loop body are independent of each other, so the processor can execute them in parallel, which reduces the running time. Second, the number of loop iterations is halved, and with it the number of conditional jumps at the assembly level; a conditional jump is costly because the processor only knows at the last moment where the code will jump to.
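To make the two variants concrete, here is a minimal sketch of both as C functions. The function names, the `long long` accumulators, and the handling of an odd-length array are my own assumptions for illustration; the demo itself used a fixed length of 100 and `int` sums.

```c
#include <stddef.h>

/* Method 1: one addition and one loop test per element. */
long long sum_simple(const int *array, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += array[i];
    }
    return sum;
}

/* Method 2: loop unrolled by two, with two independent accumulators.
 * The two additions in the body do not depend on each other, and the
 * loop test runs about half as often. */
long long sum_two_accumulators(const int *array, size_t n)
{
    long long sum1 = 0, sum2 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        sum1 += array[i];
        sum2 += array[i + 1];
    }
    /* Assumption: pick up the last element when n is odd. */
    if (i < n) {
        sum1 += array[i];
    }
    return sum1 + sum2;
}
```

Both functions should return the same value for any input; only the shape of the loop differs.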
After the demonstration, the teacher asked me how much the second method actually gains, and whether I had tested this code.
In the end, I had to admit that I had not.
So after I got back, I tested it with a much larger number of iterations. The code is as follows:
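#include <stdio.h>
#include <windows.h>

int main(void)
{
    DWORD start_time, end_time;
    int sum, i;

    /* Method 1: one addition per iteration.
       Note: sum overflows int; only the elapsed time matters here. */
    start_time = GetTickCount();
    sum = 0;
    for (i = 0; i < 1000000000; i++)
        sum += i;
    end_time = GetTickCount();
    printf("%lu\n", (unsigned long)(end_time - start_time));

    /* Method 2: two additions per iteration, two accumulators. */
    sum = 0;
    int sum2 = 0, sum3 = 0;
    start_time = GetTickCount();
    for (i = 0; i < 1000000000; i += 2)
        sum2 += i, sum3 += i + 1;
    sum = sum2 + sum3;
    end_time = GetTickCount();
    printf("%lu\n", (unsigned long)(end_time - start_time));

    return 0;
}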
Running result (in milliseconds):
5594
3328
It can be seen that the second method does indeed give a considerable speedup. Now the question is which of the two reasons plays the major role: the first or the second?
To find out, I modified the second method so that it uses a single accumulator. Code 2 is as follows:
sum = 0;
start_time = GetTickCount();
for (int i = 0; i < 1000000000; i += 2)
    sum += i + i + 1;
end_time = GetTickCount();
printf("%d\n", end_time - start_time);
Running result (in milliseconds):
5422
2953
Repeated tests show that Code 2 is indeed a little faster than the second method in Code 1. Therefore, I personally feel that parallel execution is not what makes the difference here. In other words, in specific circumstances you can achieve a considerable speedup simply by reducing the number of conditional jumps.
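For reference, the three loop shapes compared so far can be written side by side as functions. This is only a sketch: the function names are hypothetical, and I use `unsigned` accumulators (an assumption, not what the test program used) so that the sum wraps around in a well-defined way instead of overflowing a signed `int`.

```c
/* Method 1: one addition and one loop test per value. */
unsigned range_sum_simple(unsigned n)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Method 2: unrolled by two, two independent accumulators. */
unsigned range_sum_two_acc(unsigned n)
{
    unsigned sum2 = 0, sum3 = 0;
    unsigned i = 0;
    for (; i + 1 < n; i += 2) {
        sum2 += i;
        sum3 += i + 1;
    }
    if (i < n)          /* leftover value when n is odd */
        sum2 += i;
    return sum2 + sum3;
}

/* Code 2: unrolled by two, but everything goes into one accumulator,
 * so each iteration still depends on the previous one. */
unsigned range_sum_one_acc(unsigned n)
{
    unsigned sum = 0;
    unsigned i = 0;
    for (; i + 1 < n; i += 2)
        sum += i + i + 1;
    if (i < n)          /* leftover value when n is odd */
        sum += i;
    return sum;
}
```

The last two functions execute the same number of conditional jumps; they differ only in whether the additions are split across independent accumulators, which is the comparison the timing above is trying to isolate.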
Next, I tried compiling with -O optimization. The result is as follows:
-O1 optimization reduces the time! However, that may just be because the logic of this code is so simple. In addition, I do not know why the running result after -O2 optimization is abnormal... If you know the reason, please advise!
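One possible explanation, and it is only a guess since I have not looked at the generated assembly: the test program never uses the value of sum, and sum overflows a signed int (which is undefined behavior in C), so at higher optimization levels the compiler is free to remove or rewrite the loops entirely. The sketch below shows one way to guard against that. It uses the portable `clock()` instead of `GetTickCount()`, an `unsigned` accumulator, and a `volatile` variable to keep the result alive; all the names here are hypothetical.

```c
#include <time.h>

/* Writing the result to a volatile object forces the compiler to
 * actually produce the value, so the call cannot be thrown away. */
static volatile unsigned sink;

/* Unsigned arithmetic wraps around on overflow, which is well defined. */
unsigned sum_wrapping(unsigned n)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Returns the CPU time in milliseconds spent summing 0..n-1. */
double timed_sum_ms(unsigned n)
{
    clock_t start = clock();
    sink = sum_wrapping(n);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Even with this guard, an optimizing compiler may still transform such a simple loop (for example by vectorizing it or replacing it with a formula), so very small timings at -O2 would not necessarily mean the measurement is broken.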