One of the two inner loops carries a data dependency between iterations and cannot be vectorized. Comment it out (as below) and check the effect.
/*
 * Name   : vectorized.c
 * Purpose: test the performance of intel compiler
 * Author : Albert
 */
#include <stdio.h>
#include <sys/time.h>
#include <string.h>

const int MAX = 2000000;
const int ITERMAX = 20000;

int main(void) {
    int i, iter, arr[MAX];
    timeval start, end;
    long long tPassed = 0;

    memset(arr, 0, sizeof(arr));
    gettimeofday(&start, 0);
    for (iter = 1; iter < ITERMAX; iter++) {
        //for(i = 1; i < MAX; i++){
        //    arr[i] = arr[i-1] + 1;
        //}
        for (i = 1; i < MAX; i++) {
            arr[i] += 1;
        }
    }
    gettimeofday(&end, 0);
    end.tv_sec -= start.tv_sec;
    end.tv_usec -= start.tv_usec;
    tPassed = 1000000LL * end.tv_sec + end.tv_usec;
    tPassed /= 1000;
    printf("%lld ms\n", tPassed);
    return 0;
}
Environment:
Linux v3901 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux, 32 GB memory, Intel(R) Xeon(R) CPU E5450 @ 3.00GHz, 8 cores, L2 cache 6144 KB.
[scwangj@v3901 simple]$ icpc vectorized.c -o ivectorized
vectorized.c(19): (col. 2) remark: PERMUTED LOOP WAS VECTORIZED.
[scwangj@v3901 simple]$ g++ vectorized.c -o vectorized
[scwangj@v3901 simple]$ ./ivectorized
3696 ms
[scwangj@v3901 simple]$ ./vectorized
114530 ms
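The icpc build is more than 30 times faster (3696 ms versus 114530 ms)! Personally, I think this result is 16 times more than expected, and it is not very reliable, because the test program is not a suitable benchmark and the loop body is too simple. One likely contributor is that g++ was invoked without an optimization flag, so it compiled at its default -O0, while icpc optimizes by default. I will test again later.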