The following content reflects only my personal understanding. If there are any mistakes, please forgive me and correct me.
a[i][j] version, time used: 94 s

for (k = 0; k < 10000; k++)
    for (i = 0; i < MAX; i++)
        for (j = 0; j < MAX; j++)
            a[i][j] = 0;
a[j][i] version, time used: 488 s

for (k = 0; k < 10000; k++)
    for (i = 0; i < MAX; i++)
        for (j = 0; j < MAX; j++)
            a[j][i] = 0;
I generated assembly for both versions with gcc and compared them with diff. Apart from the file names, only four assembly lines differ:
1c1
< 	.file	"array.c"
---
> 	.file	"array1.c"
31c31
< 	movl	4194352(%esp), %eax
---
> 	movl	4194356(%esp), %eax
33c33
< 	addl	4194356(%esp), %eax
---
> 	addl	4194352(%esp), %eax
These four differing instructions execute the same number of times in both versions, so they cannot explain the performance gap; the difference must come from somewhere else. It is not the loop structure or the arithmetic, since those are identical. Could it be where memory is accessed? No: RAM is random-access, so reading or writing any single address should take the same time. Next, consider whether the operating system's caching is involved. The program is fully loaded into memory before it runs and consumes nothing but CPU time and memory accesses, so it is not a system issue. After thinking about it for a long time, I realized that the CPU fetches data from memory in blocks (cache lines) and keeps the fetched data in its cache. Because a[i][j] accesses memory sequentially, the next elements are usually already in the CPU cache and can be used without going to memory. a[j][i] does not access memory sequentially: the next element it touches is a whole row away and is usually not in the cache, so almost every access goes to memory.
Suggestions for writing code:
1. Access arrays and structs sequentially, to increase the cache hit rate.
2. Reduce unnecessary branches, to increase the CPU's branch-prediction hit rate.