To improve program efficiency, we should make full use of the CPU cache. To write a program that is friendly to the CPU cache, you must first understand how the cache works.
i5-2400S:
1. There are three levels of cache: L1 is 32 KB (separate data and instruction caches, 32 KB each), L2 is 256 KB, and L3 is 6144 KB (shared among the four cores);
2. With a clock speed of 2.5 GHz, one clock cycle takes 1/(2.5 × 10^9) s = 0.4 ns (clock speed = 1/clock cycle time).
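On Linux, the cache levels listed above can usually be read from sysfs. The following is a minimal sketch, assuming the kernel exposes /sys/devices/system/cpu/cpu0/cache/index*/ with level, type, and size files; the helper names parse_size_kb and read_cache_levels are made up for illustration, and the output depends on the machine it runs on.

```python
from pathlib import Path


def parse_size_kb(text):
    """Convert a sysfs size string such as '32K' or '6M' to a size in KB."""
    text = text.strip().upper()
    if text.endswith("K"):
        return int(text[:-1])
    if text.endswith("M"):
        return int(text[:-1]) * 1024
    raise ValueError(f"unsupported size string: {text!r}")


def read_cache_levels(base="/sys/devices/system/cpu/cpu0/cache"):
    """Return (level, type, size_kb) tuples for CPU 0, or [] if sysfs is absent."""
    caches = []
    root = Path(base)
    if not root.is_dir():
        return caches
    for index_dir in sorted(root.glob("index*")):
        try:
            level = int((index_dir / "level").read_text())
            kind = (index_dir / "type").read_text().strip()
            size_kb = parse_size_kb((index_dir / "size").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or oddly formatted
        caches.append((level, kind, size_kb))
    return caches


if __name__ == "__main__":
    for level, kind, size_kb in read_cache_levels():
        print(f"L{level} {kind}: {size_kb} KB")
```

On a machine like the one described here, this would be expected to list separate L1 Data and Instruction entries followed by the unified L2 and L3 caches.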
CPI:
Different instructions need different numbers of machine cycles to execute. CPI is the average number of clock cycles per instruction. Note: one machine cycle consists of several clock cycles; for example, a machine cycle may equal five clock cycles.
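Because CPI is an average, it can be computed as a weighted sum over the kinds of instructions a program executes. The sketch below illustrates this with a hypothetical instruction mix; the fractions and cycle counts are invented for illustration and are not measurements of any real CPU.

```python
def average_cpi(mix):
    """Compute CPI from (fraction_of_instructions, cycles_per_instruction) pairs."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


# Hypothetical mix, for illustration only (not measured on any real CPU).
example_mix = [
    (0.5, 1),  # simple arithmetic instructions
    (0.3, 2),  # loads and stores
    (0.2, 4),  # slower instructions
]

if __name__ == "__main__":
    print(f"average CPI: {average_cpi(example_mix):.2f}")
```

With this mix the average works out to about 1.9 clock cycles per instruction, even though no single instruction takes exactly that long.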
MIPS:
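MIPS = millions of instructions executed per second = 1/(CPI × clock cycle time × 10^6) = clock speed/(CPI × 10^6), with the cycle time in seconds and the clock speed in Hz. On Linux, run cat /proc/cpuinfo | grep bogomips to view the BogoMIPS value. BogoMIPS is a rough figure calibrated by the kernel rather than a measured instruction rate, but it is used here as an approximation of MIPS; for the i5-2400S it is 4988.85.
So we can calculate the average time required to execute one instruction: T = 1/(4988.85 × 10^6) s ≈ 0.2 ns. Note: T is shorter than the 0.4 ns clock cycle because multiple instructions can be processed in parallel in each CPU clock cycle.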
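The arithmetic above can be checked with a few lines of Python. This is only a sketch: it takes the 2.5 GHz clock speed and the 4988.85 figure from the text at face value and treats the latter as MIPS, which is an approximation; the function names are chosen for illustration.

```python
CLOCK_HZ = 2.5e9      # clock speed from the text (2.5 GHz)
MIPS = 4988.85        # bogomips value from the text, treated as MIPS (assumption)


def clock_period_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def time_per_instruction_ns(mips):
    """Average time per instruction in nanoseconds for a given MIPS figure."""
    return 1e9 / (mips * 1e6)


def implied_cpi(clock_hz, mips):
    """CPI implied by MIPS = clock speed / (CPI x 10^6)."""
    return clock_hz / (mips * 1e6)


if __name__ == "__main__":
    print(f"clock cycle:          {clock_period_ns(CLOCK_HZ):.2f} ns")
    print(f"time per instruction: {time_per_instruction_ns(MIPS):.2f} ns")
    print(f"implied CPI:          {implied_cpi(CLOCK_HZ, MIPS):.2f}")
```

Under these assumptions the implied CPI comes out at roughly 0.5, that is, about two instructions per clock cycle on average.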
Memory System:
The latency from a core to main memory is on the order of tens of nanoseconds or more. Within 100 ns, a 2.5 GHz CPU can execute up to 100/T = 500 instructions. The CPU therefore uses a cache subsystem to hide the latency of accessing main memory, which lets the cores keep executing instructions efficiently. As a result, raising a program's cache hit rate can greatly improve its performance. In particular, we should focus on the design of the main data structures.
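To put the memory latency in perspective, the following sketch estimates how many instructions could have been executed while the core waits for a single main-memory access. It assumes T ≈ 0.2 ns from the MIPS calculation; the function name and the 10 ns and 50 ns sample latencies are chosen for illustration only.

```python
TIME_PER_INSTRUCTION_NS = 0.2  # average T from the MIPS calculation (approximate)


def instructions_per_stall(latency_ns, time_per_instruction_ns=TIME_PER_INSTRUCTION_NS):
    """Estimate how many instructions fit into one memory access latency."""
    return round(latency_ns / time_per_instruction_ns)


if __name__ == "__main__":
    # 100 ns is the figure used in the text; 10 ns and 50 ns are illustrative.
    for latency_ns in (10, 50, 100):
        count = instructions_per_stall(latency_ns)
        print(f"{latency_ns:>3} ns stall is roughly {count} instructions")
```

Every access that misses the cache and goes to main memory costs on this order of instruction slots, which is why the layout of frequently used data structures matters so much.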
Cache Overview