This article may be freely reprinted, but please retain this paragraph and keep it at the top of the article when doing so.
Lu Junyi (Cenalulu)
This text address: http://cenalulu.github.io/linux/all-about-cpu-cache/
Let's take a look at a mind map of all the concepts in this article.
Why do I need a CPU Cache
Over recent decades, CPU clock speeds have risen dramatically with advances in manufacturing processes, while main memory, which is mostly DRAM and constrained by process and cost, has seen no comparable breakthrough in access speed. As a result, the gap between CPU processing speed and memory access speed keeps widening, in some respects to tens of thousands of times. Under these conditions, the traditional approach of attaching memory directly over the FSB leaves the CPU stalled waiting on memory, wasting large amounts of compute capacity and lowering overall CPU throughput. At the same time, because memory accesses cluster around hot spots, inserting a layer of faster but more expensive SRAM between the CPU and memory as a cache turns out to be very cost-effective.
Why do I need a multi-level CPU Cache
As technology has developed, the volume of hot data has grown, and simply enlarging the first-level cache yields very poor value for money. So a second-level cache (L2 Cache) was added between the first-level cache (L1 Cache) and main memory, with access speed and cost in between the two. Here is an excerpt of the explanation from What Every Programmer Should Know About Memory:
Soon after the introduction of the cache, the system got more complicated. The speed difference between the cache and the main memory increased again, to a point that another level of cache was added, bigger and slower than the first-level cache. Only increasing the size of the first-level cache was not an option for economical reasons.
In addition, because program instructions and program data differ in access behavior and hotspot distribution, the L1 Cache is further split into two special-purpose caches: L1i (i for instruction) and L1d (d for data).
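On Linux, one way to see this split for yourself is the kernel's sysfs cache interface. The sketch below assumes the standard /sys/devices/system/cpu/.../cache layout, which may be absent in some VMs or containers:

```shell
# List every cache visible to CPU 0: level, type (Data/Instruction/Unified),
# total size, and cache line size. Paths follow the standard sysfs layout.
for d in /sys/devices/system/cpu/cpu0/cache/index*; do
    echo "L$(cat "$d/level") $(cat "$d/type"): $(cat "$d/size"), line size $(cat "$d/coherency_line_size")B"
done
```

On a typical x86 machine this prints separate L1 Data and L1 Instruction entries, plus unified L2 (and often L3) caches.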
The following diagram shows the response-time gap between the cache levels, and just how slow main memory is by comparison!
What is a Cache Line
A Cache Line can be understood simply as the smallest unit of caching in a CPU Cache. On current mainstream CPUs the Cache Line size is 64 bytes. Suppose we have a 512-byte first-level cache; with a 64B cache-unit size, the number of cache lines this first-level cache can hold is 512/64 = 8. For details, see the figure below:
To get a better feel for the Cache Line, we can run the following interesting experiment on our own computer.
The C code below takes a number N from the command line and creates an array of N longs. It then accesses the elements of this array sequentially, in order, looping one billion times in total. Finally, it prints the total execution time for that array size.
```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Elapsed CPU time between two clock() readings, in milliseconds. */
long timediff(clock_t t1, clock_t t2) {
    long elapsed;
    elapsed = ((double)t2 - t1) / CLOCKS_PER_SEC * 1000;
    return elapsed;
}

int main(int argc, char *argv[]) {
    int array_size = atoi(argv[1]);
    int repeat_times = 1000000000;
    long array[array_size];
    for (int i = 0; i < array_size; i++) {
        array[i] = 0;
    }
    int j = 0;
    int k = 0;
    int c = 0;
    clock_t start = clock();
    while (j++ < repeat_times) {
        if (k == array_size) {
            k = 0;
        }
        c = array[k++];
    }
    clock_t end = clock();
    printf("%ld\n", timediff(start, end));
    return 0;
}
```
If we plot these measurements as a line chart, we find a fairly clear inflection point in total execution time once the array grows past 64 bytes (there is some fluctuation, of course, since the author ran the test on a Mac notebook with interference from many other programs). The reason is that when the array is smaller than 64 bytes, it is very likely to fall entirely within a single Cache Line, and accessing any one element causes the whole Cache Line to be filled, so a number of subsequent elements benefit from the cache at no extra cost. Once the array exceeds 64 bytes, at least two Cache Lines are needed, and hence two cache fills per pass through the array; because a cache fill takes much longer than a cached data access, the cost of the extra fills is magnified over the many iterations, which shows up as the jump in total execution time.
How does the concept of the Cache Line help us programmers?
Let's look at a loop optimization commonly used in C. Of the two snippets below, the first always runs faster than the second in C. Having read the introduction to Cache Lines above carefully, the reason should be easy to see.
```c
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        int num;
        // code
        arr[i][j] = num;
    }
}
```
```c
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        int num;
        // code
        arr[j][i] = num;
    }
}
```
About the CPU Cache: what every programmer needs to know.