About CPU Cache -- What Programmers Need to Know


This article may be freely reprinted, but please retain this paragraph and keep it at the top of the article.
Lu Junyi (cenalulu)
Original address: http://cenalulu.github.io/linux/all-about-cpu-cache/

Let's start with a mind map of the concepts covered in this article.

Why do we need a CPU Cache?

As manufacturing processes have advanced over recent decades, CPU speed has increased dramatically, while main memory, which is mostly DRAM and constrained by manufacturing process and cost, has seen no comparable breakthrough in access speed. As a result, the gap between CPU processing speed and memory access speed keeps widening, now spanning orders of magnitude. Under the traditional design where the CPU accesses memory directly over the FSB, the CPU would stall waiting on memory, leaving large amounts of compute capacity idle and lowering overall throughput. At the same time, because memory accesses cluster around hot data, inserting a layer of fast but expensive SRAM as a cache between the CPU and memory turns out to be extremely cost-effective.

Why do we need a multi-level CPU Cache?

As technology progressed, the volume of hot data kept growing, and simply enlarging the first-level cache became very poor value for money. So a second-level cache (L2 Cache) was added between the first-level cache (L1 Cache) and memory, intermediate between the two in both access speed and cost. Here is an excerpt from What Every Programmer Should Know About Memory:

Soon after the introduction of the cache, the system got more complicated. The speed difference between the cache and the main memory increased again, to a point that another level of cache was added, bigger and slower than the first-level cache. Only increasing the size of the first-level cache was not an option for economical reasons.

In addition, because program instructions and program data differ in access behavior and hotspot distribution, the L1 Cache is further split into two special-purpose caches: L1i (i for instruction) and L1d (d for data).

The following diagram shows the gap in response time between the cache levels, and just how slow main memory is by comparison (roughly: an L1 hit costs a few cycles, an L2 hit on the order of a dozen, while a main-memory access costs a couple of hundred cycles).

What is a Cache Line?

A Cache Line can be understood simply as the smallest unit of caching in the CPU Cache. On current mainstream CPUs the Cache Line size is 64 bytes. Assuming a 512-byte first-level cache, then with a 64-byte cache unit, the number of cache lines this first-level cache can hold is 512 / 64 = 8.

To get a better feel for the Cache Line, we can run the following interesting experiment on our own machine.

The following C code receives a parameter N from the command line and creates an int array of size N. It then accesses the array's elements sequentially, over and over, for a total of one billion accesses, and finally prints the total execution time in milliseconds.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

long timediff(clock_t t1, clock_t t2) {
    long elapsed = ((double)(t2 - t1)) / CLOCKS_PER_SEC * 1000;
    return elapsed;
}

int main(int argc, char *argv[]) {
    int array_size = atoi(argv[1]);
    int repeat_times = 1000000000;
    long array[array_size];
    for (int i = 0; i < array_size; i++) {
        array[i] = 0;
    }
    int j = 0;
    int k = 0;
    int c = 0;
    clock_t start = clock();
    while (j++ < repeat_times) {
        if (k == array_size) {
            k = 0;
        }
        c = array[k++];  /* read the element; the value itself is unused */
    }
    clock_t end = clock();
    printf("%ld\n", timediff(start, end));
    return 0;
}

If we plot this data as a line chart, we find an obvious inflection point in total execution time once the array size exceeds 64 bytes (with some fluctuation, of course, since the author ran the test on his Mac notebook alongside many other programs). The reason: when the array is no larger than 64 bytes, the whole array is likely to fall within a single Cache Line, so accessing any one element fills the entire Cache Line and the subsequent elements benefit from the cache for free. Once the array exceeds 64 bytes, at least two Cache Lines are needed, and each pass of the loop requires at least two cache fills; because a cache fill takes far longer than a data access, the cost of the repeated fills is magnified across the whole run, producing the inflection point.

How does the concept of the Cache Line help us programmers?

Let's look at a loop optimization commonly used in C.
Of the two code snippets below, the first is always faster than the second in C. After reading the Cache Line introduction above carefully, the reason should be easy to see.

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        int num;
        // ... compute num ...
        arr[i][j] = num;
    }
}

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        int num;
        // ... compute num ...
        arr[j][i] = num;
    }
}
