Improving program performance: what is a cache? -- Starting from the memory hierarchy and the principle of locality -- A basic quality of good code

Opening

The previous post, "Locality principle analysis -- a basic quality of good code", gave a brief introduction to program locality and covered the basics of writing code with good locality. But why is code with good locality efficient? This post answers that question. The organization and implementation of the memory itself are left to another article.

Memory Hierarchy

As we know, the storage in a computer includes the hard disk, main memory, and caches (including the level-1 and level-2 caches). At the very top are the CPU registers.

[Figure: how memory is organized in a computer]

Most readers will be familiar with this picture; the Wikipedia article on the memory hierarchy shows the same structure.

Notice that the higher a level sits in the hierarchy, the smaller its capacity, the higher its cost per byte, and the faster its access.

Why did this structure come about? Early memory hierarchies had only three levels: CPU registers, DRAM main memory, and disk storage.

Because of the huge speed gap between the CPU and main memory, system designers were forced to insert a small SRAM cache between the CPU registers and main memory. This is the L1 cache, and it can be accessed in 2-4 clock cycles. Later, the gap between the L1 cache and main memory was itself found to be large, so a larger L2 cache was inserted between the L1 cache and main memory; it can be accessed in about 10 clock cycles. Through this kind of steady evolution, today's storage system took shape.

We now know that the memory system is divided into multiple levels. How do these levels work together to improve efficiency?

What is cache?

For now, you can think of it this way: the faster storage caches data from the slower storage. More precisely: for each k, the faster and smaller storage device at level k serves as a cache for the larger and slower storage device at level k+1. In other words, level k holds the frequently accessed data of level k+1. Data is transferred between levels in units of blocks. The block size differs from level to level; generally, the higher the level, the smaller the block.
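To make the idea of block transfer concrete, here is a minimal Python sketch. It models level k+1 as a plain list and assumes a block size of 4 items; `BLOCK_SIZE`, `block_number`, and `fetch_block` are hypothetical names chosen for illustration, not part of any real hardware interface.

```python
BLOCK_SIZE = 4  # assumed number of data items per block (illustrative only)

def block_number(address, block_size=BLOCK_SIZE):
    """Return the number of the block that holds the item at `address`."""
    return address // block_size

def fetch_block(lower_level, number, block_size=BLOCK_SIZE):
    """Copy one whole block out of the slower level (modeled as a list)."""
    start = number * block_size
    return lower_level[start:start + block_size]

level_k_plus_1 = list(range(100, 116))    # 16 items, i.e. 4 blocks
n = block_number(9)                       # the item at address 9 lives in block 2
block = fetch_block(level_k_plus_1, n)    # addresses 8..11 are copied together
print(n, block)
```

The point of the sketch is that a request for a single item moves its whole block, which is what later lets neighboring items be served from the faster level.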

Consider an example.

Level k is the cache of level k+1, and data moves between them in block-sized units. For instance, level k currently caches blocks 4, 9, 14, and 3 of level k+1.

When the program needs data in these blocks, it can get it directly from the cache at level k, which is faster than reading it from level k+1.

Cache hit

When the program needs a data object d from level k+1, it first looks for d in the cache at level k. If d happens to be at level k, this is called a cache hit.
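The hit check can be sketched in a few lines of Python. This assumes the cached blocks 4, 9, 14, and 3 from the example above and a block size of 4 items; the block size and the name `is_hit` are assumptions made only for illustration.

```python
BLOCK_SIZE = 4  # assumed items per block (illustrative only)

# Block numbers of level k+1 that level k currently holds.
cached_blocks = {4, 9, 14, 3}

def is_hit(address, cached=cached_blocks, block_size=BLOCK_SIZE):
    """True if the block containing `address` is already at level k."""
    return address // block_size in cached

print(is_hit(37))  # address 37 falls in block 37 // 4 == 9, which is cached
print(is_hit(40))  # address 40 falls in block 40 // 4 == 10, which is not cached
```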

Cache miss

When the required data object d is not in the cache at level k, this is called a cache miss. On a miss, the cache at level k fetches the block containing d from level k+1. If the level-k cache is already full, the new block overwrites one of the existing blocks. Which block is overwritten is decided by the cache's replacement policy: for example, it may evict the least frequently used block, or the block that entered the cache first. We will not discuss the policies further here. Once level k has fetched the block containing d from level k+1, the program can read d from the cache.
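The miss path can be sketched as a toy Python model. It assumes the "first block in is the first block out" policy mentioned above, models level k+1 as a list, and uses made-up sizes; the class name `FifoCache` and its fields are hypothetical and exist only for this illustration.

```python
from collections import OrderedDict

class FifoCache:
    """Toy model of level k: holds at most `capacity` blocks, evicts the oldest."""

    def __init__(self, lower_level, capacity, block_size):
        self.lower = lower_level       # level k+1, modeled as a list
        self.capacity = capacity       # how many blocks level k can hold
        self.block_size = block_size
        self.blocks = OrderedDict()    # block number -> items, in arrival order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        number, offset = divmod(address, self.block_size)
        if number in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)   # overwrite the oldest block
            start = number * self.block_size
            self.blocks[number] = self.lower[start:start + self.block_size]
        return self.blocks[number][offset]

memory = list(range(32))
cache = FifoCache(memory, capacity=2, block_size=4)
values = [cache.read(a) for a in (0, 1, 4, 8, 0)]
print(values, cache.hits, cache.misses)
```

In this trace the last read of address 0 misses again, because block 0 was the oldest block and was overwritten when block 2 arrived. A different replacement policy could have kept it.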

Cache hit and locality

Here we will briefly explain why a program with good locality can have better performance.

Exploiting temporal locality: because of temporal locality, the same data object is used multiple times. Once a data object has been copied from level k+1 into the cache at level k, it is expected to be referenced many more times, and each of those references avoids the cost of going back to level k+1.
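As a rough illustration of this effect, the following Python sketch counts misses for two access traces of equal length, one that reuses a single block and one that never reuses anything. The function `count_misses`, the first-in-first-out replacement, and the capacity of 2 blocks are assumptions made for the example.

```python
from collections import deque

def count_misses(trace, capacity):
    """Count misses for a trace of block numbers, evicting the oldest block."""
    cached = deque()               # block numbers in arrival order
    misses = 0
    for block in trace:
        if block not in cached:
            misses += 1
            if len(cached) == capacity:
                cached.popleft()   # make room by dropping the oldest block
            cached.append(block)
    return misses

reused = [7] * 10                  # the same block referenced ten times
no_reuse = list(range(10))         # ten different blocks, each used once

print(count_misses(reused, capacity=2))
print(count_misses(no_reuse, capacity=2))
```

With reuse, only the first reference pays for a trip to level k+1; without reuse, every reference does.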

Exploiting spatial locality: assume that one block in the cache at level k holds n array elements. Because an array is stored contiguously, accessing the first element causes a miss, and the whole block that contains it, that is, the first n elements, is copied into the cache at level k. Accesses to the second through the nth element are then served directly from the cache, which improves performance.

Similarly, when element n+1 is accessed, it is not in the cache, so the cache manager copies the block holding elements n+1 through 2n into the cache, and the accesses that follow can again be served directly from the cache.
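The array walk described above can be sketched in Python by counting how often a scan crosses into a new block. The sketch assumes the simplest possible cache, one that holds a single block; `scan_misses` and the sizes used are hypothetical and chosen only to make the numbers easy to check.

```python
def scan_misses(num_elements, block_size, stride=1):
    """Misses when reading elements 0, stride, 2*stride, ... with a one-block cache."""
    current_block = None           # the only block the toy cache holds
    misses = 0
    for index in range(0, num_elements, stride):
        block = index // block_size
        if block != current_block:
            misses += 1            # fetch the new block from level k+1
            current_block = block
    return misses

# Sequential scan: 64 accesses, one miss per block of 8 elements.
print(scan_misses(64, block_size=8))
# Stride equal to the block size: 8 accesses, and every one of them misses.
print(scan_misses(64, block_size=8, stride=8))
```

Under these assumptions the sequential scan misses on 1 access in 8, while the strided scan misses on every access, even though both fetch the same 8 blocks.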

With spatial locality, we hope that the later accesses to other objects in the same block make up for the time spent copying the block after a miss.

Caches are everywhere in modern systems. For a better overall picture, the following table summarizes the storage and performance parameters at the different levels of a computer (for reference only).

[Table: storage and performance parameters at each level of the memory hierarchy]

Summary

This article introduced the organization of computer memory and the relationship between its levels. It also briefly covered how caching works and how caching relates to locality.

How caches are implemented, and how a program retrieves data from cache memory, will be covered in the next article.

My understanding is limited. If you find any mistakes in this article, please let me know.

 

See: Computer Systems

If you reprint this article, please cite the source: http://www.cnblogs.com/yanlingyin/

A fish, yanlingyin @ blog Garden

E-mail: yanlingyin@yeah.net

 
