CPU cache technology learning notes

Source: Internet
Author: User

1. Cache mechanism overview
1.1 What are direct-mapped, fully associative, and N-way set-associative caches?

The cache is subdivided into subsets of lines.
A cache line is the smallest unit of data transferred between the slow off-chip DRAM and the fast on-chip CPU cache;
the transfer is generally done in burst mode.

1) At one extreme, the cache can be direct-mapped, in which case a line in main memory is always stored at the exact same location in the cache.

2) At the other extreme, the cache is fully associative, meaning that any line in memory can be stored at any location in the cache.

3) Most caches are to some degree N-way set associative, where any line of main memory can be stored in any one of N lines of the cache. For instance, a line of memory can be stored in two different lines of a two-way set-associative cache.
(The whole cache contains multiple sets, and each set contains N cache lines; these N lines are the so-called N ways.)

A direct-mapped cache is prone to frequent line replacement, because a memory line can be stored in only one location in the cache: two frequently used lines that map to that location keep evicting each other. A fully associative cache is theoretically optimal but difficult to implement, because it needs a comparator for every cache line.

Real cache implementations therefore use N-way set associativity, which needs only N comparators working in parallel (one per line of the selected set).

1.2 How is a memory address mapped to the cache?
1.2.1 Basic mapping mechanism
A memory address is divided into three fields: tag + index + offset_in_line.
The index selects the set in the cache. It is generally calculated with a modulo:
set_no = (address / line_size) mod (number of sets in the cache)
With power-of-two line sizes and set counts, this simply means taking the index bits that sit directly above the offset_in_line bits.

Once the set for the address has been found, the tag field is compared with the tags of the lines in that set (the comparisons are done in parallel by hardware). If a match is found, it is a cache hit, and the matching line is the cache line holding that memory address; otherwise, a cache miss occurs.

On a cache hit, the offset_in_line field locates the requested data within the cache line; on a miss, the data for the memory address must first be read from DRAM into the cache.

1.2.2 Several issues to consider in an implementation
1) Physical address vs. virtual address
Should the cache be accessed with the physical address or the virtual address?

Virtual address: not unique.
. Multiple processes use the same virtual address range, so we'll need to include a field identifying the address space in the cache tag to make sure we don't mix them up.
. The same physical location may be described by different addresses in different tasks. In turn, that might lead to the same memory location being cached in two different cache entries (cache aliases).
Physical address:
. A cache that works purely on physical addresses is easier to manage, because it avoids both of the problems above, but raw program (virtual) addresses are available to start the cache lookup earlier, letting the system run that little bit faster.
(The physical address is available only after the MMU has translated the virtual address, which takes extra time.)

2) Choice of line size:
When a cache miss occurs, the whole line must be filled from memory, so the larger the line size, the longer each miss takes to service.

3) Split/unified:
Whether to use a separate I-cache and D-cache or a single unified cache.
With split caches, the selection is done purely by function, in that instruction fetches look in the I-cache and data loads/stores in the D-cache. (This means, by the way, that if you try to execute code which the CPU just copied into memory, you must both flush those instructions out of the D-cache and ensure they get loaded into the I-cache.)

1.3 Multi-level cache technology
Many CPUs already use multiple cache levels (L1/L2/...).
The main purpose of a multi-level cache is to reduce the penalty of a cache miss: an L1 miss that hits in L2 is much cheaper than one that must go all the way to DRAM.

2. Cache considerations in programming

2.1 DMA Operation
2.1.1 Before DMA out of memory
If a device is taking data out of memory, it's vital that it gets the right data. If the data cache is write-back and the program has recently written some data, some of the correct data may still be held in the D-cache but not yet written back to main memory. The CPU can't see this problem, of course; if it looks at the memory locations, it will get the correct data back from its cache.
So before the DMA device starts reading data from memory, any data for that range of locations that is currently held in the D-cache must be written back to memory if necessary.

2.1.2 DMA into memory
If a device is loading data into memory, it's important to invalidate any cache entries purporting to hold copies of the memory locations concerned; otherwise, the CPU reading these locations will obtain stale cached data. The cache entries should be invalidated before the CPU uses any data from the DMA input stream.

2.2 Writing instructions
When the CPU itself is storing instructions into memory for subsequent execution, you must first ensure that the instructions are written back to memory and then make sure that the corresponding I-cache locations are invalidated: the MIPS CPU has no connection between the D-cache and the I-cache.

2.3 Linux slab allocator
A Linux slab cache contains multiple slabs, which are used to allocate and free objects of the same type (generally, objects defined with the same data type).
Linux assigns one or more physically contiguous page frames to each slab.
Because slabs start on page boundaries, objects at the same offset in different slabs of the same slab cache are very likely to compete for the same cache line, or at least for the same set in the CPU cache.
The Linux slab allocator uses a technique called coloring (color offsets) to avoid this problem.
Each slab in a slab cache is given a different color offset, which determines where the first object in the slab is stored. This spreads objects at the same offset across different sets and greatly reduces the conflicts described above.

2.4 Other
Many data structure definitions in Linux carry comments like the following (here, struct buffer_head):
/*
 * Keep related fields in common cachelines. The most commonly accessed
 * field (b_state) goes at the start so the compiler does not generate
 * indexed addressing for it.
 */
struct buffer_head {
        /* First cache line: */
        unsigned long b_state;             /* buffer state bitmap (see above) */
        struct buffer_head *b_this_page;   /* circular list of page's buffers */
        struct page *b_page;               /* the page this bh is mapped to */
        atomic_t b_count;                  /* users using this block */
        u32 b_size;                        /* block size */

        sector_t b_blocknr;                /* block number */
        char *b_data;                      /* pointer to data block */

        struct block_device *b_bdev;
        bh_end_io_t *b_end_io;             /* I/O completion */
        void *b_private;                   /* reserved for b_end_io */
        struct list_head b_assoc_buffers;  /* associated with another mapping */
};
Keeping related, frequently used fields together means they are usually fetched in the same cache line, so accessing them together causes fewer cache misses.
