Cache-Based Memory Hierarchy
Cache-based memory hierarchies work well for two reasons: slower storage devices cost less per byte than faster ones, and programs tend to exhibit locality:
Temporal locality: a memory location that has been referenced once is likely to be referenced again in the near future.
Spatial locality: if a memory location is referenced once, the program is likely to reference a nearby memory location in the near future.
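A simple (hypothetical) loop illustrates both kinds of locality at once:

```python
# Hypothetical example: summing an array exhibits both kinds of locality.
def sum_array(a):
    total = 0          # 'total' is referenced on every iteration: temporal locality
    for i in range(len(a)):
        total += a[i]  # a[0], a[1], ... sit next to each other in memory: spatial locality
    return total

print(sum_array([1, 2, 3, 4]))  # 10
```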
General cache memory structure
A general cache has S = 2^s sets.
Each set contains E lines (cache lines).
Each line contains a valid bit, a t-bit tag, and a B = 2^b-byte cache block (where the data is actually stored).
When we talk about a 64-byte or 32-byte cache line, it is really the cache block inside the line that holds 64 or 32 bytes.
Assume memory addresses are m bits wide, giving M = 2^m addresses in total. Let's look at the relationships among these variables.
Cache data capacity C = (set size * number of sets) = (block size * lines per set) * number of sets = B * E * S
Memory size: 2^m bytes. Cache block size: 2^b bytes. Number of blocks in memory: 2^(m-b).
These 2^(m-b) blocks are distributed across the 2^s sets, so 2^(m-b-s) different blocks map to each set. This number is not E; it means 2^(m-b-s) distinct memory blocks can fall into the same set, so we need m-b-s bits to identify which block currently occupies a line in that set. That is, t = m-b-s.
Thus an m-bit address is divided into a t-bit tag (identifying the block within its set), an s-bit index (selecting the set), and a b-bit offset (locating the byte within the block).
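The relations above can be sketched in code. The parameter choices (32-byte blocks, 8 lines per set, 64 sets, 32-bit addresses) are illustrative, not from the source:

```python
# Sketch of how an m-bit address decomposes, using t = m - b - s.
# Assumed example parameters: B = 2**5 = 32-byte blocks, E = 8, S = 2**6 = 64 sets.
b, s, m = 5, 6, 32
E = 8
t = m - b - s               # tag bits

def split_address(addr):
    offset = addr & ((1 << b) - 1)         # low b bits: block offset
    index = (addr >> b) & ((1 << s) - 1)   # next s bits: set index
    tag = addr >> (b + s)                  # remaining t bits: tag
    return tag, index, offset

C = (1 << b) * E * (1 << s)  # capacity C = B * E * S = 16 KiB
print(t, C)
print(split_address(0x1234ABCD))
```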
Cache mapping
To determine whether a request hits and then extract the requested word, the cache performs three steps: 1) set selection, 2) line matching, 3) word extraction.
Direct mapping
E = 1: each set contains exactly one line.
Set selection
Take the s index bits from the address to select the set.
Line matching
Compare the t tag bits in the address with the tag stored in the cache line. If they match and the valid bit is set, the access is a hit; otherwise it is a miss.
Word extraction
The b offset bits in the address give the position of the requested word within the hit line's block.
When a direct-mapped cache misses, no replacement policy is needed: the single line in the indexed set is simply replaced.
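The three steps for a direct-mapped cache can be sketched as follows. This is a minimal illustration (the data block itself is omitted, and the sizes are made up for the example):

```python
# Minimal sketch of a direct-mapped cache lookup (E = 1).
# Assumed illustrative sizes: 32-byte blocks (b = 5), 16 sets (s = 4).
B_BITS, S_BITS = 5, 4

class DirectMappedCache:
    def __init__(self):
        # one line per set: (valid, tag); the data block is omitted here
        self.lines = [(False, None)] * (1 << S_BITS)

    def access(self, addr):
        index = (addr >> B_BITS) & ((1 << S_BITS) - 1)  # 1) set selection
        tag = addr >> (B_BITS + S_BITS)
        valid, stored_tag = self.lines[index]
        if valid and stored_tag == tag:                 # 2) line matching
            return "hit"                                # 3) word extraction would use the offset bits
        self.lines[index] = (True, tag)                 # miss: replace, no policy needed
        return "miss"

cache = DirectMappedCache()
print(cache.access(0x1000))   # miss (cold cache)
print(cache.access(0x1000))   # hit (same block)
print(cache.access(0x11000))  # miss: same index, different tag evicts the line
```

The last access shows the weakness fixed by set associativity: two blocks with the same index bits keep evicting each other.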
Set-associative mapping
In a set-associative cache, each set contains multiple cache lines. Four-way and sixteen-way set-associative caches are common, meaning each set contains 4 or 16 lines. Compared with a direct-mapped cache of the same capacity, there are fewer sets, so s is smaller; correspondingly, more distinct blocks map to each set, so t is larger.
Set selection
Set selection in a set-associative cache works the same way as in a direct-mapped cache.
Line matching
Because a set contains multiple lines, the tags of all lines in the set must be searched to check whether any line hits.
Word extraction
Same as in a direct-mapped cache.
Because a set-associative cache maps each index to multiple lines and compares the address against every line's tag, the chance of a hit increases greatly, avoiding the frequent conflict misses that occur when a small working set maps to the same set.
When a set-associative cache misses, the indexed set contains multiple lines, so a replacement algorithm (such as LRU or random replacement) must choose which line to evict.
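A set-associative lookup with LRU replacement, one common choice of replacement algorithm, can be sketched like this (sizes are illustrative):

```python
# Sketch of a 4-way set-associative cache with LRU replacement.
# Assumed illustrative sizes: 32-byte blocks (b = 5), 16 sets (s = 4), E = 4.
from collections import OrderedDict

B_BITS, S_BITS, WAYS = 5, 4, 4

class SetAssocCache:
    def __init__(self):
        # each set maps tag -> None, ordered least- to most-recently used
        self.sets = [OrderedDict() for _ in range(1 << S_BITS)]

    def access(self, addr):
        index = (addr >> B_BITS) & ((1 << S_BITS) - 1)  # set selection
        tag = addr >> (B_BITS + S_BITS)
        lines = self.sets[index]
        if tag in lines:                  # compare the tag of every line in the set
            lines.move_to_end(tag)        # mark as most recently used
            return "hit"
        if len(lines) == WAYS:
            lines.popitem(last=False)     # evict the least recently used line
        lines[tag] = None
        return "miss"

cache = SetAssocCache()
for tag in range(4):
    cache.access(tag << 9)   # four blocks with the same index fill one set
print(cache.access(0))       # hit: tag 0 is still resident (and becomes MRU)
print(cache.access(4 << 9))  # miss: evicts the LRU line (tag 1)
```

Note that the same four conflicting blocks would cause four evictions in the direct-mapped sketch above.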
Fully associative mapping
In a fully associative cache, all cache lines belong to a single set.
Set selection
Set selection in a fully associative cache is trivial: there is only one set, so no set index is needed and s = 0. The address is divided into only a tag and an offset.
Line matching
To match a line in a fully associative cache, all the cache lines must be searched and their tags compared.
Word extraction
Same as before.
Because a fully associative cache must search and compare a large number of lines, it is difficult and expensive to build one that is both large and fast. Fully associative designs are therefore only suitable for small caches, such as the TLB.
Cache characteristics of the Core i7
The content above is from Computer Systems: A Programmer's Perspective, Section 6.4; the content below is from the Internet.
Cache addressing methods: virtually indexed, virtually tagged
A logical cache (virtually indexed, virtually tagged) is addressed purely with virtual addresses: the virtual address supplies both the index and the tag. This approach brings many problems. Because different processes can use the same virtual address, a process ID must be stored in every line alongside the tag to tell them apart. Shared memory is also troublesome: the same physical memory has different virtual addresses in different processes, and keeping those cached copies synchronized is a problem.
Physically indexed, physically tagged
A physical cache (physically indexed, physically tagged) uses the physical address for both the index and the tag. It is the easiest scheme to understand: the cache operates directly on physical addresses, simply and crudely, with no ambiguity. But it has an obvious drawback. In a multi-process operating system, each process has its own independent address space, and both instructions and data exist in the form of virtual addresses, so the memory accesses issued by the CPU use virtual addresses. Every access must therefore wait for the MMU to translate the virtual address into a physical address before the cache lookup can begin; this serial approach is inefficient.
Virtually indexed, physically tagged
Most current designs use the virtually indexed, physically tagged scheme. "Virtually indexed" means that when the CPU issues an address request, the low bits of the virtual address select the cache index; "physically tagged" means that, in parallel with the index lookup, the high bits of the virtual address are translated through the MMU's page tables into a physical address, and that physical address (or part of it) is then matched against the tag bits of the cache lines. This guarantees that the same physical address occupies a unique location in the cache (with one exception, the cache alias problem), while the MMU and the cache work in parallel, improving efficiency.
The only problem this scheme introduces is cache aliasing, where one physical address is cached in two different cache lines.

Cache aliasing mainly occurs when the cache index spans more bits than the page offset. For example, suppose the cache is 8 KB per way while the common page size is 4 KB. A 4 KB page size means the MMU divides the virtual address space at 4 KB granularity, with each 4 KB-aligned region as the minimum unit of allocation; consequently, the low 12 bits of the virtual address and the physical address are identical, because those 12 bits are simply the offset within the 4 KB page, whatever the virtual address is. An 8 KB cache, however, computes its index from the low 13 bits of the address, and if the index comes from the virtual address (which is what makes this a virtual index), those 13 bits include one bit of the virtual page number.

The problem arises when one physical address is used by different processes: the MMU's paging only guarantees that the low 12 bits agree, so bit 12 (the 13th bit) may well differ. The same physical address then has two copies in the cache, which can cause synchronization problems.

The current solution to cache aliasing is to have the operating system guarantee that the virtual addresses mapping the same physical address in different process spaces differ by an integer multiple of the cache size, which forces their bit 12 to be the same. Meanwhile, some CPU vendors are developing snooping hardware, attempting to solve such synchronization problems at the hardware level.
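The aliasing condition described above reduces to a single bit test. This sketch uses the 4 KB page / 8 KB direct-mapped virtually indexed cache from the example:

```python
# Sketch of the aliasing condition: with 4 KiB pages and an 8 KiB virtually
# indexed cache, bit 12 of the virtual address participates in the index
# but is not fixed by the page offset.
PAGE_BITS = 12      # 4 KiB pages: low 12 bits are the page offset
INDEX_BITS = 13     # 8 KiB cache indexed by the low 13 virtual address bits

def may_alias(vaddr1, vaddr2):
    """Two virtual mappings of the same physical page can land in different
    cache lines only if they differ in the index bits above the page offset
    (here, just bit 12)."""
    mask = (1 << INDEX_BITS) - (1 << PAGE_BITS)
    return (vaddr1 & mask) != (vaddr2 & mask)

print(may_alias(0x0000, 0x1000))  # True: bit 12 differs, two cached copies possible
print(may_alias(0x0000, 0x2000))  # False: the addresses differ by a multiple of 8 KiB
```

The second case is exactly the operating-system fix described above: mappings that differ by a multiple of the cache size always agree in the contested index bit.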