Relationship between TLB and Cache


1) TLB

1) TLB Overview

The TLB (Translation Lookaside Buffer) is part of the memory management unit and is used to speed up the translation of virtual addresses into physical addresses.

The TLB is a cache of the page table, which itself lives in memory. Without a TLB, every data access would require two memory accesses: one to query the page table for the physical address, and one to fetch the data itself.

2) TLB Principle

When the CPU issues a read request, it presents the upper 20 bits of the virtual address (the virtual page number) to the TLB.
The TLB stores the mapping between virtual page numbers (the upper 20 bits of the virtual address) and page frame numbers (a TLB entry can be thought of as a cached page table entry). If the virtual page number hits in the TLB, the page frame number is found quickly, and the final physical address is formed by combining the page frame number with the lower 12 bits of the virtual address (the offset within the page).

If the virtual address does not match any TLB entry, a TLB miss occurs and the page table in memory must be consulted for the page table entry. If the entry shows the page is not present, the requested content is not in memory and has to be read in from disk.
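
As a concrete illustration of the split described above, the following minimal C sketch decomposes a 32-bit virtual address into a 20-bit virtual page number and a 12-bit offset and recombines them into a physical address; the virtual address and the page frame number 0x9ABCD are made-up values for illustration only.

/* Illustrative only: assumes 32-bit virtual addresses and 4 KB pages. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                          /* 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

int main(void)
{
    uint32_t vaddr  = 0x12345678;
    uint32_t vpn    = vaddr >> PAGE_SHIFT;     /* upper 20 bits: looked up in the TLB */
    uint32_t offset = vaddr & PAGE_MASK;       /* lower 12 bits: byte within the page */

    /* Suppose the TLB (or a page table walk) maps this VPN to frame 0x9ABCD. */
    uint32_t pfn    = 0x9ABCD;
    uint32_t paddr  = (pfn << PAGE_SHIFT) | offset;

    printf("vpn=0x%05x offset=0x%03x paddr=0x%08x\n", vpn, offset, paddr);
    return 0;
}

Running it prints vpn=0x12345 offset=0x678 paddr=0x9abcd678, i.e. the translation only replaces the page number and leaves the offset untouched.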

In other words, the TLB is a small, high-speed cache inside the MMU.

Under the paging mechanism, the entries in the TLB mirror entries in the page table; the page table is maintained by the OS, not by the processor. The TLB is flushed by reloading the page-table base register (CR3 on x86) in the processor.

If the MMU finds no hit in the TLB, it performs a regular page table walk and then replaces an existing TLB entry with the page table entry it found.

3) TLB refresh principles

When a process context switch occurs, the kernel reloads the CR3 register, which flushes the TLB.

There are two cases in which the TLB flush can be avoided:
The first case is a switch between processes that share the same page table.
The second case is a switch from an ordinary process to a kernel thread.

Lazy-TLB (lazy mode) exists to avoid unnecessary TLB flushes caused by process switching.
When an ordinary process switches to a kernel thread, the system enters lazy-TLB mode; it leaves the mode when it switches back to an ordinary process.
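
For orientation, a minimal sketch of what the flush itself amounts to on x86 (the helper name switch_page_table is hypothetical, and a real kernel does far more bookkeeping around it): writing the next process's page-table base into CR3 implicitly flushes the non-global TLB entries, which is exactly the cost that the two cases above and lazy-TLB mode try to avoid.

/* Hypothetical kernel-side helper, x86, GCC inline assembly.
 * The "mov to CR3" is what flushes the (non-global) TLB entries. */
static inline void switch_page_table(unsigned long next_pgd_phys)
{
    asm volatile("mov %0, %%cr3" : : "r"(next_pgd_phys) : "memory");
}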

2) Cache
1) Concept of cache
The cache exists to bridge the huge speed gap between the processor and the much slower DRAM (main memory).
The cache is implemented in hardware; Linux does not manage it directly, but it does provide interfaces for flushing the entire cache.
Caches are organized into level 1, level 2, and level 3 caches; the level 1 cache runs at essentially the CPU's own speed, so it can be accessed within an instruction cycle.


For example, to view the caches on the current system, run:

dmidecode -t cache

# dmidecode 2.9
SMBIOS 2.6 present.

Handle 0x0700, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not specified
Configuration: enabled, not socketed, level 1
Operational Mode: Write back
Location: Internal
Installed Size: 128 KB
Maximum Size: 128 KB
Supported SRAM types:
Unknown
Installed SRAM type: Unknown
Speed: Unknown
Error Correction type: single-bit ECC
System type: Data
Associativity: 8-way set-associative

Handle 0x0701, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not specified
Configuration: enabled, not socketed, level 2
Operational Mode: Write back
Location: Internal
Installed Size: 1024 KB
Maximum Size: 2048 KB
Supported SRAM types:
Unknown
Installed SRAM type: Unknown
Speed: Unknown
Error Correction type: single-bit ECC
System type: Unified
Associativity: 8-way set-associative

Handle 0x0702, DMI type 7, 19 bytes
Cache Information
Socket Designation: Not specified
Configuration: enabled, not socketed, level 3
Operational Mode: Write back
Location: Internal
Installed Size: 4096 KB
Maximum Size: 4096 KB
Supported SRAM types:
Unknown
Installed SRAM type: Unknown
Speed: Unknown
Error Correction type: single-bit ECC
System type: Unified
Associativity: 16-way set-associative

The installed sizes are:
Level 1 cache: 128 KB
Level 2 cache: 1024 KB
Level 3 cache: 4096 KB

2) Cache access unit (cache line)

The CPU never reads or writes bytes or words directly from DRAM. Every read or write goes through the L1 cache first, and data moves between the cache and DRAM in whole cache lines.
The cache line is the smallest unit of transfer between the cache and DRAM.
A typical virtual memory page is 4 KB, while a typical cache line is 32 or 64 bytes.
Because every CPU memory access goes through the cache, if the data is not already cached the whole cache line containing it must be filled first, even when only a single byte is being read or written.
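
To make the "whole line at a time" behavior concrete, here is a small C sketch; the 64-byte line size is only a typical assumption (on Linux it can be read from /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size).

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64                  /* assumed typical line size */
#define N (64u * 1024 * 1024)          /* 64 MB working set */

int main(void)
{
    uint8_t *buf = malloc(N);
    if (!buf)
        return 1;
    memset(buf, 1, N);

    unsigned long sum = 0;

    /* Sequential pass: each 64-byte line is fetched from DRAM once and
     * then serves 64 consecutive byte reads out of the cache. */
    for (size_t i = 0; i < N; i++)
        sum += buf[i];

    /* Strided pass: only one byte per line is touched, yet every access
     * still transfers a full cache line from DRAM. */
    for (size_t i = 0; i < N; i += CACHE_LINE)
        sum += buf[i];

    free(buf);
    return (int)(sum & 1);             /* keep the loops from being optimized away */
}

Timing the two loops (for example with perf stat) shows that the strided pass does far less arithmetic but causes a comparable amount of memory traffic, because each touched byte drags in its entire line.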

3) Cache working modes

Write-back: This is the highest-performance and most typical mode. In write-back mode, a change to cached content is not written back to memory immediately; the write-back happens only when the line has to be evicted to make room for new data, or when software explicitly flushes the cache.
Write-through: This mode is less efficient than write-back because every write is propagated to memory immediately as well as being kept in the cache. Writes are therefore slower, while reads are as fast as in write-back mode, since they are still served from the cache.
Prefetching: Some caches allow cache lines to be prefetched in response to read requests, so that content adjacent to the requested data is loaded as well. This speeds up sequential access, but if accesses are random it can slow the CPU down; prefetching gives the best performance when hardware and software cooperate.


Note:
Most caches allow software to set the mode for a given region of memory: one region may be write-back while another is prefetched. Ordinary users generally cannot change the cache mode; it is usually controlled by the device driver.
Prefetching is usually influenced from software through hint functions such as madvise, which tells the kernel how a region of memory is going to be accessed.
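
As an example of such a hint, the sketch below memory-maps a file and advises the kernel that it will be read sequentially; the file path is hypothetical, and note that madvise is a page-level hint that steers the kernel's read-ahead rather than a way of programming the CPU cache directly.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/var/log/syslog";       /* hypothetical input file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Advise the kernel that the mapping will be read sequentially,
     * so it can read ahead (prefetch pages) aggressively. */
    if (madvise(p, st.st_size, MADV_SEQUENTIAL) != 0)
        perror("madvise");

    /* ... scan through the mapping here ... */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}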

For example, to check which operational mode the caches on the current system use, run:

dmidecode -t cache


Every cache level in the listing above reports the same operational mode:
Operational Mode: Write back
That is, all three cache levels work in write-back mode.


4) Memory consistency

Write-back caching raises memory consistency issues, which lead to a series of problems:

1) On a multiprocessor system, when one processor modifies data in its cache, a second processor cannot see the new value until that data has been written back to memory. In modern processors the hardware has been carefully designed so that this does not happen: cache-coherence hardware keeps the caches of the various CPUs consistent.

2) Peripheral hardware can access memory through direct memory access (DMA) without knowing about or going through the cache, so memory and cache can get out of sync. Managing DMA operations is the job of the operating system; for example, the device driver must ensure consistency between memory and cache.

3) When the data in the cache is older than the data in memory, the cached copy is called stale. If software initiates a DMA transfer so that data moves from a device into RAM, it must tell the CPU that the corresponding cache entries have to be invalidated.

4) When the data in the cache is newer than the data in memory, the cached copy is called dirty. Before a device driver lets a device read data from memory through DMA, it must make sure that all dirty entries have been written back to memory; this is also called flushing or syncing the cache.
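
A minimal sketch of how a Linux device driver typically expresses points 3) and 4) with the streaming DMA-mapping API follows; the function my_receive, the device pointer, and the buffer are hypothetical, and on fully cache-coherent hardware these calls may do little more than set up the address.

#include <linux/dma-mapping.h>
#include <linux/errno.h>

/* Receive 'len' bytes from a device into 'buf' via DMA.
 * dma_map_single() prepares the buffer for device access; on
 * non-coherent hardware this cleans/invalidates the cache lines that
 * cover it (handling the "dirty" case).  dma_unmap_single() returns
 * ownership to the CPU so it sees what the device wrote instead of
 * stale cached data (handling the "stale" case). */
static int my_receive(struct device *dev, void *buf, size_t len)
{
    dma_addr_t handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

    if (dma_mapping_error(dev, handle))
        return -ENOMEM;

    /* ... program the device to DMA into 'handle' and wait for completion ... */

    dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);

    /* 'buf' now reliably contains the data written by the device. */
    return 0;
}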

