The relationship between TLB and cache

Source: Internet
Author: User
Tags: prefetch, switches

I) TLB

1 Overview of TLB

A TLB (Translation Lookaside Buffer) is a small hardware cache used to speed up the translation of virtual addresses into physical addresses.

The TLB caches entries of the page tables held in memory. Without a TLB, every data fetch would require two memory accesses: one to read the page table and obtain the physical address, and one to fetch the data itself.

2 Principle of TLB

When the CPU issues a read request, it looks up the virtual address (its top 20 bits, assuming 4 KB pages on a 32-bit system) in the TLB.
The TLB holds the mapping between virtual page numbers (the top 20 bits of the virtual address) and page frame numbers. On a match, the page frame number (which can be thought of as coming from the page table entry) is found quickly, and the final physical address is formed by combining the frame number with the 12-bit offset of the virtual address.

If the virtual address matches no TLB entry, a TLB miss occurs and the page table entry must be looked up in the page table. If it is not present in the page table either, the requested content is not in memory and must be read in from disk.

A TLB is a cache in the MMU.

Under the paging mechanism, the page-table data backing the TLB is maintained not by the processor but by the OS; the TLB itself is flushed by (re)loading the CR3 register on the processor.

If the MMU finds no hit in the TLB, it performs a regular page-table walk and then replaces one TLB entry with the page table entry it found.

3 How the TLB is flushed

On a process context switch, the CR3 register is reloaded, which flushes the TLB.

There are two situations in which the TLB flush can be avoided.
The first is a switch between processes that use the same page table.
The second is a switch from a normal process to a kernel thread.

The lazy-TLB (lazy mode) technique prevents such process switches from flushing the TLB.
When a normal process switches to a kernel thread, the system enters lazy-TLB mode, and it leaves that mode when it switches back to a normal process.

II) cache
1 The concept of cache:
The cache exists to bridge the huge speed gap between the processor and the much slower DRAM (main memory).
The cache belongs to the hardware; Linux cannot manage its contents directly, but the kernel does provide interfaces for flushing the entire cache.
Caches are organized into levels: level one, level two, level three, and so on. The level one cache can typically be accessed within a single CPU instruction cycle.


For example, to view the caches of the current system:

dmidecode -t cache

# dmidecode 2.9
SMBIOS 2.6 present.

Handle 0x0700, DMI type 7, bytes
Cache Information
        Socket Designation: Not Specified
        Configuration: Enabled, Not Socketed, Level 1
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 128 KB
        Maximum Size: 128 KB
        Supported SRAM Types:
                Unknown
        Installed SRAM Type: Unknown
        Speed: Unknown
        Error Correction Type: Single-bit ECC
        System Type: Data
        Associativity: 8-way Set-associative

Handle 0x0701, DMI type 7, bytes
Cache Information
        Socket Designation: Not Specified
        Configuration: Enabled, Not Socketed, Level 2
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 1024 KB
        Maximum Size: 2048 KB
        Supported SRAM Types:
                Unknown
        Installed SRAM Type: Unknown
        Speed: Unknown
        Error Correction Type: Single-bit ECC
        System Type: Unified
        Associativity: 8-way Set-associative

Handle 0x0702, DMI type 7, bytes
Cache Information
        Socket Designation: Not Specified
        Configuration: Enabled, Not Socketed, Level 3
        Operational Mode: Write Back
        Location: Internal
        Installed Size: 4096 KB
        Maximum Size: 4096 KB
        Supported SRAM Types:
                Unknown
        Installed SRAM Type: Unknown
        Speed: Unknown
        Error Correction Type: Single-bit ECC
        System Type: Unified
        Associativity: 16-way Set-associative

That is:
Level 1 cache: 128 KB
Level 2 cache: 1024 KB
Level 3 cache: 4096 KB

2 Cache Access Unit (cache line)

CPUs never read or write individual bytes or words directly from DRAM; every read or write between the CPU and DRAM passes through the L1 cache, and data moves to and from DRAM in whole cache lines.
The cache line is the smallest unit of synchronization between the cache and DRAM.
Typical virtual memory pages are 4 KB, while typical cache lines are 32 or 64 bytes.
The CPU reads and writes memory through the cache; if the data is not in the cache, the cache must be filled a cache line at a time, even to read or write a single byte.
The CPU never accesses memory directly: every memory read or write goes through the cache.

3 The working modes of the cache

Write-back: the highest-performance and most typical mode. In write-back mode, a change to the cache contents does not have to be written back to memory immediately; the write-back happens only when the line is evicted to make room for new data or when software requests a flush.
Write-through: less efficient than write-back, because every write is forced through to memory to keep memory and cache consistent. Writes become slow, while reads remain as fast as in write-back mode; this is the price paid for keeping memory and cache in step at all times.
Prefetching: some caches allow the processor to prefetch cache lines in response to a read request, so that adjacent content is read in as well. If the access pattern is random, prefetching can slow the CPU down; it generally needs cooperation from software to achieve maximum performance.


Note:
Most caches allow software to set the mode per memory region: one region may be write-back while another is prefetching. Ordinary users generally cannot change the cache mode; it is usually controlled by device drivers.
Prefetching is usually influenced from software through cache hints such as madvise.

For example, to see which mode the current system's caches work in:

dmidecode -t cache

The output is identical to the listing shown above.

The relevant line shows that all three cache levels operate in write-back mode:
Operational Mode: Write Back

III) Memory Consistency

Write-back caching raises the question of memory consistency, which involves a range of issues:

1 On multiprocessor systems, when one processor modifies its cached data, a second processor cannot see the change until the cache contents are written back to memory.
Modern processors are carefully designed so that this does not happen: the hardware is responsible for keeping the caches consistent across CPUs.

2 Peripheral hardware can access memory via DMA (Direct Memory Access) without the processor's knowledge and without going through the cache, so memory and cache can get out of step.
Managing DMA operations is the job of the operating system, for example in device drivers, which must ensure consistency between memory and cache.

3 When the data in the cache is older than the data in memory, it is called stale. If software initiates a DMA transfer of data between a device and RAM, it must tell the CPU that the corresponding cache entries must be invalidated.

4 When the data in the cache is newer than the data in memory, it is called dirty. Before a device driver lets a device read data from memory via DMA, it must ensure that all dirty entries are written back to memory. This is also known as flushing or syncing the cache.
