Linux Kernel and TLB

Source: Internet
Author: User

TLB: Translation Lookaside Buffer

The TLB (translation lookaside buffer), sometimes called a fast table, can be understood as a page table cache or an address translation cache.

Because page tables are stored in main memory, each memory request by a program requires at least two accesses: one to read the page table and obtain the physical address, and a second to fetch the data. The key to improving memory access performance is to exploit the locality of page table accesses: once a virtual page number has been translated, it is likely to be used again in the near future.

The TLB is a high-speed cache that memory management hardware uses to speed up the translation of virtual addresses to physical addresses. Today, desktop, laptop, and server processors all use a TLB to map virtual addresses to physical addresses. With a TLB, the processor can quickly find the mapping between a virtual address and a physical address without going to RAM to fetch it. In this respect the TLB is very similar to data and instruction caches.

TLB Principle

When the CPU needs to access a virtual (linear) address, it first searches the TLB using the upper 20 bits of the address (20 is specific to 32-bit x86 with 4 KB pages; other architectures use different values). If no matching entry exists, this is called a TLB miss: the CPU must access the page table in slow RAM to compute the corresponding physical address, and the translation is then stored in a TLB entry. On later accesses to the same linear address, the physical address can be obtained directly from that TLB entry, which is called a TLB hit.
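The hit and miss paths described above can be modeled in a few lines. This is only a minimal sketch, assuming 4 KB pages on 32-bit x86; the page table contents, the unbounded `tlb` dictionary, and the `translate` function are hypothetical names chosen for illustration, not kernel or hardware interfaces.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages, so the low 12 bits are the page offset

# Hypothetical page table in RAM: virtual page number -> physical frame number.
page_table = {0x12345: 0x00ABC, 0x12346: 0x00DEF}
tlb = {}  # cached translations, filled on demand (capacity limit not modeled)


def translate(vaddr):
    """Return (physical address, 'hit' or 'miss') for a 32-bit linear address."""
    vpn = vaddr >> PAGE_SHIFT                  # upper 20 bits select the page
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)   # lower 12 bits pass through unchanged
    if vpn in tlb:
        outcome = "hit"
    else:
        outcome = "miss"
        tlb[vpn] = page_table[vpn]             # slow path: consult the page table
    return (tlb[vpn] << PAGE_SHIFT) | offset, outcome


first = translate(0x12345678)    # first touch of the page
second = translate(0x12345FFC)   # same page, different offset
print(hex(first[0]), first[1])
print(hex(second[0]), second[1])
```

The second call hits because both addresses share the same upper 20 bits, even though their offsets differ.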

Assume that the x86_32 architecture had no TLB. To access a linear address, the CPU would read the PGD entry to locate the page table (first memory access), read the PTE to obtain the page frame address (second memory access), and finally access the physical address itself. A total of three RAM accesses are required. With a TLB and a TLB hit, only one RAM access is needed.
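The three-versus-one count can be checked with a toy model. The sketch below assumes the classic two-level x86_32 layout (10-bit PGD index, 10-bit PTE index, 12-bit offset); `CountingRam`, the table contents, and both load functions are hypothetical and exist only to count reads.

```python
def split(linear):
    """Split a 32-bit linear address into (PGD index, PTE index, page offset)."""
    return linear >> 22, (linear >> 12) & 0x3FF, linear & 0xFFF


class CountingRam:
    """Toy RAM that counts every read it serves."""

    def __init__(self):
        self.reads = 0

    def read(self, table, index):
        self.reads += 1
        return table[index]


# Hypothetical memory contents for the linear address 0x12345678.
page_table = {0x345: 0x00ABC}        # PTE index -> page frame number
pgd = {0x048: page_table}            # PGD index -> page table
frames = {0x00ABC: {0x678: 42}}      # frame number -> offset -> data


def load_without_tlb(ram, linear):
    pgd_i, pte_i, off = split(linear)
    table = ram.read(pgd, pgd_i)           # 1st access: PGD entry
    frame = ram.read(table, pte_i)         # 2nd access: PTE
    return ram.read(frames[frame], off)    # 3rd access: the data itself


def load_with_tlb_hit(ram, tlb, linear):
    frame = tlb[linear >> 12]              # on-chip lookup, no RAM access
    return ram.read(frames[frame], linear & 0xFFF)


slow, fast = CountingRam(), CountingRam()
value_slow = load_without_tlb(slow, 0x12345678)
value_fast = load_with_tlb_hit(fast, {0x12345: 0x00ABC}, 0x12345678)
print(slow.reads, fast.reads)
```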


TLB entries

The basic unit of storage inside the TLB is the page table entry, which corresponds to a page table entry stored in RAM. Because the size of a page table entry is fixed, the larger the TLB, the more page table entries it can hold and the higher the chance of a TLB hit. However, TLB capacity is limited, so the page table entries in RAM cannot be mapped one-to-one onto TLB entries. Therefore, when the CPU receives a linear address, it must quickly make two judgments:

1. Whether the required page table entry is already cached in the TLB (TLB hit or TLB miss).

2. Which TLB entry the required page table entry corresponds to.

To minimize the time the CPU spends on these judgments, the correspondence between TLB entries and in-memory page table entries must be designed carefully.

Fully associative

In this scheme, TLB entries have no fixed relationship with linear addresses. That is, a TLB entry can hold the page table entry for any linear address. This organization maximizes utilization of the TLB's entry space. However, the lookup latency can be quite large, because on every CPU request the TLB hardware must compare the linear address against the TLB entries one by one until it finds a hit or has compared all of them. In particular, as CPU caches grow, more TLB entries are needed, so this scheme is only suitable for small TLBs.
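A software model of that one-by-one comparison might look like the sketch below. It is only an illustration: the `entries` list and the `lookup` function are hypothetical, and the loop counts comparisons rather than modeling how real comparator hardware is built.

```python
def lookup(entries, vpn):
    """Scan a fully associative TLB; return (frame or None, comparisons made)."""
    for count, (tag, frame) in enumerate(entries, start=1):
        if tag == vpn:
            return frame, count      # hit: stop at the matching entry
    return None, len(entries)        # miss: every entry was compared


# Any address block may sit in any slot, so the order carries no meaning.
entries = [(1, 100), (17, 101), (33, 102), (5, 103)]

hit = lookup(entries, 33)
miss = lookup(entries, 99)
print(hit, miss)
```

A miss is the worst case: its cost grows with the number of entries, which is why the text limits this scheme to small TLBs.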

Direct mapped

Each linear address block maps to exactly one TLB entry through a modulo operation, so only one comparison is needed, which reduces lookup latency inside the TLB. However, this scheme has a high probability of conflicts, which cause TLB misses and lower the hit rate.

For example, assume that the TLB contains 16 entries and the CPU accesses the following linear address blocks in order: 1, 17, 1, 33. When the CPU accesses address block 1, 1 mod 16 = 1, so the TLB checks whether entry 1 holds linear address block 1. If it does, the access hits; otherwise, the page table entry is loaded from RAM. Next, when the CPU accesses address block 17, 17 mod 16 = 1; the TLB finds that entry 1 does not hold address block 17, so a TLB miss occurs and the page table entry for address block 17 is loaded from RAM into the TLB. When the CPU then accesses address block 1 again, another miss occurs, and the TLB has to access RAM to reload the page table entry for address block 1. The final access to block 33 (33 mod 16 = 1) misses for the same reason. Therefore, under certain access patterns, direct mapping performs very poorly.
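The access sequence in this example can be replayed with a short simulation. This is a sketch under two assumptions that the text does not state: the TLB starts empty, and each slot simply remembers which address block it currently holds. The function name is hypothetical.

```python
NUM_ENTRIES = 16


def simulate_direct_mapped(blocks, num_entries=NUM_ENTRIES):
    """Return a list of 'hit'/'miss' outcomes for a sequence of address blocks."""
    slots = [None] * num_entries       # each slot remembers the block it holds
    outcomes = []
    for block in blocks:
        index = block % num_entries    # the only slot this block may occupy
        if slots[index] == block:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            slots[index] = block       # evict whatever was there before
    return outcomes


outcomes = simulate_direct_mapped([1, 17, 1, 33])
print(outcomes)
```

All four accesses miss because blocks 1, 17, and 33 compete for slot 1 while the other 15 slots stay empty.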

Set associative

To address both the slow lookup of the fully associative scheme and the conflicts of the direct-mapped scheme, set-associative mapping was introduced. In this scheme, all TLB entries are divided into multiple sets. Each linear address block maps not to a single TLB entry but to a set of entries. When the CPU translates an address, it first computes the set that the linear address block maps to, and then compares the entries within that set in turn. Depending on the number of entries per set, the scheme is called 2-way, 4-way, or 8-way set associative.
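To see how grouping entries into sets changes the outcome, the sketch below replays the sequence 1, 17, 1, 33 on a 16-entry TLB split into sets. The replacement policy inside a set is not specified in the text; least-recently-used (LRU) eviction is assumed here purely for illustration, and the function name is hypothetical.

```python
from collections import OrderedDict


def simulate_set_associative(blocks, num_entries=16, ways=4):
    """Return 'hit'/'miss' outcomes; LRU replacement within a set is assumed."""
    num_sets = num_entries // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in blocks:
        entries = sets[block % num_sets]     # the set this block maps to
        if block in entries:
            outcomes.append("hit")
            entries.move_to_end(block)       # mark as most recently used
        else:
            outcomes.append("miss")
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used
            entries[block] = True
    return outcomes


four_way = simulate_set_associative([1, 17, 1, 33], ways=4)
one_way = simulate_set_associative([1, 17, 1, 33], ways=1)
print(four_way)
print(one_way)
```

With four ways, blocks 1, 17, and 33 still share a set but can coexist in it, so the second access to block 1 hits. Setting `ways=1` reduces the model to direct mapping, and `ways=16` makes it fully associative.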

Long engineering practice has shown that 8-way set associativity is a performance turning point. The hit rate of an 8-way set-associative TLB is almost the same as that of a fully associative one. Beyond eight ways, the cost of the longer comparison within each set outweighs the gain in hit rate.

Each of these three schemes has its own advantages and disadvantages. Set-associative mapping is a compromise that suits most application environments. Of course, other cache organizations can be used in specialized fields.

TLB entry updates

TLB entries can be updated automatically by the TLB hardware or explicitly by software:

1. When a TLB miss occurs, the CPU fetches the page table entry from RAM and automatically updates the TLB.

2. In some cases, entries in the TLB become invalid, for example on a process switch or when the kernel page tables change. The CPU hardware does not know which TLB entries are invalid, so in these scenarios software must flush the TLB.

The Linux kernel provides a variety of software methods for flushing TLB entries, but the hardware interfaces they rely on differ between architectures. For example, x86_32 provides only two hardware interfaces for flushing TLB entries:

1. When a value is written to the CR3 register, the processor automatically flushes the TLB entries for non-global pages.

2. On the Intel486 and later processors, the invlpg instruction invalidates the single TLB entry that maps a specified linear address.
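The effect of these two interfaces on the TLB contents can be sketched with a toy model. This is not kernel code: `ToyTlb` and its methods are hypothetical names that only mimic the behavior described above, assuming 4 KB pages and a per-entry global flag.

```python
class ToyTlb:
    """Toy model of a TLB whose entries may be marked global."""

    def __init__(self):
        self.entries = {}   # virtual page number -> (frame number, is_global)

    def insert(self, vpn, frame, is_global=False):
        self.entries[vpn] = (frame, is_global)

    def write_cr3(self):
        """Model of a CR3 write: drop every non-global entry."""
        self.entries = {v: e for v, e in self.entries.items() if e[1]}

    def invlpg(self, linear):
        """Model of invlpg: drop the entry for the page containing `linear`."""
        self.entries.pop(linear >> 12, None)


tlb = ToyTlb()
tlb.insert(0x08048, 0x100)                   # user page
tlb.insert(0x08049, 0x101)                   # user page
tlb.insert(0xC0000, 0x200, is_global=True)   # kernel page marked global

tlb.invlpg(0x08048123)                       # invalidates only page 0x08048
after_invlpg = sorted(tlb.entries)
tlb.write_cr3()                              # e.g. on a process switch
after_cr3 = sorted(tlb.entries)
print([hex(v) for v in after_invlpg], [hex(v) for v in after_cr3])
```

The model shows why invlpg is the cheaper choice when a single mapping changes: a CR3 write discards every non-global translation, and each one must later be refilled through a TLB miss.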
