Linux TLB table

Source: Internet
Author: User

TLB: Translation Lookaside Buffer

The TLB, sometimes called the "fast table", can be understood as a page-table cache, i.e. a cache of address translations.

Since the page table is stored in main memory, every memory reference by a program requires at least two accesses: one to read the page table entry and obtain the physical address, and a second to fetch the data itself. The key to improving performance is the locality of page-table accesses: once a virtual page number has been translated, it is likely to be used again in the near future.

The TLB is a cache used by the memory-management hardware to speed up virtual-to-physical address translation. Virtually all current desktop, notebook, and server processors use a TLB when mapping virtual addresses to physical addresses. With the TLB, the kernel can resolve a virtual address to a physical address quickly, without going to RAM to fetch the mapping. In this respect it is very similar to the data cache and the instruction cache.

TLB principle

When the CPU accesses a virtual (linear) address, it first looks up the TLB using the high 20 bits of the address, i.e. the virtual page number (20 bits is specific to x86; other architectures use different values). If no matching entry is found, called a TLB miss, the corresponding physical address must be computed by walking the page table in the slower RAM. The resulting translation is then stored in a TLB entry, so that later accesses to the same linear address fetch the physical address directly from the TLB, called a TLB hit.
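As a rough sketch (not the actual hardware), the lookup flow above can be modeled in Python, with a dictionary standing in for both the TLB and the in-RAM page table and 4 KiB pages assumed:

```python
PAGE_SHIFT = 12                 # 4 KiB pages on x86_32
PAGE_MASK = (1 << PAGE_SHIFT) - 1

tlb = {}                        # virtual page number -> physical frame number

def translate(vaddr, page_table):
    """Return (physical address, hit?) for a 32-bit virtual address."""
    vpn = vaddr >> PAGE_SHIFT       # high 20 bits: virtual page number
    offset = vaddr & PAGE_MASK      # low 12 bits: offset within the page
    if vpn in tlb:                  # TLB hit: no page-table access needed
        return (tlb[vpn] << PAGE_SHIFT) | offset, True
    pfn = page_table[vpn]           # TLB miss: consult the page table in RAM
    tlb[vpn] = pfn                  # cache the translation for later accesses
    return (pfn << PAGE_SHIFT) | offset, False
```

The first access to a page misses and fills the TLB; repeating the same access then hits without touching the page table.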

Imagine the x86_32 architecture without a TLB: to access a linear address, the CPU must first read the page-table location from the PGD entry (first memory access), then read the page-frame address from the PTE (second memory access), and finally access the data at the resulting physical address (third memory access), i.e. 3 RAM accesses in total. With a TLB, a hit requires only one RAM access.
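The two page-table lookups come from how x86_32 (without PAE) splits a linear address: 10 bits of PGD index, 10 bits of page-table index, and a 12-bit page offset. A minimal sketch of that split:

```python
# x86_32 two-level paging (4 KiB pages, no PAE):
# bits 31..22 index the PGD, bits 21..12 index the page table,
# bits 11..0 are the offset within the page.
def split_linear_address(vaddr):
    pgd_index = (vaddr >> 22) & 0x3FF
    pte_index = (vaddr >> 12) & 0x3FF
    offset    = vaddr & 0xFFF
    return pgd_index, pte_index, offset
```

Each of the two indices drives one of the RAM accesses in the walk described above; the offset needs no lookup.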

TLB Table entry

The basic unit of the TLB is the entry, which corresponds to a page table entry stored in RAM. The size of an entry is fixed, so the larger the TLB, the more entries it can hold and the more likely a TLB hit becomes. However, TLB capacity is limited, so RAM page table entries and TLB entries cannot correspond one-to-one. When the CPU receives a linear address, it therefore has to make two quick judgments:

1. Is the required page table entry cached in the TLB (TLB miss or TLB hit)?

2. If so, which TLB entry holds it?

To minimize the time the CPU spends on these judgments, a fixed correspondence scheme must be established between TLB entries and in-memory page table entries. There are three common organizations:

Fully Associative

In this organization, there is no fixed relationship between TLB entries and linear addresses: any TLB entry can hold the page table entry of any linear address. This makes the best use of the TLB entry space, but the lookup latency can also be large, because on every CPU request the TLB hardware must compare the linear address against the entries one by one until a hit occurs or all entries have been checked. As CPU caches grow, more and more TLB entries would have to be compared, so this organization is only suitable for small TLBs.
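The one-by-one comparison can be sketched as a linear scan (a software analogy; real hardware compares all tags in parallel). Entries are hypothetical `(tag, pfn)` pairs:

```python
# Fully associative lookup: the VPN may sit in any slot, so every
# entry's tag must be compared until a match is found.
def fa_lookup(tlb_entries, vpn):
    for slot, (tag, pfn) in enumerate(tlb_entries):
        if tag == vpn:
            return slot, pfn     # hit: found in this slot
    return None                  # miss: all entries compared, none matched
```

The cost of a miss grows with the number of entries, which is why this scheme is restricted to small TLBs.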

Direct Mapped

Each linear address block corresponds, via a modulo operation, to exactly one TLB entry, so only one comparison is needed, minimizing lookup latency within the TLB. However, the probability of conflicts is high, causing TLB misses and reducing the hit rate.

For example, assume the TLB holds 16 entries and the CPU accesses the following sequence of linear address blocks: 1, 17, 1, 33. When the CPU accesses block 1, 1 mod 16 = 1, so the TLB checks whether entry 1 holds block 1; if it does, that is a hit, otherwise the entry is loaded from RAM. The CPU then accesses block 17; 17 mod 16 = 1, but entry 1 now corresponds to block 1 rather than 17, so a TLB miss occurs and the TLB loads the page table entry for block 17 from RAM. When the CPU next accesses block 1, it misses again, and the TLB must go back to RAM to reload block 1's entry. Under certain access patterns, therefore, direct mapping performs extremely poorly.
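The pathological sequence above can be replayed with a small simulation (a sketch under the same assumptions: 16 entries, block number as the tag):

```python
# Direct mapped: block b always maps to entry (b mod size); a mismatched
# tag in that single slot means a miss and an eviction.
def dm_simulate(blocks, size=16):
    entries = [None] * size
    hits = misses = 0
    for b in blocks:
        idx = b % size
        if entries[idx] == b:
            hits += 1
        else:
            misses += 1
            entries[idx] = b    # evict whatever was in this slot
    return hits, misses
```

Running it on the sequence 1, 17, 1, 33 yields 0 hits and 4 misses: blocks 1, 17, and 33 all contend for entry 1 and keep evicting each other.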

Set Associative

To resolve the tension between the low efficiency of full associativity and the conflicts of direct mapping, the set-associative organization was introduced. All TLB entries are divided into sets, and each linear address block maps not to a single TLB entry but to a set of entries. During address translation, the CPU first computes which set the linear address block corresponds to, then compares the entries within that set one by one. Depending on the set size, we speak of 2-way, 4-way, or 8-way set associativity.
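Combining the two previous sketches gives a set-associative lookup: a modulo operation picks the set, and a short scan searches only that set's ways (the set count and `(tag, pfn)` layout are illustrative assumptions):

```python
# Set associative: block b maps to set (b mod number_of_sets); only the
# few ways inside that set are compared, not the whole TLB.
def sa_lookup(sets, vpn):
    s = vpn % len(sets)
    for way, (tag, pfn) in enumerate(sets[s]):
        if tag == vpn:
            return s, way, pfn   # hit within the selected set
    return None                  # miss: load from RAM into some way of set s
```

Conflicting blocks that would evict each other under direct mapping can now coexist, as long as no more blocks collide on a set than it has ways.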

Long-term engineering practice has found 8-way set associativity to be a performance demarcation point: an 8-way set-associative TLB achieves nearly the same hit rate as a fully associative one, while beyond 8 ways the cost of intra-set comparison latency outweighs the benefit of the increased hit rate.

Each of the three organizations has advantages and disadvantages; set associativity is a compromise suitable for most application environments. Of course, other cache organizations can be used in particular domains.

TLB Table Entry Update

TLB entries can be updated automatically by the TLB hardware, or actively by software:

1. After a TLB miss, the CPU fetches the page table entry from RAM and the TLB entry is updated automatically.

2. In some cases the entries in the TLB become invalid, for example on a process switch or when the kernel page table is changed. The CPU hardware cannot know which TLB entries are invalid in these scenarios; they can only be flushed by software.

At the software layer, the Linux kernel provides a rich set of TLB-flushing methods, but different architectures expose different hardware interfaces. For example, x86_32 provides only two hardware interfaces for flushing TLB entries:

1. Writing a value to the CR3 register causes the processor to automatically flush the TLB entries for non-global pages.

2. Since the Pentium Pro, the INVLPG instruction invalidates the single TLB entry for a specified linear address.

Reference URL: http://www.xuebuyuan.com/597883.html

