Per-CPU page frame cache


In the "Linux page frame-level memory management" post, we discussed how the kernel calls functions such as alloc_pages() to allocate one or more contiguous page frames. In essence, these functions use the buddy algorithm to obtain contiguous free page frames from a specified memory zone.

As we will see in the "Slab allocator" post, the kernel frequently requests and releases single page frames. To improve system performance, when a single page frame is requested or released, the kernel adds one more layer in front of the buddy algorithm: the per-CPU page frame cache.

Each memory zone defines a per-CPU page frame cache. Each per-CPU cache contains some pre-allocated page frames that are used to satisfy single-page memory requests issued by the local CPU.

Furthermore, the kernel provides two caches for each memory zone and each CPU: a hot cache, whose stored page frames are likely to still have their contents in the CPU's hardware cache, and a cold cache.

If the kernel or a user-mode process writes to a page frame immediately after it is allocated, taking the frame from the hot cache benefits system performance. Why? Each access to a memory cell of a page frame causes a line of the hardware cache to be evicted and replaced — unless, of course, the hardware cache already contains a line mapping the accessed cell of the "hot" page frame, in which case we call it a "hit".

If the page frame is going to be filled by a DMA operation, it is preferable to take it from the cold cache: in this case the CPU does not touch the frame, so no hardware cache line is modified. Taking the frame from the cold cache also preserves the reserve of hot page frames for other kinds of memory allocations.

If the definitions of hot and cold caches are still unclear, a simple way to think about them: the hot cache matters when the CPU itself will read or write the page, so its contents may already sit in that CPU's hardware cache; accessing such a page is more likely to hit, and a miss evicts a hardware cache line. The cold cache has nothing to do with the CPU's hardware cache: its pages will be touched by something other than the CPU, so hits and misses are irrelevant.

1. Data Structure

The main data structure implementing the per-CPU page frame cache is an array of per_cpu_pageset structures stored in the pageset field of the zone descriptor. The array contains one element per CPU; each element in turn consists of two per_cpu_pages descriptors, one for the hot cache and one for the cold cache.

Specific data structure:

 

The kernel uses two watermarks to monitor the size of the hot and cold caches: if the number of page frames falls to or below the low watermark, the kernel allocates batch page frames from the buddy system to replenish the cache; conversely, if the number of page frames rises to or above the high watermark, the kernel releases batch page frames from the cache back to the buddy system. The values of batch, low, and high depend on the number of page frames contained in the memory zone.

2. Allocating page frames through the per-CPU cache

The buffered_rmqueue() function allocates page frames in a specified memory zone. It uses the per-CPU page frame cache to handle single-frame requests.

Its parameters are the address of the zone descriptor, the base-2 logarithm (order) of the requested allocation size, and the allocation flags gfp_flags. If the __GFP_COLD flag is set in gfp_flags, the page frame should be taken from the cold cache; otherwise it is taken from the hot cache (this flag is meaningful only for single-frame requests). The function essentially performs the following operations:

1. If order is not equal to 0, the per-CPU page frame cache cannot be used: the function jumps to step 4.
2. Otherwise, check whether the per-CPU cache selected by the __GFP_COLD flag needs to be replenished (the count field of the per_cpu_pages descriptor is less than or equal to the low field). If so, it performs the following sub-steps:

A) Allocate batch single page frames from the buddy system by repeatedly calling the __rmqueue() function.
B) Insert the descriptors of the allocated page frames into the cache list.
C) Update count by adding the number of page frames actually allocated.

3. If count is positive, the function obtains a page frame from the cache list, decrements count, and jumps to step 5. (Note that the per-CPU cache may still be empty; this happens when __rmqueue() fails to allocate any page frames in step 2a.)
4. At this point the memory request has not yet been satisfied, either because it spans several contiguous page frames or because the selected cache is empty. The function calls __rmqueue() to allocate the requested page frames from the buddy system.
5. If the memory request has been satisfied, the function initializes the page descriptor of the (first) page frame: it clears some flags, sets the private field to 0, and sets the frame's reference counter to 1. In addition, if the __GFP_ZERO flag is set in gfp_flags, it fills the allocated memory area with zeros.
6. Return the page descriptor address of the (first) page frame. If the allocation request failed, NULL is returned.

 

3. Releasing page frames to the per-CPU cache

To release a single page frame to the per-CPU page frame cache, the kernel uses the free_hot_page() and free_cold_page() functions. Both are simple wrappers around the free_hot_cold_page() function, whose parameters are the descriptor address page of the page frame to release and a cold flag (selecting the hot or cold cache).

The free_hot_cold_page() function performs the following operations:
1. Obtain from the page->flags field the address of the zone descriptor containing the page frame.
2. Obtain the address of the per_cpu_pages descriptor of the zone's cache selected by the cold flag.
3. Check whether the cache should be drained: if count is greater than or equal to high, call the free_pages_bulk() function, passing it the zone descriptor, the number of page frames to release (the batch field), the address of the cache list, and the number 0 (denoting order-0 page frames). free_pages_bulk() repeatedly calls the __free_pages_bulk() function to release the specified number of page frames (taken from the cache list) to the zone's buddy system.
4. Add the page frame being released to the cache list and increment the count field.

It should be noted that in the current Linux 2.6 kernel, no page frame is ever released to the cold cache: the kernel always assumes the freed page frame is hot with respect to the hardware cache. Of course, this does not mean the cold cache stays empty: it is replenished by buffered_rmqueue() when its low watermark is reached.

 

 
