Linux Memory Fragmentation Prevention Technology

Last Update:2018-02-13 Source: Internet

Author: User


The Linux kernel manages physical memory with the buddy system, and physical memory fragmentation is one of the buddy system's weaknesses. To prevent and resolve fragmentation, the kernel adopts several practical techniques, which are summarized here.

1 Compacting fragments when memory is low

When pages are requested from the buddy system and no suitable page is found, the allocator takes two steps to make memory available: compact and reclaim. The former consolidates fragments to obtain larger contiguous memory; the latter reclaims memory that does not necessarily need to stay occupied, such as buffers and caches. Here we focus on compact. The whole call path is as follows:

__alloc_pages_nodemask
  __alloc_pages_slowpath
    __alloc_pages_direct_compact
      try_to_compact_pages
        compact_zone_order
          compact_zone
            isolate_migratepages
            migrate_pages
            release_freepages

Not every failed allocation triggers compaction. The request must first have an order greater than 0, and its gfp_mask must carry __GFP_FS and __GFP_IO. In addition, the free memory of the zone must satisfy a condition that the kernel expresses as the "fragmentation index". Its value lies between 0 and 1000, and by default compaction runs only when the fragmentation index is greater than 500; the default can be adjusted through the proc file extfrag_threshold. The fragmentation index is calculated by the fragmentation_index function:

/*
 * Index is between 0 and 1000
 *
 * 0 => allocation would fail due to lack of memory
 * 1000 => allocation would fail due to fragmentation
 */
return 1000 - div_u64((1000 + (div_u64(info->free_pages * 1000ULL, requested))), info->free_blocks_total);
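To make the formula concrete, here is a minimal user-space sketch of the same arithmetic. The struct `frag_info` and the function `frag_index` are hypothetical names chosen for illustration; the two early returns are assumptions about how the kernel handles the cases the excerpt above does not show (no free blocks, or a block that already satisfies the request).

```c
#include <stdint.h>

/* Hypothetical user-space mirror of the per-zone counters used by the formula. */
struct frag_info {
    uint64_t free_pages;           /* total free pages in the zone */
    uint64_t free_blocks_total;    /* number of free blocks of any order */
    uint64_t free_blocks_suitable; /* free blocks big enough for the request */
};

/* Sketch of the fragmentation index for a request of 2^order pages. */
int frag_index(unsigned int order, const struct frag_info *info)
{
    uint64_t requested = 1ULL << order;
    int64_t scaled;

    /* Assumption: with no free blocks at all, report "lack of memory". */
    if (info->free_blocks_total == 0)
        return 0;

    /* Assumption: the index is only meaningful when the request would fail. */
    if (info->free_blocks_suitable != 0)
        return -1000;

    /* Same integer arithmetic as the kernel expression quoted above. */
    scaled = (int64_t)(info->free_pages * 1000ULL / requested);
    return (int)(1000 - (1000 + scaled) / (int64_t)info->free_blocks_total);
}
```

For example, an order-4 request against a zone holding 1024 free pages scattered across 1024 single-page blocks gives an index of 937, well above the default threshold of 500, so the failure is attributed to fragmentation rather than to a lack of memory.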

During compaction, pages only move within their own zone: pages at the zone's low addresses are moved toward the end of the zone as far as possible. The destination page for each move is obtained through the compaction_alloc function.
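The movement pattern can be pictured with a toy model: one scanner walks up from the low end looking for movable pages, another walks down from the high end looking for free slots, and they stop when they meet. The sketch below is only an illustration under that assumption; `enum slot` and `compact_zone_sketch` are hypothetical names, and a real zone tracks pageblocks and free lists rather than a flat array.

```c
#include <stddef.h>

/* Hypothetical state of one page slot in a toy zone. */
enum slot { SLOT_FREE, SLOT_MOVABLE, SLOT_PINNED };

/*
 * Move movable pages from the low end of the zone into free slots at the
 * high end. Returns the number of pages moved.
 */
size_t compact_zone_sketch(enum slot *zone, size_t n)
{
    size_t lo = 0;    /* migrate scanner: next candidate index from the bottom */
    size_t hi = n;    /* free scanner: one past the next candidate from the top */
    size_t moved = 0;

    for (;;) {
        /* Advance the migrate scanner to the next movable page. */
        while (lo < hi && zone[lo] != SLOT_MOVABLE)
            lo++;
        /* Pull the free scanner down to the next free slot. */
        while (hi > lo && zone[hi - 1] != SLOT_FREE)
            hi--;
        /* The scanners met: nothing more can be moved. */
        if (lo >= hi)
            break;

        zone[hi - 1] = SLOT_MOVABLE; /* page lands near the end of the zone */
        zone[lo] = SLOT_FREE;        /* low address becomes free */
        moved++;
        lo++;
        hi--;
    }
    return moved;
}
```

Pinned slots are never touched, which is why unmovable pages scattered through a zone limit how large a contiguous free run compaction can produce.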

The move itself can be synchronous or asynchronous. The first compact after a failed allocation is asynchronous, and the compact that follows a subsequent reclaim is synchronous. The asynchronous pass never blocks and moves only pages that are not in use at that moment; the synchronous pass traverses all movable pages and waits for pages that are in use to become available before moving them.

2 Organizing pages by mobility

The memory pages are divided into the following three types according to mobility:

Unmovable: has a fixed position in memory and cannot be moved. Memory allocated by the kernel itself mostly belongs to this type;

Reclaimable: cannot be moved, but can be discarded and reclaimed. For example, file-mapped memory;

Movable: can be moved freely. User-space memory mostly belongs to this type.

When memory is requested, the allocator first looks in the free pages of the requested mobility type. The free memory in each zone is organized as follows:

struct zone {
	......
	struct free_area free_area[MAX_ORDER];
	......
};
struct free_area {
	struct list_head free_list[MIGRATE_TYPES];
	unsigned long nr_free;
};

When the free_area of the requested type cannot satisfy a request, memory can be taken from a fallback type, and the memory taken is later released into the free list of the new type. The kernel calls this process "stealing".

The priority list of fallback types is defined as follows:

static int fallbacks[MIGRATE_TYPES][4] = {
	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
#ifdef CONFIG_CMA
	[MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
#else
	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
#endif
	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
#ifdef CONFIG_MEMORY_ISOLATION
	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
#endif
};
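As a rough illustration of how such a table is consulted, the sketch below walks a simplified copy of it (the non-CMA variant, without the isolate type) and returns the first fallback type that still has free pages. The `MT_*` names, `fallback_order` and `pick_fallback` are hypothetical; stopping at the reserve entry is an assumption about where the walk ends.

```c
/* Hypothetical, simplified set of mobility types. */
enum mt { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE, MT_RESERVE, MT_TYPES };

/* Simplified copy of the fallback priorities shown above. */
static const int fallback_order[MT_TYPES][3] = {
    [MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE,   MT_RESERVE },
    [MT_RECLAIMABLE] = { MT_UNMOVABLE,   MT_MOVABLE,   MT_RESERVE },
    [MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE, MT_RESERVE },
    [MT_RESERVE]     = { MT_RESERVE,     MT_RESERVE,   MT_RESERVE },
};

/*
 * Return the first fallback type for `start` that has free pages,
 * or -1 when every candidate is empty. nr_free[t] is the number of
 * free pages currently held by type t.
 */
int pick_fallback(int start, const unsigned long nr_free[MT_TYPES])
{
    for (int i = 0; i < 3; i++) {
        int candidate = fallback_order[start][i];

        /* Assumption: the reserve entry terminates the walk. */
        if (candidate == MT_RESERVE)
            break;
        if (nr_free[candidate] > 0)
            return candidate;
    }
    return -1;
}
```

With this ordering, an unmovable request prefers to steal from reclaimable memory and only falls back to movable memory when the reclaimable lists are empty.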

Note that organizing pages by mobility does not suit every scenario: when there is too little memory to give each type its own share, grouping by mobility should not be enabled. A global variable indicates whether it is enabled, and it is set when memory is initialized:

void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
{
	......
	if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))
		page_group_by_mobility_disabled = 1;
	else
		page_group_by_mobility_disabled = 0;
	......
}

If page_group_by_mobility_disabled is set, all memory is treated as unmovable.

One parameter determines the minimum number of pages that each memory region must contain: pageblock_nr_pages, which is defined as follows:

#ifdef CONFIG_HUGETLB_PAGE
#define pageblock_order HUGETLB_PAGE_ORDER

#else /* CONFIG_HUGETLB_PAGE */
/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
#define pageblock_order (MAX_ORDER-1)
#endif /* CONFIG_HUGETLB_PAGE */
#define pageblock_nr_pages (1UL << pageblock_order)
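A small worked example may help show where the threshold in build_all_zonelists falls. The sketch below reproduces the comparison in user space; `pageblock_pages` and `grouping_disabled` are hypothetical helper names, and the sample numbers in the note that follows (a pageblock order of 9, five migrate types, 4 KiB pages) are assumptions for illustration, not values taken from any particular kernel build.

```c
/* Pages per pageblock for a given pageblock order (1 << order). */
unsigned long pageblock_pages(unsigned int pageblock_order)
{
    return 1UL << pageblock_order;
}

/*
 * Mirror of the comparison in build_all_zonelists: grouping by mobility
 * is disabled when total memory cannot supply one pageblock per type.
 * Returns 1 when grouping would be disabled, 0 otherwise.
 */
int grouping_disabled(unsigned long total_pages,
                      unsigned int pageblock_order,
                      unsigned int nr_migrate_types)
{
    return total_pages < pageblock_pages(pageblock_order) * nr_migrate_types;
}
```

Under the assumed values, one pageblock holds 512 pages, so the threshold is 512 * 5 = 2560 pages, or 10 MiB with 4 KiB pages. A system with less memory than that would run with grouping disabled.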

During system initialization, all pages are marked as movable:

void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
		unsigned long start_pfn, enum memmap_context context)
{
	......
	if ((z->zone_start_pfn <= pfn)
	    && (pfn < zone_end_pfn(z))
	    && !(pfn & (pageblock_nr_pages - 1)))
		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
	......
}

Pages of the other mobility types are produced later, through the "stealing" mentioned earlier. When this happens, the kernel usually steals the largest contiguous block it can find, preferring the higher-priority types in the fallback list, which avoids creating small fragments.

/* Remove an element from the buddy allocator from the fallback list */
static inline struct page *
__rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
{
	......
	/* Find the largest possible block of pages in the other list */
	for (current_order = MAX_ORDER-1; current_order >= order;
						--current_order) {
		for (i = 0;; i++) {
			migratetype = fallbacks[start_migratetype][i];
			......
		}
	}
	......
}
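The loop nesting is what produces the "largest block first" behaviour: the outer loop descends through the orders, and the inner loop tries the fallback types in priority order at each size. The following sketch shows that search on a plain counter table. `SK_MAX_ORDER`, `SK_NR_TYPES`, `struct sk_choice` and `sk_find_steal` are hypothetical names, and the caller supplies the candidate types, so no claim is made here about the exact contents of the kernel's list.

```c
#include <stddef.h>

#define SK_MAX_ORDER 11 /* assumed number of orders, for illustration */
#define SK_NR_TYPES  3  /* assumed number of mobility types */

/* Result of the search: which order and which type to steal from. */
struct sk_choice {
    int order;
    int type;
};

/*
 * nr_free[o][t] is the number of free blocks of order o held by type t
 * (only read here). Search from the largest order down to `order`, trying
 * the candidate types in priority order at each size.
 * Returns 1 and fills *out on success, 0 if nothing can be stolen.
 */
int sk_find_steal(unsigned long nr_free[SK_MAX_ORDER][SK_NR_TYPES],
                  int order, const int *candidates, size_t n_candidates,
                  struct sk_choice *out)
{
    for (int cur = SK_MAX_ORDER - 1; cur >= order; --cur) {
        for (size_t i = 0; i < n_candidates; i++) {
            int type = candidates[i];

            if (nr_free[cur][type] == 0)
                continue;
            out->order = cur;
            out->type = type;
            return 1;
        }
    }
    return 0;
}
```

Because the order loop is the outer one, a large block in a lower-priority type is chosen ahead of a small block in a higher-priority type.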

You can view how pages of each type are distributed in the current system through /proc/pagetypeinfo.

3 The virtual movable memory zone

Before the technique of organizing pages by mobility was introduced, another approach had already been merged into the kernel: the virtual memory zone ZONE_MOVABLE. The basic idea is simple: divide memory into two parts, movable and unmovable.

enum zone_type {
#ifdef CONFIG_ZONE_DMA
	ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
	ZONE_DMA32,
#endif
	ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
	ZONE_HIGHMEM,
#endif
	ZONE_MOVABLE,
	__MAX_NR_ZONES
};

Enabling ZONE_MOVABLE requires the kernel parameter kernelcore or movablecore. kernelcore specifies the amount of memory that cannot be moved, and movablecore specifies the amount of movable memory. If both are specified, the setting that yields the larger amount of unmovable memory is used. If neither is specified, ZONE_MOVABLE is not enabled.

Unlike other memory zones, ZONE_MOVABLE is not tied to any fixed physical memory range; its memory is taken from either the high memory zone or the normal zone.

find_zone_movable_pfns_for_nodes calculates the amount of ZONE_MOVABLE memory in each node. The memory used usually comes from the highest zone of each node, as reflected in the function find_usable_zone_for_movable.

When ZONE_MOVABLE memory is assigned to each node, kernelcore is spread evenly across the nodes:

kernelcore_node = required_kernelcore / usable_nodes;
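The interaction between the two parameters and the per-node split can be sketched in a few lines. This is a simplified model, assuming all quantities are expressed in pages and ignoring any rounding or per-node limits the kernel may apply; `required_kernelcore_pages` and `kernelcore_per_node` are hypothetical names.

```c
/*
 * Combine the two boot parameters into one "unmovable" requirement.
 * A movablecore request is converted into the kernelcore it implies
 * (total - movablecore), and the larger unmovable amount wins.
 * A result of 0 means ZONE_MOVABLE is not enabled.
 */
unsigned long required_kernelcore_pages(unsigned long total_pages,
                                        unsigned long kernelcore,
                                        unsigned long movablecore)
{
    if (movablecore != 0) {
        unsigned long implied =
            total_pages > movablecore ? total_pages - movablecore : 0;

        if (implied > kernelcore)
            kernelcore = implied;
    }
    return kernelcore;
}

/* Even split of the unmovable requirement across nodes that have memory. */
unsigned long kernelcore_per_node(unsigned long required_kernelcore,
                                  unsigned long usable_nodes)
{
    return required_kernelcore / usable_nodes;
}
```

For instance, with 1000 pages in total, kernelcore set to 300 and movablecore set to 400, the movablecore request implies 600 unmovable pages, which is larger than 300, so 600 is used; on two nodes that is 300 pages of kernelcore per node.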

When the kernel allocates a page, if the GFP flags specify both __GFP_HIGHMEM and __GFP_MOVABLE, the memory is requested from ZONE_MOVABLE.
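The rule can be written as a small decision function. The sketch below is a deliberately reduced model: the flag values and the `SK_*` names are made up for illustration and do not match the kernel's definitions, the DMA zones are left out, and the kernel performs this mapping differently in practice.

```c
/* Hypothetical flag bits, for illustration only. */
#define SK_GFP_HIGHMEM 0x02u
#define SK_GFP_MOVABLE 0x08u

/* Reduced set of zones for the sketch. */
enum sk_zone { SK_ZONE_NORMAL, SK_ZONE_HIGHMEM, SK_ZONE_MOVABLE };

/* Pick the preferred zone for an allocation from its flags. */
enum sk_zone sk_gfp_zone(unsigned int flags)
{
    /* Both flags together are required to target the movable zone. */
    if ((flags & SK_GFP_HIGHMEM) && (flags & SK_GFP_MOVABLE))
        return SK_ZONE_MOVABLE;
    if (flags & SK_GFP_HIGHMEM)
        return SK_ZONE_HIGHMEM;
    /* The movable flag alone does not select the movable zone. */
    return SK_ZONE_NORMAL;
}
```

Requiring both flags keeps kernel allocations, which are generally unmovable, out of the zone that is meant to stay free of pinned pages.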
