The Linux kernel manages physical memory with the buddy system. Physical memory fragmentation is one of the buddy system's weaknesses, and the kernel uses several practical techniques to prevent and resolve it. They are summarized here.
1 Compacting fragments when memory is low
When a page request to the buddy allocator cannot find a suitable block, the kernel takes two steps to adjust memory: compact and reclaim. The former consolidates fragments to obtain larger contiguous memory; the latter reclaims memory, such as caches, that does not necessarily have to stay in memory. Here we focus on compact. The call path is as follows:
__alloc_pages_nodemask
__alloc_pages_slowpath
__alloc_pages_direct_compact
try_to_compact_pages
compact_zone_order
compact_zone
isolate_migratepages
migrate_pages
release_freepages
Not every failed allocation triggers compaction. First, the order must be greater than 0 and gfp_mask must carry __GFP_FS and __GFP_IO. In addition, the free memory remaining in the zone must meet certain conditions, which the kernel expresses as a "fragmentation index". The value lies between 0 and 1000, and by default compaction is performed only when the fragmentation index is greater than 500. The threshold can be adjusted through the proc file /proc/sys/vm/extfrag_threshold. The index is computed by the fragmentation_index function:
- /*
-  * Index is between 0 and 1000
-  *
-  * 0 => allocation would fail due to lack of memory
-  * 1000 => allocation would fail due to fragmentation
-  */
- return 1000 - div_u64((1000 + div_u64(info->free_pages * 1000ULL, requested)), info->free_blocks_total);
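To get a feel for the numbers, the formula above can be reproduced in userspace. The following is only a sketch: `struct contig_info` and `frag_index` are hypothetical names, and the two early returns (empty zone, request that would already succeed) are modelled on the kernel source of that era and should be checked against your own tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's per-zone summary. */
struct contig_info {
    uint64_t free_pages;           /* free pages in the zone */
    uint64_t free_blocks_total;    /* free blocks of any order */
    uint64_t free_blocks_suitable; /* free blocks large enough for the request */
};

/* Fragmentation index for a request of 2^order pages (sketch only). */
int frag_index(unsigned int order, const struct contig_info *info)
{
    assert(info != NULL && order < 32);
    uint64_t requested = 1ULL << order;

    if (info->free_blocks_total == 0)
        return 0;      /* nothing is free at all */
    if (info->free_blocks_suitable > 0)
        return -1000;  /* the request would succeed, so the index is meaningless */

    uint64_t scaled = (1000 + info->free_pages * 1000ULL / requested)
                      / info->free_blocks_total;
    return (int)(1000 - (int64_t)scaled);
}
```

For an order-4 request, 1024 free pages held as 1024 single-page blocks give an index of 937, well above the default threshold of 500, while 64 free pages held as eight 8-page blocks give 375.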
During compaction, pages are moved only within their zone: pages at the low-address end of the zone are moved toward the end of the zone as far as possible. New page locations are obtained through the compaction_alloc function.
Migration is either synchronous or asynchronous. The first compact after a failed allocation runs asynchronously, and the compact that follows the subsequent reclaim runs synchronously. The asynchronous pass moves only pages that are not currently in use and skips anything that would block; the synchronous pass walks the movable pages and waits for busy pages to become free before moving them.
2 Organizing pages by mobility
Memory pages are divided into the following three types according to mobility:
Unmovable: has a fixed position in memory and cannot be moved freely. Memory allocated by the kernel basically belongs to this type;
Reclaimable: cannot be moved, but can be discarded and reclaimed. For example, file-mapped memory;
Movable: can be moved freely. User-space memory basically belongs to this type.
When memory is requested, the allocator first looks in the free pages of the type that matches the request's mobility. The free memory in each zone is organized as follows:
- struct zone {
-     ......
-     struct free_area free_area[MAX_ORDER];
-     ......
- };
- struct free_area {
-     struct list_head free_list[MIGRATE_TYPES];
-     unsigned long nr_free;
- };
When the free_area list of the requested type has no memory available, memory can be borrowed from a fallback type. The borrowed memory is later released to the list of the new type. The kernel calls this process "stealing".
The fallback priority list is defined as follows:
- static int fallbacks[MIGRATE_TYPES][4] = {
-     [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
-     [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
- #ifdef CONFIG_CMA
-     [MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-     [MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
- #else
-     [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
- #endif
-     [MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
- #ifdef CONFIG_MEMORY_ISOLATION
-     [MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
- #endif
- };
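How such a table is consulted can be sketched in a few lines of plain C. This is an illustration only: the `MT_*` names, `mt_fallbacks` and `pick_type` are hypothetical, the table is the non-CMA variant, and free memory is reduced to one counter per type, which is far simpler than the kernel's per-order free lists.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names; the kernel uses MIGRATE_* constants. */
enum mt { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE, MT_RESERVE, MT_TYPES };

#define NR_FALLBACKS 3

/* Fallback order per requested type, highest priority first. */
static const enum mt mt_fallbacks[MT_TYPES][NR_FALLBACKS] = {
    [MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE,   MT_RESERVE },
    [MT_RECLAIMABLE] = { MT_UNMOVABLE,   MT_MOVABLE,   MT_RESERVE },
    [MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE, MT_RESERVE },
    [MT_RESERVE]     = { MT_RESERVE,     MT_RESERVE,   MT_RESERVE },
};

/* Returns the type that can serve the request, or -1 if nothing is free. */
int pick_type(enum mt wanted, const unsigned long nr_free[MT_TYPES])
{
    assert((int)wanted >= 0 && (int)wanted < MT_TYPES);

    if (nr_free[wanted] > 0)
        return (int)wanted;            /* preferred type has free pages */

    for (size_t i = 0; i < NR_FALLBACKS; i++) {
        enum mt candidate = mt_fallbacks[wanted][i];
        if (nr_free[candidate] > 0)
            return (int)candidate;     /* first fallback with free pages wins */
    }
    return -1;
}
```

With no free unmovable or reclaimable pages, an unmovable request in this sketch is served from the movable list; once reclaimable pages are available, they are preferred because they come first in the fallback row.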
Note that organizing pages by mobility is not suitable for every scenario: when there is too little memory to give each type its own share, mobility grouping should not be enabled. A global variable records whether it is enabled, and it is set when memory is initialized:
- void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
- {
-     ......
-     if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))
-         page_group_by_mobility_disabled = 1;
-     else
-         page_group_by_mobility_disabled = 0;
-     ......
- }
If page_group_by_mobility_disabled is set, all memory is treated as unmovable.
One parameter, pageblock_nr_pages, determines the minimum number of pages that each memory region owns. It is defined as follows:
- #ifdef CONFIG_HUGETLB_PAGE
- #define pageblock_order HUGETLB_PAGE_ORDER
- #else /* CONFIG_HUGETLB_PAGE */
- /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
- #define pageblock_order (MAX_ORDER-1)
- #endif /* CONFIG_HUGETLB_PAGE */
- #define pageblock_nr_pages (1UL << pageblock_order)
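The interaction between pageblock_nr_pages and the check in build_all_zonelists can be tried out with a small sketch. The helper names `pageblock_pages` and `grouping_disabled` are hypothetical, and the orders and type count used in the note below are example values, not values read from any particular kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Pages per pageblock for a given pageblock order. */
unsigned long pageblock_pages(unsigned int pageblock_order)
{
    assert(pageblock_order < 8 * sizeof(unsigned long));
    return 1UL << pageblock_order;
}

/*
 * Mirrors the comparison in build_all_zonelists: grouping is switched off
 * when there is less than one pageblock per migrate type.
 */
bool grouping_disabled(unsigned long total_pages,
                       unsigned int pageblock_order,
                       unsigned int nr_migrate_types)
{
    return total_pages < pageblock_pages(pageblock_order) * nr_migrate_types;
}
```

Assuming a pageblock order of 9 and five migrate types, the cutoff is 2560 pages, which is 10 MiB with 4 KiB pages. Only very small systems fall below it.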
During system initialization, all pages are marked as movable:
- void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
-         unsigned long start_pfn, enum memmap_context context)
- {
-     ......
-     if ((z->zone_start_pfn <= pfn)
-         && (pfn < zone_end_pfn(z))
-         && !(pfn & (pageblock_nr_pages - 1)))
-         set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-     ......
- }
Pages of the other mobility types are produced later, through the "stealing" described earlier. When this happens, the allocator usually "steals" the largest contiguous block it can find, trying the fallback types in priority order, which avoids creating small fragments.
- /* Remove an element from the buddy allocator from the fallback list */
- static inline struct page *
- __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
- {
-     ......
-     /* Find the largest possible block of pages in the other list */
-     for (current_order = MAX_ORDER-1; current_order >= order;
-             --current_order) {
-         for (i = 0;; i++) {
-             migratetype = fallbacks[start_migratetype][i];
-             ......
-         }
-     }
-     ......
- }
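The effect of the loop order (block size first, fallback priority second) is easier to see in a self-contained sketch. Everything here is hypothetical scaffolding: `struct free_counts`, `struct steal`, `find_fallback`, and the constants `NR_ORDERS` and `NR_TYPES` are invented for illustration, and free lists are reduced to simple counters.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS 11  /* assumed number of orders, for illustration */
#define NR_TYPES  3   /* 0 = unmovable, 1 = reclaimable, 2 = movable */

/* Number of free blocks per order and per type. */
struct free_counts {
    unsigned long nr_free[NR_ORDERS][NR_TYPES];
};

/* Result of the search: which order and type to steal from, or -1/-1. */
struct steal {
    int order;
    int type;
};

struct steal find_fallback(const struct free_counts *fc, int order,
                           const int *fallback, size_t nr_fallback)
{
    assert(fc != NULL && order >= 0 && order < NR_ORDERS);

    /* Largest blocks first, so that big blocks are split instead of small ones. */
    for (int cur = NR_ORDERS - 1; cur >= order; cur--) {
        /* Within one order, respect the fallback priority. */
        for (size_t i = 0; i < nr_fallback; i++) {
            int type = fallback[i];
            assert(type >= 0 && type < NR_TYPES);
            if (fc->nr_free[cur][type] > 0)
                return (struct steal){ cur, type };
        }
    }
    return (struct steal){ -1, -1 };
}
```

In this sketch an order-8 movable block is chosen over an order-3 reclaimable block even though reclaimable comes first in the fallback list, because the outer loop walks the orders from large to small.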
You can view the distribution of each page type on the current system through /proc/pagetypeinfo.
3 The virtual movable memory zone
Before the technique of organizing pages by mobility, another method had already been merged into the kernel: the virtual memory zone ZONE_MOVABLE. The basic idea is simple: divide memory into two parts, movable and unmovable.
- enum zone_type {
- #ifdef CONFIG_ZONE_DMA
-     ZONE_DMA,
- #endif
- #ifdef CONFIG_ZONE_DMA32
-     ZONE_DMA32,
- #endif
-     ZONE_NORMAL,
- #ifdef CONFIG_HIGHMEM
-     ZONE_HIGHMEM,
- #endif
-     ZONE_MOVABLE,
-     __MAX_NR_ZONES
- };
Enabling ZONE_MOVABLE requires the kernel parameter kernelcore or movablecore. kernelcore specifies the amount of memory that cannot be moved, and movablecore specifies the amount of movable memory. If both are specified, the kernel uses whichever yields the larger amount of unmovable memory. If neither is specified, ZONE_MOVABLE is not enabled.
Unlike other memory zones, ZONE_MOVABLE is not associated with a physical memory range of its own; its memory is taken from either the highmem zone or the normal zone.
find_zone_movable_pfns_for_nodes calculates the amount of ZONE_MOVABLE memory in each node. The memory used usually comes from the highest memory zone of each node, as reflected in the function find_usable_zone_for_movable.
When ZONE_MOVABLE memory is allocated to the nodes, kernelcore is distributed evenly across them:
kernelcore_node = required_kernelcore / usable_nodes;
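The arithmetic behind these two parameters can be sketched as follows. The function names `required_kernelcore` and `kernelcore_per_node` are hypothetical, all sizes are in pages, and the sketch leaves out the rounding and the per-node corrections that the kernel applies.

```c
#include <assert.h>

/*
 * Combine kernelcore and movablecore into one amount of unmovable memory.
 * If movablecore is given, it implies total - movablecore pages of
 * unmovable memory; the larger of the two requests wins.
 */
unsigned long required_kernelcore(unsigned long total_pages,
                                  unsigned long kernelcore,
                                  unsigned long movablecore)
{
    if (movablecore > 0) {
        /* Clamp so that the subtraction cannot underflow. */
        unsigned long movable = movablecore < total_pages ? movablecore
                                                          : total_pages;
        unsigned long corepages = total_pages - movable;
        if (corepages > kernelcore)
            kernelcore = corepages;
    }
    return kernelcore;  /* 0 means ZONE_MOVABLE is not set up */
}

/* Even split of the unmovable memory across the usable nodes. */
unsigned long kernelcore_per_node(unsigned long required,
                                  unsigned int usable_nodes)
{
    assert(usable_nodes > 0);
    return required / usable_nodes;
}
```

For example, with 1,000,000 pages in total, kernelcore set to 200,000 pages and movablecore set to 300,000 pages, the sketch ends up with 700,000 unmovable pages, or 350,000 per node on a two-node system.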
When the kernel allocates pages, if the gfp flags specify both __GFP_HIGHMEM and __GFP_MOVABLE, the memory is requested from the ZONE_MOVABLE zone.
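The zone selection rule can be written down as a small decision function. This is a simplified sketch, not the kernel's table-driven gfp_zone: the `SK_*` names and bit values are made up for illustration, and combinations of more than one of the DMA, DMA32 and HIGHMEM bits are simply rejected.

```c
#include <assert.h>

/* Illustrative bit values; the real __GFP_* definitions live in the kernel headers. */
#define SK_GFP_DMA     0x01u
#define SK_GFP_HIGHMEM 0x02u
#define SK_GFP_DMA32   0x04u
#define SK_GFP_MOVABLE 0x08u

enum sk_zone {
    SK_ZONE_DMA,
    SK_ZONE_DMA32,
    SK_ZONE_NORMAL,
    SK_ZONE_HIGHMEM,
    SK_ZONE_MOVABLE
};

/* Pick the highest zone an allocation with these flags may use. */
enum sk_zone sk_gfp_zone(unsigned int flags)
{
    int zone_bits = ((flags & SK_GFP_DMA) != 0)
                  + ((flags & SK_GFP_DMA32) != 0)
                  + ((flags & SK_GFP_HIGHMEM) != 0);
    assert(zone_bits <= 1);  /* at most one zone modifier in this sketch */

    if (flags & SK_GFP_DMA)
        return SK_ZONE_DMA;
    if (flags & SK_GFP_DMA32)
        return SK_ZONE_DMA32;
    if ((flags & SK_GFP_HIGHMEM) && (flags & SK_GFP_MOVABLE))
        return SK_ZONE_MOVABLE;  /* both flags are needed for the movable zone */
    if (flags & SK_GFP_HIGHMEM)
        return SK_ZONE_HIGHMEM;
    return SK_ZONE_NORMAL;       /* the movable flag alone stays in the normal zone */
}
```

Note that the movable flag on its own does not select the movable zone in this sketch; it only does so together with the highmem flag, which matches the rule stated above.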
Linux Memory Fragmentation Prevention technology