Linux kernel source scenario analysis: allocation of user pages in memory management

Source: Internet
Author: User

First, several important data structures are introduced.

1. page

typedef struct page {
    struct list_head list;
    struct address_space *mapping;
    unsigned long index;
    struct page *next_hash;
    atomic_t count;
    unsigned long flags;    /* atomic flags, some possibly updated asynchronously */
    struct list_head lru;
    unsigned long age;
    wait_queue_head_t wait;
    struct page **pprev_hash;
    struct buffer_head * buffers;
    void *virtual;          /* non-NULL if kmapped */
    struct zone_struct *zone;
} mem_map_t;

Here, virtual holds the kernel virtual address of the page's memory; as the comment notes, it is non-NULL only while the page is kmapped.


2. zone_struct

typedef struct zone_struct {
    /*
     * Commonly accessed fields:
     */
    spinlock_t          lock;
    unsigned long       offset;
    unsigned long       free_pages;
    unsigned long       inactive_clean_pages;
    unsigned long       inactive_dirty_pages;
    unsigned long       pages_min, pages_low, pages_high;

    /*
     * Free areas of different sizes
     */
    struct list_head    inactive_clean_list;
    free_area_t         free_area[MAX_ORDER];

    /*
     * Rarely used fields:
     */
    char                *name;
    unsigned long       size;

    /*
     * Discontig memory support fields.
     */
    struct pglist_data  *zone_pgdat;
    unsigned long       zone_start_paddr;
    unsigned long       zone_start_mapnr;
    struct page         *zone_mem_map;
} zone_t;

#define ZONE_DMA        0
#define ZONE_NORMAL     1
#define ZONE_HIGHMEM    2
#define MAX_NR_ZONES    3


3. free_area_t

typedef struct free_area_struct {
    struct list_head    free_list;
    unsigned int        *map;
} free_area_t;


Every physical page in the system has a page structure (also named mem_map_t). At initialization, the system builds an array of page structures, mem_map, sized according to the amount of physical memory. This array serves as a "warehouse" of physical pages: each page structure in mem_map represents one physical page, and its index in the array is the physical page number.
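The link between a page structure and its physical page number is plain pointer arithmetic over the array. The following is a minimal user-space sketch, not kernel code: struct page is cut down to a single field, and demo_mem_map, page_to_pfn_demo and pfn_to_page_demo are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct page (illustration only). */
struct page {
    unsigned long flags;
};

#define DEMO_NR_PAGES 16

/* Hypothetical stand-in for mem_map: one entry per physical page. */
static struct page demo_mem_map[DEMO_NR_PAGES];

/* The index of a page structure in the array is the physical page number. */
static unsigned long page_to_pfn_demo(const struct page *page)
{
    ptrdiff_t pfn = page - demo_mem_map;

    assert(pfn >= 0 && pfn < DEMO_NR_PAGES);
    return (unsigned long)pfn;
}

/* The reverse direction: look up the page structure for a page number. */
static struct page *pfn_to_page_demo(unsigned long pfn)
{
    assert(pfn < DEMO_NR_PAGES);
    return &demo_mem_map[pfn];
}
```

The rmqueue function shown later relies on the same idea when it computes (page - mem_map) - zone->offset to get a page's index within its zone.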

The physical pages in the warehouse are divided into management zones: ZONE_DMA and ZONE_NORMAL (the definitions above also provide ZONE_HIGHMEM). Each zone has its own zone_struct data structure, which contains a group of free-area (free_area_t) queues. Why a group of queues rather than a single queue? Because allocations often need a block of physically contiguous pages, so free blocks are managed separately by size. The zone structure therefore has one queue for single pages (contiguous length 1), one for blocks of 2 contiguous pages, and further queues for blocks of 4, 8, 16, ... pages. The constant MAX_ORDER is defined as 10, so free_area[] holds ten queues for orders 0 through 9, and the largest contiguous block is 2^9 = 512 pages, or 2 MB with 4 KB pages.
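The relationship between a queue's order and its block size can be worked through in a few lines. This is a sketch under stated assumptions: DEMO_MAX_ORDER mirrors the value 10 quoted in the text, a 4 KB page size is assumed, and block_pages and block_bytes are hypothetical helpers. Since alloc_pages rejects any order greater than or equal to MAX_ORDER, the valid orders are 0 through MAX_ORDER - 1.

```c
#include <assert.h>

#define DEMO_MAX_ORDER 10        /* value quoted in the text */
#define DEMO_PAGE_SIZE 4096UL    /* assumption: 4 KB pages */

/* Number of contiguous pages in a block held in free_area[order]. */
static unsigned long block_pages(unsigned int order)
{
    assert(order < DEMO_MAX_ORDER);
    return 1UL << order;
}

/* Size of such a block in bytes. */
static unsigned long block_bytes(unsigned int order)
{
    return block_pages(order) * DEMO_PAGE_SIZE;
}
```

With these assumptions, the largest valid order, 9, gives a block of 512 pages, or 2 MB.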

The offset field in the zone structure gives the starting page number of the zone within mem_map. Once the zones are set up, each physical page belongs permanently to one zone, determined by the page's physical address, much as a building falls under a particular police station according to its street address.

The free_list member of free_area_struct is a list_head, a general-purpose structure used to build doubly linked queues. Recall that the first member of the page structure above is also a list_head: it is through this member that the page structure of a free physical page is linked into the doubly linked queue of a free_area_struct.
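The way an embedded list_head links page structures into a queue can be illustrated with a small user-space sketch. It is not the kernel's list implementation: the helpers are simplified, struct page is cut down to two fields, and pop_first_page is a hypothetical function added for the example.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Cut-down page: the list node is the first member, as in the text. */
struct page {
    struct list_head list;
    unsigned long index;
};

/* Recover the enclosing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty queue is a head that points at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node right after head. */
static void list_add(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Unlink node from whatever queue it is on. */
static void list_del(struct list_head *node)
{
    assert(node->next != NULL && node->prev != NULL);
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Take the first page off a free list, or return NULL if it is empty. */
static struct page *pop_first_page(struct list_head *free_list)
{
    struct list_head *curr = free_list->next;

    if (curr == free_list)
        return NULL;
    list_del(curr);
    return list_entry(curr, struct page, list);
}
```

Because the node lives inside the page structure, putting a page on a queue needs no extra allocation, and the same list code works for any structure that embeds a list_head.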


The alloc_pages function allocates memory, as follows:

static inline struct page * alloc_pages(int gfp_mask, unsigned long order)
{
    /*
     * Gets optimized away by the compiler.
     */
    if (order >= MAX_ORDER)
        return NULL;
    return __alloc_pages(contig_page_data.node_zonelists + (gfp_mask), order);
}

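Here, contig_page_data.node_zonelists is an array of zonelist_t structures indexed by gfp_mask, so each allocation policy selects its own list of zones. zonelist_t is defined as follows:

typedef struct zonelist_struct {
    zone_t * zones[MAX_NR_ZONES + 1];   // NULL delimited
    int gfp_mask;
} zonelist_t;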
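The NULL-delimited zones array is what lets the allocator try zones in order of preference. The walk can be sketched in user space as below. The zone structure is cut down to the fields the sketch reads, and first_zone_above_low is a hypothetical helper, not a kernel function; it mirrors only the free_pages >= pages_low test from the first loop of __alloc_pages.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NR_ZONES 3

/* Cut-down zone: only the fields this sketch reads. */
typedef struct zone_struct {
    const char *name;
    unsigned long free_pages;
    unsigned long pages_low;
} zone_t;

typedef struct zonelist_struct {
    zone_t *zones[MAX_NR_ZONES + 1];   /* NULL delimited */
    int gfp_mask;
} zonelist_t;

/*
 * Hypothetical helper: return the first zone in the list whose free
 * page count is at or above its pages_low mark, or NULL if none is.
 */
static zone_t *first_zone_above_low(zonelist_t *zonelist)
{
    zone_t **zone;

    assert(zonelist != NULL);
    zone = zonelist->zones;
    for (;;) {
        zone_t *z = *(zone++);

        if (!z)
            break;          /* reached the NULL delimiter */
        if (z->free_pages >= z->pages_low)
            return z;
    }
    return NULL;
}
```

The order of entries in zones encodes the policy: a list that names one zone first and another second falls back to the second only when the first is short of pages.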

The __alloc_pages function is as follows:

/*
 * This is the 'heart' of the zoned buddy allocator:
 */
struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order)
{
    zone_t **zone;
    int direct_reclaim = 0;
    unsigned int gfp_mask = zonelist->gfp_mask;
    struct page * page;
    ...
    memory_pressure++;
    ...
    if (order == 0 && (gfp_mask & __GFP_WAIT) &&
            !(current->flags & PF_MEMALLOC))
        direct_reclaim = 1;
    // If only a single page is requested, the caller is willing to wait,
    // and the allocation is not for memory-management purposes, the local
    // variable direct_reclaim is set to 1. This means pages may be reclaimed
    // directly from the inactive clean queue of the zone.
    ...
    if (inactive_shortage() > inactive_target / 2 && free_shortage())
        wakeup_kswapd(0);       // wake the kswapd kernel thread
    ...
    else if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
            && nr_inactive_dirty_pages >= freepages.high)
        wakeup_bdflush(0);      // wake the bdflush kernel thread

try_again:
    ...
    zone = zonelist->zones;
    for (;;) {
        zone_t *z = *(zone++);
        if (!z)
            break;
        if (!z->size)
            BUG();

        if (z->free_pages >= z->pages_low) {
            page = rmqueue(z, order);   // try to allocate from this zone
            if (page)
                return page;
        } else if (z->free_pages < z->pages_min &&
                waitqueue_active(&kreclaimd_wait)) {
            wake_up_interruptible(&kreclaimd_wait);    // wake kreclaimd
        }
    }
    ...
    // From here on, the watermark is lowered step by step.
    page = __alloc_pages_limit(zonelist, order, PAGES_HIGH, direct_reclaim);
    if (page)
        return page;
    ...
    page = __alloc_pages_limit(zonelist, order, PAGES_LOW, direct_reclaim);
    if (page)
        return page;
    ...
    wakeup_kswapd(0);           // wake the kswapd kernel thread
    if (gfp_mask & __GFP_WAIT) {
        // The allocation policy says the caller would rather wait than
        // fail, so let the system reschedule once.
        __set_current_state(TASK_RUNNING);
        current->policy |= SCHED_YIELD;
        schedule();
        // Yielding lets kswapd run immediately; other processes may
        // also release some pages in the meantime.
    }
    ...
    // Either we have been scheduled again, or the policy does not allow
    // waiting: call __alloc_pages_limit() once more, now with PAGES_MIN.
    page = __alloc_pages_limit(zonelist, order, PAGES_MIN, direct_reclaim);
    if (page)
        return page;

    // If that also failed, what happens next depends on who is asking.
    // kswapd and kreclaimd are themselves "memory allocation workers":
    // they allocate pages as part of their duty of making more pages
    // available, which matters more than an ordinary process's request.
    // Such processes have PF_MEMALLOC set in the flags field of their
    // task_struct.
    if (!(current->flags & PF_MEMALLOC)) {
        // Processes other than kswapd and kreclaimd
        ...
        if (order > 0 && (gfp_mask & __GFP_WAIT)) {
            // More than one page requested, and the caller would
            // rather wait than fail
            zone = zonelist->zones;
            /* First, clean some dirty pages. */
            current->flags |= PF_MEMALLOC;
            page_launder(gfp_mask, 1);
            current->flags &= ~PF_MEMALLOC;
            for (;;) {
                zone_t *z = *(zone++);
                if (!z)
                    break;
                if (!z->size)
                    continue;
                while (z->inactive_clean_pages) {
                    struct page * page;
                    /* Move one page to the free list. */
                    page = reclaim_page(z);
                    if (!page)
                        break;
                    __free_page(page);
                    /* Try if the allocation succeeds. */
                    page = rmqueue(z, order);
                    if (page)
                        return page;
                }
            }
        }
        ...
        if ((gfp_mask & (__GFP_WAIT|__GFP_IO)) == (__GFP_WAIT|__GFP_IO)) {
            wakeup_kswapd(1);
            memory_pressure++;
            if (!order)         // single-page request: start over
                goto try_again;
        } else if (gfp_mask & __GFP_WAIT) {
            try_to_free_pages(gfp_mask);
            memory_pressure++;
            if (!order)         // single-page request: start over
                goto try_again;
        }
    }

    // We get here if the caller is one of the memory-management threads,
    // or if an ordinary caller has tried every measure above, the request
    // is still unsatisfied, and control did not jump back to try_again.
    zone = zonelist->zones;
    for (;;) {
        zone_t *z = *(zone++);
        struct page * page = NULL;
        if (!z)
            break;
        if (!z->size)
            BUG();
        ...
        if (direct_reclaim) {   // single-page request
            page = reclaim_page(z);
            if (page)
                return page;
        }

        /* XXX: is pages_min/4 a good amount to reserve for this? */
        if (z->free_pages < z->pages_min / 4 &&
                !(current->flags & PF_MEMALLOC))
            continue;
        // Last resort: as long as the zone still has at least
        // pages_min/4 free pages, try to allocate from it.
        page = rmqueue(z, order);
        if (page)
            return page;
    }

    /* No luck. */
    printk(KERN_ERR "__alloc_pages: %lu-order allocation failed.\n", order);
    return NULL;
}
alloc_pages is called with two parameters. The first, gfp_mask, is an integer that selects the allocation policy. The second, order, gives the size of the requested physical block as a power of two: the block contains 2^order pages, that is 1, 2, 4, ... pages, where order must be less than MAX_ORDER.
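Because order is an exponent rather than a page count, a caller that knows how many pages it needs has to round up to the next power of two. The sketch below shows one way to do that. order_for_pages is a hypothetical helper written for this illustration, and DEMO_MAX_ORDER assumes the value 10 quoted earlier.

```c
#include <assert.h>

#define DEMO_MAX_ORDER 10   /* assumption: value quoted in the text */

/*
 * Smallest order such that a block of 2^order pages holds nr_pages.
 * Requests larger than the biggest valid block are rejected up front.
 */
static unsigned int order_for_pages(unsigned long nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages > 0);
    assert(nr_pages <= (1UL << (DEMO_MAX_ORDER - 1)));
    while ((1UL << order) < nr_pages)
        order++;
    return order;
}
```

A request for 3 pages, for instance, maps to order 2 and so receives a 4-page block; the extra page is the cost of keeping every block a power of two in size.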


The rmqueue function, which implements the buddy algorithm, is as follows:

static struct page * rmqueue(zone_t *zone, unsigned long order)
{
    // area points to the queue head for blocks of the requested size
    free_area_t * area = zone->free_area + order;
    unsigned long curr_order = order;
    struct list_head *head, *curr;
    unsigned long flags;
    struct page *page;

    spin_lock_irqsave(&zone->lock, flags);
    do {
        head = &area->free_list;
        curr = memlist_next(head);

        if (curr != head) {
            // First try the queue that exactly matches the requested size
            unsigned int index;

            page = memlist_entry(curr, struct page, list);
            if (BAD_RANGE(zone, page))
                BUG();
            memlist_del(curr);
            index = (page - mem_map) - zone->offset;
            MARK_USED(index, curr_order, area);
            zone->free_pages -= 1 << order;    // reduce the free page count accordingly

            // If the block came from a larger queue, split off the
            // remainder into smaller blocks and link them into the
            // matching queues
            page = expand(zone, page, index, order, curr_order, area);
            spin_unlock_irqrestore(&zone->lock, flags);

            set_page_count(page, 1);            // set the use count to 1
            if (BAD_RANGE(zone, page))
                BUG();
            DEBUG_ADD_PAGE
            return page;
        }
        curr_order++;   // nothing here: try the next larger queue
        area++;
    } while (curr_order < MAX_ORDER);
    spin_unlock_irqrestore(&zone->lock, flags);

    return NULL;
}
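The search-then-split behavior of rmqueue can be tried out with a toy model. The sketch below is a simplified user-space illustration, not a transcription of the kernel's rmqueue or expand: free blocks are kept in small arrays instead of linked queues, there is no locking or bitmap, and which half of a split block is handed out is a choice made by this sketch. All names (demo_rmqueue, push_free, buddy_of and so on) are hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_ORDER 4    /* orders 0..3, so at most 8 pages per block */
#define DEMO_NR_PAGES  8
#define NO_BLOCK       (-1)

/* free_block[o][] holds the start page of each free block of 2^o pages. */
static int free_block[DEMO_MAX_ORDER][DEMO_NR_PAGES];
static int free_count[DEMO_MAX_ORDER];

static void push_free(unsigned int order, int start)
{
    assert(order < DEMO_MAX_ORDER && free_count[order] < DEMO_NR_PAGES);
    free_block[order][free_count[order]++] = start;
}

/* Start with the whole range as one free block of the largest order. */
static void demo_init(void)
{
    unsigned int o;

    for (o = 0; o < DEMO_MAX_ORDER; o++)
        free_count[o] = 0;
    push_free(DEMO_MAX_ORDER - 1, 0);
}

/*
 * Allocate 2^order pages: take a block from the smallest non-empty
 * queue at or above the requested order, then halve it repeatedly,
 * returning the unused upper half to the next smaller queue each time.
 */
static int demo_rmqueue(unsigned int order)
{
    unsigned int curr_order;

    for (curr_order = order; curr_order < DEMO_MAX_ORDER; curr_order++) {
        if (free_count[curr_order] > 0) {
            int start = free_block[curr_order][--free_count[curr_order]];

            while (curr_order > order) {
                curr_order--;
                push_free(curr_order, start + (1 << curr_order));
            }
            return start;
        }
    }
    return NO_BLOCK;
}

/* The buddy of a block differs from it only in the bit for its size. */
static int buddy_of(int start, unsigned int order)
{
    return start ^ (1 << order);
}
```

Starting from a single 8-page block, a one-page request splits it into free blocks of 4, 2 and 1 pages plus the page handed out. The buddy_of helper shows the address relationship that makes it cheap to find the neighbor a freed block could later merge with.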


The __alloc_pages_limit function is as follows:

static struct page * __alloc_pages_limit(zonelist_t *zonelist,
        unsigned long order, int limit, int direct_reclaim)
{
    zone_t **zone = zonelist->zones;

    for (;;) {
        zone_t *z = *(zone++);
        unsigned long water_mark;

        if (!z)
            break;
        if (!z->size)
            BUG();

        /*
         * We allocate if the number of free + inactive_clean
         * pages is above the watermark.
         */
        switch (limit) {        // pick the watermark for this pass
            default:
            case PAGES_MIN:
                water_mark = z->pages_min;
                break;
            case PAGES_LOW:
                water_mark = z->pages_low;
                break;
            case PAGES_HIGH:
                water_mark = z->pages_high;
        }

        if (z->free_pages + z->inactive_clean_pages > water_mark) {
            struct page *page = NULL;
            /* If possible, reclaim a page directly. */
            if (direct_reclaim && z->free_pages < z->pages_min + 8)
                page = reclaim_page(z);     // single-page request
            /* If that fails, fall back to rmqueue. */
            if (!page)                      // nothing reclaimed
                page = rmqueue(z, order);   // try to allocate from the zone's free areas
            if (page)
                return page;
        }
    }

    /* Found nothing. */
    return NULL;
}
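The effect of calling __alloc_pages_limit with PAGES_HIGH, then PAGES_LOW, then PAGES_MIN can be seen in isolation with a small sketch of the watermark test. This is a user-space illustration under stated assumptions: struct demo_zone keeps only the counters involved, the numeric values of the DEMO_PAGES_* selectors are assumed, and zone_passes and first_limit_passed are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Limit selectors; the numeric values here are an assumption. */
enum { DEMO_PAGES_MIN, DEMO_PAGES_LOW, DEMO_PAGES_HIGH };

/* Cut-down zone: only the counters the watermark test reads. */
struct demo_zone {
    unsigned long free_pages;
    unsigned long inactive_clean_pages;
    unsigned long pages_min, pages_low, pages_high;
};

/* Map a limit selector to the zone's watermark, as the switch does. */
static unsigned long water_mark_for(const struct demo_zone *z, int limit)
{
    switch (limit) {
    case DEMO_PAGES_HIGH:
        return z->pages_high;
    case DEMO_PAGES_LOW:
        return z->pages_low;
    default:
        return z->pages_min;
    }
}

/* Nonzero if free + inactive_clean pages exceed the chosen watermark. */
static int zone_passes(const struct demo_zone *z, int limit)
{
    assert(z != NULL);
    return z->free_pages + z->inactive_clean_pages > water_mark_for(z, limit);
}

/*
 * Try the limits from strictest to most lenient and return the first
 * one the zone passes, or -1 if it fails even the pages_min test.
 */
static int first_limit_passed(const struct demo_zone *z)
{
    static const int limits[] = {
        DEMO_PAGES_HIGH, DEMO_PAGES_LOW, DEMO_PAGES_MIN
    };
    size_t i;

    for (i = 0; i < sizeof(limits) / sizeof(limits[0]); i++)
        if (zone_passes(z, limits[i]))
            return limits[i];
    return -1;
}
```

Note that the comparison is strict: a zone whose free plus inactive clean pages exactly equal the watermark does not pass at that level and is only considered again on a later, more lenient pass.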


In practice, the vast majority of page allocations succeed in the first zone named by the allocation policy. Even so, this code shows how thoroughly a system design has to account for every case.
