Memory Management-1


Linux memory management is based on segmentation and paging, which translate logical addresses into physical addresses. Part of the RAM is permanently assigned to the kernel to hold the kernel code and static data; the rest is dynamic memory. Linux manages this dynamic memory with a number of effective techniques: page table management; high memory management (the temporary kernel mapping area, the fixed mapping area, the permanent kernel mapping area, and the non-contiguous memory area); the buddy system, which limits external fragmentation; the slab mechanism, which limits internal fragmentation; the per-CPU page frame caches that sit in front of the buddy system; and reserved page frames for emergency memory allocations.
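To see this division of labor from a caller's perspective, here is a minimal, hypothetical kernel-module sketch (the module and cache names are invented for illustration): it requests physically contiguous page frames from the buddy system, then small fixed-size objects from a slab cache.

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/slab.h>

static int __init demo_init(void)
{
        struct kmem_cache *demo_cache;
        void *obj;

        /* Buddy system: request 2^2 = 4 physically contiguous page frames. */
        struct page *pages = alloc_pages(GFP_KERNEL, 2);
        if (!pages)
                return -ENOMEM;
        __free_pages(pages, 2);

        /* Slab layer: a cache of small, equally sized objects carved out of
         * whole pages, which keeps internal fragmentation low. */
        demo_cache = kmem_cache_create("demo_obj", 64, 0, 0, NULL);
        if (demo_cache) {
                obj = kmem_cache_alloc(demo_cache, GFP_KERNEL);
                if (obj)
                        kmem_cache_free(demo_cache, obj);
                kmem_cache_destroy(demo_cache);
        }
        return 0;
}

static void __exit demo_exit(void) { }

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");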



Figure: overview of the Linux memory-management data structures (image from http://bbs.chinaunix.net/thread-2018659-2-1.html).

Page Descriptor:

The page descriptor records the current state of a page frame. Each page frame is described by one struct page, all page descriptors are stored in the mem_map array, and each descriptor is 32 bytes.

/*
 * Each physical page in the system has a struct page associated with
 * it to keep track of whatever it is we are using the page for at the
 * moment. Note that we have no way to track which tasks are using
 * a page, though if it is a pagecache page, rmap structures can tell us
 * who is mapping it.
 *
 * The objects in struct page are organized in double word blocks in
 * order to allow us to use atomic double word operations on portions
 * of struct page. That is currently only used by slub but the arrangement
 * allows the use of atomic double word operations on the flags/mapping
 * and lru list pointers also.
 */
struct page {
        /* First double word block */
        /* describes the status of the page frame */
        unsigned long flags;            /* Atomic flags, some possibly
                                         * updated asynchronously */
        struct address_space *mapping;  /* If low bit clear, points to
                                         * inode address_space, or NULL.
                                         * If page mapped as anonymous
                                         * memory, low bit is set, and
                                         * it points to anon_vma object:
                                         * see PAGE_MAPPING_ANON below.
                                         */
        /* Second double word */
        struct {
                union {
                        pgoff_t index;          /* Our offset within mapping. */
                        void *freelist;         /* slub/slob first free object */
                        bool pfmemalloc;        /* If set by the page allocator,
                                                 * ALLOC_NO_WATERMARKS was set
                                                 * and the low watermark was not
                                                 * met implying that the system
                                                 * is under some pressure. The
                                                 * caller should try ensure
                                                 * this page is only used to
                                                 * free other pages.
                                                 */
                };

                union {
#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
        defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
                        /* Used for cmpxchg_double in slub */
                        unsigned long counters;
#else
                        /*
                         * Keep _count separate from slub cmpxchg_double data.
                         * As the rest of the double word is protected by
                         * slab_lock but _count is not.
                         */
                        unsigned counters;
#endif

                        struct {
                                union {
                                        /*
                                         * Count of ptes mapped in
                                         * mms, to show when page is
                                         * mapped & limit reverse map
                                         * searches.
                                         *
                                         * Used also for tail pages
                                         * refcounting instead of
                                         * _count. Tail pages cannot
                                         * be mapped and keeping the
                                         * tail page _count zero at
                                         * all times guarantees
                                         * get_page_unless_zero() will
                                         * never succeed on tail
                                         * pages.
                                         */
                                        atomic_t _mapcount;

                                        struct { /* SLUB */
                                                unsigned inuse:16;
                                                unsigned objects:15;
                                                unsigned frozen:1;
                                        };
                                        int units;      /* SLOB */
                                };
                                atomic_t _count;        /* Usage count, see below. */
                        };
                };
        };

        /* Third double word block */
        union {
                struct list_head lru;   /* Pageout list, eg. active_list
                                         * protected by zone->lru_lock !
                                         */
                struct {                /* slub per cpu partial pages */
                        struct page *next;      /* Next partial slab */
#ifdef CONFIG_64BIT
                        int pages;      /* Nr of partial slabs left */
                        int pobjects;   /* Approximate # of objects */
#else
                        short int pages;
                        short int pobjects;
#endif
                };

                struct list_head list;  /* slobs list of pages */
                struct {                /* slab fields */
                        struct kmem_cache *slab_cache;
                        struct slab *slab_page;
                };
        };

        /* Remainder is not double word aligned */
        union {
                unsigned long private;  /* Mapping-private opaque data:
                                         * usually used for buffer_heads
                                         * if PagePrivate set; used for
                                         * swp_entry_t if PageSwapCache;
                                         * indicates order in the buddy
                                         * system if PG_buddy is set.
                                         */
#if USE_SPLIT_PTLOCKS
                spinlock_t ptl;
#endif
                struct kmem_cache *slab;        /* SLUB: Pointer to slab */
                struct page *first_page;        /* Compound tail pages */
        };

        /*
         * On machines where all RAM is mapped into kernel address space,
         * we can simply calculate the virtual address. On machines with
         * highmem some memory is mapped into kernel virtual memory
         * dynamically, so we need a place to store that address.
         * Note that this field could be 16 bits on x86 ... ;)
         *
         * Architectures with slow multiplication can define
         * WANT_PAGE_VIRTUAL in asm/page.h
         */
#if defined(WANT_PAGE_VIRTUAL)
        void *virtual;                  /* Kernel virtual address (NULL if
                                           not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
        unsigned long debug_flags;      /* Use atomic bitops on this */
#endif

#ifdef CONFIG_KMEMCHECK
        /*
         * kmemcheck wants to track the status of each byte in a page; this
         * is a pointer to such a status block. NULL if not tracked.
         */
        void *shadow;
#endif
};
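Because every page frame has exactly one descriptor at a fixed slot in mem_map, converting between a page frame number, its struct page, and a kernel virtual address is simple arithmetic. A minimal sketch using the standard helpers (the demo function itself is hypothetical; page_address() returns a non-NULL pointer here only because GFP_KERNEL pages come from permanently mapped lowmem):

#include <linux/mm.h>
#include <linux/gfp.h>

static void page_descriptor_demo(void)
{
        struct page *page = alloc_page(GFP_KERNEL);     /* one page frame */
        unsigned long pfn;
        void *vaddr;

        if (!page)
                return;

        pfn = page_to_pfn(page);        /* descriptor -> page frame number */
        vaddr = page_address(page);     /* descriptor -> kernel virtual address */
        pr_info("pfn=%lu vaddr=%p count=%d\n", pfn, vaddr, page_count(page));

        /* and the reverse mappings */
        WARN_ON(pfn_to_page(pfn) != page);
        WARN_ON(virt_to_page(vaddr) != page);

        __free_page(page);
}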
The system divides physical memory into several nodes, each described by a pg_data_t. Under the node-local allocation policy, a page is allocated from the node closest to the CPU the request is running on; because a process tends to keep running on the same CPU, memory obtained from the current node is the memory most likely to be used. The physical memory of each node is in turn divided into several zones.

typedef struct pglist_data {
        struct zone node_zones[MAX_NR_ZONES];           /* array of zone descriptors of this node */
        struct zonelist node_zonelists[MAX_ZONELISTS];  /* used by the page allocator: the fallback
                                                         * zones to allocate from when this node
                                                         * has no free memory */
        int nr_zones;                                   /* number of zones in this node */
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
        struct page *node_mem_map;                      /* array of page descriptors of this node */
#ifdef CONFIG_MEMCG
        struct page_cgroup *node_page_cgroup;
#endif
#endif
#ifndef CONFIG_NO_BOOTMEM
        struct bootmem_data *bdata;                     /* used during the kernel
                                                         * initialization phase */
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
        /*
         * Must be held any time you expect node_start_pfn, node_present_pages
         * or node_spanned_pages stay constant.  Holding this will also
         * guarantee that any pfn_valid() stays that way.
         *
         * Nests above zone->lock and zone->size_seqlock.
         */
        spinlock_t node_size_lock;
#endif
        unsigned long node_start_pfn;                   /* index of the first page frame
                                                         * in the node */
        unsigned long node_present_pages;               /* total number of physical pages */
        unsigned long node_spanned_pages;               /* total size of physical page range,
                                                         * including holes */
        int node_id;                                    /* node identifier */
        wait_queue_head_t kswapd_wait;                  /* wait queue used by the kswapd
                                                         * pageout daemon */
        wait_queue_head_t pfmemalloc_wait;
        struct task_struct *kswapd;                     /* kernel thread running kswapd;
                                                         * protected by lock_memory_hotplug() */
        int kswapd_max_order;                           /* log2 of the size of the free block
                                                         * that kswapd should try to create */
        enum zone_type classzone_idx;
} pg_data_t;
All node descriptors are stored in a singly linked list whose first element is pointed to by the pgdat_list variable.
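In recent kernels the traversal of the node descriptors is wrapped by the for_each_online_pgdat() helper (the raw pgdat_list variable is the older interface). A hypothetical sketch that walks every node and prints the fields described above:

#include <linux/mm.h>
#include <linux/mmzone.h>

static void walk_nodes(void)
{
        pg_data_t *pgdat;

        for_each_online_pgdat(pgdat)    /* iterates all online node descriptors */
                pr_info("node %d: start_pfn=%lu present=%lu spanned=%lu\n",
                        pgdat->node_id,
                        pgdat->node_start_pfn,
                        pgdat->node_present_pages,
                        pgdat->node_spanned_pages);
}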
The physical memory of each node is divided into three zones: ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM.
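Which zone serves an allocation is decided by the GFP flags passed to the page allocator. A hedged sketch of the mapping (the demo function is hypothetical, and zone availability depends on architecture and configuration, e.g. 64-bit builds usually have no ZONE_HIGHMEM):

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/string.h>

/*
 *   GFP_DMA       -> ZONE_DMA      (e.g. buffers for ISA-style DMA)
 *   GFP_KERNEL    -> ZONE_NORMAL   (permanently mapped kernel memory)
 *   __GFP_HIGHMEM -> ZONE_HIGHMEM  (user/page-cache pages; the kernel must
 *                                   kmap() them before touching them)
 */
static void zone_selection_demo(void)
{
        struct page *dma  = alloc_pages(GFP_DMA, 0);
        struct page *norm = alloc_pages(GFP_KERNEL, 0);
        struct page *high = alloc_pages(GFP_HIGHUSER, 0);

        if (high) {
                void *va = kmap(high);  /* highmem needs a temporary mapping */
                memset(va, 0, PAGE_SIZE);
                kunmap(high);
        }
        if (dma)
                __free_pages(dma, 0);
        if (norm)
                __free_pages(norm, 0);
        if (high)
                __free_pages(high, 0);
}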


Zone Descriptor:

struct zone {
        /* Fields commonly accessed by the page allocator */

        /* zone watermarks, accessed with the *_wmark_pages(zone) macros */
        /* the three watermarks of this zone: high (free memory is plentiful),
         * low, and min */
        unsigned long watermark[NR_WMARK];

        /*
         * When free pages are below this point, additional steps are taken
         * when reading the number of free pages to avoid per-cpu counter
         * drift allowing watermarks to be breached
         */
        unsigned long percpu_drift_mark;

        /*
         * We don't know if the memory that we're going to allocate will be
         * freeable or/and it will be released eventually, so to avoid totally
         * wasting several GB of ram we must reserve some of the lower zone
         * memory (otherwise we risk to run OOM on the lower zones despite
         * there being tons of freeable ram on the higher zones). This array is
         * recalculated at runtime if the sysctl_lowmem_reserve_ratio sysctl
         * changes.
         */
        /*
         * When the highmem or normal zone cannot satisfy an allocation, the
         * request falls back to the normal or DMA zone. Some memory must be
         * kept in reserve there so that, for example, the DMA zone is not
         * exhausted and stays available to drivers. This field gives the
         * amount of memory to keep in reserve when falling back from a higher
         * zone to this one.
         */
        unsigned long lowmem_reserve[MAX_NR_ZONES];

        /*
         * This is a per-zone reserve of pages that should not be
         * considered dirtyable memory.
         */
        unsigned long dirty_balance_reserve;

#ifdef CONFIG_NUMA
        int node;                       /* NUMA node this zone belongs to */
        /*
         * zone reclaim becomes active if more unmapped pages exist.
         */
        /* page reclaim starts once the reclaimable pages exceed this value */
        unsigned long min_unmapped_pages;
        /* when the pages reclaimable from slab exceed this value, the pages
         * cached by the slab allocator are reclaimed */
        unsigned long min_slab_pages;
#endif
        /*
         * Per-CPU page cache. When a single page is requested, it is first
         * allocated from this cache. This:
         *  - avoids taking the global zone lock;
         *  - avoids the same page being handed to different CPUs in turn,
         *    which would invalidate cache lines;
         *  - avoids splitting large blocks in the zone into fragments.
         */
        struct per_cpu_pageset __percpu *pageset;

        /*
         * free areas of different sizes
         */
        /* this lock protects the buddy system data, i.e. the free_area
         * structures */
        spinlock_t              lock;
        int                     all_unreclaimable; /* All pages pinned */
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
        /* pfn where the last incremental compaction isolated free pages */
        unsigned long           compact_cached_free_pfn;
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
        /* see spanned/present_pages for more description */
        seqlock_t               span_seqlock;
#endif
#ifdef CONFIG_CMA
        /*
         * CMA needs to increase watermark levels during the allocation
         * process to make sure that the system is not starved.
         */
        unsigned long           min_cma_pages;
#endif
        /* The core data of the buddy system: an array of MAX_ORDER (11)
         * lists, where list n holds free blocks of 2^n pages. */
        struct free_area        free_area[MAX_ORDER];

#ifndef CONFIG_SPARSEMEM
        /*
         * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
         * In SPARSEMEM, this map is stored in struct mem_section
         */
        unsigned long           *pageblock_flags;  /* page flags bitmap of the zone */
#endif /* CONFIG_SPARSEMEM */

#ifdef CONFIG_COMPACTION
        /*
         * On compaction failure, 1<<compact_defer_shift compactions
         * are skipped before trying again. The number attempted since
         * last failure is tracked with compact_considered.
         */
        unsigned int            compact_considered;
        unsigned int            compact_defer_shift;
        int                     compact_order_failed;
#endif

        /* padding, to make sure the following fields start on a cache line */
        ZONE_PADDING(_pad1_)

        /*
         * The LRU lists are used to decide which pages are active and which
         * are not, and hence which should be written back to disk to free
         * memory.
         */
        /* Fields commonly accessed by the page reclaim scanner */
        spinlock_t              lru_lock;
        struct lruvec           lruvec;

        unsigned long           pages_scanned;  /* since last reclaim */
        unsigned long           flags;          /* zone flags, see below */

        /* Zone statistics */
        atomic_long_t           vm_stat[NR_VM_ZONE_STAT_ITEMS];

        /*
         * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
         * this zone's LRU.  Maintained by the pageout code.
         */
        unsigned int inactive_ratio;

        ZONE_PADDING(_pad2_)
        /* Rarely used or read-mostly fields */

        /*
         * wait_table           -- the array holding the hash table
         * wait_table_hash_nr_entries   -- the size of the hash table array
         * wait_table_bits      -- wait_table_size == (1 << wait_table_bits)
         *
         * The purpose of all these is to keep track of the people
         * waiting for a page to become available and make them
         * runnable again when possible. The trouble is that this
         * consumes a lot of space, especially when so few things
         * wait on pages at a given time. So instead of using
         * per-page waitqueues, we use a waitqueue hash table.
         *
         * The bucket discipline is to sleep on the same queue when
         * colliding and wake all in that wait queue when removing.
         * When something wakes, it must check to be sure its page is
         * truly available, a la thundering herd. The cost of a
         * collision is great, but given the expected load of the
         * table, they should be so rare as to be outweighed by the
         * benefits from the saved space.
         *
         * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
         * primary users of these fields, and in mm/page_alloc.c
         * free_area_init_core() performs the initialization of them.
         */
        wait_queue_head_t       *wait_table;
        unsigned long           wait_table_hash_nr_entries;
        unsigned long           wait_table_bits;

        /*
         * Discontig memory support fields.
         */
        struct pglist_data      *zone_pgdat;    /* node this zone belongs to */
        /* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
        unsigned long           zone_start_pfn; /* index of the zone's first
                                                 * page frame in mem_map */

        /*
         * zone_start_pfn, spanned_pages and present_pages are all
         * protected by span_seqlock.  It is a seqlock because it has
         * to be read outside of zone->lock, and it is done in the main
         * allocator path.  But, it is written quite infrequently.
         *
         * The lock is declared along with zone->lock because it is
         * frequently read in proximity to zone->lock.  It's good to
         * give them a chance of being in the same cacheline.
         */
        unsigned long           spanned_pages;  /* total size, including holes */
        unsigned long           present_pages;  /* amount of memory (excluding holes) */

        /*
         * rarely used fields:
         */
        const char              *name;
#ifdef CONFIG_MEMORY_ISOLATION
        /*
         * the number of MIGRATE_ISOLATE *pageblock*.
         * We need this for free page counting. Look at zone_watermark_ok_safe.
         * It's protected by zone->lock
         */
        int                     nr_pageblock_isolate;
#endif
};
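As a usage note, the free_area array above is exactly what /proc/buddyinfo reports: one count of free blocks per order, per zone. A rough, hypothetical sketch of that walk (taking zone->lock, which, as noted above, protects the buddy system data):

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/spinlock.h>

/* Roughly what /proc/buddyinfo prints for one zone. */
static void dump_buddy_lists(struct zone *zone)
{
        unsigned long flags;
        unsigned int order;

        spin_lock_irqsave(&zone->lock, flags);  /* free_area is under zone->lock */
        for (order = 0; order < MAX_ORDER; order++)
                pr_info("zone %-8s order %2u: %lu free blocks of %lu pages\n",
                        zone->name, order,
                        zone->free_area[order].nr_free, 1UL << order);
        spin_unlock_irqrestore(&zone->lock, flags);
}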





