Linux kernel source code scenario analysis chap 2 Storage Management (4)

Source: Internet
Author: User
Tags ocaml

Usage and turnover of physical pages

1. Several terms

1.1 Virtual storage pages

A fixed-size block of a virtual address space, together with its contents. Its boundaries are aligned to the page size, 4 KB.
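
As a rough illustration of the 4 KB alignment, the sketch below shows the usual bit arithmetic for splitting an address into a page base and an in-page offset. PAGE_SHIFT, PAGE_SIZE and PAGE_MASK follow the kernel's naming convention; the three helper functions are hypothetical names chosen for this example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                       /* 4 KB pages, as assumed in the text */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Start address of the page that contains addr (hypothetical helper). */
static inline uintptr_t page_base(uintptr_t addr)
{
    return addr & PAGE_MASK;
}

/* Offset of addr inside its page (hypothetical helper). */
static inline uintptr_t page_offset(uintptr_t addr)
{
    return addr & ~PAGE_MASK;
}

/* Round addr up to the next page boundary; aligned addresses are unchanged. */
static inline uintptr_t page_align_up(uintptr_t addr)
{
    return (addr + PAGE_SIZE - 1) & PAGE_MASK;
}
```

For example, address 0x12345 lies in the page starting at 0x12000, at offset 0x345.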

1.2 physical page

A physical page is the counterpart of a virtual page: every virtual page must ultimately be mapped to a page on some physical storage medium. Depending on where that page resides, physical pages divide into memory pages and disk pages.
In general, allocating and releasing a physical memory page refers to the storage medium itself, while swapping a page in or out refers to its contents.

1.3 Exchange Technology

When system memory runs short, information that is not needed for the moment can be written to disk to free space for more urgent data, and read back from disk when it is needed again.
(Linux mainly uses a swap partition for this; Windows refers to its equivalent as virtual memory.)
Early systems swapped whole segments, but this was too inefficient, so the technique evolved into demand paging. This is a classic case of trading time for space.

2. Abstract description of physical pages

2.1 Physical pages in memory

During system initialization, the kernel creates one page structure for each physical page, based on the amount of physical memory it detects. These structures form a page array, and the global variable mem_map points to it. (In my opinion this describes the UMA case; on NUMA, each page array should belong to a node.)
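
Because mem_map is a flat array with one entry per physical page, converting between a physical address and its page structure is plain index arithmetic. The sketch below illustrates this for the UMA case only. The one-field struct page is a stand-in for the much larger kernel structure, and phys_to_page and page_to_phys are hypothetical helper names, not kernel functions.

```c
#define PAGE_SHIFT 12            /* 4 KB pages */

struct page {                    /* stand-in: the real structure has many more fields */
    int count;                   /* reference count */
};

static struct page *mem_map;     /* points to the array of all page structures */

/* Page structure describing the physical page that contains paddr. */
struct page *phys_to_page(unsigned long paddr)
{
    return mem_map + (paddr >> PAGE_SHIFT);
}

/* Physical start address of the page described by p. */
unsigned long page_to_phys(struct page *p)
{
    return (unsigned long)(p - mem_map) << PAGE_SHIFT;
}
```

The array index is simply the page frame number, so no search is needed in either direction.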

At the same time, the pages are grouped as needed into blocks of physically contiguous pages and divided into several management zones. Each zone keeps free queues, organized by block size, from which physical memory pages are allocated.

2.2 Physical pages on swap devices

2.2.1 swap_info_struct

The kernel defines the swap_info_struct data structure to describe and manage the files and devices used for page swapping.

==================== include/linux/swap.h 49 64 ====================
struct swap_info_struct {
    unsigned int flags;
    kdev_t swap_device;
    spinlock_t sdev_lock;
    struct dentry * swap_file;
    struct vfsmount *swap_vfsmnt;
    unsigned short * swap_map;
    unsigned int lowest_bit;
    unsigned int highest_bit;
    unsigned int cluster_next;
    unsigned int cluster_nr;
    int prio; /* swap priority */
    int pages;
    unsigned long max;
    int next; /* next entry on swap list */
};

swap_map points to an array in which each element represents one physical page on the disk. The array index gives the position of the page on the device or in the file, and the size of the array is determined by pages.
In this respect swap_map is very similar to mem_map, which points to the page array.

Note that the first page on the device, i.e. swap_map[0], is not used for page swapping. It holds information about the device or file itself, together with a bitmap showing which pages are usable.

The lowest_bit and highest_bit fields mark the start and end of the usable region of the file, and the max field gives the physical size of the device.

Because swap space usually lives on rotating disks, where contiguous blocks are cheaper to access, disk space is allocated in clusters whenever possible. The cluster_next and cluster_nr fields exist for this purpose.
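
To make the role of swap_map, lowest_bit, highest_bit, cluster_next and cluster_nr more concrete, here is a simplified user-space sketch of cluster-style slot allocation. It is not the kernel's allocator: struct swap_area, alloc_swap_slot and CLUSTER_SIZE are hypothetical, and locking and use counts above 1 are left out. The idea shown is only that slots are handed out sequentially while a cluster lasts, and a new cluster starts at the first free slot otherwise.

```c
#define CLUSTER_SIZE 4              /* hypothetical cluster length */

struct swap_area {                  /* hypothetical, cut-down stand-in for swap_info_struct */
    unsigned short *swap_map;       /* per-slot use count; 0 means free */
    unsigned int lowest_bit;        /* no free slot lies below this index */
    unsigned int highest_bit;       /* no free slot lies above this index */
    unsigned int cluster_next;      /* next slot to try in the current cluster */
    unsigned int cluster_nr;        /* slots still to hand out from the current cluster */
};

/* Claim one free slot. Returns its index, or 0 if none is free
 * (slot 0 holds the device header and is never handed out). */
unsigned int alloc_swap_slot(struct swap_area *si)
{
    unsigned int offset;

    /* Keep allocating sequentially while the current cluster lasts. */
    while (si->cluster_nr > 0 && si->cluster_next <= si->highest_bit) {
        offset = si->cluster_next++;
        if (si->swap_map[offset] == 0) {
            si->cluster_nr--;
            si->swap_map[offset] = 1;
            return offset;
        }
    }

    /* Otherwise start a new cluster at the first free slot. */
    for (offset = si->lowest_bit; offset <= si->highest_bit; offset++) {
        if (si->swap_map[offset] == 0) {
            si->swap_map[offset] = 1;
            si->lowest_bit = offset + 1;
            si->cluster_next = offset + 1;
            si->cluster_nr = CLUSTER_SIZE - 1;
            return offset;
        }
    }
    return 0;
}
```

The caller is expected to mark slot 0 as in use and to start with lowest_bit at 1 or higher, so that the header page is never returned.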

Since Linux allows multiple swap devices (or files), the kernel defines an array of swap_info_struct:

 struct swap_info_struct swap_info[MAX_SWAPFILES];

The kernel also maintains a queue, swap_list, which links the swap_info_struct structures of all devices or files that can supply physical pages, ordered by priority.

==================== mm/swapfile.c 23 23 ====================
struct swap_list_t swap_list = {-1, -1};
==================== include/linux/swap.h 153 156 ====================
struct swap_list_t {
    int head; /* head of priority-ordered swapfile list */
    int next; /* swapfile to be used next */
};

2.2.2 swp_entry_t: the swap entry

Just as a pte_t entry ties a physical memory page to a virtual page, a page on disk has the swp_entry_t data structure to play a similar role.

==================== include/linux/shmem_fs.h 8 18 ====================
/*
 * A swap entry has to fit into a "unsigned long", as
 * the entry is hidden in the "index" field of the
 * swapper address space.
 *
 * We have to move it here, since not every user of fs.h is including
 * mm.h, but m.h is including fs.h via sched .h :-/
 */
typedef struct {
    unsigned long val;
} swp_entry_t;

The value packs two fields, offset and type. offset is the position of the page within a disk device or file, i.e. its logical page number in the file. Put simply, it is the index into the array pointed to by swap_map.
type identifies which file the page is in; it is a sequence number. Put simply, it is the index into swap_info, the array describing the swap devices.

In addition, swp_entry_t is closely related to pte_t: the two structures are the same size.
When a page is in memory, the lowest bit of the entry, P (present), is 1, and the other bits hold the address and attributes of the physical memory page.
When the page is on disk, P is 0, and the other bits record where the page went.
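
The packing of type and offset into one unsigned long can be sketched as follows. The bit layout used here (bit 0 for P, bits 1 to 6 for type, bits 8 and up for offset) is recalled from the i386 headers of that kernel generation and is not shown in the listings above, so treat it as an assumption. PAGE_PRESENT_BIT and entry_is_present are hypothetical names for this example.

```c
typedef struct {
    unsigned long val;
} swp_entry_t;

/* Assumed layout: bit 0 = P (always 0 in a swap entry),
 * bits 1..6 = type, bits 8 and up = offset. */
#define SWP_TYPE(x)    (((x).val >> 1) & 0x3f)
#define SWP_OFFSET(x)  ((x).val >> 8)
#define SWP_ENTRY(type, offset) \
    ((swp_entry_t){ ((unsigned long)(type) << 1) | ((unsigned long)(offset) << 8) })

#define PAGE_PRESENT_BIT 0x1UL   /* hypothetical name for the P flag */

/* Nonzero if the raw entry describes a page that is in memory. */
static inline int entry_is_present(unsigned long val)
{
    return (val & PAGE_PRESENT_BIT) != 0;
}
```

Since the type and offset fields never touch bit 0, any value built by SWP_ENTRY reads as "not present" when it sits in a page table, and an access to it triggers a page fault.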

3. Disk turnover

3.1 Physical space management: __swap_free

==================== mm/swapfile.c 141 182 ====================
/*
 * Caller has made sure that the swapdevice corresponding to entry
 * is still around or has not been recycled.
 */
void __swap_free(swp_entry_t entry, unsigned short count)
{
    struct swap_info_struct * p;
    unsigned long offset, type;

    if (!entry.val)
        goto out;

    type = SWP_TYPE(entry);
    if (type >= nr_swapfiles)
        goto bad_nofile;
    p = & swap_info[type];
    if (!(p->flags & SWP_USED))
        goto bad_device;
    offset = SWP_OFFSET(entry);
    if (offset >= p->max)
        goto bad_offset;
    if (!p->swap_map[offset])
        goto bad_free;
    swap_list_lock();
    if (p->prio > swap_info[swap_list.next].prio)
        swap_list.next = type;
    swap_device_lock(p);
    if (p->swap_map[offset] < SWAP_MAP_MAX) {
        if (p->swap_map[offset] < count)
            goto bad_count;
        if (!(p->swap_map[offset] -= count)) {
            if (offset < p->lowest_bit)
                p->lowest_bit = offset;
            if (offset > p->highest_bit)
                p->highest_bit = offset;
            nr_swap_pages++;
        }
    }
    swap_device_unlock(p);
    swap_list_unlock();
out:
    return;

Note that releasing a disk page does not involve any disk I/O. It is purely bookkeeping in memory that marks the page contents on disk as invalid, so the cost is very small.
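
The bookkeeping can be tried out in user space with a cut-down model. The sketch below keeps only the counting logic of __swap_free and drops the locking, the swap_list handling and the error labels. struct swap_slots, swap_slot_put and nr_free_slots are hypothetical names, and the value given for SWAP_MAP_MAX is an assumption.

```c
#define SWAP_MAP_MAX 0x7fff          /* assumed value; counts at this level stay pinned */

struct swap_slots {                  /* hypothetical, cut-down stand-in for swap_info_struct */
    unsigned short *swap_map;        /* per-slot use count; 0 means free */
    unsigned int lowest_bit;
    unsigned int highest_bit;
    unsigned long max;               /* number of slots on the device */
};

static long nr_free_slots;           /* stand-in for nr_swap_pages */

/* Drop count references to a slot. Returns 0 on success, -1 on a bad request.
 * No disk access happens: only the in-memory counters change. */
int swap_slot_put(struct swap_slots *p, unsigned long offset, unsigned short count)
{
    if (offset >= p->max)
        return -1;                   /* beyond the end of the device */
    if (p->swap_map[offset] == 0)
        return -1;                   /* slot is already free */
    if (p->swap_map[offset] < SWAP_MAP_MAX) {
        if (p->swap_map[offset] < count)
            return -1;               /* more references dropped than held */
        p->swap_map[offset] -= count;
        if (p->swap_map[offset] == 0) {
            /* The slot became free: widen the search window to include it. */
            if (offset < p->lowest_bit)
                p->lowest_bit = offset;
            if (offset > p->highest_bit)
                p->highest_bit = offset;
            nr_free_slots++;
        }
    }
    return 0;
}
```

A slot shared by two users is released only when the second reference is dropped; until then the call just decrements the count.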

3.2 Memory Page turnover

Turnover has two meanings:
1. Page allocation, use, and reclamation, which do not necessarily involve swapping pages to or from disk.
2. Swapping to and from disk, whose ultimate goal is also to reclaim pages.

Pages in user space involve not only allocation, use, and reclamation but also swapping in and out. Even the code segments of a process are dynamically allocated from the system's point of view.

Pages mapped into system (kernel) space are not swapped out; they only need to be released once they are no longer in use. Some of them are expensive to obtain, however, and may be kept on an LRU queue.

3.2.1 Page swap policy

The simplest strategy is to swap a page out only at the moment a free page is needed, but that is clearly inefficient. Swapping out the least recently used (LRU) pages ahead of time is better, but it can cause page thrashing. To reduce thrashing, pages being swapped out pass through staging queues of dirty and clean inactive pages.

3.2.2 Swapping physical memory pages in and out

A physical memory page moves through the following states:
1. Idle: the page is in the free_area queue of a management zone, and its reference count is 0.
2. Allocated: the page has been allocated. Its reference count is 1, and it is no longer in a free_area queue.
3. Active: the page is linked into active_list through its lru field, and its reference count is incremented.
4. Inactive (dirty): the page is linked into inactive_dirty_list through lru, and its reference count is decremented. The dirty contents are later written to the swap device, and the page moves to an inactive_clean_list.
5. Inactive (clean): if the page is accessed within some period after becoming inactive, it returns to the active state and its mapping is restored. Otherwise, when memory is needed, the page can be taken off the clean queue and either returned to the free queue or allocated directly.

To put it in my own words:
A page is allocated and becomes active. If it is not accessed for a while, it ages and enters the inactive (dirty) state, but it is not written to the swap device immediately. If still nobody accesses it after some more time, its contents are written to the swap device, yet the page itself is not released; it is merely marked inactive (clean). From this point it is managed by its zone, whereas before it was on the global queues. If the page is accessed again before being reused for something else, the mapping can be re-established directly. This approach reduces page thrashing.
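
The life cycle just described can be summarized as a small state machine. This is only a reading aid under the assumptions of the text above: the enum values, event names and page_next_state are hypothetical, and the kernel does not keep an explicit state variable like this (the state is implied by flags, reference counts and queue membership).

```c
enum page_state {
    PAGE_FREE,               /* in a zone's free_area queue */
    PAGE_ALLOCATED,          /* handed out, not yet mapped */
    PAGE_ACTIVE,             /* on active_list */
    PAGE_INACTIVE_DIRTY,     /* on inactive_dirty_list, not yet written out */
    PAGE_INACTIVE_CLEAN      /* on a zone's inactive_clean_list, contents on disk */
};

enum page_event {
    EV_ALLOC,                /* page taken from the free queue */
    EV_MAP,                  /* mapping established */
    EV_AGE_OUT,              /* not accessed for a while */
    EV_WRITEBACK,            /* contents written to the swap device */
    EV_TOUCH,                /* page accessed again */
    EV_RECLAIM               /* page needed for another purpose */
};

/* Next state for a page; events that do not apply leave the state unchanged. */
enum page_state page_next_state(enum page_state s, enum page_event ev)
{
    switch (s) {
    case PAGE_FREE:
        return ev == EV_ALLOC ? PAGE_ALLOCATED : s;
    case PAGE_ALLOCATED:
        return ev == EV_MAP ? PAGE_ACTIVE : s;
    case PAGE_ACTIVE:
        return ev == EV_AGE_OUT ? PAGE_INACTIVE_DIRTY : s;
    case PAGE_INACTIVE_DIRTY:
        if (ev == EV_WRITEBACK)
            return PAGE_INACTIVE_CLEAN;
        return ev == EV_TOUCH ? PAGE_ACTIVE : s;
    case PAGE_INACTIVE_CLEAN:
        if (ev == EV_TOUCH)
            return PAGE_ACTIVE;  /* mapping restored without disk I/O */
        return ev == EV_RECLAIM ? PAGE_FREE : s;
    }
    return s;
}
```

The two EV_TOUCH transitions back to PAGE_ACTIVE are the point of the staging queues: a page that is touched again before reclaim comes back without being read from disk.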

3.2.3 Policy implementation

1. Global LRU queues: active_list and inactive_dirty_list.
2. Each page management zone has its own inactive_clean_list.
3. A global address_space data structure, swapper_space.
4. To speed up lookups, a hash table, page_hash_table.

Next, let's look at the kernel code that adds a page to the swap cache.

3.2.3.1 code
==================== mm/swap_state.c 54 70 ====================
void add_to_swap_cache(struct page *page, swp_entry_t entry)
{
    unsigned long flags;

#ifdef SWAP_CACHE_INFO
    swap_cache_add_total++;
#endif
    if (!PageLocked(page))
        BUG();
    if (PageTestandSetSwapCache(page))
        BUG();
    if (page->mapping)
        BUG();
    flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
    page->flags = flags | (1 << PG_uptodate);
    add_to_page_cache_locked(page, &swapper_space, entry.val);
}
==================== mm/filemap.c 476 494 ====================
/*
 * Add a page to the inode page cache.
 *
 * The caller must have locked the page and
 * set all the page flags correctly..
 */
void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index)
{
    if (!PageLocked(page))
        BUG();

    page_cache_get(page);
    spin_lock(&pagecache_lock);
    page->index = index;
    add_page_to_inode_queue(mapping, page);
    add_page_to_hash_queue(page, page_hash(mapping, index));
    lru_cache_add(page);
    spin_unlock(&pagecache_lock);
}
==================== include/linux/fs.h 365 375 ====================
struct address_space {
    struct list_head clean_pages; /* list of clean pages */
    struct list_head dirty_pages; /* list of dirty pages */
    struct list_head locked_pages; /* list of locked pages */
    unsigned long nrpages; /* number of total pages */
    struct address_space_operations *a_ops; /* methods */
    struct inode *host; /* owner: inode, block_device */
    struct vm_area_struct *i_mmap; /* list of private mappings */
    struct vm_area_struct *i_mmap_shared; /* list of shared mappings */
    spinlock_t i_shared_lock; /* and spinlock protecting it */
};
==================== mm/swap_state.c 31 37 ====================
struct address_space swapper_space = {
    LIST_HEAD_INIT(swapper_space.clean_pages),
    LIST_HEAD_INIT(swapper_space.dirty_pages),
    LIST_HEAD_INIT(swapper_space.locked_pages),
    0, /* nrpages */
    &swap_aops,
};
==================== include/linux/mm.h 150 150 ====================
#define get_page(p) atomic_inc(&(p)->count)
==================== include/linux/pagemap.h 31 31 ====================
#define page_cache_get(x) get_page(x)
==================== mm/filemap.c 72 79 ====================
static inline void add_page_to_inode_queue(struct address_space *mapping, struct page * page)
{
    struct list_head *head = &mapping->clean_pages;

    mapping->nrpages++;
    list_add(&page->list, head);
    page->mapping = mapping;
}
==================== mm/filemap.c 58 70 ====================
static void add_page_to_hash_queue(struct page * page, struct page **p)
{
    struct page *next = *p;

    *p = page;
    page->next_hash = next;
    page->pprev_hash = p;
    if (next)
        next->pprev_hash = &page->next_hash;
    if (page->buffers)
        PAGE_BUG(page);
    atomic_inc(&page_cache_size);
}
==================== include/linux/pagemap.h 68 68 ====================
#define page_hash(mapping,index) (page_hash_table+_page_hashfn(mapping,index))
==================== mm/swap.c 226 241 ====================
/**
 * lru_cache_add: add a page to the page lists
 * @page: the page to add
 */
void lru_cache_add(struct page * page)
{
    spin_lock(&pagemap_lru_lock);
    if (!PageLocked(page))
        BUG();
    DEBUG_ADD_PAGE
    add_page_to_active_list(page);
    /* This should be relatively rare */
    if (!page->age)
        deactivate_page_nolock(page);
    spin_unlock(&pagemap_lru_lock);
}
==================== include/linux/swap.h 209 215 ====================
#define add_page_to_active_list(page) { \
    DEBUG_ADD_PAGE \
    ZERO_PAGE_BUG \
    SetPageActive(page); \
    list_add(&(page)->lru, &active_list); \
    nr_active_pages++; \
}

From add_to_page_cache_locked we can see that the page is added to three queues:
1. Through its list field, to the clean_pages queue of swapper_space.
2. Through next_hash and pprev_hash, to a hash queue.
3. Through lru, to the LRU queue active_list.
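
The hash queue in item 2 uses a pointer-to-pointer back link (pprev_hash), which lets a page be unlinked without knowing whether it is first in its bucket. The stand-alone sketch below mirrors the linking steps of add_page_to_hash_queue on a minimal node type. struct node, hash_add and hash_del are hypothetical names, and the removal routine is an illustration that is not taken from the listings above.

```c
#include <stddef.h>

struct node {
    struct node *next_hash;     /* next node in the same bucket */
    struct node **pprev_hash;   /* address of the pointer that points at this node */
    int key;
};

/* Insert n at the head of the chain whose head pointer is *bucket. */
void hash_add(struct node **bucket, struct node *n)
{
    struct node *next = *bucket;

    *bucket = n;
    n->next_hash = next;
    n->pprev_hash = bucket;
    if (next)
        next->pprev_hash = &n->next_hash;
}

/* Unlink n from whatever chain it is on; no bucket lookup is needed. */
void hash_del(struct node *n)
{
    struct node *next = n->next_hash;
    struct node **pprev = n->pprev_hash;

    *pprev = next;              /* works for the bucket head and for mid-chain nodes */
    if (next)
        next->pprev_hash = pprev;
    n->next_hash = NULL;
    n->pprev_hash = NULL;
}
```

With an ordinary prev pointer, removal of the first node would need a special case to update the bucket head; the pointer-to-pointer form avoids it.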

3.3 users participate in memory management

Privileged users can participate in storage management through swapon and swapoff.
