This article introduces the virtual memory mechanism and the implementation of the mmap system call. mmap allows device memory to be mapped directly into the address space of a user process. The article also covers the management of physical memory, including cache allocation and reclamation, the paging mechanism, and swap space.
1) Swap module (swap)
This module controls the swapping of memory contents in and out. Using a page replacement policy, it keeps the page frames (RAM pages) of physical memory filled with valid logical pages: logical pages that have not been accessed recently are evicted from main memory, and recently accessed logical pages are kept. The source files that implement this module are:
- page_io.c: reads and writes the swap file.
- swap_state.c: maintains the swap cache.
- swapfile.c: implements the swap system calls (sys_swapoff, sys_swapon).
- swap.c: defines the data structures and constants used for swapping, such as free_page_low and free_page_high.
- kswapd: a daemon (kernel thread) that periodically swaps pages between memory and external storage.
2) Core memory management module
This module provides the core memory management functions, that is, page management, which other kernel subsystems (such as the file systems) rely on. The source files that implement this module are:
- page_alloc.c: handles freeing, reclaiming, and allocating pages.
- memory.c: contains only the functions related to the paging mechanism.
3) Architecture-specific module
This module provides a common interface to the various hardware platforms. It changes the virtual address mappings held by the hardware MMU by issuing the appropriate commands, and it provides a common way of notifying other kernel subsystems when a page fault occurs. This module is the physical basis on which virtual memory is implemented.
The memory manager first uses the mapping mechanism to map the logical addresses of a user program to physical addresses. While the program runs, if a virtual address it uses has no corresponding physical memory, a page request is issued (1). If free memory is available, the request is satisfied by allocating memory (2), which is where memory allocation and reclamation come in, and the physical page in use is recorded in the page cache (3), which uses the caching mechanism. If there is not enough memory to allocate, the swap mechanism is invoked to free part of the memory (4, 5). In addition, address mapping finds the physical page through the TLB (translation lookaside buffer) (8), the swap mechanism uses the swap cache (6), and after the contents of a physical page have been written out to the swap file, the page table is modified to map the file address (7).
1.1 Memory address types and memory protection
1.1.1 Address types
Linux is a virtual memory system, which lets programs running on it allocate more memory than is physically available. Virtual memory also makes a number of tricks possible in the process address space, such as mapping device memory.
Linux uses several types of addresses, which are listed below.
1) User virtual address
This is the ordinary address used by user-space programs. Depending on the hardware architecture, user addresses are 32 or 64 bits wide, and each process has its own independent virtual address space.
2) Physical address
This is the address used between the processor and system memory. Physical addresses are 32 or 64 bits wide.
3) Bus address
This is the address used between a peripheral bus and memory.
4) Kernel logical address
Kernel logical addresses make up the normal address space of the kernel. They map most or all of main memory and are often treated as if they were physical addresses. On most architectures, a logical address and its associated physical address differ only by a constant offset. Logical addresses use the hardware's native pointer size, so on a 32-bit system equipped with a large amount of memory, logical addresses alone may not be able to address all of physical memory. In the kernel, logical addresses are usually stored in variables of type unsigned long or void *. Memory returned by kmalloc has a kernel logical address.
The macro __pa(), defined in <asm/page.h>, converts a logical address to its associated physical address. The macro __va() maps a physical address back to a logical address, but only for low-memory pages.
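The constant-offset relationship can be illustrated in user space. The following is a minimal sketch, not the kernel's definition: it assumes the common i386 layout in which logical addresses start at 3 GB, and the names SKETCH_PAGE_OFFSET, sketch_pa, and sketch_va are hypothetical stand-ins for PAGE_OFFSET, __pa(), and __va().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: kernel logical addresses start at 3 GB, as on a default i386 setup. */
#define SKETCH_PAGE_OFFSET UINT32_C(0xC0000000)

/* Logical -> physical: subtract the constant offset (the idea behind __pa()). */
uint32_t sketch_pa(uint32_t logical)
{
    assert(logical >= SKETCH_PAGE_OFFSET);   /* must be a kernel logical address */
    return logical - SKETCH_PAGE_OFFSET;
}

/* Physical -> logical: add the offset back (the idea behind __va()).
 * Only physical addresses below 1 GB fit above the 3 GB offset. */
uint32_t sketch_va(uint32_t physical)
{
    assert(physical < UINT32_C(0x40000000)); /* low memory only in this sketch */
    return physical + SKETCH_PAGE_OFFSET;
}
```

Under these assumptions, sketch_pa(0xC0100000) yields 0x00100000, and sketch_va reverses the conversion. The assertion in sketch_va mirrors the restriction that the reverse mapping works only for low memory.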
5) Kernel virtual address
Memory allocated by vmalloc has a kernel virtual address, and the kmap function also returns a virtual address. Such an address does not necessarily map directly onto physical memory the way a logical address does: the kernel has to do extra work, such as allocating memory and setting up address translation, before the address refers to physical pages. Virtual addresses are usually stored in pointer variables.
6) Low memory and high memory
Both low memory and high memory refer to physical memory.
- Low memory. Low memory is physical memory for which logical addresses exist in kernel space. On almost every system you are likely to encounter, all memory is low memory.
- High memory. High memory is physical memory for which no logical addresses exist. High memory tends to be reserved for the pages of user-space processes.
On i386 systems, the boundary between low memory and high memory is typically set just under 1 GB. This limit is set by the kernel itself, as it divides the 32-bit address space into kernel space and user space.
On a 32-bit system, each process has a 4 GB virtual linear address space. Virtual addresses from 0 to 3 GB are user space, which the user process can access directly. Virtual addresses from 3 GB to 4 GB are kernel space, which holds the code and data the kernel uses; it is shared by all processes and cannot be touched from user mode.
For every process, the virtual space from 3 GB to 4 GB is the same: it has the same page directory entries and the same page tables and corresponds to the same physical memory. This lets processes running in kernel mode share the kernel's code and data segments.
1.1.2 Memory Protection
1) Protection between different tasks
On the 80386, each task is given a different virtual-to-physical address mapping, so each task is placed in a different virtual address space. Each task has its own segment table and page tables.
The operating system is protected by placing it in a common region of the virtual address space, giving every task the same view of this region, and giving the region the same virtual-to-physical mapping in every task. This part of the virtual address space, which is common to all tasks, is called the global address space.
The part of the virtual address space that belongs to one task alone, that is, the part not shared with any other task, is called the local address space.
In different tasks, an access to the same virtual address in the local address space is translated to a different physical address. This lets the operating system give the memory of every task the same virtual addresses and still keep the tasks isolated. An access to the same virtual address in the global address space, on the other hand, is translated to the same physical address in all tasks, which supports the sharing of common code and data, for example the operating system itself.
2) Protection within a task
Within a task, four privilege levels are defined to restrict access inside the task. Level 0 is the operating system kernel, which handles I/O, memory management, and other critical operations. Level 1 is the system call handlers; user programs can call procedures here to execute system calls, but only certain specific, protected procedures can be called. Level 2 is library procedures. User programs run at level 3, which is the least privileged and receives the least protection.
1.2 80386 segment-page management mechanism
The 80386 performs virtual-to-physical address translation in two stages, using a segmentation mechanism and a paging mechanism. The first stage converts a virtual address, made up of a segment address and an offset within the segment, into a linear address. The second stage translates the linear address into a physical address.
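The two stages can be sketched in a few lines of C. This is an illustration only: it relies on the standard 80386 layout for 4 KB pages (10-bit page directory index, 10-bit page table index, 12-bit offset), it omits segment limit and permission checks, and the names to_linear, split_linear, and struct linear_parts are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct linear_parts {
    uint32_t dir;     /* bits 31..22: index into the page directory */
    uint32_t table;   /* bits 21..12: index into the page table */
    uint32_t offset;  /* bits 11..0:  byte offset within the 4 KB page */
};

/* Stage 1 (segmentation): linear address = segment base + offset in segment.
 * Limit and privilege checks are left out of this sketch. */
uint32_t to_linear(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}

/* Stage 2 (paging): split the linear address into the three fields the
 * MMU uses to walk the page directory and page table. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir = linear >> 22;
    p.table = (linear >> 12) & 0x3FF;
    p.offset = linear & 0xFFF;
    assert(p.dir < 1024 && p.table < 1024 && p.offset < 4096);
    return p;
}
```

With a segment base of 0, the linear address 0xC0100ABC splits into directory index 0x300, table index 0x100, and offset 0xABC. The physical address is then the frame address stored in that page table entry plus the offset.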
The Linux kernel maintains a page table in physical memory for each process; these page tables are resident in memory and cannot be swapped out to disk. The kernel segments, however, are mapped by a separate page table. Unlike the per-process page tables of user space, this page table belongs to the kernel as a whole and is independent of the currently running process.
1.3 Memory organization of a process
1.3.1 Memory management data structures
Three important data structures in memory management, struct vm_area_struct, struct page, and struct mm_struct, represent the memory usage of a process. Their relationship to the process is shown in Figure 4.10.
Each process has one struct mm_struct, which describes the virtual memory of the process. It contains the process's page table and a large amount of other information.
A vm_area_struct describes one virtual memory area of a process. A VMA (virtual memory area) is a part of the process's virtual address space that is handled by the same rules on a page fault, such as a shared library or an executable region; it represents one separate, contiguous address range in the process's address space.
The page structure describes a physical page; every physical page has a page structure that keeps track of it.
1.3.2 VMAs as shown in the /proc file system
VMA stands for virtual memory area, a region of a process's virtual memory. The virtual memory of each process is generally divided into these areas: the program's code (the text segment) and one area for each type of data, including initialized data (data explicitly assigned a value when execution begins), uninitialized data (BSS), and the program stack.
While a program runs, its VMAs are constantly being allocated, released, searched, split, and merged. All VMAs of a process are kept in a doubly linked list and, in addition, in an AVL tree to speed up lookups.
The memory areas of a process can be seen in /proc/pid/maps. Each line in /proc/*/maps corresponds to one vm_area_struct, and its fields are described below.
For example:
The fields in each row are described below:
- start-end: the start and end addresses of the virtual memory area.
- perm: a bitmask of the read, write, and execute permissions of the virtual memory area. The last character is p for private or s for shared.
- offset: where the virtual memory area begins in the mapped file. The kernel stores this value in pages, and /proc prints it as a byte offset in hexadecimal.
- major:minor: the major and minor device numbers of the device holding the mapped file.
- inode: the inode number of the mapped file.
- image: the name of the mapped file.
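A line of this form can be taken apart with sscanf. The sketch below is an illustration, not code from the kernel or from any library: struct map_entry and parse_maps_line are hypothetical names, and the sample line in the usage note is made up. It assumes the usual layout of a /proc/pid/maps line, in which an inode number sits between the device numbers and the file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct map_entry {
    unsigned long start, end;   /* start-end addresses */
    char perm[5];               /* e.g. "r-xp" */
    unsigned long offset;       /* offset into the mapped file */
    unsigned int major, minor;  /* device numbers */
    unsigned long inode;        /* inode of the mapped file */
    char image[256];            /* mapped file name, empty if anonymous */
};

/* Parses one maps line into *e. Returns 0 on success and -1 if the line
 * lacks the mandatory fields. The file name is optional. */
int parse_maps_line(const char *line, struct map_entry *e)
{
    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof *e);
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                   &e->start, &e->end, e->perm, &e->offset,
                   &e->major, &e->minor, &e->inode, e->image);
    return (n >= 7) ? 0 : -1;
}
```

For a made-up line such as "08048000-0804c000 r-xp 00000000 03:01 12345 /bin/cat", the function fills in start 0x08048000, end 0x0804c000, permissions "r-xp", device 3:1, inode 12345, and image "/bin/cat". Reading /proc/self/maps line by line with fgets and passing each line to parse_maps_line would list the areas of the calling process.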
VMAs are used when a page fault occurs, to set up demand paging or page replacement. All VMAs of a process are linked in a sorted doubly linked list, in ascending order of virtual address, and each VMA covers one contiguous address range.
1.4 Virtual Memory Management
1.4.1 Bulk Object Cache
The kernel provides kmalloc and kfree for allocating and freeing blocks of real physical memory at known addresses, and vmalloc and vfree for allocating and freeing virtual memory used by the kernel. Memory returned by kmalloc is better suited to uses such as device drivers.
When a structure or table has to hold a large number of object instances, vmalloc is normally used.
The addresses returned by kmalloc typically lie in the range 3 GB to high_memory, so each one differs from its physical address only by the constant PAGE_OFFSET, and no page table has to be modified for this range. The allocated virtual (logical) addresses and the physical addresses are both contiguous.
In other words, kmalloc allocates physical memory and returns a virtual address (the physical address shifted by a fixed offset).
vmalloc allocates virtual space above 3 GB + high_memory + VMALLOC_OFFSET, and the space is managed through the vmlist list. 3 GB is the address at which the kernel accesses physical memory, and high_memory is the highest address of physical memory actually available on the machine. VMALLOC_OFFSET is an 8 MB guard gap that protects against out-of-bounds accesses.
The structure that manages the virtual space allocated by vmalloc is listed below:
mm/vmalloc.c declares the global list head struct vm_struct *vmlist. In the vmlist list, a 4 KB guard gap separates each block of virtual memory from the next so that out-of-bounds pointer accesses are caught. The address of the first node is VMALLOC_START.
The function vmalloc allocates a virtually contiguous region whose size is rounded up to a whole number of pages. The physical pages behind it are allocated one page at a time and are generally not contiguous.
The function vmalloc is defined as follows:
    /* Allocate size bytes of virtually contiguous kernel memory;
     * the backing pages may come from high memory. */
    void *vmalloc(unsigned long size)
    {
        return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL);
    }
The function __vmalloc allocates enough pages to cover size and maps them into a contiguous range of kernel virtual space; the physical pages themselves are not necessarily contiguous. First, it looks in vmlist for a free block of virtual address space of a suitable size (get_vm_area(size)). Second, it sets up the page directory and page tables for that block and allocates free physical pages for it (get_free_page()). If this step fails, the virtual block must be released again (vfree).
1.4.2 Memory Mapping
Introduction to Memory Mapping
When an executable is run, it is mapped into the virtual address space of the process, forming a list of vm_area_struct structures; the operating system then loads only part of the program into physical memory. This way of linking an image into a process's virtual address space is called memory mapping. Through memory mapping, the contents of a file are linked directly into the virtual address space of the process.
As the vm_area_struct structures are created, Linux also initializes the standard operations on the virtual memory areas they describe. Translating logical addresses into physical addresses is the joint work of the kernel and the hardware memory management unit (MMU), which is part of the CPU. The kernel tells the MMU how to map each process's logical pages to particular physical pages, and the MMU performs the actual translation when the process issues a memory request. To reduce overhead, the most recent translation results are stored in the MMU's translation lookaside buffer (TLB). Linux does not manage the TLB explicitly, except that it occasionally notifies the CPU when a kernel operation has made TLB entries invalid.
The kernel maintains one or more arrays of struct page items that track all the physical memory on the system.
Some functions and macros can be used to convert between a struct page and a virtual address:
- struct page *virt_to_page(void *kaddr): this macro takes a kernel logical address and returns the associated struct page pointer. Because it requires a logical address, it does not work with memory returned by vmalloc or with high memory.
- void *page_address(struct page *page): returns the kernel virtual address of the page, if such an address exists. For high memory, the address exists only if the page has been mapped.
When you move or modify page tables, you should hold the page_table_lock spinlock.
Several files in the mm directory deal with memory mapping. The main function in mmap.c, do_mmap, maps the logical addresses in a file to virtual linear addresses, that is, it converts the logical address obtained from the file structure into the address needed by a vm_area_struct. The main function in mremap.c, sys_mremap, expands or shrinks an existing virtual memory area. The main functions in filemap.c handle memory mapping together with the page cache: they map the linear address space to memory and update the page cache. This part includes the I/O operations that read from and write to disk.
sys_brk system call
sys_brk provides the low-level operation underneath the C library's malloc and free. malloc uses the sys_brk system call to ask the kernel for a range of virtual address space, a VMA, for which the address mapping is established, with the whole VMA mapped to memory pages. free uses the same system call to tell the kernel which address space can be reclaimed, so that the established mapping is removed. The sys_brk system call operates on the heap size of the user process, letting the heap grow or shrink.
sys_brk calls do_brk, a simplified do_mmap that handles only anonymous mappings. do_brk maps the whole VMA to memory pages, unlike do_mmap, which leaves the pages to be brought in by demand paging.
mmap system call
A process can use the mmap() system call to map the contents of an open file into its user space. The system call directly invokes do_mmap() to perform the mapping. In this function, the parameter file is the file to map, addr is the address to map at, len is the length of the VMA, prot specifies the access permissions of the VMA, and flag holds the attributes of the VMA.
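From user space, the same parameters appear in the POSIX mmap() call. The sketch below is a hedged illustration rather than the kernel path described above: to stay self-contained it creates an anonymous private mapping (no file, descriptor -1) instead of mapping an open file, and the function name mmap_roundtrip is hypothetical.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one anonymous page, writes to it, reads it back, and unmaps it.
 * Returns 0 on success and -1 on failure. */
int mmap_roundtrip(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    /* addr = NULL lets the kernel choose the address; prot and flags play
     * the roles of the prot and flag parameters described in the text. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    strcpy(p, "hello");                      /* first touch faults the page in */
    int ok = (strcmp(p, "hello") == 0);

    if (munmap(p, len) != 0)                 /* remove the mapping again */
        return -1;
    return ok ? 0 : -1;
}
```

To map a file instead, one would pass a descriptor obtained from open() and a file offset in place of -1 and 0, and drop MAP_ANONYMOUS.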
1.4.3 Locking and protection of virtual memory
Linux can lock or protect any part of a virtual memory area. Locking a range of a process's virtual addresses comes down to OR-ing VM_LOCKED into the vm_flags attribute of the VMA. Once the virtual memory is locked, the physical pages behind it can no longer be swapped out by the page replacement code. The locked range stays locked until it is unlocked explicitly, or until the process that called mlock terminates or calls exec to run another program. A child process created by fork() does not inherit pages locked by its parent's mlock call. There are four locking system calls: mlock, munlock, mlockall, and munlockall.
1.5 Physical Memory Management
NUMA (non-uniform memory access) systems have memory that is physically distributed but logically unified, and they are the mainstream architecture for high-performance servers.
1.5.1 Overview of the buddy system
One of the most important problems in Linux kernel memory management is how to avoid fragmentation when memory is requested and released frequently. Linux uses the buddy system to deal with external fragmentation and slab to deal with internal fragmentation; external fragmentation is discussed first. There are two ways to avoid external fragmentation. One is to allocate non-contiguous memory, as described earlier. The other is to track memory in an efficient way so that the kernel does not carve a small request out of a large contiguous free block, which keeps large blocks contiguous and intact. The first approach clearly cannot serve as the general solution: the linear address space available for mapping non-contiguous memory is limited, and every mapping requires rewriting the kernel's page tables and then flushing the TLB, which slows allocation considerably. For a kernel that requests memory very often, this is unacceptable. Linux therefore takes the second approach to external fragmentation, which is the buddy system.
What is the buddy system?
The purpose of the buddy system is to satisfy each kernel memory request with the smallest block that fits. At first there is only one block, the whole of memory. Suppose it is 1 MB in size and the smallest block allowed is 64 KB. To satisfy a request for 200 KB, the 1 MB block is first split into two equal halves of 512 KB each; two halves that come from the same split are called buddies. The first 512 KB block is then split into two 256 KB halves, and the first 256 KB block is handed to the requester. That is one allocation.
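The arithmetic of this example can be checked with a few lines of C. This is a minimal sketch under the numbers used above (a 1 MB region and a 64 KB minimum block); the names REGION_KB, SMALLEST_KB, fit_block_kb, and splits_needed are hypothetical.

```c
#include <assert.h>

#define REGION_KB   1024u   /* whole memory in the example: 1 MB */
#define SMALLEST_KB 64u     /* smallest block allowed: 64 KB */

/* Smallest power-of-two block (in KB) that can hold the request,
 * or 0 if the request is larger than the whole region. */
unsigned fit_block_kb(unsigned request_kb)
{
    assert(request_kb > 0);
    if (request_kb > REGION_KB)
        return 0;
    unsigned block = SMALLEST_KB;
    while (block < request_kb)
        block *= 2;
    return block;
}

/* Number of times the whole region must be halved to obtain a block
 * of block_kb, starting from a single free block of REGION_KB. */
unsigned splits_needed(unsigned block_kb)
{
    assert(block_kb >= SMALLEST_KB && block_kb <= REGION_KB);
    unsigned splits = 0;
    for (unsigned size = REGION_KB; size > block_kb; size /= 2)
        splits++;
    return splits;
}
```

For the 200 KB request, fit_block_kb returns 256 and splits_needed(256) returns 2, matching the two splits (1 MB into 512 KB, then 512 KB into 256 KB) described above.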
Allocation and recycling of physical pages
Linux uses the buddy algorithm to allocate and reclaim blocks of pages efficiently. When pages are allocated, the allocator looks in the free_area array for a free block of the requested size. It starts by following the list field of the free_area entry for that size. If no free block of the requested size is found there, it looks for a block twice the requested size, and so on, until either the whole of free_area has been searched or a block that satisfies the request has been found.
If the block found is larger than the request, it is split in two: one half is used for the request (and split again if it is still too large), and the other half becomes a free block. Every block size is a power of 2 pages. The free half is linked onto the queue for its size, and the other block is returned to the caller.
When a block is freed, the allocator checks whether its buddy, the adjacent block of the same size, is also free. If so, the two are combined into one free block of twice the size. After each merge, the code checks whether the new block can be merged again into a still larger one.
The __get_free_page function is the upper-level interface for requesting free pages. The free_pages function frees pages.
Let's walk through how the buddy system allocates and reclaims memory blocks.
1 Initially, the system has 1 MB of contiguous memory and the smallest block allowed is 64 KB. In the figure, white areas are free blocks and shaded areas are allocated blocks.
2 Program A requests 34 KB of memory, which corresponds to order 0, that is, 2^0 = 1 minimum-size block.
2.1 The system has no order 0 (64 KB) block, so the order 4 (1 MB) block is split into two order 3 blocks (512 KB).
2.2 There is still no order 0 block, so an order 3 block is split into two order 2 blocks (256 KB).
2.3 There is still no order 0 block, so an order 2 block is split into two order 1 blocks (128 KB).
2.4 There is still no order 0 block, so an order 1 block is split into two order 0 blocks (64 KB).
2.5 An order 0 block is now available, and one of the two is given to program A. The buddy system now holds one order 0 block, one order 1 block, one order 2 block, and one order 3 block.
3 Program B requests 66 KB of memory, which corresponds to order 1, that is, 2^1 = 2 minimum-size blocks. The system has an order 1 block, so it is allocated directly.
4 Program C requests 35 KB of memory, which corresponds to order 0. The system has an order 0 block, so it is allocated directly.
5 Program D requests 67 KB of memory, which corresponds to order 1.
5.1 The system has no order 1 block, so the order 2 block is split into two order 1 blocks.
5.2 An order 1 block is now available and is allocated.
6 Program B frees its memory, an order 1 block.
7 Program D frees its memory.
7.1 An order 1 block is returned to free memory.
7.2 The buddy of this block is also free, so the two order 1 blocks are merged into one order 2 block.
8 Program A frees its memory, an order 0 block.
9 Program C frees its memory.
9.1 An order 0 block is freed.
9.2 Both order 0 buddies are free; they merge into an order 1 block.
9.3 Both order 1 buddies are free; they merge into an order 2 block.
9.4 Both order 2 buddies are free; they merge into an order 3 block.
9.5 Both order 3 buddies are free; they merge into an order 4 block.
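The walk-through can be replayed with a small simulation. This is a teaching sketch, not the kernel's allocator: it tracks free blocks in a plain table instead of the free_area lists, always takes the lowest-addressed free block, and every name in it (sim_init, sim_alloc, sim_release, sim_count_free, and the SIM_ constants) is hypothetical. The caller passes the original request size back on release so that the order can be recomputed.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MIN_KB    64                   /* smallest block, as in the walk-through */
#define SIM_MAX_ORDER 4                    /* order 4 = 16 * 64 KB = 1 MB */
#define SIM_UNITS     (1 << SIM_MAX_ORDER)

/* sim_free[o][i] is true when the i-th block of order o is free. */
static bool sim_free[SIM_MAX_ORDER + 1][SIM_UNITS];

/* Start with a single free order 4 block covering the whole 1 MB. */
void sim_init(void)
{
    for (int o = 0; o <= SIM_MAX_ORDER; o++)
        for (int i = 0; i < SIM_UNITS; i++)
            sim_free[o][i] = false;
    sim_free[SIM_MAX_ORDER][0] = true;
}

/* Order of the smallest block that holds size_kb (may exceed SIM_MAX_ORDER). */
int sim_order(unsigned size_kb)
{
    int order = 0;
    unsigned block = SIM_MIN_KB;
    while (block < size_kb && order <= SIM_MAX_ORDER) {
        block *= 2;
        order++;
    }
    return order;
}

/* Number of free blocks of the given order. */
int sim_count_free(int order)
{
    assert(order >= 0 && order <= SIM_MAX_ORDER);
    int n = 0;
    for (int i = 0; i < (SIM_UNITS >> order); i++)
        if (sim_free[order][i])
            n++;
    return n;
}

/* Allocates a block for size_kb; returns its offset in KB, or -1 if none fits. */
int sim_alloc(unsigned size_kb)
{
    int want = sim_order(size_kb);
    for (int o = want; o <= SIM_MAX_ORDER; o++) {
        for (int i = 0; i < (SIM_UNITS >> o); i++) {
            if (!sim_free[o][i])
                continue;
            sim_free[o][i] = false;
            while (o > want) {             /* split: keep lower half, free upper half */
                o--;
                i *= 2;
                sim_free[o][i + 1] = true;
            }
            return i * (SIM_MIN_KB << want);
        }
    }
    return -1;
}

/* Frees the block at offset_kb and merges with its buddy while the buddy is free. */
void sim_release(int offset_kb, unsigned size_kb)
{
    int o = sim_order(size_kb);
    assert(o <= SIM_MAX_ORDER && offset_kb >= 0);
    int i = offset_kb / (SIM_MIN_KB << o);
    while (o < SIM_MAX_ORDER && sim_free[o][i ^ 1]) {
        sim_free[o][i ^ 1] = false;        /* take the buddy off the free table */
        i /= 2;
        o++;
    }
    sim_free[o][i] = true;
}
```

Replaying the steps (allocate 34, 66, 35, and 67 KB for A, B, C, and D, then release B, D, A, and C) places the four programs at offsets 0, 128, 64, and 256 KB in this sketch and ends with a single free order 4 block, as in step 9.5. The index i ^ 1 selects the buddy because two buddies of the same order always differ only in the lowest bit of their block index.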
1.5.2 Cache and Slab
Structure of the cache
All instances of one kind of object live in the same cache, and different kinds of objects are placed in different caches. Each cache has several slabs, arranged in the order full, partially full, empty. The whole of kernel memory can be viewed as organized into such caches.
The sizes of objects that can be allocated from the general caches are defined in the kernel's global array malloc_sizes[]; each allocation can only be given one of the object sizes described by a cache_sizes structure.
Slab structure
The slab is the interface between kernel memory allocation and page-level allocation. All instances of an object are in the same cache, and each cache has several slabs, ordered full, partially full, and empty. The size of each slab is a whole multiple of the page size, and each slab holds several objects.
The slab algorithm is based on object caching: it preserves the invariant part of an object's initialized state. When a new object is created and the cache has a free object slot, the object is taken from that slot without being initialized again. When an object is released, it is simply marked free in the cache and is not destroyed. Only when system resources run low does the slab algorithm free part of the unused cache space. This avoids the time-consuming initialization and destruction of large numbers of commonly used objects.
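The idea of keeping freed objects in their constructed state can be shown with a toy cache. The sketch below is an assumption-laden illustration, not the kernel's slab code: it manages a single fixed array of four slots instead of page-sized slabs, and struct obj_cache, cache_init, cache_alloc, cache_free, and obj_ctor are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 4   /* objects per cache in this toy example */

struct obj {
    int payload;
};

struct obj_cache {
    struct obj slots[SLOTS];
    bool in_use[SLOTS];   /* slot currently handed out */
    bool ready[SLOTS];    /* constructor has already run for this slot */
    int ctor_calls;       /* counts how often the constructor ran */
};

/* The "expensive" one-time initialization of an object. */
static void obj_ctor(struct obj_cache *c, struct obj *o)
{
    o->payload = 0;
    c->ctor_calls++;
}

void cache_init(struct obj_cache *c)
{
    for (int i = 0; i < SLOTS; i++) {
        c->in_use[i] = false;
        c->ready[i] = false;
    }
    c->ctor_calls = 0;
}

/* Hands out a free slot; the constructor runs only the first time a slot is used. */
struct obj *cache_alloc(struct obj_cache *c)
{
    for (int i = 0; i < SLOTS; i++) {
        if (c->in_use[i])
            continue;
        if (!c->ready[i]) {
            obj_ctor(c, &c->slots[i]);
            c->ready[i] = true;
        }
        c->in_use[i] = true;
        return &c->slots[i];
    }
    return NULL;   /* cache exhausted */
}

/* Marks the slot free but keeps its constructed state for reuse. */
void cache_free(struct obj_cache *c, struct obj *o)
{
    ptrdiff_t i = o - c->slots;
    assert(i >= 0 && i < SLOTS && c->in_use[i]);
    c->in_use[i] = false;
}
```

Allocating, freeing, and allocating again returns the same slot while ctor_calls stays at 1, which is the saving the paragraph above describes. A real slab cache would additionally group slots into page-sized slabs and release empty slabs when memory runs low.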
1.5.3 Swap Space
The memory management system needs to move memory contents that are temporarily not in use out to external storage. Linux stores swapped-out pages in one of two ways: on a whole device, such as a hard disk partition, which is called a swap device, or in a fixed-length file in a file system, which is called a swap file. The two are collectively called swap space. Their internal format is the same. The first 4096 bytes hold a bitmap that ends with the string "SWAP-SPACE". Each bit of the bitmap corresponds to one page of swap space, and a set bit means that the page can be used for swapping. The 4096-byte blocks that follow are the space in which swapped-out pages are actually stored. Each swap space can therefore hold at most (4096 - 10) * 8 - 1 = 32687 pages.
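The capacity figure follows directly from the header layout. The sketch below reproduces the arithmetic; the names SWAP_HEADER_BYTES, SWAP_MAGIC_BYTES, and max_swap_pages are hypothetical, and the reading that the subtracted 1 accounts for the header page itself is an assumption.

```c
#include <assert.h>

#define SWAP_HEADER_BYTES 4096   /* size of the first block holding the bitmap */
#define SWAP_MAGIC_BYTES  10     /* length of the trailing "SWAP-SPACE" string */

/* Maximum number of pages one swap space can describe with this format. */
long max_swap_pages(void)
{
    /* The string literal has 10 characters plus a terminating NUL. */
    assert(sizeof("SWAP-SPACE") - 1 == SWAP_MAGIC_BYTES);

    long bitmap_bits = (SWAP_HEADER_BYTES - SWAP_MAGIC_BYTES) * 8L;  /* 32688 */
    return bitmap_bits - 1;      /* assumed: one bit covers the header page */
}
```

With 4 KB pages, 32687 pages correspond to a little under 128 MB of swap space per swap device or swap file in this format.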
Swap devices are much more efficient than swap files. On a swap device, the data blocks belonging to one page are always stored contiguously, so once the first block is located, the following blocks can be read or written sequentially. In a swap file, the data belonging to one page is logically contiguous (in the swap file's bitmap), but the blocks themselves may be scattered; their locations have to be looked up through the swap file's inode and depend on the file system that holds the swap file.
The swap file resides on a physical disk, so swapped-out pages are stored in clusters, and the pages within each cluster are stored contiguously.
1.5.4 Paging mechanism
A page fault occurs for one of three reasons:
- The program contains an error and the virtual address is not valid (it has not been mapped). Linux sends the SIGSEGV signal to the process and terminates it.
- The virtual address is valid, but the page it refers to is not currently in physical memory. This is a missing-page fault, and the operating system must load the page into physical memory, either from the image on disk (the page has never been loaded or belongs to a shared library) or from the swap file (the page was swapped out).
- The access is not permitted at that virtual address, for example a write to a read-only region. This is a protection fault.
For a valid virtual address, that is, one that has been allocated, Linux must work out where the missing page is when a page fault occurs: in the swap file or in an executable image. It does so using the information in the page table entry. If the page table entry is invalid but not empty, the page is in the swap file and the operating system loads it from there. If the entry is empty, the page has not been loaded yet; it is brought in from the executable image and the virtual address is mapped to physical memory.
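The decision described above can be written down as a small classification function. This is a simplified sketch, not the kernel's fault handler: the inputs are reduced to a few flags, the "present" bit position and the names SKETCH_PTE_PRESENT, enum fault_action, and classify_fault are assumptions made for illustration, and cases such as copy-on-write are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PTE_PRESENT UINT32_C(0x1)   /* assumed "page is in memory" bit */

enum fault_action {
    ACT_SIGSEGV,          /* invalid address: signal the process */
    ACT_PROTECTION,       /* access not permitted, e.g. write to read-only */
    ACT_LOAD_FROM_IMAGE,  /* empty entry: bring the page in from the image */
    ACT_LOAD_FROM_SWAP    /* non-empty, not present: page is in the swap file */
};

/* Classifies a fault from the validity of the address, the kind of access,
 * the permissions of the region, and the page table entry. */
enum fault_action classify_fault(bool addr_valid, bool write_access,
                                 bool region_writable, uint32_t pte)
{
    if (!addr_valid)
        return ACT_SIGSEGV;
    if (write_access && !region_writable)
        return ACT_PROTECTION;

    /* A present page with permitted access would not fault in this sketch. */
    assert((pte & SKETCH_PTE_PRESENT) == 0);

    if (pte == 0)
        return ACT_LOAD_FROM_IMAGE;
    return ACT_LOAD_FROM_SWAP;
}
```

In this sketch, a non-empty entry whose present bit is clear is taken to hold the location of the page in the swap file, which is how the text distinguishes a swapped-out page from one that was never loaded.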
1.5.5 The kswapd daemon
When physical memory runs short, the Linux memory management subsystem has to free some physical pages. This job is done by the kernel swap daemon, kswapd, which is in fact a kernel thread that is started during kernel initialization and runs periodically. Its task is to make sure the system has enough free pages so that the memory management subsystem can work effectively.
1.5.6 Memory management-related caches
Linux Memory Management Overview