Storage Management (2)

In Linux, the buddy algorithm (the "partner" algorithm) divides all free pages into 10 block groups. The block size in each group is a power of 2: the blocks in group 0 are 2^0 = 1 page, the blocks in group 1 are 2^1 = 2 pages, and so on up to group 9, whose blocks are 2^9 = 512 pages. That is to say, all blocks within a group are the same size, and blocks of the same size are linked together in a list.

A simple example is provided to illustrate how the algorithm works.

Assume that a block of 128 pages is requested (a block composed of multiple pages is called a page block). The algorithm first searches the linked list whose block size is 128 pages to see whether such a free block exists. If it does, it is allocated directly. If not, the algorithm moves on to the next larger size: it looks for a free block in the linked list whose block size is 256 pages. If such a free block exists, the kernel splits it into two halves of 128 pages each, one half to satisfy the request and the other to be inserted into the 128-page list. If no free block is found in the 256-page list either, the search continues with the next larger block, the 512-page list. If such a block exists, the kernel splits off 128 pages to satisfy the request, inserts the remaining 256-page half into the 256-page list, and inserts the leftover 128 pages into the 128-page list. If there are no free blocks in the 512-page list either, the algorithm gives up the allocation and signals an error.
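To make the search-and-split loop concrete, here is a minimal user-space C sketch of the idea. It is illustrative only: free_area, MAX_ORDER, and struct block echo the kernel's naming, but this is not the kernel's actual implementation, and bookkeeping such as locking is omitted.

    #include <stdlib.h>

    #define MAX_ORDER 10                 /* orders 0..9, i.e. 1..512 pages */

    struct block {
        unsigned long start;             /* first page frame number of the block */
        struct block *next;
    };

    static struct block *free_area[MAX_ORDER];   /* one free list per order */

    static void push(int order, struct block *b) {
        b->next = free_area[order];
        free_area[order] = b;
    }

    /* Allocate a block of 2^order pages: search upward for a non-empty
     * list, then split downward until the requested order is reached. */
    static struct block *buddy_alloc(int order) {
        for (int o = order; o < MAX_ORDER; o++) {
            struct block *b = free_area[o];
            if (!b)
                continue;                /* this list is empty, try a larger order */
            free_area[o] = b->next;      /* take the block off its list */
            while (o > order) {          /* split: each upper half goes back on a list */
                struct block *half = malloc(sizeof(*half));
                o--;
                half->start = b->start + (1UL << o);
                push(o, half);
            }
            return b;                    /* block of exactly 2^order pages */
        }
        return NULL;                     /* nothing large enough: signal failure */
    }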

Block release is the inverse of this process, and it is also what gives the algorithm its name. Two blocks that meet the following conditions are called buddies:

The two blocks are of the same size;

The physical addresses of the two blocks are consecutive.

The buddy algorithm combines two blocks that meet the preceding conditions into one. The merge is iterative: if the merged block can in turn be merged with its adjacent buddy, the algorithm continues merging.
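The buddy of a block of 2^n pages differs from it only in bit n of the starting page frame number, so it can be found with an XOR. The fragment below continues the user-space sketch above (it reuses free_area, push, and struct block); again, it is a simulation, not kernel code.

    /* Remove the block starting at pfn from the order-n free list;
     * return 1 if it was there (i.e. the buddy was free). */
    static int remove_if_free(unsigned long pfn, int order) {
        for (struct block **p = &free_area[order]; *p; p = &(*p)->next) {
            if ((*p)->start == pfn) {
                struct block *victim = *p;
                *p = victim->next;
                free(victim);
                return 1;
            }
        }
        return 0;
    }

    /* Free a block of 2^order pages: merge with the buddy while it is free. */
    static void buddy_free(unsigned long pfn, int order) {
        while (order < MAX_ORDER - 1 &&
               remove_if_free(pfn ^ (1UL << order), order)) {
            pfn &= ~(1UL << order);      /* merged block starts at the lower buddy */
            order++;                     /* and is one order larger */
        }
        struct block *b = malloc(sizeof(*b));
        b->start = pfn;
        push(order, b);                  /* insert the (possibly merged) block */
    }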


Linux uses the buddy algorithm to manage and allocate pages. However, for hardware reasons, different regions of memory have different characteristics, and there are two main problems: some hardware can perform DMA only to specific memory addresses, and on some architectures part of the memory cannot be permanently mapped into kernel space. Some memory must therefore be allocated from a specific region and cannot be managed by a single buddy system. To distinguish these areas, Linux defines three zones, each managed by its own buddy system. zone_dma contains memory that can be used for DMA operations; zone_normal contains memory that is permanently mapped into the kernel's virtual address space; zone_highmem contains memory that cannot be permanently mapped into the kernel address space. How these regions are divided depends on the architecture: on some architectures zone_normal covers all of memory and the other two zones are empty, while on x86 systems zone_dma covers the 0-16 MB range, zone_normal covers 16-896 MB, and zone_highmem contains all physical memory above 896 MB. The zones are managed separately, and different requests are served from different zones: a DMA request can be satisfied only from zone_dma, while ordinary memory requests can be satisfied from zone_normal, zone_highmem, or zone_dma. From the allocator's point of view, the three zones are simply three different memory pool objects that can be handled in the same way.

The system needs a large number of physical pages. When a program image is loaded into memory, the operating system must allocate pages; when the program finishes and is unloaded, the operating system must release them. Physical pages are also needed to store related data structures (such as the page tables themselves). This allocation and reclamation mechanism and its data structures are very important for keeping the virtual memory subsystem efficient. Every physical page in the system is described by a page data structure, so each physical page corresponds to one page variable. All the page variables of a zone form an array, and the zone's zone_mem_map member points to the start of that array. The page array is initialized at system startup.

The page allocator algorithm is based on the buddy system. The buddy system organizes a memory area into blocks of 2^n pages; n is called the order (or level) of the block, and blocks of the same order are linked together in lists. Each allocation must specify an order, and block sizes are measured in units of that order. During allocation, the lists are searched from order n up to the maximum order until a non-empty one is found. If the block found is of a higher order than n, it is split in two: one half is placed on the free list of the corresponding order, and the other half is split again if its order is still above n, until a block of order n can be returned. During reclamation, the buddy of the freed block is computed first, and the free list of order n is checked for it. If the buddy is found there, the two are merged (the buddy is removed from the free list and absorbed into the current block), n is incremented, and the merge process continues. When the buddy is not on the free list, merging ends.
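Both halves of this description, the order-based interface and the zone selection, surface in the kernel's page allocator API. The following is a minimal kernel-module-style sketch (not from the original text, error handling trimmed): the GFP flags steer the buddy allocator to a zone, and the second argument of alloc_pages is the order.

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static void zone_alloc_demo(void)
    {
        /* Order-2 block (4 contiguous pages) that must come from the
         * DMA zone, for hardware that can only address low memory. */
        struct page *dma = alloc_pages(GFP_DMA, 2);

        /* Single page from the normal zone: permanently mapped, so its
         * kernel virtual address is available immediately. */
        struct page *norm = alloc_pages(GFP_KERNEL, 0);
        void *va = norm ? page_address(norm) : NULL;

        /* A page that may come from the high memory zone: it has no
         * permanent kernel mapping and must be mapped (e.g. with kmap)
         * before the kernel touches it. */
        struct page *high = alloc_pages(GFP_HIGHUSER, 0);

        if (dma)
            __free_pages(dma, 2);
        if (norm)
            __free_pages(norm, 0);
        if (high)
            __free_pages(high, 0);
        (void)va;                       /* silence unused-variable warning */
    }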
The slab allocator. The Linux kernel makes many dynamic memory allocations, and the requested object sizes vary widely. To serve them, the kernel provides the slab layer, which plays the role of a cache layer for common data structures. The slab layer groups caches by object type, each cache storing a single type of object: one cache holds task_struct objects, for example, and another holds inodes. Each cache contains several slabs, and a slab consists of one or more physically contiguous pages; for common data structures, each slab is a single page. Each slab contains a number of objects, that is, instances of the managed data structure. When the system allocates an object, it obtains it from a slab: the allocator first tries a partially full slab in the cache, then an empty slab, and if no slab is available at all it creates a new slab and allocates from that. Because each slab is a cache block containing objects of the same type, allocating and releasing objects is simpler and faster.

In addition, because objects are usually already close to their initialized state when they are released, a great deal of initialization time can be saved. For example, to allocate an inode variable you would otherwise need memory of sizeof(inode) from malloc and then initialize the inode's data members; yet the contents of the memory at free time already resemble the initialized state. An inode's reference count, for instance, must have dropped to zero before it is freed. Many kernel data structures are back in their initial state when they are freed: a mutex lock is in the unlocked state both at initialization and at release. So as long as objects are put into a valid state when the cache is initialized, the fields of every object allocated later are already determined, making repeated initialization unnecessary and saving considerable cost. In Linux, kmem_cache_create takes a ctor initialization-function parameter that can be used for this purpose, although Linux does not appear to exploit this slab feature much (call sites commonly pass a NULL ctor).

kmalloc. The kmalloc memory allocation function is similar to the application-level malloc function, and it runs very fast unless it blocks. The memory it returns is not cleared; it still holds whatever data was there before. Physical memory is managed by the kernel and can be allocated only in page-sized units, which calls for a page-oriented allocation technique to achieve maximum flexibility in memory management; a simple linear allocation technique like malloc's is no longer effective. In a page-oriented system like the Linux kernel, maintaining memory with a linear allocation policy would be difficult: fragmentation would quickly become a problem, wasting memory and reducing system performance. Instead, Linux maintains page pools to handle kmalloc allocation requests, so pages can easily be added to or taken from a pool. To satisfy allocation requests larger than PAGE_SIZE bytes, the mm/slab.c file maintains lists of page clusters, each cluster holding several contiguous pages (usable for DMA allocation). The allocation policy Linux finally settled on is that the kernel allocates only certain predefined, fixed-size byte blocks: if you request an arbitrary amount of memory, you may be given somewhat more than you asked for.
The predefined sizes are generally "slightly smaller than a power of 2" (in more recent implementations the managed sizes are exactly powers of 2). Remembering this lets you use memory more effectively. For example, if you need a buffer of a little under 2 KB, request that size rather than 2048 bytes. Requesting a size that is exactly a power of 2 is the worst case: the kernel may allocate twice the requested amount.

In general, Linux tries its best to map all physical memory into the kernel address space during initialization. If the kernel address space starts at 0xc0000000 and 128 MB of virtual address space is reserved for vmalloc, then at most 1 GB - 128 MB (896 MB) of physical memory can be mapped directly into kernel space and accessed directly by the kernel. Physical memory beyond that is called high memory; the kernel cannot access it directly, but only after modifying the page-table mapping. Memory zoning makes kernel page allocation more reasonable: when the system has more than about 1 GB of physical memory, the kernel cannot pre-map all of it into kernel space, which is what produces high memory. High memory is best suited for mapping into user process space, while the pre-mapped part can be used directly for kernel buffers. A small block of memory suitable for DMA operations is set aside for DMA allocations and is not handed out lightly. Memory zoning also accommodates discontiguous physical memory layouts and is the basis of non-uniform memory access (NUMA) systems.
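Before turning to NUMA hardware, a rough sketch ties together the slab and kmalloc interfaces described above. struct my_obj and alloc_demo are invented names for illustration, and the sizes are arbitrary; kmem_cache_create, kmem_cache_alloc, kmalloc, and ksize are the real kernel interfaces.

    #include <linux/slab.h>
    #include <linux/errno.h>
    #include <linux/kernel.h>

    struct my_obj {                          /* hypothetical object type */
        int refcount;
        char name[32];
    };

    static struct kmem_cache *my_cache;

    static int alloc_demo(void)
    {
        struct my_obj *o;
        void *buf;

        /* One cache per object type; the last parameter is the optional
         * ctor mentioned above (commonly NULL in practice). */
        my_cache = kmem_cache_create("my_obj", sizeof(struct my_obj),
                                     0, 0, NULL);
        if (!my_cache)
            return -ENOMEM;

        o = kmem_cache_alloc(my_cache, GFP_KERNEL);
        if (o)
            kmem_cache_free(my_cache, o);    /* back to its slab */

        /* kmalloc rounds a request up to the nearest size class;
         * ksize reports how much was really granted. */
        buf = kmalloc(2000, GFP_KERNEL);
        if (buf) {
            pr_info("asked for 2000, got %zu bytes\n", ksize(buf));
            kfree(buf);
        }

        kmem_cache_destroy(my_cache);
        return 0;
    }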

Hardware tends to use multiple system buses, each of which serves a group of processors. Each processor group has its own memory and possibly its own I/O channels, yet each CPU can still access the memory associated with the other groups in a uniform way. Each group is called a "NUMA node"; the number of CPUs in a NUMA node depends on the hardware vendor. Accessing local memory is faster than accessing memory associated with other NUMA nodes, which is the origin of the name "non-uniform memory access architecture".
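As a brief illustration (not part of the original text), the kernel's page allocator lets a caller request memory from a specific node. alloc_pages_node is a real interface; the node number 0 below is just a placeholder.

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static void numa_demo(void)
    {
        /* One page from NUMA node 0; a real caller would pick its local
         * node, e.g. via numa_node_id() or the device's dev_to_node(). */
        struct page *local = alloc_pages_node(0, GFP_KERNEL, 0);

        if (local)
            __free_pages(local, 0);
    }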

On 32-bit machines, page tables could traditionally be stored only in low memory. Low memory is limited to the first 896 MB of physical memory, and it must also satisfy most other kernel needs. When applications use a large number of processes and map large amounts of memory, low memory can run out quickly. The 2.6 kernel has a configuration option called Highmem PTE that allows page-table entries to be stored in high memory, freeing more of the low-memory area for the other kernel data structures that must live there. Processes whose page-table entries live in high memory run slightly slower; but for systems with many running processes, storing page tables in high memory squeezes more memory out of the low-memory area.

Allocating and releasing virtual memory. For small, contiguous areas allocated from physical memory, use the kmalloc and kfree functions; for large areas, use vmalloc. The space vmalloc allocates is contiguous in virtual memory, but the physical pages mapped behind it need not be contiguous: each page is obtained with an independent call to __get_free_page, yet the kernel sees the whole region as sequential in the address space. The allocated space is mapped into kernel space and is invisible to user space, which distinguishes vmalloc from the other allocation techniques. If vmalloc fails, it returns 0 (the null address); if the request succeeds, it returns a pointer to a linear address space of the requested size. Internally, the memory allocated by vmalloc is tracked by a linked list of vm_struct structures. Unlike other memory allocation functions, vmalloc returns very "high" addresses, higher than the top of physical memory: vmalloc adjusts the page tables so that the allocated pages can be reached through a contiguous range of these high addresses, and the processor can then access the returned region. The kernel can use addresses returned by vmalloc like any other address, but the address used in the program is not the same as the address placed on the address bus. Addresses allocated by vmalloc cannot be used outside the microprocessor, because they are meaningful only to the processor's paging unit. When a driver needs a true physical address (such as a DMA address used by a peripheral to drive the system bus), it cannot obtain one through vmalloc. The correct use of vmalloc is to allocate a large, contiguous region of memory as a software buffer. Note that vmalloc has higher overhead than __get_free_page, because it must both obtain the memory and build the page tables; it is therefore not worthwhile to call vmalloc to allocate a single page. The kernel virtual memory allocated by vmalloc and the kernel logical memory allocated by kmalloc/__get_free_page live in different, non-overlapping intervals, because kernel space is managed in partitions, each with its own duty: user space occupies 0-3 GB, the physical memory mapping interval follows 3 GB, and after it comes the address space for vmalloc allocations, which starts at VMALLOC_START.
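A minimal sketch of the vmalloc/vfree pairing just described; the 4 MB size is arbitrary.

    #include <linux/vmalloc.h>

    static void vmalloc_demo(void)
    {
        /* A large buffer: virtually contiguous, physically scattered
         * pages; fine as a software buffer, useless for DMA. */
        char *buf = vmalloc(4 * 1024 * 1024);       /* 4 MB */
        if (!buf)
            return;                  /* vmalloc returns NULL (0) on failure */

        buf[0] = 1;                  /* usable like any kernel pointer ... */
        buf[4 * 1024 * 1024 - 1] = 1;/* ... across the whole range */

        vfree(buf);                  /* counterpart of vmalloc */
    }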
One example of vmalloc use is the create_module system call, which uses vmalloc to obtain the memory needed by the module being created. After insmod relocates the module's code, it calls the memcpy_fromfs function to copy the module itself into the allocated space. The vfree function releases memory allocated with vmalloc, just as kfree releases memory allocated by kmalloc.

Like vmalloc, ioremap builds new page tables; unlike vmalloc, however, ioremap does not actually allocate any memory. The return value of ioremap is a virtual address that can be used to access the specified physical memory region, and the mapping is released when no longer needed (historically with vfree; modern kernels use iounmap). ioremap is useful for mapping high-memory PCI buffers into kernel space. For example, the frame buffer of a VGA device may be mapped at physical address 0xf0000000 (a typical value), and ioremap can create the correct page tables so that the processor can access it. The page tables created during system initialization cover only the low end of the physical address space; the initialization process does not probe PCI buffers, which are managed by the drivers themselves. If you want drivers that can be ported between platforms, be careful when using ioremap: on some platforms, such as Alpha, the PCI memory region cannot be directly mapped into the processor's address space. In that case you should use the readb function or other I/O functions instead of accessing the remapped region like ordinary memory; these functions are portable across platforms.

There is no hard limit on the amount of memory the vmalloc and ioremap functions can handle, although to catch programmer mistakes vmalloc refuses to allocate more memory than physically exists. Requesting too much memory from vmalloc can nevertheless lead to the same problems as oversized kmalloc requests. Both ioremap and vmalloc are page-oriented (both modify page tables), so the space actually allocated or released is rounded up to the nearest page boundary; moreover, ioremap makes no provision for remapping physical addresses that are not page-aligned.
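A sketch of the ioremap usage just described, using the text's example frame-buffer address and the portable accessors; FB_PHYS and FB_SIZE are illustrative values.

    #include <linux/io.h>

    #define FB_PHYS 0xf0000000UL   /* illustrative VGA frame-buffer address from the text */
    #define FB_SIZE 0x10000UL      /* illustrative mapping size */

    static void ioremap_demo(void)
    {
        /* Build page tables for the device region; no memory is allocated. */
        void __iomem *fb = ioremap(FB_PHYS, FB_SIZE);
        if (!fb)
            return;

        /* Use the portable accessors rather than dereferencing the pointer,
         * as the text advises for platforms such as Alpha. */
        u8 first = readb(fb);
        writeb(first, fb);

        iounmap(fb);               /* modern counterpart of the vfree mentioned above */
    }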
