Reprinted from http://www.itmian4.com/forum.php?mod=viewthread&tid=2867
1. What kernel locks does Linux have?
The synchronization mechanisms of Linux were continuously improved from 2.0 through 2.6, from the initial atomic operations to semaphores, and from the big kernel lock to today's spin locks. The development of these mechanisms accompanied the transition from uniprocessors to symmetric multiprocessors. The main Linux kernel locks are spin locks and semaphores.
A spin lock can be held by at most one thread of execution. If a thread attempts to acquire a spin lock that is already held, it busy-waits, spinning in a loop, until the lock becomes available again. If the lock is not contended, the requesting thread obtains it immediately and continues. At any given time, a spin lock thus prevents more than one thread of execution from entering the critical section.
Semaphores in Linux are sleeping locks. If a task tries to acquire a semaphore that is already held, the semaphore places it on a wait queue and puts it to sleep, leaving the processor free to execute other code. When the process holding the semaphore releases it, one task on the wait queue is woken up and acquires the semaphore.
This sleeping behavior makes semaphores suitable for locks that are held for a long time. They can be used only in process context, because interrupt context cannot be scheduled. Note also that code holding a spin lock must not acquire a semaphore, since acquiring a semaphore may sleep.
Alongside the transition from a non-preemptible to a preemptible kernel, the locking mechanisms of Linux have become ever more effective, and ever more complex.
Synchronization mechanisms in the Linux kernel include atomic operations, semaphores, read/write semaphores, and spin locks, as well as big kernel locks, read/write locks, big reader locks, RCU (Read-Copy Update), and sequence locks.
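A minimal sketch of how the two main lock types are used, assuming a 2.6-era kernel (semaphore headers and the DECLARE_MUTEX macro moved or were renamed in later versions):

#include <linux/spinlock.h>
#include <asm/semaphore.h>   /* <linux/semaphore.h> on newer kernels */
#include <linux/errno.h>

static DEFINE_SPINLOCK(my_lock);   /* spin lock: never sleeps, short sections */
static DECLARE_MUTEX(my_sem);      /* semaphore with count 1 */

static void fast_path(void)
{
        unsigned long flags;

        spin_lock_irqsave(&my_lock, flags);   /* safe in interrupt context */
        /* ... touch shared data; sleeping is forbidden here ... */
        spin_unlock_irqrestore(&my_lock, flags);
}

static int slow_path(void)
{
        if (down_interruptible(&my_sem))      /* may sleep: process context only */
                return -ERESTARTSYS;
        /* ... long-running work; sleeping is allowed ... */
        up(&my_sem);
        return 0;
}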
2. What do user mode and kernel mode mean in Linux?
MS-DOS and similar operating systems run the CPU in a single mode, but Unix-like operating systems use a dual-mode design, which makes effective time-sharing possible. On a Linux machine, the CPU is either in trusted kernel mode or in restricted user mode. Apart from the kernel itself, all user processes run in user mode.
Kernel-mode code has unrestricted access to the full processor instruction set and to all of memory and I/O space. If a user-mode process needs these privileges, it must go through a system call to a device driver or other kernel-mode code. Furthermore, user-mode code is allowed to take page faults, while kernel-mode code is not.
In kernels 2.4 and earlier, user-mode processes could always be context-switched out and preempted by other processes, while kernel-mode code could occupy the CPU exclusively until:
(1) it voluntarily relinquished the CPU; or
(2) an interrupt or exception occurred.
Kernel preemption was introduced in the 2.6 kernel: most kernel-mode code can now be preempted as well.
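A user-space illustration of the mode switch (plain C, not from the original post): getpid is simply a thin wrapper around a system call, which carries the process from user mode into kernel mode and back.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
        /* The same kernel service, via the libc wrapper and via a raw syscall. */
        printf("getpid() via libc   : %ld\n", (long)getpid());
        printf("getpid() via syscall: %ld\n", (long)syscall(SYS_getpid));
        return 0;
}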
3. How do you allocate a large block of kernel memory?
In the Linux kernel environment, the probability that an allocation of a large block of memory succeeds decreases as system uptime grows. Although the vmalloc family can allocate physically non-contiguous memory behind a contiguous range of virtual addresses, it is comparatively inefficient to use, and on 32-bit systems the vmalloc address space is limited. It is therefore generally recommended to allocate large blocks during system startup, but even then success is only more likely, not guaranteed. If a program truly depends on the allocation succeeding, the only option is boot memory. Below is sample code that reserves boot memory and exports it:
/* Reserve boot memory and export it for later use (2.6-era bootmem API). */
void *x_bootmem = NULL;
EXPORT_SYMBOL(x_bootmem);

unsigned long x_bootmem_size = 0;
EXPORT_SYMBOL(x_bootmem_size);

/* Parse "x-bootmem=<size>" from the kernel command line and reserve it. */
static int __init x_bootmem_setup(char *str)
{
        x_bootmem_size = memparse(str, &str);
        x_bootmem = alloc_bootmem(x_bootmem_size);
        printk("Reserved %lu bytes from %p for x\n", x_bootmem_size, x_bootmem);
        return 1;
}
__setup("x-bootmem=", x_bootmem_setup);
As can be seen, reserving boot memory is fairly simple, but advantages and disadvantages always come together, and it has unavoidable restrictions:
The reservation code must be linked into the kernel proper and cannot live in a module (a hypothetical module-side sketch follows this list).
The reserved memory is not managed or accounted for by the page allocator or the slab allocator; it sits outside the memory visible to the system, even if it is freed again somewhere later.
Users generally reserve a single large block; any more elaborate memory management on top of it has to be implemented by hand.
When a memory allocation must not fail, memory reserved at boot is the only choice.
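A hypothetical module-side sketch of working around the first restriction: the reservation itself must be built into the kernel, but a module can reach the region through the exported symbols (x_bootmem and x_bootmem_size are the variables from the sample above):

#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>

extern void *x_bootmem;               /* exported by the built-in code */
extern unsigned long x_bootmem_size;

static int __init x_user_init(void)
{
        if (!x_bootmem)
                return -ENOMEM;
        printk("using %lu reserved bytes at %p\n", x_bootmem_size, x_bootmem);
        /* manage this region yourself, e.g. with a private allocator */
        return 0;
}

static void __exit x_user_exit(void) { }

module_init(x_user_init);
module_exit(x_user_exit);
MODULE_LICENSE("GPL");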
4. What are the main methods for inter-process communication?
(1) Pipe: pipes can be used for communication between related processes, allowing a process to communicate with another process that shares a common ancestor with it (a minimal user-space example follows this list).
(2) Named pipe (FIFO): the named pipe overcomes the pipe's lack of a name, so besides everything a pipe can do, it also allows communication between unrelated processes. A named pipe has a corresponding file name in the file system and is created with the mkfifo command or the mkfifo() system call.
(3) Signal: a comparatively complex communication method used to notify the receiving process that some event has occurred; besides inter-process communication, a process can also send signals to itself. In addition to the signal function with the early UNIX signal semantics, Linux supports the sigaction function of the POSIX.1 standard (which is in fact based on BSD; to build a reliable signal mechanism and unify the external interface, BSD reimplemented the signal function on top of sigaction).
(4) Message queue: a linked list of messages, including POSIX message queues and System V message queues. A process with sufficient permissions can add messages to a queue, and a process with read permission can read messages from it. Message queues overcome both the limited amount of information a signal can carry and the limitations of pipes, which carry only unformatted byte streams and have bounded buffer sizes.
(5) Shared memory: lets multiple processes access the same region of memory; it is the fastest form of IPC available and was designed precisely because other communication mechanisms run less efficiently. It is often used together with other mechanisms, such as semaphores, to achieve synchronization and mutual exclusion between processes.
(6) Semaphore: mainly used for synchronization between processes, and between different threads of the same process.
(7) Socket: a more general inter-process communication mechanism that also works between processes on different machines. Originally developed for the BSD branch of UNIX, it has since been ported to other UNIX-like systems: both Linux and the System V variants support sockets.
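A minimal user-space sketch of method (1): a parent and child communicating over a pipe.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
        int fd[2];
        char buf[32];

        if (pipe(fd) == -1)
                return 1;

        if (fork() == 0) {              /* child: writes into the pipe */
                close(fd[0]);
                write(fd[1], "hello", 6);
                close(fd[1]);
                _exit(0);
        }
        close(fd[1]);                   /* parent: reads from the pipe */
        read(fd[0], buf, sizeof(buf));
        printf("parent got: %s\n", buf);
        close(fd[0]);
        wait(NULL);
        return 0;
}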
5. Which functions are used to allocate kernel memory through the buddy system?
Physical page management implements a zone-based buddy system: each zone manages its own memory with an independent buddy system and tracks its free pages independently. The corresponding interfaces are alloc_pages(gfp_mask, order), __get_free_pages(gfp_mask, order), and so on (a short usage sketch follows).
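A sketch of the page-frame interface, assuming kernel context; the order value 2 is just an example (1 << 2 = 4 contiguous frames, 16 KB with 4 KB pages):

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/string.h>

static struct page *pg;

static int grab(void)
{
        pg = alloc_pages(GFP_KERNEL, 2);   /* 4 contiguous page frames */
        if (!pg)
                return -ENOMEM;
        memset(page_address(pg), 0, 4 * PAGE_SIZE);  /* kernel virtual address */
        return 0;
}

static void drop(void)
{
        __free_pages(pg, 2);               /* free with the same order */
}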
Additional knowledge:
1. Principles
The Linux kernel uses a memory paging model that fits both 32-bit and 64-bit systems. For 32-bit systems two-level page tables are sufficient, while x86_64 systems use four-level page tables:
* Page Global Directory (PGD)
* Page Upper Directory (PUD)
* Page Middle Directory (PMD)
* Page Table (PTE)
The page global directory contains the addresses of several page upper directories; each page upper directory in turn contains the addresses of several page middle directories, each of which contains the addresses of several page tables; finally, each page table entry points to a page frame. Linux uses a 4 KB page as its standard memory allocation unit.
(Figure: multi-level page directory structure)
1.1. The buddy system algorithm
In practice we often need to allocate sets of contiguous page frames, and frequent allocation and freeing of contiguous blocks of different sizes inevitably scatters many free page frames among allocated blocks. Then, even though those page frames are free, other requests for large contiguous blocks can no longer be satisfied.
To avoid this problem, the kernel introduces the buddy system algorithm. All free page frames are grouped into 11 linked lists, holding blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames respectively. A single request can therefore span at most 1024 contiguous page frames, corresponding to 4 MB of contiguous memory. The physical address of the first page frame in each block is an integer multiple of the block's size.
Suppose a block of 256 page frames is requested. The allocator first searches the 256-frame list for a free block; if none exists, it looks in the 512-frame list, and if a block is found there, it is split into two halves of 256 frames: one satisfies the request, and the other is moved to the 256-frame list. If the 512-frame list is also empty, the search continues in the 1024-frame list; if that too fails, an error is returned.
When a block of page frames is freed, the kernel checks whether its buddy, the adjacent free block of the same size, is also free; if so, the two are automatically merged into one larger block, and the merging repeats upward through the sizes. A small sketch of the size-to-order mapping follows.
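A small user-space sketch (illustrative, not kernel code) of how a request size rounds up to one of the 11 free-list orders:

#include <stdio.h>

#define PAGE_SIZE 4096UL

static int order_for(unsigned long bytes)
{
        unsigned long pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
        int order = 0;

        while ((1UL << order) < pages)
                order++;                /* round up to the next power of two */
        return order > 10 ? -1 : order; /* beyond 1024 frames: refuse */
}

int main(void)
{
        printf("100 KB -> order %d\n", order_for(100 * 1024)); /* order 5: 32 frames */
        printf("4 MB   -> order %d\n", order_for(4UL << 20));  /* order 10: 1024 frames */
        return 0;
}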
1.2. The slab allocator
The slab allocator originates from the allocation algorithm in Solaris 2.4. It sits on top of the physical page-frame allocator and manages caches of objects of specific sizes, so that memory of those sizes can be allocated quickly and efficiently.
The slab allocator creates a separate cache for each type of kernel object in use. Since the Linux kernel already manages physical memory with the buddy system, the slab allocator works directly on top of it. Each cache consists of multiple slabs, and each slab is a set of contiguous physical page frames divided into a fixed number of objects. Depending on the object size, a slab may by default consist of up to 1024 page frames. Because of alignment and related requirements, the memory assigned to an object in a slab may be larger than the size the caller actually asked for, which wastes some memory.
2. Common memory allocation functions
2.1. __get_free_pages
unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
The __get_free_pages function is the most primitive allocation method: it obtains page frames directly from the buddy system, and the return value is the start address of the first page frame. Internally, __get_free_pages wraps the alloc_pages function. Code analysis shows that alloc_pages allocates a block of 1 << order contiguous page frames, with order bounded by the MAX_ORDER macro in include/linux/mmzone.h; in the default 2.6.18 kernel, a single call can allocate at most 1 << 10 = 1024 page frames of 4 KB each, i.e., 4 MB of contiguous physical memory.
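A minimal usage sketch, assuming kernel-module context (the order value 3 is illustrative: 1 << 3 = 8 frames, 32 KB):

#include <linux/gfp.h>
#include <linux/mm.h>

static unsigned long addr;

static int demo_alloc(void)
{
        addr = __get_free_pages(GFP_KERNEL, 3);  /* 8 contiguous page frames */
        if (!addr)
                return -ENOMEM;
        return 0;
}

static void demo_free(void)
{
        free_pages(addr, 3);    /* must free with the same order */
}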
2.2. kmem_cache_alloc
struct kmem_cache *kmem_cache_create(const char *name, size_t size,
        size_t align, unsigned long flags,
        void (*ctor)(void *, struct kmem_cache *, unsigned long),
        void (*dtor)(void *, struct kmem_cache *, unsigned long));
void *kmem_cache_alloc(struct kmem_cache *c, gfp_t flags);
kmem_cache_create/kmem_cache_alloc is the allocation method based on the slab allocator, suitable for repeatedly allocating and freeing memory blocks of the same size. First create a cache with kmem_cache_create, then obtain new memory blocks from that cache with kmem_cache_alloc. The maximum size kmem_cache_alloc can hand out at a time is determined by the MAX_OBJ_ORDER macro in mm/slab.c; in the default 2.6.18 kernel this macro is defined as 5, so a single allocation can be at most 1 << 5 pages, i.e., 32 * 4 KB = 128 KB of contiguous physical memory. Analysis of the kernel source shows that when the size parameter of kmem_cache_create exceeds 128 KB, BUG() is called. Testing verifies this analysis: the kernel crashes when kmem_cache_create is asked for more than 128 KB.
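A usage sketch against the 2.6.18-era prototype above (the struct foo cache is hypothetical; later kernels dropped the dtor argument):

#include <linux/slab.h>

struct foo { int a, b; };

static struct kmem_cache *foo_cache;

static int foo_init(void)
{
        foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                      0, 0, NULL, NULL);  /* no ctor/dtor */
        if (!foo_cache)
                return -ENOMEM;
        return 0;
}

static void foo_use(void)
{
        struct foo *f = kmem_cache_alloc(foo_cache, GFP_KERNEL);
        if (f)
                kmem_cache_free(foo_cache, f);  /* return object to its cache */
}

static void foo_fini(void)
{
        kmem_cache_destroy(foo_cache);
}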
2.3. kmalloc
void *kmalloc(size_t size, gfp_t flags);
kmalloc is the most common memory allocation method in the kernel; it is implemented by calling kmem_cache_alloc. The maximum size kmalloc can allocate at a time is determined by the contents of include/linux/kmalloc_sizes.h; in the default 2.6.18 kernel, kmalloc can allocate at most 131072 bytes, i.e., 128 KB of contiguous physical memory. Testing shows that an attempt to allocate more than 128 KB with kmalloc fails at compile time.
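A minimal kmalloc sketch (sizes and flags are illustrative):

#include <linux/slab.h>

static int demo(void)
{
        char *buf  = kmalloc(4096, GFP_KERNEL);  /* may sleep */
        char *abuf = kmalloc(64, GFP_ATOMIC);    /* interrupt context: no sleep */

        if (!buf || !abuf) {
                kfree(buf);                      /* kfree(NULL) is safe */
                kfree(abuf);
                return -ENOMEM;
        }
        /* ... use the buffers ... */
        kfree(abuf);
        kfree(buf);
        return 0;
}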
2.4. vmalloc
void *vmalloc(unsigned long size);
The allocation methods above all return physically contiguous memory, which guarantees low average access times. In some scenarios, however, the memory region is not accessed very frequently and higher access times are acceptable; there, memory that is linearly contiguous but physically non-contiguous can be used, with the advantage that a large region can be allocated in one call. Figure 3-1 shows the address range that vmalloc uses. vmalloc has no explicit limit on the size of a single allocation, but for performance reasons it should be used with caution. In testing, at most about 1 GB could be allocated in one call.
(Figure 3-1: Linux kernel memory layout)
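A minimal vmalloc sketch (the 16 MB size is illustrative, far beyond kmalloc's limit):

#include <linux/vmalloc.h>

static void *big;

static int demo_init(void)
{
        big = vmalloc(16 * 1024 * 1024);   /* virtually contiguous, physically scattered */
        if (!big)
                return -ENOMEM;
        return 0;
}

static void demo_exit(void)
{
        vfree(big);
}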
2.5. dma_alloc_coherent
void *dma_alloc_coherent(struct device *dev, size_t size,
        dma_addr_t *dma_handle, gfp_t gfp);
DMA is a hardware mechanism that allows peripheral devices and main memory to exchange I/O data directly, without CPU involvement, and it can greatly increase the throughput of communication with a device. DMA operations raise the issue of consistency between the CPU cache and the corresponding data in memory, and this consistency must be guaranteed. On the x86_64 architecture the hardware already solves this problem, so the implementations of dma_alloc_coherent and __get_free_pages differ little; the former ultimately calls __alloc_pages to allocate memory, so its per-call size limit matches the latter's, and memory allocated by __get_free_pages can also be used for DMA. Test results show that dma_alloc_coherent can allocate at most 4 MB at a time.
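A usage sketch; the struct device pointer would come from the driver's probe routine, and the 64 KB size is illustrative:

#include <linux/dma-mapping.h>

static void *cpu_addr;
static dma_addr_t bus_addr;

static int dma_buf_init(struct device *dev)
{
        /* cpu_addr is for the CPU; bus_addr is handed to the device. */
        cpu_addr = dma_alloc_coherent(dev, 64 * 1024, &bus_addr, GFP_KERNEL);
        if (!cpu_addr)
                return -ENOMEM;
        return 0;
}

static void dma_buf_exit(struct device *dev)
{
        dma_free_coherent(dev, 64 * 1024, cpu_addr, bus_addr);
}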
2.6. ioremap
void *ioremap(unsigned long offset, unsigned long size);
ioremap is a more direct way of obtaining memory: you specify the physical start address and the size outright, and ioremap maps that physical range into the kernel's virtual address space. Unlike the allocation methods above, the physical address space used with ioremap is determined in advance; no new physical memory is allocated. ioremap is mostly used by device drivers to let the CPU directly access the I/O space of external devices. Since the range ioremap can map is determined by the pre-existing physical region, no size tests were conducted.
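A usage sketch with hypothetical register addresses (a real driver would take them from its PCI BARs or platform resources):

#include <linux/io.h>

#define DEV_PHYS_BASE  0xfebf0000UL   /* hypothetical MMIO region */
#define DEV_REG_SIZE   0x1000UL

static void __iomem *regs;

static int map_regs(void)
{
        regs = ioremap(DEV_PHYS_BASE, DEV_REG_SIZE);
        if (!regs)
                return -ENOMEM;
        writel(0x1, regs + 0x10);     /* hypothetical control register */
        return 0;
}

static void unmap_regs(void)
{
        iounmap(regs);
}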
2.7. Boot memory
If none of the allocation functions above can satisfy a request for a very large amount of contiguous physical memory, the only option is to reserve part of the memory in a special way during the Linux kernel boot phase.
2.7.1. Allocating memory during kernel boot
void *alloc_bootmem(unsigned long size);
The buddy system can be bypassed during the Linux kernel boot process to allocate large memory blocks: before mem_init is called during boot, the alloc_bootmem function can reserve memory of a specified size. If this memory is needed elsewhere, export the start address returned by alloc_bootmem with EXPORT_SYMBOL, after which the memory can be used. The drawbacks of this method are that the reservation code must be linked into the kernel proper, so the kernel must be recompiled, and that the memory management system cannot see this memory, so it must be managed by hand. Test results show that after recompiling the kernel and rebooting, the memory block reserved during boot is accessible.
2.7.2. Reserving top memory through a kernel boot parameter
During Linux kernel boot, the boot parameter "mem=size" caps the memory the kernel uses, reserving the topmost range. For example, on a system with 256 MB of memory, the parameter "mem=248M" reserves the top 8 MB. After the system comes up, this memory can be claimed by calling ioremap(0xF800000, 0x800000).
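A sketch of claiming that reserved range, following the numbers in the example above:

#include <linux/io.h>

#define RESERVED_PHYS  0xf800000UL    /* 248 MB mark, above "mem=248M" */
#define RESERVED_SIZE  0x800000UL     /* the reserved 8 MB */

static void __iomem *reserved;

static int claim_reserved(void)
{
        reserved = ioremap(RESERVED_PHYS, RESERVED_SIZE);
        return reserved ? 0 : -ENOMEM;
}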
3. Comparison of several allocation functions
Function           | Max per call                    | Principle                                                         | Notes
__get_free_pages   | 4 MB                            | takes page frames directly from the buddy system                  | suitable for allocating large amounts of contiguous physical memory
kmem_cache_alloc   | 128 KB                          | based on the slab allocator                                       | suitable for frequently allocating and freeing blocks of the same size
kmalloc            | 128 KB                          | based on kmem_cache_alloc                                         | the most common method; use for blocks smaller than a page
vmalloc            | about 1 GB (in tests)           | maps non-contiguous physical memory to a contiguous virtual range | physically non-contiguous; for large allocations with no contiguity requirement
dma_alloc_coherent | 4 MB                            | based on __alloc_pages                                            | for DMA operations
ioremap            | bounded by the existing region  | maps a known physical address into virtual address space          | for known physical addresses, such as device drivers
alloc_bootmem      | less than total physical memory | reserves memory while the kernel boots                            | invisible to the kernel memory manager; must be managed by hand