Linux Kernel Learning Summary

Source: Internet
Author: User

Tags: Linux

This summary is divided into two parts: memory and processes.

Part One: Memory


1) Commands for inspecting memory and memory allocation:

a) nmon: a full-screen interactive monitor that shows not only memory but also CPU, network, kernel, and disk information. Its memory view includes swap, slab, and page table usage.

b) top, or cat /proc/<pid>/statm: the important values are VIRT, RES, and SHR, which are the virtual memory, resident physical memory, and shared memory of the process. Note: top displays KiB by default, while statm reports page counts, which must be multiplied by the page size (usually 4K).

c) pmap <pid>: shows the address ranges mapped into the process and the file backing each mapping. These are virtual addresses, not physical memory addresses.

d) strace: traces the system calls made by a process.

e) pstree -p and top -H -p <pid>: the former shows child processes, the latter (with -H) shows threads. Sometimes the two outputs look the same and it is not clear which is which; pmap can help. In my view the difference lies in the address space: a child process has its own address space, whereas threads share the memory space of the process that created them, so they need no other form of interprocess communication.

f) pstack <pid>: shows the stack (the function call chain) of the process.

g) slabtop: shows the contents of the slab caches.

h) About the buffers and cache columns at the end of the first line of free output: they count memory the kernel is currently using for buffers and the page cache. To get the memory that is really free or really used, add them to free or subtract them from used, because buffers and cache can usually be released quickly.

On older versions of free, the used value on the first line is physical memory in use and includes buffers and cache; the adjusted figure is only a prediction that they will be released when needed. Newer versions of free report this estimate in an available column.
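The two conversions above (statm pages to bytes, and free plus buffers plus cache) can be sketched as follows. This is a minimal illustration, not how top or free are implemented; the sample statm line and the KiB figures are made up, and the function names are hypothetical.

```python
def statm_to_bytes(statm_line, page_size=4096):
    """Convert the first three /proc/<pid>/statm fields from pages to bytes."""
    size, resident, shared = (int(field) for field in statm_line.split()[:3])
    return {
        "virt": size * page_size,      # total program size (top's VIRT)
        "res": resident * page_size,   # resident set size (top's RES)
        "shr": shared * page_size,     # resident shared pages (top's SHR)
    }


def adjusted_free_kib(free, buffers, cached):
    """Estimate of memory that could be handed out: free + buffers + cache."""
    return free + buffers + cached


sample = "2500 300 120 10 0 90 0"   # made-up statm line, values in pages
print(statm_to_bytes(sample))
print(adjusted_free_kib(free=1000, buffers=200, cached=800))
```

On a real system the line could come from open("/proc/self/statm").read(), and the page size from os.sysconf("SC_PAGE_SIZE") rather than the assumed 4096.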

2) Main sources of memory consumption (B and C can be viewed with nmon; slab usage is shown more clearly by slabtop)

A. Process consumption

B. Slab: to improve efficiency and resource utilization, kernel modules allocate their resources through the slab allocator. The kernel keeps pools of the objects it uses regularly in the 0~896M address range, and slab is used to cache those pools.

C. Page tables: the tables that map virtual memory addresses to physical memory addresses. They stay resident in memory.

3) Several memory-related caches in Linux

First, after a program calls exit(), not everything it touched is discarded at once. File data it read or wrote is kept in memory as cache, because throwing it away immediately would be wasteful. The cache still counts as used memory, so after the system has run for a while memory appears to be running short.

A. Buffer, full name buffer cache: buffers used when writing to block devices, on the path CPU -> buffer -> disk.

B. Cache, the page cache: the cache described in the paragraph above. It holds file contents that have been read, so a later read goes CPU -> cache and does not need to go to the disk.

These two are the buffers/cache shown by the free command.

C. There are also the dentry and inode caches, which are VFS concepts. For a path such as /home/xxx/yyy, each component of the path has an entry in the dentry cache. A dentry holds a file name and the inode it refers to, and also records the relationship between a directory and its children; it is the structure the file system uses for path lookup. The dentry cache keeps the most frequently used entries for faster access.

How to release the caches: echo 1, 2, or 3 > /proc/sys/vm/drop_caches. 1 drops the page cache, 2 drops the inode and dentry caches, and 3 drops all of them.

4) Virtual memory vs. physical memory

On a 32-bit system each process has 4G of virtual address space by default. This achieves process isolation, and paging makes allocation efficient without tying a program to fixed physical addresses. Within the 4G, 0~3G is user space for the user program, and 3~4G is for the kernel. If physical memory is larger than 896M, it is mapped as follows. The lowest 0~16M of physical memory is mapped to the first 16M of the kernel's 1G; this is ZONE_DMA, for devices that use DMA. Physical memory 16~896M is mapped to 16~896M of the kernel's 1G; this is ZONE_NORMAL, which holds the kernel's ordinary data. The remaining 128M of the kernel region is used for temporary mappings to physical memory above 896M (high memory), so that the kernel can reach all of physical memory. User-space code can address only its own 0~3G of virtual space. Note: only low memory (ZONE_DMA and ZONE_NORMAL) is permanently mapped and directly usable by the kernel.

A. Low memory (the directly mapped physical memory below 896M) is allocated with kmalloc. Both the virtual linear addresses and the corresponding physical addresses are contiguous. Two allocators are involved: the buddy allocator handles large blocks of contiguous physical memory, and the slab allocator handles small blocks; kmalloc is the interface.

B. High memory is allocated with vmalloc. The linear addresses are contiguous but the physical pages need not be, which lets physical memory be used more efficiently.

C. malloc allocates user-space memory.

D. The disadvantage of segment-based memory management: the unit is still a whole program segment, so memory cannot be used fully and efficiently when segments are swapped in and out, which produces external fragmentation.

E. The kernel region of every process maps the same low memory, which holds shared things such as drivers and kernel functions.
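The zone boundaries described in this section can be written down as simple arithmetic. The sketch below assumes the classic 32-bit x86 3G/1G split with the kernel's direct mapping starting at 3G (0xC0000000); the function names are hypothetical and this is not kernel code.

```python
MIB = 1024 * 1024
DMA_LIMIT = 16 * MIB          # ZONE_DMA covers physical 0 to 16M
NORMAL_LIMIT = 896 * MIB      # ZONE_NORMAL covers physical 16M to 896M
PAGE_OFFSET = 0xC0000000      # assumed start of the kernel's 1G (the 3G mark)


def zone_of(phys_addr):
    """Name the zone a physical address falls into."""
    if phys_addr < DMA_LIMIT:
        return "ZONE_DMA"
    if phys_addr < NORMAL_LIMIT:
        return "ZONE_NORMAL"
    return "ZONE_HIGHMEM"


def kernel_virtual(phys_addr):
    """Directly mapped kernel virtual address for low memory, else None.

    High memory has no permanent mapping; the kernel maps it temporarily.
    """
    if phys_addr < NORMAL_LIMIT:
        return PAGE_OFFSET + phys_addr
    return None


print(zone_of(1 * MIB), hex(kernel_virtual(1 * MIB)))
print(zone_of(100 * MIB), hex(kernel_virtual(100 * MIB)))
print(zone_of(2048 * MIB), kernel_virtual(2048 * MIB))
```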

5) How paging maps a virtual linear address to a physical memory address

Linux supports up to 4 levels of page tables. How many are actually used depends on the CPU architecture; two and three levels are common.

A. Two-level paging (for example 32-bit x86 without PAE):

The linear address is divided from high to low into three fields: the top 10 bits index the page directory, the middle 10 bits index the page table, and the lowest 12 bits are the offset within the page. The page directory entry gives the physical address of the page table, the page table entry gives the starting physical address of the page, and the offset gives the exact location within the page. The physical address of the page directory itself is held in the CPU's CR3 register, which the MMU reads. The lookup order is therefore page directory -> page table -> page start -> page offset, and this address translation is done by the MMU (memory management unit).

The MMU is hardware in the CPU. When the CPU accesses a linear address, the MMU first checks its TLB (translation lookaside buffer). If the translation is not there, it walks the page tables to find the PA, that is, the physical page that holds the data. If the page is not present in physical memory, a page fault is raised and the kernel loads the page into memory, for example by swapping it back in from the swap partition.

B. Three-level paging:

A page middle directory sits between the page directory and the page table, so the linear address carries one more index field. The lookup logic is the same.
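The 10-bit / 10-bit / 12-bit split of a 32-bit linear address described above comes down to shifts and masks. A minimal sketch, with a hypothetical function name and an arbitrary example address:

```python
def split_linear_address(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory_index = (addr >> 22) & 0x3FF   # top 10 bits
    table_index = (addr >> 12) & 0x3FF       # middle 10 bits
    offset = addr & 0xFFF                    # low 12 bits
    return directory_index, table_index, offset


def join_linear_address(directory_index, table_index, offset):
    """Inverse of split_linear_address."""
    return (directory_index << 22) | (table_index << 12) | offset


parts = split_linear_address(0xC0101234)
print(parts)                                  # (768, 257, 564)
print(hex(join_linear_address(*parts)))       # 0xc0101234
```

Ten index bits address 1024 entries per table, and twelve offset bits address the 4096 bytes of a page, which is where the 4K page size comes from.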

6) Shared Memory

Two processes map linear addresses to the same physical memory. Shared memory has no built-in synchronization: process A can write while process B is writing too, so access is usually synchronized with a semaphore.
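As a rough illustration, the sketch below creates an anonymous shared mapping, forks, and lets the child write into it. It assumes a Unix system where os.fork is available; the function name is hypothetical. Synchronization here is deliberately crude (the parent simply waits for the child to exit); with concurrent writers a semaphore or lock would be needed.

```python
import mmap
import os
import struct


def share_value_with_child(value):
    """Child writes `value` into a shared mapping; parent reads it back."""
    shared = mmap.mmap(-1, mmap.PAGESIZE)    # anonymous mapping, shared by default on Unix
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the same physical page the parent sees.
        shared[:4] = struct.pack("i", value)
        os._exit(0)
    os.waitpid(pid, 0)                       # crude synchronization: wait for the writer
    (seen,) = struct.unpack("i", shared[:4])
    shared.close()
    return seen


print(share_value_with_child(42))
```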

7) CPU and peripheral communication

There are three ways:

A. Polling (status query): the CPU repeatedly queries the status of the peripheral until the peripheral signals that it is done.

B-1. Hardware interrupt: the peripheral raises an interrupt on its IRQ line, the CPU receives it, and the corresponding driver handles it, for example when a key is pressed. This is not a system call, but it does switch the CPU from user mode to kernel mode.

B-2. Software interrupt: raised by the running process rather than by a device, and handled by the kernel.

C. DMA: the peripheral exchanges data with memory directly, without going through the CPU.

IRQ: interrupt request. IRQs 2~14 are available on a Linux system; each is connected to one peripheral, and the CPU services only one IRQ at a time.

8) Order in which the CPU reads memory

A. First the CPU core issues a read request for a VA. The request goes to the MMU's TLB, which stores records of recently translated VAs and their physical pages. On a hit, the physical page is returned to the core directly and no translation is needed.

B. If the TLB has no entry, the MMU translates the VA to a PA through the page table, stores the result in the TLB (if caching is allowed; this permission is set in the page table), and then returns it to the core.

C. If the physical page is not loaded, a page fault is raised and the page is read from disk into physical memory.
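The three cases above can be mimicked with a toy model. The sketch below is only an illustration of the lookup order; the class name, the dictionary-based page table, and the way a missing page is "loaded" into the next free frame are all assumptions, not how real hardware or the kernel works.

```python
PAGE_SIZE = 4096


class ToyMMU:
    def __init__(self, page_table):
        self.page_table = dict(page_table)   # virtual page number -> physical frame number
        self.tlb = {}                        # recently used translations

    def translate(self, va):
        """Return (physical address, which path was taken)."""
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn in self.tlb:                              # case A: TLB hit
            return self.tlb[vpn] * PAGE_SIZE + offset, "tlb hit"
        if vpn in self.page_table:                       # case B: walk the page table
            self.tlb[vpn] = self.page_table[vpn]
            return self.tlb[vpn] * PAGE_SIZE + offset, "tlb miss"
        # case C: page not present; pretend the kernel loads it into a new frame
        frame = max(self.page_table.values(), default=-1) + 1
        self.page_table[vpn] = frame
        self.tlb[vpn] = frame
        return frame * PAGE_SIZE + offset, "page fault"


mmu = ToyMMU({0: 5, 1: 9})
print(mmu.translate(0x0010))   # first access: TLB miss, page table walk
print(mmu.translate(0x0010))   # second access: TLB hit
print(mmu.translate(0x3004))   # unmapped page: page fault
```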

Part Two: Process

A process has at least one thread.

1) What data is created when a process is created:

A. Text region: the code to execute

B. Data region: variables plus dynamically allocated memory pages

C. Stack and heap: the stack is managed by the system and holds function calls and local variables; the heap holds what the program itself allocates.

Together these three are also called the context.

2) Process states: ready (queued), running, blocked (waiting for I/O)

3) Differences between threads and processes:

A thread does not own a data region. The data region of a process is shared by all of its threads, while each thread has its own stack. The point of threads is that one application can have several parts executing at the same time.

4) User mode, kernel mode, system calls, process switching

User mode: only the 0~3G user part of the address space can be accessed; the kernel's low memory region cannot.

Kernel mode: the CPU can access all memory, as well as peripherals.

The CPU has four privilege levels: ring 0 is the kernel, ring 3 is user code, and rings 1 and 2 are not used.

System call: when a user process needs kernel code to run, for example for fork, for exit at the end of the process, or to use a peripheral, the process executes int 80h to make a system call. The CPU switches from ring 3 to ring 0, executes the kernel routine in the low memory region, and then switches back.

Ways to enter kernel mode: system calls (int 80h), hardware interrupts, and exceptions (for example page faults).

Process switching: a system call is mainly a CPU mode switch. A process switch is different: the CPU often needs to suspend one program and resume another, so the kernel saves the context of process A and switches to process B.

Process switching is fundamentally different from a system call, although the two look similar. A system call is only a CPU mode switch: the context is still the same process and no process switch takes place. When the call ends, the scheduler checks the queue; if a process with higher priority than the current one is waiting, the kernel saves the current process, switches to the new one, and changes the context.

5) fork, vfork, clone, COW

fork creates a process. The child has the same memory contents as the parent but gets its own independent virtual address space; the return value in the parent is the child's PID, and in the child it is 0. vfork creates a process without giving it a new address space; the child shares the parent's address space, which is already thread-like behavior. clone is the function Linux provides for creating threads. Depending on its flags it can create a thread that shares memory as vfork does, or a process that is not a child of the caller.

COW is copy-on-write. If fork copied all of the parent's memory and the child then never modified those pages, the copy would be wasted. With COW, much like a storage snapshot, the child at first only gets pointers to the parent's memory; a page is copied only when one of the processes writes to it. The benefit is that memory which is only read never needs to be copied. Leaving cache in memory after a process ends, instead of cleaning it up immediately, is a similar lazy strategy.
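The visible effect of fork can be sketched as below: the child starts with the same data as the parent, but a write in the child does not change the parent's copy. Copy-on-write itself is an internal kernel optimization and cannot be observed directly from this code. The example assumes a Unix system with os.fork; the function name is hypothetical.

```python
import os


def child_write_is_private():
    """Return (list length seen by the parent, list length seen by the child)."""
    data = [1, 2, 3]
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Appending touches only the child's copy.
        os.close(read_end)
        data.append(4)
        os.write(write_end, str(len(data)).encode())
        os._exit(0)
    # Parent: fork returned the child's PID.
    os.close(write_end)
    child_len = int(os.read(read_end, 16).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return len(data), child_len


print(child_write_is_private())   # (3, 4)
```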

6) Zombie processes

A zombie is a process that has already ended. It occupies no memory, has no code to execute, and cannot be scheduled; it only keeps a slot in the process table to record its exit status, and it still holds a PID. Too many zombies are bad for the system because they use up PIDs and process table slots.

Cause: when a child process ends it executes the exit() system call, and the kernel frees the process's resources, including its open files and memory. The kernel still retains some information, including the PID, exit code, exit status, and run time, until the parent collects it with one of the wait functions; only then is the entry released completely. A zombie is a child whose parent has not yet come to collect it.

How to deal with them:

A. Kill the parent process. The zombie becomes an orphan and is adopted by init, which calls wait to release it.

B. The parent can tell the kernel through signal(), by ignoring SIGCHLD, that it does not care about child exits; the kernel then reaps exited children directly.
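The life cycle described above can be observed on Linux with a short experiment: fork a child that exits at once, look at its state in /proc before calling wait, then collect it. This is a sketch that assumes Linux with /proc mounted; the helper names and the exit code 3 are arbitrary choices.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        # The state letter follows the command name, which is wrapped in parentheses.
        return stat_file.read().rsplit(")", 1)[1].split()[0]


def make_and_reap_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(3)                      # child ends immediately
    state = process_state(pid)
    for _ in range(100):                 # wait up to about one second for the child to exit
        if state == "Z":
            break
        time.sleep(0.01)
        state = process_state(pid)
    _, status = os.waitpid(pid, 0)       # parent collects the exit status; the zombie is released
    return state, os.WEXITSTATUS(status), os.path.exists(f"/proc/{pid}")


print(make_and_reap_zombie())
```

Until waitpid is called the child shows state Z (zombie); afterwards its /proc entry is gone and the exit code has been delivered to the parent.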

7) Process/thread synchronization

Semaphore: a synchronization mechanism between processes (or threads). When one process (thread) completes an action it tells another through the semaphore, and the other then performs its own action. There are binary semaphores and counting (multi-valued) semaphores.

Mutex: mutual exclusion between threads. While one thread is using a shared resource, other threads cannot access it; only after that thread leaves can another thread begin to use the resource. A mutex can be regarded as a binary semaphore.

Semaphores are mainly used between processes and can of course also be used between threads. A mutex is normally used only between threads.

Like a mutex, a spin lock must be acquired by an execution unit before it accesses the protected shared resource, and must be released after the access. If no execution unit holds the lock, it is acquired immediately. If the lock is already held, the acquiring unit spins (busy-waits) until the holder releases it. A spin lock is therefore a low-level, primitive way of protecting data structures or code fragments, and it has two potential problems: deadlock and excessive CPU consumption. Spin locks suit cases where the lock is held only for a short time. Because the hold time is so short, spinning is preferable to sleeping, and in such cases a spin lock is much more efficient than a mutex.

Spin locks can be held by at most one thread of execution. If a thread requests a spin lock that is contended (already held), it busy-loops, spinning, until the lock becomes available again. If the lock is not contended, the requesting thread gets it immediately and proceeds. A spin lock thus prevents more than one thread of execution from entering the critical section at any time.
Because a semaphore sleeps, it is suitable for locks that are held for a long time. It can be used only in process context, because interrupt context cannot be scheduled. In addition, code must not already be holding a spin lock when it acquires a semaphore, because acquiring the semaphore may sleep.

Semaphore explained:

A semaphore is a counter. A process or thread that wants to acquire a shared resource first tests the semaphore: a positive value means the resource can be used, and 0 means it is not available. Taking the resource decrements the count by 1, and releasing it after use increments the count by 1.
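The counter behavior can be tried out with a counting semaphore between threads. In this sketch the worker count, the slot count, and the short sleep that stands in for real work are arbitrary assumptions; the function name is hypothetical.

```python
import threading
import time


def run_workers(worker_count=5, slots=2):
    """Run workers that share `slots` resources; return the peak number active at once."""
    semaphore = threading.Semaphore(slots)   # counter starts at `slots`
    guard = threading.Lock()                 # mutex protecting the bookkeeping below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with semaphore:                      # acquire: count - 1, blocks while the count is 0
            with guard:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)                 # pretend to use the shared resource
            with guard:
                active -= 1
        # leaving the with block releases the semaphore: count + 1

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


print(run_workers())   # never more than 2
```

The inner Lock plays the role of the mutex from this section: it lets only one thread at a time update the shared counters.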

8) Inter-process communication

A. Pipes

Run ls -al /proc/<pid>/fd (the file descriptor directory) to see them. The entries fall into:

1. Files opened by the process

2. Sockets: shown as socket: followed by an inode number. cat /proc/net/tcp lists this inode together with the IP:port in hexadecimal, which shows the process's current network connections.

3. Pipes: shown as pipe: followed by an inode number. The process communicates through the pipe identified by that inode with another process that has the same pipe inode open. If two descriptors in the same fd directory point to the same pipe inode, the pipe may be used between threads within this process.
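The pipe entries described in item 3 can be reproduced from inside a process. The sketch below creates a pipe, reads the two symbolic links under /proc/self/fd, and passes a few bytes through the pipe. It assumes Linux with /proc mounted; the function name and the payload are arbitrary.

```python
import os


def pipe_demo():
    """Return (link of read end, link of write end, data read back)."""
    read_end, write_end = os.pipe()
    try:
        read_link = os.readlink(f"/proc/self/fd/{read_end}")
        write_link = os.readlink(f"/proc/self/fd/{write_end}")
        os.write(write_end, b"hello")        # data goes in at the write end
        data = os.read(read_end, 5)          # and comes out at the read end
        return read_link, write_link, data
    finally:
        os.close(read_end)
        os.close(write_end)


print(pipe_demo())
```

Both links have the form pipe:[inode] with the same inode number, which is how the two ends of one pipe can be matched up, whether they sit in one process or in two.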

9) Signals

Types of signals

Signals are raised for many reasons. Grouping them by cause makes the various signals easier to understand:

(1) Signals related to process termination. These are sent when a process exits or when a child process terminates.

(2) Signals related to process exceptions, such as an out-of-bounds memory access, an attempt to write to a read-only memory area (such as the program text), execution of a privileged instruction, and various other hardware errors.

(3) Signals related to unrecoverable conditions encountered during a system call, for example when exec is called, the original resources have already been freed, and system resources are now exhausted.

(4) Signals related to unexpected error conditions encountered during a system call, such as calling a system call that does not exist.

(5) Signals sent by a process in user mode, for example when a process calls the kill system call to send a signal to another process.

(6) Signals related to terminal interaction, for example when the user closes a terminal or presses the break key.

(7) Signals for tracing process execution.

The signals supported by Linux are listed below. Many signals depend on the machine architecture; the signals listed in POSIX.1 come first:

Signal    Value  Action  Cause
SIGHUP    1      A       Terminal hangup, or the controlling process terminates
SIGINT    2      A       Keyboard interrupt (for example the break key is pressed)
SIGQUIT   3      C       Keyboard quit key is pressed
SIGILL    4      C       Illegal instruction
SIGABRT   6      C       Abort signal issued by abort(3)
SIGFPE    8      C       Floating-point exception
SIGKILL   9      AEF     Kill signal
SIGSEGV   11     C       Invalid memory reference

Action letters: A = terminate the process, C = terminate and dump core, E = cannot be caught, F = cannot be ignored.
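The signal numbers in the list can be checked, and category (5) (a signal sent with kill from user mode) can be tried, with a few lines of Python. This is a sketch for Linux: SIGUSR1 is not in the list above and is chosen here only because it is safe to send to oneself; the handler name is hypothetical.

```python
import os
import signal
import time

# Print the numeric value of each signal from the list.
for name in ("SIGHUP", "SIGINT", "SIGQUIT", "SIGILL",
             "SIGABRT", "SIGFPE", "SIGKILL", "SIGSEGV"):
    print(name, int(getattr(signal, name)))

received = []


def on_signal(signum, frame):
    """Record which signal arrived instead of taking the default action."""
    received.append(signum)


signal.signal(signal.SIGUSR1, on_signal)       # install the handler
os.kill(os.getpid(), signal.SIGUSR1)           # send the signal to this process
time.sleep(0.01)                               # give the interpreter a moment to run the handler
signal.signal(signal.SIGUSR1, signal.SIG_DFL)  # restore the default action

print(received == [signal.SIGUSR1])
```

SIGKILL is the one entry marked E and F: installing a handler for it fails, which is why it always terminates the target.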

10) VFS

Hard link: the same file under several names. The names share one inode but have different dentries.

1. The links share the same inode and data blocks;

2. A hard link can be created only for a file that already exists;

3. Hard links cannot cross file systems;

4. Hard links cannot be created for directories, only for files;

5. Deleting one hard link does not affect the other names that share the same inode number;

Soft link: a soft link has its own inode and is a file in its own right, but the content of that file is the path name of another file. A soft link therefore has its own inode number and user data blocks.

1. Soft links have their own file attributes, permissions, and so on;

2. Soft links can be created for files or directories that do not exist;

3. Soft links can cross file systems;

4. Soft links can be created for files or directories;

5. Creating a soft link does not increase the link count i_nlink of the target;

6. Deleting a soft link does not affect the file it points to. If the original file is deleted, the soft link becomes a dangling link; re-creating a file at the same path makes the link work again, although the content may differ from the original file.
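Several of the properties listed for hard and soft links can be verified in a temporary directory. This is a small sketch, assuming a file system that supports both kinds of link; the file names and the function name are made up for the example.

```python
import os
import tempfile


def link_demo():
    """Create one hard link and one soft link, then delete the original name."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        hard = os.path.join(workdir, "hard.txt")
        soft = os.path.join(workdir, "soft.txt")
        with open(original, "w") as handle:
            handle.write("data")
        os.link(original, hard)          # hard link: a second name for the same inode
        os.symlink(original, soft)       # soft link: a new inode that stores the path
        facts = {
            "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(original).st_nlink,   # the soft link is not counted
            "soft_has_own_inode": os.lstat(soft).st_ino != os.stat(original).st_ino,
        }
        os.remove(original)              # remove the original name
        with open(hard) as handle:
            facts["hard_still_readable"] = handle.read() == "data"
        facts["soft_is_dangling"] = os.path.islink(soft) and not os.path.exists(soft)
        return facts


print(link_demo())
```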

Superblock: the superblock is the data structure that stores a file system's control information: file system type, block count, and inode count. The superblock information of a partition can be viewed with dumpe2fs -h on a device listed by df -h. VFS is a software layer and exists in memory.

For example, when cp copies file a on ext2 to file b on ext3:

User-level read() system call -> sys_read -> the ext2 file system's read method -> physical medium -> data loaded into memory -> mapped to file b's memory -> written to disk by the ext3 file system's write method.

Linux module deployment

Several commands and files:


modprobe <module>: loads a module together with its dependencies, according to the dependency relationships in modules.dep

modprobe -r <module>: removes a module

depmod: after a new driver module is placed under /lib/modules/<kernel version>/kernel/ (in fs, net, sound, and so on), run depmod to generate the dependencies of the module and update the modules.dep file. Then load the module with the command above.

The modules loaded at boot and the hardware scan information are recorded in the kernel log, which can be read with dmesg.
