Translated by: Brother Fei ( http://hi.baidu.com/imlidapeng )
All rights reserved. Please respect the translator's work: credit the author and the original source, and include this statement, when reproducing.
Original title: "Linux Performance and Tuning Guidelines"
Original address: http://www.redbooks.ibm.com/abstracts/redp4285.html
-------------------------------------------------------------------------------------------
1.1.1 What is a process?
1.1.2 Process life cycle
1.1.3 Threads
1.1.4 Process priority and nice values
1.1.5 Context switching
1.1.6 Interrupt handling
1.1.7 Process state
1.1.8 Process memory segments
1.1.9 Linux CPU scheduler
-------------------------------------------------------------------------------------------
Process management is one of the most important tasks of any operating system. Efficient process management ensures that applications run smoothly and efficiently. Linux process management is very similar to UNIX process management: it includes process scheduling, interrupt handling, signaling, process prioritization, process switching, process state, process memory, and so on. In this section, we discuss the fundamentals of Linux process management. Understanding how the Linux kernel manages processes will help you better understand how the kernel affects system performance.
1.1.1 What is a process?
A process is an instance of a program executing on a processor. A process can use any resource the Linux kernel manages to accomplish its task. All processes running on a Linux operating system are managed by a structure called task_struct, also known as the process descriptor. The process descriptor holds all the information the process needs to run, such as the process ID, process attributes, and the resources that make up the process. Knowing the structure of the process descriptor shows what matters for process execution and performance. Figure 1-2 shows an outline of the task_struct structure.
Figure 1-2 task_struct structure
1.1.2 Process life cycle
Each process has its own life cycle: creation, execution, termination, and removal. These phases are repeated countless times while the system is up and running, so the process life cycle is very important from a performance point of view. Figure 1-3 shows the typical life cycle of a process.
Figure 1-3 Typical life cycle of a process
When a process creates a new process, the creating process (the parent) issues a fork() system call. The fork() call obtains a process descriptor for the new process (the child) and assigns it a new process ID; it copies the contents of the parent's process descriptor to the child. The parent's entire address space is not copied at this point, so parent and child share the same address space. The exec() system call then copies a new program into the child's address space. Because parent and child share the address space, writing the new program's data triggers a page fault exception, at which point the kernel assigns new physical pages to the child. This deferred copying is called copy-on-write. The child usually goes on to execute its own program, doing something different from the parent. This mechanism avoids unnecessary overhead, because copying an entire address space is a slow and inefficient operation that consumes a great deal of processor time and resources. When the program finishes, the child ends itself with the exit() system call. exit() releases most of the process's data structures and notifies the parent with a signal; at this point the child is called a zombie process. The child is not removed completely until the parent learns of its termination via the wait() system call. Once the parent has been notified that the child has terminated, all of the child's data structures are removed and the process descriptor is released.
1.1.3 Threads
A thread is an execution unit created by a single process, and it runs in parallel with other threads in the same process. Threads share the same resources, such as memory, address space, and open files, and they can access the same set of application data. A thread is also called a lightweight process (LWP). Because they share resources, threads must not modify their shared resources at the same time; implementing mutual exclusion, locking, serialization, and so on is the responsibility of the user application. From a performance perspective, creating a thread is cheaper than creating a process, because a thread does not need to copy resources at creation time. On the other hand, processes and threads share many characteristics in terms of scheduling; the kernel handles both in a similar manner.
Figure 1-4 Processes and threads
In the current Linux implementation, threads conform to the Portable Operating System Interface (POSIX) standard. Several thread implementations are available on Linux operating systems; the following are the most commonly used.
- LinuxThreads
LinuxThreads has been the default thread implementation since Linux kernel 2.0. However, LinuxThreads has a number of behaviors that do not comply with the POSIX standard. The Native POSIX Thread Library (NPTL) is taking its place; LinuxThreads will not be supported in future Linux enterprise distributions.
- Native POSIX Thread Library (NPTL)
NPTL was originally developed by Red Hat and is more compliant with the POSIX standard. Taking advantage of 2.6 kernel enhancements such as the new clone() system call and the signal-handling implementation, it offers better performance and scalability than LinuxThreads.
NPTL has some incompatibilities with LinuxThreads; an application that depends on LinuxThreads might not work with the NPTL implementation.
- Next Generation POSIX Threads (NGPT)
NGPT is a POSIX thread library developed by IBM. It is currently in maintenance mode, and no further development is planned.
Using the LD_ASSUME_KERNEL environment variable, you can choose which thread library an application uses.
1.1.4 Process priority and nice values
Process priority is a numeric value that lets the CPU determine the order of process execution, based on dynamic priority and static priority. A process with higher priority gets more chances to run on a processor. The kernel dynamically raises and lowers the dynamic priority using a heuristic algorithm based on the behavior and characteristics of the process. A user process can change the static priority indirectly through the nice value of the process. A process with a higher static priority is given a longer time slice (how long the process can run on a processor).
The nice value in Linux ranges from 19 (lowest priority) to -20 (highest priority), with a default of 0. To change the nice value to a negative number (raising priority), you must log in as root or use the su command.
1.1.5 Context switching
During process execution, information about the running process is stored in the registers and cache of the processor. The set of data loaded into the registers for the executing process is called the context. To switch processes, the context of the running process is saved and the context of the next process to run is restored to the registers. The process descriptor and an area of the kernel mode stack are used to store the context. This switching is called a context switch. Excessive context switching is undesirable, because the processor must flush its registers and cache every time to make room for the new process, which can cause performance problems. Figure 1-5 illustrates how context switching works.
1.1.6 Interrupt handling
Interrupt handling is one of the highest-priority tasks. Interrupts are usually generated by I/O devices such as a network interface card, keyboard, disk controller, or serial adapter. The interrupt handler notifies the kernel of an event (such as keyboard input or the arrival of an Ethernet frame), and it tells the kernel to interrupt the running process and handle the interrupt as quickly as possible, because some devices require a fast response. This is critical for system performance. When an interrupt signal arrives at the kernel, the kernel must switch from the currently executing process to a new one that handles the interrupt. This means interrupts cause context switches, so a large number of interrupts can degrade system performance.
In Linux, there are two types of interrupts. A hard interrupt is generated by a device that requires a response (disk I/O interrupts, network adapter interrupts, keyboard interrupts, mouse interrupts). A soft interrupt is used for tasks whose processing can be deferred (TCP/IP operations, SCSI protocol operations, and so on). You can find information about hard interrupts in /proc/interrupts.
In a multiprocessor environment, each processor can handle interrupts. Binding interrupts to a single physical processor can improve system performance. For details, refer to 4.4.2, "CPU affinity for interrupt handling."
1.1.7 Process state
Every process has its own state that indicates what is currently happening in the process. The state changes during process execution. Some of the possible states are:
- TASK_RUNNING
In this state, the process is running on a CPU or waiting to run in a queue (the run queue).
- TASK_STOPPED
A process suspended by certain signals (for example, SIGINT or SIGSTOP) is in this state. The waiting process resumes when it receives a resume signal such as SIGCONT.
- TASK_INTERRUPTIBLE
In this state, the process is suspended, waiting for a certain condition to be satisfied. If a process in TASK_INTERRUPTIBLE state receives a signal to stop, the process state is changed and the operation is interrupted. A typical example of a TASK_INTERRUPTIBLE process is a process waiting for keyboard input.
- TASK_UNINTERRUPTIBLE
This state is similar to TASK_INTERRUPTIBLE, except that a process in TASK_INTERRUPTIBLE state can be interrupted by a signal, whereas a process in TASK_UNINTERRUPTIBLE state does not react to signals at all. A typical example of a TASK_UNINTERRUPTIBLE process is a process waiting for disk I/O.
- TASK_ZOMBIE
After a process exits with the exit() system call, its parent should be informed. A process in TASK_ZOMBIE state waits for its parent to be notified so that all its data structures can be released.
Figure 1-6 Process Status
Zombie processes
When a process has received a signal and terminated, it normally needs some time to finish all its tasks (such as closing open files) before ending. During this normally very short time, the process is a zombie.
After the process has completed all of its shutdown tasks, it reports to its parent process that it is about to terminate. Sometimes, however, a zombie process cannot terminate itself, in which case its state is shown as Z (zombie).
It is not possible to end such a process with the kill command, because it is already considered dead. If you cannot get rid of a zombie, you can kill its parent process, and the zombie disappears as well. However, if the parent is the init process, you cannot kill it, because init is a fundamentally important process, so you might need to reboot the system to get rid of the zombie process.
1.1.8 Process memory segments
A process uses its own memory area to perform its work. The work varies with the situation and with how the process is used; processes can have different workload characteristics and different data-size requirements, and they must handle data of a variety of sizes. To satisfy this requirement, the Linux kernel uses a dynamic memory allocation mechanism for each process. The process memory allocation structure is shown in Figure 1-7.
Figure 1-7 Process address space
The memory area of a process consists of the following segments:
- Text segment
The area where executable code is stored.
- Data segment
The data segment consists of three areas.
  - Data: stores initialized data such as static variables.
  - BSS: stores zero-initialized data; the data is initialized to zero.
  - Heap: the area where malloc() allocates dynamic memory on demand. The heap grows toward higher addresses.
- Stack segment
The area where local variables, function parameters, and function return addresses are stored. The stack grows toward lower addresses.
You can use the pmap command to display the memory allocation of a user process's address space, and the ps command to display the sizes of these memory segments. Refer to 2.3.10, "pmap" and 2.3.4, "ps and pstree".
1.1.9 Linux CPU Scheduler
The basic functionality of a computer is, quite simply, computation. To compute, it must manage the computing resources (processors) and the computing tasks (known as threads or processes). Thanks to a great contribution by Ingo Molnar, the Linux kernel uses an O(1) algorithm, which is very different from the O(n) algorithm of the previous CPU scheduler. O(1) refers to a static algorithm, meaning that the time taken to choose a process and get it executing is constant, regardless of the number of processes.
The new scheduler scales very well, regardless of the number of processes or processors, and imposes little overhead on the system. The algorithm uses two process-priority arrays:
- active
- expired
The scheduler assigns time slices to processes based on their priority and prior blocking rate, and the processes are placed into the active array in priority order. When a process exhausts its time slice, it is assigned a new time slice and placed in the expired array. When all processes in the active array have exhausted their time slices, the two arrays are swapped and the scheduler starts over. For interactive processes (as opposed to real-time processes), processes with higher priority usually get longer time slices and more compute time than lower-priority processes, but this does not mean that lower-priority processes are neglected entirely. In an enterprise environment with many processors and a very large number of threads and processes, this greatly improves the scalability of the Linux kernel. The new O(1) CPU scheduler was designed for the 2.6 kernel, but has also been back-ported to the 2.4 kernel family. Figure 1-8 illustrates how the Linux CPU scheduler works.
Figure 1-8 Linux 2.6 kernel O (1) Scheduler
Another big improvement in the new scheduler is support for Non-Uniform Memory Architecture (NUMA) and symmetric multithreading processors, such as Intel Hyper-Threading technology. The improved NUMA support ensures that load balancing crosses NUMA nodes only when a node is overloaded; this mechanism minimizes traffic over a NUMA system's comparatively slow scalability links. Although load balancing walks through the processors in a scheduler domain group on every scheduling tick, load is transferred across scheduler domains only when a node is overloaded and requests load balancing.
Figure 1-9 O (1) CPU Scheduler structure
Translator's note: In the course of translating this, I came to appreciate that process scheduling is a very complex subject. To learn more about 2.6 kernel process scheduling, you will need to consult further material. Below are some articles about 2.6 kernel process scheduling that interested readers may want to look at.
Linux 2.6 Scheduling System Analysis: http://www.ibm.com/developerworks/cn/linux/kernel/l-kn26sch/index.html
NUMA Technology for Linux: http://www.ibm.com/developerworks/cn/linux/l-numa/index.html
Linux Scheduling domains:http://www.ibm.com/developerworks/cn/linux/l-cn-schldom/index.html
Inside the Linux scheduler:http://www.ibm.com/developerworks/linux/library/l-scheduler/
Linux Performance and Tuning Guidelines: 1.1 Linux Process Management