Summary of process scheduling
Linux is a multi-process environment: not only can user space run multiple processes, the kernel itself also runs kernel threads. Threads in the Linux kernel are no different from processes, so the terms thread and process are used interchangeably here. The scheduler assigns CPU resources to specific processes according to specific rules; the process that holds the CPU then uses it to request or operate hardware and other resources. Several questions follow from this:
For the scheduler:
• When the scheduler runs, how does it decide which process gets the CPU next?
• How is it ensured that no process starves?
• How are interactive processes identified and given a faster response?
• A single CPU can execute only one stream of instructions, but can multiple processes be scheduled onto multiple physical CPUs at once?
• How is a scheduled process made to give the CPU back? Does it release it voluntarily, or is there a reclaiming mechanism?
For a process that wants to be scheduled:
• How can it influence its own probability of being scheduled?
• How can it receive a signal while waiting to be scheduled?
• How can it make sure that the resources it wants to use are not taken over by other processes while it is not scheduled? And in an SMP environment, how is it guaranteed that two processes never use the same resource at the same time?
Scheduling policy
Broadly there are two kinds of systems: time-sharing and real-time. Linux itself is not a real-time system, but in an inclusive spirit it also implements the real-time scheduling interfaces.
Kernel-wide, there are four scheduling policies: SCHED_NORMAL, SCHED_FIFO, SCHED_RR, and SCHED_BATCH. The standard also defines two policies that Linux does not implement: SCHED_IDLE and SCHED_DEADLINE. SCHED_NORMAL is the default time-sharing policy that we use most of the time.
A SCHED_IDLE process would run only when no non-SCHED_IDLE process is runnable. This level is typically meant for background, time-insensitive work that should not disturb the user, such as disk housekeeping. But the Linux kernel does not implement it.
SCHED_NORMAL optimizes for two goals: complete fairness, and responsiveness to user interaction. In practice this is done by dynamically adjusting priorities on behalf of the user.
Whether real-time or normal, priority is expressed as a number. All normal processes have a static priority of 0; the normal scheduler distinguishes between them with dynamic priority. Real-time processes have priorities from 1 to 99, which means any real-time process outranks any normal process.
SCHED_RR uses time slices: although priorities exist there too, even the highest-priority process releases the CPU when its time slice is exhausted. Under SCHED_FIFO, the highest-priority process never releases the CPU unless it gives it up voluntarily (or blocks, for example waiting for I/O). Both are preempted when a higher-priority process becomes runnable.
It was said above that SCHED_IDLE is not implemented, so how does Linux run background disk housekeeping and similar work? The answer is SCHED_BATCH, which plays a similar role. It does not refuse to run whenever a normal process exists, but it does protect the throughput of normal processes and the responsiveness of interactive ones. It is also well suited to batch jobs such as compiling with GCC.
Configuration of the process scheduling policy
You can set the scheduling policy through the API provided by the kernel, or from the command line with chrt. You can also cap the CPU time consumed by real-time processes: if a real-time process has a bug, it is almost impossible to make that highest-priority process release the CPU, and the system locks up. Parameters such as kernel.sched_rt_period_us and kernel.sched_rt_runtime_us can be set through sysctl to limit the maximum CPU usage of real-time processes.
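As a minimal sketch of the API route, the snippet below puts the calling process under SCHED_FIFO with real-time priority 50 (the priority value is illustrative); `chrt -f 50 ./prog` would do roughly the same from the command line.

```c
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };

    /* 0 means "the calling process"; usually needs root/CAP_SYS_NICE. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under SCHED_FIFO\n");
    return 0;
}
```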
Combined with cgroups, you can also configure how CPU resources are divided per cgroup. This, too, is done through the cgroup filesystem.
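A minimal sketch, assuming a cgroup-v1 cpu controller mounted at /sys/fs/cgroup/cpu and an existing group named mygroup (both the path and the group name are assumptions): it halves the group's CPU weight and moves the current process into the group.

```c
#include <stdio.h>
#include <unistd.h>

static void write_file(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");      /* cgroup knobs are plain files */
    if (!f) { perror(path); return; }
    fputs(val, f);
    fclose(f);
}

int main(void)
{
    char pid[32];

    /* Default cpu.shares is 1024; 512 gives this group half the weight. */
    write_file("/sys/fs/cgroup/cpu/mygroup/cpu.shares", "512");

    /* Writing a PID into the group's tasks file applies the limit to it. */
    snprintf(pid, sizeof(pid), "%d\n", getpid());
    write_file("/sys/fs/cgroup/cpu/mygroup/tasks", pid);
    return 0;
}
```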
Kernel infrastructure provided by process scheduling
Many operations inside the kernel are carried out through kernel infrastructure such as workqueues, tasklets, and softirqs. These facilities exist to get specific work done, and since work has to run, scheduling is involved, and the only schedulable units are kernel threads. So while these mechanisms look like call-and-use interfaces to their users, their execution actually happens in dedicated kernel daemon threads.
Softirqs, tasklets and workqueues
Linux interrupt handling is split into a top half and a bottom half. The top half runs with interrupts disabled and produces the deferred work; the bottom half can run with interrupts enabled and is scheduled for execution later. The reason for the split is that the time spent with interrupts disabled must be kept short, or interrupts will be lost. The softirqs raised this way are placed on the queue of the kernel daemon thread ksoftirqd, which then schedules the corresponding softirq handlers. A tasklet is similar to a softirq, except that on an SMP system the same softirq can run on several CPUs at once and must therefore be reentrant, while a given tasklet runs on only one CPU at a time and does not need to be reentrant. Users choose tasklet or softirq depending on whether the handler is allowed to re-enter.
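As a minimal sketch (using the classic unsigned-long tasklet callback form of pre-5.9 kernels; the names my_tasklet and my_tasklet_fn are illustrative), a module might defer work from its interrupt handler like this:

```c
#include <linux/module.h>
#include <linux/interrupt.h>

static void my_tasklet_fn(unsigned long data)
{
    pr_info("tasklet ran with data %lu\n", data);  /* cannot sleep here */
}

static DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);

static int __init my_init(void)
{
    tasklet_schedule(&my_tasklet);   /* defer work to the bottom half */
    return 0;
}

static void __exit my_exit(void)
{
    tasklet_kill(&my_tasklet);       /* make sure it is no longer pending */
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
```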
Notably, softirqs and tasklets cannot sleep, so they cannot use semaphores or other blocking functions: because they are executed by a single kernel thread (ksoftirqd), blocking it would prevent the system from responding to other softirqs. The workqueue, by contrast, is handed to the user as a schedulable unit; a workqueue is backed by a kernel thread. A kernel module can create its own workqueue and queue work items into it, or queue work onto the workqueues the kernel already provides. In other words, a workqueue is a container of work items, and the thread behind it is scheduled to execute them one by one. Work queued there runs in process context and is therefore allowed to sleep.
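By contrast, here is a minimal sketch of queuing a work item onto the kernel's shared workqueue (the names my_work and my_work_fn are illustrative); because it runs in process context, the handler may sleep:

```c
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static void my_work_fn(struct work_struct *work)
{
    msleep(10);                      /* sleeping is allowed here */
    pr_info("work item executed\n");
}

static DECLARE_WORK(my_work, my_work_fn);

static int __init my_init(void)
{
    schedule_work(&my_work);         /* queue onto the shared workqueue */
    return 0;
}

static void __exit my_exit(void)
{
    flush_work(&my_work);            /* wait for it to finish */
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");
```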
Resource locks
The kernel's resource locks include: spinlocks, semaphores, mutexes, reader/writer locks (rwlock), sequence locks (seqlock), RCU, and futexes.
These locks are used to solve different types of problems, respectively:
• Multiple CPUs concurrently accessing the same resource from softirq context. Because softirqs cannot sleep, CPUs competing for the same resource there cannot use blocking locks and can only busy-wait; this is the spinlock.
• Ordinary processes competing for a resource that may be held by only one (or a few) processes at a time, whether reading or writing. These are the mutex and the semaphore (a semaphore with a count of 1 behaves as a mutex).
• Not wanting to enter the kernel on every lock operation when contention is rare. That is the futex.
• Wanting reads and writes of the same resource to be handled separately. These are reader/writer locks, sequence locks, and RCU.
Different locks serve different purposes and scenarios. In fact Linux applies only part of the body of ideas on resource locking; operating system theory is a discipline of its own, and there are many more ways to attack the locking problem.
Resource locking is essentially a synchronization and mutual-exclusion problem. As the list above shows, most of it is about concurrent writes. So as long as a compare-and-write operation is guaranteed to be atomic, code can often do without a lock. Intel implements instructions for this, such as CMPXCHG8B, which perform the compare and the write as one atomic operation so that concurrent writes cannot collide.
Following the same idea, Linux provides two sets of atomic operations, one for integers and one for individual bits. Used sensibly, atomic operations can replace locks in most situations. Spinlocks look expensive, since a contended one keeps a second CPU idling in a busy-wait, but when the protected code is very short, the lightweight spinlock costs far less than a semaphore. That is why spinlocks are used not only in softirq context but also to lock short stretches of code in general.
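A minimal sketch of both techniques in kernel C (the names hits, stats_lock and stats are illustrative): an atomic counter that needs no lock, plus a spinlock around a short critical section.

```c
#include <linux/atomic.h>
#include <linux/spinlock.h>

static atomic_t hits = ATOMIC_INIT(0);

static DEFINE_SPINLOCK(stats_lock);
static struct { unsigned long bytes, packets; } stats;

void account(unsigned long len)
{
    unsigned long flags;

    atomic_inc(&hits);                      /* lock-free update */

    spin_lock_irqsave(&stats_lock, flags);  /* short critical section */
    stats.bytes += len;
    stats.packets++;
    spin_unlock_irqrestore(&stats_lock, flags);
}
```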
Besides the spinlock there are other locks that involve some busy work, such as the sequence lock. Strictly speaking it does not busy-wait; it uses a clever and very simple idea: the reader samples the lock's sequence value before reading the data and again afterwards, and if the value has not changed, no write happened during the read and the data is valid; if it has changed, the reader re-reads. A writer bumps the sequence value. The cost is comparable to a spinlock, but writes are tolerated: the reader simply retries until the writes have completed and then obtains a consistent value.
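A minimal sketch of a sequence lock protecting a two-word timestamp (the names ts_lock, ts_sec and ts_nsec are illustrative):

```c
#include <linux/seqlock.h>
#include <linux/types.h>

static DEFINE_SEQLOCK(ts_lock);
static u64 ts_sec, ts_nsec;

void ts_write(u64 sec, u64 nsec)
{
    write_seqlock(&ts_lock);        /* bumps the sequence count */
    ts_sec = sec;
    ts_nsec = nsec;
    write_sequnlock(&ts_lock);
}

void ts_read(u64 *sec, u64 *nsec)
{
    unsigned int seq;

    do {
        seq = read_seqbegin(&ts_lock);       /* snapshot the sequence */
        *sec = ts_sec;
        *nsec = ts_nsec;
    } while (read_seqretry(&ts_lock, seq));  /* retry if a write raced */
}
```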
But when a large chunk of logic must be locked, you need the heavyweight lock, the semaphore. In general, though, code should avoid big locks; in practice they can usually be designed away with finer-grained locking.
RCU simply does not block writers. The sequence lock above is already an improvement over the reader/writer lock, but it still admits only one writer at a time. RCU goes further: updates are not blocked, and they do not modify the data in place; instead each update writes a new copy of the data, while readers keep reading the old copy. The cost is extra memory, in exchange for neither reads nor updates blocking.
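A minimal sketch of RCU protecting a pointer to a configuration structure (my_conf, conf_ptr and conf_lock are illustrative; writers here still serialize among themselves with a spinlock, while readers never block):

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct my_conf { int threshold; };

static struct my_conf __rcu *conf_ptr;
static DEFINE_SPINLOCK(conf_lock);

int read_threshold(void)
{
    struct my_conf *c;
    int val = 0;

    rcu_read_lock();                     /* readers never sleep or block */
    c = rcu_dereference(conf_ptr);
    if (c)
        val = c->threshold;
    rcu_read_unlock();
    return val;
}

void update_threshold(int threshold)
{
    struct my_conf *newc = kmalloc(sizeof(*newc), GFP_KERNEL);
    struct my_conf *oldc;

    if (!newc)
        return;
    newc->threshold = threshold;

    spin_lock(&conf_lock);
    oldc = rcu_dereference_protected(conf_ptr, lockdep_is_held(&conf_lock));
    rcu_assign_pointer(conf_ptr, newc);  /* publish the new copy */
    spin_unlock(&conf_lock);

    synchronize_rcu();                   /* wait for pre-existing readers */
    kfree(oldc);
}
```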
There is also a lock used only by user-space processes: the futex. It can replace essentially all of the usual user-space locks, because it is efficient and its behavior fits the requirements. The idea starts from an observation: before futex, a user-space lock such as a semaphore was a variable held in the kernel, so every check meant entering the kernel and coming back out. The futex idea is to keep the lock variable in memory the process can address directly (mapped into the process's address space, e.g. with mmap when shared between processes), so each process can inspect the value in its own address space, without entering the kernel, to learn whether anyone is holding the lock. Checking is therefore cheap for everyone; for the contended case, where several processes operate on the same variable, Linux provides the futex system call to go into the kernel and sleep or wake waiters. So the slow path still ends up in the kernel, but the check itself never has to, and in most situations the check finds that the resource is not being accessed concurrently. (Special application scenarios aside.)
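A minimal sketch of the fast-path/slow-path split, assuming Linux with GCC/Clang atomic builtins (this toy lock is illustrative, not a production mutex): the uncontended case never enters the kernel, while contention falls back to FUTEX_WAIT/FUTEX_WAKE.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>

static int futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static uint32_t lock;   /* 0 = free, 1 = held; shared memory */

void lock_acquire(void)
{
    uint32_t expected = 0;
    /* Fast path: the uncontended case never enters the kernel. */
    while (!__atomic_compare_exchange_n(&lock, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        /* Slow path: sleep in the kernel while the value is still 1. */
        futex(&lock, FUTEX_WAIT, 1);
        expected = 0;
    }
}

void lock_release(void)
{
    __atomic_store_n(&lock, 0, __ATOMIC_RELEASE);
    futex(&lock, FUTEX_WAKE, 1);   /* wake one waiter, if any */
}
```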
One problem with the reader/writer semaphore is that when many CPUs take the read lock, the semaphore's own cache line keeps bouncing between the CPUs' caches, which hurts efficiency. The kernel's answer is a new semaphore variant: the percpu-rw-semaphore.
Mutual exclusion and synchronization
The concepts of mutual exclusion and synchronization must be distinguished. Mutual exclusion only says that a single process may access the resource at a time; there is no notion of ordering. Synchronization additionally imposes an order on the processes' accesses to the resource: "only when you have finished is it my turn." Mutual exclusion merely says "I cannot start while you are not finished." Semaphores are synchronization in this sense, because a process that fails to get the resource sleeps and is woken in turn. The other kernel locks provide mutual exclusion, either because waiters simply spin (spinlocks, sequence locks) or because the resource is always available (RCU).
SMP locks and preemption locks
There are two ways a resource gets contended: concurrent access by multiple CPUs on an SMP system, and preemptive interleaving on a single CPU. Most code uses the same locks for both when protecting data. But the two situations have different characteristics; in many cases, on a single CPU, a preemption lock alone is a much lighter-weight solution.
With preempt_disable(), preempt_enable(), preempt_enable_no_resched(), preempt_count(), and preempt_check_resched(), you can do all the locking needed on a single CPU without any other kind of lock.
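A minimal sketch of protecting per-CPU data this way (the per-CPU counter my_counter is illustrative); note that this guards only against preemption on the local CPU, not against other CPUs or interrupt handlers:

```c
#include <linux/preempt.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, my_counter);

void bump_local_counter(void)
{
    preempt_disable();              /* no task switch from here on */
    __this_cpu_inc(my_counter);     /* safe: nobody can preempt us */
    preempt_enable();               /* may reschedule if needed */
}
```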
Priority locks
The futex is a good choice for user-space locking, but user processes have different priorities and the plain futex ignores priority completely; and while a semaphore implements the notion of synchronization, it does not respect priority either. Sometimes you want the lock to take process priority into account, and that is what the PI-futex provides: priority inheritance. It is implemented on top of the futex, with extra logic that weighs the priorities of the processes involved when deciding who gets the lock at unlock time. Enabling this feature reduces efficiency noticeably.
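In user space this typically surfaces through priority-inheritance pthread mutexes, which glibc implements on PI futexes; a minimal sketch (the names pi_lock and init_pi_lock are illustrative):

```c
#include <pthread.h>

pthread_mutex_t pi_lock;

int init_pi_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* Boost the holder's priority to that of the highest-priority waiter. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    return pthread_mutex_init(&pi_lock, &attr);
}
```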
SMP handling of spinlocks
When many processes are spin-waiting on the same spinlock, a waiter can tell that the lock is very busy: it keeps seeing the lock's owner change, yet ownership never comes to itself. At that point it should sleep rather than keep spinning.
lg_local_lock, lg_global_lock
Multi-process (multi-threading)
The Linux kernel draws no distinction between threads and processes; if you want a thread to be an independently schedulable unit, it has to correspond to a process in the kernel. As is well known, inside the kernel the resources of one process are normally invisible to other processes, whereas user-space multithreaded programming requires sharing. The Linux kernel solves this with a mechanism that lets a process specify, at creation time, which of its resources may be shared with other processes; this emulation is what provides the multithreaded environment. Newer kernels can not only share resources but also undo sharing with the unshare system call, which means that at the lowest level the kernel allows a thread to break away from its process and become independent.
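A minimal sketch of that creation-time choice using clone(): the CLONE_* flags pick which resources the new task shares with its creator (the stack size and names are illustrative).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int worker(void *arg)
{
    printf("child sees the shared variable: %d\n", *(int *)arg);
    return 0;
}

int main(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    int shared = 42;

    /* Share address space, filesystem info, file descriptors and
     * signal handlers: this is essentially what a thread is. */
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    pid_t pid = clone(worker, stack + stack_size, flags, &shared);

    if (pid == -1) { perror("clone"); return 1; }
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```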
Process resource limits
There is a large class of requirements around restricting the resources available to a process: CPU, memory, files, behavior, and so on, even the system calls it may make.
System call limits: seccomp filter
The seccomp filter feature restricts which system calls are visible to a process.
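A minimal sketch of installing a seccomp BPF filter that kills the process if it calls write(2) and allows everything else (the policy is purely illustrative; a real filter should also check the architecture field):

```c
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/seccomp.h>
#include <linux/filter.h>

int install_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* If it is write(), kill; otherwise allow. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```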
Communication between the kernel and user programs
Netlink
Proc
Read/write on device nodes
ioctl
System calls
Inter-application communication
writev/readv
System V IPC
Pipes
FIFO
D-Bus
UNIX domain sockets
Signals
POSIX IPC
Mailbox
This simulates a real-world mailbox: any process can send a message to any other process, but only the owning process can receive the messages addressed to it. Each process has exactly one mailbox address, and messages are handled in FIFO order.
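A minimal sketch of the mailbox idea with a POSIX message queue (the queue name "/mybox" and the message are illustrative; link with -lrt on older glibc):

```c
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };
    char buf[128];

    /* The receiver owns the mailbox; senders just open it by name. */
    mqd_t mq = mq_open("/mybox", O_CREAT | O_RDWR, 0600, &attr);
    if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

    mq_send(mq, "hello", strlen("hello") + 1, 0);   /* enqueue (FIFO) */
    mq_receive(mq, buf, sizeof(buf), NULL);         /* dequeue the oldest */
    printf("got: %s\n", buf);

    mq_close(mq);
    mq_unlink("/mybox");
    return 0;
}
```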