Linux kernel back-and-forth: everything is a process

Source: Internet
Author: User

1. zero_page

The zero_page was removed from the kernel in 2.6.24, but there have recently been moves to add it back. It was originally written off as a "chicken rib" (something of little value that is still a pity to throw away) because every time the zero page was mapped, the reference count in its page structure was updated, which caused constant cacheline bouncing. Dismissing the zero page for that reason alone is too harsh, so the 2.6.32 kernel looks for a better way to bring it back and clear its name.

The implementation changes the vm_normal_page function so that the zero page is no longer treated as a normal page but as a non-normal (special) one. When a page is looked up from a virtual address and turns out to be non-normal, as the zero page is, NULL is returned. Callers of vm_normal_page usually go on to act on the return value, for example by updating the reference count; when vm_normal_page returns NULL they skip the update. The page structure is therefore no longer written, the cacheline is no longer bounced, and the zero page problem is solved.

The benefit is that the zero page is very efficient: it never has to be allocated from or returned to the memory manager. With its bad reputation gone, it can return to the kernel mainline. The zero page does have one other side effect: because it is write-protected, it can cause additional page faults, but whether it actually does depends on the user-space program.
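The lookup rule described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: the types page_model and pte_model and the functions vm_normal_page_model and account_mapping_model are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's page structure and a page table entry. */
struct page_model {
    int refcount;               /* shared counter: every write dirties its cacheline */
};

struct pte_model {
    struct page_model *page;    /* page this entry points at */
    bool special;               /* true for non-normal pages such as the zero page */
};

/* Model of the lookup: non-normal pages are reported as NULL. */
struct page_model *vm_normal_page_model(const struct pte_model *pte)
{
    assert(pte != NULL);
    if (pte->special)
        return NULL;
    return pte->page;
}

/* Model of a caller: only touch the reference count of a normal page. */
void account_mapping_model(const struct pte_model *pte)
{
    struct page_model *page = vm_normal_page_model(pte);

    if (page != NULL)
        page->refcount++;
}
```

In this model, mapping an entry marked special leaves the shared counter untouched no matter how many times it is mapped, while an ordinary page is still counted as before.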

2. cpuidle-Check

When no task is running on a CPU, the CPU executes cpu_idle. In the past it was enough to call halt directly, but it is no longer that simple: both APM and ACPI have added power-saving mechanisms, and a CPU can now be halted or stopped at several different levels. The kernel therefore needs a complete mechanism for this, which is cpuidle. The framework turns the idle function (a wrapper around halt) into a callback, which then takes different actions depending on the "CPU state".

Why is this needed? Stopping a CPU does not always save energy, because the stop-start cycle itself consumes energy. If a CPU is stopped for only a few milliseconds, it is not worthwhile: the energy needed to start it again is enough to offset what was saved in those milliseconds. The kernel must be able to account for this, but it is not clairvoyant, and it is unrealistic to expect it to know in advance how long a CPU can stay stopped. An hrtimer can be used to wake the CPU, for example by taking the expiry time of the next timer as the wake-up time, but the interaction between CPUs and the behavior of the system after a CPU has stopped are completely uncertain. For example, a user might create a large number of processes just after a CPU has been stopped. The load changes instantly and may be quite different from what the kernel predicted, so the CPU is woken almost immediately because of the misprediction. A lot of energy is spent on stopping and starting, and the short sleep saves too little to make up for the waste.

Disappointing as this is, the kernel cannot simply do nothing. The kernel cannot foresee the future, but can people? Nobody is omniscient; all prediction rests on experience. If people give that ability to the operating system, it too can predict from experience. This is the so-called "heuristic algorithm": guess the upcoming load from the previous CPU load. It cannot be absolutely accurate, but because events tend to show locality, such a prediction is always better than none.

Specifically, if a CPU has had a high historical load, try not to let it enter a deep sleep state (ACPI supports several sleep depths). In practice a state machine is implemented in the idle callback, with different states representing different sleep depths. The next state is determined by the CPU's historical load: the load is divided into intervals, and the interval the load falls into selects the next state. That is how the sleep state machine advances. It goes to show how complex the kernel is becoming: even the idle function is now a callback.
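The heuristic can be illustrated with a minimal sketch. It assumes, for illustration only, that recent idle durations stand in for the "historical load" (a busy CPU has short idle periods), that each state has a break-even time below which entering it wastes energy, and that the state names and numbers are made up. The names idle_state_model, predict_idle_us and select_idle_state are hypothetical and do not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sleep states, ordered from shallow to deep. */
struct idle_state_model {
    const char *name;
    unsigned int break_even_us; /* shortest sleep that repays the stop-start cost */
};

static const struct idle_state_model idle_states[] = {
    { "shallow", 2 },
    { "medium", 100 },
    { "deep", 1000 },
};

#define NR_IDLE_STATES (sizeof(idle_states) / sizeof(idle_states[0]))

/* Guess the next idle period as the average of the recent ones. */
unsigned int predict_idle_us(const unsigned int *history, size_t n)
{
    unsigned long long sum = 0;
    size_t i;

    if (n == 0)
        return 0;
    assert(history != NULL);
    for (i = 0; i < n; i++)
        sum += history[i];
    return (unsigned int)(sum / n);
}

/* Pick the deepest state whose break-even time fits the prediction. */
size_t select_idle_state(unsigned int predicted_us)
{
    size_t chosen = 0;          /* fall back to the shallowest state */
    size_t i;

    for (i = 0; i < NR_IDLE_STATES; i++) {
        if (idle_states[i].break_even_us <= predicted_us)
            chosen = i;
    }
    return chosen;
}
```

With these made-up numbers, a CPU whose recent idle periods averaged 6 microseconds stays in the shallow state, while one that averaged 2000 microseconds is allowed into the deep state. Each break-even interval plays the role of one of the load intervals that drive the state machine.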

3. Child-runs-first

I once sent a kernel patch to make sure that, on a single-core CPU under the CFS scheduler, the child process really does run first as promised. The original idea was to avoid unnecessary copy-on-write (COW). Because a child generally calls one of the exec functions almost immediately, running the child first has a long history. That reasoning only considered avoiding COW and did not take the effect on the cacheline into account.

The question is whether the parent process, which is already running, should keep running, or whether the kernel should switch to the newly created child. If it switches, the related cachelines are flushed, which also hurts efficiency. So the question becomes which has the greater impact on system efficiency: COW or the cacheline flush. The kernel developers' answer is that most machines are now multi-core. When a new process is created, load balancing is performed according to the load-balancing policy, so the child may not run on the same processor as the parent. Forcing the child to run first just to save some COW would require a lot of complicated work in the kernel, and it is not worth it.

The child's case is actually simple: it copies the parent's memory, and it benefits from the cacheline only if it runs on the same processor. There will always be more complicated or more important factors that place the child on another processor, and then there is no need to honor the child-runs-first promise. To make full use of the parent's hot cachelines, the new kernel drops child-runs-first. It is not removed completely; the kernel option now defaults to false, and you can turn it back on.
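To see which way the option is set on a given machine, the value can be read from procfs. The sketch below assumes the option is exposed as /proc/sys/kernel/sched_child_runs_first, as it was on kernels of that era; the file may be missing on other kernels, in which case the function reports -1. The helper name read_sysctl_flag is invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the option; it may not exist on every kernel. */
#define CHILD_RUNS_FIRST_PATH "/proc/sys/kernel/sched_child_runs_first"

/* Return the integer stored in the file, or -1 if it cannot be read. */
int read_sysctl_flag(const char *path)
{
    FILE *f;
    int value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_sysctl_flag(CHILD_RUNS_FIRST_PATH) is expected to return 0 when the default is in effect and 1 when child-runs-first has been switched back on. Writing a new value to the same file normally requires root.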

4. Child no longer inherits the real-time priority of the parent process

This topic comes from a polemical article titled "realtimekit and the audio problem".

The article complains that Linux does not support real-time audio well. To get better audio quality, one has to couple in other kernel modules, for example by using LSM to support better playback, but that introduces security risks. Real-time priorities such as RR and FIFO can of course be used, but if any process can easily obtain a real-time priority at any time, so can a malicious one. High-quality audio playback deserves a real-time priority, yet a process disguised as an audio player could bring the system down at once. Hence the article's impassioned question: can't kernel developers handle real-time audio playback more gracefully? Audio playback is only one example; many other workloads also need real-time response. On the other hand, security matters most and the mechanism must not be open to malicious use, so some measures have to be taken.

As everyone knows, the processes in a UNIX system form a tree with init at the root, and all other processes are descendants of init. Some attributes are inherited down the tree, and priority is one of them: a child process inherits, more or less, the priority of its parent. If a malicious process obtains a real-time priority, its children obtain it too, and if it forks children frantically the result is a DoS attack, which of course has to be prevented. The new kernel patch therefore proposes a per-process flag, the reset flag. Once a process has this flag, then whatever scheduling policy and priority it runs with, the children it forks fall back to the normal priority and cannot use a real-time scheduling policy. This stops malicious processes from using fork bombs.

Adding this flag does not cut the tie between father and son. On the contrary, the two have more to do with each other: the father no longer has the power to hand everything to his son, the son must sometimes bear his father's distrust, and the father must sometimes pay some inheritance tax. The mechanism closes off certain exploits. I think it is necessary, so that applications with strict real-time requirements can use the RT scheduling policies safely without worrying about security.
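The effect of the flag can be sketched in two parts. The first is a simplified model of what a fork does to the child's scheduling attributes; task_model and fork_child_model are hypothetical names and the model is an assumption about the behavior described above, not kernel source. The second part shows how a process might request the flag from user space through sched_setscheduler with SCHED_RESET_ON_FORK ORed into the policy, which is how Linux exposes it since 2.6.32; the wrapper name enable_fifo_reset_on_fork is invented here, and the call needs the privilege to use SCHED_FIFO.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000  /* value used by Linux headers */
#endif

/* Hypothetical model of the scheduling attributes of one task. */
struct task_model {
    int policy;         /* SCHED_OTHER, SCHED_FIFO or SCHED_RR */
    int rt_priority;    /* only meaningful for real-time policies */
    int nice;
    bool reset_on_fork;
};

/* Model of fork: with the flag set, the child drops back to normal. */
struct task_model fork_child_model(const struct task_model *parent)
{
    struct task_model child;

    assert(parent != NULL);
    child = *parent;
    if (parent->reset_on_fork) {
        if (child.policy == SCHED_FIFO || child.policy == SCHED_RR) {
            child.policy = SCHED_OTHER;
            child.rt_priority = 0;
            child.nice = 0;
        } else if (child.nice < 0) {
            child.nice = 0;
        }
        child.reset_on_fork = false;    /* the child starts with a clean flag */
    }
    return child;
}

/* Ask for SCHED_FIFO for the calling process, with children reset on fork. */
int enable_fifo_reset_on_fork(int priority)
{
    struct sched_param param = { 0 };

    param.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO | SCHED_RESET_ON_FORK, &param);
}
```

In the model, a SCHED_FIFO parent with the flag set produces a SCHED_OTHER child, while a parent without the flag still passes its real-time policy on unchanged.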

5. Miscellaneous

First, consider the effect of tuning the CFS scheduler's preemption granularity. (CFS has no fixed time slice. The slice, measured on the virtual clock, is adjusted dynamically as the number of processes in the system changes, rather than being static as in the earlier O(n) and O(1) schedulers.) If you run only a few server processes, you can increase the granularity: a scheduling period then lasts longer, process switches are less frequent, and overhead is lower. If you run desktop and multimedia applications, you should reduce the granularity: processes switch more often and interactivity improves, but overhead rises as well.

Next, consider user-space drivers, which are not as inefficient as many people think. The only real disadvantage of a user-space program is that system call overhead is relatively high. In other respects it is treated the same as the kernel: Linux schedules entirely by process, apart from uninvited guests such as interrupts. Making a user-space driver efficient takes two steps: the first is to give it a real-time priority; the second is to lock its memory with mlock to eliminate paging overhead.
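The granularity trade-off can be made concrete with a small calculation. The sketch below assumes a simplified model in which all processes have equal weight, the scheduling period equals a target latency while few processes are runnable, and the period is stretched so that no slice drops below the minimum granularity once there are many. The tunable values used in the example are illustrative, and the names cfs_tunables_model, sched_period_ns and slice_ns are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables, in nanoseconds. */
struct cfs_tunables_model {
    unsigned long long latency_ns;          /* target scheduling period */
    unsigned long long min_granularity_ns;  /* smallest slice allowed */
};

/* Length of one scheduling period for nr_running runnable processes. */
unsigned long long sched_period_ns(const struct cfs_tunables_model *t,
                                   unsigned int nr_running)
{
    unsigned long long nr_latency;

    assert(t != NULL && t->min_granularity_ns > 0);
    nr_latency = t->latency_ns / t->min_granularity_ns;
    if (nr_running > nr_latency)
        return nr_running * t->min_granularity_ns;
    return t->latency_ns;
}

/* Slice of one process, assuming all processes have equal weight. */
unsigned long long slice_ns(const struct cfs_tunables_model *t,
                            unsigned int nr_running)
{
    assert(nr_running > 0);
    return sched_period_ns(t, nr_running) / nr_running;
}
```

With a latency of 6 ms and a minimum granularity of 0.75 ms, 4 runnable processes get 1.5 ms each and 16 processes get 0.75 ms each. Raising the values to 24 ms and 3 ms gives the same 16 processes 3 ms each, so they switch a quarter as often, which is the server-style setting described above.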
