Differences in the design of asynchronous I/O between Windows and Linux

Source: Internet
Author: User

From: http://www.douban.com/group/topic/11015963/

 

To understand asynchronous I/O in the Windows kernel, we must first understand four concepts: APC (asynchronous procedure call), DPC (deferred procedure call), IRP (I/O request packet), and priority-based preemptive scheduling. They are explained as follows:
1. APC. An asynchronous procedure call is similar to a signal in Linux, but running a signal handler takes two steps, installing the handler and raising the signal, while an APC takes only one: the APC callback just needs to be placed on the thread's APC queue, and it will be executed (a user-mode APC is delivered once the thread enters an alertable wait). Each thread has two APC queues, one for user mode and one for kernel mode. An APC function runs at APC priority, which is generally higher than the priority of user threads.
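
The two-queue idea can be sketched as a toy model in Python. This is not Windows code: `ToyThread`, `queue_apc`, and `deliver_apcs` are hypothetical names, and delivering kernel-mode APCs before user-mode ones is a simplifying assumption of the sketch.

```python
from collections import deque


class ToyThread:
    """Toy model of a thread with separate kernel-mode and user-mode APC queues."""

    def __init__(self):
        self.kernel_apcs = deque()
        self.user_apcs = deque()

    def queue_apc(self, func, arg, kernel=False):
        # Queuing is the only step: there is no separate "install a handler" phase.
        target = self.kernel_apcs if kernel else self.user_apcs
        target.append((func, arg))

    def deliver_apcs(self):
        """Run pending APCs, kernel-mode first, and return their results in order."""
        results = []
        for queue in (self.kernel_apcs, self.user_apcs):
            while queue:
                func, arg = queue.popleft()
                results.append(func(arg))
        return results


thread = ToyThread()
thread.queue_apc(lambda x: f"user:{x}", 1)
thread.queue_apc(lambda x: f"kernel:{x}", 2, kernel=True)
delivered = thread.deliver_apcs()
```

Here `delivered` lists the kernel-mode result before the user-mode one, even though the user-mode APC was queued first.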

2. DPC. A deferred procedure call is generally used for the later stage of interrupt handling. An interrupt can run in the context of any process (thread). In general, the interrupt handler has to mask the corresponding interrupt to prevent re-entry. To keep this uncertain period as short as possible, Windows does only a small amount of work in the interrupt handler, then queues a DPC and exits the interrupt. The DPC runs at a lower priority than the ISR, so it can be interrupted by hardware.
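
The split between a short ISR and a deferred routine can be illustrated with a small Python sketch. It only models the ordering of work; `nic_isr`, `nic_dpc`, `drain_dpcs`, and the packet names are hypothetical and do not correspond to any driver API.

```python
from collections import deque

dpc_queue = deque()  # hypothetical per-CPU queue of deferred routines
log = []


def nic_dpc(packet):
    # Deferred part: the lengthy processing, run after the ISR has returned.
    log.append(f"dpc processed {packet}")


def nic_isr(packet):
    # Interrupt part: acknowledge the device, defer the rest, return quickly.
    log.append("isr ack")
    dpc_queue.append((nic_dpc, packet))


def drain_dpcs():
    # Models the CPU dropping back from the device level and running queued DPCs.
    while dpc_queue:
        routine, context = dpc_queue.popleft()
        routine(context)


nic_isr("pkt0")
nic_isr("pkt1")  # a second interrupt arrives before any DPC has run
drain_dpcs()
```

Both interrupts are acknowledged before either packet is processed, which is the point of deferring the heavy work.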

3. IRP. This is an extremely important concept in Windows, and it leads directly to the difference between the design philosophy of Windows and that of UNIX and UNIX-like systems. An IRP encapsulates one I/O operation, and its internal structure is a stack. Each stack location represents one driver module and holds that driver's callback function. The IRP is passed down the driver stack one layer at a time to complete the I/O operation.
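
A minimal sketch of the "request that carries its own stack" idea, written as a Python toy. `ToyIrp`, `send_down`, `complete`, and the driver names are assumptions made for illustration; the real IRP layout and completion rules are more involved.

```python
class ToyIrp:
    """Toy I/O request: an operation plus one stack location per driver layer."""

    def __init__(self, operation):
        self.operation = operation
        self.stack = []  # callbacks registered on the way down
        self.trace = []  # records the order in which layers are visited


def send_down(irp, drivers):
    # Each driver records its callback in its own stack location,
    # then the request moves on to the next lower driver.
    for name in drivers:
        irp.trace.append(f"dispatch:{name}")
        irp.stack.append(lambda req, name=name: req.trace.append(f"complete:{name}"))


def complete(irp):
    # Completion unwinds the stack: lowest driver first, top driver last.
    while irp.stack:
        callback = irp.stack.pop()
        callback(irp)


irp = ToyIrp("read")
send_down(irp, ["filesystem", "volume", "disk"])
complete(irp)
```

Because the callbacks travel with the request, completion needs nothing from the thread that issued it.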

4. Priority-based preemptive scheduling. This is the scheduling policy in Windows. It is based entirely on priority and nothing else (there is a time slice, but it changes nothing, because the time-slice policy is the same for all processes). If a higher-priority thread becomes ready, it immediately preempts the current thread. Even interrupts have their own priorities, and of course they are the highest. The responsiveness of Windows depends on dynamic priority boosting, but that involves other areas and is outside the topic of this article.
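
The preemption rule can be sketched with a toy scheduler in Python. `ToyScheduler` and the thread names and priority numbers are hypothetical; time slices and dynamic boosts are deliberately left out.

```python
import heapq


class ToyScheduler:
    """Toy priority scheduler: the highest-priority ready thread always runs."""

    def __init__(self):
        self._ready = []      # max-heap emulated with negated priorities
        self._seq = 0         # tie-breaker keeps FIFO order within a priority
        self.current = None   # (priority, name) of the running thread

    def _push(self, priority, name):
        heapq.heappush(self._ready, (-priority, self._seq, name))
        self._seq += 1

    def make_ready(self, priority, name):
        """Mark a thread ready; return True if it preempts the running thread."""
        if self.current is None:
            self.current = (priority, name)
            return False
        if priority > self.current[0]:
            self._push(*self.current)  # the preempted thread goes back to ready
            self.current = (priority, name)
            return True
        self._push(priority, name)
        return False

    def block_current(self):
        """The running thread blocks; the best ready thread takes over."""
        if self._ready:
            neg_priority, _, name = heapq.heappop(self._ready)
            self.current = (-neg_priority, name)
        else:
            self.current = None


sched = ToyScheduler()
sched.make_ready(8, "worker")
preempted = sched.make_ready(15, "ui")        # higher priority: preempts at once
not_preempted = sched.make_ready(4, "batch")  # lower priority: waits
sched.block_current()                          # "ui" blocks, "worker" resumes
```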
After understanding the above concepts, we can continue.

The procedure for Windows I/O is as follows:
1. The application calls the ReadFile function.

2. ntdll.dll passes the user request into kernel space for further processing, where the I/O manager takes over.

3. The I/O manager creates an IRP, fills in its fields based on the user request, and then delivers the IRP to the top-level driver.

4. The drivers pass the IRP down layer by layer to the hardware device, and each call returns in turn. Note that each layer must register its I/O completion callback before passing the IRP to the next layer.

At this point, an asynchronous call can return to user space and carry on, while a synchronous call blocks in the I/O manager. The I/O operation itself is not complete yet. Let's look at what happens next:

 
5. After the hardware operation completes, an interrupt occurs. The interrupt handler queues a DPC and returns.

6. The DPC runs and calls the driver's completion routine.

7. The completion routines of each layer are called one after another, and control flow returns layer by layer to the I/O manager.

8. In the ReadFileEx (asynchronous) case, a user APC is queued to the thread. This APC will be called within a short time because it has a higher priority than the user thread.
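
The eight steps above can be traced end to end with a toy model in Python. Everything here is hypothetical (`read_file_ex`, `hardware_interrupt`, the file name `example.dat`); the sketch shows only the order of events on the asynchronous path, not the real API.

```python
from collections import deque

events = []
dpc_queue = deque()
user_apc_queue = deque()


def read_file_ex(path, on_done):
    # Steps 1-4: build the request, hand it to the driver stack, and return
    # immediately with "pending" because this is the asynchronous path.
    request = {"path": path, "on_done": on_done, "data": None}
    events.append("request sent to drivers")
    return request, "pending"


def hardware_interrupt(request, data):
    # Step 5: the interrupt handler only records the result and queues a DPC.
    request["data"] = data
    dpc_queue.append(request)
    events.append("isr queued dpc")


def run_dpcs():
    # Steps 6-8: completion processing ends by queuing a user APC.
    while dpc_queue:
        request = dpc_queue.popleft()
        events.append("completion routines ran")
        user_apc_queue.append(request)


def deliver_user_apcs():
    # The user-mode callback finally sees the data.
    while user_apc_queue:
        request = user_apc_queue.popleft()
        request["on_done"](request["data"])


request, status = read_file_ex(
    "example.dat", lambda data: events.append(f"apc got {data!r}")
)
events.append("caller keeps working")
hardware_interrupt(request, b"abc")
run_dpcs()
deliver_user_apcs()
```

The caller's own work appears in `events` before the interrupt, which is what "asynchronous" means here.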
It can be seen that the self-contained, independent nature of the IRP makes all of this possible. An IRP contains enough information to separate an I/O operation from the thread environment, which reflects the overall design concept of the Windows kernel: modularization.
Priority control in Windows is very clever. There is a concept called the "interrupt request level" (IRQL), which divides what the CPU is doing into levels, from PASSIVE_LEVEL through DISPATCH_LEVEL up to PROFILE_LEVEL. Windows assigns tasks to these levels to simplify operation. It rests on one principle: once a CPU is running at an IRQL higher than PASSIVE_LEVEL, activity on that CPU can be preempted only by activity at a higher IRQL. This principle allows many operations to be performed safely. For example, an I/O dispatch routine runs at PASSIVE_LEVEL, while the DPC that pulls an IRP off the IRP queue for processing runs at DISPATCH_LEVEL. On that CPU, no dispatch routine can disturb the DPC while it works on the IRP, which saves a lock operation. On a hardware interrupt, the CPU, running at the hardware interrupt level, puts a DPC on that CPU's DPC queue. After the interrupt completes, the CPU drops to DISPATCH_LEVEL and the DPC can be executed... Priority control is a major feature of Windows.
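
The "only a higher level can preempt" rule can be sketched in a few lines of Python. The level names come from the text; the numeric values are illustrative (only their ordering matters here), and `ToyCpu` is a hypothetical helper.

```python
from enum import IntEnum


class Irql(IntEnum):
    # Illustrative values: only the relative order is meaningful in this sketch.
    PASSIVE_LEVEL = 0
    DISPATCH_LEVEL = 1
    PROFILE_LEVEL = 2


class ToyCpu:
    def __init__(self):
        self.irql = Irql.PASSIVE_LEVEL

    def can_preempt(self, requested):
        # The rule from the text: only strictly higher-level activity may cut in.
        return requested > self.irql

    def run_at(self, level, work):
        """Raise to `level`, run `work`, then restore the previous level."""
        previous, self.irql = self.irql, level
        try:
            return work()
        finally:
            self.irql = previous


cpu = ToyCpu()


def dpc_body():
    # While the DPC runs, PASSIVE_LEVEL code cannot interrupt it on this CPU,
    # but a higher-level interrupt still can.
    return (
        cpu.can_preempt(Irql.PASSIVE_LEVEL),
        cpu.can_preempt(Irql.PROFILE_LEVEL),
    )


during_dpc = cpu.run_at(Irql.DISPATCH_LEVEL, dpc_body)
```

The rule is per CPU, so this sketch says nothing about code running on another processor.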
In summary, the major elements of Windows asynchronous I/O are: 1. the independent, self-contained IRP structure; 2. priority control. Abstracted further, they reflect the design concept of Windows: modularization. The Windows NT design draws on the microkernel approach (it is usually described as a hybrid kernel), and a microkernel is modular.
In contrast, Linux implements asynchronous I/O by returning immediately at the point where the synchronous path would have blocked, and then retrying later. On older kernels without asynchronous I/O support, the retry was done by a set of user threads; since 2.6 it is done by work queues, and a work queue is essentially a kernel thread. Linux therefore has no concept that corresponds to the Windows DPC. If you insist on finding a counterpart, it is the softirq, but the resemblance is only superficial. Under sustained load, pending softirqs are handed to the ksoftirqd kernel thread, so that softirqs (DPCs?) cannot monopolize the CPU for a long time. Imagine a heavily loaded server with multiple NICs: softirqs would consume a large share of the CPU, so it is fairer to have them scheduled by the thread scheduler like everything else. Windows looks unfair by comparison. Of course, you can also create a kernel thread in a Windows driver, but that is not a mechanism the system provides for you.
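
The "return at the blocking point, then retry" idea has a user-space analogue that can be shown with a non-blocking pipe in Python. This is only an analogy on a POSIX system, not the kernel's retry mechanism; `read_with_retry` is a hypothetical helper.

```python
import os
import select


def read_with_retry(fd, size):
    """Try a non-blocking read; on "would block", wait for readiness and retry."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return os.read(fd, size), attempts
        except BlockingIOError:
            # The call returned where a blocking read would have slept.
            select.select([fd], [], [])  # wait until the descriptor is readable


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

try:
    os.read(read_fd, 16)
    first_try_blocked = False
except BlockingIOError:
    first_try_blocked = True  # empty pipe: an error is returned instead of sleeping

os.write(write_fd, b"ready")
data, attempts = read_with_retry(read_fd, 16)
os.close(read_fd)
os.close(write_fd)
```

The first read fails immediately on the empty pipe; once data has been written, the retry succeeds.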
Linux and UNIX share one lineage and the same qualities: simple, fair, and uniform. Everything is a file, everything is scheduled, and the interfaces are uniform. Compared with Windows this looks a little immature, and I do still admire the Windows priority control scheme. Traditional UNIX is essentially synchronous, because synchronous code is simple, controllable, and predictable, and therefore stable. If a function in the kernel has to block and the process cannot continue, UNIX calls schedule directly, whereas Windows relies on an asynchronous wake-up plus a priority adjustment, because it is asynchronous by nature. Linux has recently been absorbing the strengths of both UNIX and Windows, so both direct scheduling and asynchronous wake-up can be seen in its kernel. That is good, but it is messy and changes fast. I worry that if Linux lacks a fundamental design concept, it may get out of control as its complexity increases and its code base grows.
Is kernel stability related to design philosophy? Put simply, Windows is based on priority. A DPC runs at DISPATCH_LEVEL and cannot be preempted by user processes, only by hardware interrupts. If it sleeps in the kernel... In Linux, kernel threads and user threads are scheduled uniformly, so of course it will not hang as often... As for soft interrupts, a softirq in Linux cannot sleep, because you cannot guarantee that every softirq runs in a kernel thread context. This is not a real problem, though: newer kernels are moving interrupt handling into threads, so what else can't be done? In Windows, the interrupt request level is really an abstraction that separates whatever hardware or software is executing from the execution itself. That is very good: simply raising or lowering the level, plus one basic principle, can replace many of the lock/unlock and disable-preemption operations found in Linux.
Both Linux and Windows apply a unified, simple idea, but they apply it in different places. In Linux it shows in the VFS and in turning as much as possible into threads; in Windows it shows in priority control and object orientation...

Windows and Linux are two major systems.
