1. Typical I/O Models
According to Section 6.2 of Unix Network Programming, Volume 1, Linux systems support the following five typical I/O models:
- Blocking I/O
- Non-blocking I/O
- I/O multiplexing (e.g. select and poll)
- Signal-driven I/O (e.g. SIGIO)
- Asynchronous I/O (e.g. the POSIX aio_* functions)
An input operation can be split into two typical stages:
1) Waiting for the data to be ready
2) Copying the data from the kernel buffer to the application process's buffer (before this, data arriving on the socket is first copied into the kernel buffer)
All five I/O models include these two phases; they differ in how the wait for data is handled, as described below.
2. Blocking I/O Model
The blocking I/O model is the simplest I/O model. As the name implies, when it is used, the application process blocks while waiting for a socket's data to become ready.
Take a recvfrom call on a UDP socket in blocking mode as an example.
When recvfrom is called, the process blocks until the function receives data and returns; while blocked, the process is in a sleep state. Clearly, this I/O model has serious performance problems when many sockets must be serviced.
3. Non-blocking I/O Model
When a socket is set to non-blocking mode, we expect the kernel to behave as follows: if a requested I/O operation cannot complete without putting the process to sleep, the kernel does not put the process to sleep but returns an error instead.
Take a recvfrom call on a UDP socket in non-blocking mode as an example.
When recvfrom is called in non-blocking mode and the data is not ready, the function immediately returns -1 with an error code such as EWOULDBLOCK.
The application should check the return value and call recvfrom repeatedly until it successfully receives the data. This cyclic invocation of recvfrom on a non-blocking descriptor is called polling.
In non-blocking mode, the application obviously has to poll the kernel actively to determine whether the data is ready, which wastes CPU resources.
4. I/O multiplexing Model
In the I/O multiplexing model, the application calls select or poll before the actual read or write. If the socket does not satisfy the read/write condition, the program blocks in select or poll; when the function returns, the socket satisfies the condition, and the application then calls the actual I/O function to read or write the data.
Take the I/O multiplexing model on a UDP socket as an example.
When select is called, the process blocks in that function; once data becomes ready, select returns a readable descriptor, and the application then calls recvfrom on that descriptor to read the data.
At first glance, the I/O multiplexing model seems to have no advantage over the blocking model described earlier; it even adds one more system call.
Indeed, if the application operates on only one descriptor, multiplexing via select offers no advantage. But when a process operates on more than one descriptor, select's benefit appears: all of those descriptors can be managed uniformly through select, which greatly simplifies the implementation.
However, in the current select implementation, the number of descriptors it can manage is limited to 1024 by default, and select determines readability/writability by traversing the descriptor set internally. Even if the related code is modified to raise the descriptor cap, the traversal is still a linear operation. Therefore, when the number of descriptors is large, the I/O multiplexing model implemented with select or poll also has performance problems.
5. Signal-driven I/O Model
The I/O models described so far all need to block or poll to determine whether a descriptor is readable or writable, and they hit a performance bottleneck in high-concurrency network scenarios.
In view of this, the Linux kernel supports the following I/O model:
After the application creates a signal-driven socket, it registers a handler for the SIGIO signal through the operating system's signal mechanism; afterward the application can perform other work that does not depend on the socket's data. When a descriptor satisfies the read/write condition, the kernel delivers the signal, and the application can read or write the descriptor in the handler, or have the handler notify the main process to do the read/write.
Take the signal-driven I/O model on a UDP socket as an example.
The advantage of this model is that blocking is genuinely avoided while waiting for the data to become ready: the main process can perform other work, and the kernel notifies the application via the signal once the target descriptor has data ready to read or write.
In terms of processing flow, this is already an asynchronous pattern. The only difference between it and the POSIX asynchronous model described below is when the kernel notifies the application: in the signal-driven model, the kernel notifies the application as soon as the descriptor is readable/writable, and the application then performs the I/O itself, whereas in the POSIX asynchronous model, which introduces a set of asynchronous I/O functions, the kernel notifies the application only after the I/O operation has completed. That is, they differ only in the moment at which the kernel notifies the application process; the rest of the flow is similar.
Note that not all Linux systems support the signal-driven I/O model; in practice, consult the system manual to verify that the model is available.
6. Asynchronous I/O Model
The asynchronous I/O model is defined in the POSIX specification. In general, it works as follows: the application calls an asynchronous I/O function to tell the kernel to start a read/write operation, and the call returns immediately. When the I/O operation completes (for a read, completion means the data has already been copied from the kernel buffer to the process buffer), the kernel notifies the application process in whatever way the asynchronous I/O function's parameters specify.
Take the asynchronous I/O model on a UDP socket as an example.
In the asynchronous I/O model, the application process is not notified until the data has been copied from the socket's kernel buffer into the process buffer.
The difference between the POSIX asynchronous I/O model and the signal-driven I/O model was explained above and is not repeated here.
Note that not all Linux systems support the POSIX-compliant asynchronous model; this needs to be confirmed before use.
7. Summary comparison of I/O models
The following is a comparison of the 5 typical I/O models.
The summary notes are as follows:
- Under the blocking model, the process is blocked from the moment it calls the I/O function until the function returns.
- Under the non-blocking model, the I/O function returns immediately, but the application must poll the kernel by calling the function repeatedly until the data is ready.
- Under the I/O multiplexing model, the process blocks in select or poll until one of the managed descriptors becomes readable/writable; when select or poll returns, the application calls the actual I/O function to read or write.
- Under the signal-driven model, the application registers a signal handler with the kernel; when the target descriptor becomes readable/writable, the kernel notifies the application through the signal to read/write the data. This avoids blocking the process while waiting.
- Under the asynchronous I/O model, the asynchronous I/O function returns immediately; when the kernel completes the actual I/O operation, it notifies the application to follow up. This also avoids blocking the process while waiting.
8. Additional Information
The POSIX specification defines synchronous and asynchronous I/O as follows:
- A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes.
- An asynchronous I/O operation does not cause the requesting process to be blocked.
According to this definition, the first four I/O models described above are synchronous, because the actual I/O function blocks the process while the data is read or written; only the asynchronous I/O model truly conforms to the definition of asynchronous I/O.
In particular, today's popular web servers, such as Nginx, usually manage descriptors through epoll or kqueue as provided by the kernel.
Taking epoll as an example, it works much like the I/O multiplexing model described in this article, except that when a managed descriptor satisfies the read/write condition, the kernel notifies epoll through a callback, and the application's epoll_wait call returns only the ready descriptors. With select, by contrast, the ready descriptors are found by traversing the entire descriptor set, so epoll's trigger mechanism is clearly more efficient. And because the callback-based trigger avoids linear traversal, epoll can manage a large number of descriptors without degrading trigger performance.
Because epoll is event-driven (its triggering modes are divided into edge-triggered and level-triggered; the difference is documented in man epoll), I/O built on epoll is also known as the event-driven I/O model.
In epoll mode, epoll_wait is usually a blocking call, so epoll is a blocking model. Moreover, because the kernel only signals that a managed descriptor satisfies the read/write condition, the actual read/write I/O still blocks while the data is copied between the kernel buffer and the application process buffer (the blocking time depends on the amount of data). Therefore, by the POSIX definition of synchronous/asynchronous, epoll is a synchronous model.
"References"
- Unix Network Programming, Volume 1, chapter6.2:i/o Models
- Stackoverflow:what is the status of POSIX asynchronous I/O (AIO)?
- Wikipedia:asynchronous I/O