Original: http://www.ibm.com/developerworks/cn/linux/l-async/
The most common input/output (I/O) model in Linux® is synchronous I/O. In this model, when a request is made, the application blocks until the request is satisfied. This works well because the calling application uses no central processing unit (CPU) time while it waits for the I/O request to complete. In some cases, however, I/O requests need to overlap with other processing. The Portable Operating System Interface (POSIX) asynchronous I/O (AIO) application program interface (API) provides this capability. This article gives an overview of the API and shows how to use it.
M. Tim Jones, consultant engineer, Emulex
September 28, 2006
AIO Introduction
Linux asynchronous I/O is a relatively recent addition to the Linux kernel. It is a standard feature of the 2.6 kernel, and patches are also available for the 2.4 kernel. The basic idea behind AIO is to allow a process to initiate a number of I/O operations without having to block or wait for any of them to complete. The process can retrieve the results of an I/O operation later, or when it is notified that the operation has completed.
I/O models
Before going into the AIO API, let's explore the different I/O models available on Linux. This is not an exhaustive review; it covers the most commonly used models to illustrate how they differ from asynchronous I/O. Figure 1 shows the synchronous and asynchronous models, as well as the blocking and non-blocking models.
Figure 1. Simplified matrix of basic Linux I/O models
Each of these I/O models has usage patterns that are advantageous for particular applications. This section briefly describes each one.
I/O-intensive vs. CPU-intensive processes
An I/O-intensive process performs more I/O than processing. A CPU-intensive process performs more processing than I/O. The Linux 2.6 scheduler actually favors I/O-intensive processes because they typically initiate an I/O operation and then block, which means that other work can be efficiently interleaved between them.
Synchronous blocking I/O
One of the most common models is the synchronous blocking I/O model. In this model, the user-space application performs a system call that causes the application to block until the system call completes (the data is transferred or an error occurs). While it waits, the calling application consumes no CPU, so from a processing perspective this model is efficient.
Figure 2 shows the traditional blocking I/O model, which is also the model most commonly used in applications today. Its behavior is well understood, and it works efficiently for typical applications. When the read system call is invoked, the application blocks and the context switches to the kernel. The read is then initiated, and when the response returns (from the device being read), the data is moved to the user-space buffer. The application is then unblocked (the read call returns).
Figure 2. Typical flow of the synchronous blocking I/O model
From the application's perspective, the read call spans a long duration. In reality, the application is blocked while the read is multiplexed with other work in the kernel.
Synchronous non-blocking I/O
A less efficient variant of synchronous blocking I/O is synchronous non-blocking I/O. In this model, the device is opened in non-blocking mode. This means that instead of completing an I/O operation immediately, a read may return an error code indicating that the command could not be satisfied right away (EAGAIN or EWOULDBLOCK), as shown in Figure 3.
Figure 3. Typical flow of the synchronous non-blocking I/O model
Because a non-blocking I/O command may not be satisfied immediately, the application may have to call read many times before the operation completes. This can be extremely inefficient: while the kernel works on the command, the application must either busy-wait until the data is available or try to do other work in the meantime. As Figure 3 also shows, this method can add latency to the I/O, because there is a gap between the data becoming available in the kernel and the user calling read to retrieve it, which lowers overall data throughput.
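The polling pattern described above can be sketched in C as follows. This is a minimal illustration, not code from the article: the helper names set_nonblocking and read_polling, the retry limit, and the 1 ms pause standing in for "other work" are all assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put an open descriptor into non-blocking mode; returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: call read() repeatedly until data arrives or
 * max_tries calls have been made. Returns the byte count from read(),
 * or -1 with errno set (EAGAIN if the data never became available).
 * *tries reports how many read() calls were needed.
 */
ssize_t read_polling(int fd, void *buf, size_t len, int max_tries, int *tries)
{
    for (*tries = 1; *tries <= max_tries; (*tries)++) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;           /* data, end of file, or a real error */
        usleep(1000);           /* stand-in for other work between polls */
    }
    *tries = max_tries;
    errno = EAGAIN;
    return -1;
}
```

Every failed attempt costs a system call, and data that arrives just after a poll waits until the next one, which is the latency the text describes.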
Asynchronous blocking I/O
Another blocking paradigm is non-blocking I/O with blocking notifications. In this model, non-blocking I/O is configured, and then the blocking select system call is used to determine when there is any activity on an I/O descriptor. What makes select interesting is that it can provide notification for many descriptors, not just one. For each descriptor, we can request notification that the descriptor can accept writes, that data is available to read, or that an error has occurred.
Figure 4. Typical flow of the asynchronous blocking I/O model (select)
The main problem with the select call is that it is not very efficient. While it is a convenient model for asynchronous notification, it is not recommended for high-performance I/O.
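As a rough illustration of blocking on notifications for more than one descriptor, the sketch below waits in select until either of two descriptors becomes readable. The function name wait_for_either and its return convention are assumptions made for this example only.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: block in select() until fd_a or fd_b is readable
 * or timeout_sec seconds pass. Returns the readable descriptor (fd_a is
 * preferred if both are ready), -1 on timeout, or -2 on error.
 */
int wait_for_either(int fd_a, int fd_b, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int maxfd = (fd_a > fd_b) ? fd_a : fd_b;

    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);

    /* The first argument is the highest descriptor number plus one. */
    int ready = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -2;
    if (ready == 0)
        return -1;
    return FD_ISSET(fd_a, &readfds) ? fd_a : fd_b;
}
```

Note that the application still blocks here; it simply blocks on the notification rather than on the read itself.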
Asynchronous non-blocking I/O (AIO)
Finally, the asynchronous non-blocking I/O model is one of overlapping processing with I/O. The read request returns immediately, indicating that the read was successfully initiated. The application can then perform other processing while the read completes in the background. When the read response arrives, a signal or a thread-based callback is generated to complete the I/O transaction.
Figure 5. Typical flow of the asynchronous non-blocking I/O model
The ability to overlap computation and I/O processing for potentially multiple I/O requests within a single process exploits the gap between processing speed and I/O speed. While one or more slow I/O requests are pending, the CPU can perform other tasks or, more commonly, operate on already completed I/Os while additional I/Os are initiated.
The next section examines this model further, explores its API, and then demonstrates a number of the commands.
Motivation for asynchronous I/O
From the previous taxonomy of I/O models, we can see the motivation for AIO. The blocking models require the initiating application to block when the I/O starts, which means that processing and I/O cannot overlap. The synchronous non-blocking model allows processing and I/O to overlap, but it requires the application to check the status of the I/O on a recurring basis. This leaves asynchronous non-blocking I/O, which permits overlap of processing and I/O, including notification of I/O completion.
The functionality provided by the select function (asynchronous blocking I/O) is similar to AIO, except that it still blocks. However, it blocks on notifications rather than on the I/O call.
Introduction to AIO on Linux
This section explores the asynchronous I/O model of Linux to help us understand how to use this technology in applications.
In the traditional I/O model, there is an I/O channel identified by a unique handle. In UNIX®, these handles are file descriptors (which are the same for files, pipes, sockets, and so on). In blocking I/O, we initiate a transfer and the system call returns when the transfer is complete or an error has occurred.
AIO on Linux
AIO first appeared in the 2.5 kernel and is now a standard feature of 2.6 production kernels.
In asynchronous non-blocking I/O, we can initiate multiple transfers at the same time. This requires a unique context for each transfer so that we can identify it when it completes. In AIO, this context is an aiocb (AIO I/O control block) structure. The structure contains all of the information about a transfer, including a user buffer for the data. When a notification for an I/O occurs (called a completion), the aiocb structure is provided to uniquely identify the completed I/O. The API demonstration shows how to do this.
AIO API
The AIO interface API is quite simple, but it provides the necessary functions for data transfer along with two different notification models. Table 1 shows the AIO interface functions, which are described in more detail later in this section.
Table 1. AIO Interface API
API function | Description
aio_read | Request an asynchronous read operation
aio_error | Check the status of an asynchronous request
aio_return | Get the return status of a completed asynchronous request
aio_write | Request an asynchronous write operation
aio_suspend | Suspend the calling process until one or more asynchronous requests have completed (or failed)
aio_cancel | Cancel an asynchronous I/O request
lio_listio | Initiate a list of I/O operations
Each of these API functions uses the aiocb structure to initiate or check a request. This structure has a number of elements, but Listing 1 shows only the ones that you need to (or can) use.
Listing 1. Relevant fields in the aiocb structure
    struct aiocb {
      int aio_fildes;               /* File descriptor */
      int aio_lio_opcode;           /* Valid only for lio_listio (r/w/nop) */
      volatile void *aio_buf;       /* Data buffer */
      size_t aio_nbytes;            /* Number of bytes in the data buffer */
      struct sigevent aio_sigevent; /* Notification structure */

      /* Internal fields */
      ...
    };
The sigevent structure tells AIO what to do when the I/O completes. We will explore this structure in the AIO demonstration. Now we'll show how the individual AIO API functions work and how to use them.
aio_read
The aio_read function requests an asynchronous read operation on a valid file descriptor. The file descriptor can represent a file, a socket, or even a pipe. The aio_read function has the following prototype:
    int aio_read( struct aiocb *aiocbp );
The aio_read function returns immediately after the request has been queued. The return value is 0 on success or -1 on error, in which case errno is set.
To perform a read, the application must initialize the aiocb structure. The following short example fills in the aiocb request structure and uses aio_read to perform an asynchronous read request (ignoring notification for now). It also shows the use of aio_error, which is explained later.
Listing 2. Sample code for an asynchronous read with aio_read
    #include <aio.h>
    #include <errno.h>    /* EINPROGRESS */
    #include <fcntl.h>    /* open, O_RDONLY */
    #include <stdio.h>    /* perror */
    #include <stdlib.h>   /* malloc */
    #include <strings.h>  /* bzero */

    ...

      int fd, ret;
      struct aiocb my_aiocb;

      fd = open( "file.txt", O_RDONLY );
      if (fd < 0) perror("open");

      /* Zero out the aiocb structure (recommended) */
      bzero( (char *)&my_aiocb, sizeof(struct aiocb) );

      /* Allocate a data buffer for the aiocb request */
      my_aiocb.aio_buf = malloc(BUFSIZE+1);
      if (!my_aiocb.aio_buf) perror("malloc");

      /* Initialize the necessary fields in the aiocb */
      my_aiocb.aio_fildes = fd;
      my_aiocb.aio_nbytes = BUFSIZE;
      my_aiocb.aio_offset = 0;

      ret = aio_read( &my_aiocb );
      if (ret < 0) perror("aio_read");

      /* Busy-wait until the request is no longer in progress */
      while ( aio_error( &my_aiocb ) == EINPROGRESS ) ;

      if ((ret = aio_return( &my_aiocb )) > 0) {
        /* got ret bytes on the read */
      } else {
        /* read failed, consult errno */
      }
In Listing 2, after opening the file from which we read data, we zero out the aiocb structure and then allocate a data buffer. The reference to the data buffer is placed in aio_buf. Next, we initialize aio_nbytes to the size of the buffer and set aio_offset to 0 (the first offset in the file). We set aio_fildes to the descriptor of the file we are reading from. With these fields set, we call aio_read to request the read. We can then call aio_error to determine the status of the aio_read. As long as the status is EINPROGRESS, we busy-wait until the status changes. At that point, the request has either succeeded or failed.
Building with the AIO interface
The function prototypes and other necessary symbols are in the aio.h header file. When building an application that uses this interface, we must link against the POSIX real-time extensions library (librt).
Note the similarities between this API and reading from a file with the standard library functions. Apart from the asynchronous nature of aio_read, another difference is how the offset for the read is set. In a typical read call, the offset is maintained for us in the file descriptor context. After each read, the offset is updated so that subsequent reads address the next block of data. This isn't possible with asynchronous I/O, because we can perform many read requests simultaneously, so we must specify the offset for each particular read request.
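To make the per-request offset concrete, here is a small sketch that starts two reads on the same descriptor at different offsets and then waits for both. The function name read_two_offsets and its parameters are assumptions for illustration; the busy-wait mirrors Listing 2 and is not how a production program would wait.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read `len` bytes at offset 0 into buf0 and `len`
 * bytes at offset second_off into buf1, with both requests in flight at
 * once. Byte counts go to n[0] and n[1]. Returns 0 on success, -1 if a
 * request could not be queued or failed.
 */
int read_two_offsets(int fd, char *buf0, char *buf1, size_t len,
                     off_t second_off, ssize_t n[2])
{
    struct aiocb cb[2];
    char *bufs[2] = { buf0, buf1 };
    off_t offs[2] = { 0, second_off };
    int submitted = 0, rc = 0;

    n[0] = n[1] = -1;
    for (int i = 0; i < 2; i++) {
        memset(&cb[i], 0, sizeof cb[i]);
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = bufs[i];
        cb[i].aio_nbytes = len;
        cb[i].aio_offset = offs[i];   /* each request carries its own offset */
        if (aio_read(&cb[i]) < 0) {
            rc = -1;
            break;
        }
        submitted++;
    }

    /* The control blocks live on this stack frame, so wait for every
       request that was queued before returning. */
    for (int i = 0; i < submitted; i++) {
        while (aio_error(&cb[i]) == EINPROGRESS)
            ;                         /* busy-wait, for illustration only */
        n[i] = aio_return(&cb[i]);
        if (n[i] < 0)
            rc = -1;
    }
    return rc;
}
```

Neither request changes the descriptor's own file position, which is why the two reads cannot interfere with each other.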
aio_error
The aio_error function is used to determine the status of a request. Its prototype is:
    int aio_error( struct aiocb *aiocbp );
This function can return the following:
- 0, indicating that the request completed successfully
- EINPROGRESS, indicating that the request has not yet completed
- ECANCELED, indicating that the request was canceled by the application
- A positive error number, indicating that the request failed (the same value that errno would hold after the equivalent synchronous call)
- -1, indicating that the aio_error call itself failed, in which case errno gives the specific cause
aio_return
Another difference between asynchronous I/O and standard blocking I/O is that we don't have immediate access to the return status of the function, because we are not blocking on the read call. In a standard read call, the return status is provided when the function returns. With asynchronous I/O, we use the aio_return function instead. Its prototype is:
    ssize_t aio_return( struct aiocb *aiocbp );
This function is called only after aio_error has determined that the request has completed (either successfully or with an error). The return value of aio_return is identical to that of the synchronous read or write system call (the number of bytes transferred, or -1 on error).
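The ordering described above (check aio_error first, then call aio_return once) can be wrapped in a small helper. This is a sketch under assumptions: the name aio_wait_result is made up for this example, and the busy-wait is kept only to match Listing 2.

```c
#include <aio.h>
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical helper: wait for one queued request and report it the way
 * read() or write() would, as a byte count on success or -1 with errno
 * set to the request's error status. aio_return() is called exactly
 * once, and only after the request has left the EINPROGRESS state.
 */
ssize_t aio_wait_result(struct aiocb *cb)
{
    int status;

    while ((status = aio_error(cb)) == EINPROGRESS)
        ;                        /* busy-wait; real code would do work here */

    if (status == -1)
        return -1;               /* aio_error itself failed; errno is set */

    ssize_t n = aio_return(cb);  /* final status of the completed request */
    if (status != 0)
        errno = status;          /* expose the request's error to the caller */
    return n;
}
```

A caller would queue a request with aio_read or aio_write and then pass the same aiocb to this helper.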
aio_write
The aio_write function is used to request an asynchronous write. Its prototype is:
    int aio_write( struct aiocb *aiocbp );
The aio_write function returns immediately, indicating that the request has been queued (with a return value of 0 on success and -1 on failure, with errno properly set).
This is similar to the read system call, but one behavioral difference is worth noting. Recall that for read, the offset to be used is important. For write, however, the offset matters only when the file was opened without the O_APPEND option. If O_APPEND is set, the offset is ignored and the data is appended to the end of the file. Otherwise, the aio_offset field determines the offset at which the data is written to the file.
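A write request is filled in the same way as a read request. The sketch below queues one write at a caller-chosen offset and waits for it; the function name write_at_async is an assumption for this example, and the busy-wait again follows Listing 2 rather than best practice.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: queue an asynchronous write of `len` bytes at
 * `offset` and wait for it to finish. If fd was opened with O_APPEND,
 * the offset is ignored and the data is appended instead.
 * Returns the number of bytes written, or -1 with errno set.
 */
ssize_t write_at_async(int fd, const void *data, size_t len, off_t offset)
{
    struct aiocb cb;
    int status;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = (void *)data;   /* aio_buf is not const-qualified */
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    if (aio_write(&cb) < 0)
        return -1;                  /* the request was not queued */

    while ((status = aio_error(&cb)) == EINPROGRESS)
        ;                           /* busy-wait, for illustration only */

    ssize_t n = aio_return(&cb);
    if (status != 0)
        errno = status;
    return n;
}
```

The data buffer must stay valid and unmodified until the request completes, which this helper guarantees by waiting before it returns.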
aio_suspend
We can use the aio_suspend function to suspend (or block) the calling process until an asynchronous request has completed, a signal is raised, or an optional timeout occurs. The caller provides a list of aiocb references; the completion of at least one of them causes aio_suspend to return. Its prototype is:
    int aio_suspend( const struct aiocb *const cblist[], int n, const struct timespec *timeout );
Using aio_suspend is quite simple. We provide a list of aiocb references. If any of them complete, the call returns 0. Otherwise, it returns -1, indicating that an error occurred. See Listing 3.
Listing 3. Using the aio_suspend function to block on asynchronous I/Os
    /* ... requests started earlier with aio_read ... */
    aio_suspend( cblist, MAX_LIST, NULL );
Note that the second argument of aio_suspend is the number of elements in cblist, not the number of aiocb references. Any NULL element in cblist is ignored by aio_suspend.
If a timeout is provided to aio_suspend and the timeout occurs, it returns -1 and errno contains EAGAIN.
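A fuller version of the aio_suspend pattern might look like the following sketch. The function name read_with_suspend, the list capacity MAX_LIST of 4, and the cancel-on-timeout policy are assumptions made for this example, not part of the article.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define MAX_LIST 4   /* list capacity; unused slots stay NULL and are ignored */

/*
 * Hypothetical helper: read up to `len` bytes from offset 0 and sleep in
 * aio_suspend() rather than busy-waiting. Returns the byte count, or -1
 * with errno set (ECANCELED if the request was canceled after a timeout).
 */
ssize_t read_with_suspend(int fd, void *buf, size_t len, int timeout_sec)
{
    struct aiocb cb;
    const struct aiocb *cblist[MAX_LIST] = { NULL };
    struct timespec timeout = { .tv_sec = timeout_sec, .tv_nsec = 0 };

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cblist[0] = &cb;

    if (aio_read(&cb) < 0)
        return -1;

    /* aio_suspend returns 0 once any listed request has completed. */
    while (aio_suspend(cblist, MAX_LIST, &timeout) < 0) {
        if (errno == EINTR)
            continue;               /* interrupted by a signal: wait again */
        /* Timeout (EAGAIN) or another error. The control block is on this
           stack frame, so the request must settle before we return. This
           can wait indefinitely if the request cannot be canceled. */
        aio_cancel(fd, &cb);
        while (aio_error(&cb) == EINPROGRESS)
            ;
        break;
    }

    int status = aio_error(&cb);
    ssize_t n = aio_return(&cb);
    if (status != 0)
        errno = status;
    return n;
}
```

Passing MAX_LIST rather than 1 as the second argument reflects the point made above: the count is the size of the list, and NULL slots are skipped.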
aio_cancel
The aio_cancel function allows us to cancel one or all outstanding I/O requests for a given file descriptor. Its prototype is:
    int aio_cancel( int fd, struct aiocb *aiocbp );
To cancel a single request, we provide the file descriptor and the aiocb reference. If the request is successfully canceled, the function returns AIO_CANCELED. If the request could not be canceled because it is already in progress, the function returns AIO_NOTCANCELED; if it had already completed, the function returns AIO_ALLDONE.
To cancel all requests for a given file descriptor, we provide that file descriptor and a NULL reference for aiocbp. The function returns AIO_CANCELED if all requests are canceled, AIO_NOTCANCELED if at least one request couldn't be canceled, and AIO_ALLDONE if none could be canceled because all had already completed. We can then evaluate each individual AIO request using aio_error. If a request was canceled, aio_error returns ECANCELED (and aio_return returns -1).
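The three possible outcomes of aio_cancel can be observed with a sketch like the one below. Which outcome occurs depends on timing, so the code handles all of them. The function name start_then_cancel and its output parameter are assumptions for this example.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: start a read at offset 0, immediately try to
 * cancel it, and wait until the request settles. Returns the aio_cancel()
 * result (AIO_CANCELED, AIO_NOTCANCELED or AIO_ALLDONE), or -1 if the
 * read could not be queued or the cancel call failed. *final_status
 * receives the request's final error status: ECANCELED if it was
 * canceled, 0 if the read completed anyway.
 */
int start_then_cancel(int fd, void *buf, size_t len, int *final_status)
{
    struct aiocb cb;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) < 0)
        return -1;

    int rc = aio_cancel(fd, &cb);

    /* Whatever aio_cancel reported, wait until the request is no longer
       in progress before the stack-allocated control block goes away. */
    while ((*final_status = aio_error(&cb)) == EINPROGRESS)
        ;
    (void)aio_return(&cb);          /* collect the final return status */
    return rc;
}
```

Because a request that is already being serviced may not be cancelable, callers should treat AIO_NOTCANCELED as a normal result rather than a failure.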
lio_listio
Finally, AIO provides a way to initiate multiple transfers at the same time using the lio_listio API function. This function is important because it means we can start many I/Os in the context of a single system call (meaning one kernel context switch). This is great from a performance perspective, so it's worth exploring. The lio_listio API function has the following prototype:
    int lio_listio( int mode, struct aiocb *list[], int nent, struct sigevent *sig );
The mode argument can be LIO_WAIT or LIO_NOWAIT. LIO_WAIT blocks the call until all I/O has completed. LIO_NOWAIT returns after the operations have been queued. The list is a list of aiocb references, with the maximum number of elements defined by nent. Note that elements of list may be NULL, which lio_listio ignores. The sigevent reference defines the method for signal notification when all I/O is complete.
A request for lio_listio differs slightly from a typical read or write request in that the operation must be specified, as shown in Listing 4.
Listing 4. Using the lio_listio function to initiate a list of requests
    /* each aiocb in list has aio_lio_opcode set (for example, LIO_READ) */
    lio_listio( LIO_WAIT, list, MAX_LIST, NULL );
For a read operation, the aio_lio_opcode field is set to LIO_READ. For a write operation, we use LIO_WRITE, and LIO_NOP is also valid for no operation.
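As a sketch of how the pieces fit together, the following function reads two consecutive chunks of a file with one lio_listio call in LIO_WAIT mode. The function name read_two_chunks, the list capacity MAX_LIST of 4, and the chunk layout are assumptions made for this example.

```c
#include <aio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_LIST 4   /* list capacity; NULL entries are skipped */

/*
 * Hypothetical helper: read bytes [0, len) into buf0 and [len, 2*len)
 * into buf1 with a single lio_listio() call. LIO_WAIT blocks until every
 * request in the list has completed. Byte counts go to n[0] and n[1]
 * (-1 for a request that failed). Returns the lio_listio() result.
 */
int read_two_chunks(int fd, char *buf0, char *buf1, size_t len, ssize_t n[2])
{
    struct aiocb cb[2];
    struct aiocb *list[MAX_LIST] = { NULL };
    char *bufs[2] = { buf0, buf1 };

    for (int i = 0; i < 2; i++) {
        memset(&cb[i], 0, sizeof cb[i]);
        cb[i].aio_fildes     = fd;
        cb[i].aio_buf        = bufs[i];
        cb[i].aio_nbytes     = len;
        cb[i].aio_offset     = (off_t)(i * len);
        cb[i].aio_lio_opcode = LIO_READ;   /* the operation is per request */
        list[i] = &cb[i];
    }

    int rc = lio_listio(LIO_WAIT, list, MAX_LIST, NULL);

    for (int i = 0; i < 2; i++)
        n[i] = aio_return(&cb[i]);         /* per-request return status */
    return rc;
}
```

With LIO_NOWAIT instead, the call would return as soon as the requests were queued, and completion would have to be detected with aio_error, aio_suspend, or the notification passed in the last argument.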
AIO Notifications
Now that we've seen the available AIO functions, this section provides an in-depth look at the methods you can use for asynchronous notifications. We will explore the notification mechanism for asynchronous functions through signal and function callbacks.
Using signals for asynchronous notifications
Using signals for interprocess communication (IPC) is a traditional mechanism in UNIX, and AIO supports it as well. In this paradigm, the application defines a signal handler that is invoked when a specified signal occurs. The application then specifies that an asynchronous request will raise a signal when the request has completed. As part of the signal context, the particular aiocb request is provided so that multiple potentially outstanding requests can be tracked. Listing 5 demonstrates this notification method.
Listing 5. Using a signal as notification for AIO requests
    void setup_io( ... )
    {
      int fd;
      int ret;
      struct sigaction sig_act;
      /* static so the control block outlives this function while the
         request is still in flight */
      static struct aiocb my_aiocb;

      ...

      /* Set up the signal handler */
      sigemptyset(&sig_act.sa_mask);
      sig_act.sa_flags = SA_SIGINFO;
      sig_act.sa_sigaction = aio_completion_handler;

      /* Set up the AIO request */
      bzero( (char *)&my_aiocb, sizeof(struct aiocb) );
      my_aiocb.aio_fildes = fd;
      my_aiocb.aio_buf = malloc(BUF_SIZE+1);
      my_aiocb.aio_nbytes = BUF_SIZE;
      my_aiocb.aio_offset = next_offset;

      /* Link the AIO request with the signal handler */
      my_aiocb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
      my_aiocb.aio_sigevent.sigev_signo = SIGIO;
      my_aiocb.aio_sigevent.sigev_value.sival_ptr = &my_aiocb;

      /* Map the signal to the signal handler */
      ret = sigaction( SIGIO, &sig_act, NULL );

      ...

      ret = aio_read( &my_aiocb );
    }

    void aio_completion_handler( int signo, siginfo_t *info, void *context )
    {
      struct aiocb *req;
      ssize_t ret;

      /* Ensure it's our signal */
      if (info->si_signo == SIGIO) {

        req = (struct aiocb *)info->si_value.sival_ptr;

        /* Did the request complete? */
        if (aio_error( req ) == 0) {

          /* Request completed successfully, get the return status */
          ret = aio_return( req );

        }

      }

      return;
    }
In Listing 5, we set up our signal handler to catch the SIGIO signal in the aio_completion_handler function. We then initialize the aio_sigevent structure to raise SIGIO for notification (specified by the SIGEV_SIGNAL definition in sigev_notify). When the read completes, the signal handler extracts the particular aiocb from the signal's si_value structure and checks the error status and return status to determine I/O completion.
For performance, the completion handler is an ideal spot to continue the I/O by requesting the next asynchronous transfer. In this way, when one transfer completes, we immediately start the next.
Using callback functions for asynchronous notification
The other notification mechanism is the system callback. Instead of raising a signal for notification, this mechanism calls a function in user space. We store the aiocb reference in the sigevent structure to uniquely identify the particular request being completed. See Listing 6.
Listing 6. Using thread callback notification for AIO requests
    void setup_io( ... )
    {
      int fd;
      int ret;
      /* static so the control block outlives this function while the
         request is still in flight */
      static struct aiocb my_aiocb;

      ...

      /* Set up the AIO request */
      bzero( (char *)&my_aiocb, sizeof(struct aiocb) );
      my_aiocb.aio_fildes = fd;
      my_aiocb.aio_buf = malloc(BUF_SIZE+1);
      my_aiocb.aio_nbytes = BUF_SIZE;
      my_aiocb.aio_offset = next_offset;

      /* Link the AIO request with a thread callback */
      my_aiocb.aio_sigevent.sigev_notify = SIGEV_THREAD;
      my_aiocb.aio_sigevent.sigev_notify_function = aio_completion_handler;
      my_aiocb.aio_sigevent.sigev_notify_attributes = NULL;
      my_aiocb.aio_sigevent.sigev_value.sival_ptr = &my_aiocb;

      ...

      ret = aio_read( &my_aiocb );
    }

    void aio_completion_handler( union sigval sigval )
    {
      struct aiocb *req;
      ssize_t ret;

      req = (struct aiocb *)sigval.sival_ptr;

      /* Did the request complete? */
      if (aio_error( req ) == 0) {

        /* Request completed successfully, get the return status */
        ret = aio_return( req );

      }

      return;
    }
In Listing 6, after creating our aiocb request, we request a thread callback using SIGEV_THREAD as the notification method. We then specify the particular notification handler and load the context to be passed to the handler (in this case, a reference to the aiocb request itself). In the handler, we simply cast the incoming sigval pointer and use the AIO functions to validate the completion of the request.
System tuning for AIO
The proc file system contains two virtual files that can be tuned for asynchronous I/O performance:
- The /proc/sys/fs/aio-nr file provides the current number of system-wide asynchronous I/O requests.
- The /proc/sys/fs/aio-max-nr file is the maximum number of allowable concurrent requests. The maximum is commonly 64K (65,536), which is adequate for most applications.
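The two files above hold plain decimal numbers, so a program can inspect them with ordinary file I/O. The sketch below is one way to do that; the helper names read_proc_value and print_aio_limits are assumptions for this example, and the files may be absent on systems without the proc file system mounted.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a file such as
 * /proc/sys/fs/aio-max-nr. Returns the value, or -1 if the file cannot
 * be opened or does not start with a number.
 */
long read_proc_value(const char *path)
{
    long value = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the current request count next to the system-wide limit. */
void print_aio_limits(void)
{
    long nr  = read_proc_value("/proc/sys/fs/aio-nr");
    long max = read_proc_value("/proc/sys/fs/aio-max-nr");

    printf("aio-nr=%ld aio-max-nr=%ld\n", nr, max);
}
```

Changing the limit means writing a new value to aio-max-nr, which normally requires root privileges.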
Conclusion
Using asynchronous I/O can help us build faster and more efficient I/O applications. If our application can overlap processing and I/O, AIO can help us make better use of the available CPU resources. Although this I/O model differs from the traditional blocking patterns found in most Linux applications, the asynchronous notification model is conceptually simple and can simplify our design.
[Reprint] Use asynchronous I/O to dramatically improve application performance