Buffered I/O and unbuffered I/O
File I/O mainly involves five operations: open, close, read, write, and seek. Linux provides two sets of APIs for them: the C standard library functions fopen, fclose, fread, fwrite, and fseek, and the POSIX-defined system APIs open, close, read, write, and lseek.
The POSIX functions are the system API. The C standard library functions are wrappers around the system API that add buffering. For this reason the two sets are also called buffered I/O functions and unbuffered I/O functions.
In addition to the buffered I/O functions described above, the C standard library header <stdio.h> also provides a series of further wrapped I/O functions, such as puts, putchar, and printf.
Why add a buffer? Every I/O system call has to switch from user mode to kernel mode, and that switch is relatively slow. Buffering reduces the number of switches into the kernel. So how do the buffered I/O functions work?
- When a file is opened with fopen, a buffer is set up for it in addition to the file handle.
- When reading, the library first reads a block of the file into the buffer and then returns the part the caller asked for. The rest stays in the buffer, so the next read can be served directly from it.
- When writing, the data first goes into the buffer and is written to the file only once the buffer is full (or the stream is flushed or closed).
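The write behavior described above can be observed with a small experiment. This is only a sketch under assumptions: the function name buffered_write_demo, the helper file_size, and the scratch file path passed in are all hypothetical, and the 4096-byte fully buffered mode is set explicitly with setvbuf so the result does not depend on the library's default buffer size.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as the kernel currently sees it, or -1 on error. */
static long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/*
 * Writes 5 bytes through stdio and records the file size before and
 * after fflush(). Returns 0 on success, -1 on error.
 * `path` is a hypothetical scratch file chosen by the caller.
 */
int buffered_write_demo(const char *path, long *before, long *after)
{
    char iobuf[4096];                 /* user-space buffer for this stream */
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Full buffering: data reaches the kernel only when the buffer
     * fills up or the stream is flushed or closed. */
    if (setvbuf(fp, iobuf, _IOFBF, sizeof iobuf) != 0) {
        fclose(fp);
        return -1;
    }

    fwrite("hello", 1, 5, fp);        /* stored in iobuf, not yet in the file */
    *before = file_size(path);        /* file is still empty here */

    fflush(fp);                       /* hand the buffered bytes to the kernel */
    *after = file_size(path);         /* now the 5 bytes are visible */

    fclose(fp);                       /* closed before iobuf goes out of scope */
    return 0;
}
```

Called on a fresh scratch file, the function should report a size of 0 before fflush and 5 after it, which is the "buffer first, file later" behavior from the list.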
So which group of I/O functions should we choose?
Unbuffered I/O functions enter the kernel on every read and write. A system call is much slower than an ordinary function call in user space, so for regular files it is usually worth keeping an I/O buffer in user space.
- When using the buffered I/O library functions, always keep in mind that the I/O buffer and the actual file may be out of sync. Call fflush() when necessary.
- I/O functions are also used to read and write devices such as terminals or network devices. These usually need a fast response, so the unbuffered I/O functions are generally used for them.
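If a stdio stream is used where an immediate response matters, its buffering can also be switched off with setvbuf and the _IONBF mode. The sketch below uses a regular scratch file instead of a device only so that the effect can be measured; the function name unbuffered_write_demo and the path argument are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Opens `path` (a hypothetical scratch file), switches the stream to
 * unbuffered mode, writes 5 bytes, and returns the file size seen right
 * after fwrite(), without calling fflush(). Returns -1 on error.
 */
long unbuffered_write_demo(const char *path)
{
    struct stat st;
    long size = -1;
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* _IONBF: output is passed on to the kernel as soon as it is written. */
    if (setvbuf(fp, NULL, _IONBF, 0) == 0 &&
        fwrite("hello", 1, 5, fp) == 5 &&
        stat(path, &st) == 0)
        size = (long)st.st_size;

    fclose(fp);
    return size;
}
```

Here the 5 bytes should already be in the file when stat runs, at the cost of one system call per write.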
PS: Strictly speaking, even the POSIX I/O functions go through a kernel I/O buffer, so write does not necessarily write directly to the file; it may only write to the kernel buffer. From the process's point of view there is little difference between the two, so we don't need to pay much attention to this.
Blocking I/O and non-blocking I/O
There are two ways to read and write files: blocking and non-blocking. Blocking is the usual way: the function call does not return until the operation completes. Example:
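#include <unistd.h>
#include <stdlib.h>

int main(void)
{
    char buf[10];
    /* Blocks here until input is available on standard input. */
    int n = read(STDIN_FILENO, buf, 10);
    /* Echo what was read back to standard output. */
    write(STDOUT_FILENO, buf, n);
    return 0;
}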
When you run this program, read blocks until you type some data in the terminal and press Enter (at which point STDIN has data available).
A major problem with blocking I/O is that it prevents concurrency. When several I/O operations have to be carried out and the data for an earlier one is not yet available (typically with sockets and other IPC mechanisms), the later operations cannot proceed.
Non-blocking I/O solves this problem. To use it, specify the O_NONBLOCK flag when calling open. Then, if the device has no data to read, read returns -1 immediately with errno set to EAGAIN, and the caller should try again later. This behavior is called polling: the caller only queries the device instead of blocking on it, so it can monitor several devices at the same time:
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>

int main(void)
{
    char buf[10];
    int fd, n;

    /* Open the terminal in non-blocking mode. */
    fd = open("/dev/tty", O_RDONLY | O_NONBLOCK);
    while (1)
    {
        /* Returns -1 immediately if no input is available yet. */
        n = read(fd, buf, 10);
        if (n >= 0)
            break;
        /* Wait a moment before polling again. */
        sleep(1);
    }
    write(STDOUT_FILENO, buf, n);
    close(fd);
    return 0;
}
PS: To keep the example simple, I have not handled errors here (such as a failed open). Real projects must handle them.
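As a rough idea of what that error handling could look like, the sketch below separates "no data yet" from a real failure by checking errno after read. The helper names set_nonblocking and try_read and the return codes -1 and -2 are assumptions made for illustration. It also shows fcntl as a way to add O_NONBLOCK to a descriptor that is already open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK to an already open descriptor. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * One polling attempt on a non-blocking descriptor.
 * Returns the number of bytes read (0 means end of file),
 * -1 if no data is available yet and the caller should retry later,
 * or -2 on a real error.
 */
ssize_t try_read(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n >= 0)
        return n;
    /* EAGAIN/EWOULDBLOCK: nothing to read right now.
     * EINTR: interrupted by a signal before any data arrived. */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return -1;
    return -2;
}
```

A polling loop would call try_read, sleep and retry on -1, stop on 0 or a positive count, and report the failure on -2 instead of looping forever.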
Non-blocking I/O has a drawback. If none of the devices has data, the caller has to query them over and over, which keeps the CPU busy. For this reason, non-blocking I/O is usually not polled continuously in a while loop (this is called a tight loop). Instead, the caller waits a short while between queries. This avoids a lot of useless work, and other processes can be scheduled while the caller is waiting.
However, this introduces a new problem: data may not be handled promptly. In the example above, I sleep for one second in each iteration. If data becomes available just after sleep begins, the program cannot respond immediately and has to wait until the sleep ends before it outputs the result.
To solve this problem, you can use the select function, which monitors multiple devices at the same time in a blocking manner and lets you set a timeout for the blocking wait. Because select is mostly used in socket programming, a detailed walkthrough is left for a later discussion of socket programming. In the meantime, you can refer to the article "select" to learn how it works.
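For orientation only, here is a minimal sketch of waiting on a single descriptor with select and a timeout. The helper names wait_readable and read_with_timeout, the millisecond parameter, and the return codes are assumptions for illustration. Real socket code would typically add several descriptors to the set.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Blocks until `fd` is readable or `timeout_ms` milliseconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);             /* more descriptors could be added here */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest descriptor in any set, plus one. */
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;                 /* 0: timed out, -1: error */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/*
 * Reads once if data arrives within the timeout.
 * Returns the number of bytes read, -1 on error, or -2 on timeout.
 */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return read(fd, buf, len);
}
```

Unlike the sleep-based loop, the process is woken as soon as data arrives, and it does not spin while waiting.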