The first thing to understand is what "unbuffered" means. It does not mean that the kernel provides no buffering; it only means that the call is a plain system call rather than a library function. For disk reads and writes the kernel provides a block buffer (in some places also called the kernel cache). When you write data with the write function, the system call is made directly and the data is queued in the block buffer; once the block buffer has accumulated a certain amount, the data is written to disk. So unbuffered I/O means that the process provides no buffering (the kernel still does). Every call to write or read is a direct system call.
Buffered I/O, by contrast, adds a stream buffer inside the process to improve on this. When you write data to disk with fwrite, the data first goes into the stream buffer. When some condition is met, for example the stream buffer is full or is flushed, the data is sent in one go to the block buffer provided by the kernel, which then writes it to disk. (Double buffering.)
As a result, buffered I/O makes fewer system calls than unbuffered I/O when writing the same amount of data to disk.
Now consider how an ordinary file read or write interacts with the disk.
When an application tries to read a piece of data that is already in the page cache (the kernel cache mentioned above), the data is returned to the application immediately, with no physical disk read. If the data is not yet in the page cache, it must first be read from disk into the page cache.
For a write, the application likewise writes the data into the page cache. (If the write goes through standard library I/O, the data first lands in the standard library buffer and moves to the page cache only once that buffer is full.) Whether the data is then written to disk immediately depends on the write mechanism the application uses. With a synchronous write mechanism, the data is written back to disk at once and the application waits until the write completes. With a deferred write mechanism, the application does not wait for the data to reach the disk; it only has to write the data to the page cache, and the operating system periodically flushes the page cache to disk. Deferred write differs from asynchronous write in that it does not notify the application when the data has been fully written to disk, whereas asynchronous write reports back to the application once the data is fully on disk. Deferred write therefore carries an inherent risk of data loss, which asynchronous write does not.
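The synchronous and deferred write mechanisms can be sketched on a POSIX system roughly as follows. This is only a minimal sketch, assuming the O_SYNC open flag is used to request synchronous writes; the helper name write_file and the file names are made up for illustration.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>

// Hypothetical helper: writes `text` to `path` and returns the number of
// bytes written, or -1 on error. With synchronous == true the file is opened
// with O_SYNC, so write() returns only after the data has been handed to the
// storage device. Otherwise the data just lands in the page cache and the
// kernel writes it back later (deferred write).
ssize_t write_file(const char *path, const char *text, bool synchronous) {
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (synchronous) flags |= O_SYNC;
    int fd = open(path, flags, S_IRUSR | S_IWUSR);
    if (fd == -1) return -1;
    ssize_t n = write(fd, text, strlen(text));
    close(fd);
    return n;
}
```

Both variants return the same byte count; the difference is only in when write() is allowed to return, which is why the deferred variant is faster but can lose data on a power failure.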
Now for unbuffered I/O.
Unbuffered does not mean that the disk file is read or written directly. Functions such as read() and write() are system calls; there is simply no buffer at the user level, which is why they are called unbuffered I/O. The kernel still caches the data; the user layer just cannot see it.
Buffering is relative. When you write data to a file (that is, to disk), the kernel first writes the data into buffer memory set aside inside the kernel. Suppose that buffer is 100 bytes long and you call the system call:
ssize_t write(int fd, const void *buf, size_t count);
If each write uses count = 10 bytes, you must call the function 10 times to fill the buffer. Until then the data is still in the buffer and has not been written to disk; only when the buffer is full does the actual I/O take place and the data go to disk. So "unbuffered" does not mean that there is no cache, and it does not mean that the data goes straight to disk either. (Even though nothing has reached the disk yet, other system calls can already see what was written to the file, because kernel space is shared between processes.)
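The point that data written with write() is visible to other system calls before it reaches the disk can be checked with a small experiment. The following is a sketch under assumptions: the helper name bytes_visible_after_write is invented, and a second descriptor in the same process stands in for "another process".

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>

// Hypothetical helper: writes `text` through one descriptor and immediately
// reads the file back through a second, independent descriptor.
// Returns the number of bytes the second descriptor could read, or -1 on error.
ssize_t bytes_visible_after_write(const char *path, const char *text) {
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (wfd == -1) return -1;
    if (write(wfd, text, strlen(text)) == -1) {
        close(wfd);
        return -1;
    }
    // No fsync() and no close() yet: the data may exist only in the kernel cache.
    int rfd = open(path, O_RDONLY);
    if (rfd == -1) {
        close(wfd);
        return -1;
    }
    char buf[256];
    ssize_t n = read(rfd, buf, sizeof buf);
    close(rfd);
    close(wfd);
    return n;
}
```

Calling bytes_visible_after_write("demo.txt", "hello,world\n") should report all 12 bytes, because the read is served from the same kernel cache that the write filled.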
So, since unbuffered operations do in fact have a cache in the kernel, what is buffered I/O?
Buffered I/O is also called standard I/O. It follows the ANSI C standard and does not depend on the system kernel, so it is highly portable. Standard I/O is often used precisely to reduce the number of read() and write() system calls. Buffered I/O sets up a buffer in the user layer; the standard I/O library takes care of allocating this buffer and choosing a good length for it, so you do not have to. The same example as above illustrates the process:
As above, we want to write data to a file, and the kernel buffer (note: not the user-layer buffer) is 100 bytes long. With the unbuffered function write() called 10 bytes at a time we need 10 system calls, which is inefficient. Now we add another buffer in the user layer (the user-layer buffer, also called the stream buffer) and assume it is 50 bytes long. We write the data into this stream buffer with the standard C library function fwrite(). Once the stream buffer holds 50 bytes it is full, and the library calls the system call write() to move the data into the kernel buffer. Only when the kernel buffer is also full, or the kernel flushes it, is the data in the kernel buffer written to the file (in essence, to disk). By now it should be clear that the standard I/O function fwrite() ultimately relies on the unbuffered write(): writing the 100 bytes through fwrite() costs just two write() system calls.
If this still seems confusing, the following two summaries may help:
Unbuffered I/O data path: data -> kernel buffer -> disk
Standard I/O data path: data -> stream buffer -> kernel buffer -> disk
Here is another commenter's view, for reference:
Unbuffered I/O operates on file descriptors; the buffered I/O below operates on streams.
The standard I/O library is buffered I/O and is specified by the ANSI C standard. Standard I/O does, of course, end up calling the I/O routines above. The standard I/O library handles many details on the user's behalf, such as allocating the buffer and performing I/O in optimally sized chunks.
The purpose of the buffering provided by standard I/O is to reduce the number of read and write calls. Each I/O stream is buffered automatically (standard I/O functions typically call malloc to allocate the buffer).
The following explanation of the two, which I found online, seems to me to be accurate:
It discusses the difference between buffered I/O and unbuffered basic system I/O such as open and write.
Buffered file operations are implemented by the standard C library. The first time a buffered file function is called, the library automatically allocates memory and reads a fixed-size chunk of the file into that buffer. Every later read or write then works on the in-memory buffer rather than directly on the file on the hard disk, and the standard library has its own mechanism for deciding when data is actually read from or written to the disk. Unbuffered file operations are usually system calls provided by the operating system. They are lower level and operate on the file without a user-space buffer; because I/O is the bottleneck they can be slow, and the programmer is responsible for guaranteeing atomicity, but used correctly they are not inefficient. Note also that the buffered file I/O in the standard library is itself implemented by calling the system's unbuffered I/O.
"The term unbuffered means that each read or write invokes a system call in the kernel. All disk I/O goes through the kernel's block buffers (also called the kernel's buffer cache); the exception is I/O to a raw disk device. Since the data that is read or written is buffered by the kernel, the term 'unbuffered I/O' refers to the lack of automatic buffering in the user process for these two functions. Each read or write invokes a system call." (from Advanced Programming in the UNIX Environment)
The program writes "hello,world" to test.txt with open and write, and to test2.txt with fopen and fwrite. After open and fopen have run, the program sleeps for 15 seconds; ls at that point shows that both test.txt and test2.txt already exist. After write and fwrite have run, during the second 15-second sleep, cat shows that test.txt contains "hello,world" while test2.txt is still empty. Once the sleep ends, close(fd) runs and the program exits; cat now shows "hello,world" in test2.txt as well. The example shows that open and write are unbuffered: the I/O is handed to the kernel immediately instead of waiting in a user-space buffer, and it does not have to wait for close. fopen and fwrite, by contrast, are buffered: the data is (generally) passed on only when the stream is flushed, for example by fclose or at program exit.
The relevant source code is as follows:
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <iostream>
using namespace std;

int main() {
    int fd;
    FILE *file;
    const char *s = "hello,world\n";
    // Unbuffered path: open() returns a plain file descriptor.
    if ((fd = open("test.txt", O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR)) == -1) {
        cout << "Error open file" << endl;
        return -1;
    }
    // Buffered path: fopen() returns a stream with a user-space buffer.
    if ((file = fopen("test2.txt", "w")) == NULL) {
        cout << "Error open file." << endl;
        return -1;
    }
    cout << "File has been opened." << endl;
    sleep(15);
    // Cast so that a -1 error return from write() is not compared as unsigned.
    if (write(fd, s, strlen(s)) < (ssize_t)strlen(s)) {
        cout << "Write error" << endl;
        return -1;
    }
    if (fwrite(s, sizeof(char), strlen(s), file) < strlen(s)) {
        cout << "Write error in 2." << endl;
        return -1;
    }
    cout << "After write" << endl;
    sleep(15);
    cout << "After sleep." << endl;
    close(fd);
    return 0;  // the stream buffer of test2.txt is flushed only here, at exit
}
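The same observation can be made without sleeping and running cat by hand. The variant below is a sketch under assumptions: it replaces the manual cat checks with stat, and the helper names size_on_kernel_side and compare_write_and_fwrite are made up for illustration.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: current size of `path` as reported by the kernel, or -1.
long size_on_kernel_side(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

// Writes `s` once with write() and once with fwrite(), then records what the
// kernel can see before and after the stream is closed.
// Returns 0 on success, -1 on error.
int compare_write_and_fwrite(const char *s, long *raw_size,
                             long *stream_size_before, long *stream_size_after) {
    int fd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) return -1;
    FILE *file = fopen("test2.txt", "w");
    if (file == NULL) {
        close(fd);
        return -1;
    }
    bool ok = write(fd, s, strlen(s)) == (ssize_t)strlen(s) &&
              fwrite(s, 1, strlen(s), file) == strlen(s);
    *raw_size = size_on_kernel_side("test.txt");             // already in the kernel
    *stream_size_before = size_on_kernel_side("test2.txt");  // still in the stream buffer
    close(fd);
    if (fclose(file) != 0) ok = false;                       // flushes the stream buffer
    *stream_size_after = size_on_kernel_side("test2.txt");
    return ok ? 0 : -1;
}
```

For a short string such as "hello,world\n", test.txt should already have its full size right after write(), while test2.txt should stay empty until fclose runs.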
The difference between ssize_t write(int filedes, const void *buff, size_t nbytes) and size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *fp) is exactly the difference between unbuffered I/O and buffered I/O on UNIX systems.
The first thing to be clear about is that the "cache" here does not mean the buff parameter of the two functions above.
When data is written to a file, the kernel first writes it into its cache. If the cache is not yet full, the data is not placed on the output queue; only when the cache fills up, or the kernel needs to reuse it, is the data placed on the output queue, and the actual I/O happens when it reaches the head of the queue. This technique is called delayed write.
Now suppose the kernel cache is 100 bytes. If you use write with a buff of 10 bytes and want to write 9 such buffers to the file, you need 9 calls to write, that is, 9 system calls, and at that point nothing has been written to the hard disk yet. If you want the data on the hard disk immediately, call fsync to perform the actual I/O.
Standard I/O, that is, buffered I/O, works with FILE *. A FILE actually contains all the information needed to manage the stream: the file descriptor used for the actual I/O, a pointer to the stream buffer (the standard I/O buffer, allocated by malloc, also called the cache in the user-space process, as distinct from the cache set up by the kernel), the length of the buffer, the number of bytes currently in the buffer, error flags, and so on. Assuming the stream buffer is 50 bytes long, writing the data above to a file takes only 2 system calls (fwrite calls the write system call), because the data first goes into the stream buffer and moves on to the kernel cache only when the stream buffer is full or fflush is called.
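To make the bookkeeping concrete, here is a toy stream that keeps the fields listed above and counts how often it has to call write. It is an illustrative sketch only: ToyStream, toy_flush, toy_write and count_write_calls are invented names, and real FILE implementations are considerably more involved.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stddef.h>
#include <string.h>

// Toy stand-in for FILE (hypothetical; real implementations differ).
struct ToyStream {
    int fd;           // descriptor used for the actual I/O
    char buf[50];     // stream buffer in user space
    size_t used;      // bytes currently held in buf
    bool error;       // error flag
    int write_calls;  // how many times write() has been invoked
};

// Pushes the buffered bytes to the kernel with a single write() call.
void toy_flush(ToyStream *s) {
    if (s->used == 0) return;
    if (write(s->fd, s->buf, s->used) != (ssize_t)s->used) s->error = true;
    s->write_calls++;
    s->used = 0;
}

// Copies data into the stream buffer, flushing whenever the buffer is full.
void toy_write(ToyStream *s, const char *data, size_t len) {
    while (len > 0) {
        if (s->used == sizeof s->buf) toy_flush(s);
        size_t room = sizeof s->buf - s->used;
        size_t n = len < room ? len : room;
        memcpy(s->buf + s->used, data, n);
        s->used += n;
        data += n;
        len -= n;
    }
}

// Writes `count` records of 10 bytes each and returns the number of write()
// system calls that were needed, or -1 on error.
int count_write_calls(const char *path, int count) {
    ToyStream s;
    s.fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (s.fd == -1) return -1;
    s.used = 0;
    s.error = false;
    s.write_calls = 0;
    for (int i = 0; i < count; ++i) toy_write(&s, "0123456789", 10);
    toy_flush(&s);  // what fflush() or fclose() would do
    close(s.fd);
    return s.error ? -1 : s.write_calls;
}
```

With nine 10-byte records the toy stream needs two write calls: one when the 50-byte buffer is full and one for the remaining 40 bytes at the final flush, compared with nine calls when write is used directly.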
fflush sends all unwritten data to the kernel (the kernel buffer); fsync writes everything in the kernel buffer to the file (the disk). Whether the data is still in the kernel buffer or already in the file makes no difference to processes: if process A and process B open the same file, data that process A has written to the kernel I/O buffer can also be read by process B, because kernel space is shared between processes.
The I/O buffers of the C standard library do not have this property, because the user space of each process is completely independent. (Personally, I think this is a very important point.)
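The two flushing steps can be chained when data has to survive a crash. The following is a minimal sketch, assuming a POSIX system where fileno and fsync are available; the helper name write_durably is made up for illustration.

```cpp
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: writes `text` to `path` and pushes it through both
// buffering layers. Returns 0 on success, -1 on error.
int write_durably(const char *path, const char *text) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    size_t len = strlen(text);
    bool ok = fwrite(text, 1, len, fp) == len;  // data is in the stream buffer
    ok = ok && fflush(fp) == 0;                 // stream buffer -> kernel buffer
    ok = ok && fsync(fileno(fp)) == 0;          // kernel buffer -> disk
    if (fclose(fp) != 0) ok = false;
    return ok ? 0 : -1;
}
```

The order matters: calling fsync before fflush would only synchronize what the kernel already has, while the newest data would still be sitting in the process's own stream buffer.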
read and write are called unbuffered relative to the stream functions fread and fwrite. Because fread and fwrite are library functions (man section 3), they buffer data at the user level; read and write are system calls (man section 2), so they do no buffering at the user level. That is why read and write count as unbuffered I/O. The kernel does still cache the data; the user layer just cannot see it.
The above described the library's buffering mechanism and also touched on kernel buffers. So what is kernel buffering actually good for?
Why does data always have to be copied from a kernel buffer to a user buffer, or the other way round?
A: User processes run in user space and cannot manipulate kernel buffers directly. When a user process makes a system call, it switches from user mode to kernel mode and returns to user mode once the kernel has finished. Buffering clearly improves system efficiency: exchanging data between the kernel and peripheral devices, and between the kernel and user space, is time-consuming, and buffers exist to optimize these expensive operations. In fact, the transfer between kernel and user space is not buffered in itself; it is the I/O library that uses a buffer to optimize it. Take read: fetching data from the kernel is expensive, so the library fetches a whole chunk at a time to avoid trapping into the kernel repeatedly.
The main idea behind kernel buffers is to read a large amount of data into a buffer at once and then serve the data from the buffer when it is needed.
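The cost of fetching small pieces can be made visible by counting read calls for different chunk sizes. This is a sketch under assumptions: make_file and count_read_calls are invented helpers, and the counts assume a regular file on a local file system, where read returns full chunks until end of file.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stddef.h>

// Hypothetical helper: creates `path` containing `size` filler bytes.
// Returns 0 on success, -1 on error.
int make_file(const char *path, size_t size) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) return -1;
    int rc = 0;
    for (size_t i = 0; i < size; ++i) {
        if (write(fd, "x", 1) != 1) {
            rc = -1;
            break;
        }
    }
    close(fd);
    return rc;
}

// Reads `path` to the end using `chunk` bytes per read() call (at most 4096)
// and returns how many read() calls were made, including the final one that
// reports end of file. Returns -1 on error.
int count_read_calls(const char *path, size_t chunk) {
    char buf[4096];
    if (chunk == 0 || chunk > sizeof buf) return -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    int calls = 0;
    ssize_t n;
    do {
        n = read(fd, buf, chunk);
        ++calls;
    } while (n > 0);
    close(fd);
    return n == -1 ? -1 : calls;
}
```

For a 100-byte file, reading 10 bytes at a time should take 11 read calls (ten with data plus one that reports end of file), while reading 100 bytes at a time should take 2. A user-level buffer that fetches large chunks is what lets fread avoid the extra kernel entries.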
Switching between kernel (supervisor) mode and user mode takes time, but disk I/O takes far longer, so the kernel uses buffering to speed up disk access. A disk is a collection of blocks, and the kernel buffers blocks of disk data. The kernel copies data blocks from the disk into kernel buffers; when a process in user space reads data from the disk, the kernel normally does not read the disk directly but copies the data from the kernel buffer into the process's buffer. When the block the process needs is not in the kernel buffer, the kernel adds the block to the request queue, suspends the process, and goes on to serve other processes. A short time later the kernel reads the block from the disk into the kernel buffer, copies the data into the process's buffer, and finally wakes the suspended process.
Note: understanding how kernel buffering works helps in mastering the system calls read and write. read copies data from the kernel buffer to the process buffer, and write copies data from the process buffer to the kernel buffer; neither is the same as data being exchanged between the kernel buffer and the disk.
In theory the kernel can write to the disk at any time, but not every write operation causes the kernel to do so. The kernel temporarily holds the data to be written in a buffer, lets it accumulate to a certain amount, and then writes it out. This sometimes leads to surprises: in a sudden power outage, for example, the kernel has no chance to write the data in its buffers to disk, and the updates are lost.
Kernel buffering thus improves the efficiency of disk I/O and optimizes disk writes, but the buffered data still has to be written to disk in good time.