I. Why I wrote this article
I have recently been reading "Advanced Programming in the UNIX Environment", and it has tightened up knowledge that used to be rather hazy for me, especially the two chapters on unbuffered file I/O and buffered standard I/O. I now understand functions such as read, write, fread, fwrite and printf in a new way. One strong impression: in day-to-day development we usually focus only on upper-level logic. We finish project after project and seem to get a lot done, but in the quiet of the night it is worth asking whether we have really mastered these topics, and whether we could fully explain the mechanism behind each one. These are the things that best show whether someone's fundamentals are solid. I have noticed that interviews at Internet companies love to ask about exactly these basics, starting from one very simple function and extending it, layer by layer, into a long series of questions. Much of the time we resist such fundamentals, thinking "I can look it up when I need it; focusing on upper-level logic is what improves my development efficiency." Nothing seems wrong with that idea, but it is often the source of a bottleneck, and hitting a bottleneck is what programmers fear most. Because a bottleneck is hard to notice from the inside, blindly pursuing practice while giving up theoretical study makes it easy to run into one. (Just my personal opinion; please be gentle if you disagree.)
This article is my reading notes on the file I/O and standard I/O chapters of "Advanced Programming in the UNIX Environment". The file I/O chapter describes its functions as unbuffered, yet buffering shows up again later in that same chapter, which left me a little dizzy, so I am writing down my understanding here. If anything is wrong, please point it out. If you find the article helpful, please recommend it or follow me; your attention is my biggest motivation to write. ^_^
II. The buffering mechanism
As we all know, data exchange between the CPU and memory is far faster than disk access. A cache reduces the number of disk reads and writes and improves the efficiency of concurrent programs, so caching is an effective way to speed up the storage and processing of tasks. Caching is used not only in operating systems; it also plays an important role in Web technology, server-side software, distributed systems and other fields.
Viewed from a high level, Linux is divided into user mode and kernel mode, and both provide a cache for I/O operations. The user-mode one is called the standard I/O cache, or user-space cache; the kernel-mode one is called the buffer cache, or page cache. If both caches exist, why does the book speak of unbuffered I/O and buffered I/O? In fact, "unbuffered I/O" only means that user space provides no buffer for those I/O operations; the kernel still buffers them. With that in mind, the terms should no longer be confusing.
III. System I/O and standard I/O
System I/O, also called file I/O or kernel-mode I/O, refers to a file through a file descriptor; each open file corresponds to one file descriptor. A file descriptor is a non-negative integer. By convention, descriptors 0, 1 and 2 are standard input, standard output and standard error, with the symbolic constants STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO defined in the header <unistd.h>. Some UNIX systems also specify an upper limit on descriptors, OPEN_MAX, in <limits.h>. To read or write a file, the file is identified by the descriptor returned by the open or creat system call, which is passed as an argument to the read or write system call.
    #include <unistd.h>
    ssize_t read(int fd, void *buf, size_t nbytes);
    ssize_t write(int fd, const void *buf, size_t nbytes);
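To make the two prototypes concrete, here is a minimal sketch of a copy loop built only on read and write. The function name copy_fd and the BUFFSIZE value of 4096 are assumptions chosen for illustration, not something the book prescribes.

```c
#include <unistd.h>

#define BUFFSIZE 4096  /* assumed buffer size; it is chosen by the caller */

/* Copy everything from in_fd to out_fd using only read and write.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[BUFFSIZE];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {   /* write may accept fewer bytes than requested */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Calling copy_fd(STDIN_FILENO, STDOUT_FILENO) from main would copy standard input to standard output, one system call pair per BUFFSIZE bytes.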
Standard I/O, also called user-mode I/O, refers to a file through a stream. A stream is usually opened with fopen or freopen, which return a pointer to a FILE object; any other function that needs to refer to the stream takes this file pointer as an argument. Every process has three predefined streams that are available automatically: standard input, standard output and standard error (stdin, stdout and stderr). They refer to the same files as the three file descriptors that system I/O reserves. For reading and writing, system I/O defines only the read and write system calls, whereas standard I/O defines many functions that programmers can choose from according to their needs. These functions fall into three groups: character-at-a-time I/O, line-at-a-time I/O, and direct I/O (also called binary I/O, object-at-a-time I/O, record-oriented I/O or structure-oriented I/O).
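The link between the three predefined streams and the three reserved descriptors can be checked with the POSIX function fileno. This is a small sketch; the helper name std_streams_match_fds is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno */
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if stdin, stdout and stderr sit on descriptors 0, 1 and 2. */
int std_streams_match_fds(void)
{
    return fileno(stdin)  == STDIN_FILENO  &&
           fileno(stdout) == STDOUT_FILENO &&
           fileno(stderr) == STDERR_FILENO;
}
```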
1) Character-at-a-time I/O
    #include <stdio.h>
    /* Input functions */
    int getc(FILE *fp);      /* may be implemented as a macro */
    int fgetc(FILE *fp);     /* guaranteed to be a function */
    int getchar(void);       /* equivalent to getc(stdin) */
    /* Output functions */
    int putc(int c, FILE *fp);
    int fputc(int c, FILE *fp);
    int putchar(int c);      /* equivalent to putc(c, stdout) */
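As a rough illustration of character-at-a-time I/O, the sketch below copies one stream to another with getc and putc. The function name copy_chars is an assumption for this example.

```c
#include <stdio.h>

/* Copy in to out one character at a time.
 * Returns the number of characters copied, or -1 on error. */
long copy_chars(FILE *in, FILE *out)
{
    int c;           /* int, not char, so EOF can be told apart from data */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        count++;
    }
    return ferror(in) ? -1 : count;
}
```

Each getc and putc is a function (or macro) call, but most of them only touch the standard I/O buffer rather than entering the kernel.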
2) Line-at-a-time I/O
    #include <stdio.h>
    /* Input functions */
    char *fgets(char *restrict buf, int n, FILE *restrict fp);
    char *gets(char *buf);
    /* Output functions */
    int fputs(const char *restrict str, FILE *restrict fp);
    int puts(const char *str);
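A line-at-a-time copy might look like the following sketch. The name copy_lines and the MAXLINE value are assumptions. Note that gets cannot limit how much it reads and was removed from the C11 standard, so the sketch uses fgets.

```c
#include <stdio.h>

#define MAXLINE 4096  /* assumed maximum line length for this sketch */

/* Copy in to out one line at a time.
 * Returns the number of fgets calls that returned data, or -1 on error. */
long copy_lines(FILE *in, FILE *out)
{
    char buf[MAXLINE];
    long lines = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (fputs(buf, out) == EOF)
            return -1;
        lines++;
    }
    return ferror(in) ? -1 : lines;
}
```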
3) Direct I/O
    #include <stdio.h>
    size_t fread(void *restrict ptr, size_t size, size_t nobj, FILE *restrict fp);
    size_t fwrite(const void *restrict ptr, size_t size, size_t nobj, FILE *restrict fp);
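Direct I/O is typically used to read or write whole structures. The sketch below assumes a made-up record type, struct item, and two hypothetical helpers, save_items and load_items.

```c
#include <stdio.h>

struct item {          /* hypothetical record type, for illustration only */
    short count;
    long  total;
    char  name[16];
};

/* Write n records to fp; returns the number of records actually written. */
size_t save_items(FILE *fp, const struct item *items, size_t n)
{
    return fwrite(items, sizeof(struct item), n, fp);
}

/* Read up to n records from fp; returns the number of records actually read. */
size_t load_items(FILE *fp, struct item *items, size_t n)
{
    return fread(items, sizeof(struct item), n, fp);
}
```

Both functions return a count of objects, not bytes. Data written this way depends on the structure layout, so it is generally only safe to read it back on the same kind of system with the same compiler settings.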
By now we have a rough picture of how system I/O and standard I/O refer to files, along with some common I/O functions. Next we look in detail at what user mode and kernel mode each do when an I/O function is called, to better understand the role caching plays in I/O operations and the difference in efficiency between user-mode I/O and kernel-mode I/O.
IV. The flow of I/O operations
As the figure shows, disk reads and writes from both user process space and kernel space go through the buffer cache. As mentioned earlier, the purpose of the cache is to reduce the number of disk reads and writes and so improve I/O efficiency. Let us first look at how system I/O proceeds when a file is read or written.
1. System I/O: these are kernel system calls, and no user-space buffering is involved. Using the labels in the figure:
(3) Call the write function to write data to the file. buf holds the data to be written, for example write(fd, "abc", 3). The caller must choose the buffer size (BUFFSIZE) before calling; different values of BUFFSIZE affect I/O efficiency, which is the issue examined in section V.
(5) Delayed write: when the cache is full, or the kernel needs to reuse the buffer, the data is placed on an output queue, and the disk write is actually triggered only when the data reaches the head of the queue.
(6) Read-ahead: when the kernel detects sequential reading, it tries to read more data than the application asked for, on the assumption that the application will read it soon. Later reads can then be satisfied quickly from the cache instead of waiting for the disk.
(4) Call read to copy the required data from the buffer cache into the logic unit for processing.
These are the four steps involved in system I/O.
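Because of delayed write, a successful write only means the data has reached the kernel's buffer cache. When a program needs the data on disk before it continues, it can call fsync on the descriptor. The following is a sketch under that assumption; the helper name save_and_sync is made up for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create or truncate the file at path, write n bytes from buf, and ask the
 * kernel to flush them to disk before returning.
 * Returns 0 on success, -1 on error. */
int save_and_sync(const char *path, const void *buf, size_t n)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    while (n > 0) {
        ssize_t w = write(fd, p, n);   /* copies data into the kernel cache */
        if (w < 0) {
            close(fd);
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    if (fsync(fd) < 0) {               /* waits for the delayed write to finish */
        close(fd);
        return -1;
    }
    return close(fd);
}
```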
2. Standard I/O: library functions specified by ISO C, which call the underlying system calls.
(1) To write data from the logic unit to a file, three kinds of function can be called as needed, for example fputc, fputs and fwrite. With these functions the programmer does not have to manage the buffer size; it is allocated automatically. The first time an I/O function is called on a stream, the library requests a buffer, usually by calling malloc, according to the buffering type (fully buffered, line buffered or unbuffered). That buffer is the standard I/O cache.
(3) (5) When the user buffer is full, write is called, as in system I/O, to copy the data from the standard I/O cache to the kernel buffer, and from there it goes to disk.
(4) (6) As in system I/O, read is called to bring data from the kernel buffer into the user buffer.
(2) Again three kinds of function can be called, for example fgetc, fgets and fread, to read the data into the logic unit for further processing.
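The extra user-space step can be observed directly. In the sketch below, a short string written with fputs stays in the standard I/O cache until fflush triggers the write system call. The helper names kernel_size and buffered_write_demo are assumptions, and the sketch assumes fp is a freshly opened regular file, which standard I/O fully buffers by default.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno */
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind fp as the kernel currently sees it, or -1 on error. */
long kernel_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) < 0)
        return -1;
    return (long)st.st_size;
}

/* Write "abc" to fp and record the kernel-visible file size before and
 * after fflush. Returns 0 on success, -1 on error. */
int buffered_write_demo(FILE *fp, long *before, long *after)
{
    if (fputs("abc", fp) == EOF)
        return -1;
    *before = kernel_size(fp);   /* data still sits in the standard I/O cache */
    if (fflush(fp) == EOF)       /* this is where write is actually called */
        return -1;
    *after = kernel_size(fp);
    return 0;
}
```

On a new, empty file the expected result is before equal to 0 and after equal to 3.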
As we can see, standard I/O is implemented on top of system I/O, so it might seem that standard I/O must be less efficient. In fact, standard I/O is not much slower than system I/O, and it has a number of other advantages, described one by one below (the next section is the most important part of this article).
V. I/O efficiency
The efficiency of system I/O is limited by the number of read and write system calls, and that number is determined by the size of the buffer passed to them, BUFFSIZE. With different values of BUFFSIZE the system CPU time differs; its minimum appears at about BUFFSIZE = 4096. The reason is that the test used a Linux ext2 file system whose block size is 4096 bytes, so we treat 4096 bytes as the best I/O length. Enlarging the buffer further has little effect on the time. So one of the biggest problems with system I/O is that the programmer has to manage the buffer size and pick the best I/O length by hand. The other is that a system call usually takes more time than an ordinary function call, because for a system call the kernel has to: 1) trap the call, 2) check the validity of the system call arguments, and 3) transfer data between user space and kernel space.
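To see how the buffer size drives the number of system calls, the following sketch reads a whole file with a caller-chosen buffer and counts the read calls. The function name count_read_calls is an assumption for illustration; it measures call counts only, not CPU time.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read the whole file at path with a buffer of bufsize bytes.
 * Returns the number of read calls issued, including the final call
 * that reports end of file, or -1 on error. */
long count_read_calls(const char *path, size_t bufsize)
{
    char *buf;
    long calls = 0;
    ssize_t n;
    int fd;

    if (bufsize == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    do {
        n = read(fd, buf, bufsize);   /* one system call per iteration */
        calls++;
    } while (n > 0);

    free(buf);
    close(fd);
    return n < 0 ? -1 : calls;
}
```

For a 10,000-byte regular file, a 4096-byte buffer needs 4 calls, while a 1-byte buffer needs 10,001.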
Standard I/O was therefore introduced so that the standard I/O cache can avoid both a badly chosen BUFFSIZE and frequent system calls. The programmer picks the I/O functions that suit the task; depending on the buffering type, the library calls malloc or a similar function to allocate a suitably sized cache; and only when that cache fills up does it call system I/O to copy the data from the standard I/O cache to the kernel cache, which further reduces the number of system calls.
However, different standard I/O functions and different buffering types also differ in efficiency. With system I/O, the best time is obtained when the best I/O length is chosen, that is, when BUFFSIZE matches the file system block size. With standard I/O, the character-at-a-time functions fgetc and fputc use more CPU time than the line-at-a-time functions fgets and fputs. Calling system I/O one byte at a time costs far more still: for a 100 MB file, that version needs about 200 million function calls, and each of them is a system call (from the user buffer to the kernel buffer to the disk). The fgetc version also performs about 200 million function calls, but only about 25,222 system calls, so its time is greatly reduced.
To sum up, although the standard I/O functions are implemented on top of system I/O, they greatly reduce the number of system calls and free the programmer from choosing a buffer size, which improves overall I/O efficiency. In addition, standard I/O offers several buffering types, so programmers can choose the one that fits each application, which makes programming more flexible. Choosing unbuffered mode is roughly equivalent to calling system I/O directly.
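The buffering type of a stream is selected with setvbuf, using the modes _IOFBF (fully buffered), _IOLBF (line buffered) and _IONBF (unbuffered). The sketch below, with the made-up helper name bytes_visible_before_flush, shows how the mode changes what the kernel has received before any flush.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno */
#include <stdio.h>
#include <sys/stat.h>

/* Write "ab\ncd" to a temporary file using the given buffering mode
 * (_IOFBF, _IOLBF or _IONBF) and return how many bytes the kernel has
 * received before the stream is flushed or closed. Returns -1 on error. */
long bytes_visible_before_flush(int mode)
{
    struct stat st;
    long visible = -1;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (setvbuf(fp, NULL, mode, BUFSIZ) == 0 &&   /* set the mode before any I/O */
        fputs("ab\ncd", fp) != EOF &&
        fstat(fileno(fp), &st) == 0)
        visible = (long)st.st_size;
    fclose(fp);
    return visible;
}
```

With _IOFBF the result is 0, because all five bytes are still in the standard I/O cache; with _IONBF it is 5, because every byte is handed to the kernel right away.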
That is all for this article. There are of course many more points to note about I/O operations, and many more advanced I/O functions; I will summarize them later as I run into them. Finally, if you think this article helped you, please follow me. As I said before, your attention is my greatest motivation to write.