(2) Learning APUE Together: File I/O
We discussed standard I/O (STDIO) yesterday. Today we mainly talk about system I/O (SYSIO).
1. File descriptors
In SYSIO, an open file is identified by an integer called a file descriptor. The kernel refers to every open file through a file descriptor. A descriptor plays a role similar to the FILE struct in STDIO, but the two work completely differently. A file descriptor is actually an index into an array that the kernel maintains, so it is never negative. The figure below shows how it works.
Figure 1: SYSIO file descriptors
I couldn't find good drawing software on Ubuntu, so the figure quality isn't great. Please bear with it.
Let me explain what I drew.
The figure has two parts: standard I/O and system I/O. The system I/O part contains an array. Each element of this array points to a kernel structure (a file table entry) that describes an open file, and that structure in turn refers to the actual file (its inode). The file descriptor the operating system gives us is simply the index into this array. By default this array typically has 1024 entries, which means a process can open at most 1024 files. This limit can be changed with the shell command ulimit -n. We won't go into the details here.
When a new file descriptor is allocated, the lowest available number is always used. For example, if descriptors 0, 1, 2, 3 and 5 are in use, the next descriptor allocated is 4.
Note that every process has its own copy of this descriptor array. In principle, then, each process can open up to 1024 files. The limit does not mean that only 1024 files can be open across the whole system.
2. fileno (3)
#include <stdio.h>

int fileno(FILE *stream);

Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

    fileno(): _POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE
This function returns the SYSIO file descriptor that underlies a STDIO FILE pointer.
3. fdopen (3)
#include <stdio.h>

FILE *fdopen(int fd, const char *mode);

Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

    fdopen(): _POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE
This function does the opposite of fileno (3): it converts a SYSIO file descriptor into a STDIO FILE pointer. Its mode parameter has the same meaning as the mode parameter of fopen (3), and it must be compatible with the way the descriptor was opened.
Although these two functions let you convert between STDIO and SYSIO, we do not recommend operating on the same file in both ways at the same time. STDIO keeps its own private data, such as its user-space buffer, and that data is not synchronized with what SYSIO does to the file. Mixing the two on the same file can therefore produce unpredictable results. For details, see the example of merged system calls in the previous post.
4. open (2)
open - open and possibly create a file or device

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
To work on a file or device with SYSIO, you must first obtain a file descriptor with open (2). Note that this function needs all of the header files listed in its synopsis above.
Parameter List:
Pathname: Path of the file to be opened.
Flags: specifies how the file is opened. Multiple flags are combined with the bitwise OR (|) operator.
Required flags (exactly one must be given): O_RDONLY, O_WRONLY, O_RDWR
Optional flags: there are many of these, and only the commonly used ones are listed here. For the complete list, see the man page.
Option | Description
O_APPEND | Append mode: every write goes to the end of the file.
O_CREAT | Create the file if it does not exist.
O_DIRECT | Minimize cache effects. I/O goes directly between the user buffer and the device, bypassing the kernel's buffering (which normally speeds up writes) and caching (which normally speeds up reads) as far as possible.
O_DIRECTORY | Require pathname to be a directory. If it is not a directory, opening fails.
O_LARGEFILE | Allow opening files whose size does not fit in a 32-bit off_t (off_t is treated as 64 bits). You can also set the size of off_t at compile time with the macro mentioned in the previous post.
O_NOFOLLOW | If pathname is a symbolic link, do not follow it. Instead of opening the file the link points to, open (2) fails (with ELOOP).
O_NONBLOCK | Open in non-blocking mode. A blocking read waits until data is available. A non-blocking read makes one attempt and returns immediately, failing with EAGAIN if no data is ready.
Mode: the file permission bits, usually written in octal (for example 0664). This argument is required when flags contains O_CREAT and is ignored otherwise. The system does not use it directly when creating the file. Instead, it computes the final permissions with this formula:
mode & ~umask
You can see the current umask with the umask (1) command. Thanks to this formula, a program does not accidentally create files with more permissions than the user allows.
Have you noticed something interesting about this function? C has no function overloading, so why do the two open (2) prototypes look like overloaded functions? In fact, open (2) is implemented with a variable-length argument list, and the mode argument is only read when it is needed.
This reminds me of an interview question: how can you tell whether a function is implemented with overloading or with a variable-length argument list? Pass it extra arguments. If the compiler reports an error, the function is overloaded. Otherwise, it takes a variable-length argument list.
5. close (2)
close - close a file descriptor

#include <unistd.h>

int close(int fd);
Closes a file descriptor.
The parameter is the file descriptor to close. Once a descriptor has been closed, it must not be used again. The value of the fd variable has not changed, but the kernel has released the associated resources, so fd is now like a dangling pointer.
Return Value:
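0 on success, -1 on failure (with errno set). In practice the return value is rarely checked because close (2) seldom fails. It can fail, however, for example when an error from an earlier write is only reported at close time.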
6. read (2)
read - read from a file descriptor

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
This is the SYSIO function for reading files. It reads up to count bytes from file descriptor fd into the buffer pointed to by buf.
Return value: the number of bytes actually read, which may be less than count. 0 means the end of the file has been reached. -1 means an error occurred, and errno is set.
Note that read (2) and fread (3) in STDIO return different things. fread (3) returns the number of objects successfully read, while read (2) returns the number of bytes successfully read.
7. write (2)
write - write to a file descriptor

#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count);
write (2) is the SYSIO function for writing data to a file. It writes up to count bytes from buf to the file referred to by file descriptor fd.
Return value: the number of bytes actually written, which may be less than count. A return value of 0 does not mean the write failed. It only means nothing was written. -1 means an error occurred, and errno is set.
Note that write (2) and fwrite (3) in STDIO return different things. fwrite (3) returns the number of objects successfully written, while write (2) returns the number of bytes successfully written.
So when does write (2) return 0? The simplest case is calling it with count equal to 0. A blocking write (2) that is interrupted by a signal does not return 0. If no data had been written yet, the call fails with -1 and errno set to EINTR. If some data had already been written, it returns the number of bytes written so far (a partial write). I'll explain blocking and signals in later posts.
8. lseek (2)
lseek - reposition read/write file offset

#include <sys/types.h>
#include <unistd.h>

off_t lseek(int fd, off_t offset, int whence);
The previous post introduced the file position pointer, a marker the system maintains to make reading and writing files convenient. Every read or write moves it forward by the number of bytes transferred, toward the end of the file.
Does that mean that once we have read part of a file, we can never go back and read that part again?
Not at all. With the lseek (2) function we can move the file position pointer wherever we like.
Parameter List:
Fd: file descriptor to be operated;
Offset: the offset in bytes, relative to whence (it may be negative);
Whence: relative position; three options: SEEK_SET, SEEK_CUR, and SEEK_END
SEEK_SET indicates the start position of the file;
SEEK_CUR indicates the current position of the file position pointer;
SEEK_END indicates the end of the file;
Return Value:
On success, lseek (2) returns the resulting file offset, measured in bytes from the beginning of the file. On failure, it returns -1 and sets errno.
The offset parameter and the return value use the off_t type rather than a plain long. This makes lseek (2) better suited to large files than the standard library's fseek (3), whose offset is a long.
The following pseudocode shows how to use this function.
lseek(fd, -1024, SEEK_CUR);                          // move 1024 bytes back from the current position
lseek(fd, 1024, SEEK_SET);                           // move to offset 1 KB from the start of the file
lseek(fd, (off_t)5 * 1024 * 1024 * 1024, SEEK_SET);  // move 5 GB past the start; a write(2) there
                                                     // produces a 5 GB file that is mostly an empty hole
9. time (1)
I have discussed the efficiency of STDIO and SYSIO before, so I will talk about the time (1) command here.
This command is not used to view the current system time. If you want to view the system time, you must use the date (1) command. This is not what we will discuss today, so we will not talk about it.
The time (1) command runs a program and reports how long it took (real, user and sys time). From these figures we can roughly analyze how efficiently the program runs.
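while ((readlen = read(srcfd, buf, BUFSIZE)) > 0) {
    pos = 0;
    /* write(2) may write fewer bytes than asked, so loop until this chunk is done */
    while (readlen > 0) {
        writelen = write(destfd, buf + pos, readlen);
        if (writelen < 0) {
            err = errno;
            goto e_write;
        }
        pos += writelen;
        readlen -= writelen;
    }
}
/* here readlen == 0 means end of file; readlen < 0 means read(2) failed */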
This is the core of a program that imitates the cp (1) command. buf is a char array that serves as the buffer for reading and writing. The buffer size affects copying speed. A buffer that is too small means too many system calls, while one that is too large wastes memory and cache, and both reduce efficiency. To find the most efficient buffer size, change BUFSIZE repeatedly and measure each build with the time (1) command.
>$ gcc -Wall mycp_sysio.c -o mycp_sysio
>$ time ./mycp_sysio rhel-server-6.4-x86_64-dvd.iso tmp.iso
real    1m30.014s
user    0m0.003s
sys     1m29.003s
Sys is the time consumed by the program in kernel mode, that is, the time consumed by executing system calls.
user is the time the program spent in user mode, that is, running the program's own code.
real is the total wall-clock time the user waits. Besides user and sys time, it includes time spent waiting to be scheduled on the CPU and waiting for I/O. For a single-threaded program like this one, real is therefore usually a bit longer than sys + user. When we say a program is made more responsive to improve the user experience, we generally mean reducing its real time.
10. File Sharing
File sharing means that multiple processes work on the same file at the same time. In Figure 1, the second and third file table entries point to the same inode. Strictly speaking, this counts as file sharing only when the two entries come from different processes.
11. Atomic operations
Put simply, an atomic operation completes several steps in one go, without interruption: either all of the steps are completed or none of them are. Let's use creating a temporary file as an example.
tmpnam, tmpnam_r - create a name for a temporary file

#include <stdio.h>

char *tmpnam(char *s);
To create a temporary file, you first need a file name that is not in use on the system, and then you create the file.
The tmpnam (3) function obtains such a name for a temporary file. Why let the system generate the name? Many processes may be creating temporary files at the same time, and a name we make up ourselves could easily clash with an existing one.
However, this function only generates a name that does not currently exist on the system. It does not create the file, so we have to create it ourselves with fopen (3) or open (2).
Suppose that after we get the name, but before the file has actually been created on disk, another process obtains the same name. When we then create the file, the two processes collide.
The problem is that getting the name and creating the file are not done atomically. If obtaining a unique name and creating the file happened in one uninterrupted step, then once the file existed, no other process could get the same name.
tmpfile - create a temporary file

#include <stdio.h>

FILE *tmpfile(void);
Since tmpnam (3) cannot create temporary files safely, is there an atomic way to avoid the problem described above? Yes: create the temporary file with the tmpfile (3) function.
tmpfile (3) obtains a name and creates the temporary file in one go, and it returns a FILE pointer to the already-created file. We no longer have to worry about someone else grabbing our file name. :)
Of course, there are many places in the system that require atomic operations, not just creating temporary files, so there are other functions in the system that provide atomic operations. We will explain them again when we encounter them. We will not detail them here.
12. dup (2), dup2 (2)
dup, dup2 - duplicate a file descriptor

#include <unistd.h>

int dup(int oldfd);
int dup2(int oldfd, int newfd);
These two functions copy file descriptors. After copying, the old and the new descriptor refer to the same file table entry, just as descriptors 3 and 6 do in Figure 1.
A typical use of dup (2) is redirecting output. Consider this program:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Requirement: without changing the code below, make its output go to a file. */

    puts("dup test.");

    return 0;
}
puts (3) writes its argument string to standard output, stdout (file descriptor 1), whose default destination is our shell's terminal. To send the output of puts (3) to a file instead, we can do the following. First, open a file to get a new file descriptor. Then close descriptor 1 (standard output). Finally, use a function from the dup (2) family to copy the file's descriptor. Recall that a new file descriptor always takes the lowest available number. The process currently has standard input (0), standard output (1), standard error (2) and our own file (3) open. Once descriptor 1 is closed, the lowest available number is 1, so the copy becomes descriptor 1. Descriptor 1 now refers to our file rather than to the terminal. Since puts (3) ultimately calls write (2) on descriptor 1, its output ends up in the file.
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    int fd = -1;

    fd = open("tmp", O_WRONLY | O_CREAT | O_TRUNC, 0664);
    if (fd < 0) {
        perror("open");
        return 1;
    }

#if 0
    close(1);         // close standard output
    dup(fd);          // the copy gets the lowest free number: 1
#endif
    if (fd != 1) {    // if stdout was already closed, open(2) may have returned 1
        dup2(fd, 1);  // make descriptor 1 refer to the file (atomically)
        close(fd);    // descriptor 1 keeps the file open
    }

    /* requirement: the output below must end up in the file */

    puts("dup test.");

    return 0;
}
The requirement says the code after puts (3) must not be modified, so descriptor 1 cannot be closed with close (2) after use. Strictly speaking, a file descriptor is therefore leaked here (it is released when the process exits), but this does not affect our explanation and testing of the dup (2) family.
The code above can use either close (2) + dup (2) or dup2 (2).
dup (2) and dup2 (2) have the same effect. The difference is that dup2 (2) lets you choose the number of the new file descriptor with its second argument, newfd.
If newfd is already open, dup2 (2) first closes it and then reuses it.
If the two arguments are equal (and oldfd is valid), dup2 (2) returns newfd without closing it.
In addition, close (2) + dup (2) is not atomic, while dup2 (2) is. With concurrency, the non-atomic version can go wrong. For example, another thread may open a file between the close (2) and the dup (2) and take descriptor 1. We will discuss concurrency with signals and multithreading in later posts.
13. sync (2)
sync, syncfs - commit buffer cache to disk

#include <unistd.h>

void sync(void);
The sync (2) family flushes buffered data and caches to disk. It is typically used just before a device is unmounted. These functions are not used very often, so we will discuss them in detail when we need them.
14. fcntl (2)
fcntl - manipulate file descriptor

#include <unistd.h>
#include <fcntl.h>

int fcntl(int fd, int cmd, ... /* arg */ );
This is a catch-all function that reads or changes the properties of an already opened file, depending on cmd and arg. For the full list of commands and arguments, see the man page. We won't go through them all here.
15. ioctl (2)
ioctl - control device

#include <sys/ioctl.h>

int ioctl(int d, int request, ...);
Linux follows the "everything is a file" design principle and abstracts every device as a file. When an operation on a device cannot be expressed as opening, closing, reading, writing or seeking, it is handled through the ioctl (2) function.
For example, if a sound card is abstracted as a file, recording and playing audio map naturally onto reading and writing that file. Settings such as frequency and tone, however, cannot be expressed as file operations, so they are controlled with ioctl (2). The specific control commands are provided by the driver.
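16. /dev/fd
/dev/fd is a virtual directory whose entries are the file descriptors of the current process. If you list it with ls (1), you see the descriptors used by the ls (1) process itself. Opening a file in this directory, such as /dev/fd/0, is equivalent to duplicating that file descriptor. Linux behaves a little differently: there the entries are symbolic links to the underlying files, so opening one opens the file anew.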
That wraps up file I/O. If you spot any mistakes, please point them out. :)