File I/O and system programming
This article summarizes my notes from reading TLPI (The Linux Programming Interface). To stay focused and avoid repeating the book, I will not spend much time on basic concepts and usage; instead I concentrate on principles and details. Readers should therefore have at least read APUE, or have written this kind of code themselves. Because the material is somewhat fragmented, I present it in the form of FAQs.
System programming overview
How do I determine the version of glibc?
There are two ways.
The first is to check directly: use ldd to locate the glibc shared library, then run the library itself as an executable to print its version number.
[root@localhost ~]# ldd /bin/ls
....
libc.so.6 => /lib64/libc.so.6 (0x00007f62209f3000)
.......
[root@localhost ~]# /lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
Copyright (C) .... Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.5 20150623 (Red Hat 4.8.5-4).
Compiled on a Linux 3.10.0 system on .....
Available extensions:
    The C stubs add-on version 2.1.2.
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
    RT using linux kernel aio
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
As you can see, the glibc version is 2.17. Source code sometimes needs to detect the glibc version as well, for example because it uses functions that exist only in newer glibc releases. That is the second way: the constants __GLIBC__ and __GLIBC_MINOR__. Both are compile-time constants, so they can be tested with conditional-compilation directives (#ifdef, #if) or checked with an ordinary if at run time. Because their values are fixed at compile time, however, they are useless when a program is compiled on system A and run on system B, unless it is recompiled on system B. To handle this case, a program can call gnu_get_libc_version() to determine the glibc version in use at run time. The code is as follows:
#include <gnu/libc-version.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Compile-time glibc version. It is fixed when the program is built,
       so it is wrong if the binary is built on machine A and run on
       machine B; use a run-time query in that case. */
    printf("Major version:%d \t Minor version:%d\n", __GLIBC__, __GLIBC_MINOR__);

    /* Version of the glibc actually in use at run time. */
    printf("glibc runtime version:%s\n", gnu_get_libc_version());

    /* glibc-specific confstr() name:
       size_t confstr(int name, char *buf, size_t len); */
    char buf[64] = {0};
    if (confstr(_CS_GNU_LIBC_VERSION, buf, sizeof(buf)) == 0) {
        perror("confstr");
        return 1;
    }
    printf("glibc version:%s\n", buf);
    return 0;
}
This requires including the header gnu/libc-version.h. __GLIBC__ is the major version number and __GLIBC_MINOR__ is the minor version number. Besides gnu_get_libc_version(), you can also call confstr() with the glibc-specific name _CS_GNU_LIBC_VERSION to obtain the glibc version string.
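As a rough sketch of both kinds of check, the following uses the preprocessor for the compile-time test and parses the string returned by gnu_get_libc_version() for the run-time test. The helper names version_at_least and runtime_glibc_at_least and the macro BUILT_WITH_GLIBC_2_17 are hypothetical, and the 2.17 threshold is only an example.

```c
#include <gnu/libc-version.h>
#include <stdio.h>

/* Compile-time check: evaluated on the build machine only. */
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 17))
#define BUILT_WITH_GLIBC_2_17 1
#else
#define BUILT_WITH_GLIBC_2_17 0
#endif

/* Hypothetical helper: returns 1 if the version string ver
   ("major.minor...") is at least major.minor, 0 if it is older,
   and -1 if the string cannot be parsed. */
int version_at_least(const char *ver, int major, int minor)
{
    int have_major = 0, have_minor = 0;

    if (sscanf(ver, "%d.%d", &have_major, &have_minor) != 2)
        return -1;
    if (have_major != major)
        return have_major > major;
    return have_minor >= minor;
}

/* Run-time check: asks the glibc that is actually loaded. */
int runtime_glibc_at_least(int major, int minor)
{
    return version_at_least(gnu_get_libc_version(), major, minor);
}
```

A program built on one machine and run on another would consult runtime_glibc_at_least() before relying on behavior that depends on the installed library.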
How do I print a system data type value?
For the basic C types, printing with printf is straightforward. Linux, however, defines many system data types with typedef, and unless we know which underlying type a given typedef maps to, it is hard to choose the right printf conversion; getting it wrong typically produces compiler warnings. Consider the following example.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main()
{
    pid_t mypid;
    mypid = getpid();
    printf("mypid=%ld\n", mypid);
    return 0;
}

Compiling with gcc -Wall 1.c produces the following warning:

1.c: In function 'main':
1.c:10:2: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'pid_t' [-Wformat=]
  printf("mypid=%ld\n", mypid);
A common strategy is to cast the value to long and print it with %ld. There is one exception: in some compilation environments off_t is the same size as long long, so off_t values should be cast to long long and printed with %lld.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main()
{
    pid_t mypid;
    mypid = getpid();
    printf("mypid=%ld\n", (long)mypid);        /* cast to long for %ld */

    off_t test = 0x9999;
    printf("myoff=%lld\n", (long long)test);   /* cast to long long for %lld */
    return 0;
}
This eliminates the warnings when printing system data types. In addition, the C99 standard provides the z length modifier (%zu and %zd) for printing size_t and ssize_t values, but not all UNIX implementations support it.
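The casting rules above can be sketched as small formatting helpers. The function names are hypothetical; the last two show the C99 alternatives (intmax_t with %jd, and size_t with %zu), which assume a C99-conforming printf.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* pid_t: cast to long and print with %ld. */
int format_pid(char *buf, size_t len, pid_t pid)
{
    return snprintf(buf, len, "mypid=%ld", (long)pid);
}

/* off_t: cast to long long and print with %lld. */
int format_off(char *buf, size_t len, off_t off)
{
    return snprintf(buf, len, "myoff=%lld", (long long)off);
}

/* C99: intmax_t can hold any signed integer type; print with %jd. */
int format_any(char *buf, size_t len, intmax_t value)
{
    return snprintf(buf, len, "value=%jd", value);
}

/* C99: size_t prints with the z length modifier. */
int format_size(char *buf, size_t len, size_t n)
{
    return snprintf(buf, len, "size=%zu", n);
}
```

Each helper returns the number of characters snprintf would have written, so a caller can detect truncation by comparing the result with the buffer size.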
File I/O: the universal I/O model
How do I view information about all file descriptors opened by a process?
The directory /proc/PID/fdinfo describes the file descriptors opened by the process whose process ID is PID, as follows:
[root@localhost ~]# ls /proc/854/fdinfo/
0 1 2 3 4 5 6 7 8
This lists the file descriptors opened by process 854. Reading these files reveals more detail about each descriptor.
[root@localhost ~]# cat /proc/854/fdinfo/3
pos:    0
flags:  02004002
mnt_id: 7
[root@localhost ~]# cat /proc/854/fdinfo/4
pos:    0
flags:  02004002
mnt_id: 10
eventfd-count: 0
[root@localhost ~]# cat /proc/854/fdinfo/5
pos:    0
flags:  02004000
mnt_id: 10
inotify wd:2 ino:608b968 sdev:fd00000 mask:....afce ignored_mask:0 fhandle-bytes:c fhandle-type:81 f_handle:....b908060000000037477d08
inotify wd:1 ino:608b979 sdev:fd00000 mask:....afce ignored_mask:0 fhandle-bytes:c fhandle-type:81 f_handle:....b908060000000036477d08
The output above shows descriptors 3, 4, and 5, and the information differs for each. Descriptor 3 is an ordinary file descriptor: the output shows its file offset (pos), its open flags (flags), and the ID of the mount that contains the file (mnt_id); cat /proc/854/mountinfo shows the mount path for each mount ID. Descriptor 4 has an additional eventfd-count field, which indicates that it was created by the eventfd system call. Descriptor 5 was created by inotify. A descriptor created by an I/O multiplexing mechanism such as epoll has additional tfd fields, and a descriptor created by signalfd has an additional sigmask field.
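A program can read the same information about its own descriptors through /proc/self/fdinfo. The sketch below parses the flags field, which is printed in octal; the helper names fdinfo_flags and fd_is_writable are hypothetical, and the code assumes /proc is mounted.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: read the octal "flags:" field for descriptor fd
   from /proc/self/fdinfo. Returns 0 on success, -1 on failure. */
int fdinfo_flags(int fd, unsigned long *flags)
{
    char path[64], line[256];
    int result = -1;

    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Lines that do not start with "flags:" simply fail to match. */
        if (sscanf(line, "flags: %lo", flags) == 1) {
            result = 0;
            break;
        }
    }
    fclose(fp);
    return result;
}

/* Returns 1 if fd was opened for writing, 0 if not, -1 on failure. */
int fd_is_writable(int fd)
{
    unsigned long flags;

    if (fdinfo_flags(fd, &flags) != 0)
        return -1;
    int mode = (int)(flags & O_ACCMODE);
    return mode == O_WRONLY || mode == O_RDWR;
}
```

The flags value uses the same bits as the flags argument of open, so it can be tested against constants such as O_ACCMODE or O_NONBLOCK.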
What is the role of the O_ASYNC flag?
O_ASYNC enables what is called signal-driven I/O. It is available only for certain file types, such as terminals, FIFOs, and sockets. When an I/O operation becomes possible on the file descriptor (the data is ready in the kernel buffer), the kernel sends a signal to notify the process, which can then read the data (copying it from the kernel buffer to a user buffer). On Linux, specifying O_ASYNC when calling open has no effect; it must be enabled by calling fcntl with F_SETFL, as in the following code:
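int fd_flags;
fcntl(0, F_SETOWN, getpid());             /* make this process the recipient of the signal */
fd_flags = fcntl(0, F_GETFL);             /* fetch the current status flags */
fcntl(0, F_SETFL, (fd_flags | O_ASYNC));  /* turn on signal-driven I/O */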
This enables O_ASYNC on standard input, so that user input causes the kernel to send a signal (SIGIO by default) to the process.
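A slightly fuller sketch of the setup is shown below: it installs a SIGIO handler, makes the calling process the owner of the descriptor, and then turns on O_ASYNC. The names enable_sigio, on_sigio, and sigio_seen are hypothetical, and _GNU_SOURCE is defined only to make sure O_ASYNC is visible.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Set by the handler; the main program can poll it. */
static volatile sig_atomic_t sigio_seen = 0;

static void on_sigio(int sig)
{
    (void)sig;
    sigio_seen = 1;
}

/* Enable signal-driven I/O on fd. Returns 0 on success, -1 on error. */
int enable_sigio(int fd)
{
    struct sigaction sa;

    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;

    /* The kernel sends SIGIO to the owner of the descriptor. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}
```

The handler only records that the signal arrived; the actual read is better done in the main flow of the program, because few functions are safe to call from a signal handler.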
What is special about combining O_EXCL and O_CREAT?
The open man page explains that O_EXCL, used together with O_CREAT, ensures that the caller is the one that creates the file: if the file already exists, open fails with an error (EEXIST). The most important property of this flag is that checking whether the file exists and creating it are performed as a single atomic step. First, let us look at how a file might be created with O_CREAT alone.
if (access(filename, F_OK)) {
    /* The file does not exist: create it. */
    open(filename, O_CREAT, 0666);
} else {
    /* The file already exists: report an error. */
}
The code above shows how, without O_EXCL, a program might check whether a file exists and create it only if it does not. On NFS, or with multiple processes or threads, a race is possible: access reports that the file does not exist, but before open is called another process or thread creates it. The open call then succeeds, yet it opens a file that already exists instead of creating a new one. To avoid this you would have to lock around the two steps; alternatively, use O_EXCL together with O_CREAT, which makes the two steps atomic.
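The atomic variant might look like the following minimal sketch. The function name create_exclusive and the 0666 mode are illustrative choices.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create path only if it does not exist yet.
   Returns a descriptor when this call created the file.
   Returns -1 with errno set to EEXIST when the file already existed,
   or -1 with another errno value on other failures. */
int create_exclusive(const char *path)
{
    /* The existence check and the creation happen in one system call. */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}
```

When several processes race to create the same file, exactly one call returns a descriptor and the others fail with EEXIST, which is why this pattern is often used for lock files.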
What is a file hole and what does it do?
A file hole is created when the file offset is moved past the end of the file and an I/O operation is then performed there: the region between the old end of file and the new offset forms a so-called hole, which does not occupy disk space. Disk blocks are allocated only when data is later written into the hole. Virtual machine disk images are a typical use of such sparse storage: the user configures a fixed disk size, but the image does not occupy that much space up front; it consumes only as much as is actually used, up to the configured size.
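As an illustration, the sketch below creates a file that consists of a hole followed by a single byte of data. The function name make_sparse_file is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create path containing a hole of `hole` bytes followed by one data byte.
   Returns 0 on success, -1 on error. */
int make_sparse_file(const char *path, off_t hole)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* Seek past the end of the (empty) file, then write one byte there. */
    if (lseek(fd, hole, SEEK_SET) == (off_t)-1 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling stat on the result reports st_size as hole + 1, while st_blocks (counted in 512-byte units) is usually far smaller on filesystems that support holes; the exact block count depends on the filesystem.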
How do I allocate disk space?
Allocate disk space? Yes, that is not a mistake. We usually know how to allocate memory, but few people know how to allocate disk space, so why is such a concept needed? A write triggers the allocation of disk blocks; if the disk is full, or if blocks for a file hole cannot be allocated, the write fails. To guard against this we can preallocate the disk space: if the allocation fails, we simply do not start writing. Linux provides a dedicated system call for this:
#define _GNU_SOURCE             /* See feature_test_macros(7) */
#include <fcntl.h>

int fallocate(int fd, int mode, off_t offset, off_t len);
This system call allocates disk space for the file referred to by fd. The POSIX standard provides posix_fallocate for the same purpose, but fallocate is more efficient. The mode parameter determines whether disk space is allocated or released; its values include the following:
- FALLOC_FL_KEEP_SIZE: allocate space without changing the file size
- FALLOC_FL_PUNCH_HOLE: free space (must be combined with FALLOC_FL_KEEP_SIZE)
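The "reserve first, then write" idea can be sketched with the portable posix_fallocate. The function name write_with_reservation is hypothetical, and this is only one way to structure the check.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve len bytes for path before writing data into it.
   Returns 0 on success, -1 on error (errno is set). */
int write_with_reservation(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* posix_fallocate returns an error number instead of setting errno. */
    int err = posix_fallocate(fd, 0, (off_t)len);
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;          /* not enough space: do not start writing */
    }

    ssize_t written = write(fd, data, len);
    if (close(fd) == -1 || written != (ssize_t)len)
        return -1;
    return 0;
}
```

If the reservation fails, for example with ENOSPC, the function returns before any data is written, so the caller never ends up with a half-written file caused by a full disk.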
A deeper look at file I/O
How do I atomically append data to a file?
If the goal is merely to append data to a file, two approaches come to mind:
- Use lseek to move to the end of the file, and then write the data
- Open the file with the O_APPEND flag, and then write the data
If the append must be atomic, however, only the second approach works. The lseek approach is not atomic: if another process appends data to the file after our lseek returns, the offset that lseek gave us is no longer the end of the file, and writing at that position overwrites the other process's data. The root cause is that lseek followed by write is not an atomic operation. With the O_APPEND flag, every write atomically seeks to the end of the file and writes the data.
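A minimal sketch of the second approach follows. The function name append_record is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Append len bytes from rec to path.
   Returns 0 on success, -1 on error. */
int append_record(const char *path, const void *rec, size_t len)
{
    /* O_APPEND makes each write seek to the end of file and write
       as one atomic step, so no explicit lseek is needed. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, rec, len);
    if (close(fd) == -1 || written != (ssize_t)len)
        return -1;
    return 0;
}
```

Several processes can call this on the same log file without overwriting each other's records, provided each record is written with a single write call.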
How do I atomically read data from a specific position in a file?
The obvious approach is to lseek to the position and then read, but unfortunately this is not atomic: processes (or threads) that share an open file also share its file offset, so the offset may be changed by someone else between the lseek and the read. For this reason Linux provides system calls that perform the positioning and the I/O atomically. Their prototypes are as follows:
#include <unistd.h>

ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
A call to pread is roughly equivalent to the following:
off_t orig;
orig = lseek(fd, 0, SEEK_CUR);    /* save the current offset */
lseek(fd, offset, SEEK_SET);
read(fd, buf, len);
lseek(fd, orig, SEEK_SET);        /* restore the original offset */
pread behaves like this sequence of operations, except that it performs them atomically, and the file offset is left unchanged when it completes. Besides atomicity, a single pread system call also costs less than the several system calls it replaces.
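The sketch below wraps pread in a loop that reads an exact number of bytes from a given position. The function name read_at is hypothetical, and for brevity it treats a premature end of file as an error.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes starting at offset, without moving the
   file offset of fd. Returns 0 on success, -1 on error or early EOF. */
int read_at(int fd, void *buf, size_t len, off_t offset)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = pread(fd, p, len, offset);
        if (n <= 0)
            return -1;      /* error, or end of file before len bytes */
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Because the descriptor's own offset is never touched, several threads can call read_at on the same descriptor concurrently without interfering with each other.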
What is Scatter-gather I/O?
The same term also exists inside the Linux kernel, but that is not the concept discussed here. In practice we often need to write data that is scattered across several places in memory to a file or a network interface, or to read from a file descriptor and place the data, in order, into several separate buffers. This need arises frequently in network programming: protocols usually distinguish a header from a body, and the two are often stored in different places, so sending one message would take several write calls. With scatter-gather I/O a single operation suffices, which avoids the overhead of multiple system calls. Linux provides the following system calls for this:
#include <sys/uio.h>

ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
ssize_t writev(int fd, const struct iovec *iov, int iovcnt);

struct iovec {
    void  *iov_base;    /* Starting address */
    size_t iov_len;     /* Number of bytes to transfer */
};
struct iovec describes the location and length of each buffer to be written or read. Linux also provides preadv/pwritev, which combine scatter-gather I/O with atomic positioning.
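The header-and-body case might be sketched as follows. The function name send_message is hypothetical, and both parts are assumed to be NUL-terminated strings purely to keep the example short.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a body that live in separate buffers with a
   single system call. Returns the number of bytes written, or -1. */
ssize_t send_message(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    /* writev only reads from the buffers, so dropping const is safe. */
    iov[0].iov_base = (void *)header;
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}
```

The buffers are written in array order, so the receiver sees the header immediately followed by the body, exactly as if they had been copied into one contiguous buffer first.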
What is non-blocking I/O?
When we read and write ordinary files with read/write, the calls never seem to block. That is because the kernel buffer cache keeps I/O on regular files from blocking in the usual sense, so the nonblocking flag is generally ignored when opening regular files. Devices, pipes, and sockets, however, do block: a read from standard input blocks until the user types something, and a read from a socket blocks until the peer sends data. Sometimes we do not want a read to block; we want to be told immediately that no data is available, so that we can do other work instead of waiting. This is the purpose of nonblocking I/O: set the O_NONBLOCK flag on the descriptor with fcntl, and a read issued when the kernel has no data ready returns immediately with the error EAGAIN or EWOULDBLOCK.
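As a small sketch, the helpers below switch a descriptor to nonblocking mode and distinguish "no data yet" from real errors. The names set_nonblocking and try_read, and the -2 return convention, are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Add O_NONBLOCK to the status flags of fd. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns the number of bytes read, 0 at end of file,
   -2 if no data is available right now, -1 on other errors. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

A caller that receives -2 can go on with other work and retry later, typically after an I/O multiplexing call reports that the descriptor is readable.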
How do I create a temporary file?
Some programs need temporary files that are used only while the program runs and are deleted when it terminates. Linux provides a family of functions for creating temporary files; their prototypes are as follows:
#include <stdlib.h>

int mkstemp(char *template);

#include <stdio.h>

FILE *tmpfile(void);
mkstemp requires the caller to supply a file name template whose last six characters must be XXXXXX; these six characters are replaced to make the file name unique. Note that the generated file name is passed back through the template, so the template must be a modifiable character array, not a string constant. The return value of mkstemp is a file descriptor for the temporary file. tmpfile also creates a temporary file, but it returns a buffered stdio stream (FILE *).
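A common pattern, sketched below, is to unlink the file right after mkstemp so that it disappears automatically once the descriptor is closed. The function name open_anonymous_temp is hypothetical.

```c
#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file from tmpl (which must end in "XXXXXX" and be
   a writable array), then remove its name immediately.
   Returns the open descriptor, or -1 on error. */
int open_anonymous_temp(char *tmpl)
{
    int fd = mkstemp(tmpl);     /* tmpl now holds the generated name */
    if (fd == -1)
        return -1;

    /* The name goes away now; the file itself is removed
       when the last descriptor referring to it is closed. */
    unlink(tmpl);
    return fd;
}
```

A caller would pass an array such as char tmpl[] = "/tmp/demoXXXXXX"; a string literal would not work because mkstemp writes into the template.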