[APUE] Standard I/O Library (Part 2)
I. Efficiency of standard I/O
Compare the user CPU time, system CPU time, and clock time of the following four programs:
Program 1: system I/O (read/write) version
Program 2: standard I/O getc version
Program 3: standard I/O fgets version
Result:
[Note: This table is excerpted from APUE. In the table, "best time from Table 3-1" refers to program 1, and "single-byte time from Table 3-1" refers to program 1 run with BUFSIZE set to 1. The fgetc/fputc version of the program is not shown here.]
For each of the three standard I/O versions, the user CPU time is greater than that of the best read version. The character-at-a-time versions execute a loop 1.5 million times and the line-at-a-time version executes a loop 30,000 times, whereas the loop in the read version executes only 180 times. Because the system CPU times are the same, the difference in user CPU time accounts for the difference in clock time. The system CPU times are the same because all of these programs make the same number of read and write requests to the kernel.
The last column in the table is the number of bytes of text space (the machine instructions generated by the C compiler) for each main function. The getc/putc version expands its macros inline in the text space, so it needs more instructions than the version that calls the fgetc and fputc functions. Judging by user CPU time, there is little difference between the getc/putc version and the fgetc/fputc version in this test.
The line-at-a-time version is about twice as fast as the character-at-a-time versions, in both user CPU time and clock time. If fgets and fputs were implemented using getc and putc, we would expect the fgets version to take about as long as the getc version. In fact, we would expect the line-at-a-time version to be slower, because it would add 3 million macro invocations to the existing 60,000 function calls. In this test, however, the line-at-a-time functions are implemented using memccpy, and for efficiency memccpy is often written in assembly language rather than C.
[Key] The fgetc version is much faster than the BUFSIZE = 1 version of program 1. Both make about 3 million function calls. The large speed gap arises because each of the 3 million function calls in program 1 is also a system call, whereas the 3 million function calls in the fgetc version result in only 360 system calls. A system call is much more expensive than an ordinary function call.
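Since the program listings are not reproduced in this post, here is a minimal sketch of the three copy loops being compared. It is not the APUE source: the function names and the BUFSIZE and MAXLINE values are assumptions chosen for illustration, and error handling is reduced to a return code.

```c
#include <stdio.h>
#include <unistd.h>

#define BUFSIZE 8192   /* assumed buffer size for the read/write version */
#define MAXLINE 4096   /* assumed maximum line length for the fgets version */

/* Program 1 style: each loop iteration costs one read and one write system call. */
int copy_read_write(int in_fd, int out_fd)
{
    char buf[BUFSIZE];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
        if (write(out_fd, buf, (size_t)n) != n)
            return -1;
    }
    return n < 0 ? -1 : 0;
}

/* Program 2 style: one loop iteration per character, buffered by the library. */
int copy_getc_putc(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Program 3 style: one loop iteration per line (or per MAXLINE-1 characters). */
int copy_fgets_fputs(FILE *in, FILE *out)
{
    char line[MAXLINE];

    while (fgets(line, sizeof(line), in) != NULL) {
        if (fputs(line, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Calling copy_getc_putc(stdin, stdout) or copy_read_write(STDIN_FILENO, STDOUT_FILENO) from main and running the program under time(1) is one way to reproduce this kind of measurement.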
II. Binary I/O
To read or write binary data we could use getc or putc, but we would have to loop through the entire structure one byte at a time. We cannot use fputs and fgets: fputs stops writing when it reaches a null byte, and a structure may contain null bytes; likewise, fgets does not work correctly when the input contains null bytes or newlines. For this reason, the following two functions are provided for binary I/O.
#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nobj, FILE *fp);
size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *fp);
Return value: number of objects read or written
Common usage:
- Read or write a binary array. For example, to write elements 2 through 5 of a floating-point array to a file:
    float data[10];

    if (fwrite(&data[2], sizeof(float), 4, fp) != 4) {
        fprintf(stderr, "fwrite error\n");
    }
  Here, size is specified as the size of each array element, and nobj as the number of elements to write.
- Read or write a structure. For example:
    struct {
        short count;
        long  total;
        char  name[NAMESIZE];
    } item;

    if (fwrite(&item, sizeof(item), 1, fp) != 1) {
        fprintf(stderr, "fwrite error\n");
    }
When reading, fread can return fewer than nobj objects if an error occurs or the end of file is reached; in that case, call ferror or feof to tell which happened. When writing, a return value less than nobj means an error occurred.
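The short-count check for fread can be sketched as follows. The enum and the function name read_floats are hypothetical names introduced only for this illustration.

```c
#include <stdio.h>

enum read_status { READ_OK, READ_EOF, READ_ERROR };

/*
 * Try to read nobj floats from fp into dst.
 * *nread receives the number of complete objects actually read.
 */
enum read_status read_floats(FILE *fp, float *dst, size_t nobj, size_t *nread)
{
    *nread = fread(dst, sizeof(float), nobj, fp);
    if (*nread == nobj)
        return READ_OK;
    if (ferror(fp))
        return READ_ERROR;   /* short count caused by an I/O error */
    return READ_EOF;         /* short count caused by end of file */
}
```

A caller that gets READ_EOF can still use the first *nread elements of dst; only READ_ERROR indicates that the stream itself failed.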
A fundamental limitation of binary I/O is that it can reliably be used only to read data that was written on the same system. Heterogeneous systems are now commonly connected by networks, and it is common to read data that was written on another system. In that situation these two functions may not work, for two reasons:
- The offset of a member within a structure can differ between compilers and systems because of different alignment requirements. Some compilers have an option that either packs structures tightly (to save space, possibly at a cost in run-time performance) or aligns them accurately (to make run-time access to each member fast). This means that even on a single system, the binary layout of a structure can differ depending on compiler options.
- The binary formats used to store multibyte integers and floating-point values differ among machine architectures.
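One common way around the byte-order problem, which is not taken from the text above, is to write each integer one byte at a time in an agreed order instead of writing its in-memory representation. The sketch below assumes a big-endian on-disk format; write_u32_be and read_u32_be are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Write v as four bytes, most significant byte first, regardless of host byte order. */
int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char b[4];

    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    return fwrite(b, 1, sizeof(b), fp) == sizeof(b) ? 0 : -1;
}

/* Read four bytes written by write_u32_be and rebuild the value. */
int read_u32_be(FILE *fp, uint32_t *v)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof(b), fp) != sizeof(b))
        return -1;
    *v = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
         ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

Writing a structure member by member this way also sidesteps the padding problem, because no compiler-chosen layout ever reaches the file.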
III. Positioning a Stream
A standard I/O stream can be positioned in two ways:
- ftell and fseek. Both functions assume that a file's position can be stored in a long integer.
- fgetpos and fsetpos. These two functions were introduced by ANSI C, together with a new abstract data type, fpos_t, that records a file's position. On non-UNIX systems, this data type can be made as large as necessary to record a file's position.
Programs that need to be ported to non-UNIX systems should use fgetpos and fsetpos.
#include <stdio.h>
long ftell(FILE *fp);
Return value: current file position (byte offset from the beginning of the file) if OK, -1L on error
int fseek(FILE *fp, long offset, int whence);
Return value: 0 if OK, nonzero on error
void rewind(FILE *fp);
For a binary file, the file position indicator is measured in bytes from the beginning of the file. The value returned by ftell for a binary file is this byte position. To position a binary file using fseek, we must specify a byte offset and how that offset is interpreted. The values for whence are the same as for the lseek function: SEEK_SET means from the beginning of the file, SEEK_CUR means from the current file position, and SEEK_END means from the end of the file.
For text files, the current position may not be measurable as a simple byte offset, because on non-UNIX systems text files may be stored in a different format. To position a text file, whence must be SEEK_SET, and only two values of offset are allowed: 0 (meaning rewind the file to its beginning) or a value that was returned by ftell for that file. A stream can also be set to the beginning of the file with the rewind function.
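As an illustration of byte offsets on a binary file, the following sketch finds the size of a seekable stream and then restores the original position. The function name stream_size is hypothetical, and the sketch assumes a UNIX-like system where seeking to SEEK_END on a regular file is meaningful.

```c
#include <stdio.h>

/* Return the size in bytes of a seekable stream, or -1L on error.
 * The stream's position is restored before returning. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);          /* remember where we were */
    long size;

    if (saved == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);                /* offset of the end == size in bytes */
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1L;
    return size;
}
```

Because both values travel through a long, this approach inherits the limitation mentioned above: it cannot describe a file whose size does not fit in a long.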
#include <stdio.h>
int fgetpos(FILE *fp, fpos_t *pos);
int fsetpos(FILE *fp, const fpos_t *pos);
Return value: 0 if OK, nonzero on error
fgetpos stores the current value of the file's position indicator in the object pointed to by pos. This value can be used in a later call to fsetpos to reposition the stream to that location.
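A small sketch of the save-and-restore pattern: read the next line, then put the stream back where it was so the same line can be read again. The helper name peek_line is an assumption made for this example.

```c
#include <stdio.h>

/* Copy the next line of fp into buf without consuming it.
 * Returns 0 if a line was available, -1 at end of file or on error. */
int peek_line(FILE *fp, char *buf, int size)
{
    fpos_t pos;
    int got_line;

    if (fgetpos(fp, &pos) != 0)      /* save the current position */
        return -1;
    got_line = (fgets(buf, size, fp) != NULL);
    if (fsetpos(fp, &pos) != 0)      /* go back to the saved position */
        return -1;
    return got_line ? 0 : -1;
}
```

The fpos_t object is treated as opaque: it is only ever filled in by fgetpos and handed back to fsetpos on the same stream.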
IV. Formatted I/O
1. Formatted output
#include <stdio.h>
int printf(const char *format, ...);
Return value: number of characters output if OK, negative value on output error
int fprintf(FILE *fp, const char *format, ...);
Return value: number of characters output if OK, negative value on output error
int sprintf(char *buf, const char *format, ...);
Return value: number of characters stored in the array
sprintf places the formatted characters in the array buf. It automatically appends a null byte at the end of the array, but this byte is not included in the return value. Note that sprintf can overflow the buffer pointed to by buf; it is the caller's responsibility to make sure the buffer is large enough.
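Where C99 is available, snprintf is a common way to avoid the overflow risk of sprintf, because it takes the buffer size as an argument. snprintf is not covered in the text above; the sketch below is an illustration only, and format_label is a hypothetical function name.

```c
#include <stdio.h>

/* Format "<name>-<id>" into buf, which holds size bytes.
 * Returns 0 if the whole string fit, -1 if it was truncated or formatting failed. */
int format_label(char *buf, size_t size, const char *name, int id)
{
    /* snprintf writes at most size-1 characters plus the terminating null byte
     * and returns the length the full string would have had. */
    int n = snprintf(buf, size, "%s-%d", name, id);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Even in the truncated case the buffer is still null terminated (for size greater than 0), so the caller never reads past the end of buf.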
The following three variants of the printf family are similar to the previous three, except that the variable argument list (...) is replaced by arg.
#include <stdarg.h>
#include <stdio.h>
int vprintf(const char *format, va_list arg);
int vfprintf(FILE *fp, const char *format, va_list arg);
Both return: number of characters output if OK, negative value on output error
int vsprintf(char *buf, const char *format, va_list arg);
Return value: number of characters stored in the array
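The v variants are mainly useful for writing your own printf-like wrappers. The following sketch shows the usual pattern with va_start, vfprintf, and va_end; the function tagged_printf and its tag parameter are made up for this example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Print "[tag] " followed by the formatted message.
 * Returns the number of characters vfprintf wrote for the message itself,
 * or a negative value on output error. */
int tagged_printf(FILE *fp, const char *tag, const char *format, ...)
{
    va_list arg;
    int n;

    if (fprintf(fp, "[%s] ", tag) < 0)
        return -1;
    va_start(arg, format);            /* arg now refers to the "..." arguments */
    n = vfprintf(fp, format, arg);    /* hand the whole list to the v variant */
    va_end(arg);
    return n;
}
```

For example, tagged_printf(stderr, "warn", "%d items\n", 3) writes "[warn] 3 items" followed by a newline.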
2. Formatted input
The three scanf functions:
#include <stdio.h>
int scanf(const char *format, ...);
int fscanf(FILE *fp, const char *format, ...);
int sscanf(const char *buf, const char *format, ...);
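The scanf functions return the number of input items successfully assigned, or EOF if the input ends before any conversion, so the return value should be compared against the number of conversions expected. A minimal sketch with sscanf, using a hypothetical parse_point helper and an assumed "x,y" input format:

```c
#include <stdio.h>

/* Parse text of the form "x,y" into two ints.
 * Returns 0 on success, -1 if both values could not be converted. */
int parse_point(const char *text, int *x, int *y)
{
    /* Two conversions are requested, so success means a return value of 2. */
    return sscanf(text, "%d,%d", x, y) == 2 ? 0 : -1;
}
```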
V. Implementation Details
On UNIX systems, the standard I/O library ultimately calls the system I/O routines. Each standard I/O stream has an associated file descriptor, which can be obtained by calling fileno.
#include <stdio.h>
int fileno(FILE *fp);
Return value: the file descriptor associated with the stream
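fileno is needed whenever a descriptor-based function has to be applied to a stream. As a sketch, the hypothetical helper below passes the descriptor to fstat to check whether a stream refers to a regular file.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes fileno() under strict ISO C modes */
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if fp refers to a regular file, 0 if not, -1 on error. */
int stream_is_regular_file(FILE *fp)
{
    struct stat st;
    int fd = fileno(fp);             /* descriptor behind the stream */

    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```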
To understand how standard I/O is implemented on your system, the best place to start is the <stdio.h> header file.
[Note: The following example is not provided here]
VI. Temporary Files
The standard I/O library provides two functions to assist in creating temporary files.
#include <stdio.h>
char *tmpnam(char *ptr);
Return value: pointer to a unique pathname
FILE *tmpfile(void);
Return value: file pointer if OK, NULL on error
tmpnam generates a string that is a valid pathname and that does not match the name of any existing file. (The "existing file" here has nothing to do with ptr; the function's job is to generate a unique name.) Each call generates a different pathname, up to TMP_MAX times. TMP_MAX is defined in <stdio.h>.
If ptr is NULL, the generated pathname is stored in a static area, and a pointer to this area is returned as the value of the function. Subsequent calls to tmpnam can overwrite this static area. (This means that if we call the function more than once and want to keep the pathname, we must save a copy of the pathname, not a copy of the pointer.) If ptr is not NULL, it is assumed to point to an array of at least L_tmpnam characters. (The constant L_tmpnam is defined in <stdio.h>.) The generated pathname is stored in this array, and ptr is also returned as the value of the function.
tmpfile creates a temporary binary file (type wb+) that is automatically removed when it is closed or when the program terminates.
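A minimal sketch of using tmpfile as scratch storage: write a line, rewind, and read it back. The function name scratch_roundtrip is hypothetical, and the sketch handles only a single line of text.

```c
#include <stdio.h>

/* Write text to a temporary file, then read the first line back into out.
 * Returns 0 on success, -1 on failure. */
int scratch_roundtrip(const char *text, char *out, int size)
{
    FILE *fp = tmpfile();            /* opened for update; no name to clean up */
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fputs(text, fp) != EOF);
    rewind(fp);                      /* flush the output and go back to the start */
    ok = ok && (fgets(out, size, fp) != NULL);
    fclose(fp);                      /* closing the stream removes the file */
    return ok ? 0 : -1;
}
```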
tempnam is a variant of tmpnam that lets the caller specify both the directory and a prefix for the generated pathname.
#include <stdio.h>
char *tempnam(const char *directory, const char *prefix);
Return value: pointer to a unique pathname
There are four possible choices for the directory, and the first one that applies is used:
(1) If the environment variable TMPDIR is defined, it is used as the directory.
(2) If the directory argument is not NULL, it is used as the directory.
(3) The string P_tmpdir in <stdio.h> is used as the directory.
(4) A local directory, usually /tmp, is used as the directory.
If prefix is not NULL, it should be a string of up to five characters to be used as the first characters of the filename.
This function calls malloc to allocate dynamic storage for the constructed pathname. When the pathname is no longer needed, this storage can be released by calling free.
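The ownership rule for tempnam can be sketched as follows: the caller copies the name it needs and then frees the storage tempnam allocated. The wrapper temp_path and its parameters are assumptions for illustration. Note that on some systems the linker warns that tempnam is unsafe, because another process can create a file with the same name between generating the name and opening it.

```c
#define _XOPEN_SOURCE 700  /* exposes tempnam() under strict ISO C modes */
#include <stdio.h>
#include <stdlib.h>

/* Generate a temporary pathname and copy it into out (size bytes).
 * Returns 0 on success, -1 if no name could be generated or it did not fit. */
int temp_path(const char *directory, const char *prefix, char *out, size_t size)
{
    char *path = tempnam(directory, prefix);   /* storage comes from malloc */
    int n;

    if (path == NULL)
        return -1;
    n = snprintf(out, size, "%s", path);
    free(path);                                /* the caller must release it */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```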