Advanced Programming in the UNIX Environment (6): The Standard I/O Library

Last Update:2015-02-28 Source: Internet

Author: User

The standard I/O library handles details such as buffer allocation and choosing an optimal buffer size, so that we don't have to worry about whether we picked the right block size.

This makes the library easy to use, but if we don't understand how it works underneath, we can still run into plenty of problems.

1. Streams and FILE Objects

In the previous chapter, all I/O centered on file descriptors: each open file corresponds to a file descriptor, and every operation on the file goes through that descriptor.

With the standard I/O library, the discussion centers on streams.

A brief introduction to streams:

When we open or create a file with the standard I/O library, we say that we have associated a stream with the file.
A stream can be used with single-byte or multibyte (wide) character sets. The stream's orientation determines whether the characters read and written are single byte or wide.
When a stream is created, it has no orientation. If a wide-character I/O function (declared in <wchar.h>) is used on it first, the stream becomes wide oriented. If a byte I/O function is used first, the stream becomes byte oriented.

Only two functions affect a stream's orientation directly:

freopen clears the stream's orientation;
fwide sets the orientation of a stream.

The fwide function declaration:

#include <stdio.h>
#include <wchar.h>
int fwide(FILE* fp, int mode);

Function return value:

A positive value if the stream is wide oriented;
A negative value if the stream is byte oriented;
0 if the stream has no orientation.

The mode argument determines what fwide does:

If mode is negative, fwide tries to make the stream byte oriented;
If mode is positive, fwide tries to make the stream wide oriented;
If mode is 0, fwide does not try to set the orientation, but still returns a value describing the stream's current orientation.

Note that fwide does not change the orientation of a stream that is already oriented.

When we open a stream, fopen returns a pointer to a FILE object. This object is normally a structure containing all the information the standard I/O library needs to manage the stream, including:

The file descriptor used for the actual I/O;
A pointer to the buffer used by the stream;
The size of the buffer;
The number of characters currently in the buffer;
An error flag;
And so on.

2. Buffering

The goal of buffering is to use the minimum number of read and write system calls.

The standard I/O library provides three types of buffering:

Fully buffered: actual I/O takes place when the standard I/O buffer is full. Files residing on disk are normally fully buffered. The buffer is usually obtained by calling malloc the first time I/O is performed on the stream;

Line buffered: actual I/O takes place when a newline character is encountered on input or output, so we can output one character at a time and know that actual I/O happens only when a complete line has been written. There are two caveats. First, the buffer size is fixed, so I/O may take place when the buffer fills, even if no newline has been written yet. Second, whenever input is requested from an unbuffered stream, or from a line-buffered stream that has to fetch data from the kernel, all line-buffered output streams are flushed.

Flush: writing out the contents of a standard I/O buffer. In the terminal driver, by contrast, flush means discarding the data already stored in a buffer.

Unbuffered: the standard I/O library does not buffer the characters. For example, if we write 15 characters with fputs, we expect those 15 characters to be output as soon as possible. Standard error is normally unbuffered, so that error messages appear as quickly as possible.

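ISO C requires the following buffering characteristics:

Standard input and standard output are fully buffered if and only if they do not refer to an interactive device;
Standard error is never fully buffered.

The standard leaves several cases unspecified. In general, most implementations default to the following:

Standard error is always unbuffered;
All other streams are line buffered if they refer to a terminal device, and fully buffered otherwise.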

We can use the setbuf and setvbuf functions to change the buffering of a stream.

Function declaration:

#include <stdio.h>
void setbuf(FILE* restrict fp, char* restrict buf);
int setvbuf(FILE *restrict fp, char *restrict buf, int mode, size_t size);

Function return value (setvbuf only; setbuf returns nothing):

OK: 0;
Error: nonzero

These functions must be called after the stream has been opened and before any other operation is performed on it.

Function details:

setbuf turns buffering on or off. To enable buffering, buf must point to a buffer of length BUFSIZ (a constant defined in <stdio.h>); the stream is then normally fully buffered, although some systems use line buffering if the stream is associated with a terminal device. To disable buffering, set buf to NULL;

setvbuf lets us specify exactly which type of buffering we want through the mode argument: _IOFBF (fully buffered), _IOLBF (line buffered), or _IONBF (unbuffered). If the stream is made unbuffered, the buf and size arguments are ignored. Otherwise buf and size can name a buffer and its length; if buf is NULL, the library allocates a buffer of an appropriate size itself.

Generally, we should let the system choose the buffer size and allocate the buffer automatically. In that case the standard I/O library also frees the buffer when the stream is closed.

The fflush function

Function declaration:

#include <stdio.h>
int fflush(FILE *fp);

Function details:

fflush causes any data in the stream's buffer that has not yet been written to be passed to the kernel.

As a special case, if fp is NULL, all output streams are flushed.

3. Opening a Stream

The fopen, freopen, and fdopen functions open a standard I/O stream.

Function declaration:

#include <stdio.h>
FILE *fopen(const char *restrict pathname, const char* restrict type);
FILE *freopen(const char *restrict pathname, const char *restrict type, FILE *restrict fp);
FILE *fdopen(int fd, const char *type);

Function details:

fopen opens the specified file;
freopen opens the specified file on the specified stream. If the stream is already open, it is closed first. If the stream previously had an orientation, the orientation is cleared. freopen is typically used to open a file as one of the predefined streams: standard input, standard output, or standard error;
fdopen takes an existing file descriptor and associates a standard I/O stream with it. It is mainly used with descriptors for pipes and network connections. Because these special types of files cannot be opened with fopen, we first call the device-specific function to obtain a file descriptor and then associate it with a stream using fdopen.

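The type argument specifies how the stream is opened for reading and writing. ISO C defines 15 values, grouped below; the values within a group have the same effect:

r or rb: open for reading
w or wb: truncate to 0 length, or create, for writing
a or ab: append; open for writing at end of file, or create for writing
r+ or r+b or rb+: open for reading and writing
w+ or w+b or wb+: truncate to 0 length, or create, for reading and writing
a+ or a+b or ab+: open or create for reading and writing at end of file

Notes:

The character b lets the standard I/O system distinguish between text files and binary files. Because the UNIX kernel does not distinguish between the two, b has no effect on UNIX systems.
For fdopen the meaning of type is slightly different. Because the descriptor is already open, opening the stream for writing does not truncate the file to length 0.
Likewise, the append mode cannot create the file when used with fdopen, because the file must already exist for a descriptor to refer to it.
When a file is opened in append mode, every write takes place at the current end of file. If multiple processes open the same file in append mode, the data from each process is written correctly.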

When a stream is opened for both reading and writing (a + in the type), two restrictions apply:

Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind;
Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters end of file.

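The six ways to open a stream differ as follows:

File must already exist: r, r+
Previous contents of the file are discarded: w, w+
Stream can be read: r, r+, w+, a+
Stream can be written: w, a, r+, w+, a+
Stream can be written only at the end: a, a+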

Note that when a new file is created with type w or a, we cannot specify the file's permission bits as we can with the open or creat functions.

One workaround is to adjust our umask before opening the stream.

By default, a newly opened stream is fully buffered, unless it refers to a terminal device, in which case it is line buffered.

As mentioned earlier, once the stream is open and before we perform any other operation on it, we can change the buffering with setbuf or setvbuf.

Closing a stream

Function declaration:

#include <stdio.h>
int fclose(FILE* fp);

Function details. When a stream is closed:

Any buffered output data is flushed;
Any buffered input data is discarded;
If the standard I/O library automatically allocated a buffer for the stream, that buffer is released;
When a process terminates normally, all standard I/O streams with unwritten buffered data are flushed and all open streams are closed.

4. Reading and Writing a Stream

Once a stream is open, we can choose among three types of unformatted I/O:

Character-at-a-time I/O: read or write one character at a time
Line-at-a-time I/O: read or write one line at a time, using the fgets and fputs functions
Direct I/O: use the fread and fwrite functions to read or write a number of objects of a fixed size in each call.

Input functions

Function declaration:

#include <stdio.h>
int getc(FILE* fp);
int fgetc(FILE* fp);
int getchar(void);

Function return value:

OK: the next character
End of file or error: EOF

Function details:

getchar is equivalent to getc(stdin). The difference between getc and fgetc is that getc can be implemented as a macro, whereas fgetc is guaranteed to be a function;
All three functions return the next character as an unsigned char converted to an int. The character is treated as unsigned so that it never comes out negative, and the return type is int so that every possible character value can be returned along with an indication of an error or end of file;
The constant EOF is required to be a negative value, often -1. Because the functions return the same EOF value for an error and for end of file, the return value alone cannot tell us which of the two occurred.
To distinguish the two cases, we must call ferror or feof.

Function declaration:

#include <stdio.h>
int ferror(FILE* fp);
int feof(FILE* fp);    // Both return: nonzero(true) if condition is true, 0(false) otherwise
void clearerr(FILE* fp);

In most implementations, the FILE object maintains two flags:

An error flag
An end-of-file flag

Both flags are cleared by calling clearerr.

After reading from a stream, we can push characters back onto it by calling ungetc.

Function declaration:

#include <stdio.h>
int ungetc(int c, FILE* fp);

Function return value: c if OK, EOF on error

Function details: an implementation is only required to support a single character of pushback, so we should not count on more. EOF itself cannot be pushed back.

Use cases:

Pushback is typically used when reading an input stream and we need to peek at the next character to decide how to handle the current one.

Output functions

Each input function described above has a corresponding output function.

Function declaration:

#include <stdio.h>
int putc(int c, FILE* fp);
int fputc(int c, FILE* fp);
int putchar(int c);

5. Line-at-a-Time I/O

The fgets and gets functions provide line-at-a-time input.

Function declaration:

#include <stdio.h>
char *fgets(char* restrict buf, int n, FILE* restrict fp);
char *gets(char* buf);

Function details:

Both functions read a line into the buffer.
gets reads from standard input, while fgets reads from the specified stream.
With fgets we have to specify the size of the buffer, n. It reads up through and including the next newline, but never more than n-1 characters, and the buffer is always terminated with a null byte. If the line, including the newline, is longer than n-1 characters, only part of it is returned, still null terminated, and the next call to fgets continues with the rest of the line.
gets should never be used: it gives us no way to specify the buffer size, so a line longer than the buffer overflows it. It also does not store the newline in the buffer, whereas fgets does.

The fputs and puts functions provide line-at-a-time output.

Function declaration:

#include <stdio.h>
int fputs(const char* restrict str, FILE* restrict fp);
int puts(const char* str);

Function details:

fputs writes a null-terminated string to the specified stream. The terminating null byte is not written. The string does not have to end with a newline, so fputs is not strictly line-at-a-time output;
puts writes a null-terminated string to standard output, also without the null byte, and then writes a newline character;
puts is not unsafe in the way gets is, but we avoid it anyway so that we never have to remember whether a newline is appended. If we always use fgets and fputs, we know that we always have to deal with the newline character ourselves.

6. Standard I/O Efficiency

Test method:

Copy a fixed amount of data from standard input to standard output.

Metrics compared:

User CPU time
System CPU time
Clock time
Program text size

Code:

The version using getc and putc:

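#include "apue.h"   /* header from the book's source code; declares err_sys */

int
main(void)
{
    int c;   /* int rather than char, so that EOF can be told apart from data */

    /* Copy standard input to standard output one character at a time. */
    while ((c = getc(stdin)) != EOF)
        if (putc(c, stdout) == EOF)
            err_sys("output error");

    /* getc returns EOF for both end of file and error; check which it was. */
    if (ferror(stdin))
        err_sys("input error");

    exit(0);
}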

The version using fgets and fputs:

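#include "apue.h"   /* header from the book's source code; declares err_sys and MAXLINE */

int
main(void)
{
    char buf[MAXLINE];   /* holds one line, or one MAXLINE-1 character piece of a longer line */

    /* Copy standard input to standard output one line at a time. */
    while (fgets(buf, MAXLINE, stdin) != NULL)
        if (fputs(buf, stdout) == EOF)
            err_sys("output error");

    /* fgets returns NULL for both end of file and error; check which it was. */
    if (ferror(stdin))
        err_sys("input error");

    exit(0);
}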

Test data: a 95.8 MB file containing 3 million lines

Test results (compared with the figures from Chapter 3; that chapter was skipped in this series, so please look them up yourself):

Discussion of the results:

For both standard I/O versions, the user CPU time is larger than the best time of the read version. The character-at-a-time version executes one loop iteration per byte of input and the line-at-a-time version about 3 million iterations, whereas the best read version executes only 25,224 iterations;
The difference in clock time comes from the difference in user time and from the difference in time spent waiting for I/O to complete;
The system CPU time is about the same as for the read version, because roughly the same number of kernel requests is made. One advantage of the standard I/O routines is therefore that we don't have to worry about buffering or about choosing the optimal I/O size. We only have to pick the maximum line size for the fgets version;
The last column shows the program text size, that is, the number of bytes of machine instructions generated by the C compiler;
The line-at-a-time version is much faster than the character-at-a-time version, because fgets and fputs are implemented using memccpy, and memccpy is often written in assembly language rather than C for efficiency;
The fgetc version is much faster than the worst case of the read version (BUFFSIZE = 1). Both make about 200 million function calls, but in the read version, which has no buffering, every one of those calls is a system call. With fgetc, thanks to buffering, the same number of function calls results in only 25,224 system calls. A system call is far more expensive than an ordinary function call.

7. Summary

The discussion of the standard I/O library is split into two articles. This first article mainly covered:

The basic concepts of streams
The basic stream operations: opening, closing, reading, and writing
A comparison of the read/write efficiency of the standard I/O library

References:

Advanced Programming in the UNIX Environment, 3rd Edition

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
