Advanced Programming in the UNIX Environment: Standard I/O Library

Source: Internet
Author: User

printf is probably the first function most people learn in C, but do we really understand it? While studying Linux and network programming recently, I found I/O hard to follow, because I had never realized that UNIX I/O and the C standard library I/O functions are not the same thing. UNIX I/O, also called low-level or unbuffered I/O, is part of the operating system kernel and is invoked through system calls. The C standard I/O functions, also called buffered or high-level I/O, generally wrap those system calls to improve efficiency. In the past I often ran into stray-newline problems with getchar(); the cause was that I did not understand the buffer behind the standard I/O implementation. I found this article on the internet. It draws on Advanced Programming in the UNIX Environment and covers the topic in great detail.

The I/O functions covered in the previous article, "Advanced Programming in UNIX environment-file descriptor analysis", all operate on file descriptors. The standard I/O library instead operates on streams. When we open or create a file with the standard I/O library, we associate a stream with that file.

I. Streams and FILE objects

When a stream is opened, the standard I/O function fopen returns a pointer to a FILE object. This object is usually a structure that contains all the information the I/O library needs to manage the stream: the file descriptor used for the actual I/O, a pointer to the stream's buffer, the size of the buffer, the number of characters currently in the buffer, an error flag, and so on.

An application never needs to examine a FILE object. To refer to a stream, it passes the FILE pointer as an argument to each standard I/O function. Following the book Advanced Programming in the UNIX Environment, we call a pointer to a FILE object (type FILE *) a file pointer.

 

II. Standard I/O library buffering (important to understand)

The purpose of standard I/O buffering is to use as few read and write calls as possible, which speeds up file reads and writes. Unfortunately, buffering is also the most confusing part of the standard I/O library. Before describing the mechanism in detail, we first need to understand why a buffer improves the efficiency of file operations.

A user program calls standard I/O library functions to read and write files. These library functions pass the read and write requests to the kernel through system calls, and the kernel finally drives the disk or other device to complete the I/O. To speed this up, the standard I/O library allocates an I/O buffer for each open file, which can be reached through the file's FILE structure. Most read and write calls are satisfied from this buffer, and only a few requests have to go to the kernel.

Take fgetc/fputc as an example. The first time a user program calls fgetc to read a byte, fgetc may read, say, 1 KB into the I/O buffer with one system call, return the first byte of the buffer to the user, and advance the read/write position to the second byte. Later fgetc calls read directly from the I/O buffer without entering the kernel. Once the user has consumed all 1 KB, the next fgetc call reads another 1 KB into the buffer. The library reads ahead because the program will probably use that data soon, and because the I/O buffer lives in user space: fetching a byte from user space is much faster than entering the kernel to read it.

In the other direction, fputc usually just writes into the I/O buffer, so it can return quickly. When the buffer is full, fputc passes its contents to the kernel through a system call, and the kernel writes the data to the disk. Sometimes a program wants the data in the I/O buffer passed to the kernel immediately so that it can be written to the device. This is called a flush, and the corresponding library function is fflush. fclose also flushes the stream before closing the file.
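The effect of a flush can be observed from inside a program. The following is a minimal sketch, assuming a POSIX system (it uses fileno and fstat to ask the kernel for the file size); the names bytes_in_kernel and write_then_flush are hypothetical helpers invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() */
#include <stdio.h>
#include <sys/stat.h>

/* Number of bytes the kernel currently holds for the file behind fp,
 * or -1 if fstat() fails. Data still sitting in the stdio buffer is
 * not counted, because the kernel has not seen it yet. */
long bytes_in_kernel(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Writes text to fp and reports the file size seen by the kernel
 * before and after fflush(). Returns 0 on success, -1 on error. */
int write_then_flush(FILE *fp, const char *text, long *before, long *after)
{
    if (fputs(text, fp) == EOF)
        return -1;
    *before = bytes_in_kernel(fp);  /* short text is still in the buffer */
    if (fflush(fp) == EOF)
        return -1;
    *after = bytes_in_kernel(fp);   /* now the kernel has the bytes */
    return 0;
}
```

Called on a fully buffered regular file with a short string, the size reported before the flush should still be the old size, and the size after the flush should include the new bytes.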

Figure 1 uses fgets/fputs to illustrate the role of the I/O buffer. When calling fgets/fputs, the user program must also supply a buffer of its own (buf1 and buf2 in the figure). Take care to distinguish the user program's buffer from the C standard library's I/O buffer.

Figure 1: I/O buffers

The standard I/O library provides three types of buffering:

1) Fully buffered: data is written to the kernel only when the buffer is full. Regular files are normally fully buffered.

2) Line buffered: data is written to the kernel when the user program writes a newline, or when the buffer fills up. Standard input and standard output are normally line buffered when they refer to a terminal device.

Line buffering has two caveats:

First, the line buffer has a fixed length (the default is generally 1 KB), so if the buffer fills up, the I/O is performed even though no newline has been written yet. The examples below show this.

Second, whenever input is requested through the standard I/O library from (a) an unbuffered stream or (b) a line-buffered stream that has to fetch data from the kernel, all line-buffered output streams are flushed.

Example 01.c

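#include <stdio.h>

int main(void)
{
    printf("Hello World");   /* no newline: the text stays in the line buffer */
    while (1)
        ;                    /* spin forever, so the buffer is never flushed */
    return 0;
}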

When this program is compiled and run, nothing appears on the terminal. If while (1) is removed, Hello World is printed on the terminal, because the buffer is flushed when the program exits.

 

Example 02.c

#include <stdio.h>

int main(void)
{
    printf("Hello world\n"); /* the newline flushes the line buffer */
    while (1)
        ;
    return 0;
}

When this program is compiled and run, the terminal prints Hello world.

 

Example 03.c

#include <stdio.h>

int main(void)
{
    printf("Hello world... Hello World"); /* ... stands for 1024-11*2 bytes */
    while (1)
        ;
    return 0;
}

When this program is compiled and run, the terminal prints Hello world... Hello world. These three examples show that the line buffer has a fixed length: the I/O is performed when a newline is written to the buffer or when the data exceeds the buffer length.

3) Unbuffered: every time the user program calls a library function to write, the data is passed to the kernel through a system call. Standard error is normally unbuffered so that error messages produced by the program reach the device as soon as possible.

If we do not like these defaults for a given stream, we can change the buffering by calling one of the following two functions.

----------------------------------------------------------------

void setbuf(FILE *fp, char *buf);

int setvbuf(FILE *fp, char *buf, int mode, size_t size);

Return value: setvbuf returns 0 on success and a nonzero value on error; setbuf returns no value.

----------------------------------------------------------------

Figure 2 summarizes the options of the setbuf and setvbuf functions. As it shows, setvbuf is the more powerful of the two.

Figure 2: Options of the setbuf and setvbuf functions
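The three buffering modes can be compared with setvbuf. The following is a minimal sketch, assuming a POSIX system (fileno and fstat report what has reached the kernel); bytes_written_through is a hypothetical helper, and setvbuf must be called before any other operation on the stream.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() */
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind fp as the kernel sees it, or -1 on error. */
static long kernel_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Sets the buffering mode of a freshly opened stream (_IOFBF, _IOLBF,
 * or _IONBF), writes text, and returns how many bytes reached the
 * kernel without an explicit fflush(). Returns -1 on error. */
long bytes_written_through(FILE *fp, int mode, const char *text)
{
    if (setvbuf(fp, NULL, mode, BUFSIZ) != 0)
        return -1;
    if (fputs(text, fp) == EOF)
        return -1;
    return kernel_size(fp);
}
```

With a short string, a fully buffered stream should report 0 bytes, a line-buffered stream should report the bytes up to and including the newline, and an unbuffered stream should report every byte written.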

 

III. Standard I/O library functions

1. Opening and closing streams

The following three functions open a standard I/O stream:

----------------------------------------------------------------

FILE *fopen(const char *pathname, const char *type);

FILE *freopen(const char *pathname, const char *type, FILE *fp);

FILE *fdopen(int filedes, const char *type);

Return value (all three functions): a file pointer on success, NULL on error.

----------------------------------------------------------------

The differences between the three functions are:

(1) fopen opens the file named by pathname.

(2) freopen opens the named file (pathname) on a specific stream (fp), closing the stream first if it is already open. This function is typically used to open a specified file as one of the predefined streams: standard input, standard output, or standard error.

(3) fdopen takes an existing file descriptor (which we might have obtained from open, dup, dup2, fcntl, or pipe) and associates a standard I/O stream with it.
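As an illustration of fdopen, the descriptors returned by pipe can be wrapped in streams so that fputs and fgets work on them. This is a minimal sketch, assuming a POSIX system; pipe_roundtrip is a hypothetical helper, and the line is assumed to be short enough to fit in the pipe.

```c
#define _POSIX_C_SOURCE 200809L  /* for pipe() and fdopen() */
#include <stdio.h>
#include <unistd.h>

/* Sends one line through a pipe using stdio streams layered on the
 * pipe's descriptors. Returns 0 on success, -1 on error. */
int pipe_roundtrip(const char *line, char *out, int outsize)
{
    int fds[2];
    FILE *rd, *wr;
    int ok;

    if (pipe(fds) != 0)
        return -1;
    rd = fdopen(fds[0], "r");
    wr = fdopen(fds[1], "w");
    if (rd == NULL || wr == NULL) {
        /* Close each descriptor through whichever layer owns it. */
        if (rd != NULL) fclose(rd); else close(fds[0]);
        if (wr != NULL) fclose(wr); else close(fds[1]);
        return -1;
    }
    /* fclose flushes the buffered line into the pipe and closes fds[1]. */
    ok = (fputs(line, wr) != EOF);
    if (fclose(wr) == EOF)
        ok = 0;
    if (ok && fgets(out, outsize, rd) == NULL)
        ok = 0;
    fclose(rd);
    return ok ? 0 : -1;
}
```

Note that closing the stream with fclose also closes the underlying descriptor, so the descriptors are not closed a second time with close.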

The following function closes a standard I/O stream:

----------------------------------------------------------------

int fclose(FILE *fp);

----------------------------------------------------------------

2. Reading and writing streams

1) Byte I/O functions

----------------------------------------------------------------

int getc(FILE *stream);

int fgetc(FILE *stream);

int getchar(void);

Return value: the byte read on success; EOF on error or at end of file.

----------------------------------------------------------------

The first and third of these, getc and getchar, may be implemented as macros on top of fgetc, whereas fgetc is always a function. For example:

#define getc(_stream) fgetc(_stream)

#define getchar() fgetc(stdin)

Because fgetc is a real function, it can be passed as an argument to another function.

On success fgetc returns one byte, which is naturally of type unsigned char. The function prototype, however, declares the return type as int, so the byte is converted to int before it is returned. Why int? On error or at end of file, fgetc returns EOF (typically -1), which stored in a 32-bit int is 0xffffffff. If the byte 0xff is read, converting it from unsigned char to int gives 0x000000ff. Only an int return type lets the two cases be told apart; if the return type were unsigned char, a return value of 0xff could be either EOF or the byte 0xff. For the same reason, the return value of fgetc must be saved in an int variable. If you write unsigned char c = fgetc(fp);, EOF and the byte 0xff can no longer be distinguished by the value of c. Note that the EOF returned by fgetc at end of file is only a return value signalling that the end has been reached. It does not mean that every file ends with an EOF byte (as the analysis above shows, EOF is not a byte at all).
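The point about the int return type can be illustrated with a short loop. This is a minimal sketch; count_bytes is a hypothetical helper name.

```c
#include <stdio.h>

/* Counts the bytes remaining in fp. The result of fgetc is kept in an
 * int so that the byte 0xff (value 255) is not mistaken for EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char or unsigned char */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

If c were declared as unsigned char, the comparison with EOF could never succeed and the loop would not terminate; with a signed char, a 0xff byte in the data would end the loop early.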

----------------------------------------------------------------

int putc(int c, FILE *stream);

int fputc(int c, FILE *stream);

int putchar(int c);

Return value: c on success, EOF on error.

----------------------------------------------------------------

Similarly, the first and third of these, putc and putchar, may be implemented as macros on top of fputc.

2) String-based I/O functions

----------------------------------------------------------------

char *fgets(char *s, int size, FILE *stream);

char *gets(char *s);

Return value: s on success; NULL on error or at end of file.

----------------------------------------------------------------

Both functions take the address of a buffer in which the string that is read is stored. gets reads from standard input, and fgets reads from the specified stream.

gets should not be used. It performs no bounds check on the buffer and exists only for compatibility with old programs; the code we write should never call it.

Now for fgets. The parameter s is the address of the buffer and size is its length. fgets reads one line ending in '\n' (including the '\n') from the file referred to by stream, stores it in the buffer s, and appends '\0' to form a complete string. If a line in the file is too long, fgets reads size-1 characters without reaching the '\n', stores them in the buffer followed by '\0', and leaves the rest of the line to be read by the next fgets call. If a call reaches the end of the file after reading some characters, fgets appends '\0' to what it has read and returns it; the next call returns NULL, which is how you detect end of file.

Note that for fgets '\n' is a special character but '\0' is not: a '\0' read from the file is treated as an ordinary character. If the file contains '\0' characters (0x00 bytes), then after fgets returns you cannot tell whether a '\0' in the buffer was read from the file or is the terminator added by fgets. fgets is therefore suitable only for text files, not binary files; a text file should consist of printable characters and must not contain '\0'. Use fread for binary files.
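The chunking behavior of fgets on long lines can be sketched as follows. The 8-byte buffer is deliberately tiny to force the long-line case, and count_lines is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Counts lines in fp using a deliberately small buffer. A long line
 * arrives in several fgets calls; only the chunk that ends in '\n'
 * (or the final chunk before end of file) completes a line. */
long count_lines(FILE *fp)
{
    char buf[8];
    long lines = 0;
    int in_line = 0;            /* nonzero while a line is unfinished */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);

        if (len > 0 && buf[len - 1] == '\n') {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;        /* buffer filled before the newline */
        }
    }
    return lines + in_line;     /* count a last line that lacks '\n' */
}
```

Checking whether the last character stored is '\n' is the usual way to tell a complete line from a truncated chunk.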

----------------------------------------------------------------

int fputs(const char *s, FILE *stream);

int puts(const char *s);

Return value: a nonnegative integer on success, EOF on error.

----------------------------------------------------------------

The string in the buffer s ends with '\0'. fputs writes the string to the stream but does not write the terminating '\0'. Unlike fgets, fputs does not care about '\n': the string may or may not contain one. puts writes the string s to standard output (without the terminating '\0') and then automatically writes a '\n'.

3) Binary I/O functions

String-based I/O functions are not suited to binary data. We could of course process a binary file with fgetc and fputc, but that means looping over the whole file one byte at a time, which is clearly inefficient. The standard I/O library therefore provides the following two functions for binary files:

----------------------------------------------------------------

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

Return value: the number of records read or written. On success this equals nmemb; on error or at end of file it is less than nmemb and may be 0.

----------------------------------------------------------------

The fundamental problem with binary I/O is that it can be used reliably only to read data written on the same system, for two reasons:

(1) The offset of a member within a structure can differ between compilers and systems because of different alignment requirements. Some compilers even have an option that lets structures be either packed tightly (saving space, possibly at some cost in performance) or aligned exactly so that members are easy to access at run time. So even on a single system, the binary layout of a structure can vary with the compiler options.

(2) The binary formats used to store multibyte integers and floating-point values can differ between machine architectures.
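Writing and reading an array of structures with fwrite and fread might look like the sketch below. struct point, save_points, and load_points are hypothetical names for this illustration, and, as explained above, the data is only portable to a program built the same way on the same kind of system.

```c
#include <stdio.h>

struct point {
    int x;
    int y;
};

/* Writes n records to fp. Returns 0 if all were written, -1 otherwise. */
int save_points(FILE *fp, const struct point *pts, size_t n)
{
    return fwrite(pts, sizeof pts[0], n, fp) == n ? 0 : -1;
}

/* Reads up to max records from fp and returns how many complete
 * records were read (fewer than max at end of file or on error). */
size_t load_points(FILE *fp, struct point *pts, size_t max)
{
    return fread(pts, sizeof pts[0], max, fp);
}
```

When load_points returns fewer records than requested, feof and ferror can be used to tell end of file from an error.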

3. Positioning a stream

There are two ways to position a standard I/O stream:

(1) ftell and fseek. These two functions have existed since Version 7, but both assume that a file position can be stored in a long integer.

(2) fgetpos and fsetpos. These two functions were introduced by ANSI C, together with a new abstract data type, fpos_t, that records a file position. On non-UNIX systems this type can be made as large as necessary to record a file position, so applications that are to be ported to non-UNIX systems should use fgetpos and fsetpos.

----------------------------------------------------------------

int fseek(FILE *stream, long offset, int whence);

Return value: 0 on success; -1 on error, with errno set.

long ftell(FILE *stream);

Return value: the current read/write position on success; -1 on error, with errno set.

void rewind(FILE *stream);

Moves the read/write position to the beginning of the file.

----------------------------------------------------------------

The whence and offset parameters of fseek together determine where the read/write position moves. The whence values have the following meanings:

SEEK_SET: move offset bytes from the beginning of the file

SEEK_CUR: move offset bytes from the current position

SEEK_END: move offset bytes from the end of the file

 

offset can be positive or negative. A negative value moves the position backward (toward the beginning of the file) and a positive value moves it forward (toward the end of the file). Moving back past the beginning of the file is an error. Moving forward past the end of the file is allowed: the file grows at the next write, and the bytes between the old end of file and the position set by fseek read as 0.
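A common use of fseek and ftell is finding the size of a file. The following is a minimal sketch, assuming a UNIX-like system where SEEK_END is meaningful for the stream; stream_size is a hypothetical helper name.

```c
#include <stdio.h>

/* Returns the size in bytes of the file behind fp, or -1 on error.
 * The original read/write position is restored before returning. */
long stream_size(FILE *fp)
{
    long saved, size;

    saved = ftell(fp);                  /* remember where we are */
    if (saved == -1L)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)   /* jump to the end of the file */
        return -1;
    size = ftell(fp);                   /* the offset of the end is the size */
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1;
    return size;
}
```

Seeking past the end and then writing extends the file, so a byte written 4 bytes beyond an 11-byte file yields a 16-byte file.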

----------------------------------------------------------------

int fgetpos(FILE *fp, fpos_t *pos);

int fsetpos(FILE *fp, const fpos_t *pos);

Return value (both functions): 0 on success, nonzero on error.

----------------------------------------------------------------

fgetpos stores the current value of the file position indicator in the object pointed to by pos. A later call to fsetpos can use this value to reposition the stream to the same place.
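Saving a position with fgetpos and returning to it with fsetpos can be sketched as a "peek" operation. peek_byte is a hypothetical helper name, and the stream is assumed to be seekable.

```c
#include <stdio.h>

/* Returns the next byte of fp without consuming it, or EOF if there
 * is no next byte or the position cannot be saved or restored. */
int peek_byte(FILE *fp)
{
    fpos_t pos;
    int c;

    if (fgetpos(fp, &pos) != 0)     /* remember the current position */
        return EOF;
    c = fgetc(fp);
    if (fsetpos(fp, &pos) != 0)     /* go back to where we were */
        return EOF;
    return c;
}
```

Because fpos_t is opaque, the saved value should only be handed back to fsetpos on the same stream, never inspected or modified.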

 

4. Formatted I/O

Formatted output functions:

----------------------------------------------------------------

int printf(const char *format, ...);

int fprintf(FILE *stream, const char *format, ...);

int sprintf(char *str, const char *format, ...);

int snprintf(char *str, size_t size, const char *format, ...);

int vprintf(const char *format, va_list ap);

int vfprintf(FILE *stream, const char *format, va_list ap);

int vsprintf(char *str, const char *format, va_list ap);

int vsnprintf(char *str, size_t size, const char *format, va_list ap);

Return value: the number of bytes of formatted output (not counting the terminating '\0' of a string) on success; a negative value on error.

----------------------------------------------------------------
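Of the output functions above, snprintf is the one to prefer when writing into a fixed-size buffer, because its return value reveals truncation. The following is a minimal sketch; format_point is a hypothetical helper name.

```c
#include <stdio.h>

/* Formats "(x, y)" into buf. Returns 0 if the whole text fit,
 * -1 if it was truncated or snprintf reported an error. */
int format_point(char *buf, size_t size, int x, int y)
{
    int n = snprintf(buf, size, "(%d, %d)", x, y);

    /* n is the length the full text needs, excluding the '\0'. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Even when the text is truncated, snprintf still terminates the buffer with '\0' as long as size is greater than 0.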

Formatted input functions:

----------------------------------------------------------------

int scanf(const char *format, ...);

int fscanf(FILE *stream, const char *format, ...);

int sscanf(const char *str, const char *format, ...);

#include <stdarg.h>

int vscanf(const char *format, va_list ap);

int vsscanf(const char *str, const char *format, va_list ap);

int vfscanf(FILE *stream, const char *format, va_list ap);

Return value: the number of input items successfully matched and assigned, which can be fewer than the number of arguments supplied and is 0 if nothing matched. If an error occurs, or the end of the file or string is reached before anything is matched, EOF is returned, and errno is set on error.

----------------------------------------------------------------
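Checking the return value is the key to using the scanf family safely. The following is a minimal sketch with sscanf; parse_point and the "(x, y)" input format are assumptions made for this illustration.

```c
#include <stdio.h>

/* Parses text that starts with "(x, y". Returns 1 if both numbers
 * were matched and assigned, 0 otherwise (including empty input,
 * for which sscanf returns EOF). */
int parse_point(const char *s, int *x, int *y)
{
    /* A space in the format skips any amount of white space. */
    return sscanf(s, " (%d ,%d", x, y) == 2;
}
```

Comparing the result with the number of conversions expected, rather than testing for nonzero, catches both partial matches and the EOF case.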

One small printf trick: adding # after % makes the printed value carry a leading 0 (for octal) or 0x (for hexadecimal) automatically. For example, the statement printf("%#x", 1) prints 0x1.

 

5. Temporary files

Programs often need temporary files, for example to hold the intermediate results of a computation or to keep a backup before a critical operation.

The standard I/O library provides two functions to help create temporary files:

----------------------------------------------------------------

char *tmpnam(char *ptr);

Return value: a pointer to a unique pathname.

FILE *tmpfile(void);

Return value: a file pointer on success, NULL on error.

----------------------------------------------------------------

tmpnam returns a valid file name that differs from the name of any existing file. Each call generates a different name, up to TMP_MAX calls per process (TMP_MAX is defined in stdio.h). If ptr is not NULL, it is assumed to point to an array of at least L_tmpnam characters (also defined in stdio.h); the generated name is stored in that array and ptr is returned. If ptr is NULL, the generated name is stored in a static area that is overwritten by the next call.

tmpfile creates a temporary binary file (type wb+) that is removed automatically when it is closed or when the program terminates.

Note that tmpnam only generates a name; it does not create or open the file. If you use the name, create the file as soon as possible to reduce the risk that another program creates a file with the same name first. tmpfile, in contrast, both creates the file and opens it for reading and writing.
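Using tmpfile to hold an intermediate result might look like the sketch below. stash_and_reload is a hypothetical helper name; it reads back only the first line of what was stored.

```c
#include <stdio.h>

/* Stores text in an anonymous temporary file, then reads the first
 * line back into out (at most size-1 characters). Returns 0 on
 * success, -1 on error. */
int stash_and_reload(const char *text, char *out, int size)
{
    FILE *tmp = tmpfile();      /* created and opened for update (wb+) */
    int ok;

    if (tmp == NULL)
        return -1;
    ok = (fputs(text, tmp) != EOF);
    rewind(tmp);                /* switch from writing to reading */
    ok = ok && (fgets(out, size, tmp) != NULL);
    fclose(tmp);                /* the file is removed automatically */
    return ok ? 0 : -1;
}
```

The rewind between writing and reading matters: on an update stream, output must be followed by a flush or a positioning call before input is attempted.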

 

IV. References

1. Advanced Programming in the UNIX Environment

2. Linux programming (Third edition)

3. Standard I/O library functions
