Advanced Programming in the UNIX Environment: differences between standard I/O library buffers and kernel buffers

Source: Internet
Author: User
Tags: unix domain socket
1. I/O buffer of the C standard library
The UNIX tradition is that everything is a file. The keyboard, display, serial port, disk, and other devices each have a special device file in the /dev directory, and these device files can be opened, read, written, and closed just like regular files (files stored on disk) through the same function interfaces. User programs call C standard I/O library functions to read and write regular files or devices. These library functions pass the read and write requests to the kernel through system calls, and the kernel drives the disk or device to complete the I/O operation.

The C standard library allocates an I/O buffer for each open file to speed up reads and writes. The buffer can be reached through the file's FILE structure. Most of the time a read or write call is served from the I/O buffer, and only a few requests have to be passed on to the kernel. Take fgetc/fputc as an example. When a user program calls fgetc for the first time, fgetc may read 1 KB into the I/O buffer through a system call, return the first byte of the buffer to the user, and move the read/write position to the second byte of the buffer. Later fgetc calls read directly from the I/O buffer without entering the kernel. Once all 1 KB has been consumed, the next fgetc call enters the kernel again to read another 1 KB into the buffer. (The 1 KB figure is only an example; the actual buffer size depends on the implementation.)

In this scenario the relationship between the user program, the C standard library, and the kernel resembles that between the CPU, the cache, and main memory in the memory hierarchy. The C standard library reads ahead from the kernel because the user program is likely to use that data soon, and because the I/O buffer lives in user space, reading from it is much faster than entering the kernel to fetch data.

On the other hand, when a user program calls fputc, the byte is usually just written to the I/O buffer, so fputc returns quickly. When the I/O buffer is full, fputc passes its contents to the kernel through a system call, and the kernel eventually writes the data back to the disk or device. Sometimes the user program wants the data in the I/O buffer passed to the kernel immediately so that the kernel can write it back to the device or disk. This is called a flush operation, and the corresponding library function is fflush. fclose also flushes before closing the file.
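The read-ahead described above can be observed directly. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the function name and the scratch file path are hypothetical. It reads a single byte with fgetc and then asks the kernel how far the underlying descriptor has advanced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a scratch file of `size` bytes, read ONE byte with fgetc(), and
 * return the kernel-side offset of the underlying descriptor (-1 on error). */
long fd_offset_after_one_fgetc(size_t size)
{
    char fill[512];
    char path[] = "/tmp/stdio_readahead_XXXXXX"; /* hypothetical scratch path */

    assert(size > 0 && size <= sizeof fill);
    memset(fill, 'x', sizeof fill);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, fill, size) != (ssize_t)size) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        unlink(path);
        return -1;
    }

    long offset = -1;
    if (fgetc(fp) != EOF)   /* the program asks for a single byte */
        offset = (long)lseek(fileno(fp), 0, SEEK_CUR); /* how far did the kernel read? */

    fclose(fp);
    unlink(path);
    return offset;
}
```

On common implementations the returned offset for a 100-byte file is larger than 1, because the library filled its buffer with more than the one byte the program requested. How much is read ahead is implementation-defined.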

As we know, the startup code calls the main function as follows: exit(main(argc, argv));

When main returns, the startup code calls exit. The exit function first closes every FILE * stream that is still open (flushing each one before closing it) and then terminates the current process through the _exit system call.
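The difference between exit and _exit can be sketched as follows, assuming a POSIX system with a writable /tmp; size_after_child and the scratch path are hypothetical names. A child process writes through stdio without flushing and then ends in one of the two ways, and the parent checks how many bytes reached the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes "hello" through stdio WITHOUT flushing and then
 * ends with exit() (use_exit == 1) or _exit() (use_exit == 0).
 * Returns the resulting file size, or -1 on error. */
long size_after_child(int use_exit)
{
    char path[] = "/tmp/exit_demo_XXXXXX";   /* hypothetical scratch file */

    assert(use_exit == 0 || use_exit == 1);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    fflush(NULL);   /* keep the child from inheriting pending parent output */
    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        FILE *fp = fopen(path, "w");
        if (fp == NULL || fputs("hello", fp) == EOF)
            _exit(1);
        if (use_exit)
            exit(0);    /* flushes and closes every open stream first */
        _exit(0);       /* enters the kernel directly: buffered data is lost */
    }

    long size = -1;
    struct stat st;
    if (waitpid(pid, NULL, 0) == pid && stat(path, &st) == 0)
        size = (long)st.st_size;
    unlink(path);
    return size;
}
```

With exit the five bytes arrive in the file; with _exit the file stays empty, because the data never left the child's user-space buffer.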

The C standard library has three kinds of I/O buffering: fully buffered, line buffered, and unbuffered. When a user program calls a library function to write, each kind behaves differently.

Fully buffered: data is written back to the kernel when the buffer is full. Regular files are usually fully buffered.

Line buffered: if the data written by the user program contains a newline, the line is written back to the kernel; the data is also written back to the kernel when the buffer is full. Standard input and standard output are usually line buffered when they refer to a terminal device.

Unbuffered: every time the user program calls a library function to write, the data is written back to the kernel through a system call. Standard error is usually unbuffered, so that error messages produced by the user program reach the device as soon as possible.

Besides a full buffer and a newline, a line-buffered stream is also flushed automatically when:
the user program calls a library function to read from an unbuffered file,
or it reads from a line-buffered file and the read requires a system call to fetch data from the kernel.
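The three buffering modes can be compared with setvbuf, which selects the mode of a stream explicitly. This is a minimal sketch, assuming a POSIX system (fileno and fstat are used to ask the kernel for the file size); bytes_reaching_kernel is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Open an anonymous scratch file, select buffering mode `mode` (_IOFBF,
 * _IOLBF or _IONBF), write `text` without calling fflush, and return how many
 * bytes the kernel has received so far. Returns -1 on error. */
long bytes_reaching_kernel(int mode, const char *text)
{
    assert(text != NULL);
    assert(mode == _IOFBF || mode == _IOLBF || mode == _IONBF);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    /* setvbuf must come before any other operation on the stream. */
    if (setvbuf(fp, NULL, mode, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }
    fputs(text, fp);                 /* deliberately no fflush() */

    struct stat st;
    long seen = (fstat(fileno(fp), &st) == 0) ? (long)st.st_size : -1;
    fclose(fp);                      /* flushes whatever is still buffered */
    return seen;
}
```

On typical implementations, writing "ab\n" leaves 0 bytes in the file when fully buffered and 3 bytes when line buffered, while an unbuffered stream passes even "ab" to the kernel at once.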

If your program does not want to rely entirely on the automatic flush operation, you can call the fflush function to manually perform the flush operation.

#include <stdio.h>
int fflush(FILE *stream);
Return value: 0 on success; EOF on error, with errno set
fflush ensures that data has been written back to the kernel so that it is not lost if the process terminates abnormally, for example fflush(stdout). As a special case, fflush(NULL) flushes the I/O buffers of all open files.
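A small sketch of a manual flush with error checking follows, assuming a POSIX system; kernel_size and put_and_flush are hypothetical helper names introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind `fp` as the kernel currently sees it (-1 on error). */
long kernel_size(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    return (fstat(fileno(fp), &st) == 0) ? (long)st.st_size : -1;
}

/* Append `msg` to `fp` and hand it to the kernel at once.
 * Returns 0 on success, -1 if the write or the flush failed. */
int put_and_flush(FILE *fp, const char *msg)
{
    assert(fp != NULL && msg != NULL);
    if (fputs(msg, fp) == EOF)
        return -1;
    if (fflush(fp) == EOF) {     /* errno describes the failure */
        perror("fflush");
        return -1;
    }
    return 0;
}
```

For a fully buffered stream such as one returned by tmpfile(), kernel_size stays at 0 after a plain fputs of a few bytes and jumps to the full length once put_and_flush has been called.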

2. User program buffer
Consider a buffer allocated on a function's stack, such as char buf[10]; followed by strcpy(buf, str);. The string pointed to by str may be longer than 10 characters, in which case the copy writes past the end of buf. This out-of-bounds write may not fail immediately. Instead, a segmentation fault occurs when the function returns, because the write overwrote the return address saved in the stack frame and the function jumps to an invalid address on return. A block of memory such as buf, allocated by the caller and passed to a function to be read or written, is usually called a buffer, and writing past its end is called a buffer overflow. If the only result were a segmentation fault, it would not be so serious. What is more serious is that buffer overflow bugs are often exploited by malicious users: the overflow is crafted so that, on return, the function jumps to an address chosen in advance and executes instructions prepared in advance. A carefully designed attack can even start a shell and run arbitrary commands. If such a bug exists in a program running with root privileges and it is exploited, the consequences are very serious.
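One common way to avoid the strcpy overflow above is to pass the buffer size along with the buffer. This is a minimal sketch using snprintf; copy_bounded is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy `src` into `dst` without ever writing more than `dst_size` bytes.
 * Returns 0 if the whole string fit, -1 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size bytes, including the final '\0',
     * and returns the length the full string would have needed. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

With char buf[10], a 30-character source string is cut to 9 characters plus the terminating '\0', and the caller learns about the truncation from the return value instead of from a corrupted stack.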

fgets/fputs illustrate the role of the I/O buffer. When using fgets/fputs you also have to allocate buffers in your own program (buf1 and buf2 in the figure). Take care to distinguish the user program buffer from the I/O buffer of the C standard library.
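The two kinds of buffer can be seen side by side in a line-by-line copy. In this sketch, buf is the user program buffer, while the streams in and out each carry their own library I/O buffer behind the scenes; copy_lines is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `in` to `out` piece by piece. Returns the number of fgets() calls
 * that delivered data, or -1 on a read or write error. */
long copy_lines(FILE *in, FILE *out)
{
    char buf[64];      /* user program buffer, on this function's stack */
    long chunks = 0;

    assert(in != NULL && out != NULL);
    /* fgets stores at most sizeof buf - 1 characters plus '\0', so it cannot
     * overrun buf even when a line is longer than the buffer. */
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (fputs(buf, out) == EOF)
            return -1;
        chunks++;
    }
    return ferror(in) ? -1 : chunks;
}
```

A line longer than 63 characters simply arrives in several pieces, so the count is the number of pieces rather than the number of lines.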

3. Kernel buffer
1) Terminal buffering

The terminal device has an input queue and an output queue that serve as buffers.

Take the input queue as an example. Characters typed on the keyboard are filtered by the line discipline and placed in the input queue, and the user program reads them from the queue in FIFO order. Generally, when the input queue is full, further typed characters are lost and the system rings the terminal bell. The terminal can be configured in echo mode, in which every character in the input queue is sent both to the user program and to the output queue. This is why a character typed at the command line is both read by the program and displayed on the screen.

Note that the above describes a user process (the shell is one too) calling unbuffered I/O functions such as read/write. printf/scanf are also implemented on top of read/write, but they go through the I/O buffer of the C standard library. When a user program calls scanf to read keyboard input, the library calls read, and in the default canonical mode the terminal driver holds the typed characters in the kernel input queue until a newline is entered. read then returns the line, which is stored in the I/O buffer of the C standard library, and scanf takes its input from there. When printf is called to print a string that contains a newline, the contents of the I/O buffer are immediately passed to the kernel output queue with write and shown on the screen (standard output is line buffered when it refers to a terminal). If the string printed by printf contains no newline, it stays in the I/O buffer until the buffer fills, fflush is called, or, as discussed above, the program exits and the buffer is flushed.

 
2) Although the read and write system calls lie below the I/O buffer of the C standard library and are called unbuffered I/O functions, the kernel may still allocate a kernel I/O buffer underneath write. A write call therefore does not necessarily write straight to the file; it may only write to the kernel I/O buffer, and you can call fsync to synchronize the file to disk. Whether the data has reached the file or only the kernel buffer makes no difference to processes. If process A and process B open the same file, data that process A has written to the kernel I/O buffer can be read by process B, because the kernel's buffers are shared by all processes. The I/O buffer of the C standard library does not have this property, because the user space of each process is completely independent.
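The visibility difference can be sketched in a single program, assuming a POSIX system with a writable /tmp. A second descriptor opened on the same file stands in for "process B" here; visible_bytes, visibility_demo, and the scratch path are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* How many bytes can an independent descriptor read from `path`? */
static long visible_bytes(const char *path)
{
    char tmp[64];

    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, tmp, sizeof tmp);
    close(fd);
    return (long)n;
}

/* Fills seen[] with the byte count visible to another reader
 *   [0] after fputs()  (data only in the stdio buffer),
 *   [1] after fflush() (data handed to the kernel),
 *   [2] after write() + fsync() on the raw descriptor.
 * Returns 0 on success, -1 on error. */
int visibility_demo(long seen[3])
{
    char path[] = "/tmp/kbuf_demo_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }

    fputs("hello", fp);
    seen[0] = visible_bytes(path);   /* still private to this process */
    fflush(fp);
    seen[1] = visible_bytes(path);   /* now in the kernel, visible to others */

    /* write() goes to the kernel buffer; fsync() asks for it to reach disk. */
    int ok = write(fd, " world", 6) == 6 && fsync(fd) == 0;
    seen[2] = visible_bytes(path);

    fclose(fp);                      /* also closes fd */
    unlink(path);
    return ok ? 0 : -1;
}
```

Data written with write is visible to other readers even before fsync is called; fsync only concerns whether the data has reached the disk, not whether other processes can see it.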

3) To reduce the number of disk reads, the kernel caches the directory tree structure. This is called the dentry (directory entry) cache.

4) A FIFO and a UNIX domain socket are both identified by a special file in the file system. A FIFO file has no data blocks on disk and only identifies a channel in the kernel. Processes can open this file and read or write it, which actually reads or writes the kernel channel (the underlying reason is that the read and write functions pointed to by its file structure differ from those of regular files), and this implements inter-process communication. A UNIX domain socket works on a similar principle: a special socket file identifies the channel in the kernel, its file type s indicates a socket, and it has no data blocks on disk either. The UNIX domain socket is currently the most widely used IPC mechanism.
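A FIFO used as a kernel channel between two processes might look like the following sketch. It assumes a POSIX system with a writable /tmp; fifo_roundtrip, the directory template, and the file name chan are hypothetical. Messages are assumed to be short enough to be delivered by a single read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `msg` from a child process to the parent through a FIFO. Stores what
 * the parent received in `out` (NUL-terminated) and returns the number of
 * bytes received, or -1 on error. */
long fifo_roundtrip(const char *msg, char *out, size_t out_size)
{
    char dir[] = "/tmp/fifo_demo_XXXXXX";    /* hypothetical scratch directory */
    char path[64];

    assert(msg != NULL && out != NULL && out_size > 0);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/chan", dir);
    if (mkfifo(path, 0600) != 0) {           /* creates the special file only */
        rmdir(dir);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {                          /* child: the writer */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w < 0 ? 1 : 0);
    }

    long got = -1;
    if (pid > 0) {                           /* parent: the reader */
        int rfd = open(path, O_RDONLY);      /* blocks until the writer opens */
        if (rfd >= 0) {
            ssize_t n = read(rfd, out, out_size - 1);
            if (n >= 0) {
                out[n] = '\0';
                got = (long)n;
            }
            close(rfd);
        }
        waitpid(pid, NULL, 0);
    }
    unlink(path);
    rmdir(dir);
    return got;
}
```

The bytes travel through the kernel channel only; nothing is stored in the FIFO file itself, which is why it can be removed immediately afterwards without losing data.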

4. Stack overflow
Infinite recursion or a very large locally defined array may exhaust the stack space that the operating system reserves for the program, causing the program to crash (segmentation fault).

5. Relationship between library functions and system calls
Taking writing a file as an example, the figure "Hierarchical relationship between library functions and system calls" shows the relationship between the C standard I/O library functions (printf(3), putchar(3), fputs(3)) and the system call write(2).

System functions such as open, read, write, and close are called unbuffered I/O functions because they sit below the I/O buffer of the C standard library. A user program that reads and writes files can call either the C standard I/O library functions or the underlying unbuffered I/O functions. Which group is better?

With unbuffered I/O functions, every read or write enters the kernel, and a system call is much slower than a call to a user-space function. That is why an I/O buffer in user space is worthwhile. The C standard I/O library functions are more convenient because they save you from managing the I/O buffer yourself.
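To give an idea of what "managing the I/O buffer yourself" involves, here is a minimal sketch of a hand-rolled output buffer on top of write(2). The struct and function names are hypothetical, the 1024-byte size is arbitrary, and details such as retrying after EINTR are left out.

```c
#include <assert.h>
#include <unistd.h>

/* A hypothetical hand-rolled output buffer sitting on top of write(2). */
struct out_buf {
    int    fd;          /* destination descriptor */
    size_t used;        /* bytes currently waiting in data[] */
    long   syscalls;    /* number of write(2) calls issued so far */
    char   data[1024];
};

/* Hand everything that is pending to the kernel. Returns 0 or -1. */
int out_flush(struct out_buf *b)
{
    size_t done = 0;

    assert(b != NULL);
    while (done < b->used) {
        ssize_t n = write(b->fd, b->data + done, b->used - done);
        b->syscalls++;
        if (n < 0)
            return -1;
        done += (size_t)n;   /* write may accept only part of the data */
    }
    b->used = 0;
    return 0;
}

/* Buffered counterpart of fputc: enters the kernel only when data[] is full. */
int out_putc(struct out_buf *b, char c)
{
    assert(b != NULL);
    if (b->used == sizeof b->data && out_flush(b) != 0)
        return -1;
    b->data[b->used++] = c;
    return 0;
}
```

Writing 3000 bytes one at a time through out_putc, followed by a final out_flush, normally costs three system calls instead of 3000. The standard I/O library does this bookkeeping, and more, on the program's behalf.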

When using the C standard I/O library functions, always keep in mind that the I/O buffer may be out of step with the actual file, and call fflush(3) when necessary.

As we know, the UNIX tradition is that everything is a file. I/O functions are used not only to read and write regular files but also to read and write devices, such as terminals or network devices. When reading or writing a device you usually do not want the data to be buffered. For example, writing data to a file that represents a network device means you want the data sent out through that device rather than merely written to a buffer, and when the network device receives data the application wants to be notified immediately. For this reason, network programming usually calls the unbuffered I/O functions directly.

The C standard library functions are part of the C standard, while the unbuffered I/O functions are part of the UNIX standard. The C standard library functions should be available on every platform that supports the C language (except for C compilers on some platforms that do not fully comply with the C standard), whereas the unbuffered I/O functions can be used only on UNIX platforms. Accordingly, the C standard I/O library functions are declared in the header file stdio.h, while read, write, and similar functions are declared in the header file unistd.h. On a non-UNIX operating system that supports the C language, the standard I/O library may be built on a different set of system functions. For example, the underlying layer on Windows is the Win32 API, where the system functions for reading and writing files are ReadFile and WriteFile.
