Advanced Programming in the UNIX Environment (14): File I/O



Spring is coming. Besides work and study, everyone should get out and exercise more.

Photo: Haidao, taken last weekend at Haidao Huaxi, Yuandu Ruins Park.

 

Now, on to the topic.

O_DIRECT and O_SYNC are flags of the open() system call. They are passed in the flags argument when a file is opened, and they apply to the file descriptor that open() returns.

Both flags have a large impact on the performance of disk writes, so it is worth understanding them in detail.

First, let's look at an example of using an open function.

/* Open a new or existing file for reading and writing, with synchronous
   I/O and unbuffered (direct) I/O; file permissions are read+write for
   the owner and nothing for all others.
   errExit() is the error-handling helper from the book's example code. */
fd = open("myfile", O_RDWR | O_CREAT | O_SYNC | O_DIRECT, S_IRUSR | S_IWUSR);
if (fd == -1)
    errExit("open");
Here, O_DIRECT requests unbuffered input and output, and O_SYNC opens the file in synchronous I/O mode. The two flags are described in detail below.

I. O_DIRECT: bypassing the buffer cache (direct I/O)

Direct I/O: Linux allows an application to bypass the buffer cache when performing disk I/O, so that data is transferred directly from user space to a file or disk device. This is called direct I/O or raw I/O.

Application scenario: a database system that implements its own caching and I/O optimizations and does not need the kernel to spend CPU time and memory on the same tasks.

Disadvantage: performance may be greatly reduced. The kernel applies many optimizations to I/O that goes through the buffer cache, including sequential read-ahead, performing I/O in clusters of disk blocks, and allowing multiple processes that access the same file to share buffers in the cache. Direct I/O gives up all of these.

Usage: specify the O_DIRECT flag when calling open() to open a file or device.

Possible inconsistency: if one process opens a file with O_DIRECT and another process opens the same file normally (that is, through the buffer cache), there is no coherency between the data read and written via direct I/O and the contents of the buffer cache. Avoid this scenario as far as possible.

Restrictions on using direct I/O:
  • The data buffer being transferred must be aligned on a memory boundary that is a multiple of the block size.
  • The file or device offset at which the transfer starts must be a multiple of the block size.
  • The length of the data to be transferred must be a multiple of the block size.

Failure to observe any of these restrictions results in the error EINVAL.
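The three restrictions above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: BLOCK_SIZE is an assumed value of 4096 bytes (the real block size depends on the device), and direct_io_args_ok() and write_one_block_direct() are hypothetical helper names, not functions from the book or from any library. The fallback definition of O_DIRECT is also an assumption, made only so that the sketch compiles on systems that lack the flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* assumption: fall back to buffered I/O if the flag is missing */
#endif

#define BLOCK_SIZE 4096        /* assumed block size; the real value depends on the device */

/* Hypothetical helper: returns 1 if the buffer address, the file offset and
 * the transfer length are all multiples of blk, otherwise 0. */
int direct_io_args_ok(const void *buf, off_t offset, size_t len, size_t blk)
{
    assert(blk != 0);
    return ((uintptr_t)buf % blk == 0) &&
           (offset % (off_t)blk == 0) &&
           (len % blk == 0);
}

/* Hypothetical helper: writes one zero-filled block at offset 0 of path
 * using O_DIRECT. Returns 0 on success, -1 on error (errno is set). */
int write_one_block_direct(const char *path)
{
    void *buf = NULL;
    int fd, saved;
    ssize_t n;

    /* Restriction 1: the buffer address is a multiple of the block size. */
    errno = posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE);
    if (errno != 0)
        return -1;
    memset(buf, 0, BLOCK_SIZE);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        saved = errno;
        free(buf);
        errno = saved;
        return -1;
    }

    /* Restrictions 2 and 3: offset 0 and length BLOCK_SIZE are both
     * multiples of the block size. */
    n = pwrite(fd, buf, BLOCK_SIZE, 0);
    saved = errno;
    close(fd);
    free(buf);
    errno = saved;
    return n == BLOCK_SIZE ? 0 : -1;
}
```

A caller would check the return value of write_one_block_direct("myfile") and inspect errno on failure. Note that open() itself may fail with EINVAL on file systems that do not support O_DIRECT at all, so a failure does not necessarily mean that the alignment was wrong.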

 

II. O_SYNC: writing files synchronously

Purpose: force the kernel buffers to be flushed to the output file. This is sometimes necessary: to guarantee data safety, an application must make sure that its output has really been written to the disk (or at least to the disk's hardware cache).

First, we should become familiar with the definitions of synchronized I/O and with the related system calls.

Synchronized I/O data integrity and synchronized I/O file integrity

Synchronized I/O completion: an I/O operation that has either been successfully transferred to the disk or been diagnosed as unsuccessful. SUSv3 defines two types of synchronized I/O completion (the names are kept in English here, because the translated wording is hard to read):
  • Synchronized I/O data integrity completion: ensures that a file data update transfers sufficient information (the data plus the part of the file metadata needed to find it again) to the disk to allow a later retrieval of that data.
  • Synchronized I/O file integrity completion: ensures that a file update transfers all information (all updated file metadata) to the disk, even metadata that is not needed for a later read of the file data.

System calls for controlling kernel buffering of file I/O

1. fsync

Purpose: the fsync() system call flushes the buffered data and all metadata associated with the file descriptor fd to the disk. Calling fsync() forces the file into the synchronized I/O file integrity completion state.

Function declaration:
#include <unistd.h>

int fsync(int fd);

Return value:
  • 0: success
  • -1: error
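As an illustration of the call and its return values, here is a minimal sketch that appends a record to a file and returns only once fsync() has succeeded. The function name append_durably() and its behavior on error are assumptions made for this example, not code from the book.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: appends msg to path and returns only after fsync()
 * has reported success. Returns 0 on success, -1 on error. */
int append_durably(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    size_t len = strlen(msg);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    /* write() normally only copies the data into the kernel buffer cache. */
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* fsync() flushes the buffered data and the metadata for fd;
     * a return value of -1 means the flush failed. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

Checking the result of fsync() matters here: if the flush fails, the record may exist only in memory, and the caller should not treat it as saved.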
When it returns: fsync() returns only after the transfer to the disk device (or at least to its cache) has completed.

2. fdatasync

Purpose: the fdatasync() system call works like fsync(), but only forces the file into the synchronized I/O data integrity completion state.

Function declaration:
#include <unistd.h>

int fdatasync(int fd);

Return value:
  • 0: success
  • -1: error
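A typical case where fdatasync() is sufficient is overwriting bytes in place, so that the file size does not change. The sketch below shows that pattern. The name overwrite_in_place() is hypothetical, and the function assumes the file already exists and is at least offset + len bytes long.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrites len bytes at offset in an existing file,
 * then flushes the data with fdatasync(). Returns 0 on success, -1 on error. */
int overwrite_in_place(const char *path, off_t offset,
                       const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    if (pwrite(fd, data, len, offset) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Only the file data, plus any metadata needed to read it back,
     * has to reach the disk before this call returns. */
    if (fdatasync(fd) == -1) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

Whether this is measurably cheaper than fsync() depends on the file system and the workload, so it is worth timing both before relying on the difference.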
Difference from fsync: fdatasync() may reduce the number of disk operations from the two required by fsync() to one. For example, if the file data is modified but the file size stays the same, fdatasync() only forces the data to be updated. By contrast, fsync() also forces the metadata to be transferred to the disk. File data and metadata usually reside in different areas of the disk, so updating both requires repeated seek operations back and forth across the disk.

3. sync

Purpose: the sync() system call flushes all kernel buffers containing updated file information (data blocks, pointer blocks, metadata, and so on) to the disk.

Function declaration:
#include <unistd.h>

void sync(void);

Detail: if a modified kernel buffer is not explicitly synchronized to the disk within 30 seconds, a long-running kernel thread ensures that it is flushed. This prevents buffers from staying out of step with the corresponding disk file for long periods.

4. Making all writes synchronous: O_SYNC

If the O_SYNC flag is specified when calling open(), all subsequent output on the file is synchronous.

fd = open(pathname, O_WRONLY | O_SYNC);

Effect: after this open() call, every write() to the file automatically flushes the file data and metadata to the disk; that is, writes are performed according to synchronized I/O file integrity completion.

5. Performance impact of O_SYNC

Scenario: write 1 million bytes to a newly created file on an ext2 file system, with and without O_SYNC, and compare the time required.

The conclusions are as follows:
  • Using the O_SYNC flag (or frequently calling fsync(), fdatasync(), or sync()) has a significant impact on performance.
  • The direct effect is a large increase in total elapsed time: with a 1-byte buffer, the elapsed times differ by a factor of more than 1000.
  • For writes with O_SYNC there is also a huge gap between elapsed time and CPU time (1030 versus 98.8), because the program is blocked while the contents of each buffer are transferred to the disk.
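A comparison of this kind can be reproduced with a small timing helper such as the sketch below. It is not the benchmark program behind the figures quoted above: timed_write() is a hypothetical name, the 4096-byte upper limit on the chunk size is an arbitrary choice, and the results depend heavily on the file system and hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: writes total bytes to path in chunks of chunk bytes.
 * extra_flags is OR-ed into the open() flags (pass 0 or O_SYNC).
 * Returns the elapsed time in seconds, or -1.0 on error. */
double timed_write(const char *path, size_t total, size_t chunk, int extra_flags)
{
    char buf[4096];
    struct timespec t0, t1;

    assert(chunk > 0 && chunk <= sizeof(buf));
    memset(buf, 'x', sizeof(buf));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags,
                  S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t done = 0; done < total; ) {
        size_t n = (total - done < chunk) ? total - done : chunk;
        ssize_t w = write(fd, buf, n);
        if (w <= 0) {
            close(fd);
            return -1.0;
        }
        done += (size_t)w;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling timed_write(path, 1000000, 1, 0) and then timed_write(path, 1000000, 1, O_SYNC) on a disk-backed file system gives the two elapsed times to compare. The O_SYNC run with a 1-byte chunk can take a very long time, so a larger chunk size is a safer starting point.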
III. I/O buffering hierarchy

First, a summary of the two layers of buffering, in the stdio library and in the kernel; the figure then illustrates both layers and the mechanisms that control each type of buffering.
  • First, the stdio library passes the user data into a stdio buffer, which lives in user-space memory.
  • When that buffer fills, the stdio library calls the write() system call, which transfers the data into the kernel buffer cache, which lives in kernel memory.
  • Finally, the kernel initiates a disk operation to transfer the data to the disk.
The figure shows this hierarchy.

[Figure: I/O buffering hierarchy]

In the figure, the dotted boxes on the left are the calls that can be used at any time to explicitly force a flush of the various buffers. The right side shows the calls that make flushing automatic: disabling buffering in the stdio library, and making file output system calls synchronous, so that each write() is immediately flushed to the disk.

IV. Summary

Buffering of input and output data is performed by the kernel and by the stdio library. Sometimes you may want to prevent buffering, but you need to understand the impact on application performance. Various system calls and library functions can be used to control kernel and stdio buffering and to perform one-off buffer flushes.

On Linux, the O_DIRECT flag of open() allows specialized applications to bypass the buffer cache.
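To make the two buffering layers concrete, the following sketch pushes one line of text through both of them explicitly: fflush() empties the stdio buffer into the kernel, and fsync() flushes the kernel buffer cache for the file to the disk. The function name log_line_durably() is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes line through both buffering layers and
 * forces it to the disk. Returns 0 on success, -1 on error. */
int log_line_durably(FILE *fp, const char *line)
{
    assert(fp != NULL && line != NULL);

    /* Layer 1: fputs() places the data in the user-space stdio buffer. */
    if (fputs(line, fp) == EOF)
        return -1;

    /* fflush() hands the stdio buffer to the kernel via write(). */
    if (fflush(fp) == EOF)
        return -1;

    /* Layer 2: fsync() flushes the kernel buffer cache for this file. */
    if (fsync(fileno(fp)) == -1)
        return -1;

    return 0;
}
```

The automatic alternative is to disable stdio buffering with setvbuf(fp, NULL, _IONBF, 0) and to open the underlying file with O_SYNC, at the performance cost discussed earlier.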

Although this series is still titled "Advanced Programming in the UNIX Environment (xx)", I plan to switch the book I read and reference to the Linux/UNIX System Programming Manual, which I find more up to date.

Work has been very busy, and I have spent most of my weekends outdoors, running and taking photos, so this article is little more than reading notes.

 

Reference:

Linux/UNIX System Programming Manual (Volume 1)
