Introduction to the direct I/O mechanism in Linux

Source: https://www.ibm.com/developerworks/cn/linux/l-cn-directio/

In a traditional operating system, ordinary I/O operations are typically cached by the kernel; this is called cached I/O. Linux also provides a file access mechanism that is not cached by the kernel: data is transferred directly between the disk and the application's address space, so it is referred to as direct I/O. For applications that keep their own I/O caches in user address space, direct I/O is a very efficient approach. This article discusses the design and implementation of direct I/O in Linux, based on the 2.6.18 kernel.

What is cached I/O (buffered I/O)?

Cached I/O is also known as standard I/O, and it is the default I/O mode of most file systems. In the Linux cached I/O mechanism, the operating system keeps I/O data in the file system's page cache: data is first copied into a kernel buffer, and only then copied from the kernel buffer into the application's address space. Cached I/O has the following advantages:

    • Cached I/O uses kernel buffers, which to some extent separates the application space from the actual physical device.
    • Cached I/O can reduce the number of disk reads, which improves performance.

When an application tries to read a piece of data that is already in the page cache, the data can be returned to the application immediately, without an actual physical disk read. If the data is not yet in the page cache, it must first be read from disk into the page cache. For a write operation, the application first writes the data into the page cache; whether the data then goes to disk immediately depends on the write mechanism the application uses. With a synchronous write (synchronous writes), the data is written back to disk at once, and the application waits until the write completes. With a deferred write (deferred writes), the application returns as soon as the data is in the page cache, without waiting for it to reach disk; the operating system then periodically flushes the data in the page cache to disk. Unlike an asynchronous write (asynchronous writes), a deferred write does not notify the application when the data has actually reached the disk, whereas an asynchronous write returns control to the application only after the data has been fully written to disk. A deferred write therefore carries a risk of data loss that an asynchronous write does not.
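To make the distinction concrete, the following minimal C sketch (an illustration added here, not code from the original article) contrasts the two write paths: a deferred write that is later forced to disk with fsync(), and a synchronous write through a file opened with O_SYNC. The file paths are placeholder assumptions and error handling is abbreviated.

    /* Sketch: deferred vs. synchronous writes through the page cache. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "hello, page cache";

        /* Deferred write: write() returns once the data reaches the page
         * cache; the kernel flushes it to disk later. fsync() forces the
         * flush explicitly. */
        int fd = open("/tmp/deferred.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        write(fd, buf, sizeof buf);
        fsync(fd);                    /* push the dirty pages to disk now */
        close(fd);

        /* Synchronous write: with O_SYNC, each write() blocks until the
         * data has actually been transferred to the storage device. */
        fd = open("/tmp/sync.dat", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        write(fd, buf, sizeof buf);   /* returns only after the disk write */
        close(fd);
        return 0;
    }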

Disadvantages of cached I/O

In the cached I/O mechanism, DMA can move data directly from disk into the page cache, or write data from the page cache directly back to disk, but not directly between the application address space and the disk. Data must therefore be copied between the application address space and the page cache during each transfer, and the CPU and memory overhead of these copy operations is considerable.

For some special applications, bypassing the kernel buffers and transferring data directly between the application address space and the disk yields better performance than going through the kernel buffers. One such class is the self-caching applications discussed in the next section.

Self-caching applications

Some applications have their own data caching mechanisms: they cache data in their own address space and have no need for the kernel's page cache at all. Such applications are called self-caching applications, and a database management system is a representative example. Self-caching applications tend to use logical representations of data rather than physical ones; when system memory runs low, they can let the logical cache of the data be swapped out, rather than the actual data on disk. A self-caching application knows the semantics of the data it operates on, so it can employ a more efficient cache-replacement algorithm. Self-caching applications may also share a block of memory across multiple hosts, so they need a mechanism that can effectively invalidate cached data in user address space, to keep the cached data in the application address space consistent.

Cached I/O is clearly not a good fit for self-caching applications. This brings us to the main subject of this article: direct I/O in Linux. Direct I/O is well suited to self-caching applications; it omits the kernel buffers used in cached I/O and transfers data directly between the application address space and the disk. A self-caching application can thereby bypass the complex system-level cache structure and perform its own data read and write management, reducing the impact of system-level management on its data access. The following sections focus on the design and implementation of the direct I/O mechanism that Linux provides, which gives good support to self-caching applications.
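From user space, an application requests direct I/O by opening the file with the O_DIRECT flag. The minimal C sketch below (an added illustration, not from the original article) reads one block this way; the 4096-byte alignment and the file path are assumptions, since the real alignment requirement depends on the filesystem and the device.

    /* Sketch: a direct I/O read with O_DIRECT and an aligned buffer. */
    #define _GNU_SOURCE               /* exposes O_DIRECT in <fcntl.h> */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define ALIGN 4096                /* assumed block-size alignment */

    int main(void)
    {
        void *buf;
        /* O_DIRECT transfers require an aligned user buffer. */
        if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {
            perror("posix_memalign");
            return 1;
        }

        int fd = open("/tmp/data.bin", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); free(buf); return 1; }

        /* The buffer, length, and file offset are all block-aligned, so
         * the kernel moves the data straight between the disk and this
         * buffer, bypassing the page cache. */
        ssize_t n = read(fd, buf, ALIGN);
        if (n < 0)
            perror("read");
        else
            printf("read %zd bytes via direct I/O\n", n);

        close(fd);
        free(buf);
        return 0;
    }

If the buffer, transfer size, or file offset is misaligned, the read() fails with EINVAL, which is one reason direct I/O is harder to use correctly than cached I/O.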

Features of direct I/O technology

Advantages of direct I/O

The main advantage of direct I/O is that, by eliminating the copy between the kernel buffers and the application address space, it reduces the CPU time and memory bandwidth consumed by file reads and writes. This makes it a good choice for certain special applications, such as self-caching applications. When the amount of data to transfer is large, moving it with direct I/O, without the kernel address-space copy, can improve performance considerably.

Potential problems with direct I/O

Direct I/O does not always deliver the performance leap one might hope for. The overhead of setting up a direct I/O transfer is high, and direct I/O forfeits the benefits of cached I/O. A cached read can be satisfied from the page cache, while a direct I/O read always causes a synchronous disk read, which results in a performance gap and can make the process take much longer to run. For writes, direct I/O requires the write() system call to execute synchronously, since otherwise the application would not know when it can reuse its I/O buffer; like direct reads, this can slow the application down considerably. For these reasons, applications that transfer data with direct I/O typically combine it with asynchronous I/O.
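As a hedged sketch of that combination (again an added illustration; the path, the 4096-byte alignment, and the single-request setup are assumptions), the following C program issues a direct I/O read through the Linux-native AIO interface provided by libaio; compile with -laio.

    /* Sketch: pairing O_DIRECT with Linux native AIO (libaio). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define ALIGN 4096                /* assumed alignment */

    int main(void)
    {
        void *buf;
        if (posix_memalign(&buf, ALIGN, ALIGN) != 0)
            return 1;

        int fd = open("/tmp/data.bin", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        io_context_t ctx = 0;
        if (io_setup(1, &ctx) != 0) {            /* one in-flight request */
            fprintf(stderr, "io_setup failed\n");
            return 1;
        }

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, ALIGN, 0);   /* aligned offset/length */

        /* Submit the read and return immediately; a plain direct read()
         * would block the caller until the disk access completed. */
        if (io_submit(ctx, 1, cbs) != 1) {
            fprintf(stderr, "io_submit failed\n");
            return 1;
        }

        /* ... the application could overlap other work here ... */

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);      /* wait for completion */
        printf("async direct read returned %ld bytes\n", (long)ev.res);

        io_destroy(ctx);
        close(fd);
        free(buf);
        return 0;
    }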

Summary

Accessing files with direct I/O in Linux reduces CPU usage and memory bandwidth consumption, but direct I/O can sometimes hurt performance instead. Be sure you understand your application's behavior clearly before using direct I/O, and consider it only when the overhead of cached I/O is known to be significant. Direct I/O is often used together with asynchronous I/O, which is not covered in detail in this article; interested readers can refer to the relevant documentation for Linux 2.6.
