Analysis of Linux Disk and Network I/O Working Modes


PIO and DMA

First, it is worth briefly describing how data is transferred between slow I/O devices and memory.

    • PIO: Take the disk as an example. Long ago, data transfer between the disk and memory was controlled by the CPU: to read a disk file into memory, the data had to be relayed through the CPU's own storage (its registers). This mode is called PIO (programmed I/O). It is obviously very inefficient, consuming a great deal of CPU time just to read a file, to the point that the system could become almost unresponsive while a file was being accessed.

    • DMA: Later, DMA (direct memory access) replaced PIO. With DMA, data can be exchanged between disk and memory directly, without passing through the CPU. In DMA mode, the CPU only needs to issue a command to the DMA controller; the DMA controller then handles the transfer over the system bus and notifies the CPU when it is done. This greatly reduces CPU occupancy and saves system resources. The raw transfer speed does not differ much from PIO, because it is limited by the speed of the slow device itself.

It is safe to say that PIO mode is rarely seen on modern computers.

Standard file access methods

Specific steps:

When an application calls the read interface, the operating system checks whether the required data is already in the kernel's cache. If it is cached, the data is returned directly from the cache; if not, it is read from disk and then placed in the operating system's cache.

When an application calls the write interface, the data is copied from the user address space into the cache in the kernel address space. At that point the write is complete as far as the user program is concerned; when the data actually reaches the disk is up to the operating system, unless the application explicitly issues a synchronization command such as sync.
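As a minimal sketch of these semantics (the file path is a hypothetical example), the following C program writes through the kernel cache and then uses fsync() to force the cached data out to disk:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        /* Hypothetical path, for illustration only. */
        int fd = open("/tmp/demo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char *msg = "hello, kernel cache\n";

        /* write() returns as soon as the data reaches the kernel cache;
           the actual disk write is deferred by the operating system. */
        if (write(fd, msg, strlen(msg)) < 0) { perror("write"); close(fd); return 1; }

        /* fsync() forces the cached data out to disk now, like the
           synchronization command mentioned above. */
        if (fsync(fd) < 0) perror("fsync");

        close(fd);
        return 0;
    }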

Memory Mapping (reduces data copying between user space and kernel space; suitable for transferring large amounts of data)

The Linux kernel provides a special way to access disk files: a block of in-memory address space can be associated with a specified disk file, so that accesses to that memory are converted into accesses to the disk file. This technique is known as memory mapping.

The operating system associates a region of memory with a file on disk, mapping a block of the process's logical address space one-to-one onto a region of the file of the same size. When a piece of data in that memory region is accessed, the access is converted into an access to the corresponding segment of the file. The purpose of this approach is to reduce the copying of data between the kernel-space cache and user-space buffers, because the data is shared between the two spaces. When a large amount of data needs to be transferred, accessing the file through a memory mapping is therefore more efficient.

When you use a memory-mapped file to process a file stored on disk, you no longer need to perform explicit I/O on it, which means you no longer need to request and allocate a buffer for the file while processing it; all file caching is managed directly by the system. Because the steps of loading file data into memory, writing data from memory back to the file, and freeing the memory blocks are eliminated, memory-mapped files can play a significant role when processing large volumes of data.

Access steps

In most cases, using a memory map improves disk I/O performance: there is no need to access the file with system calls such as read() or write(); instead, the mmap() system call establishes the association between memory and the disk file, after which the file can be accessed as freely as memory, as the sketch below illustrates. There are two types of memory mapping, shared and private. A shared mapping synchronizes any in-memory writes to the disk file, and all processes that map the same file share every process's changes to the mapped memory. With a private mapping the mapped file is effectively read-only: writes cannot be synchronized back to the file, and modifications are not shared among processes. Obviously, a shared memory mapping is less efficient, because if a file is mapped by many processes, synchronizing every modification costs a certain amount of overhead.
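A minimal sketch of these access steps, assuming a hypothetical file path: the file is mapped with a shared mapping, modified through ordinary memory writes, and the change is pushed back to disk with msync():

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/demo.txt", O_RDWR);   /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

        /* MAP_SHARED: writes to the mapped memory are synchronized back
           to the file and are visible to other processes mapping it. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        p[0] = 'X';                        /* access the file as memory */
        msync(p, st.st_size, MS_SYNC);     /* flush the change to disk  */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }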

Direct I/O (bypass the kernel buffer and manage the I/O buffers yourself)

In Linux 2.6, memory mapping and ordinary file access are not fundamentally different in this respect, because either way the data passes between the process's user-space memory and the disk through two copies: one between the disk and the kernel buffer, and one between the kernel buffer and user-space memory. The purpose of the kernel buffer is to improve the access performance of disk files: when a process reads a disk file whose contents are already in the kernel buffer, the disk does not need to be accessed again; and when a process writes data to a file, the data is actually written to the kernel buffer, at which point the process is told the write has succeeded, while the actual write to disk is deferred according to some strategy.

However, for some more complex applications, such as database servers, it is desirable in the pursuit of performance to bypass the kernel buffer and instead implement and manage the I/O buffers in user space, including the caching mechanism and the write-deferral mechanism, so as to support the application's own query machinery; for example, a database can raise its query cache hit rate with a strategy better suited to its workload. Bypassing the kernel buffer, on the other hand, also reduces system memory overhead, because the kernel buffer itself consumes system memory.

With direct I/O, the application accesses disk data directly, without going through the operating system's kernel buffer; the goal is to eliminate the copy from the kernel buffer to the user program's cache. This approach is typically used in database management systems, where the application itself implements cache management for the data.
The disadvantage of direct I/O is that if the requested data is not in the application's cache, every access loads the data directly from disk, and such direct loads are very slow. Direct I/O is therefore usually combined with asynchronous I/O to achieve better performance.

Access steps

Linux supports this requirement through the O_DIRECT flag of the open() system call: files opened with this flag bypass the kernel buffer and are accessed directly, which effectively avoids the extra CPU and memory overhead of the kernel cache.

Incidentally, an option similar to O_DIRECT is O_SYNC, which applies only to writes: it causes data written to the kernel buffer to be flushed to disk immediately, minimizing data loss if the machine fails, but the data still passes through the kernel buffer.
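A minimal sketch of opening a file with O_DIRECT, assuming a hypothetical path and a 512-byte logical block size. O_DIRECT requires the user buffer, file offset, and transfer size to be suitably aligned (typically to the block size), which is why posix_memalign() is used here:

    #define _GNU_SOURCE          /* needed for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define ALIGN 512            /* assumed logical block size */

    int main(void) {
        int fd = open("/tmp/demo.dat", O_RDONLY | O_DIRECT);  /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        /* Since the kernel buffer is bypassed, the user buffer itself
           must satisfy the alignment requirement. */
        void *buf;
        if (posix_memalign(&buf, ALIGN, ALIGN) != 0) { close(fd); return 1; }

        ssize_t n = read(fd, buf, ALIGN);    /* goes straight to the disk */
        if (n < 0) perror("read");
        else printf("read %zd bytes directly from disk\n", n);

        free(buf);
        close(fd);
        return 0;
    }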

sendfile / Zero Copy (network I/O; Kafka uses this feature)

The normal network transfer steps are as follows:

1) The operating system copies the data from disk into the page cache of the operating system kernel.
2) The application copies the data from the kernel cache into its own user-space buffer.
3) The application writes the data back into the socket buffer in the kernel.
4) The operating system copies the data from the socket buffer to the network card buffer and sends it over the network.

Viewed in terms of data copies:

1. When the read system call is invoked, the data is copied into kernel mode via DMA (direct memory access).
2. Under CPU control, the data is then copied from kernel mode into a buffer in user mode.
3. After the read call completes, the write call first copies the data from the user-mode buffer into the socket buffer in kernel mode.
4. Finally, a DMA copy transfers the data from the kernel-mode socket buffer to the network card device for transmission.

As this process shows, the data takes a pointless round trip from kernel mode to user mode and back, wasting two copies; and both of those copies are CPU copies, i.e. they consume CPU resources. A sketch of this traditional path follows.
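A minimal sketch of the traditional read/write path (the descriptor names are placeholders; opening the file and connecting the socket are assumed to be done by the caller):

    #include <stdio.h>
    #include <unistd.h>

    /* Traditional copy loop: file_fd is an open disk file, sock_fd a
       connected socket. Each chunk is copied kernel -> user by read()
       and user -> kernel by write(): the two CPU copies described above. */
    int send_file_readwrite(int file_fd, int sock_fd) {
        char buf[4096];
        ssize_t n;
        while ((n = read(file_fd, buf, sizeof(buf))) > 0) {
            ssize_t off = 0;
            while (off < n) {                  /* handle short writes */
                ssize_t w = write(sock_fd, buf + off, n - off);
                if (w < 0) { perror("write"); return -1; }
                off += w;
            }
        }
        if (n < 0) { perror("read"); return -1; }
        return 0;
    }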

Sendfile

Transferring a file via sendfile requires only one system call. When sendfile is called:
1. The data is first read from disk into the kernel buffer via a DMA copy.
2. The data is then copied from the kernel buffer into the socket buffer via a CPU copy.
3. Finally, a DMA copy transfers the data from the socket buffer to the network card buffer for sending.
Compared with the read/write approach, sendfile saves one mode switch and one CPU copy. However, as the process above shows, even the copy from the kernel buffer to the socket buffer is not really necessary.
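For comparison with the loop above, here is a minimal sketch using sendfile(): one system call replaces the whole read/write loop, and the file data never enters user space (the descriptor names are again placeholders):

    #include <stdio.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Zero-copy variant: the kernel moves the file data toward the
       socket itself; no user-space buffer is involved. */
    int send_file_sendfile(int file_fd, int sock_fd) {
        struct stat st;
        if (fstat(file_fd, &st) < 0) { perror("fstat"); return -1; }

        off_t offset = 0;                 /* updated by sendfile() */
        while (offset < st.st_size) {
            ssize_t sent = sendfile(sock_fd, file_fd, &offset,
                                    st.st_size - offset);
            if (sent < 0) { perror("sendfile"); return -1; }
        }
        return 0;
    }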

To eliminate the remaining CPU copy, the Linux 2.4 kernel improved sendfile, as described below.


The improved process works as follows:
1. A DMA copy transfers the disk data into the kernel buffer.
2. Only the position and offset of the data in the kernel buffer are appended to the socket buffer, not the data itself.
3. A DMA gather copy transfers the data from the kernel buffer directly to the network card, based on the position and offset recorded in the socket buffer.
After this improvement, transferring the data from disk involves only two copies in total. (Strictly speaking, this "zero copy" is from the kernel's point of view: the data is zero-copied within kernel mode.)
Many current high-performance HTTP servers have adopted the sendfile mechanism, such as Nginx and Lighttpd.

FileChannel.transferTo (zero copy in Java)

In Java NIO, the FileChannel.transferTo(long position, long count, WritableByteChannel target) method transfers data from the current channel to the target channel. On Linux systems that support zero copy, the implementation of transferTo() relies on the sendfile() system call.

Comparing the traditional approach with zero copy:

With the traditional approach, the entire data path involves four data copies and two system calls. Using sendfile avoids the redundant copies: the operating system can copy the data directly from the kernel page cache to the network card buffer, which greatly speeds up the process.

Much of the time, what we request from a Web server is static content: files, images, stylesheets, and so on. From the discussion above we know that while these requests are served, the disk file data passes through the kernel buffer and then reaches user memory space; because static data requires no processing at all, it is then copied to the kernel buffer corresponding to the network card and sent out through the NIC.

The data leaves the kernel, makes a loop, and returns to the kernel without any modification, which looks like a waste of time. The Linux 2.4 kernel once tried to address this by introducing a kernel-level Web server program called kHTTPd, which handled only requests for static files. The idea was to let as much of the request processing as possible happen inside the kernel, reducing kernel/user mode switches and the cost of user-mode data copies.

At the same time, Linux exposes this mechanism to developers through a system call, namely sendfile(). It can transfer specific portions of a disk file directly to the socket descriptor representing the client, which speeds up requests for static files while also reducing CPU and memory overhead.

sendfile is not available in OpenBSD or NetBSD. Tracing Apache with strace shows that when serving a small 151-byte file, Apache uses the mmap() system call to set up a memory mapping; but when serving large files, memory mapping would incur a large memory overhead that outweighs its benefit, so Apache instead uses sendfile64() to transfer the file. sendfile64() is an extended implementation of sendfile(), available in Linux versions after 2.4.

This does not mean that sendfile plays a significant role in every scenario. For requests for small static files, sendfile matters far less. In a stress test simulating 100 concurrent users requesting a 151-byte static file, the throughput was almost identical with and without sendfile. Evidently, when handling small file requests, the data-sending portion of the whole process is much smaller than for large file requests, so the effect of optimizing that portion is naturally not obvious.


Reposted from: 1190000007692223
