I/O Model

Source: Internet
Author: User
Tags: sendfile

http://hi.baidu.com/ywdblog/blog/item/85f0a2991623ae0e6e068c9a.html

High-performance servers generally adopt a non-blocking, single-process, event-driven architecture for network I/O. The core of this architecture is the event notification mechanism; on Linux, select, poll, and epoll are the common implementations. select and poll are the traditional UNIX mechanisms, but they have major drawbacks. With a large number of concurrent connections, especially when many of them are cold, their performance degrades linearly with the connection count, because every time select or poll returns, the caller must check every registered connection for events; when the number of connections is large, this overhead is substantial. In addition, each select or poll call copies the entire descriptor set between the kernel and user space, which is also costly. select and poll are therefore not the best choices for network I/O processing.

Therefore, starting with Linux 2.5, a new mechanism called epoll emerged, which covers the select/poll functionality with higher performance and richer features. Advantages of epoll: 1. Each call returns only the descriptors that actually have events, so the caller does not need to traverse the entire descriptor set, and the kernel does not copy a large amount of useless data to user space. 2. epoll supports two triggering modes, edge-triggered and level-triggered, which gives the caller flexibility. epoll also has disadvantages: 1. When there are few cold connections, its performance has no advantage over select or poll. 2. The implementation at the time was incomplete; EPOLLHUP events were not supported, so in edge-triggered mode a client actively disconnecting could not be detected. 3. Edge-triggered mode must be handled with care, or the program may fail to complete its work; in level-triggered mode, the caller should avoid busy-looping in epoll_wait.

I. Disk I/O

Linux disk I/O comes in synchronous and asynchronous forms. The synchronous form is the ordinary read/write system call; both are considered slow system calls whose execution may block the process, which degrades performance in a single-process server. Linux therefore provides asynchronous disk I/O. There are currently two main asynchronous I/O implementations on Linux: 1. POSIX AIO (aio_read, aio_write, ...) implemented by glibc, which notifies I/O completion via a thread or a real-time signal; the user can also poll to check whether the I/O has finished. 2. Linux AIO (io_setup, io_getevents, ...) implemented in the kernel, whose event notification call (io_getevents) has select-like semantics. What they share is the approach of using multiple threads to service blocking disk I/O so that it appears asynchronous to the caller. The main differences are: 1. POSIX AIO is implemented in user space while Linux AIO is implemented in the kernel; 2. the event notification mechanisms differ, as described above. We have not yet done a performance comparison between POSIX AIO and Linux AIO. The drawback of these asynchronous disk I/O mechanisms is that both create threads when handling new I/O requests, which is time-consuming.

Linux disk I/O can also use memory mapping (mmap). The advantages: 1. It avoids copying data between user space and kernel space. 2. If small files are merged into large files, mapping the large files reduces the number of open system calls. However, the disadvantages are also obvious: 1. On 32-bit systems, files larger than about 3 GB cannot be mapped into memory. 2. If an accessed page is not in memory, a page fault must be serviced and the process blocks; careful use of mincore can avoid the blocking, but that is difficult to program correctly.

If data is transmitted from disk to the network, sendfile can also be used. sendfile transfers data from the disk to the network entirely in kernel space, avoiding the extra memory copy.

II. Processing Network I/O and Disk I/O Simultaneously

The data server must handle a large number of concurrent network connections, and serving those connections requires a large amount of disk I/O, such as reading files. For this situation we propose two models:

1. Separate network I/O from disk I/O. In this model, one process (the relayserver) handles the large number of concurrent network connections, and another process (the dataserver) handles disk I/O. The two processes exchange information over a TCP connection: when a client request arrives, the relayserver forwards it to the dataserver; the dataserver reads the data from disk and sends it back to the relayserver, which forwards it to the client.

2. Do not separate network I/O from disk I/O. In this model, a single process (the dataserver) handles both disk I/O and network I/O.

The following describes the implementation solutions for these two models:

Whichever model is used, it involves handling a large amount of disk I/O. Based on the discussion of Linux disk I/O above, we tested the following disk I/O implementations:

1. POSIX AIO with real-time signal notification. Because a disk I/O usually completes in a short time, the process is frequently interrupted by signals and system calls are frequently restarted, which lowers overall performance. In addition, since parameters cannot be passed to the signal handler, it is difficult to implement anything complex there.

2. POSIX AIO with thread notification. A thread is created each time an I/O request completes, which is expensive.

3. POSIX AIO with polling. It is difficult to combine with the dataserver's epoll logic, requires continuous polling, and performs poorly.
Linux AIO was not tested, so its performance is unknown. Memory mapping cannot handle large files and was not used.

In view of the above, the asynchronous I/O mechanisms provided by Linux did not meet our needs, so we implemented asynchronous I/O in user space. The main framework creates threads in advance to form a thread pool. The processing flow: when a user issues an I/O request, a worker thread in the pool is woken up; the worker uses ordinary read/write system calls to perform the I/O (to guarantee the disk I/O completes fully, we use readn and writen). When the I/O is done, the worker notifies epoll through a pipe and appends the user's I/O request to a completion queue. When epoll receives the notification, it removes the request from the completion queue and performs the corresponding operation. In our performance comparison, this approach performs better than the implementations above and is easier to build, so we use it for disk I/O processing; the details differ between the non-separated and separated models.

For the non-separated model, our system uses epoll to manage the socket descriptors plus one pipe descriptor (disk I/O uses the pipe to notify epoll). epoll handles only socket read events: it receives a request from the client and hands it to the disk I/O layer. Disk I/O uses the thread pool; the difference is that when reading data, the worker thread uses sendfile to send the disk data directly to the corresponding network socket. If sendfile returns EAGAIN or the data has not all been sent, the request is updated and put back into the queue.

In the separated model, the dataserver also uses epoll to manage the socket descriptors and the pipe descriptors used to communicate with disk I/O. Here epoll must handle both socket read and write events, receiving requests from the relayserver and sending data back to it. Disk I/O is the same as in the previous model.

Comparison of the non-separated and separated models:

1. The performance of the two models is basically identical at 200 concurrent connections. Testing shows that when the requested file is large (more than KB), the network throughput of both models is around 20-30 Mb/s, and throughput increases with the size of the requested file. In this case disk I/O lags far behind the network I/O and the CPU, so network throughput is determined by disk speed; given disk I/O characteristics, large reads yield higher disk throughput, so the larger the requested file, the better the network throughput. When the requested file is small (such as 16 KB or 64 KB) and the OS file-cache hit rate is high, disk I/O has almost no effect on the system, and network throughput can approach the physical limit of the NIC. But when the cache hit rate is very low, overall performance is bound by disk I/O, because disks handle small reads inefficiently, and both network and disk throughput stay low (around 4 Mb/s). In short, whichever model is used, disk I/O is the bottleneck.

2. Advantages and disadvantages of the non-separated model. Advantages: 1. It is easier to implement, and efficient system calls such as sendfile can be used. 2. Because there is no need to allocate a large memory buffer per connection to cache data read from disk, the number of concurrent connections is not limited by memory, so it scales well in that respect. Disadvantages: poor extensibility; for example, it is difficult to support clients writing files, and adding complex protocol parsing could reduce performance significantly.

3. Advantages and disadvantages of the separated model. Advantages: good extensibility; disk I/O can be optimized on its own without considering its impact on handling many concurrent network connections, and complex protocol parsing can be added on the network side without considering its impact on disk I/O. Disadvantages: the implementation is more complex, and the implementation choices directly affect system performance; also, the client receives data at an uneven rate, sometimes nothing for a long while and then a large burst, which makes for a poor experience in streaming-media applications.

III. Conclusion

Given our requirements, we decided to use the separated model: it has good extensibility, and we expect to achieve high efficiency through optimization.
