I/O multiplexing models: select, epoll, and others


Synchronous blocking I/O spends too much time waiting for data to become ready. Traditional synchronous non-blocking I/O does not block the process, but polling to check whether data is ready still wastes a great deal of CPU time.

I/O multiplexing provides a high-performance way to check readiness across a large number of file descriptors.

Select

select was born in 4.2BSD and is supported on almost every platform; excellent cross-platform support is one of its main, and few, advantages.

select has several disadvantages:
1. There is a limit on the number of file descriptors a single process can monitor.
2. Each select call must copy a large handle data structure between user space and the kernel, which creates significant overhead.
3. select returns a list covering all monitored handles, and the application has to traverse the entire list to find out which handles had events.
4. select is level-triggered: if the application does not perform I/O on a file descriptor that is already ready, every subsequent select call will notify the process about that descriptor again. The opposite approach is edge triggering.
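As a rough illustration, a minimal level-triggered select loop might look like the sketch below (listen_fd and handle_client() are placeholders, not part of any particular code base). It shows the per-call costs described above: the fd_set is rebuilt and copied into the kernel on every call, and the whole set is scanned afterwards to find the ready descriptors.

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void handle_client(int fd);                 /* hypothetical request handler */

void select_loop(int listen_fd)
{
    int client_fds[FD_SETSIZE];             /* at most FD_SETSIZE (usually 1024) descriptors */
    int nclients = 0;

    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);                  /* the set is rebuilt ... */
        FD_SET(listen_fd, &readfds);
        int maxfd = listen_fd;
        for (int i = 0; i < nclients; i++) {
            FD_SET(client_fds[i], &readfds);
            if (client_fds[i] > maxfd)
                maxfd = client_fds[i];
        }

        /* ... and copied into the kernel on every call */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
            continue;

        if (FD_ISSET(listen_fd, &readfds) && nclients < FD_SETSIZE - 1)
            client_fds[nclients++] = accept(listen_fd, NULL, NULL);

        /* the whole list is scanned to find out which descriptors are ready */
        for (int i = 0; i < nclients; i++)
            if (FD_ISSET(client_fds[i], &readfds))
                handle_client(client_fds[i]);
    }
}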

Poll

poll was born in UNIX System V Release 3. By then AT&T had stopped licensing UNIX source code, so it clearly could not use BSD's select directly; AT&T therefore implemented poll itself, and it differs little from select.

poll and select are twins under different names: poll has no limit on the number of descriptors it can monitor, but the remaining three disadvantages of select listed above apply to poll as well.
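A comparable sketch with poll (again with a placeholder handle_client()): the fixed-size fd_set is replaced by an array of struct pollfd with no hard cap, but the array is still passed to the kernel and scanned in full on every call.

#include <poll.h>

void handle_client(int fd);                 /* hypothetical request handler */

void poll_loop(struct pollfd *fds, int nfds)
{
    for (;;) {
        /* the whole array is still handed to the kernel on each call */
        int nready = poll(fds, nfds, -1);
        if (nready <= 0)
            continue;

        /* and, just as with select, the whole array is scanned afterwards */
        for (int i = 0; i < nfds; i++)
            if (fds[i].revents & POLLIN)
                handle_client(fds[i].fd);
    }
}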

Faced with the flaws of select and poll, different operating systems produced different solutions; a hundred flowers bloomed. All of them, however, made at least two improvements: first, the kernel maintains the list of interesting events long-term, so the application only needs to modify that list instead of copying the whole set of handle data structures into the kernel on every call; second, the call returns only the list of ready events rather than the full list of handles.

/dev/poll

Sun proposed a new implementation in Solaris that uses a virtual /dev/poll device: the developer adds the file descriptors to be monitored to the device and then waits for event notifications via ioctl().

/dev/epoll

A device named /dev/epoll appeared in Linux 2.4 as a patch. It provides functionality similar to /dev/poll and, to some extent, uses mmap to improve performance.

Kqueue

FreeBSD implements kqueue, which supports both level and edge triggering, and its performance is very close to that of the epoll described below.
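A minimal kqueue sketch on FreeBSD, assuming an existing listening socket listen_fd and a placeholder handle_ready(): events are registered once with EV_ADD (level-triggered by default; adding EV_CLEAR makes them edge-triggered), and each kevent() call returns only the events that actually fired.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

void handle_ready(int fd);                  /* hypothetical handler */

void kqueue_loop(int listen_fd)
{
    int kq = kqueue();                      /* create the kernel event queue */

    struct kevent change;
    /* register listen_fd for read readiness; add EV_CLEAR for edge triggering */
    EV_SET(&change, listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);

    struct kevent events[64];
    for (;;) {
        /* only the events that fired come back, not the full descriptor list */
        int n = kevent(kq, NULL, 0, events, 64, NULL);
        for (int i = 0; i < n; i++)
            handle_ready((int)events[i].ident);
    }
}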

Epoll

epoll was born in the Linux 2.6 kernel and is considered the best-performing I/O multiplexing mechanism on Linux 2.6.

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
epoll_create creates a kernel event table, which is roughly equivalent to creating an fd_set; epoll_ctl modifies this table, equivalent to the FD_SET/FD_CLR operations; epoll_wait waits for I/O events, equivalent to the select/poll call itself.
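A minimal sketch that puts the three calls together (listen_fd and handle_ready() are placeholders): the kernel event table is created once with epoll_create, descriptors are registered in it with epoll_ctl, and epoll_wait returns only the descriptors that are actually ready.

#include <sys/epoll.h>

void handle_ready(int fd);                  /* hypothetical handler */

void epoll_loop(int listen_fd)
{
    int epfd = epoll_create(1024);          /* the size argument is only a hint on modern kernels */

    struct epoll_event ev;
    ev.events = EPOLLIN;                    /* level-triggered by default; add EPOLLET for edge triggering */
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);   /* modify the kernel event table */

    struct epoll_event events[64];
    for (;;) {
        /* only ready descriptors are returned; nothing needs to be scanned */
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; i++)
            handle_ready(events[i].data.fd);
    }
}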

epoll supports both level and edge triggering. Edge triggering theoretically performs better, but it is more complex to use, because any accidentally missed event can lead to a request-handling error. Nginx uses epoll in edge-triggered mode.

Here is the difference between level-triggered and edge-triggered readiness notification; the two terms come from computer hardware design. The difference is that a level trigger keeps issuing notifications as long as the handle remains in a given state, while an edge trigger issues a notification only when the handle's state changes. For example, after a long wait a socket receives 100 KB of data; both trigger modes send a readiness notification to the program. Suppose the program reads only 50 KB from the socket and then calls the listener again: a level trigger will still report the socket as ready, while an edge trigger will not, because the socket's "has data readable" state has not changed, and the program may end up blocked for a long time waiting for a notification that never comes.

So when using an edge-triggered API, be careful to keep reading from the socket each time until the read returns EWOULDBLOCK.
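A sketch of that drain loop for a non-blocking socket registered with EPOLLIN | EPOLLET: keep reading until read() fails with EAGAIN/EWOULDBLOCK, otherwise leftover data sits in the buffer and no further notification will arrive.

#include <errno.h>
#include <unistd.h>

/* Read everything currently available on a non-blocking, edge-triggered socket. */
void drain_socket(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                       /* process buf[0..n) here, then keep reading */
        if (n == 0)
            break;                          /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                          /* buffer drained: safe to wait for the next edge */
        break;                              /* real error: handle or close elsewhere */
    }
}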

=================================================================================

http://bbs.linuxpk.com/thread-43628-1-1.html

Let's first introduce Nginx:
Nginx supports highly concurrent connections. The official tests show 50,000 concurrent connections, and in real production it can handle 20,000-40,000 concurrent connections, thanks to Nginx using the latest epoll (Linux 2.6 kernel) and kqueue (FreeBSD) network I/O models. Apache, by contrast, uses the traditional select model; its relatively stable prefork mode is a multi-process model that has to fork new processes from time to time, and it consumes far more CPU and other server resources than Nginx.

On why select is inefficient and epoll is efficient, the common claim is: "select polls, epoll is event-triggered, therefore epoll is efficient." Stated that simply, the claim is misleading; let's keep it in mind but look at the matter objectively.

First, select:
1. Socket count limit: the number of sockets this model can handle is determined by FD_SETSIZE, which the kernel defaults to 32*32 = 1024.
2. Operation cost: scheduling is done by traversing all FD_SETSIZE (1024) sockets; no matter which sockets are active, the entire set is traversed every time.

Next, poll:
1. Socket count is almost unlimited: the fd list for the sockets in this model is stored in an array, with no limit on its size (4 KB by default).
2. Operation cost: same as select.

Finally, epoll:
1. Socket count unlimited: same as poll.
2. Operation cost unrestricted: epoll relies on the callback mechanism provided by the kernel. When a socket becomes active, the kernel invokes that socket's callback, so there is no need to traverse and poll the whole set. However, when nearly all sockets are active, all the callbacks are woken up, which leads to contention for resources; since every socket has to be processed anyway, plain traversal is then the simplest and most effective approach.


For example:
For an IM server, the connections between servers are long-lived but few in number, generally around 60-70 (for example with an ICE-based architecture), while the requests on them are very frequent and dense; in this case waking up callbacks through the kernel's event mechanism is not necessarily better than traversing the set with select.
For a web portal server, browsers initiate large numbers of short-lived HTTP requests (a popular site sees thousands of requests per minute), and at the same time the server holds many idle sockets waiting to time out; there is no need to traverse and process every socket, because most of them are just waiting to time out, so epoll is the better choice.

epoll supports a process opening a large number of socket descriptors
The most intolerable thing about select is that the number of FDs a single process can open is limited, set by FD_SETSIZE, with a default of 1024. That is obviously too few for an IM server that needs to support tens of thousands of connections. You can modify the macro and recompile the kernel, but reports also point to a drop in network efficiency when doing so. Another option is a multi-process solution (the traditional Apache approach); although the cost of creating a process on Linux is relatively small, it is still noticeable, and data synchronization between processes is far less efficient than synchronization between threads, so this is not a perfect solution either. epoll has no such restriction: the maximum number of FDs it supports is the maximum number of files that can be opened, which is generally far greater than 2048 (roughly 100,000 on a machine with 1 GB of memory). The exact number can be checked with cat /proc/sys/fs/file-max and, in general, depends heavily on system memory.
I/O efficiency does not decline linearly as the number of FDs grows
Another Achilles heel of traditional select/poll: when you hold a large socket set, network latency means only a portion of the sockets are "active" at any one time, yet every select/poll call scans the entire set linearly, so efficiency falls off linearly. epoll does not have this problem, because it only operates on "active" sockets: in the kernel implementation, epoll is built on a callback function attached to each FD, so only "active" sockets invoke their callbacks while idle sockets do not. In this sense epoll implements a "pseudo" AIO, with the driving force inside the OS kernel. In some benchmarks where essentially all sockets are active, such as a high-speed LAN environment, epoll is no more efficient than select/poll; on the contrary, heavy use of epoll_ctl causes a slight drop in efficiency. But once idle connections are used to simulate a WAN environment, epoll is far more efficient than select/poll.
Uses mmap to accelerate message passing between the kernel and user space
This touches on the concrete implementation of epoll. Whether with select, poll, or epoll, the kernel has to notify user space of FD events, and avoiding unnecessary memory copies matters; here epoll has the kernel and user space mmap the same region of memory. If, like me, you have followed epoll since the 2.5 kernel, you will remember the manual mmap step.
Kernel fine-tuning
This is not really an advantage of epoll itself but of the Linux platform as a whole. You may have doubts about the Linux platform, but you cannot deny that it gives you the ability to fine-tune the kernel. For example, the kernel's TCP/IP stack manages sk_buff structures with a memory pool, and the size of that pool (skb_head_pool) can be adjusted dynamically at runtime via echo xxxx > /proc/sys/net/core/hot_list_length. Similarly, the second parameter of listen() (the length of the queue of connections that have completed the TCP three-way handshake) can be tuned according to your platform's memory. You can even try the latest NAPI network driver architecture on systems that receive a huge number of packets where each packet itself is small.

Why the select model is inefficient
The inefficiency of the select model follows from the definition of select itself, regardless of how an operating system implements it: to learn the state of the monitored sockets, any kernel must poll through them, and that consumes CPU. In addition, when you hold a large socket set, even though only a small portion is "active" at any one time, you still have to fill all of them into an fd_set on every call, which also costs some CPU; and when select returns, handling the business logic may need another "context mapping" from descriptor back to connection, which has a further performance impact. So select is less efficient than epoll.
epoll's ideal application scenario is a large number of sockets of which only a few are active at any one time.
As for kqueue, many servers developed on BSD actually use it. kqueue is similar to epoll and is said to be slightly more efficient, but I have not compared them.
