Select, poll, epoll, and other I/O multiplexing models


Synchronous blocking I/O spends too much time waiting for data to become ready. Traditional non-blocking synchronous I/O does not block the process, but polling (round robin) to find out whether the data is ready still consumes a lot of CPU time.

I/O multiplexing provides a high-performance way to check the readiness of a large number of file descriptors.

 

Select

Select first appeared in 4.2BSD and is supported on almost all platforms. Its good cross-platform support is one of its few advantages.

Disadvantages of select:
1. There is a limit on the number of file descriptors a single process can monitor.
2. Select must copy a large amount of handle (file descriptor) data structures between user space and the kernel, which causes huge overhead.
3. Select returns a list covering the entire handle set; the application has to traverse the whole list to find out which handles actually have events.
4. Select is level-triggered: if the application does not complete the I/O operation on a ready file descriptor, every subsequent select call will still report that descriptor to the process. Edge triggering works the opposite way.
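To make these points concrete, here is a minimal sketch of a typical level-triggered select loop (an illustration added here, not from the original text; the descriptors in fds[] are assumed to be open, connected sockets):

#include <sys/select.h>
#include <unistd.h>

/* One round of a level-triggered select loop over already-open sockets. */
int wait_and_handle(int *fds, int nfds)
{
    fd_set readfds;
    FD_ZERO(&readfds);                 /* the fd_set must be rebuilt before every call */
    int maxfd = -1;
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd) maxfd = fds[i];
    }

    int ready = select(maxfd + 1, &readfds, NULL, NULL, NULL);
    if (ready <= 0)
        return ready;

    for (int i = 0; i < nfds; i++) {   /* must scan every descriptor to find the ready ones */
        if (FD_ISSET(fds[i], &readfds)) {
            char buf[4096];
            read(fds[i], buf, sizeof(buf));   /* handle the data; error handling omitted */
        }
    }
    return ready;
}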

 

Poll

Poll first appeared in UNIX System V Release 3. AT&T had stopped licensing UNIX source code by then, so it apparently could not simply take the BSD select; instead, AT&T implemented poll, which is essentially no different from select.

Poll and select are twin brothers under different names. Apart from removing the limit on the number of monitored files, the other three disadvantages of select listed above also apply to poll.
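For comparison, a minimal poll sketch (an added illustration, not from the original text; the pollfd array is assumed to be filled in by the caller with open sockets):

#include <poll.h>
#include <unistd.h>

/* poll takes an array of pollfd, so there is no FD_SETSIZE limit,
   but the kernel still scans the whole array on every call. */
int poll_and_handle(struct pollfd *pfds, int nfds)
{
    int ready = poll(pfds, nfds, -1);   /* -1 means wait indefinitely */
    if (ready <= 0)
        return ready;

    for (int i = 0; i < nfds; i++) {    /* still a linear scan to find the ready descriptors */
        if (pfds[i].revents & POLLIN) {
            char buf[4096];
            read(pfds[i].fd, buf, sizeof(buf));   /* error handling omitted */
        }
    }
    return ready;
}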

Faced with the defects of select and poll, different operating systems produced different solutions, a hundred flowers blooming. But they all accomplish at least two things: first, the kernel maintains a long-lived list of events of interest, so applications only modify that list instead of copying the whole set of handle data structures into the kernel on every call; second, the call returns the list of ready events directly, rather than the list of all handles.

/dev/poll

Sun introduced a new scheme in Solaris that uses a virtual /dev/poll device. Developers add the file descriptors to be monitored to this device and then call ioctl() to wait for event notifications.
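Roughly, usage looks like the following Solaris-specific sketch (an assumption based on the documented /dev/poll interface rather than code from this article):

#include <sys/types.h>
#include <sys/devpoll.h>   /* Solaris-only header */
#include <poll.h>
#include <fcntl.h>
#include <unistd.h>

/* Register one socket with /dev/poll, then wait for events on it. */
int devpoll_demo(int sockfd)
{
    int dpfd = open("/dev/poll", O_RDWR);

    struct pollfd pfd = { .fd = sockfd, .events = POLLIN, .revents = 0 };
    write(dpfd, &pfd, sizeof(pfd));      /* add the descriptor to the monitored set */

    struct pollfd results[16];
    struct dvpoll dopoll = { .dp_fds = results, .dp_nfds = 16, .dp_timeout = -1 };
    int n = ioctl(dpfd, DP_POLL, &dopoll);   /* wait; only ready descriptors come back */

    close(dpfd);
    return n;
}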

/dev/epoll

A device named /dev/epoll appeared in Linux 2.4 as a patch. It provides functionality similar to /dev/poll and uses mmap to improve performance.

 

Kqueue

FreeBSD implements kqueue, which supports both level triggering and edge triggering. Its performance is very similar to that of epoll, described below.
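For readers on BSD, a minimal sketch of a kqueue registration and wait (added here as an illustration of the standard kqueue/kevent calls, not taken from the original text):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

/* Register a socket for read readiness and block until one event arrives. */
int kqueue_demo(int sockfd)
{
    int kq = kqueue();

    struct kevent change;
    EV_SET(&change, sockfd, EVFILT_READ, EV_ADD, 0, 0, NULL);   /* add to the interest list */
    kevent(kq, &change, 1, NULL, 0, NULL);

    struct kevent event;
    int n = kevent(kq, NULL, 0, &event, 1, NULL);   /* block until the socket is readable */

    close(kq);
    return n;
}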

 

Epoll

Epoll first appeared in the Linux 2.6 kernel and is recognized as the best I/O multiplexing method on Linux 2.6.

 
int epoll_create(int size);

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

  • epoll_create creates the interest table (the list of monitored events) in the kernel, roughly the equivalent of creating an fd_set.
  • epoll_ctl modifies this table, roughly the equivalent of FD_SET and the other fd_set operations.
  • epoll_wait waits for I/O events to occur, the equivalent of the select/poll call itself.
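Putting the three calls together, a minimal level-triggered epoll loop might look like the sketch below (illustrative only, not from the original article; sockfd is assumed to be an already-connected, readable socket):

#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

/* Level-triggered epoll loop: register a socket once, then wait repeatedly. */
void epoll_loop(int sockfd)
{
    int epfd = epoll_create(1024);           /* the size hint is ignored by modern kernels */

    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = sockfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);   /* modify the kernel interest table */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);   /* only ready fds are returned */
        for (int i = 0; i < n; i++) {
            char buf[4096];
            read(events[i].data.fd, buf, sizeof(buf));      /* handle the event; error handling omitted */
        }
    }
}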

Epoll supports both level triggering and edge triggering. In theory edge triggering performs better, but it is more complex to use correctly, because any accidentally missed event leads to request-handling errors. Nginx uses the edge-triggered epoll model.

The distinction between level-triggered and edge-triggered readiness notification comes from computer hardware design. The difference is that level triggering sends a notification as long as the handle satisfies a certain state, whereas edge triggering sends a notification only when the handle's state changes. For example, if a socket receives a burst of data (say, 100 KB) after a long wait, both methods send a readiness notification to the program. Suppose the program reads 50 KB from the socket and then calls the listening function again: level triggering will still send a readiness notification, while edge triggering will wait forever, because the socket's "data readable" state has not changed, so no further notification is sent.

Therefore, when using an edge-triggered API, note that each time a readiness notification arrives the socket should be read (or written) until the call returns EWOULDBLOCK; otherwise the data still buffered in the kernel will never trigger another notification.
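A rough sketch of that drain-until-EWOULDBLOCK pattern (an added illustration, under the assumption that the descriptor has already been set non-blocking):

#include <errno.h>
#include <unistd.h>

/* Edge-triggered handling: consume everything that is currently readable,
   because no further notification will arrive for data already buffered. */
void drain_socket(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            continue;                      /* process buf[0..n) here */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                         /* kernel buffer drained; wait for the next edge */
        break;                             /* n == 0 (peer closed) or a real error */
    }
}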

 


 

http://bbs.linuxpk.com/thread-43628-1-1.html

Let's first introduce Nginx:
It supports highly concurrent connections. Official tests show around 50,000 concurrent connections, and 20,000 to 40,000 can be reached in actual production, thanks to Nginx's use of the newer epoll (Linux 2.6 kernel) and kqueue (FreeBSD) network I/O models. Apache uses the traditional select model; its stable prefork mode is a multi-process model that has to fork child processes frequently, so it consumes more CPU and other server resources than Nginx.

The efficiency gap is usually summed up as: select is inefficient because it polls, while epoll is efficient because it is event-triggered. With the details below in mind, we can actually understand this sentence rather than merely memorizing it.

Let's talk about select:
1. Socket count limit: the number of sockets this model can operate on is determined by FD_SETSIZE, whose kernel default is 32*32 = 1024.
2. Operation overhead: scheduling is done by traversing all FD_SETSIZE (1024) sockets; no matter which sockets are active, the whole set is traversed.

Poll:
1. The number of sockets is practically unlimited: in this model the fd list for the sockets is kept in an array whose size is not restricted (4 K by default).
2. Operation overhead: same as select.

Finally, epoll:
1. Unlimited number of sockets: same as poll.
2. No traversal overhead: based on the callback ("reflection") mechanism provided by the kernel, when a socket becomes active the kernel invokes that socket's callback, so no polling traversal is needed. However, when all sockets are active, all of the callbacks are woken up, which leads to resource contention; since every socket has to be processed anyway, a plain traversal is the simplest and most effective implementation in that case.


For example:
For IM servers, the connections between servers are long-lived but not numerous, typically 60 or 70 per server (for instance in an ICE-based architecture), yet the requests on them are very frequent and dense. In this case, waking callbacks through the reflection mechanism is not necessarily better than traversing with select.
For web portal servers, the load consists of short-lived HTTP requests initiated by browser clients. The number of connections is very large; even a moderately popular site can receive thousands of requests per minute, and at the same time the server holds many more idle sockets that are simply waiting to time out. In this case there is no need to traverse all of the sockets, because most of them are just waiting for a timeout, so epoll is the better choice.

Epoll allows a process to open a large number of socket descriptors.
The most intolerable thing about select is that the number of fds a process can open is limited by FD_SETSIZE, which defaults to 1024. For an IM server that needs to support tens of thousands of connections this is obviously far too few. You can modify this macro and recompile the kernel, but sources also point out that this reduces network efficiency. Alternatively you can use a multi-process solution (the traditional Apache approach); although the cost of creating a process on Linux is relatively small, it is not negligible, and data synchronization between processes is far less efficient than synchronization between threads, so this is not a perfect solution either. Epoll does not have this limit: the fd limit it supports is the maximum number of files that can be opened, which is generally much larger than 2048. For example, on a machine with 1 GB of memory it is around 100,000; the exact number can be checked with cat /proc/sys/fs/file-max, and in general it is strongly related to the amount of system memory.
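As a related aside (not from the original text): once epoll removes the FD_SETSIZE ceiling, the remaining per-process limit on open descriptors can be raised at runtime with the standard getrlimit/setrlimit calls, for example:

#include <sys/resource.h>
#include <stdio.h>

/* Raise this process's open-file limit to the hard maximum allowed by the system. */
int raise_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;             /* soft limit up to the hard limit */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    printf("open-file limit is now %llu\n", (unsigned long long)rl.rlim_cur);
    return 0;
}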
I/O efficiency does not decrease linearly as the number of fds increases.
Another fatal weakness of traditional select/poll is that when you have a large socket set, only some of the sockets are "active" at any moment because of network latency, yet every select/poll call scans the entire set linearly, so efficiency declines linearly. Epoll does not have this problem: it only operates on the "active" sockets, because epoll is implemented on top of a callback function attached to each fd inside the kernel. Only "active" sockets invoke their callback; idle sockets do not. In this respect epoll implements a kind of "pseudo" AIO, with the driving force inside the OS kernel. In some benchmarks where essentially all of the sockets are active, for example in a high-speed LAN environment, epoll is no more efficient than select/poll; on the contrary, if epoll_ctl is called too often, efficiency even drops slightly. But once idle connections are used to simulate a WAN environment, epoll's efficiency is far ahead of select/poll.
Epoll uses mmap to accelerate message passing between the kernel and user space.
This touches on epoll's concrete implementation. Select, poll, and epoll all need the kernel to deliver fd readiness information to user space, and avoiding unnecessary memory copies is important; epoll does this by having user space mmap the same memory as the kernel. If, like me, you have followed epoll since the 2.5 kernel, you will not have forgotten the manual mmap step.
Kernel fine-tuning
This is not really an advantage of epoll itself, but of the Linux platform as a whole. You may be skeptical of the Linux platform, but you cannot deny that it gives you the ability to fine-tune the kernel. For example, the kernel TCP/IP stack uses a memory pool to manage sk_buff structures, and the size of that pool (skb_head_pool) can be adjusted dynamically at runtime with echo XXXX > /proc/sys/net/core/hot_list_length. Likewise, the second parameter of listen() (the length of the queue of connections that have completed the TCP three-way handshake) can be tuned according to the amount of memory on your platform. On a system that sees a huge number of packets where each packet itself is very small, you can even try the latest NAPI NIC driver architecture.

Why the select model is inefficient
The inefficiency of the select model follows from its definition and has nothing to do with the operating system's implementation. Any kernel implementing select must poll (round robin) to learn the state of the sockets, which consumes CPU. In addition, when you have a large socket set, even though only a small portion of the sockets are "active" at any moment, you still have to fill all of them into an fd_set on every call, which also costs some CPU; and when select returns, handling the business logic may require another round of "context mapping" over the result set, which has some performance impact as well. So select is less efficient than epoll.
Epoll is best suited to handling a very large number of sockets of which only a few are active at any time.
There is also kqueue; in fact, many servers are developed on BSD.
Kqueue is similar to epoll and is said to be slightly more efficient, but I have never compared the two.
