Introduction to select, poll, and epoll
select, poll, and epoll are all I/O multiplexing mechanisms. All three are supported by current Linux kernels. epoll is specific to Linux, while select is specified by POSIX and is implemented by most operating systems.
Select
select works by setting and checking bits in a data structure (fd_set) that holds one flag bit per FD; the caller decides what to do next from those bits. Its disadvantages are:
1. The number of FDs a single process can monitor is limited, which caps the number of connections it can listen on.
The limit is the compile-time constant FD_SETSIZE, which is 1024 in glibc on both 32-bit and 64-bit Linux. This is separate from the system-wide limit on open files, which depends heavily on system memory and can be read with cat /proc/sys/fs/file-max.
2. Sockets are scanned linearly, that is, by polling, which is inefficient:
With many sockets, every select() call walks all descriptors up to the highest one registered (at most FD_SETSIZE) to complete the dispatch, whether or not they are active. This can waste a lot of CPU time. If a callback could be registered for each socket so that the relevant work happens automatically when the socket becomes active, the polling would be avoided, which is exactly what epoll and kqueue do.
3. A large FD data structure has to be maintained, and copying it between user space and kernel space on every call is expensive.
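The points above can be illustrated with a minimal sketch of a select-based wait. The helper name wait_readable_select and its parameters are made up for this example; only select(), the FD_* macros, and FD_SETSIZE are real API. Note that the FD set is rebuilt and handed to the kernel on every call, and that the caller has to scan its descriptors afterwards to find out which one is ready.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait until at least one of fds[0..n-1] is readable.
 * Returns the number of ready descriptors (0 on timeout) and stores the
 * first ready one in *first_ready. Returns -1 on error or when a descriptor
 * does not fit into an fd_set. */
int wait_readable_select(const int *fds, int n, int timeout_ms, int *first_ready)
{
    fd_set readfds;
    int maxfd = -1;

    /* The set must be rebuilt before every call: select() overwrites it. */
    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;                  /* select cannot watch this FD */
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;

    /* Linear scan over everything that was registered. */
    *first_ready = -1;
    for (int i = 0; i < n; i++) {
        if (FD_ISSET(fds[i], &readfds)) {
            *first_ready = fds[i];
            break;
        }
    }
    return ready;
}
```

A caller would pass the read ends of its sockets or pipes in fds and call the helper in a loop; the cost of each call grows with n even when only one descriptor is active.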
Poll
poll is essentially the same as select. It copies the array passed in by the user into kernel space and then queries the device state of each FD, adding an entry to the device's wait queue as it goes and continuing the traversal. If no device is found ready after all FDs have been traversed, the current process is suspended until a device becomes ready or the timeout expires, at which point it is woken and traverses the FDs again. This process goes through many needless iterations.
It has no limit on the maximum number of connections, because the FDs are passed as a variable-length array of struct pollfd (which the kernel stores as a linked list) instead of a fixed-size bitmap. It still has disadvantages:
1. The whole FD array is copied between user space and the kernel address space on every call, whether or not the copy is meaningful.
2. poll is also level-triggered: if an FD is reported and not handled, the next poll call reports that FD again.
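The repeated reporting described in point 2 can be observed with a small sketch. The helper poll_count_reports is a made-up name for this example; it polls one descriptor several times without ever reading from it and counts how often POLLIN comes back.

```c
#include <poll.h>

/* Hypothetical helper: poll fd `rounds` times without consuming any data
 * and count how many times it is reported readable. Returns -1 on error. */
int poll_count_reports(int fd, int rounds)
{
    int reports = 0;

    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd;
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        int n = poll(&pfd, 1, 0);       /* timeout 0: return immediately */
        if (n < 0)
            return -1;
        if (n == 1 && (pfd.revents & POLLIN))
            reports++;
    }
    return reports;
}
```

If the descriptor holds unread data, every round reports it, so the count equals rounds; the report only stops once the data has been read.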
Epoll:
epoll has two trigger modes, level-triggered (LT) and edge-triggered (ET, selected with the EPOLLET flag). LT is the default mode and ET is the "high-speed" mode. In LT mode, as long as the FD still has readable data, every epoll_wait call returns its event to remind the program to act. In ET mode, epoll_wait reports the FD only once and does not report it again until new data arrives, regardless of whether unread data remains on the FD. Therefore, in ET mode a reader must drain the FD's buffer, that is, keep reading until read returns fewer bytes than requested or fails with EAGAIN. Another feature is that epoll uses event-based readiness notification: an FD is registered through epoll_ctl, and once it becomes ready the kernel uses a callback mechanism to mark it as ready, so that epoll_wait is notified.
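The requirement to empty the buffer in ET mode is usually met with a non-blocking read loop. The following is a minimal sketch; set_nonblocking and drain_fd are names invented for this example, and the 4096-byte buffer size is an arbitrary assumption. For simplicity the loop always continues until EAGAIN or end of file rather than stopping at the first short read.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read from a non-blocking fd until nothing is left.
 * Returns the total number of bytes read, or -1 on a real error. */
long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;                 /* a real program would process buf here */
            continue;
        }
        if (n == 0)
            return total;               /* end of file */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* buffer is empty: safe to wait again */
        if (errno == EINTR)
            continue;                   /* interrupted by a signal, retry */
        return -1;
    }
}
```

The descriptor has to be non-blocking; otherwise the final read would block instead of returning EAGAIN.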
Why does epoll have the EPOLLET trigger mode?
With LT mode, if there are many ready file descriptors that you do not need to read or write, they are all returned on every epoll_wait call, which greatly reduces the efficiency with which the handler finds the ready descriptors it actually cares about. With ET (edge-triggered) mode, epoll_wait() notifies the handler when a read or write event occurs on a monitored file descriptor. If you do not read or write all the data this time (for example, because the buffer is too small), the next epoll_wait() call will not notify you again; you are notified only once, until another read or write event occurs on that file descriptor. This mode is more efficient than level triggering, and the system is not flooded with ready file descriptors that you do not care about.
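The difference between the two modes can be made concrete with a small experiment. The sketch below uses a made-up helper, count_notifications, that registers the read end of a pipe with epoll, writes one byte, and then calls epoll_wait several times without reading. Passing 0 as extra_flags gives the default mode; passing EPOLLET gives edge triggering.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: count how many of `rounds` epoll_wait calls report
 * a pipe that holds one unread byte. Returns -1 on error. */
int count_notifications(uint32_t extra_flags, int rounds)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev;
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];

    int count = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        count = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            int n = epoll_wait(epfd, &out, 1, 0);   /* do not block */
            if (n < 0) {
                count = -1;
                break;
            }
            count += n;                 /* the byte is never read */
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return count;
}
```

With three rounds, the default mode reports the descriptor on every call, while EPOLLET reports it only on the first call because no new data arrives afterwards.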
Advantages of Epoll:
1. There is no hard limit on the number of concurrent connections; the upper limit on open FDs is far greater than 1024 (with 1 GB of memory, roughly 100,000 connections can be monitored).
2. Higher efficiency: epoll does not poll, so efficiency does not drop as the number of FDs grows. Only active FDs invoke the callback function.
The biggest advantage of epoll is that its cost depends only on the "active" connections and not on the total number of connections, so in real network environments epoll is much more efficient than select and poll.
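3. Less memory copying: each FD is copied into the kernel once, when it is registered with epoll_ctl, and epoll_wait copies back only the ready events. (epoll is often described as using mmap() to share memory between the kernel and user space; the mainline Linux implementation does not do this.)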
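The claim that epoll only hands back active descriptors can be checked with a short sketch. The helper ready_among_idle and the constant MAX_PIPES are made up for this example: it registers a number of idle pipes, makes exactly one of them readable, and looks at what a single epoll_wait call returns.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_PIPES 64    /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: register `total` pipes, write to the one at
 * active_index, and return how many events epoll_wait reports. The index
 * carried by the first event is stored in *reported_index. */
int ready_among_idle(int total, int active_index, int *reported_index)
{
    int rd[MAX_PIPES], wr[MAX_PIPES];
    struct epoll_event events[MAX_PIPES];
    int opened = 0, result = -1, ok = 1;

    if (total < 1 || total > MAX_PIPES ||
        active_index < 0 || active_index >= total)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    while (ok && opened < total) {
        int p[2];
        if (pipe(p) < 0) {
            ok = 0;
            break;
        }
        rd[opened] = p[0];
        wr[opened] = p[1];

        struct epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.u32 = (uint32_t)opened;     /* remember which pipe this is */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
            ok = 0;
        opened++;
    }

    if (ok && write(wr[active_index], "x", 1) == 1) {
        /* Only descriptors that are actually ready are copied out. */
        result = epoll_wait(epfd, events, MAX_PIPES, 0);
        if (result > 0)
            *reported_index = (int)events[0].data.u32;
    }

    for (int i = 0; i < opened; i++) {
        close(rd[i]);
        close(wr[i]);
    }
    close(epfd);
    return result;
}
```

The caller receives only the one active entry and never has to scan the idle ones, which is the behavior the list above attributes to epoll.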
Summary of the differences between select, poll, and epoll:
1. Maximum number of connections a process can open
Select
The maximum number of connections a single process can open is defined by the FD_SETSIZE macro, which is 1024 in glibc (the fd_set bitmap holds 1024 bits, for example 32 words of 32 bits each). It can be changed by modifying the definition and recompiling, but performance may be affected, which requires further testing.
Poll
poll is essentially no different from select, but it has no maximum number of connections because the FDs are passed as a variable-length array (stored in the kernel as a linked list) instead of a fixed-size bitmap.
Epoll
Although there is an upper limit on the number of connections, it is very large: a machine with 1 GB of memory can open about 100,000 connections, and a machine with 2 GB about 200,000.
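The limits discussed in this comparison can be inspected from a program. The sketch below uses a made-up helper, fd_limits, that reports the compile-time FD_SETSIZE constant used by select together with the per-process open-file limits returned by getrlimit(RLIMIT_NOFILE), which are what bound poll and epoll in practice.

```c
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: report the select() bitmap size and the soft and
 * hard per-process limits on open file descriptors. 0 on success. */
int fd_limits(long *select_max,
              unsigned long long *soft_limit,
              unsigned long long *hard_limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    *select_max = FD_SETSIZE;                       /* fixed at compile time */
    *soft_limit = (unsigned long long)rl.rlim_cur;  /* can be raised up to hard */
    *hard_limit = (unsigned long long)rl.rlim_max;
    return 0;
}
```

The values can be printed with printf; the soft limit is the one that ulimit -n shows in a shell.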
2. I/O efficiency as the number of FDs grows
Select
Because every call traverses the connections linearly, performance degrades linearly as the number of FDs increases and the traversal slows down.
Poll
Ditto
Epoll
Because the kernel implementation of epoll relies on a callback attached to each FD, only active sockets invoke the callback. When few sockets are active, epoll therefore does not suffer the linear performance degradation of the other two. When all sockets are active, however, there may still be performance problems.
3. Message delivery method
Select
The kernel has to deliver the message to user space, which requires a copy by the kernel.
Poll
Ditto
Epoll
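epoll is commonly described as sharing a piece of memory between the kernel and user space. The mainline Linux implementation does not do this; epoll_wait still copies from the kernel, but only the ready events are copied.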
Summary:
In summary, the choice among select, poll, and epoll should depend on the specific use case and on the characteristics of each of the three.
1. On the surface epoll performs best, but when the number of connections is small and all of them are very active, select and poll may outperform epoll, since epoll's notification mechanism requires many function callbacks.
2. select is inefficient because it has to poll on every call. But inefficiency is relative: it depends on the situation and can be mitigated by good design.