The epoll mechanism in detail
What is epoll?
epoll is an improved version of poll designed for handling large numbers of file descriptors (handles), and it is generally regarded as the best I/O multiplexing readiness notification mechanism on Linux.
It has only three system calls: epoll_create, epoll_ctl, and epoll_wait.
epoll_ctl is epoll's event registration function. Unlike select(), which is told which events to watch each time it is called to wait, epoll registers the event types to monitor in advance.
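The three calls can be tried from Python, whose select.epoll object is a thin wrapper over them (Linux only). The following is a minimal sketch, not a production pattern: the pipe is only a stand-in for a socket, and the one-second timeout is an arbitrary choice for the illustration.

```python
import os
import select

# epoll_create: create the epoll instance.
ep = select.epoll()

# A pipe stands in for a socket so the example needs no network.
read_fd, write_fd = os.pipe()

# epoll_ctl(EPOLL_CTL_ADD): register interest in "readable" once, up front.
ep.register(read_fd, select.EPOLLIN)

os.write(write_fd, b"hello")

# epoll_wait: block (for at most 1 second) until a registered descriptor is ready.
events = ep.poll(timeout=1.0)
data = b""
for fd, mask in events:
    if mask & select.EPOLLIN:
        data = os.read(fd, 1024)

ep.unregister(read_fd)   # epoll_ctl(EPOLL_CTL_DEL)
ep.close()
os.close(read_fd)
os.close(write_fd)
```

In C the same steps are epoll_create (or epoll_create1), epoll_ctl with EPOLL_CTL_ADD, and epoll_wait.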
How epoll works
epoll reports only the file descriptors that are ready. When we call epoll_wait() to get the ready file descriptors, the return value is not a descriptor but the number of ready descriptors.
You then read that many entries from the event array you passed to epoll_wait(). Only the ready events are copied into that array, so the cost of each call depends on how many descriptors are ready rather than how many are monitored. (epoll is often said to use memory mapping (mmap) to avoid this copy, but the mainline Linux implementation copies the ready events into the caller's array.)
Another essential improvement is that epoll uses event-based readiness notification.
In select/poll, the kernel scans all monitored file descriptors, and it does so only when the call is made.
With epoll, a file descriptor is registered in advance through epoll_ctl(). Once the file descriptor becomes ready, the kernel uses a callback-like mechanism to mark it as ready right away, and the process is notified when it calls epoll_wait().
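A short sketch of this behaviour, using Python's select.epoll wrapper on Linux: five descriptors are registered once, two of them become ready, and the wait call returns only those two. The pipes and the choice of which ones to write to are assumptions made for the illustration.

```python
import os
import select

ep = select.epoll()
pipes = [os.pipe() for _ in range(5)]        # five (read_fd, write_fd) pairs
for read_fd, _ in pipes:
    ep.register(read_fd, select.EPOLLIN)     # registered once, up front

# Nothing has been written yet, so no descriptor is ready.
none_ready = ep.poll(timeout=0)

# Make exactly two of the five descriptors ready.
os.write(pipes[1][1], b"x")
os.write(pipes[3][1], b"y")

# The interest set is not passed again; only ready descriptors come back.
ready = ep.poll(timeout=1.0)
ready_fds = sorted(fd for fd, _ in ready)

ep.close()
for read_fd, write_fd in pipes:
    os.close(read_fd)
    os.close(write_fd)
```

The length of the returned list plays the role of the integer that epoll_wait() returns in C.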
Two working modes of epoll
Level-triggered (LT)
Equivalent to a faster poll.
LT (level-triggered) is the epoll default and supports both blocking and non-blocking sockets. In this mode, the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready FD.
If you do nothing, the kernel keeps notifying you, so programming errors are less likely in this mode.
Traditional select/poll follow this model.
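The repeated notification in LT mode can be observed with a small sketch (Python's select.epoll on Linux; the socket pair and the 4-byte partial read are assumptions chosen for the demonstration):

```python
import select
import socket

a, b = socket.socketpair()
ep = select.epoll()
ep.register(b.fileno(), select.EPOLLIN)   # no EPOLLET flag: level-triggered

a.send(b"0123456789")

first = ep.poll(timeout=1.0)    # ready: 10 bytes are waiting
chunk = b.recv(4)               # read only part of the data
second = ep.poll(timeout=1.0)   # still reported: 6 bytes remain in the buffer
rest = b.recv(1024)             # drain the buffer
third = ep.poll(timeout=0)      # buffer is empty, so nothing is reported

ep.close()
a.close()
b.close()
```

As long as unread data remains, every wait reports the descriptor again, which is why a forgotten read does not get lost in this mode.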
Edge-triggered (ET)
Enabled with the EPOLLET flag.
Meant to be paired with non-blocking reads and writes.
ET (edge-triggered) is the high-speed mode and should only be used with non-blocking sockets. It can be more efficient than LT because it produces fewer notifications.
The difference between ET and LT: when a new event arrives, ET mode does return it from epoll_wait, but if the data in the socket buffer for that event is not fully processed and no new event arrives on the socket, ET mode will not return the event again from epoll_wait.
LT mode is the opposite: as long as the socket buffer for an event still has data, epoll_wait keeps returning the event.
Developing epoll-based applications in LT mode is therefore simpler and less error-prone. In ET mode, if the buffered data is not completely processed when the event occurs, the user request left in the buffer gets no response until another event arrives on that socket.
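The usual way to avoid losing data in ET mode is to keep reading until the kernel reports that nothing is left (EAGAIN in C, BlockingIOError in Python). The sketch below shows both the missed notification and the drain loop; the helper name drain and the 4-byte read size are hypothetical choices for the illustration.

```python
import select
import socket

a, b = socket.socketpair()
b.setblocking(False)                      # ET should be paired with non-blocking I/O
ep = select.epoll()
ep.register(b.fileno(), select.EPOLLIN | select.EPOLLET)

a.send(b"0123456789")

first = ep.poll(timeout=1.0)    # the "edge": data arrived
chunk = b.recv(4)               # partial read leaves 6 bytes buffered
second = ep.poll(timeout=0.1)   # no new edge, so no event despite leftover data


def drain(sock):
    """Read until the kernel reports EAGAIN (BlockingIOError in Python)."""
    parts = []
    while True:
        try:
            data = sock.recv(4)
        except BlockingIOError:
            break               # buffer empty: safe to wait for the next edge
        if not data:
            break               # peer closed the connection
        parts.append(data)
    return b"".join(parts)


rest = drain(b)

ep.close()
a.close()
b.close()
```

With a blocking socket the last recv in the loop would hang instead of raising, which is the practical reason ET is paired with non-blocking descriptors.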
Advantages of Epoll
Supports a large number of open socket descriptors (FDs) per process.
The most unbearable thing about select is that the number of FDs a process can monitor is limited by FD_SETSIZE, whose default value on Linux is 1024.
You can modify this macro and recompile to raise the limit, or use a multi-process design to work around it (the traditional Apache approach).
epoll has no such restriction. The FD limit it supports is the maximum number of open files, which is generally much larger than 1024; on a machine with 1 GB of memory it is roughly 100,000. The exact value can be read from /proc/sys/fs/file-max.
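These limits can be inspected from Python. This is a sketch under a few assumptions: FD_SETSIZE is taken to be the usual Linux value (commonly 1024), the per-process RLIMIT_NOFILE limit is shown in addition to the system-wide file-max value because it also caps how many descriptors a process can open, and the helper names are hypothetical.

```python
import resource
import select

# Per-process limit on open descriptors (soft and hard values).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)


def system_wide_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open-file limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def select_accepts(fd_number):
    """Report whether select() can represent this descriptor number at all."""
    try:
        select.select([fd_number], [], [], 0)
    except ValueError:
        return False    # number does not fit in an fd_set (>= FD_SETSIZE)
    except OSError:
        return True     # number fits, but the descriptor is not open
    return True


file_max = system_wide_max()
fits_5000 = select_accepts(5000)
```

Python's select.select rejects descriptor numbers that do not fit in an fd_set before making the system call, so the check works without actually opening thousands of files.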
I/O efficiency does not decrease linearly as the number of FDs grows.
Another Achilles heel of traditional select/poll is that performance degrades linearly when the socket set is large.
Because of network latency, only some of the sockets are "active" at any one time, yet every select/poll call scans the entire set linearly, so efficiency drops linearly.
epoll does not have this problem; it only operates on the "active" sockets.
This is because the kernel implementation of epoll is built on a callback function attached to each FD.
Only "active" sockets invoke the callback; idle sockets do not.
In this respect epoll implements a kind of "pseudo" AIO, because the driving force is inside the OS kernel.
In some benchmarks where essentially all sockets are active, epoll is no more efficient than select/poll; on the contrary, heavy use of epoll_ctl makes it slightly less efficient.
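Limits memory copying between the kernel and user space.
This involves the implementation details of epoll.
select, poll, and epoll all need the kernel to deliver FD readiness information to user space, so avoiding unnecessary memory copies matters. This point is often explained by saying that the kernel and user space mmap the same piece of memory, but the mainline Linux implementation does not do that: epoll_wait copies events into the caller's buffer. The saving comes from copying only the ready events and from not re-sending the interest list on every call.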
epoll is an I/O multiplexing technology that can handle very large numbers of socket handles, up to millions, efficiently.
With select/poll, every call passes all the sockets you want to monitor to the system call, which means the user-space socket list is copied into the kernel each time. With tens of thousands of handles, that is tens to hundreds of KB copied into the kernel on every call, which is very inefficient.
epoll_wait does not pass the socket handles to the kernel, because the kernel already received the list of handles to monitor through epoll_ctl.
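The difference in calling convention is visible even from Python. In this sketch, 50 socket pairs stand in for client connections (an arbitrary number chosen for the illustration) and only one of them receives data:

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(50)]
watched = [b for _, b in pairs]

ep = select.epoll()
by_fd = {}
for sock in watched:
    ep.register(sock.fileno(), select.EPOLLIN)   # interest list handed over once
    by_fd[sock.fileno()] = sock

pairs[7][0].send(b"ping")                        # only one connection is active

# select: the full list of 50 sockets is passed in on every call.
readable, _, _ = select.select(watched, [], [], 1.0)

# epoll: the call carries no descriptor list; the kernel already has it.
events = ep.poll(timeout=1.0)
active = [by_fd[fd] for fd, _ in events]

ep.close()
for a, b in pairs:
    a.close()
    b.close()
```

Both calls find the same active socket, but only select has to be given the whole set again each time it is called.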
epoll also maintains a doubly linked list that stores the events that have occurred (the ready list).
When epoll_wait is called, it simply checks whether this list contains any entries (each entry is an epitem).
If there is data, it returns; if there is none, it sleeps, and once the timeout expires it returns even if the list is still empty.
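Both outcomes of the wait can be timed from user space. The sketch below uses Python's select.epoll on Linux; the 0.2-second and 5-second timeouts are arbitrary values picked for the demonstration.

```python
import os
import select
import time

ep = select.epoll()
read_fd, write_fd = os.pipe()
ep.register(read_fd, select.EPOLLIN)

# Ready list empty: the call sleeps and returns an empty result at the timeout.
start = time.monotonic()
empty = ep.poll(timeout=0.2)
waited = time.monotonic() - start

# Ready list non-empty: the call returns without waiting for the timeout.
os.write(write_fd, b"x")
start = time.monotonic()
ready = ep.poll(timeout=5.0)
returned_in = time.monotonic() - start

ep.close()
os.close(read_fd)
os.close(write_fd)
```

Here waited comes out close to the requested 0.2 seconds, while returned_in is far below the 5-second limit.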
How is this ready list maintained?
When we execute epoll_ctl, besides placing the socket on the red-black tree that belongs to the file object in the epoll file system,
it also registers a callback function with the kernel's interrupt handling path, telling the kernel: when an interrupt arrives for this handle, put it on the ready list.
A red-black tree, a ready list, and a small amount of kernel cache are what solve the problem of handling sockets under high concurrency:
When epoll_create is executed, the red-black tree and the ready list are created.
When epoll_ctl is executed to add a socket handle, it checks whether the handle already exists in the red-black tree. If it exists, the call returns immediately; if not, the handle is added to the tree, and a callback is registered with the kernel to insert an entry into the ready list when the interrupt event occurs.
When epoll_wait is executed, it returns the data in the ready list immediately.
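The interplay of the three structures can be modelled in a few lines of user-space code. This is a toy model only, not kernel code: the class ToyEpoll, its method names, and the EV_IN/EV_OUT constants are all hypothetical, a dict stands in for the red-black tree, and a deque stands in for the ready list.

```python
from collections import deque

EV_IN, EV_OUT = 1, 4                 # made-up event bits for the model


class ToyEpoll:
    """Toy user-space model of the structures described above."""

    def __init__(self):                       # models epoll_create
        self.interest = {}                    # stands in for the red-black tree
        self.ready = deque()                  # stands in for the ready list

    def ctl_add(self, fd, events):            # models epoll_ctl when adding
        if fd in self.interest:
            return False                      # already present: return at once
        self.interest[fd] = events
        return True

    def on_device_event(self, fd, events):    # models the registered callback
        wanted = self.interest.get(fd, 0) & events
        if wanted and all(fd != queued for queued, _ in self.ready):
            self.ready.append((fd, wanted))   # queue the handle as ready

    def wait(self, maxevents=64):             # models epoll_wait, without sleeping
        out = []
        while self.ready and len(out) < maxevents:
            out.append(self.ready.popleft())
        return out


ep = ToyEpoll()
for fd in range(1000):
    ep.ctl_add(fd, EV_IN)
duplicate = ep.ctl_add(3, EV_IN)     # handle 3 is already in the tree
ep.on_device_event(42, EV_IN)        # handle 42 becomes readable
ep.on_device_event(7, EV_OUT)        # not an event handle 7 asked for
result = ep.wait()
```

The wait step never looks at the 1000 registered handles; its cost depends only on how many entries the callback has queued, which is the property the text attributes to the real implementation.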
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.