1, Shortcomings of select and poll
Recall the interfaces of select and poll:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
Both multiplexing implementations share the following characteristics:
- Every call to select or poll copies the set of events the user cares about (the readfds, writefds and exceptfds sets for select, the fds array of struct pollfd for poll) from user space to kernel space.
- If only a small subset of the watched events is active in a given period, CPU time is wasted checking descriptors on which nothing has happened. For example, suppose the user watches for read events on 1024 TCP sockets, but only 1 connection is active each time select or poll is called; checking the other 1023 descriptors is unnecessary work.
select also supports only a limited number of file descriptors, generally 1024. poll has no such limit, but for the two reasons above it shares select's main drawbacks: the array holding a large number of file descriptors is copied between user space and kernel space on every call, and the state of every descriptor is checked each time regardless of whether it is ready, so the overhead grows linearly with the number of file descriptors.
epoll improves on these shortcomings. It no longer copies the descriptor set to kernel space on every call; a descriptor is registered once and stays registered. In addition, epoll does not check every descriptor on each call. It only tries to claim resources for file descriptors on which an event has actually occurred (the same resource, such as readability or writability, may have several processes blocked on it, and the process calling epoll has to compete with those processes for it). What follows is a record of my own study and understanding of epoll.
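The linear cost described above can be made visible with a small experiment. The following is only a sketch: the helper name poll_scan_demo and the use of pipes as stand-ins for TCP connections are assumptions made for illustration. It watches n pipes with poll, makes exactly one of them readable, and counts how many array entries the caller has to inspect to find it.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: watch n pipes with poll(), make only one of them
 * readable, and count how many array entries must be inspected afterwards.
 * Returns the number of ready descriptors, or -1 on error. */
int poll_scan_demo(int n, int *entries_scanned)
{
    assert(n > 0 && entries_scanned != NULL);

    struct pollfd *fds = calloc((size_t)n, sizeof *fds);
    int *write_ends = calloc((size_t)n, sizeof *write_ends);
    int created = 0, ready = -1;
    if (fds == NULL || write_ends == NULL)
        goto out;

    for (; created < n; created++) {
        int p[2];
        if (pipe(p) == -1)
            goto out;
        fds[created].fd = p[0];
        fds[created].events = POLLIN;
        write_ends[created] = p[1];
    }

    /* Only the last pipe becomes active. */
    if (write(write_ends[n - 1], "x", 1) != 1)
        goto out;

    /* The whole array is handed to the kernel on every call ... */
    if (poll(fds, (nfds_t)n, 0) == -1)
        goto out;

    /* ... and the caller has to walk the whole array afterwards. */
    ready = 0;
    *entries_scanned = 0;
    for (int i = 0; i < n; i++) {
        (*entries_scanned)++;
        if (fds[i].revents & POLLIN)
            ready++;
    }

out:
    for (int i = 0; i < created; i++) {
        close(fds[i].fd);
        close(write_ends[i]);
    }
    free(fds);
    free(write_ends);
    return ready;
}
```

Called as poll_scan_demo(64, &scanned), this should report one ready descriptor while scanned ends up at 64: the work done is proportional to the number of watched descriptors, not to the number of active ones.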
2, The epoll interfaces
As noted above, every call to select or poll copies the descriptor set to kernel space. This is because in select and poll, registering events and waiting for events are bound together in a single call. The programming pattern of select and poll makes this clear:
while (true) { select(maxfd + 1, readfds, writefds, exceptfds, timeout); /* or: poll(fds, nfds, timeout); */ }
As mentioned in the earlier article on I/O multiplexing with select, each call to select performs a copy from user space to kernel space. epoll's improvement is to separate registering events from waiting for events. epoll uses a special file to manage the set of events the user cares about. This file lives in the kernel and consists of a dedicated data structure and a set of operations, so users can tell the kernel in advance which events they care about and then wait for them; each descriptor only needs to be copied from user space to kernel space once. The file that manages the event set is created with epoll_create, registration is done with epoll_ctl, and waiting is done with epoll_wait. The programming model therefore looks roughly like this:
epoll_fd = epoll_create(size); epoll_ctl(epoll_fd, operation, fd, event); while (true) { epoll_wait(epoll_fd, events, max_events, timeout); }
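The model above can be filled in as follows. This is a minimal sketch, not a complete server: the function name epoll_model_demo is hypothetical, a pipe stands in for a real socket, and the loop runs a fixed number of rounds instead of forever. The point it illustrates is that the descriptor is registered once, outside the loop, and only epoll_wait is called repeatedly.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register a pipe once, then wait on it several times.
 * Returns how many readable events were handled, or -1 on error. */
int epoll_model_demo(int rounds)
{
    assert(rounds > 0);

    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create(1);          /* step 1: create the epoll file */
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    int handled = -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    /* step 2: register interest once, outside the loop */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto out;

    handled = 0;
    for (int i = 0; i < rounds; i++) {
        if (write(p[1], "x", 1) != 1) {  /* simulate incoming data */
            handled = -1;
            break;
        }
        struct epoll_event events[8];
        /* step 3: wait; no descriptor set is passed in again */
        int n = epoll_wait(epfd, events, 8, 1000);
        if (n == -1) {
            handled = -1;
            break;
        }
        for (int j = 0; j < n; j++) {
            char c;
            if (events[j].data.fd == p[0] && read(p[0], &c, 1) == 1)
                handled++;
        }
    }

out:
    close(epfd);
    close(p[0]);
    close(p[1]);
    return handled;
}
```

With rounds set to 3, the function should handle three events while calling epoll_ctl only once.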
2.1. epoll_create interface
#include <sys/epoll.h>
int epoll_create (int size);
epoll_create creates the epoll file and returns a handle (a file descriptor) to it. The size argument tells the kernel the maximum number of file descriptors to be watched; this differs from the first argument of select, which is the highest watched fd plus 1. (Since Linux 2.6.8 the size argument is ignored, but it must still be greater than zero.) Note that once the epoll handle has been created it occupies an fd value; on Linux it can be seen under /proc/<process id>/fd/. close() must therefore be called when the epoll instance is no longer needed, otherwise fds may eventually be exhausted. (Excerpt from the Essence of Epoll)
epoll_create also has the kernel initialize the data structures that epoll needs. One key structure is rdlist, the linked list of ready file descriptors; epoll_wait examines this list directly to claim events that are ready. The other key structure is a red-black tree, which manages the set of file descriptors the user cares about.
Note: for the core data structures of the epoll file and the source code of epoll_create, please refer to these two documents:
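The remark that the epoll handle occupies an ordinary fd, and has to be closed like one, can be checked with a short sketch. The function name epoll_fd_lifecycle and the use of fcntl with F_GETFD as a validity probe are choices made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the epoll descriptor is a valid fd while
 * open and an invalid one after close(), 0 otherwise, and -1 if it could not
 * be created. */
int epoll_fd_lifecycle(int size_hint)
{
    assert(size_hint > 0);               /* epoll_create rejects size <= 0 */

    int epfd = epoll_create(size_hint);
    if (epfd == -1)
        return -1;

    /* While open, the handle behaves like any other file descriptor. */
    int open_ok = fcntl(epfd, F_GETFD) != -1;

    /* Releasing it is the caller's job. */
    close(epfd);

    errno = 0;
    int closed_ok = fcntl(epfd, F_GETFD) == -1 && errno == EBADF;
    return open_ok && closed_ok;
}
```

A long-running program that creates epoll instances without closing them would leak one descriptor per instance, which is the exhaustion risk mentioned above.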
Analysis of Linux kernel Poll/select/epoll implementation
Epoll Source code Implementation analysis [finishing]
2.2. epoll_ctl interface
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
epoll_ctl is how the user tells the kernel which descriptor (fd) they are interested in and which events (event) on it.
- epfd, the epoll handle created with epoll_create. The structure behind the epfd file descriptor contains a red-black tree that manages the set of events the user cares about.
- op, the operation to perform. It takes one of three values:
  - EPOLL_CTL_ADD, register a new fd with epfd;
  - EPOLL_CTL_MOD, modify the events of an already registered fd;
  - EPOLL_CTL_DEL, remove an fd from epfd;
- fd, the file descriptor the user cares about
- event, the events the user cares about (read, write)
The structure of the event parameter is as follows:
struct epoll_event {
    __uint32_t events;  /* Epoll events */
    epoll_data_t data;  /* User data variable; the kernel hands it back unchanged with the event */
};
events can be a combination of the following macros:
- EPOLLIN, the corresponding file descriptor is readable (this includes an orderly shutdown by the peer socket);
- EPOLLOUT, the corresponding file descriptor is writable;
- EPOLLPRI, the corresponding file descriptor has urgent data to read (this indicates the arrival of out-of-band data);
- EPOLLERR, an error occurred on the corresponding file descriptor;
- EPOLLHUP, the corresponding file descriptor was hung up;
- EPOLLET, put epoll into edge-triggered mode for this descriptor, as opposed to the default level-triggered mode;
- EPOLLONESHOT, report only one event. After that event has been delivered the descriptor is disabled; to keep watching the socket it must be re-armed through epoll_ctl (with EPOLL_CTL_MOD).
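The three operations and the event mask can be exercised together. The sketch below is illustrative only: epoll_ctl_demo and ready_now are hypothetical names, and a UNIX-domain socketpair is assumed as a convenient stand-in for a TCP connection. It records how many events epoll_wait reports after each of ADD, MOD and DEL.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: number of events that are ready right now. */
static int ready_now(int epfd)
{
    struct epoll_event got[4];
    return epoll_wait(epfd, got, 4, 0);  /* timeout 0: do not block */
}

/* Walks one socket through ADD, MOD and DEL and stores in results[] how many
 * events epoll_wait() reports after each step.  Returns 0 on success. */
int epoll_ctl_demo(int results[3])
{
    assert(results != NULL);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    int epfd = epoll_create(1);
    int rc = -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sv[0] };
    if (epfd == -1)
        goto out;

    /* ADD: watch for readability only; nothing has been sent yet. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1)
        goto out;
    results[0] = ready_now(epfd);

    /* MOD: also watch for writability; an idle socket is writable. */
    ev.events = EPOLLIN | EPOLLOUT;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, sv[0], &ev) == -1)
        goto out;
    results[1] = ready_now(epfd);

    /* DEL: the descriptor is no longer watched. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], &ev) == -1)
        goto out;
    results[2] = ready_now(epfd);
    rc = 0;

out:
    if (epfd != -1)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The expected pattern is 0 events after ADD, 1 after MOD (the EPOLLOUT condition holds immediately) and 0 after DEL.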
2.2.1, EPOLL_CTL_ADD
Let us focus on this value. When op is EPOLL_CTL_ADD, epoll_ctl mainly does four things:
- Adds the file descriptor and its event (fd, epoll_event) to the red-black tree so the kernel can manage it
- Registers the poll callback ep_ptable_queue_proc with the device driver; calling f_op->poll() eventually invokes ep_ptable_queue_proc()
- Inside ep_ptable_queue_proc, registers the callback ep_poll_callback, which determines how the process is notified when the corresponding event occurs on fd
- Checks whether the event in epoll_event has already occurred on the device behind fd; if so, adds the fd and its epoll_event to the ready list rdlist mentioned above
Note: for the principles and source code of epoll_ctl, ep_ptable_queue_proc and ep_poll_callback, please refer to these two documents:
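The last of the four steps has a visible effect in user space: if the event is already pending when the descriptor is registered, the very next epoll_wait reports it without any new activity. The following sketch demonstrates this from the outside; it does not show kernel code, and the function name add_when_already_readable is hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: make a pipe readable BEFORE registering it, then ask
 * epoll what is ready.  Returns the epoll_wait() result, or -1 on error. */
int add_when_already_readable(int nbytes)
{
    static const char buf[16] = { 0 };
    assert(nbytes > 0 && nbytes <= (int)sizeof buf);

    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create(1);
    int nready = -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    struct epoll_event got;
    if (epfd == -1)
        goto out;

    /* The event occurs first ... */
    if (write(p[1], buf, (size_t)nbytes) != nbytes)
        goto out;

    /* ... and the descriptor is registered afterwards. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto out;

    /* Nothing new happens on the pipe, yet the event is already queued. */
    nready = epoll_wait(epfd, &got, 1, 0);

out:
    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return nready;
}
```

The call should return 1, which is consistent with the descriptor having been placed on the ready list at registration time.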
Analysis of Linux kernel Poll/select/epoll implementation
Epoll Source code Implementation analysis [finishing]
2.3. epoll_wait interface
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
- epfd, the epoll handle created with epoll_create. The structure behind the epfd file descriptor contains a red-black tree that manages the set of events the user cares about.
- events, an output parameter that receives the events that occurred
- maxevents, an input parameter giving the capacity of the events array. It must be greater than zero; the traditional advice is that it should not exceed the size passed to epoll_create
- timeout, in milliseconds: 0 returns immediately (non-blocking), a positive integer is the maximum time to block, a negative value (conventionally -1) blocks indefinitely
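The three timeout cases can be observed with a small timing helper. This is a sketch under assumptions: the name epoll_wait_elapsed_ms is hypothetical, elapsed time is measured with clock_gettime(CLOCK_MONOTONIC), and exact durations will vary with system load.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: call epoll_wait() once with the given timeout on a
 * pipe that is readable only if make_ready is nonzero.  Stores the result of
 * epoll_wait() in *nready and returns the elapsed time in milliseconds, or
 * -1 if setup failed. */
long epoll_wait_elapsed_ms(int timeout_ms, int make_ready, int *nready)
{
    assert(nready != NULL);
    /* Never block forever on a pipe that will not become readable. */
    assert(timeout_ms >= 0 || make_ready);

    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create(1);
    long elapsed = -1;
    struct timespec t0, t1;
    struct epoll_event got;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto out;
    if (make_ready && write(p[1], "x", 1) != 1)
        goto out;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *nready = epoll_wait(epfd, &got, 1, timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    elapsed = (t1.tv_sec - t0.tv_sec) * 1000L
            + (t1.tv_nsec - t0.tv_nsec) / 1000000L;

out:
    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return elapsed;
}
```

With nothing ready, a timeout of 0 should come back at once with 0 events, and a timeout of 100 should come back with 0 events after roughly 100 ms. With data already pending, a timeout of -1 returns immediately with 1 event instead of blocking.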
epoll_wait works by examining each node of the rdlist linked list mentioned above. Each node leads to a watched file descriptor, and the poll function f_op->poll of the device behind that descriptor is called to check whether the device is really available. One question is worth thinking about here: since rdlist holds events that are already ready, meaning the device's resource is available, why check again? The reason is that several processes may be waiting for the same resource of a device. When the resource becomes ready, the device wakes up all processes blocked on it, and the process currently calling epoll_wait may not be the one that obtains the resource. epoll_wait therefore checks again, by calling f_op->poll of the fd's device, in case the resource has already been taken by another process.
This is why epoll can be more efficient: on each call epoll only checks the devices that are ready, unlike select and poll, which check every descriptor whether it is ready or not.
Note: for the principles and source code of epoll_wait, please refer to these two documents:
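From the caller's side, this shows up as epoll_wait returning only the descriptors that are ready. The sketch below is an illustration with assumed names (epoll_ready_only is hypothetical, and pipes stand in for sockets): it registers n pipes, makes one of them readable, and checks what comes back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register n pipes, make only pipe number `active`
 * readable, and return how many events epoll_wait() reports.  The index
 * stored in the first reported event is written to *reported_index. */
int epoll_ready_only(int n, int active, int *reported_index)
{
    assert(n > 0 && active >= 0 && active < n && reported_index != NULL);

    int (*pipes)[2] = calloc((size_t)n, sizeof *pipes);
    struct epoll_event *events = calloc((size_t)n, sizeof *events);
    int epfd = epoll_create(1);
    int created = 0, nready = -1;
    if (pipes == NULL || events == NULL || epfd == -1)
        goto out;

    for (; created < n; created++) {
        if (pipe(pipes[created]) == -1)
            goto out;
        /* The user data field carries the pipe's index back to us. */
        struct epoll_event ev = { .events = EPOLLIN,
                                  .data.u32 = (uint32_t)created };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipes[created][0], &ev) == -1) {
            close(pipes[created][0]);
            close(pipes[created][1]);
            goto out;
        }
    }

    if (write(pipes[active][1], "x", 1) != 1)
        goto out;

    /* Room for n events is offered, but only ready ones are filled in. */
    nready = epoll_wait(epfd, events, n, 0);
    if (nready > 0)
        *reported_index = (int)events[0].data.u32;

out:
    for (int i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    if (epfd != -1)
        close(epfd);
    free(events);
    free(pipes);
    return nready;
}
```

For 64 registered pipes with pipe 32 active, the call should report exactly one event carrying index 32, so the caller never scans the 63 idle descriptors.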
Analysis of Linux kernel Poll/select/epoll implementation
Epoll Source code Implementation analysis [finishing]
3, The two trigger modes of epoll: ET and LT
The difference between the two is as follows. In level-triggered (LT) mode, epoll_wait returns a socket every time it is called for as long as the socket is readable/writable. In edge-triggered (ET) mode, epoll_wait returns a socket only when it changes from unreadable to readable or from unwritable to writable. The two figures from "Epoll Read and write mode in LT and et modes" illustrate this difference clearly.
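The difference can also be observed directly. In the sketch below (the function name count_reports is hypothetical, and a pipe stands in for a socket), one byte is written and deliberately never read; epoll_wait is then called several times and the number of times the descriptor is reported is counted.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: leave one unread byte in a pipe and count how often
 * epoll_wait() reports it over `calls` consecutive calls.  Pass 0 for
 * level-triggered behaviour or EPOLLET for edge-triggered behaviour. */
int count_reports(uint32_t extra_flags, int calls)
{
    assert(calls > 0);

    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create(1);
    int reports = -1;
    struct epoll_event ev = { .events = EPOLLIN | extra_flags,
                              .data.fd = p[0] };
    if (epfd == -1 || write(p[1], "x", 1) != 1)
        goto out;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto out;

    reports = 0;
    for (int i = 0; i < calls; i++) {
        struct epoll_event got;
        int n = epoll_wait(epfd, &got, 1, 0);
        if (n == -1) {
            reports = -1;
            break;
        }
        reports += n;   /* the byte is deliberately never read */
    }

out:
    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return reports;
}
```

Over three calls, the level-triggered variant should report the pipe three times, because it is still readable each time, while the edge-triggered variant should report it once, because no new transition occurs after the first report. This is why code using ET is usually written to read until the descriptor is drained.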
Resources:
Analysis of Linux kernel Poll/select/epoll implementation
Epoll Source code Implementation analysis [finishing]
Epoll Read and write mode in LT and et modes
Epoll for I/O multiplexing