Detailed description of the differences between select, poll, and epoll (1)

Source: Internet
Author: User
1 select function
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
1.1 Parameter description
The first parameter, nfds, is the highest-numbered descriptor added to any of the sets, plus 1. An fd_set is a bit array whose size is limited by FD_SETSIZE (typically 1024); each bit indicates whether the corresponding descriptor needs to be checked.
The second, third, and fourth parameters are the descriptor sets to watch for read, write, and exceptional conditions. They are both input and output parameters: the kernel overwrites them to indicate which descriptors have events. The sets therefore have to be re-initialized before every select call.
The fifth parameter is the timeout. On Linux the kernel modifies this structure so that it holds the time remaining, so it should also be re-initialized before each call.
select corresponds to the sys_select call in the kernel. sys_select first copies the fd_sets pointed to by the second, third, and fourth parameters into the kernel, then polls every descriptor that is set and records the outcome in a temporary result set. If any event has occurred, select writes the temporary result to user space and returns. If no event is found after a full pass and a timeout was specified, select sleeps until it is woken by an event or the timeout expires, then polls again, writes the temporary result to user space, and returns.
After select returns, use FD_ISSET to check whether each descriptor of interest is still set, that is, whether its event occurred.
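The call sequence described above can be sketched as follows. This is a minimal illustration, not a reference implementation: select_readable and select_demo are hypothetical names, and the demo assumes a POSIX system where a pipe stands in for a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: report which of fds[0..n-1] are readable within
 * timeout_ms. Returns the select() result (>0 ready, 0 timeout, -1 error). */
int select_readable(const int *fds, size_t n, int timeout_ms, int *ready)
{
    fd_set readfds;
    struct timeval tv;
    int maxfd = -1;

    FD_ZERO(&readfds);                  /* rebuilt on every call */
    for (size_t i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE); /* fd_set cannot hold more */
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
        ready[i] = 0;
    }

    tv.tv_sec = timeout_ms / 1000;      /* reset too: the kernel may modify it */
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest descriptor plus 1 */
    int rc = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    for (size_t i = 0; rc > 0 && i < n; i++)
        ready[i] = FD_ISSET(fds[i], &readfds) ? 1 : 0;
    return rc;
}

/* Demo on a pipe: returns 1 if the read end is idle before a write
 * and reported readable after it. */
int select_demo(void)
{
    int p[2], ready = 0;
    if (pipe(p) != 0)
        return -1;
    int idle = select_readable(&p[0], 1, 10, &ready) == 0 && ready == 0;
    int wrote = write(p[1], "x", 1) == 1;
    int woke = select_readable(&p[0], 1, 10, &ready) == 1 && ready == 1;
    close(p[0]);
    close(p[1]);
    return idle && wrote && woke;
}
```

Calling select_demo() from main is enough to try it; the point of the sketch is that both the set and the timeout are rebuilt inside the helper on every call.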
2 poll function
struct pollfd {
    int fd;         // file descriptor
    short events;   // event mask to monitor
    short revents;  // event mask returned by the kernel
};
typedef unsigned long nfds_t;
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
 
2.1 Parameter description
poll differs from select in that it passes the events of interest to the kernel in an array of pollfd structures, so there is no fixed limit on the number of descriptors. The events field holds the events to watch and the revents field holds the events that occurred, so the pollfd array only needs to be initialized once.

The first parameter, fds, is the array holding the socket descriptors whose status is to be checked. The kernel does not clear this array on each call, which is convenient, and when there are many socket connections it can improve processing efficiency to some extent. This is unlike select(), which overwrites the descriptor sets it is given, so the descriptors must be added to the sets again before every select() call. For this reason select() is better suited to a small number of socket descriptors, while poll() is better suited to a large number.
The second parameter, nfds, is the total number of structure elements in the fds array.
The third parameter, timeout, is how long the poll call may block, in milliseconds.
poll is implemented much like select. It corresponds to sys_poll in the kernel, but poll passes the pollfd array to the kernel, which then polls each descriptor in it. Because the array does not have to be rebuilt for every call, poll is somewhat more efficient to use than select with fd_set.
After poll returns, check the revents value of each element of the pollfd array to see whether an event occurred.
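As a rough illustration of the parameters above, the following sketch fills in one pollfd once and reuses it for two calls. The name poll_demo is hypothetical, and a pipe is assumed as a stand-in for a socket.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Demo on a pipe: returns 1 if poll() times out before a write and
 * reports POLLIN after it, using the same pollfd for both calls. */
int poll_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    /* Filled in once; poll() only writes to revents. */
    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };

    int idle = poll(&pfd, 1, 10) == 0 && pfd.revents == 0;   /* timeout */
    int wrote = write(p[1], "x", 1) == 1;
    int woke = poll(&pfd, 1, 10) == 1 && (pfd.revents & POLLIN) != 0;

    assert(pfd.fd == p[0] && pfd.events == POLLIN);  /* inputs untouched */
    close(p[0]);
    close(p[1]);
    return idle && wrote && woke;
}
```

Unlike the fd_set passed to select, the fd and events fields survive the call, which is why no re-initialization appears between the two poll() calls.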
2.2 Return value
> 0: the number of socket descriptors in the fds array that are ready for reading or writing or have an error.
= 0: no socket descriptor in the fds array is ready for reading or writing or has an error. In this case poll timed out after timeout milliseconds. In other words, if no event occurs on the monitored descriptors, poll() blocks for timeout milliseconds and then returns. If timeout is 0, poll() returns immediately without blocking. If timeout is INFTIM (on Linux, any negative value such as -1), poll() blocks until an event of interest occurs on one of the monitored descriptors; if no such event ever occurs, poll() blocks forever.
-1: the poll call failed, and the global variable errno is set.

If the fd member of an entry is negative, that entry is ignored: its events member is not examined, so the events registered there are ignored as well. When poll() returns, the revents member of that entry is set to 0, indicating that no event occurred.
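One way this rule is sometimes used is to switch an entry off temporarily without removing it from the array. The sketch below assumes that approach; pollfd_disable, pollfd_enable, and negative_fd_demo are hypothetical names, and the bitwise complement is just one possible way to make a descriptor negative and recoverable.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helpers: ~fd is negative for every fd >= 0 (including 0),
 * and applying ~ again restores the original value. */
void pollfd_disable(struct pollfd *p) { if (p->fd >= 0) p->fd = ~p->fd; }
void pollfd_enable(struct pollfd *p)  { if (p->fd < 0)  p->fd = ~p->fd; }

/* Demo on a pipe that already holds data: returns 1 if the disabled entry
 * is skipped (revents == 0) and the re-enabled entry reports POLLIN. */
int negative_fd_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int wrote = write(p[1], "x", 1) == 1;   /* read end is now readable */

    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };

    pollfd_disable(&pfd);
    int skipped = poll(&pfd, 1, 0) == 0 && pfd.revents == 0;

    pollfd_enable(&pfd);
    assert(pfd.fd == p[0]);
    int seen = poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;

    close(p[0]);
    close(p[1]);
    return wrote && skipped && seen;
}
```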
In addition, poll() is not affected by the O_NDELAY or O_NONBLOCK flag on a socket descriptor: whether the socket is blocking or non-blocking, poll() waits as directed by its timeout. The same holds for select(), which reports readiness regardless of these flags. What the flags change is the behavior of the I/O call that follows. On a blocking socket, a call such as read(), write(), or accept() can still block after select() or poll() has returned, so the communication remains blocking TCP communication. On a non-blocking socket, the call returns immediately with EAGAIN or EWOULDBLOCK when it cannot proceed, so combining select() or poll() with non-blocking sockets gives non-blocking TCP communication.
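A small sketch of how the O_NONBLOCK flag is typically set and what it changes. The names set_nonblocking and nonblock_demo are hypothetical, and an empty pipe is assumed in place of a socket.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's status flags. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Demo on an empty pipe: returns 1 if poll() still waits out its timeout
 * on the non-blocking descriptor while read() fails at once with EAGAIN. */
int nonblock_demo(void)
{
    int p[2];
    char c;
    if (pipe(p) != 0)
        return -1;
    int flagged = set_nonblocking(p[0]) == 0;

    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };
    int polled = poll(&pfd, 1, 10);     /* the flag does not change this */

    ssize_t n = read(p[0], &c, 1);      /* would block, so it fails instead */
    int saved = errno;

    close(p[0]);
    close(p[1]);
    return flagged && polled == 0 && n == -1 &&
           (saved == EAGAIN || saved == EWOULDBLOCK);
}
```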
Therefore, poll() and select() serve the same purpose and their return values have the same meaning; the two differ mainly in interface and internal implementation. select() is available on practically every platform that supports descriptor-based socket I/O (such as Linux, UNIX, Windows, and macOS), so it is highly portable. poll() is specified by POSIX and is available on Linux and on UNIX systems such as SunOS, Solaris, AIX, and HP-UX, but Windows does not provide it natively (Winsock offers the similar WSAPoll), so it is somewhat less portable.
2.3 Commonly detected events
The commonly detected events are POLLIN/POLLRDNORM (readable), POLLOUT/POLLWRNORM (writable), and POLLERR (error). To watch several events on one descriptor, combine the constants with bitwise OR. For example, to watch for read, write, and error events on a socket descriptor fd stored at index nIndex:
struct pollfd fds[MAX_FDS]; // MAX_FDS is a placeholder for the array size
fds[nIndex].fd = fd;
fds[nIndex].events = POLLIN | POLLOUT | POLLERR; // POLLERR is always reported in revents, so requesting it is optional
When poll() returns, test revents to determine which events occurred on each monitored socket descriptor:
Check for readable data or an incoming TCP connection request:
if ((fds[nIndex].revents & POLLIN) == POLLIN) {
    // receive data, or call accept() to accept the connection request
}
Check for writability:
if ((fds[nIndex].revents & POLLOUT) == POLLOUT) {
    // send data
}
Check for errors:
if ((fds[nIndex].revents & POLLERR) == POLLERR) {
    // exception handling
}
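Besides the events that were requested, POLLERR, POLLHUP, and POLLNVAL can appear in revents even when they were not set in events. The sketch below illustrates this with POLLHUP on a pipe whose write end has been closed, as observed on Linux; revents_for_pollin and hangup_demo are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: return the revents reported for fd when only
 * POLLIN is requested, or -1 if poll() itself fails. */
int revents_for_pollin(int fd)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}

/* Demo: returns 1 if POLLHUP is reported for the read end of a pipe
 * after its only writer has been closed. */
int hangup_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[1]);                        /* no writer left */
    int revents = revents_for_pollin(p[0]);
    close(p[0]);
    return revents != -1 && (revents & POLLHUP) != 0;
}
```

A loop that handles revents therefore usually checks the error and hang-up bits in addition to the bits it asked for.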
3 epoll function
3.1 Parameter description
epoll mainly consists of three functions: epoll_create, epoll_ctl, and epoll_wait.

epoll_create creates the epoll file descriptor. The size parameter does not limit the maximum number of descriptors that epoll can monitor; it is only a hint to the kernel for the initial allocation of internal data structures. Once created, the epoll handle itself occupies one fd, which on Linux can be seen with "ls /proc/<process ID>/fd". close() must therefore be called when the epoll instance is no longer needed; otherwise fds may be exhausted.
Example: int epfd = epoll_create(size);
epoll_ctl is the event registration function; it performs the operation op on the specified descriptor fd.
The first parameter is the return value of epoll_create;
The second parameter is the operation, expressed with one of the following three macros:
EPOLL_CTL_ADD // register a new fd with epfd;
EPOLL_CTL_MOD // modify the events monitored for an already registered fd;
EPOLL_CTL_DEL // remove an fd from epfd;
The third parameter is the fd to be monitored.
The fourth parameter tells the kernel which events to monitor. events can be a combination of the following macros:
EPOLLIN // the corresponding file descriptor is readable (including an orderly shutdown by the peer socket);
EPOLLOUT // the corresponding file descriptor is writable;
EPOLLPRI // the corresponding file descriptor has urgent data to read (that is, out-of-band data has arrived);
EPOLLERR // an error has occurred on the corresponding file descriptor;
EPOLLHUP // the corresponding file descriptor has been hung up;
EPOLLET // put epoll into edge-triggered mode for this descriptor, as opposed to level-triggered mode;
EPOLLONESHOT // report an event only once. To keep monitoring this socket after the event has been reported,
// the socket has to be re-armed in the epoll queue.
When the peer closes the connection (FIN) or an error occurs (EPOLLERR), the descriptor is also reported as readable, so both cases can be handled as EPOLLIN events: the subsequent read returns 0 in the first case and -1 in the second.
Example:
struct epoll_event ev;
ev.data.fd = listenfd; // file descriptor associated with the event to be handled
ev.events = EPOLLIN | EPOLLET; // event types to be handled
epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev); // register the epoll event
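The registration step can be wrapped in a small helper, sketched below under the assumption of a Linux system. epoll_add and epoll_add_demo are hypothetical names, and a pipe takes the place of listenfd.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd with epfd for the given event mask. */
int epoll_add(int epfd, int fd, uint32_t events)
{
    assert(epfd >= 0 && fd >= 0);
    struct epoll_event ev = { 0 };
    ev.events = events;
    ev.data.fd = fd;            /* handed back unchanged by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Demo: returns 1 if the first registration succeeds, a second
 * EPOLL_CTL_ADD for the same fd fails with EEXIST, and EPOLL_CTL_DEL
 * removes the registration. */
int epoll_add_demo(void)
{
    int p[2];
    int epfd = epoll_create(1);         /* size is only a hint */
    if (epfd == -1)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }

    int added = epoll_add(epfd, p[0], EPOLLIN) == 0;
    int rejected = epoll_add(epfd, p[0], EPOLLIN) == -1 && errno == EEXIST;
    int removed = epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) == 0;

    close(p[0]);
    close(p[1]);
    close(epfd);                        /* the epoll handle is an fd too */
    return added && rejected && removed;
}
```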
epoll_wait waits for I/O events on epfd and returns at most maxevents events.
The first parameter is the return value of epoll_create(), that is, the file descriptor dedicated to epoll;
The second parameter is the array in which the ready events are returned;
The third parameter is the maximum number of events that can be returned per call; it tells the kernel how large the events array is. maxevents must be greater than zero and must not exceed the size of the array;
The fourth parameter is the timeout for waiting for I/O events, in milliseconds. 0 returns immediately, and -1 blocks indefinitely.
How epoll_wait works: it waits for events on the socket fds registered with epfd. When events occur, the socket fds and their event types are placed in the events array. If a socket fd was registered with EPOLLONESHOT, its registered event types are then disabled. To monitor that socket fd again in the next loop iteration, reset its event types with epoll_ctl(epfd, EPOLL_CTL_MOD, listenfd, &ev). EPOLL_CTL_ADD is not used here, because the socket fd itself is still registered and only its event types were disabled. This step is essential in that mode; without EPOLLONESHOT the registration stays in effect and no reset is needed.
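The re-arming step with EPOLL_CTL_MOD can be observed with a registration that uses EPOLLONESHOT. The following is a sketch assuming Linux; oneshot_demo is a hypothetical name and a pipe with one unread byte stands in for the socket.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Demo: returns 1 if a one-shot registration reports the unread byte once,
 * stays silent on the next wait, and reports again after EPOLL_CTL_MOD. */
int oneshot_demo(void)
{
    int p[2];
    struct epoll_event ev = { 0 }, out;
    int epfd = epoll_create(1);
    if (epfd == -1)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }
    int wrote = write(p[1], "x", 1) == 1;   /* left unread on purpose */

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    int added = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0;

    int first = epoll_wait(epfd, &out, 1, 0);   /* event delivered */
    int second = epoll_wait(epfd, &out, 1, 0);  /* disabled after delivery */

    /* Re-arm: the fd is still registered, so MOD is used, not ADD. */
    int rearmed = epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0;
    int third = epoll_wait(epfd, &out, 1, 0);   /* delivered again */

    assert(!added || first != 1 || out.data.fd == p[0]);
    close(p[0]);
    close(p[1]);
    close(epfd);
    return wrote && added && rearmed &&
           first == 1 && second == 0 && third == 1;
}
```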
3.2 Return value
epoll_create returns an epoll-specific file descriptor. In effect it reserves space in the kernel to record whether events have occurred on the socket fds you are interested in, and which events.
epoll_ctl returns 0 on success and -1 on failure.
epoll_wait returns the number of events to be processed. A return value of 0 means the call timed out.
3.3 Difference between epoll and select/poll
With select/poll, the kernel scans all monitored file descriptors only when the call is made. With epoll, file descriptors are registered in advance through epoll_ctl(); once a descriptor becomes ready, the kernel uses a callback mechanism to activate it quickly, and the process is notified when it calls epoll_wait().
epoll differs from select and poll in two ways. First, the event descriptions do not have to be copied into the kernel on every call: after the first registration, the event information stays associated with the epoll descriptor. Second, epoll does not poll the descriptors; instead it registers a callback on each descriptor's wait queue. When an event occurs, the callback places the event on the ready list, and the ready events are then written to user space.
After epoll_wait returns, its events parameter points to a buffer that holds only ready events, so each element of the buffer can be processed directly, without checking every descriptor as poll and select require.
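To make the last point concrete, the sketch below registers two descriptors and makes only one of them readable; the loop then visits just the entries that epoll_wait returned. It assumes Linux, and ready_only_demo and MAX_EVENTS are hypothetical names.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 8    /* hypothetical size of the result buffer */

/* Demo: two pipes are registered, only the second receives data.
 * Returns 1 if epoll_wait() hands back exactly that one descriptor. */
int ready_only_demo(void)
{
    int a[2], b[2], ok = 0;
    struct epoll_event ev = { 0 }, events[MAX_EVENTS];
    int epfd = epoll_create(1);
    if (epfd == -1)
        return -1;
    if (pipe(a) != 0) {
        close(epfd);
        return -1;
    }
    if (pipe(b) != 0) {
        close(a[0]); close(a[1]); close(epfd);
        return -1;
    }

    ev.events = EPOLLIN;
    ev.data.fd = a[0];
    int reg_a = epoll_ctl(epfd, EPOLL_CTL_ADD, a[0], &ev) == 0;
    ev.data.fd = b[0];
    int reg_b = epoll_ctl(epfd, EPOLL_CTL_ADD, b[0], &ev) == 0;

    if (reg_a && reg_b && write(b[1], "x", 1) == 1) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, 10);
        assert(n <= MAX_EVENTS);
        ok = (n == 1);
        for (int i = 0; i < n; i++)     /* only ready descriptors are here */
            if (events[i].data.fd != b[0])
                ok = 0;
    }

    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    close(epfd);
    return ok;
}
```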
3.4 Advantages of epoll
1. The number of monitored descriptors is not limited by the interface. The maximum number of fds that can be monitored is the maximum number of files that can be opened, which is generally far greater than 2048; for example, on a machine with 1 GB of memory it is about 100,000. The exact number can be checked with cat /proc/sys/fs/file-max and depends largely on system memory. The biggest drawback of select is the limit on the number of fds a process can monitor, which makes it unsuitable for servers with a large number of connections. A multi-process design is an alternative (Apache is implemented this way), but although creating a process on Linux is relatively cheap, the cost cannot be ignored, and data synchronization between processes is far less efficient than synchronization between threads, so it is not a perfect solution.
2. I/O efficiency does not decrease as the number of monitored fds grows. Unlike select and poll, epoll does not poll; it is implemented through a callback defined on each fd, and only ready fds execute the callback.
3. Both level-triggered and edge-triggered modes are supported. Edge triggering tells the process only which file descriptors have just become ready, and it says so only once: if no action is taken, it does not tell again. In theory edge triggering performs better, but the code is considerably more complicated.
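4. Less information is transferred between the kernel and user space. It is often said that epoll achieves this by sharing memory between the kernel and user space through mmap, but the mainline Linux implementation does not map memory this way. The saving comes from registering each descriptor once and copying only the ready events back to user space on each epoll_wait() call, which avoids unnecessary memory copies.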
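The limits mentioned in item 1 can also be inspected from a program. The sketch below is an illustration only: it assumes that the per-process soft limit RLIMIT_NOFILE is the relevant bound for a single server process (file-max is the system-wide figure), and open_file_soft_limit and print_fd_limits are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: soft limit on open descriptors for this process,
 * or 0 if the limit cannot be read. */
unsigned long long open_file_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Prints the fixed ceiling of select() next to the per-process limit
 * that bounds how many descriptors epoll can be asked to monitor. */
void print_fd_limits(void)
{
    printf("FD_SETSIZE (select ceiling): %d\n", (int)FD_SETSIZE);
    printf("RLIMIT_NOFILE soft limit:    %llu\n", open_file_soft_limit());
}
```

FD_SETSIZE is a compile-time constant, whereas the soft limit can usually be raised up to the hard limit, for example with ulimit -n in the shell.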
3.5 epoll models
epoll events have two models: level-triggered (LT) and edge-triggered (ET).
LT (level-triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready fd. If you do nothing, the kernel keeps notifying you, so programming errors are less likely in this mode. Traditional select/poll are representative of this model.
ET (edge-triggered) is the high-speed mode and should only be used with non-blocking sockets. In this mode the kernel notifies you through epoll when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until new data arrives.
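The difference between the two models can be observed directly. The sketch below, which assumes Linux and uses a pipe in place of a socket, counts how often one unread byte is reported across two consecutive waits, and shows the read-until-EAGAIN loop that ET code typically relies on. count_reports, drain_nonblocking, and drain_demo are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers a pipe holding one unread byte with EPOLLIN | extra_flags and
 * returns how many of two back-to-back waits report it (-1 on setup error). */
int count_reports(uint32_t extra_flags)
{
    int p[2], reports = -1;
    struct epoll_event ev = { 0 }, out;
    int epfd = epoll_create(1);
    if (epfd == -1)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];
    if (write(p[1], "x", 1) == 1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        reports = 0;
        for (int i = 0; i < 2; i++)     /* nothing is read in between */
            if (epoll_wait(epfd, &out, 1, 0) == 1)
                reports++;
    }
    close(p[0]);
    close(p[1]);
    close(epfd);
    return reports;
}

/* With ET, read until EAGAIN so no data is left behind without a pending
 * notification. fd must be non-blocking. Returns bytes read, -1 on error. */
long drain_nonblocking(int fd)
{
    char buf[256];
    long total = 0;
    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) break;                                  /* end of file */
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) break; /* drained */
        return -1;
    }
    return total;
}

/* Demo: drains five bytes from a non-blocking pipe; returns bytes read. */
long drain_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    long got = -1;
    if (fcntl(p[0], F_SETFL, O_NONBLOCK) != -1 &&
        write(p[1], "hello", 5) == 5)
        got = drain_nonblocking(p[0]);
    close(p[0]);
    close(p[1]);
    return got;
}
```

count_reports(0) exercises the default LT mode, and count_reports(EPOLLET) the ET mode; the draining loop is what keeps an ET program from stalling on data that will not be announced again.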

