Difference between select, poll and epoll


The select() system call provides a mechanism for synchronous multiplexed I/O:


#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select(int n,
           fd_set *readfds,
           fd_set *writefds,
           fd_set *exceptfds,
           struct timeval *timeout);

FD_CLR(int fd, fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);
FD_ZERO(fd_set *set);

A call to select() blocks until the given file descriptors are ready to perform I/O, or until an optionally specified timeout has elapsed.
The watched file descriptors are broken into three sets, each waiting for a different event. The file descriptors listed in readfds are watched to see whether data is available for reading (that is, whether a read operation will complete without blocking). The file descriptors listed in writefds are watched to see whether a write operation will complete without blocking. Finally, the file descriptors listed in exceptfds are watched for exceptions or for the availability of out-of-band data (these states apply only to sockets). Any of the three sets may be NULL, in which case select() does not watch for that type of event.
On successful return, each set is modified so that it contains only the file descriptors that are ready for I/O. For example, assume two file descriptors, with the values 7 and 9, are placed in readfds. When the call returns, if 7 is still in the set, that file descriptor is ready to read without blocking. If 9 is no longer in the set, reading from it would probably block. (I say probably because the data may have become available just after the call returned; in that case, a subsequent call to select() would report the file descriptor as ready to read.)
The first parameter, n, is equal to the value of the highest-numbered file descriptor in any of the sets, plus one. Consequently, the caller of select() is responsible for checking which of its file descriptors has the highest value and passing that value plus one as the first parameter.
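For example, given two hypothetical descriptors already placed in the sets (the names below are placeholders, not from the text), the first argument could be computed with a small helper like this sketch:

/* Sketch only: select()'s first argument is the larger of the two
 * hypothetical descriptors, plus one. */
static inline int select_nfds(int fd_a, int fd_b)
{
        return ((fd_a > fd_b) ? fd_a : fd_b) + 1;
}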
The timeout parameter is a pointer to a timeval structure, which is defined as follows:
#include <sys/time.h>

struct timeval {
        long tv_sec;     /* seconds */
        long tv_usec;    /* microseconds */
};

If this parameter is not NULL, select() returns after tv_sec seconds and tv_usec microseconds, even if no file descriptor is ready for I/O. On return, the state of the timeout parameter differs between systems and should be treated as undefined, so the timeout (along with the file descriptor sets) must be reinitialized before each call to select(). In fact, current versions of Linux modify the timeout parameter automatically, setting it to the time remaining. Thus, if the timeout was 5 seconds and 3 seconds elapsed before a file descriptor became ready, tv_sec would contain 2 when the call returned.
If both values in the timeout are set to 0, the call to select() returns immediately, reporting all events pending at the time of the call but not waiting for any subsequent events.
File descriptor sets are not manipulated directly; instead, they are managed through helper macros. This allows Unix systems to implement the sets however they want; most systems, however, implement them as simple bit arrays. FD_ZERO removes all file descriptors from the specified set. It should be called before every invocation of select():
fd_set writefds;
FD_ZERO(&writefds);

FD_SET adds a file descriptor to a given set, and FD_CLR removes a file descriptor from a given set:
FD_SET(fd, &writefds);    /* add 'fd' to the set */
FD_CLR(fd, &writefds);    /* oops, remove 'fd' from the set */

Well-designed code should never have to use FD_CLR, and it is only rarely used in practice.
FD_ISSET tests whether a file descriptor is part of a given set. It returns a nonzero integer if the file descriptor is in the set and 0 if it is not. FD_ISSET is used after a call to select() returns, to test whether a given file descriptor is ready for the requested action:
if (FD_ISSET(fd, &readfds))
        /* 'fd' is readable without blocking! */
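Putting the pieces together, here is a minimal sketch (mine, not from the original text) that waits up to 5 seconds for standard input to become readable; note that the set and the timeout are initialized before the call, as discussed above:

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
        fd_set readfds;
        struct timeval tv;
        int ret;

        /* Reinitialize the set and the timeout before every call. */
        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds);
        tv.tv_sec = 5;
        tv.tv_usec = 0;

        /* n is the highest-numbered descriptor being watched, plus one. */
        ret = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv);
        if (ret == -1)
                perror("select");
        else if (ret == 0)
                printf("timeout: nothing to read after 5 seconds\n");
        else if (FD_ISSET(STDIN_FILENO, &readfds))
                printf("stdin is readable without blocking\n");

        return 0;
}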

Because the file descriptor sets are created statically, they impose a limit on the maximum number of file descriptors: the largest file descriptor value that can be placed in a set is given by FD_SETSIZE, which on Linux is 1024. We will look at the consequences of this limit later in this chapter.
Return values and error codes
On success, select() returns the number of file descriptors ready for I/O, across all three sets. If a timeout was provided, the return value may be 0. On error, the call returns -1, and errno is set to one of the following values:
EBADF
An invalid file descriptor was given in one of the sets.
EINTR
A signal was caught while waiting, and the call can be reissued.
EINVAL
The parameter n is negative, or the given timeout is invalid.
ENOMEM
There was insufficient memory available to complete the request.
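Since EINTR only means that a signal interrupted the wait, callers commonly just retry the call. The following is a sketch of that pattern (wait_readable and watch_fd are illustrative names, not from the text), rebuilding the set and timeout on each pass as required:

#include <errno.h>
#include <sys/select.h>

/* Sketch only: retry select() when it is interrupted by a signal (EINTR).
 * watch_fd is a placeholder for whatever descriptor is being watched. */
static int wait_readable(int watch_fd)
{
        fd_set readfds;
        struct timeval tv;
        int ret;

        do {
                FD_ZERO(&readfds);              /* the set and the timeout must */
                FD_SET(watch_fd, &readfds);     /* be rebuilt on every pass     */
                tv.tv_sec = 5;
                tv.tv_usec = 0;
                ret = select(watch_fd + 1, &readfds, NULL, NULL, &tv);
        } while (ret == -1 && errno == EINTR);

        return ret;     /* -1 on real error, 0 on timeout, >0 if readable */
}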
--------------------------------------------------------------------------------

poll() is System V's multiplexed I/O solution. It addresses several deficiencies of select(), although select() is still frequently used (mostly out of habit, or in the name of portability):

#include <sys/poll.h>

int poll(struct pollfd *fds, unsigned int nfds, int timeout);

Unlike select(), with its three inefficient bitmask-based file descriptor sets, poll() uses a single array of pollfd structures, pointed to by fds. The pollfd structure is defined as follows:

#include <sys/poll.h>

struct pollfd {
        int fd;           /* file descriptor */
        short events;     /* requested events to watch */
        short revents;    /* returned events witnessed */
};

Each pollfd structure specifies a single file descriptor to watch. Multiple structures may be passed, instructing poll() to watch multiple file descriptors. The events field of each structure is a bitmask of the events to watch for on that file descriptor; it is set by the user. The revents field is a bitmask of the events that actually occurred on the file descriptor; the kernel sets this field on return. Any of the events requested in the events field may be returned in the revents field. Valid events are as follows:
POLLIN
There is data to read.
POLLRDNORM
There is normal data to read.
POLLRDBAND
There is priority data to read.
POLLPRI
There is urgent data to read.
POLLOUT
Writing will not block.
POLLWRNORM
Writing normal data will not block.
POLLWRBAND
Writing priority data will not block.
POLLMSG
A SIGPOLL message is available.

In addition, the following events may be returned in the revents field:
POLLERR
An error occurred on the given file descriptor.
POLLHUP
A hangup occurred on the given file descriptor.
POLLNVAL
The given file descriptor is invalid.

These events are meaningless in the events field, because they are always returned in revents when appropriate. With poll(), unlike select(), you do not need to explicitly ask for exceptions to be reported.
POLLIN | POLLPRI is equivalent to select()'s read event, and POLLOUT | POLLWRBAND is equivalent to select()'s write event. POLLIN is equivalent to POLLRDNORM | POLLRDBAND, and POLLOUT is equivalent to POLLWRNORM.
For example, to watch a file descriptor for both readability and writability, we would set events to POLLIN | POLLOUT. On return, we check revents for the flags that correspond to the events requested for that file descriptor. If POLLIN is set, the file descriptor can be read without blocking. If POLLOUT is set, the file descriptor can be written without blocking. The flags are not mutually exclusive: both may be set, indicating that both a read and a write will return without blocking.
The timeout parameter specifies the length of time to wait, in milliseconds, before poll() returns regardless of whether any I/O is ready. A negative timeout denotes an infinite wait. A timeout of 0 causes the call to return immediately, listing any file descriptors with I/O already pending but not waiting for further events; in this sense, poll() is true to its name, polling once and returning at once.
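As a minimal sketch of the interface just described (mine, not from the book), the following waits up to 5 seconds for standard input to become readable:

#include <stdio.h>
#include <poll.h>       /* <sys/poll.h> also works on Linux */
#include <unistd.h>

int main(void)
{
        struct pollfd fds[1];
        int ret;

        fds[0].fd = STDIN_FILENO;       /* watch standard input             */
        fds[0].events = POLLIN;         /* ... for readability              */

        ret = poll(fds, 1, 5000);       /* timeout is given in milliseconds */
        if (ret == -1)
                perror("poll");
        else if (ret == 0)
                printf("timeout: no events after 5 seconds\n");
        else if (fds[0].revents & POLLIN)
                printf("stdin is readable without blocking\n");

        return 0;
}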
Return values and error codes
On success, poll() returns the number of file descriptors whose revents fields are nonzero. If no events occur before the timeout expires, poll() returns 0. On failure, poll() returns -1 and sets errno to one of the following values:
EBADF
An invalid file descriptor was given in one or more of the structures.
EFAULT
The pointer fds points outside the process's address space.
EINTR
A signal occurred before any requested event; the call may be reissued.
EINVAL
The nfds parameter exceeded the RLIMIT_NOFILE value.
ENOMEM
There was insufficient memory available to complete the request.
--------------------------------------------------------------------------------
The above content is from O'Reilly's Linux System Programming: Talking Directly to the Kernel and C Library (2007).
--------------------------------------------------------------------------------

Advantages of epoll:
1. A single process can open a large number of socket descriptors (FDs)
The most intolerable thing about select is that the number of FDs a single process can open is limited, set by FD_SETSIZE; the default value is 2048. For an IM server that needs to support tens of thousands of connections, that is clearly far too few. You can modify this macro and recompile the kernel, but sources also point out that this reduces network efficiency. Alternatively, you can choose a multi-process solution (the traditional Apache approach); although the cost of creating a process on Linux is relatively small, it is still not negligible, and data synchronization between processes is far less efficient than synchronization between threads, so that is not a perfect solution either. epoll has no such limit: the FD limit it supports is the maximum number of files the system can open, which is generally much larger than 2048 (on a machine with 1 GB of memory it is roughly 100,000). You can check the number with cat /proc/sys/fs/file-max; in general it is closely related to the amount of system memory.
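As an aside, a process can inspect its own per-process descriptor limit with getrlimit(); the sketch below is illustrative and not part of the original article:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl;

        /* RLIMIT_NOFILE is the per-process limit on open file descriptors. */
        if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
                printf("soft limit: %llu, hard limit: %llu\n",
                       (unsigned long long) rl.rlim_cur,
                       (unsigned long long) rl.rlim_max);

        return 0;
}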

2. I/O efficiency does not decline linearly as the number of FDs grows
Another fatal weakness of traditional select/poll is that when you hold a large set of sockets, only some of which are "active" at any given moment because of network latency, every select/poll call still linearly scans the entire set, so efficiency declines linearly. epoll does not have this problem: it only operates on the "active" sockets. This is because, in the kernel implementation, epoll registers a callback function on each fd; only "active" sockets invoke the callback, while idle sockets do not. In this respect epoll implements a kind of "pseudo" AIO, with the driving force in the OS kernel. In some benchmarks, if essentially all the sockets are active (for example, in a high-speed LAN environment), epoll is no more efficient than select/poll; on the contrary, heavy use of epoll_ctl makes it slightly slower. But once idle connections are used to simulate a WAN environment, epoll is far more efficient than select/poll. A sketch of the usual epoll calling pattern follows below.
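Although the article does not show the epoll calls themselves, the usual pattern is epoll_create1()/epoll_ctl()/epoll_wait(); the sketch below is illustrative only, and listen_fd stands in for an already-created listening socket:

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

/* Sketch only: listen_fd is assumed to be an already-created, bound, and
 * listening socket; error handling is abbreviated. */
static void event_loop(int listen_fd)
{
        struct epoll_event ev, events[MAX_EVENTS];
        int epfd, nready, i;

        epfd = epoll_create1(0);            /* create the epoll instance */
        ev.events = EPOLLIN;                /* watch for readability     */
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
                /* Only descriptors that actually became ready are returned;
                 * idle descriptors add no per-call scanning cost. */
                nready = epoll_wait(epfd, events, MAX_EVENTS, -1);
                if (nready == -1)
                        break;
                for (i = 0; i < nready; i++) {
                        if (events[i].data.fd == listen_fd)
                                printf("listening socket is readable\n");
                        /* accept()/read()/write() handling would go here */
                }
        }

        close(epfd);
}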

3. mmap is used to accelerate message passing between the kernel and user space
This touches on epoll's concrete implementation. Whether you use select, poll, or epoll, the kernel must notify user space about FD events, and avoiding unnecessary memory copies is important; epoll achieves this by having user space mmap the same memory as the kernel. If, like me, you have followed epoll since the 2.5 kernel, you will not have forgotten the manual mmap step.

4. Kernel fine-tuning
This is not really an advantage of epoll but of the Linux platform as a whole. You may have doubts about the Linux platform, but you cannot deny that it gives you the ability to fine-tune the kernel. For example, the kernel TCP/IP stack uses a memory pool to manage sk_buff structures, and the size of this pool (skb_head_pool) can be adjusted dynamically at runtime by echoing a value into /proc/sys/net/core/hot_list_length. Another example is the second parameter of listen() (the length of the queue of connections that have completed the TCP three-way handshake), which can likewise be tuned to match the memory of your platform. On a specialized system that handles a huge number of packets, each of which is tiny, you can even try the latest NAPI NIC driver architecture.
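As a small illustration of the point about listen()'s second parameter (a sketch only; sockfd stands in for an already-bound TCP socket):

#include <sys/socket.h>

/* Sketch only: the second argument of listen() is the backlog, i.e. the
 * queue length for connections that have completed the three-way handshake
 * and are waiting to be accepted.  sockfd is a placeholder. */
int start_listening(int sockfd)
{
        return listen(sockfd, 128);
}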

Author: "Linux School"
 
