Select: in essence, the descriptor set is built from 32 32-bit integers, that is, 32 * 32 = 1024 bits, so FD values can range from 0 to 1023. To go beyond the 1024 limit you must modify FD_SETSIZE; the set can then represent FDs in the range 0 to 32 * max, where max is the number of 32-bit words.
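Here is a rough sketch of the bit arithmetic described above: a toy descriptor set built from 32 32-bit words. The type my_fd_set and the my_fd_* functions are hypothetical names used only for illustration. The real fd_set is opaque, and glibc happens to store it as an array of long rather than 32-bit words.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical select-style set: MY_WORDS 32-bit words, one bit per FD. */
#define MY_WORDS   32                    /* 32 words * 32 bits = 1024 FDs */
#define MY_SETSIZE (MY_WORDS * 32)

typedef struct {
    uint32_t bits[MY_WORDS];
} my_fd_set;

void my_fd_zero(my_fd_set *s)
{
    memset(s->bits, 0, sizeof s->bits);
}

/* FD n lives in word n / 32, at bit position n % 32.
 * Caller must guarantee 0 <= fd < MY_SETSIZE. */
void my_fd_set_bit(int fd, my_fd_set *s)
{
    s->bits[fd / 32] |= (uint32_t)1 << (fd % 32);
}

void my_fd_clr(int fd, my_fd_set *s)
{
    s->bits[fd / 32] &= ~((uint32_t)1 << (fd % 32));
}

int my_fd_isset(int fd, const my_fd_set *s)
{
    return (int)((s->bits[fd / 32] >> (fd % 32)) & 1u);
}
```

The word array has a fixed size, so a descriptor of 1024 or more has no bit to live in. That is exactly where the 1024 limit comes from.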
Select is not well suited to a single multi-threaded process in which each thread handles several FDs, for two reasons:
1. Every thread scans the whole range from 0 to 32 * max, even though each thread only handles its own few FD values, which is wasteful.
2. The 1024 upper limit: a process serving many users can easily need far more than 1024 FDs.
In such cases, poll should be used instead.
Poll: the caller passes a pointer to the start of an array of descriptors together with the array length. As long as the array is not very long, performance is good. Poll allocates 4 KB (one page) in the kernel at a time to store the FDs, so the array should be kept within 4 KB where possible.
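The 4 KB figure translates into a concrete entry count. This is a minimal sketch, assuming Linux, where struct pollfd is 8 bytes. The helper pollfds_per_page is hypothetical, and the kernel's actual allocation strategy is not modeled here.

```c
#include <poll.h>
#include <stddef.h>

/* How many pollfd entries fit in one memory page of the given size.
 * On Linux, struct pollfd is int + short + short = 8 bytes,
 * so a 4096-byte page holds 512 entries. */
size_t pollfds_per_page(size_t page_size)
{
    return page_size / sizeof(struct pollfd);
}
```

In practice, you can query the page size with sysconf(_SC_PAGESIZE) from <unistd.h> instead of hard-coding 4096.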
Epoll: a further optimization over poll. After it returns, there is no need to traverse all the FDs, and the FD list is maintained in the kernel. Select and poll, by contrast, maintain the list in user space and pass it to the kernel on every call. However, epoll is only available from the 2.6 kernel onward.
Epoll is best suited to handling a large number of FDs of which only a few are active at any time. After all, when many FDs are active, they are still processed serially.
============================================================================================
I do not know much about select, poll, and epoll myself. The following introduction is taken from the book Building High-Performance Web Sites, among other sources.
Select
Select first appeared in … . It monitors an array of file descriptors through a single select() system call. When select() returns, the kernel has modified the array to mark which descriptors are ready, so the process can pick them out and perform subsequent read/write operations on them.
Select is supported on almost every platform today, and this good cross-platform support is an advantage. In fact, it is one of its few advantages.
One disadvantage of select is that the number of file descriptors a single process can monitor has a maximum, generally 1024 on Linux. This limit can be raised by modifying the macro definition or even recompiling the kernel.
In addition, select() maintains a data structure that holds a large number of file descriptors, and as the number of descriptors grows, the cost of copying it grows linearly. At the same time, because of network latency, many TCP connections are inactive at any given moment. Yet each call to select() still scans all sockets linearly, which also wastes resources.
Poll
Poll was born in System V Release 3 in 1986. It is essentially no different from select, but poll has no limit on the maximum number of file descriptors.
Poll shares a disadvantage with select: an array containing a large number of file descriptors is copied in its entirety between user space and the kernel address space, whether or not those descriptors are ready. The overhead therefore grows linearly with the number of descriptors.
In addition, suppose select() or poll() reports a ready file descriptor to the process and the process performs no I/O on it. The next call to select() or poll() reports that descriptor again, so ready notifications are generally not lost. This behavior is called level-triggered.
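You can observe level-triggered behavior directly with poll(). This small sketch assumes a POSIX system with pipe(); the helper name level_triggered_reports is hypothetical. One byte is written and never read, so each poll() call should report it again.

```c
#include <poll.h>
#include <unistd.h>

/* Returns how many of two consecutive poll() calls report the pipe's
 * read end as readable when the data is deliberately never read.
 * With level-triggered semantics the answer should be 2. */
int level_triggered_reports(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    int reports = 0;
    for (int i = 0; i < 2; i++) {
        p.revents = 0;
        /* timeout 0: check readiness without blocking */
        if (poll(&p, 1, 0) == 1 && (p.revents & POLLIN))
            reports++;
        /* no read() here: the byte stays in the pipe */
    }
    close(fds[0]);
    close(fds[1]);
    return reports;
}
```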
Epoll
It was not until Linux 2.6 that a mechanism implemented directly in the kernel appeared, namely epoll. It has almost all the advantages mentioned above and is widely recognized as the best-performing multiplexed I/O readiness-notification mechanism in Linux 2.6.
Epoll supports both level-triggered and edge-triggered modes. In edge-triggered mode, it tells the process only which file descriptors have just become ready; if we take no action, it will not tell us again. In theory, edge-triggered mode performs better, but the code is considerably more complex to implement.
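The difference between the two modes can be seen with a pipe. In this Linux-only sketch (epoll_reports is a hypothetical helper name), one byte is written and never read, and then epoll_wait() is called twice with a zero timeout. In level-triggered mode both calls should report the descriptor. With EPOLLET only the first should, because no new data arrived in between.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers a pipe's read end with EPOLLIN plus extra_flags (0 for
 * level-triggered, EPOLLET for edge-triggered), writes one byte, and
 * calls epoll_wait() twice without reading. Returns how many calls
 * reported the descriptor, or -1 on setup failure. */
int epoll_reports(unsigned extra_flags)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    int ep = epoll_create1(0);
    if (ep == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = fds[0] };
    int reports = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        reports = 0;
        for (int i = 0; i < 2; i++) {
            struct epoll_event out;
            if (epoll_wait(ep, &out, 1, 0) == 1)   /* timeout 0: don't block */
                reports++;
        }
    }
    close(ep);
    close(fds[0]);
    close(fds[1]);
    return reports;
}
```

In edge-triggered code, the usual consequence is that the reader must drain the descriptor (read until EAGAIN on a non-blocking fd) each time it is notified. This is part of why the text calls such code more complex.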
Epoll also reports only the ready file descriptors. When we call epoll_wait() to obtain them, the return value is not the descriptors themselves but the number of ready descriptors. We then just read that many entries, in order, from the array handed to epoll. Memory mapping (mmap) is also used here, which completely eliminates the cost of copying these file descriptors during the system call.
Another essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only when the process makes the call. With epoll, each file descriptor is registered in advance through epoll_ctl(). Once a descriptor becomes ready, the kernel uses a callback mechanism to activate it quickly, and the process is notified when it calls epoll_wait().
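The following is a sketch of the register-once, wait-many pattern, assuming Linux. make_watched_pipe and collect_ready are hypothetical helpers, and pipes stand in for sockets. epoll_ctl() is called once per descriptor, and epoll_wait() fills in only as many entries as it reports.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Creates a pipe and registers its read end with epoll once, up front.
 * After this the kernel tracks readiness; no per-call re-registration. */
int make_watched_pipe(int ep, int pipefd[2])
{
    if (pipe(pipefd) == -1)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pipefd[0], &ev) == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    return 0;
}

/* epoll_wait() returns a count; only the first `count` entries of the
 * events array are filled in, so there is no scan over all watched FDs.
 * Copies the ready descriptors into ready[] and returns the count. */
int collect_ready(int ep, int *ready, int max, int timeout_ms)
{
    struct epoll_event events[64];
    if (max > 64)
        max = 64;
    int count = epoll_wait(ep, events, max, timeout_ms);
    for (int i = 0; i < count; i++)
        ready[i] = events[i].data.fd;
    return count;
}
```

A caller creates the instance with epoll_create1(0), registers descriptors once, and then calls collect_ready() in its event loop. The registrations persist across calls, unlike the sets passed to select().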
============================================================================================
The select() system call provides a mechanism for synchronous I/O multiplexing:
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select (int n,
            fd_set *readfds,
            fd_set *writefds,
            fd_set *exceptfds,
            struct timeval *timeout);

FD_CLR(int fd, fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);
FD_ZERO(fd_set *set);
A call to select() blocks until the specified file descriptors are ready for I/O, or until the time given by the optional timeout parameter has elapsed.
The monitored file descriptors are divided into three sets, each waiting for a different kind of event:
- Descriptors listed in readfds are watched to see whether data is available for reading, that is, whether a read will complete without blocking.
- Descriptors listed in writefds are watched to see whether a write will complete without blocking.
- Descriptors listed in exceptfds are watched for exceptional conditions or the availability of out-of-band data. These states apply only to sockets.
Any of the three sets may be NULL, in which case select() does not watch for that kind of event.
When select() returns successfully, each set is modified so that it contains only the descriptors that are ready for I/O. For example, suppose two file descriptors, 7 and 9, are placed in readfds. When select() returns, if 7 is still in the set, that descriptor is ready to be read without blocking. If 9 is no longer in the set, reading it may block. (I say "may" because data might become available just after select() returns; in that case, the next call to select() will report the descriptor as ready for reading.)
The first parameter, n, equals the highest-numbered file descriptor in any of the sets plus 1. The caller of select() must therefore find the largest descriptor value it is watching, add 1, and pass the result as the first parameter.
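The following minimal sketch puts together the three sets, the modified-on-return behavior, and the n = max + 1 rule, using two pipes. The helper select_two_pipes is hypothetical, and pipes stand in for sockets. Only the second pipe has data, so only its descriptor should remain in readfds.

```c
#include <sys/select.h>
#include <unistd.h>

/* Puts the read ends of two pipes into readfds, makes only the second
 * readable, and calls select() with n = highest fd + 1.
 * Returns a bitmask: bit 0 if pipe a is still in the set after select()
 * returns, bit 1 if pipe b is. Returns -1 on failure. */
int select_two_pipes(void)
{
    int a[2], b[2];
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int result = -1;
    if (write(b[1], "x", 1) == 1) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(a[0], &readfds);
        FD_SET(b[0], &readfds);
        int n = (a[0] > b[0] ? a[0] : b[0]) + 1;      /* max fd + 1 */
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
        if (select(n, &readfds, NULL, NULL, &tv) == 1)
            result = (FD_ISSET(a[0], &readfds) ? 1 : 0) |
                     (FD_ISSET(b[0], &readfds) ? 2 : 0);
    }
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return result;
}
```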
The timeout parameter is a pointer to a timeval structure, defined as follows:
#include <sys/time.h>

struct timeval {
        long tv_sec;     /* seconds */
        long tv_usec;    /* microseconds */
};
If this parameter is not NULL, select() returns after tv_sec seconds and tv_usec microseconds even if no file descriptor is ready for I/O. On return, the state of the timeout structure is undefined and varies across systems. Both the timeout and the file descriptor sets must therefore be reinitialized before every call to select(). In fact, current versions of Linux modify the timeout parameter automatically, setting it to the time remaining. So if timeout is set to 5 seconds and 3 seconds pass before a file descriptor becomes ready, tv_sec will be 2 when select() returns.
If both fields of timeout are set to 0, select() returns immediately. It reports any events pending at the time of the call but does not wait for any subsequent events.
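Because select() may modify both the sets and the timeout, a safe pattern is to rebuild them on every call. Below is a minimal sketch of a zero-timeout readiness check; readable_now is a hypothetical helper name.

```c
#include <sys/select.h>
#include <unistd.h>

/* Non-blocking readiness check. Both the set and the timeout are
 * rebuilt on every call, because select() may modify them.
 * Returns 1 if fd is readable now, 0 if not, -1 on error. */
int readable_now(int fd)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv = { 0, 0 };   /* both fields 0: return immediately */
    int r = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (r == -1)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```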
The file descriptor sets are not manipulated directly. They are generally managed through several helper macros, which lets each Unix system implement the sets in whatever way it prefers. Most systems, however, simply implement them as a bit array. FD_ZERO removes all file descriptors from the specified set, and it should be called before every call to select():
fd_set writefds;
FD_ZERO(&writefds);
FD_SET adds a file descriptor to the specified set, and FD_CLR removes one from it:
FD_SET(fd, &writefds);   /* add 'fd' to the set */
FD_CLR(fd, &writefds);   /* oops, remove 'fd' from the set */
Well-designed code should never need FD_CLR, and it is rarely used in practice.
FD_ISSET tests whether a file descriptor is part of the specified set. It returns a nonzero integer if the descriptor is in the set and 0 if it is not. FD_ISSET is used after select() returns, to test whether a given descriptor is ready for the relevant action:
if (FD_ISSET(fd, &readfds))
        /* 'fd' is readable without blocking! */
Because the file descriptor sets have a fixed, static size, they impose a limit on descriptor values. A set can only hold descriptors whose value is less than FD_SETSIZE, which is 1024 on Linux. Later in this chapter we will see further consequences of this limit.
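POSIX leaves the behavior of FD_SET() undefined for a descriptor that is negative or at or above FD_SETSIZE. A careless call can therefore write past the end of the fixed-size set. Here is a minimal defensive wrapper, assuming glibc on Linux; checked_fd_set is a hypothetical name.

```c
#include <sys/select.h>

/* Adds fd to set only if fd can be represented in an fd_set.
 * Returns 0 on success, -1 if fd is out of range. */
int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

A server that may hold more than FD_SETSIZE descriptors needs to reject or redirect such descriptors rather than pass them to select(). This is one practical reason to move to poll or epoll.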
Return value and error codes
On success, select() returns the number of file descriptors ready for I/O across all three sets. If a timeout was provided, the return value may be 0. On error, select() returns -1 and sets errno to one of the following values:
EBADF
An invalid file descriptor was provided in one of the sets.
EINTR
A signal was caught while waiting; the call can be reissued.
EINVAL
The parameter n is negative, or the specified timeout is invalid.
ENOMEM
Insufficient memory was available to complete the request.
--------------------------------------------------------------------------------------------------------------
poll() is System V's I/O multiplexing solution. It solves several limitations of select(), although select() is still frequently used (mostly out of habit, or in the name of portability):
#include <sys/poll.h>

int poll (struct pollfd *fds, unsigned int nfds, int timeout);
Unlike select(), poll() does not use three inefficient bitmask-based sets of file descriptors. Instead, it takes a single array of pollfd structures, pointed to by fds. The pollfd structure is defined as follows:
#include <sys/poll.h>

struct pollfd {
        int fd;          /* file descriptor */
        short events;    /* requested events to watch */
        short revents;   /* returned events witnessed */
};
Each pollfd structure specifies a single file descriptor to watch; passing multiple structures instructs poll() to watch multiple file descriptors. The events field of each structure is a bitmask of the events to watch for on that descriptor, set by the user. The revents field is a bitmask of the events that were observed on the descriptor, set by the kernel when the call returns. Any of the events requested in events may be returned in revents. Valid events are as follows:
POLLIN
There is data to read.
POLLRDNORM
There is normal data to read.
POLLRDBAND
There is priority data to read.
POLLPRI
There is urgent data to read.
POLLOUT
Writing will not block.
POLLWRNORM
Writing normal data will not block.
POLLWRBAND
Writing priority data will not block.
POLLMSG
A SIGPOLL message is available.
In addition, the following events may be returned in revents:
POLLERR
Error on the given file descriptor.
POLLHUP
A hang-up occurred on the given file descriptor.
POLLNVAL
The given file descriptor is invalid.
These events have no meaning in the events field, because they are always returned in revents when applicable. Unlike select(), poll() does not require you to explicitly ask for exception reporting.
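You can check that POLLNVAL is reported without being requested by polling a descriptor that has already been closed. This is a Linux-oriented sketch; revents_for_closed_fd is a hypothetical helper, and it assumes no other thread reuses the descriptor number in between.

```c
#include <poll.h>
#include <unistd.h>

/* Polls a descriptor that has already been closed, requesting only
 * POLLIN, and returns the revents value the kernel reported.
 * POLLNVAL is expected even though events never asked for it.
 * Returns -1 if the pipe cannot be created. */
short revents_for_closed_fd(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    close(fds[0]);
    close(fds[1]);
    struct pollfd p = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    poll(&p, 1, 0);                    /* timeout 0: don't block */
    return p.revents;
}
```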
POLLIN | POLLPRI is equivalent to select()'s read event, and POLLOUT | POLLWRBAND is equivalent to select()'s write event. POLLIN is equivalent to POLLRDNORM | POLLRDBAND, and POLLOUT is equivalent to POLLWRNORM.
For example, to watch a file descriptor for both readability and writability, set events to POLLIN | POLLOUT. When poll() returns, check the flags in the revents field of the structure for that descriptor. If POLLIN is set, the descriptor can be read without blocking; if POLLOUT is set, it can be written without blocking. These flags are not mutually exclusive. Both may be set, meaning that both reads and writes on the descriptor will return without blocking.
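The readable-and-writable example above can be sketched with a Unix-domain socket pair, which, unlike a pipe, is bidirectional. rw_revents is a hypothetical helper; the sketch assumes a POSIX system with socketpair().

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Watches one end of a socket pair for both reading and writing and
 * returns revents masked to POLLIN | POLLOUT. If send_first is nonzero,
 * the peer sends one byte first so the socket is also readable.
 * Returns -1 on failure. */
int rw_revents(int send_first)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (send_first && write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    struct pollfd p = { .fd = sv[0], .events = POLLIN | POLLOUT };
    int result = -1;
    if (poll(&p, 1, 0) >= 0)               /* timeout 0: don't block */
        result = p.revents & (POLLIN | POLLOUT);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```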
The timeout parameter specifies how long to wait, in milliseconds, before returning regardless of whether any I/O is ready. A negative value means an infinite timeout. A value of 0 makes poll() return immediately, listing any file descriptors with pending I/O but not waiting for further events. In this case, poll() is true to its name and simply polls once.
Return value and error codes
On success, poll() returns the number of file descriptors whose revents field is nonzero. If no events occur before the timeout, poll() returns 0. On failure, poll() returns -1 and sets errno to one of the following values:
EBADF
An invalid file descriptor was given in one or more of the structures.
EFAULT
The fds pointer points outside the process's address space.
EINTR
A signal occurred before any requested event occurred; the call can be reissued.
EINVAL
The nfds parameter exceeds the RLIMIT_NOFILE value.
ENOMEM
Insufficient memory was available to complete the request.
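A common way to handle EINTR is simply to reissue the call. This is a minimal sketch; poll_retry and count_nonzero_revents are hypothetical helpers. Note that the real prototype uses nfds_t for the count where the listing above shows unsigned int. This naive retry also restarts the full timeout each time.

```c
#include <errno.h>
#include <poll.h>

/* Calls poll(), reissuing it if a signal interrupts the wait (EINTR).
 * Each retry starts the full timeout again. */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout)
{
    int r;
    do {
        r = poll(fds, nfds, timeout);
    } while (r == -1 && errno == EINTR);
    return r;
}

/* On success, poll()'s return value equals the number of entries with a
 * nonzero revents field; this recount is only for illustration. */
int count_nonzero_revents(const struct pollfd *fds, nfds_t nfds)
{
    int n = 0;
    for (nfds_t i = 0; i < nfds; i++)
        if (fds[i].revents != 0)
            n++;
    return n;
}
```

poll() ignores entries whose fd is negative and returns them with revents set to 0. This is a convenient way to disable a slot without compacting the array.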
============================================================================================
Advantages of epoll:
1. A single process can open a very large number of socket descriptors (FDs)
The most intolerable thing about select is that the number of FDs a single process can open is limited by FD_SETSIZE, whose default value is 1024. For an IM server that must support tens of thousands of connections, this is obviously far too few. There are two workarounds, and neither is satisfactory:
- Modify this macro and recompile the kernel. Some sources point out, however, that this reduces network efficiency.
- Use a multi-process design (the traditional Apache approach). Although creating a process on Linux is relatively cheap, the cost still cannot be ignored. Data synchronization between processes is also far less efficient than between threads, so this is not a perfect solution either.
Epoll has no such limit. The number of FDs it supports is bounded only by the maximum number of files that can be opened, which is generally far greater than 1024. For example, on a machine with 1 GB of memory it is around 100,000; you can check it with cat /proc/sys/fs/file-max. In general, this number depends largely on the amount of system memory.
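You can read the limit mentioned above programmatically. This is a Linux-specific sketch (read_file_max is a hypothetical helper). Keep in mind that each process is additionally bounded by its own RLIMIT_NOFILE (ulimit -n).

```c
#include <stdio.h>

/* Reads the system-wide open-file limit from /proc/sys/fs/file-max
 * (Linux-specific). Returns the value, or -1 if it cannot be read. */
long long read_file_max(void)
{
    FILE *f = fopen("/proc/sys/fs/file-max", "r");
    if (f == NULL)
        return -1;
    long long value = -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```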
2. I/O efficiency does not drop linearly as the number of FDs grows
Another critical weakness of traditional select/poll appears when you hold a large set of sockets. Because of network latency, only some of them are "active" at any given moment, yet every select/poll call scans the entire set linearly, so efficiency degrades linearly.
Epoll does not have this problem: it operates only on "active" sockets. In the kernel, epoll is implemented with a callback function attached to each FD, and only "active" sockets invoke their callback; idle sockets do not. In this sense epoll implements a kind of "pseudo" AIO, with the OS kernel as the driving force.
In some benchmarks where essentially all sockets are active, for example in a high-speed LAN environment, epoll is no more efficient than select/poll. On the contrary, heavy use of epoll_ctl makes it slightly less efficient. But once idle connections are used to simulate a WAN environment, epoll is far more efficient than select/poll.
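The following Linux-only sketch illustrates that idle descriptors do not show up in epoll_wait() results. It watches 100 pipes and makes just one of them readable. active_among_idle and NPIPES are hypothetical, and pipes stand in for idle and active sockets.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 100

/* Watches NPIPES pipes but makes only pipe `active` (0 <= active < NPIPES)
 * readable. Returns how many events epoll_wait() reported (expected: 1),
 * or -1 on failure or if the wrong descriptor was reported.
 * The idle pipes never reach epoll's ready list. */
int active_among_idle(int active)
{
    int pipes[NPIPES][2];
    int made = 0, result = -1;
    int ep = epoll_create1(0);
    if (ep == -1)
        return -1;
    for (; made < NPIPES; made++) {
        if (pipe(pipes[made]) == -1)
            goto out;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipes[made][0] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, pipes[made][0], &ev) == -1) {
            made++;                     /* this pipe exists; close it below */
            goto out;
        }
    }
    if (write(pipes[active][1], "x", 1) != 1)
        goto out;
    struct epoll_event events[NPIPES];
    result = epoll_wait(ep, events, NPIPES, 100);
    if (result == 1 && events[0].data.fd != pipes[active][0])
        result = -1;                    /* wrong descriptor reported */
out:
    for (int i = 0; i < made; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(ep);
    return result;
}
```

This does not measure performance. It only shows that the result array contains just the active descriptor, so the caller's work per wakeup does not depend on how many idle descriptors are registered.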
3. Use mmap to speed up message passing between the kernel and user space
This point actually concerns epoll's specific implementation. Whether select, poll, or epoll is used, the kernel must deliver FD event notifications to user space, so avoiding unnecessary memory copies matters. Epoll achieves this by having the kernel and user space mmap the same memory. If, like me, you have followed epoll since the 2.5 kernel, you will certainly remember the manual mmap step.
4. Kernel fine-tuning
This is not really an advantage of epoll but of the Linux platform as a whole. You may have your doubts about Linux, but you cannot deny that it gives you the ability to fine-tune the kernel. Some examples:
- The kernel TCP/IP stack uses a memory pool to manage sk_buff structures. The size of this pool (skb_head_pool) can be adjusted at runtime with echo XXXX > /proc/sys/net/core/hot_list_length.
- The second parameter of listen() (the length of the queue of connections that have completed the TCP three-way handshake) can be tuned to your platform's memory size.
- On a special system that handles a huge number of packets, each of them small, you can even try the latest NAPI NIC driver architecture.