Linux I/O multiplexing
1 I/O multiplexing
I/O multiplexing lets a single process monitor multiple file descriptors at once. When a descriptor becomes ready (generally for reading or writing), the program is notified and can perform the corresponding read or write operation.
I/O multiplexing is designed to solve the problem of a process or thread blocking in an I/O system call on one descriptor, so that the process is not stuck on a single descriptor while others are ready.
2 I/O multiplexing select
The select() function lets a process instruct the kernel to wait for any of several events to occur, and to wake the process only when one or more of those events occur or after a specified period of time has elapsed.
2.1 select function
2.1.1 header files required
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
2.1.2 declarations and return values
1 Declaration
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
2 Return Value
Success: the number of ready descriptors; 0 is returned on timeout.
Error: -1.
2.1.3 Functions
select() monitors multiple file descriptors and waits for their state to change (readable, writable, or exceptional condition). The file descriptors monitored by select() are divided into three categories: readfds, writefds, and exceptfds. After the call, select() blocks until a descriptor is ready (data is readable, the descriptor is writable, or an exceptional condition is pending) or the timeout expires (timeout specifies the wait time). When select() returns, you can traverse the fd_set to find the ready descriptors.
2.1.4 Parameters
1 nfds: the range of file descriptors to be monitored, that is, the highest-numbered descriptor in any of the sets plus 1. For example, if you pass 10 here, descriptors 0, 1, 2, ..., 9 are monitored. On Linux the maximum value is generally 1024 (FD_SETSIZE).
2 readfds: the set of descriptors monitored for readability. Put a file descriptor in this set if you want to know when it can be read.
3 writefds: the set of descriptors monitored for writability.
4 exceptfds: the set of descriptors monitored for exceptional conditions.
5 timeout: tells the kernel how long to wait for any of the specified descriptors to become ready. The timeval structure specifies this period in seconds and microseconds.
struct timeval {
    long tv_sec;   // seconds
    long tv_usec;  // microseconds
};
Values that can be set for timeout:
1. NULL: wait forever. select() returns only when a descriptor is ready for I/O.
2. A timeval structure with a nonzero number of seconds or microseconds: wait for at most the specified period. If no descriptor is ready for I/O when the period expires, select() returns 0.
3. A timeval structure with both the seconds and the microseconds set to 0: check the descriptors and return immediately without waiting. This is called polling.
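The three timeout settings can be sketched with one small helper. This is a minimal illustration, not a definitive implementation: the name wait_readable and its millisecond argument are assumptions made for this example, and the descriptor is assumed to be smaller than FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper (not part of any library): wait until fd is readable.
 *   timeout_ms <  0  -> timeout is NULL, block until fd is ready
 *   timeout_ms == 0  -> zeroed timeval, check once and return immediately
 *   timeout_ms >  0  -> wait at most that many milliseconds
 * Returns select()'s result: 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    struct timeval *tvp = NULL;        /* NULL means wait forever */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    /* Only the read set is monitored; the other two sets are not used. */
    return select(fd + 1, &readfds, NULL, NULL, tvp);
}
```

On Linux, select() may modify the timeval it is given, so the structure is rebuilt on every call rather than reused.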
2.1.5 fd_set
fd_set can be understood as a set that stores file descriptors. It is manipulated through the following four macros:
1 void FD_ZERO(fd_set *fdset); // clear the set
2 void FD_SET(int fd, fd_set *fdset); // add a given file descriptor to the set
3 void FD_CLR(int fd, fd_set *fdset); // remove a given file descriptor from the set
4 int FD_ISSET(int fd, fd_set *fdset); // check whether a given file descriptor is in the set (after select() returns, whether it is ready)
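As a rough sketch of how these macros fit together with select(), the hypothetical helper below (first_readable is a name invented for this example) builds a read set from an array of descriptors, waits, and then traverses the set with FD_ISSET. It assumes every descriptor is smaller than FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: return the first descriptor in fds[0..n-1] that is
 * readable, or -1 if none became readable before the timeout (or on error). */
int first_readable(const int *fds, int n, struct timeval *timeout)
{
    fd_set readfds;
    int maxfd = -1;

    FD_ZERO(&readfds);                  /* start from an empty set */
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &readfds);       /* add each descriptor to the set */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* nfds is the highest descriptor number plus 1. */
    if (select(maxfd + 1, &readfds, NULL, NULL, timeout) <= 0)
        return -1;

    /* select() overwrote readfds: only ready descriptors are still set. */
    for (int i = 0; i < n; i++) {
        if (FD_ISSET(fds[i], &readfds))
            return fds[i];
    }
    return -1;
}
```

Because select() overwrites the sets it is given, the set has to be rebuilt with FD_ZERO and FD_SET before every call.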
2.3 Advantages and Disadvantages of select
2.3.1 advantages
select() is supported on almost all platforms, so its good cross-platform support is an advantage.
2.3.2 disadvantages
1. Each call to select() must copy the fd sets from user space to kernel space, which is expensive when there are many descriptors. In addition, the kernel traverses every descriptor passed in on each call, which is also expensive when there are many descriptors.
2. There is an upper limit on the number of file descriptors a single process can monitor; on Linux it is generally 1024. You can raise the limit by modifying the macro definition, or even by recompiling the kernel, but this also reduces efficiency.
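In practice the limit shows up as the FD_SETSIZE macro: a descriptor whose number is FD_SETSIZE or higher cannot be stored in an fd_set. A defensive sketch follows; checked_fd_set is a hypothetical name used only for this illustration.

```c
#include <sys/select.h>

/* Hypothetical guard: add fd to the set only if an fd_set can hold it.
 * Returns 0 on success, -1 if fd is outside the range 0..FD_SETSIZE-1. */
int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;      /* calling FD_SET here would be undefined behavior */
    FD_SET(fd, set);
    return 0;
}
```

A caller that hits the -1 case has to fall back to poll() or epoll for that descriptor.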
3 I/O multiplexing poll
The select() and poll() system calls are essentially the same. The former was introduced in BSD UNIX and the latter in System V. poll() works much like select(): it also manages multiple descriptors by polling them and acting on each descriptor's status. However, poll() has no limit on the maximum number of file descriptors (although performance still degrades when the number is very large). poll() and select() share the same disadvantage: the whole array of file descriptors is copied between user space and kernel address space on every call, regardless of whether those descriptors are ready, so the overhead grows linearly with the number of descriptors.
3.1 poll Functions
3.1.1 header files required
#include <poll.h>
3.1.2 declarations and return values
1 Declaration
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
2 Return Value
On success, poll() returns the number of file descriptors whose revents field is nonzero. If the timeout expires before any event occurs, poll() returns 0.
On failure, poll() returns -1 and sets errno to one of the following values:
EBADF: the file descriptor specified in one or more of the structs is invalid.
EFAULT: the fds pointer points to an address outside the address space of the process.
EINTR: a signal arrived before any requested event occurred; the call can be issued again.
EINVAL: the nfds parameter exceeds the RLIMIT_NOFILE value.
ENOMEM: the request could not be completed because available memory was insufficient.
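Since EINTR only means that a signal interrupted the wait, one common way to handle it is to issue the call again. A minimal sketch, where poll_retry is a hypothetical wrapper name:

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical wrapper: call poll() again whenever a signal interrupts it.
 * Returns poll()'s result for the first call that was not interrupted. */
int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int n;

    do {
        n = poll(fds, nfds, timeout_ms);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

This simple version restarts the full timeout after each interruption. Code that needs an exact deadline would have to recompute the remaining time before retrying.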
3.1.3 Functions
Monitors and waits for attribute changes of multiple file descriptors.
3.1.4 Parameters
1 fds: whereas select() uses three bitmaps to represent three fd sets, poll() takes a pointer to an array of pollfd structs. Each pollfd struct holds a file descriptor and the events you want to test for. The events to watch are given in the events field of the struct, and the events that actually occurred are written to the revents field when the call returns.
struct pollfd {
    int fd;         // file descriptor
    short events;   // events to wait for
    short revents;  // events that actually occurred
};
Each pollfd struct in fds specifies one monitored file descriptor. Passing several structs makes poll() monitor several file descriptors.
events: the events field of each struct is the mask of events to monitor on that file descriptor, and it is set by the user. The possible values of the events mask are as follows:
Input events:
POLLIN: normal or priority band data can be read.
POLLRDNORM: normal data can be read.
POLLRDBAND: priority band data can be read.
POLLPRI: high-priority data can be read.
Output events:
POLLOUT: normal data can be written.
POLLWRNORM: normal data can be written.
POLLWRBAND: priority band data can be written.
Error events (reported in revents only):
POLLERR: an error occurred.
POLLHUP: a hangup occurred.
POLLNVAL: the descriptor does not refer to an open file.
poll() handles three classes of data: normal, priority band, and high priority. These distinctions come from the STREAMS implementation.
POLLIN | POLLPRI is equivalent to the read event of select().
POLLOUT | POLLWRBAND is equivalent to the write event of select().
POLLIN is equivalent to POLLRDNORM | POLLRDBAND.
POLLOUT is equivalent to POLLWRNORM.
For example, to monitor whether a file descriptor is readable and writable, set events to POLLIN | POLLOUT.
The revents field is the mask of events that actually occurred on the file descriptor. The kernel sets this field when the call returns. Any event requested in the events field may be returned in the revents field. In short, the user sets the events field of each struct to tell the kernel which events are of interest, and the kernel sets the revents field on return to report which events occurred on that descriptor.
2 nfds: the number of elements in the array passed as the first parameter.
3 timeout: the number of milliseconds to wait.
If timeout is positive, poll() returns after at most that many milliseconds, whether or not any I/O is ready.
If timeout is 0, poll() returns immediately.
If timeout is -1, poll() blocks until a requested event occurs.
4 I/O multiplexing epoll
epoll was introduced in the 2.6 kernel as an enhanced version of the earlier select() and poll(). Compared with select() and poll(), epoll is more flexible and has no limit on the number of descriptors. epoll uses one file descriptor to manage multiple descriptors. It stores the events of the file descriptors the user is interested in in an event table inside the kernel, so the descriptors are copied from user space to kernel space only once.
4.1 header files required
#include <sys/epoll.h>
4.2 Declarations
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
4.3 epoll_create Function
int epoll_create(int size);
4.3.1 Functions
This function creates an epoll instance and returns a file descriptor that refers to it (an epoll handle).
4.3.2 Parameters
size tells the kernel roughly how many descriptors will be monitored. It does not limit the maximum number of descriptors that epoll can monitor; it is only a hint to the kernel for the initial allocation of internal data structures.
Since Linux 2.6.8 the size parameter is ignored, but it must still be greater than 0. Note that the epoll handle itself occupies an fd. On Linux you can see this fd under /proc/<process id>/fd/. You must therefore call close() on it when you are done with epoll; otherwise the process may run out of fds.
4.3.3 Return Value
Success: a file descriptor that refers to the epoll instance
Failure: -1
4.4 epoll_ctl Function
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
4.4.1 Functions
This is epoll's event registration function. Unlike select(), which is told what types of events to listen for each time it is called, epoll registers the event types to listen for in advance.
4.4.2 Parameters
1 epfd: the epoll file descriptor returned by epoll_create().
2 op: the action to perform, represented by one of three macros:
EPOLL_CTL_ADD: register a new fd with epfd;
EPOLL_CTL_MOD: modify the events monitored for an already registered fd;
EPOLL_CTL_DEL: remove an fd from epfd;
3 fd: the file descriptor to be monitored.
4 event: tells the kernel which events to listen for. struct epoll_event is defined as follows:
// Data associated with the file descriptor that triggered the event (its use is up to the application)
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;
// Events of interest and events triggered
struct epoll_event {
    __uint32_t events;  /* Epoll events */
    epoll_data_t data;  /* User data variable */
};
events can be a bitwise OR of the following macros:
EPOLLIN: the corresponding file descriptor can be read (this includes an orderly shutdown by the peer socket);
EPOLLOUT: the corresponding file descriptor can be written;
EPOLLPRI: the corresponding file descriptor has urgent data to read (this indicates that out-of-band data has arrived);
EPOLLERR: an error occurred on the corresponding file descriptor;
EPOLLHUP: the corresponding file descriptor was hung up;
EPOLLET: put epoll into edge-triggered mode for this descriptor, as opposed to the default level-triggered mode;
EPOLLONESHOT: report only one event. After that event has been delivered, you must re-arm the socket in epoll (with EPOLL_CTL_MOD) if you want to keep monitoring it.
4.4.3 Return Value
Success: 0
Failure: -1
4.5 epoll_wait Function
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
4.5.1 Functions
Waits for events on the descriptors monitored by epoll and collects those that have occurred, similar to a select() call.
4.5.2 Parameters
1 epfd: the epoll file descriptor returned by epoll_create().
2 events: an array of epoll_event structs allocated by the caller. epoll copies the events that occurred into this array (events must not be a null pointer; the kernel only copies data into the array and does not allocate memory in user space for us).
3 maxevents: tells the kernel the size of the events array; it must be greater than 0.
4 timeout: the number of milliseconds to wait.
If timeout is positive, epoll_wait() returns after at most that many milliseconds, whether or not any I/O is ready.
If timeout is 0, the function returns immediately.
If timeout is -1, the function blocks until an event occurs.
4.5.3 Return Value
Success: the number of events to be handled; 0 means the call timed out.
Failure: -1
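A short sketch of one epoll_wait() call and the traversal of its result. The helper collect_ready and the MAX_EVENTS constant are hypothetical; the code assumes each descriptor was registered with its own number stored in data.fd.

```c
#include <sys/epoll.h>

#define MAX_EVENTS 16   /* hypothetical size of the caller-allocated array */

/* Hypothetical helper: run one epoll_wait() and copy the ready descriptors
 * into ready[0..cap-1]. Returns the number of ready descriptors, 0 on
 * timeout, or -1 on error. */
int collect_ready(int epfd, int *ready, int cap, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int max = cap < MAX_EVENTS ? cap : MAX_EVENTS;

    if (max <= 0)
        return 0;       /* epoll_wait() requires maxevents > 0 */

    /* The kernel copies at most max events into the array we allocated. */
    int n = epoll_wait(epfd, events, max, timeout_ms);

    for (int i = 0; i < n; i++)
        ready[i] = events[i].data.fd;   /* user data set at registration */

    return n;
}
```

Unlike select() and poll(), only ready descriptors come back, so the loop runs over n entries rather than over every monitored descriptor.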
4.6 LT and ET Modes
epoll has two modes of operation on file descriptors: LT (level-triggered) and ET (edge-triggered). LT is the default mode.
4.6.1 LT Mode
When epoll_wait() detects an event on a descriptor and notifies the application, the application does not have to handle the event immediately. The next call to epoll_wait() will report the event to the application again.
4.6.2 ET Mode
When epoll_wait() detects an event on a descriptor and notifies the application, the application must handle the event immediately. If it does not, the next call to epoll_wait() will not report the event again.
4.6.3 Comparison between the LT mode and the ET mode
ET mode greatly reduces the number of times the same epoll event is triggered repeatedly, so it is more efficient than LT mode. When epoll works in ET mode, non-blocking sockets must be used, so that a blocking read or write on one file descriptor does not starve the task that handles the other file descriptors.
4.7 advantages of epoll
1. With select()/poll(), the kernel scans all monitored file descriptors only when the function is called. With epoll, a file descriptor is registered in advance through epoll_ctl(); once it becomes ready, the kernel uses a callback mechanism (similar to a software interrupt) to activate it quickly, and the process is notified when it calls epoll_wait().
2. The number of monitored descriptors is not restricted. The upper limit on monitored FDs is the maximum number of files that can be opened, which is generally far greater than 2048. For example, on a machine with 1 GB of memory it is about 100,000. You can check the exact number with cat /proc/sys/fs/file-max; in general it depends heavily on the amount of system memory. The biggest drawback of select() is the limit on the number of fds a process can monitor, which makes it unsuitable for servers with a large number of connections. You could choose a multi-process solution instead (Apache is implemented this way), but although creating a process on Linux is relatively cheap, the cost is still not negligible, and data synchronization between processes is far less efficient than synchronization between threads, so it is not a perfect solution either.
3. I/O efficiency does not decrease as the number of monitored fds increases. The select() and poll() implementations have to poll the entire fd set repeatedly until a device is ready, and may alternate between sleeping and waking several times in the process. epoll_wait() also has to check the ready list repeatedly and may likewise alternate between sleeping and waking several times. However, when a device becomes ready, epoll invokes the callback function, which puts the ready fd on the ready list and wakes the process sleeping in epoll_wait(). Although both approaches alternate between sleeping and waking, select() and poll() traverse the entire fd set each time they are awake, whereas epoll only has to check whether the ready list is empty, which saves a lot of CPU time. This is the performance gain brought by the callback mechanism.
4. select() and poll() must copy the fd set from user space to kernel space on every call. epoll copies each fd only once, when it is registered, which saves a lot of overhead.