Introduction to the network I/O multiplexing model select & poll & epoll


First, note that select, poll, and epoll are all I/O multiplexing mechanisms. I/O multiplexing lets one thread monitor many descriptors at once; as soon as a descriptor becomes ready (generally readable or writable), the program is notified so it can perform the corresponding read or write operation. However, select, poll, and epoll are all essentially synchronous I/O: after the readiness event is reported, the application itself must still perform the read or write, and that read/write call is the step that actually blocks.

Basic usage of select: http://blog.csdn.net/nk_test/article/details/49256129

Basic usage of poll: http://blog.csdn.net/nk_test/article/details/49283325

Basic usage of epoll: http://blog.csdn.net/nk_test/article/details/49331717

Next we will discuss how to correctly combine non-blocking I/O with poll/epoll multiplexing.

Let's talk about several common problems:

1. Generation and handling of the SIGPIPE signal

If the client closes the socket with close and the server then performs a write, the server receives an RST segment (at the TCP transport layer). If the server calls write again after that, the SIGPIPE signal is raised. Unless this signal is ignored, its default action terminates the process, which is clearly unacceptable for a highly available server. It can simply be ignored in the program, e.g. signal(SIGPIPE, SIG_IGN).
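A minimal sketch of ignoring SIGPIPE at server startup; sigaction is used here instead of signal, and the wrapper function name is our own:

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Ignore SIGPIPE so that writing to a reset connection returns -1
     * with errno == EPIPE instead of killing the whole server process. */
    static void ignore_sigpipe(void)
    {
        struct sigaction sa;
        sa.sa_handler = SIG_IGN;        /* discard the signal */
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGPIPE, &sa, NULL) == -1) {
            perror("sigaction");
            exit(EXIT_FAILURE);
        }
    }

Call ignore_sigpipe() once before entering the event loop.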

2. Influence of the TIME_WAIT state on the server

Avoid the TIME_WAIT state on the server as much as possible. If the server actively closes the connection (calls close before the client does), the server side enters TIME_WAIT, and the kernel keeps holding some resources for it, greatly reducing the server's concurrency. Solution: design the application protocol so that the client disconnects first; the TIME_WAIT state is then spread across a large number of clients. But if a client is inactive, or malicious clients keep initiating connections without using them, they tie up the server's connection resources, so the server should also have a way to kill inactive connections.
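A rough sketch of one way to kick inactive connections; the struct layout, field names, and the 60-second timeout are illustrative assumptions, not from the original:

    #include <time.h>
    #include <unistd.h>

    #define IDLE_TIMEOUT 60              /* seconds; an assumed value */

    struct conn {
        int    fd;                       /* -1 marks a free slot */
        time_t last_active;              /* updated on every successful read/write */
    };

    /* Called periodically from the event loop: close connections that have
     * been silent longer than IDLE_TIMEOUT, so dead clients cannot hold
     * the server's connection resources indefinitely. */
    static void kick_idle(struct conn *conns, int n)
    {
        time_t now = time(NULL);
        for (int i = 0; i < n; i++) {
            if (conns[i].fd >= 0 && now - conns[i].last_active > IDLE_TIMEOUT) {
                close(conns[i].fd);
                conns[i].fd = -1;
            }
        }
    }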

3. The new accept4 system call

Compared with accept(2), it adds a flags parameter that can take the following two flags:

    int accept4(int sockfd, struct sockaddr *addr, socklen_t *addrlen, int flags);

    SOCK_NONBLOCK   Set the O_NONBLOCK file status flag on the new open file
                    description. Using this flag saves extra calls to fcntl(2)
                    to achieve the same result.
    SOCK_CLOEXEC    Set the close-on-exec (FD_CLOEXEC) flag on the new file
                    descriptor. See the description of the O_CLOEXEC flag in
                    open(2) for reasons why this may be useful.

SOCK_CLOEXEC makes the returned connected socket be closed automatically when the process image is replaced by exec. The same effects can be obtained with fcntl, but that takes extra system calls and is slightly less efficient.
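For comparison, a sketch of what accept4 saves: the same effect built from plain accept(2) plus fcntl(2), at the cost of several extra system calls (the helper name is our own):

    #include <fcntl.h>
    #include <sys/socket.h>

    /* Equivalent of accept4(listenfd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC)
     * using accept(2) plus fcntl(2). */
    static int accept_nonblock_cloexec(int listenfd)
    {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0)
            return -1;
        /* set O_NONBLOCK on the open file description */
        int fl = fcntl(connfd, F_GETFL, 0);
        fcntl(connfd, F_SETFL, fl | O_NONBLOCK);
        /* set close-on-exec on the descriptor */
        fcntl(connfd, F_SETFD, FD_CLOEXEC);
        return connfd;
    }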

 

4. Handling EMFILE returned by accept(2) (file descriptors are used up)

(1) Raise the process's file descriptor limit.
(2) Simply wait: do nothing and retry until a descriptor is freed.
(3) Exit the program.
(4) Close the listening socket. But then, when should it be reopened?
(5) For the epoll model, switch to edge trigger. The problem: if one accept(2) is missed, the program will never be notified of new connections again (the level never changes, so no new edge is generated).
(6) Keep an idle file descriptor in reserve. When EMFILE occurs, first close the idle descriptor to regain one descriptor's quota; then accept(2) the pending connection; then immediately close(2) it, gracefully disconnecting that client; finally reopen the idle file to refill the "reserve slot" in case the situation occurs again:
    int idlefd = open("/dev/null", O_RDONLY | O_CLOEXEC);   /* the reserve descriptor */

    connfd = accept4(listenfd, (struct sockaddr *)&peeraddr,
                     &peerlen, SOCK_NONBLOCK | SOCK_CLOEXEC);
    if (connfd == -1)
    {
        if (errno == EMFILE)
        {
            close(idlefd);                          /* free one descriptor's quota   */
            idlefd = accept(listenfd, NULL, NULL);  /* accept the pending connection */
            close(idlefd);                          /* ...and close it gracefully    */
            idlefd = open("/dev/null", O_RDONLY | O_CLOEXEC);  /* refill the reserve */
            continue;
        }
        else
            ERR_EXIT("accept4");
    }

 

(1) The poll processing flow and points to note

Notes:

(1) Message-boundary ("sticky packet") problem: a single read may not drain all the data in connfd's kernel receive buffer, in which case connfd will remain active (readable) on the next poll. The data read so far should be saved into an application-layer receive buffer (e.g. char buf[1024]), where message boundaries are handled properly.

(2) When the response is large, write may not be able to copy all of it into the kernel send buffer at once, so there should also be an application-layer send buffer; whatever could not be sent is appended to it.

(3) Watch connfd's POLLOUT event only at the right time: when POLLOUT arrives, write out the data in the application-layer send buffer, and once that buffer is drained, stop watching POLLOUT. The POLLOUT trigger condition is that connfd's kernel send buffer is not full (can accept data). A poll-based sketch follows the note below.

 

Note: data in connfd's kernel receive buffer is removed once the application has read it; data in the kernel send buffer is removed once the segment has been sent and the peer's ACK for it has arrived. write only copies data from the application-layer send buffer into connfd's kernel send buffer and returns; read only copies data from connfd's kernel receive buffer into the application-layer receive buffer and returns.
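Here is a minimal poll-based sketch of the send-side handling described in note (3); the buffer structure and names are illustrative assumptions:

    #include <poll.h>
    #include <string.h>
    #include <unistd.h>

    #define OUTBUF_SIZE 16384            /* assumed capacity, for illustration */

    struct out_buf {
        char   data[OUTBUF_SIZE];
        size_t len;                      /* bytes still waiting to be sent */
    };

    /* Flush as much of the application-layer send buffer as the kernel
     * will take, and watch POLLOUT only while unsent data remains. */
    static void try_send(struct pollfd *pfd, struct out_buf *ob)
    {
        ssize_t n = write(pfd->fd, ob->data, ob->len);
        if (n > 0) {
            memmove(ob->data, ob->data + n, ob->len - n);
            ob->len -= (size_t)n;
        }
        if (ob->len > 0)
            pfd->events |= POLLOUT;      /* kernel buffer full: wait for POLLOUT */
        else
            pfd->events &= ~POLLOUT;     /* drained: stop watching POLLOUT */
    }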

(2) The epoll processing flow and points to note

Level-trigger (LT) mode:

The basic processing flow is very similar to poll's. Note that epoll_wait returns only the active descriptors, which can be handled directly without scanning the whole set; and if write returns success, the data has been copied into the kernel buffer.

EPOLLIN event: the socket's kernel receive buffer empty = low level (not readable); not empty = high level (readable).

EPOLLOUT event: the socket's kernel send buffer full = low level (not writable); not full = high level (writable).

Note: as long as the first write did not complete fully, on the next write simply append the data to the application-layer buffer OutBuffer and wait for the EPOLLOUT event.

Edge-trigger (ET) mode:
Disadvantages:

An accept(2) may be missed, which is difficult to handle (the listening socket stays at a high level, so no new edge is ever generated);

When the file descriptor limit is reached, the descriptor likewise stays at a high level and will never be triggered again, which is also difficult to handle.

Why epoll's LT (level-trigger) mode is recommended: first, it is compatible with poll's semantics; second, LT mode has no missed-event bug. You must not, however, watch the POLLOUT event from the very beginning, otherwise a busy loop appears (right after the connection is established there is no data to write yet, but the kernel send buffer is empty, so POLLOUT would keep triggering). Instead, start watching POLLOUT only when a write cannot copy everything into the kernel buffer: append the leftover data to the application-layer output buffer, and once that buffer has been fully written out, stop watching POLLOUT. In LT mode, read and write also do not have to loop until EAGAIN, which saves system calls and reduces latency. (Note: in ET mode, you must read until EAGAIN, and write until either the output buffer is drained or EAGAIN is returned.)
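A small sketch of toggling EPOLLOUT interest in LT mode with epoll_ctl; the helper name is our own:

    #include <sys/epoll.h>

    /* Watch EPOLLOUT for connfd only while the application-layer output
     * buffer still holds unsent data; clearing it avoids the busy loop. */
    static void watch_epollout(int epollfd, int connfd, int enable)
    {
        struct epoll_event ev;
        ev.data.fd = connfd;
        ev.events  = EPOLLIN | (enable ? EPOLLOUT : 0);
        epoll_ctl(epollfd, EPOLL_CTL_MOD, connfd, &ev);
    }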

 

Note: when using ET mode, you can be even more rigorous by setting listenfd itself to non-blocking. When accept fires, you must not return to epoll_wait right after establishing the current connection; you must keep calling accept in a loop until it returns -1 with errno == EAGAIN. The sample code is as follows:

    if (ev.events & EPOLLIN)
    {
        /* ET mode: keep accepting until EAGAIN, or a connection may be missed */
        do
        {
            struct sockaddr_in stSockAddr;
            socklen_t iSockAddrSize = sizeof(stSockAddr);
            int iRetCode = accept(listenfd, (struct sockaddr *)&stSockAddr,
                                  &iSockAddrSize);
            if (iRetCode > 0)
            {
                /* ... connection established: register events for the new fd */
            }
            else
            {
                /* the backlog is drained: stop accepting */
                if (errno == EAGAIN)
                    break;
            }
        } while (1);
        /* ... handle other EPOLLIN events */
    }
(3) Comparison of the three

select: fd_set imposes a hard limit on the number of file descriptors. In addition, the set is copied into kernel space on every call and scanned with O(n) complexity: all file descriptors must be traversed to determine whether an event occurred.

poll: likewise copies and scans on every call, but the descriptor list passed to the kernel has no hard limit on the maximum number of connections.

epoll: the set of sockets you are interested in is registered with the kernel once (via epoll_ctl) instead of being copied on every call, which greatly reduces copying overhead; the kernel uses an event-notification callback mechanism, so epoll_wait returns only the ready descriptors.

Note: epoll is not the most efficient under all conditions. You need to decide which I/O model to use based on the actual application.

If the number of connected sockets is small and these sockets are all very active, continuously invoking the callbacks may be less efficient than a single linear traversal; in that case epoll can actually be slower than select and poll.
