How to solve multi-process multiplexing wake-up conflicts in Linux systems

Last Update: 2017-01-13 Source: Internet

Author: User

Tags: epoll, sleep


The thundering herd problem for accept(2) was solved in Linux long ago. Today, however, many people use the same name for a different phenomenon. With a multiplexing model, the sets of file descriptors monitored by different processes may overlap. When an I/O event fires on a descriptor in that overlap, the kernel wakes up every process blocked on it in select(2), poll(2) or epoll_wait(2). Strictly speaking, this is not a thundering herd but a collision. From the kernel's point of view, waking every process that monitors the event is reasonable, because select/poll/epoll differ from accept: the descriptors they monitor can legitimately be handled by several processes at once. For example, one process may read only part of the available data and another process may read the rest. accept(2), by contrast, is mutually exclusive: a given pending connection can be accepted by only one process.
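To make the contrast with accept(2) concrete, here is a minimal sketch (not from the original article) that uses a pipe instead of a socket: two processes share one read descriptor, and each consumes part of the pending data. The function name shared_pipe_split_read and the sample data "abcdefgh" are hypothetical, and the sketch assumes a POSIX system with fork(2) and pipe(2).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: two processes share the read end of one pipe that
 * holds "abcdefgh". The child reads the first 4 bytes and reports them
 * through a second pipe; the parent then reads the remaining 4 bytes from
 * the same descriptor. child_part and parent_part need room for 5 chars.
 * Returns 0 on success, -1 on error.
 */
int shared_pipe_split_read(char *child_part, char *parent_part)
{
    int data[2], report[2], status;
    pid_t pid;

    if (pipe(data) < 0 || pipe(report) < 0)
        return -1;
    if (write(data[1], "abcdefgh", 8) != 8)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char buf[4];
        /* child: consume only part of what is pending on the shared fd */
        ssize_t n = read(data[0], buf, sizeof buf);
        if (n != 4 || write(report[1], buf, 4) != 4)
            _exit(1);
        _exit(0);
    }

    /* wait for the child so the two reads happen in a fixed order */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    if (read(report[0], child_part, 4) != 4)
        return -1;
    /* parent: the same descriptor still yields the rest of the data */
    if (read(data[0], parent_part, 4) != 4)
        return -1;
    child_part[4] = parent_part[4] = '\0';

    close(data[0]);
    close(data[1]);
    close(report[0]);
    close(report[1]);
    return 0;
}
```

Called as shared_pipe_split_read(a, b), it should leave "abcd" in a and "efgh" in b. The parent waits for the child only to make the order of the two reads deterministic; nothing in the kernel prevents both processes from reading the descriptor concurrently.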

I have noticed many misconceptions about this select/poll/epoll collision. For example, someone used code like the following to simulate a select collision (poll and epoll behave the same way):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Each worker blocks in select(2) on the shared listening socket, then accepts. */
void worker_hander(int listenfd)
{
    fd_set rset;
    int connfd, ret;

    printf("Worker pid#%d is waiting for connection...\n", (int)getpid());
    for (;;) {
        FD_ZERO(&rset);
        FD_SET(listenfd, &rset);
        ret = select(listenfd + 1, &rset, NULL, NULL, NULL);
        if (ret < 0) {
            perror("select");
        } else if (ret > 0 && FD_ISSET(listenfd, &rset)) {
            printf("Worker pid#%d's listenfd is readable\n", (int)getpid());
            /* listenfd is non-blocking: if another worker already took the
               connection, accept(2) fails with EAGAIN instead of blocking */
            connfd = accept(listenfd, NULL, NULL);
            if (connfd < 0) {
                perror("accept error");
                continue;
            }
            printf("Worker pid#%d created a new connection...\n", (int)getpid());
            sleep(1);
            close(connfd);
        }
    }
}

/* Put fd into non-blocking mode; returns -1 on error. */
static int fd_set_noblock(int fd)
{
    int flags;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags |= O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags);
}

int main(void)
{
    int listenfd;
    struct sockaddr_in servaddr;
    int sock_opt = 1;
    pid_t pid;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0) {
        perror("socket");
        exit(1);
    }
    if (fd_set_noblock(listenfd) < 0) {
        perror("fcntl");
        exit(1);
    }
    if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, (void *)&sock_opt,
                   sizeof(sock_opt)) < 0) {
        perror("setsockopt");
        exit(1);
    }
    bzero(&servaddr, sizeof servaddr);
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(1234);
    if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, 10) < 0) {
        perror("bind/listen");
        exit(1);
    }

    /* After fork(2), parent and child share the same listening socket. */
    pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        worker_hander(listenfd);    /* child: never returns */
    }
    worker_hander(listenfd);        /* parent */
    return 0;
}

After compiling, start the server above first; a client connection can then be simulated with netcat:

nc 127.0.0.1 1234

In the code above, two processes monitor the same file descriptor at the same time, and in practice usually only one select(2) call returns for each connection. From this, the original poster concluded: "Not all worker processes are woken up, only some of them."

This conclusion comes from misunderstanding what waking up means. Returning from select(2) is not the same thing as being woken up.

Before a process waits for an I/O event, the kernel sets its state to TASK_INTERRUPTIBLE and places it on the wait queue for that event. When the event occurs, the process is woken up: it becomes runnable again and is moved to the run queue. At the next process switch, the kernel scheduler picks a process from the run queue to execute according to its scheduling policy.
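The kernel's wait queues are not visible from user space, but the same sleep and wake-all pattern can be sketched with a POSIX condition variable standing in for the wait queue. This is an analogy only, not the kernel mechanism; it assumes pthreads are available (link with -pthread on older systems), and broadcast_demo, waiter and NTHREADS are hypothetical names. Four threads sleep on the condition variable, one event is posted, and pthread_cond_broadcast wakes all of them, yet only one finds work to do.

```c
#include <pthread.h>

#define NTHREADS 4   /* hypothetical number of sleeping waiters */

/* Shared state: a user-space stand-in for a wait queue plus one event. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t event_cv = PTHREAD_COND_INITIALIZER; /* the "wait queue" */
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static int waiting, event_posted, pending, woken, handled;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    waiting++;
    pthread_cond_signal(&ready_cv);
    while (!event_posted)              /* sleep until the event is posted */
        pthread_cond_wait(&event_cv, &mtx);
    woken++;                           /* every waiter gets here... */
    if (pending > 0) {                 /* ...but only one finds work */
        pending--;
        handled++;
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Reports how many threads woke and how many handled the single event. */
int broadcast_demo(int *n_woken, int *n_handled)
{
    pthread_t t[NTHREADS];
    int i;

    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&t[i], NULL, waiter, NULL) != 0)
            return -1;

    pthread_mutex_lock(&mtx);
    while (waiting < NTHREADS)         /* make sure every thread is asleep */
        pthread_cond_wait(&ready_cv, &mtx);
    pending = 1;
    event_posted = 1;
    pthread_cond_broadcast(&event_cv); /* wake every waiter at once */
    pthread_mutex_unlock(&mtx);

    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    *n_woken = woken;
    *n_handled = handled;
    return 0;
}
```

With NTHREADS set to 4, broadcast_demo should report 4 threads woken and 1 event handled. Because the state is static, the sketch is meant to be called only once.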

So the program above really does wake up both processes. The process that is scheduled first returns from select(2), and if no process switch happens before it reaches accept(2), it consumes the I/O event by accepting the connection. When the other process runs afterwards, its select(2) re-checks the descriptor and finds no pending event; the kernel sees that nothing the process monitors has happened and puts it back on the wait queue, so its select(2) does not return. This is by far the most likely sequence. A less likely one is that a process switch occurs when the first process is about to execute accept(2), and the scheduler runs the second process before the first continues; in that case the second process also returns from select(2).
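The kernel-side re-check cannot be observed directly from user space, but its visible consequence can: readiness is evaluated when select(2) or poll(2) actually runs, not when the event happened. A minimal sketch, assuming a pipe as a stand-in for the listening socket and poll(2) in place of select(2); readiness_recheck_demo is a hypothetical name. The child sees the single pending byte and consumes it; the parent then polls the same descriptor with a zero timeout and finds no event.

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: stores the child's poll(2) result in *child_ready and
 * the parent's later poll(2) result in *parent_ready.
 * Returns 0 on success, -1 on error.
 */
int readiness_recheck_demo(int *child_ready, int *parent_ready)
{
    int fds[2], status;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    if (write(fds[1], "x", 1) != 1)    /* one pending "event" */
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct pollfd p = { .fd = fds[0], .events = POLLIN };
        char c;
        int r = poll(&p, 1, -1);       /* returns 1: the byte is pending */
        if (r == 1 && read(fds[0], &c, 1) == 1)
            _exit(r);                  /* report poll's result to the parent */
        _exit(255);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_ready = WEXITSTATUS(status);

    /* same descriptor, checked after the event was consumed */
    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    *parent_ready = poll(&p, 1, 0);
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

Expect *child_ready == 1 and *parent_ready == 0. The parent keeps the pipe's write end open, so poll(2) does not report a hang-up on the descriptor either.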

The second case is rare, but inserting usleep(3) or sleep(3) before accept(2) increases its probability.
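The experiment can also be reproduced without a network client. The sketch below is a pipe-based analogue of the server above, not the article's code: two hypothetical workers block in select(2) on the same non-blocking pipe, the parent writes one byte, and each worker sleeps with usleep(3) for delay_us microseconds before trying to read it. The names count_select_returns and select_worker, the 1-second select timeout, and the 100 ms start-up delay are assumptions chosen for illustration.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/* Exit-status bits reported by each hypothetical worker. */
#define RETURNED_FROM_SELECT 1
#define CONSUMED_EVENT       2

static void select_worker(int rfd, unsigned delay_us)
{
    fd_set rset;
    struct timeval tv = { 1, 0 };      /* give up after 1 s if never readable */
    char c;
    int status = 0;

    FD_ZERO(&rset);
    FD_SET(rfd, &rset);
    if (select(rfd + 1, &rset, NULL, NULL, &tv) > 0) {
        status |= RETURNED_FROM_SELECT;
        usleep(delay_us);              /* widen the window before consuming */
        if (read(rfd, &c, 1) == 1)     /* non-blocking: EAGAIN if we lost */
            status |= CONSUMED_EVENT;
    }
    _exit(status);
}

/* Counts how many workers returned from select(2) and how many got the byte. */
int count_select_returns(unsigned delay_us, int *returned, int *consumed)
{
    int fds[2], i, status;
    pid_t pids[2];

    if (pipe(fds) < 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        pids[i] = fork();
        if (pids[i] < 0)
            return -1;
        if (pids[i] == 0)
            select_worker(fds[0], delay_us);   /* never returns */
    }
    usleep(100000);                    /* let both workers block in select(2) */
    if (write(fds[1], "x", 1) != 1)
        return -1;

    *returned = *consumed = 0;
    for (i = 0; i < 2; i++) {
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        *returned += (WEXITSTATUS(status) & RETURNED_FROM_SELECT) != 0;
        *consumed += (WEXITSTATUS(status) & CONSUMED_EVENT) != 0;
    }
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

Exactly one worker should consume the byte in every run, while the number that return from select(2) is 1 or 2 and varies between runs; a larger delay_us should make 2 more likely. The losing worker's read(2) fails with EAGAIN, just as the extra accept(2) does in the server above.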

Waking a process only to put it straight back on the wait queue without letting it do any work wastes some overhead. Nginx deals with this as follows: a master process manages several worker processes, each running its own multiplexing loop, and before calling epoll_wait(2) a worker must acquire a lock shared by the workers (accept_mutex). This ensures that at any given moment the intersection of the descriptor sets monitored by the workers' epoll instances is empty.
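The idea of keeping the monitored sets disjoint can be sketched with a lock that a worker must hold while it waits. This is not nginx's implementation: nginx keeps its accept_mutex in shared memory and uses a try-lock, so workers that miss it keep serving their existing connections. Here, as a simplifying assumption, the lock is a single token byte in a pipe (read to acquire, write back to release), the "events" are bytes queued in a second pipe, and locked_workers_demo, locked_worker and NITEMS are hypothetical names.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

#define NITEMS 10   /* hypothetical number of queued events */

/* Hypothetical worker: watches data_fd only while holding the token lock.
 * Exit status = items consumed, +128 if select(2) returned but read(2)
 * found nothing (a wasted wake-up), 255 on a lock error. */
static void locked_worker(int data_fd, int lock_rd, int lock_wr)
{
    int consumed = 0, wasted = 0, r;
    char token, c;

    for (;;) {
        if (read(lock_rd, &token, 1) != 1)     /* acquire: take the token */
            _exit(255);
        fd_set rset;
        struct timeval tv = { 0, 200000 };     /* 200 ms idle -> stop */
        FD_ZERO(&rset);
        FD_SET(data_fd, &rset);
        r = select(data_fd + 1, &rset, NULL, NULL, &tv);
        if (r > 0) {
            if (read(data_fd, &c, 1) == 1)
                consumed++;
            else
                wasted = 1;
        }
        if (write(lock_wr, &token, 1) != 1)    /* release: put it back */
            _exit(255);
        if (r <= 0)
            break;
    }
    _exit(consumed + (wasted ? 128 : 0));
}

/* Two workers drain NITEMS queued bytes, taking the lock around select(2). */
int locked_workers_demo(int *total_consumed, int *any_wasted)
{
    int data[2], lk[2], i, status, st;
    pid_t pids[2];

    if (pipe(data) < 0 || pipe(lk) < 0)
        return -1;
    if (fcntl(data[0], F_SETFL, fcntl(data[0], F_GETFL) | O_NONBLOCK) < 0)
        return -1;
    for (i = 0; i < NITEMS; i++)               /* queue the events */
        if (write(data[1], "x", 1) != 1)
            return -1;
    if (write(lk[1], "L", 1) != 1)             /* one token: lock is free */
        return -1;

    for (i = 0; i < 2; i++) {
        pids[i] = fork();
        if (pids[i] < 0)
            return -1;
        if (pids[i] == 0)
            locked_worker(data[0], lk[0], lk[1]);  /* never returns */
    }

    *total_consumed = 0;
    *any_wasted = 0;
    for (i = 0; i < 2; i++) {
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status))
            return -1;
        st = WEXITSTATUS(status);
        if (st == 255)
            return -1;
        *total_consumed += st & 127;
        *any_wasted |= (st & 128) != 0;
    }
    close(data[0]);
    close(data[1]);
    close(lk[0]);
    close(lk[1]);
    return 0;
}
```

Because only the token holder is ever inside select(2), every return from select(2) is followed by a successful read, so the demo should report 10 items consumed and no wasted wake-ups. Each worker exits once select(2) times out while it holds the lock, which happens only after the queue is empty.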

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
