First, let's explain the "thundering herd" phenomenon: when multiple worker processes listen on the same set of sockets and a client request arrives on one of those sockets, all of the worker processes are woken and compete for the request. Only one worker can win it; the others are woken for nothing. This is the "thundering herd".
nginx solves this problem with a load balancing mechanism. Below we walk through that mechanism in detail, following the nginx source code.
First, when is the mechanism enabled? nginx must be running in the multi-process (master/worker) model with more than one worker process. This is easy to understand: only when multiple worker processes compete for the same listening sockets can the thundering herd occur, and only then is the load balancing mechanism needed.
    if (ccf->master && ccf->worker_processes > 1 && ecf->accept_mutex) {
        ngx_use_accept_mutex = 1;
        ngx_accept_mutex_held = 0;
        ngx_accept_mutex_delay = ecf->accept_mutex_delay;
    } else {
        ngx_use_accept_mutex = 0;
    }
The variable ngx_use_accept_mutex indicates whether this load balancing mechanism is enabled. It can be thought of as "front-end load balancing", because it distributes client requests fairly among the worker processes; "back-end load balancing", by contrast, is the policy for choosing which upstream server should handle a client request.
Next, let's see how this mechanism distributes requests among the worker processes. ngx_process_events_and_timers() contains the following code:
    if (ngx_use_accept_mutex) {
        if (ngx_accept_disabled > 0) {
            ngx_accept_disabled--;
        } else {
            if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
                return;
            }

            if (ngx_accept_mutex_held) {
                flags |= NGX_POST_EVENTS;
            } else {
                if (timer == NGX_TIMER_INFINITE
                    || timer > ngx_accept_mutex_delay)
                {
                    timer = ngx_accept_mutex_delay;
                }
            }
        }
    }
This code runs only when the accept mutex is enabled (ngx_use_accept_mutex = 1). The first check is whether ngx_accept_disabled is greater than 0, which indicates that the current process is overloaded. To understand why, look at how ngx_accept_disabled is set in ngx_event_accept(), the handler that accepts new connection requests:
ngx_accept_disabled = ngx_cycle->connection_n / 8 - ngx_cycle->free_connection_n;
ngx_cycle->connection_n is the maximum number of connections a worker process can hold; it is set by the worker_connections directive and defaults to 512. The other variable, ngx_cycle->free_connection_n, is the number of currently free connections, so if the number of active connections is X, its value is ngx_cycle->connection_n - X. Substituting, the value of ngx_accept_disabled is:
ngx_accept_disabled = X - ngx_cycle->connection_n * 7 / 8;
In other words, once the number of active connections X exceeds 7/8 of the maximum, the process is considered overloaded: ngx_accept_disabled becomes positive, and the larger its value, the heavier the current process's load.
Now let's look back at the code in ngx_process_events_and_timers(). When the process is overloaded, all it does is decrement ngx_accept_disabled by 1, on the reasoning that one more round of event processing has completed and some load has presumably been shed, so the indicator is adjusted accordingly. After some time, ngx_accept_disabled drops back to 0 and the process can again compete for the lock to accept new connections. So 7/8 of the maximum connection count is the load balancing tipping point: a worker process at or beyond that point stops trying to acquire the mutex, leaving new connections to the other worker processes.
If the process is not overloaded, it competes for the lock, which in effect is competing for the right to monitor the listening sockets. If it acquires the lock, it adds all listening sockets to its own event monitoring mechanism (if they were not already there); if it fails to acquire the lock, it removes the listening sockets from its event monitoring mechanism (if they were previously there).
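The add-on-success / remove-on-failure logic can be sketched with plain booleans standing in for nginx's shared-memory lock and its socket registration calls. Everything here except the variable ngx_accept_mutex_held is our simplification, not the real implementation.

```c
#include <stdbool.h>

static bool mutex_locked = false;      /* stands in for the shared accept mutex */
static bool listening_enabled = false; /* listening sockets in our event set?   */
int ngx_accept_mutex_held = 0;

/* Sketch of ngx_trylock_accept_mutex(): non-blocking acquire, then
 * enable or disable the listening sockets depending on the outcome. */
void trylock_accept_mutex(void)
{
    if (!mutex_locked) {
        /* acquired the lock: make sure we are watching the listening
         * sockets, so this worker will receive new connection events */
        mutex_locked = true;
        listening_enabled = true;
        ngx_accept_mutex_held = 1;
        return;
    }

    /* failed to acquire: stop watching the listening sockets, so only
     * the lock holder can be woken by a new connection */
    listening_enabled = false;
    ngx_accept_mutex_held = 0;
}
```

This is exactly what prevents the thundering herd: a worker without the lock has no listening socket registered, so a new connection can wake only one process.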
The variable ngx_accept_mutex_held records whether the current process holds the lock. This matters: if the process holds the lock, the NGX_POST_EVENTS flag is set on flags, meaning all events will be queued and processed later. This follows a rule that any such design must respect: the lock holder should hold the lock for as short a time as possible. Most event processing is therefore deferred until after the lock is released; releasing the lock quickly shortens the holding time and gives other processes more opportunity to acquire it. If the current process does not hold the lock, the timeout of its event-monitoring blocking point (e.g. epoll_wait()) is capped at a relatively short value: the sooner it times out, the more often it breaks out of the blocking call, and the more chances it gets to compete for the mutex.
As mentioned above, with the deferral flag (NGX_POST_EVENTS) set, events are not handled the moment they arrive; instead, each detected event is queued onto a linked list for later processing.
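The deferred-event list can be sketched as a simple FIFO queue: events detected while the lock is held are appended instead of handled, then drained after the lock is released. nginx actually uses its intrusive ngx_queue_t structure for this; the simplified list below is our own illustration.

```c
#include <stddef.h>

/* A "posted" event: its handler runs later, not at detection time. */
typedef struct posted_event {
    void               (*handler)(struct posted_event *ev);
    struct posted_event *next;
} posted_event_t;

static posted_event_t  *posted_head = NULL;
static posted_event_t **posted_tail = &posted_head;

/* Defer an event instead of running its handler now. */
void post_event(posted_event_t *ev)
{
    ev->next = NULL;
    *posted_tail = ev;
    posted_tail = &ev->next;
}

/* After releasing the accept mutex, run every deferred handler in
 * FIFO order and reset the queue. */
void process_posted_events(void)
{
    posted_event_t *ev = posted_head;

    posted_head = NULL;
    posted_tail = &posted_head;

    while (ev != NULL) {
        posted_event_t *next = ev->next;
        ev->handler(ev);
        ev = next;
    }
}

/* Example handler that just counts invocations (for demonstration). */
static int handled_count = 0;
static void count_handler(posted_event_t *ev)
{
    (void) ev;
    handled_count++;
}
```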
At this point it may seem we have wandered away from the original question, but in fact the mechanisms described above are exactly how nginx avoids the thundering herd. Two points summarize it:
First, what if a new request arrives on a listening socket while the worker is still processing its queued connection events? That is fine: the current process handles only its queued events, and the new request simply waits, pending on the listening socket. Because the listening sockets are registered with the event monitoring mechanism in level-triggered mode, the event will still be triggered and picked up in the next round by whichever process holds the lock and has the listening sockets in its event monitoring mechanism.
Second, when a process finishes with the lock, it only releases the lock; it does not immediately remove the listening sockets from its event monitoring mechanism. So while it is still processing its queued events, another process may acquire the mutex and add all the listening sockets to its own event monitoring mechanism. As a result, the listening sockets may be registered in several processes at once, but at any given moment only one process, the lock holder, is actually monitoring them. After the first process finishes processing its queued events, it competes for the lock again; if it finds the lock held by another process and fails, it removes all the listening sockets from its event monitoring mechanism before blocking in event monitoring again. Since the listening sockets are monitored by only one process at any one time, nginx is never subject to the thundering herd.
That, in short, is how nginx solves the "thundering herd" phenomenon.