Nginx health check and load balancing mechanism analysis


Nginx is an excellent reverse proxy server. Here we mainly discuss its health check and load balancing mechanisms, and the problems these mechanisms can cause. "Health check" means that when a backend has a problem (what counts as a problem depends on the specific implementation), nginx stops distributing requests to that backend and keeps checking it until it recovers. "Load balancing" means how a backend is selected and how requests are distributed evenly across backends according to their capacity; in addition, when a request to one backend fails, the request must be redispatched to another backend (redispatch). In this article, ngx_http_upstream_round_robin (RR) is used as the load-balancing module and ngx_http_proxy_module (proxy) as the backend proxy module.

 

Nginx's health check is closely tied to load balancing. There is no independent health-check module; instead, ordinary service requests double as the health check, which saves a separate health-check thread. That is the benefit. The drawback is that when the business is complex, false positives can occur. For example, a backend response times out: this may mean the backend is down, or it may be a problem with that particular service request that has nothing to do with the backend. After marking a backend as unavailable, nginx will still distribute service requests to it from time to time to check whether it has recovered.

 

After nginx parses the client request headers, upstream calls the RR module's peer.get to select a specific backend. When the request ends, upstream calls the RR module's peer.free to report the backend's health status back to RR. Whenever an error occurs while upstream communicates with the backend, ngx_http_upstream_next is called:

void ngx_http_upstream_next(ngx_http_request_t *r, ngx_http_upstream_t *u, ngx_uint_t ft_type);

The third parameter specifies the failure type, which can be one of the following:

 

#define NGX_HTTP_UPSTREAM_FT_ERROR           0x00000002
#define NGX_HTTP_UPSTREAM_FT_TIMEOUT         0x00000004
#define NGX_HTTP_UPSTREAM_FT_INVALID_HEADER  0x00000008
#define NGX_HTTP_UPSTREAM_FT_HTTP_500        0x00000010
#define NGX_HTTP_UPSTREAM_FT_HTTP_502        0x00000020
#define NGX_HTTP_UPSTREAM_FT_HTTP_503        0x00000040
#define NGX_HTTP_UPSTREAM_FT_HTTP_504        0x00000080
#define NGX_HTTP_UPSTREAM_FT_HTTP_404        0x00000100
#define NGX_HTTP_UPSTREAM_FT_UPDATING        0x00000200
#define NGX_HTTP_UPSTREAM_FT_BUSY_LOCK       0x00000400
#define NGX_HTTP_UPSTREAM_FT_MAX_WAITING     0x00000800
#define NGX_HTTP_UPSTREAM_FT_NOLIVE          0x40000000

 

In ngx_http_upstream_next, as long as the failure type is not NGX_HTTP_UPSTREAM_FT_HTTP_404, the backend is considered to have a problem (NGX_PEER_FAILED):

if (ft_type == NGX_HTTP_UPSTREAM_FT_HTTP_404) {
    state = NGX_PEER_NEXT;
} else {
    state = NGX_PEER_FAILED;
}

ngx_http_upstream_next then calls RR's peer.free, and RR uses the reported state to judge whether the backend that just served the request is healthy:

if (ft_type != NGX_HTTP_UPSTREAM_FT_NOLIVE) {
    u->peer.free(&u->peer, u->peer.data, state);
}
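For reference, here is a simplified sketch of what the RR module's free callback (ngx_http_upstream_free_round_robin_peer) does when the reported state contains NGX_PEER_FAILED. In the sketch, pc is the peer connection passed to the callback and peer is the RR entry for the backend that was just used; the exact fields and bookkeeping vary across nginx versions, so treat it as an illustration rather than verbatim source:

/* simplified sketch of ngx_http_upstream_free_round_robin_peer();
   details differ between nginx versions */
if (state & NGX_PEER_FAILED) {
    now = ngx_time();

    peer->fails++;           /* count this request as a failure         */
    peer->accessed = now;    /* remember when the failure was last seen */
}

if (pc->tries) {
    pc->tries--;             /* one retry fewer left for this request   */
}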

 

If ngx_http_upstream_next finds that the retry budget is exhausted (the default budget is the number of backends; each retry decrements it by 1), or the configuration does not allow redispatching this failure type, the response status is returned to the client:

if (u->peer.tries == 0 || !(u->conf->next_upstream & ft_type)) {
    ngx_http_upstream_finalize_request(r, u, status);
}

Which failure types allow redispatch is configured with the proxy module's proxy_next_upstream directive; when the failure type is allowed, the request is redispatched to the next backend.

 

As mentioned earlier, as long as the failure type is not NGX_HTTP_UPSTREAM_FT_HTTP_404, the backend is considered faulty. The failure types include failing to connect to the backend, connect/read/write timeouts, and 500, 502 or 504 responses from the backend. This policy is open to question, especially for read/write timeouts: a service request may time out for reasons of its own that have nothing to do with the backend's health. Note that timeout and http_504 in proxy_next_upstream are different things: the former means connecting to, reading from or writing to the upstream timed out, while the latter means the backend actually returned HTTP code 504.
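In terms of the code shown above, the difference is simply which bit ends up set in u->conf->next_upstream: proxy_next_upstream timeout sets NGX_HTTP_UPSTREAM_FT_TIMEOUT, while proxy_next_upstream http_504 sets NGX_HTTP_UPSTREAM_FT_HTTP_504, and the test u->conf->next_upstream & ft_type matches one but not the other. A small illustration using the macro values listed earlier:

/* illustration only: "proxy_next_upstream error timeout" builds this mask */
ngx_uint_t mask = NGX_HTTP_UPSTREAM_FT_ERROR | NGX_HTTP_UPSTREAM_FT_TIMEOUT;

/* a connect/read/write timeout (ft_type == NGX_HTTP_UPSTREAM_FT_TIMEOUT)
   matches the mask and is redispatched */

/* a backend reply of 504 (ft_type == NGX_HTTP_UPSTREAM_FT_HTTP_504)
   does not match, so the 504 is returned to the client instead */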

 

In fact, the health check is not strictly necessary, because redispatch already ensures that the client receives a correct response even when a backend is down, so disabling the health check is worth considering. It is controlled by the max_fails parameter of the upstream server directive.

 

In RR's peer.get, if max_fails is 0 the backend is always treated as available (even if it is actually faulty):

if (peer->max_fails == 0
    || peer->fails < peer->max_fails)
{
    break;
}
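Right after this check, the same selection logic in peer.get is also what makes nginx probe a "failed" backend again from time to time: once fail_timeout seconds have passed since the last recorded failure, the failure counter is reset and the backend gets another request. A simplified sketch of that part (field names from the RR module; the exact logic varies by nginx version):

/* simplified sketch: after fail_timeout seconds without a new failure,
   the counter is cleared and the backend is tried again */
if (now - peer->accessed > peer->fail_timeout) {
    peer->fails = 0;
    break;
}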

 

Because the number of redispatches depends on the number of backends, it is advantageous to have a slightly larger number of backends.
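The retry budget itself comes from the RR module's per-request initialization, which, roughly speaking (simplified from ngx_http_upstream_init_round_robin_peer), sets it to the number of configured backends:

/* simplified: the per-request retry budget equals the number of backends */
r->upstream->peer.tries = rrp->peers->number;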

 

Below are some tests that support this analysis.

 

upstream test {
    server 127.0.0.1:8060 max_fails=0;
    server 127.0.0.1:8070 max_fails=0;
    server 127.0.0.1:8080 max_fails=0;
    server 127.0.0.1:8090 max_fails=0;
}

Only 8060 and 8070 are alive; 8080 and 8090 are unavailable. With max_fails=0, the health check is disabled.

 

proxy_read_timeout 2;

The read timeout is set to 2 seconds.

 

proxy_next_upstream error timeout;

This is the default: redispatch on error and timeout.

 

The test request's sleep parameter tells the backend how long to sleep, and the code parameter tells it which HTTP code to return. By comparing the total elapsed time with the sleep time, we can tell how many backends were tried.

 

time curl "http://127.0.0.1:8099/index.php?sleep=3" -vv
real 0m4.014s

With sleep=3 the read times out and two backends are tried (2 s + 2 s ≈ 4 s).

 

Now modify the configuration: proxy_next_upstream error;

 

time curl "http://127.0.0.1:8099/index.php?sleep=3" -vv
real 0m2.018s

The read times out, there is no redispatch, and only one backend is tried.

 

Now modify the configuration: proxy_next_upstream error http_504;

 

time curl "http://127.0.0.1:8099/index.php?sleep=1" -vv
real 0m1.022s

This is a normal request.

 

time curl "http://127.0.0.1:8099/index.php?sleep=1&code=504" -vv
real 0m2.023s

With the backend returning 504, nginx redispatches and two backends are tried. However, nginx returns 502 to the client, not 504: because every backend it tried returned 504, nginx considers the upstream unavailable and responds with 502.

 

Next, test the health check with redispatch disabled: proxy_next_upstream off;

 

curl "http://127.0.0.1:8099/index.php?sleep=3" -vv

Of four requests, two return 502 and two return 504: the surviving backends return 504 (read timeout) and the unavailable ones return 502.

 

Now modify max_fails: server 127.0.0.1:8060 max_fails=1; — this enables the health check for 8060.

 

curl "http://127.0.0.1:8099/index.php?sleep=3" -vv

In the first round of four requests, two return 502 and two return 504: 8080 and 8090 have problems and return 502, while 8070 and 8060 return 504 (read timeout). Because the health check is enabled for 8060 and its timeout counts as a failure, it is marked unavailable.

In the second round of four requests, three return 502 and one returns 504: the health check is not enabled for 8070, so it still returns 504.

 

These tests show that ordinary business requests (sleeping for 3 s, or deliberately returning HTTP 504) can make nginx wrongly conclude that a perfectly healthy backend is down. On a private cloud platform this is usually not a problem: you can set a sufficiently large timeout and avoid returning 5xx codes. On a public cloud platform, however, it is fatal, because tenant code can be written to return 5xx codes at will. There are two ways to deal with this: disable the health check, or modify the nginx code so that only NGX_HTTP_UPSTREAM_FT_ERROR is treated as a backend failure.
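A minimal sketch of the second option, assuming the change is made where ngx_http_upstream_next maps the failure type to a peer state (this only illustrates the idea; it is not a tested patch):

/* proposed change (sketch): only hard connection errors mark the backend
   as failed; timeouts and 5xx responses just move on to the next peer */
if (ft_type == NGX_HTTP_UPSTREAM_FT_ERROR) {
    state = NGX_PEER_FAILED;
} else {
    state = NGX_PEER_NEXT;
}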

 
