Nginx Load Balancing: Consistent Hash (2016)


Nginx Version: 1.9.1


Algorithm Introduction

Consistent hashing is commonly used for load balancing when the back-end servers are cache servers.

The advantage of consistent hashing is that when a cache server is added to or removed from the cluster, only a small part of the cache is invalidated, so relatively few requests have to go back to the origin (back-to-source).

In CDN architectures such as Nginx + ATS or HAProxy + Squid, Nginx/HAProxy uses consistent hashing as its load-balancing algorithm.

The following example illustrates the benefit of consistent hashing.

Suppose the back-end cluster contains three cache servers: A, B, and C.

Requests R1 and R2 are routed to A.

Requests R3 and R4 are routed to B.

Requests R5 and R6 are routed to C.

With consistent hashing, when cache server B goes down, R1/R2 are still routed to A and R5/R6 are still routed to C, so the caches on those two servers remain valid.

Only R3/R4 are reassigned to A or C, and those requests go back to the origin.

With other algorithms (for example, taking the key's hash modulo the number of servers), when cache server B goes down, R1/R2 may no longer be routed to A, and R5/R6 may no longer be routed to C.

In other words, the caches on A, B, and C are largely invalidated, and most requests go back to the origin.
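To make the contrast concrete, here is a small self-contained C simulation (not nginx code) that removes server B from {A, B, C} and counts how many keys that were *not* on B still change owner. The FNV-1a hash, the 40 virtual nodes per server, the key names and all function names are assumptions chosen for the sketch; nginx itself uses CRC32 and more virtual nodes per server.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NSERVERS 3   /* servers 0, 1, 2 play the roles of A, B, C */
#define VNODES   40  /* virtual nodes per server; arbitrary for this demo */

/* FNV-1a, a simple stand-in hash (nginx uses CRC32 instead). */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/* Modulo hashing: index into the list of servers that are still alive. */
static int pick_modulo(uint32_t h, const int *alive)
{
    int live[NSERVERS], n = 0;
    for (int s = 0; s < NSERVERS; s++)
        if (alive[s])
            live[n++] = s;
    assert(n > 0);
    return live[h % (uint32_t) n];
}

/* Ring hashing: owner of the first point clockwise from h (smallest
 * point >= h), wrapping around to the smallest point overall. */
static int pick_ring(uint32_t h, const int *alive)
{
    int best = -1, first = -1;
    uint32_t best_p = 0, first_p = 0;
    char name[32];

    for (int s = 0; s < NSERVERS; s++) {
        if (!alive[s])
            continue;
        for (int v = 0; v < VNODES; v++) {
            snprintf(name, sizeof(name), "server%d#%d", s, v);
            uint32_t p = fnv1a(name);
            if (p >= h && (best < 0 || p < best_p)) {
                best = s;
                best_p = p;
            }
            if (first < 0 || p < first_p) {
                first = s;
                first_p = p;
            }
        }
    }
    return best >= 0 ? best : first;
}

/* Count keys that did NOT live on `removed` but still change owner
 * once `removed` goes down. */
static int moved_survivors(int (*pick)(uint32_t, const int *),
                           int nkeys, int removed)
{
    int before_set[NSERVERS] = {1, 1, 1};
    int after_set[NSERVERS] = {1, 1, 1};
    char key[32];
    int moved = 0;

    after_set[removed] = 0;
    for (int k = 0; k < nkeys; k++) {
        snprintf(key, sizeof(key), "/file%d", k);
        uint32_t h = fnv1a(key);
        int before = pick(h, before_set);
        if (before != removed && pick(h, after_set) != before)
            moved++;
    }
    return moved;
}
```

With the ring, `moved_survivors(pick_ring, 1000, 1)` is 0 by construction: removing B's points cannot change which surviving point is nearest for a key that was not on B. With modulo hashing, a sizable share of the keys that were on A or C move as well.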

The basic principle of consistent hashing is not covered here. If you are not familiar with it, spend ten minutes on this article:

http://www.codeproject.com/Articles/56138/Consistent-hashing

Before analyzing the module code, let's look at how Nginx implements the consistent hash algorithm.

1. Initializing the upstream block

The main task is to create and initialize the real nodes and the virtual nodes.

The real nodes are created by the round robin method.

Q: How many virtual nodes are there in total, and how many virtual nodes does one real node correspond to?

The weights of all real nodes are summed into total_weight, and the total number of virtual nodes is total_weight * 160 (before duplicates are removed).

A real node with weight weight corresponds to weight * 160 virtual nodes.

Q: For each real node, how are its virtual nodes created?

1. The server member of a real node is the first parameter of its server directive. It is first split into host and port, and then:

base_hash = CRC32(HOST \0 PORT)

where \0 is a single NUL byte separating host and port. base_hash is kept as an unfinalized CRC32 state so that it can be extended. A real node corresponds to weight * 160 virtual nodes, and all of them start from the same base_hash.

2. To give every virtual node a different hash value, prev_hash is introduced: it is the hash value of the previous virtual node (0 for the first one).

hash = CRC32(base_hash, prev_hash)

That is, the CRC32 computation over HOST \0 PORT is continued with the four bytes of prev_hash and then finalized, so the whole expression is crc32(HOST \0 PORT PREV_HASH).

3. The server member of a virtual node points to the server member of its real node. So by comparing the server members of a virtual node and a real node, you can tell whether the two belong together.

After the virtual node array has been created and initialized, the virtual nodes are sorted by hash value; when several virtual nodes share the same hash value, only the first of them is kept.

After these steps we have an array of all virtual nodes whose hash values are sorted and unique. In other words, the hash ring has been built.
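The ring construction described above can be sketched in standalone C. This is a minimal sketch, not the nginx source: `crc32_update`, `point_t`, `add_peer_points` and `sort_and_dedupe` are hypothetical names, the CRC32 is a plain bitwise implementation of the standard IEEE polynomial, host and port are assumed to be already split, and feeding prev_hash as four little-endian bytes is our reading of the nginx source rather than something stated in this article.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bitwise CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Callers start from 0xFFFFFFFF and XOR with 0xFFFFFFFF to finalize. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Simplified virtual node: hash plus the owning real node's server string. */
typedef struct {
    uint32_t    hash;
    const char *server;
} point_t;

static int cmp_points(const void *a, const void *b)
{
    uint32_t x = ((const point_t *) a)->hash;
    uint32_t y = ((const point_t *) b)->hash;
    return (x > y) - (x < y);
}

/* Append weight * 160 virtual nodes for one real node. Returns new count. */
static size_t add_peer_points(point_t *pts, size_t n, const char *server,
                              const char *host, const char *port,
                              unsigned weight)
{
    assert(pts != NULL && host != NULL && port != NULL);

    /* base_hash: unfinalized CRC32 state of HOST \0 PORT */
    uint32_t base_hash = 0xFFFFFFFFu;
    base_hash = crc32_update(base_hash, host, strlen(host));
    base_hash = crc32_update(base_hash, "", 1);      /* the NUL separator */
    base_hash = crc32_update(base_hash, port, strlen(port));

    uint32_t prev_hash = 0;                           /* 0 for the first node */
    for (unsigned j = 0; j < weight * 160; j++) {
        unsigned char b[4] = {                        /* little-endian bytes */
            (unsigned char) (prev_hash & 0xFF),
            (unsigned char) ((prev_hash >> 8) & 0xFF),
            (unsigned char) ((prev_hash >> 16) & 0xFF),
            (unsigned char) (prev_hash >> 24)
        };
        uint32_t hash = crc32_update(base_hash, b, 4) ^ 0xFFFFFFFFu;
        pts[n].hash = hash;
        pts[n].server = server;
        n++;
        prev_hash = hash;
    }
    return n;
}

/* Sort by hash and keep one virtual node per distinct hash value. */
static size_t sort_and_dedupe(point_t *pts, size_t n)
{
    if (n == 0)
        return 0;
    qsort(pts, n, sizeof(point_t), cmp_points);
    size_t i = 0;
    for (size_t j = 1; j < n; j++)
        if (pts[i].hash != pts[j].hash)
            pts[++i] = pts[j];
    return i + 1;
}
```

Byte-for-byte compatibility with nginx's ring is not verified here; the sketch only shows the shape of the computation (shared base_hash, chained prev_hash, sort, deduplicate).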

2. Initializing the per-request load-balancing data

For each request, the runtime value of the hash directive's first parameter, key (typically $host$uri), is evaluated, and the request's hash value is computed from it:

hash = CRC32(key)

Using this hash value, the nearest virtual node in the clockwise direction is looked up in the virtual node array; call its index i.

"Nearest in the clockwise direction" means point[i-1].hash < hash <= point[i].hash. If hash is greater than the hash of every virtual node, the search wraps around to point[0].

The request falls on this virtual node and is then handled by the corresponding real node.
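The lookup can be done with a binary search over the sorted hashes. A minimal sketch, assuming the virtual nodes are reduced to a plain sorted, duplicate-free `uint32_t` array; `find_chash_point` is a hypothetical standalone function. It may return n, and the caller wraps with `i % n` to land on the first virtual node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index i of the first virtual node whose hash is >= hash,
 * i.e. points[i-1] < hash <= points[i]. Returns n when hash is larger
 * than every point; the caller then uses i % n (requires n > 0). */
static size_t find_chash_point(const uint32_t *points, size_t n,
                               uint32_t hash)
{
    assert(points != NULL || n == 0);
    size_t i = 0, j = n;
    while (i < j) {
        size_t k = i + (j - i) / 2;
        if (hash > points[k])
            i = k + 1;
        else if (hash < points[k])
            j = k;
        else
            return k;   /* exact match */
    }
    return i;
}
```

For a ring {100, 200, 300}, a request hash of 150 lands on index 1, 200 also on index 1, and 350 returns 3, which wraps to index 0.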

3. Selecting the real node

By the time peer.init has run, we already know which virtual node the request falls on.

In peer.get, the real node for that virtual node has to be found.

Using the virtual node's server member, search the real-node array for available real nodes with the same server member.

If none is found, continue clockwise to the next virtual node and look for its real node.

If exactly one is found, that real node is used.

If several are found, one of them is chosen by weighted round robin.
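The selection loop can be sketched as follows. This is a simplified model, not nginx's `peer.get`: `peer_t`, `vpoint_t` and `select_peer` are hypothetical, a single `down` flag stands in for nginx's down/failure/tried checks, and the smooth weighted round robin uses the static `weight` where nginx uses `effective_weight`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *server;          /* first parameter of the server directive */
    const char *name;            /* resolved address string */
    int         weight;
    int         current_weight;
    int         down;            /* stands in for down/failed/already tried */
} peer_t;

typedef struct {
    unsigned    hash;
    const char *server;          /* virtual node -> real node association */
} vpoint_t;

/* Starting at virtual node `start`, walk clockwise until some virtual
 * node has at least one available real node with the same server string.
 * Several matches are resolved with smooth weighted round robin.
 * Returns the peer index, or -1 once every virtual node has been tried. */
static int select_peer(peer_t *peers, size_t npeers,
                       const vpoint_t *points, size_t npoints, size_t start)
{
    assert(npoints > 0);
    for (size_t tries = 0; tries < npoints; tries++) {
        const char *server = points[(start + tries) % npoints].server;
        int best = -1, total = 0;

        for (size_t i = 0; i < npeers; i++) {
            if (peers[i].down || strcmp(peers[i].server, server) != 0)
                continue;
            peers[i].current_weight += peers[i].weight;
            total += peers[i].weight;
            if (best < 0
                || peers[i].current_weight > peers[best].current_weight)
                best = (int) i;
        }

        if (best >= 0) {
            peers[best].current_weight -= total;
            return best;
        }
    }
    return -1;
}
```

When two real nodes share one server string (a domain name that resolved to two addresses), repeated requests for the same virtual node alternate between them, which is exactly the behavior discussed under "Defects and improvements".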

4. Defects and improvements

Virtual nodes are associated with real nodes through their server members.

This means one virtual node can correspond to several real nodes, for the following reason:

If the first parameter of a server directive is a domain name, it may resolve to several real nodes, and all of them have the same server member.

For a request, the hash value of its key is computed and the nearest virtual node is found, only to discover that this virtual node corresponds to several real nodes.

Which real node should be used? This module picks one of them by round robin.

But in scenarios where consistent hashing is used, a real node is typically a cache server.

If one virtual node corresponds to several real nodes, the same file may end up cached on several cache servers.

This increases disk usage as well as back-to-source traffic, which is clearly undesirable.

The fix is simple: associate virtual nodes with real nodes through the name member instead.

Even when the first parameter of a server directive is a domain name, the name member of each resulting real node is unique.

So a virtual node then leads to exactly one real node, and the problem goes away.
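A minimal sketch of the proposed improvement, under stated assumptions: virtual nodes are generated from each real node's unique name (hashed as a whole, without the host/port split) and each virtual node stores the index of exactly one real node. This is a hypothetical variant, not code from nginx; `peer_t`, `npoint_t`, `build_ring_by_name` and `lookup_peer` are illustrative names, and the CRC32 is a plain bitwise implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bitwise CRC32 (reflected polynomial 0xEDB88320); start at 0xFFFFFFFF,
 * XOR with 0xFFFFFFFF to finalize. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

typedef struct {
    const char *server;   /* shared by peers resolved from one domain */
    const char *name;     /* unique per resolved address */
    unsigned    weight;
} peer_t;

/* Each virtual node points at exactly one real node by index. */
typedef struct {
    uint32_t hash;
    size_t   peer;
} npoint_t;

static int cmp_npoints(const void *a, const void *b)
{
    uint32_t x = ((const npoint_t *) a)->hash;
    uint32_t y = ((const npoint_t *) b)->hash;
    return (x > y) - (x < y);
}

/* Build the ring from each peer's name; pts needs sum(weight) * 160 slots. */
static size_t build_ring_by_name(const peer_t *peers, size_t npeers,
                                 npoint_t *pts)
{
    size_t n = 0;
    for (size_t p = 0; p < npeers; p++) {
        uint32_t base = crc32_update(0xFFFFFFFFu, peers[p].name,
                                     strlen(peers[p].name));
        uint32_t prev = 0;
        for (unsigned v = 0; v < peers[p].weight * 160; v++) {
            unsigned char b[4] = {
                (unsigned char) (prev & 0xFF),
                (unsigned char) ((prev >> 8) & 0xFF),
                (unsigned char) ((prev >> 16) & 0xFF),
                (unsigned char) (prev >> 24)
            };
            prev = crc32_update(base, b, 4) ^ 0xFFFFFFFFu;
            pts[n].hash = prev;
            pts[n].peer = p;
            n++;
        }
    }
    if (n == 0)
        return 0;
    qsort(pts, n, sizeof(npoint_t), cmp_npoints);
    size_t i = 0;
    for (size_t j = 1; j < n; j++)
        if (pts[i].hash != pts[j].hash)
            pts[++i] = pts[j];
    return i + 1;
}

/* Map a request key to exactly one real node: first point clockwise. */
static size_t lookup_peer(const npoint_t *pts, size_t n, const char *key)
{
    assert(n > 0);
    uint32_t h = crc32_update(0xFFFFFFFFu, key, strlen(key)) ^ 0xFFFFFFFFu;
    size_t i = 0, j = n;
    while (i < j) {               /* lower bound: first hash >= h */
        size_t k = i + (j - i) / 2;
        if (pts[k].hash < h)
            i = k + 1;
        else
            j = k;
    }
    return pts[i % n].peer;       /* wrap around past the last point */
}
```

Because the two real nodes behind `cache.example:80` now have different names, they get different virtual nodes, and a given key always lands on the same one of them.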

Data Structures

1. Real node

These are the back-end servers created by the round robin algorithm; their type is ngx_http_upstream_rr_peer_t.

Note that if the first parameter of the server directive is an IP address and port, a single server directive corresponds to exactly one real node.

If the first parameter of the server directive is a domain name, a single server directive may correspond to several real nodes.

These real nodes share the same server member and can be told apart by their name members.

```c
struct ngx_http_upstream_rr_peer_s {
    struct sockaddr                *sockaddr;         /* back-end server address */
    socklen_t                       socklen;          /* address length */
    ngx_str_t                       name;             /* address string of the back-end server, server.addrs[i].name */
    ngx_str_t                       server;           /* first parameter of the server directive, server.name */

    ngx_int_t                       current_weight;   /* current weight, adjusted dynamically, initially 0 */
    ngx_int_t                       effective_weight; /* effective weight, reduced on failures */
    ngx_int_t                       weight;           /* weight from the configuration, fixed */

    ngx_uint_t                      conns;            /* current number of connections */

    ngx_uint_t                      fails;            /* failures within "a period of time" */
    time_t                          accessed;         /* time of the last failure */
    time_t                          checked;          /* used to check "a period of time" */

    ngx_uint_t                      max_fails;        /* max failures within "a period of time", fixed */
    time_t                          fail_timeout;     /* length of "a period of time", fixed */

    ngx_uint_t                      down;             /* server permanently marked unavailable */

    ngx_http_upstream_rr_peer_t    *next;             /* next back-end server, forming a linked list */
    ...
};
```

ngx_http_upstream_rr_peers_t represents a group of back-end servers, for example a back-end cluster.

```c
struct ngx_http_upstream_rr_peers_s {
    ngx_uint_t                      number;        /* number of back-end servers */
    ...
    ngx_uint_t                      total_weight;  /* sum of the weights of all back-end servers */

    unsigned                        single:1;      /* whether there is only one back-end server */
    unsigned                        weighted:1;    /* whether weights are used */

    ngx_str_t                      *name;          /* name of the upstream block */

    ngx_http_upstream_rr_peers_t   *next;          /* backup server cluster */

    ngx_http_upstream_rr_peer_t    *peer;          /* linked list of back-end servers */
};
```

2. Virtual node

A real node typically corresponds to weight * 160 virtual nodes.

The server member of a virtual node points to the server member of the real node it belongs to, so once a virtual node has been found,

the real node it belongs to can be found as well.

But there is a catch: starting from a virtual node's server member, you may find several real nodes rather than one,

because if the first parameter of the server directive is a domain name, several real nodes have the same server member.

```c
typedef struct {
    uint32_t                            hash;     /* hash value of the virtual node */
    ngx_str_t                          *server;   /* points to the server member of the real node it belongs to */
} ngx_http_upstream_chash_point_t;

typedef struct {
    ngx_uint_t                          number;   /* number of virtual nodes */
    ngx_http_upstream_chash_point_t     point[1]; /* array of virtual nodes */
} ngx_http_upstream_chash_points_t;

typedef struct {
    ngx_http_complex_value_t            key;      /* first parameter of the hash directive, used to compute the request's hash */
    ngx_http_upstream_chash_points_t   *points;   /* array of virtual nodes */
} ngx_http_upstream_hash_srv_conf_t;
```
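A minimal sketch of how a points array of this shape can be sized and allocated in one block. The type names, the extra `capacity` field and `alloc_points` are hypothetical simplifications; the sketch uses a C99 flexible array member, whereas the nginx declaration uses `point[1]`, which is why the allocation in ngx_http_upstream_init_chash adds `sizeof(ngx_http_upstream_chash_point_t) * (npoints - 1)` to the struct size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins; ngx_str_t is replaced by a plain C string. */
typedef struct {
    uint32_t     hash;    /* virtual node hash value */
    const char  *server;  /* shared with the owning real node */
} chash_point_t;

typedef struct {
    size_t         number;   /* virtual nodes actually stored */
    size_t         capacity; /* virtual nodes allocated (hypothetical field) */
    chash_point_t  point[];  /* flexible array member; nginx uses point[1] */
} chash_points_t;

/* Allocate room for total_weight * 160 virtual nodes in one block. */
static chash_points_t *alloc_points(const unsigned *weights, size_t npeers)
{
    assert(weights != NULL && npeers > 0);

    size_t total_weight = 0;
    for (size_t i = 0; i < npeers; i++)
        total_weight += weights[i];

    size_t npoints = total_weight * 160;
    chash_points_t *points =
        malloc(sizeof(*points) + npoints * sizeof(chash_point_t));
    if (points == NULL)
        return NULL;

    points->number = 0;
    points->capacity = npoints;
    return points;
}
```

With weights {1, 2}, total_weight is 3 and the block holds 480 virtual nodes; `number` is then advanced as nodes are filled in.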

3. Per-request consistent hash data

```c
typedef struct {
    /* the round robin data must be first */
    ngx_http_upstream_rr_peer_data_t    rrp;          /* round robin per-request load-balancing data */

    ngx_http_upstream_hash_srv_conf_t  *conf;         /* server configuration block */

    ngx_str_t                           key;          /* this request's value of the hash directive's first parameter, used to compute its hash */

    ngx_uint_t                          tries;        /* number of virtual nodes already tried */
    ngx_uint_t                          rehash;       /* not used by this algorithm */
    uint32_t                            hash;         /* index, in the virtual node array, of the nearest virtual node clockwise from the request's hash */

    ngx_event_get_peer_pt               get_rr_peer;  /* peer.get function of the round robin algorithm */
} ngx_http_upstream_hash_peer_data_t;
```

Round robin's per-request load-balancing data:

```c
typedef struct {
    ngx_http_upstream_rr_peers_t   *peers;    /* back-end cluster */
    ngx_http_upstream_rr_peer_t    *current;  /* back-end server currently in use */
    uintptr_t                      *tried;    /* points to the bitmap of back-end servers already tried */
    uintptr_t                       data;     /* holds the bitmap itself when there are few back-end servers */
} ngx_http_upstream_rr_peer_data_t;
```
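The `tried`/`data` pair implements a bitmap of servers already tried that lives inline when it fits in one word. A minimal sketch of that idea, assuming `calloc`/`free` in place of nginx's pool allocator; `tried_set_t` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (8 * sizeof(uintptr_t))

typedef struct {
    uintptr_t *tried;  /* points at `data` or at a heap array */
    uintptr_t  data;   /* inline storage when all peers fit in one word */
} tried_set_t;

/* Returns 0 on success. The struct must not be copied afterwards,
 * because `tried` may point into the struct itself. */
static int tried_init(tried_set_t *t, size_t npeers)
{
    if (npeers <= BITS_PER_WORD) {
        t->data = 0;
        t->tried = &t->data;
        return 0;
    }
    size_t words = (npeers + BITS_PER_WORD - 1) / BITS_PER_WORD;
    t->tried = calloc(words, sizeof(uintptr_t));
    return t->tried != NULL ? 0 : -1;
}

static void tried_mark(tried_set_t *t, size_t i)
{
    t->tried[i / BITS_PER_WORD] |= (uintptr_t) 1 << (i % BITS_PER_WORD);
}

static int tried_test(const tried_set_t *t, size_t i)
{
    return (int) ((t->tried[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1u);
}

static void tried_free(tried_set_t *t)
{
    if (t->tried != &t->data)
        free(t->tried);
}
```

Peer i corresponds to bit i % BITS_PER_WORD of word i / BITS_PER_WORD, so a small cluster needs no extra allocation at all.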

Directive parsing function

In an upstream block, if there is a hash directive with only one parameter, the load-balancing algorithm is the plain hash algorithm, for example:

hash $host$uri;

In an upstream block, if there is a hash directive with two parameters and the second parameter is consistent, the

load-balancing algorithm is consistent hashing, for example:

hash $host$uri consistent;

So the hash directive of ngx_http_upstream_hash_module implements two load-balancing algorithms at once. In fact,

the hash algorithm and the consistent hash algorithm could be implemented as two separate modules; they have little to do with each other.

The hash algorithm is relatively simple and similar to ip_hash, analyzed earlier, so what follows covers only the consistent hash algorithm.

The parsing function of the hash directive mainly does the following:

It associates the first parameter of the hash directive with an ngx_http_complex_value_t variable, through which the parameter's runtime value can later be obtained.

It specifies the attributes supported by the server directive in this upstream block.

It decides from the hash directive's parameters whether to use the hash algorithm or the consistent hash algorithm. If the second parameter of the hash directive is "consistent",

it sets the upstream block's initialization function, uscf->peer.init_upstream, to the consistent hash version.

```c
static char *
ngx_http_upstream_hash(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
    ngx_http_upstream_hash_srv_conf_t  *hcf = conf;

    ngx_str_t                         *value;
    ngx_http_upstream_srv_conf_t      *uscf;
    ngx_http_compile_complex_value_t   ccv;

    value = cf->args->elts;

    ngx_memzero(&ccv, sizeof(ngx_http_compile_complex_value_t));

    /*
     * Associate the first parameter of the hash directive with an
     * ngx_http_complex_value_t variable; the parameter's runtime value
     * can later be obtained through this variable.
     */
    ccv.cf = cf;
    ccv.value = &value[1];
    ccv.complex_value = &hcf->key;

    if (ngx_http_compile_complex_value(&ccv) != NGX_OK) {
        return NGX_CONF_ERROR;
    }

    /* get the upstream{} block */
    uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module);

    if (uscf->peer.init_upstream) {
        ngx_conf_log_error(NGX_LOG_WARN, cf, 0,
                           "load balancing method redefined");
    }

    /* specify the attributes supported by the server directive in this upstream block */
    uscf->flags = NGX_HTTP_UPSTREAM_CREATE
                  |NGX_HTTP_UPSTREAM_WEIGHT
                  |NGX_HTTP_UPSTREAM_MAX_FAILS
                  |NGX_HTTP_UPSTREAM_FAIL_TIMEOUT
                  |NGX_HTTP_UPSTREAM_DOWN;

    /*
     * Choose the hash or the consistent hash algorithm based on the
     * parameters of the hash directive. Each algorithm has its own
     * upstream block initialization function.
     */
    if (cf->args->nelts == 2) {
        uscf->peer.init_upstream = ngx_http_upstream_init_hash;

    } else if (ngx_strcmp(value[2].data, "consistent") == 0) {
        uscf->peer.init_upstream = ngx_http_upstream_init_chash;

    } else {
        ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,
                           "invalid parameter \"%V\"", &value[2]);
        return NGX_CONF_ERROR;
    }

    return NGX_CONF_OK;
}
```
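The argument count test is easy to misread: `cf->args->elts` includes the directive name itself, so `nelts == 2` means `hash <key>`. A minimal standalone sketch of just that dispatch, assuming the directive accepts one or two parameters (so the count is 2 or 3); `lb_method_t` and `choose_method` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { LB_HASH, LB_CHASH, LB_INVALID } lb_method_t;

/* args[0] is the directive name, as in cf->args->elts.
 * Assumes nargs is 2 or 3 (one or two directive parameters). */
static lb_method_t choose_method(const char *const *args, size_t nargs)
{
    assert(args != NULL && nargs >= 2);
    if (nargs == 2)
        return LB_HASH;                 /* hash <key> */
    if (strcmp(args[2], "consistent") == 0)
        return LB_CHASH;                /* hash <key> consistent */
    return LB_INVALID;                  /* nginx logs "invalid parameter" */
}
```

This also shows why the key must be written without spaces: `hash $host $uri;` passes `$uri` as a second parameter, which is not "consistent" and is rejected.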

Initializing the upstream block

After the directives of the http block have been parsed, the init main conf function of every HTTP module is called.

When the init main conf function of ngx_http_upstream_module runs, it calls the initialization function of every upstream block.

For an upstream block that uses consistent hashing, the initialization function (peer.init_upstream) was set in the previous step

to ngx_http_upstream_init_chash, which mainly does the following:

It calls round robin's upstream block initialization function to create and initialize the real nodes.

It specifies peer.init, the per-request load-balancing initialization function.

It creates and initializes the virtual node array, so that the virtual nodes in the array are sorted and unique.

```c
static ngx_int_t
ngx_http_upstream_init_chash(ngx_conf_t *cf, ngx_http_upstream_srv_conf_t *us)
{
    u_char                             *host, *port, c;
    size_t                              host_len, port_len, size;
    uint32_t                            hash, base_hash;
    ngx_str_t                          *server;
    ngx_uint_t                          npoints, i, j;
    ngx_http_upstream_rr_peer_t        *peer;
    ngx_http_upstream_rr_peers_t       *peers;
    ngx_http_upstream_chash_points_t   *points;
    ngx_http_upstream_hash_srv_conf_t  *hcf;
    union {
        uint32_t                        value;
        u_char                          byte[4];
    } prev_hash;

    /* create and initialize the real nodes with round robin's upstream block initialization function */
    if (ngx_http_upstream_init_round_robin(cf, us) != NGX_OK) {
        return NGX_ERROR;
    }

    /* replace the per-request load-balancing initialization function */
    us->peer.init = ngx_http_upstream_init_chash_peer;

    peers = us->peer.data;  /* cluster of real nodes */
    npoints = peers->total_weight * 160;

    /* allocate npoints virtual nodes */
    size = sizeof(ngx_http_upstream_chash_points_t)
           + sizeof(ngx_http_upstream_chash_point_t) * (npoints - 1);

    points = ngx_palloc(cf->pool, size);
    if (points == NULL) {
        return NGX_ERROR;
    }

    points->number = 0;

    /* initialize all virtual nodes */
    for (peer = peers->peer; peer; peer = peer->next) {
        server = &peer->server;  /* first parameter of the server directive, server.name */

        /*
         * Hash expression is compatible with Cache::Memcached::Fast:
         * crc32(HOST \0 PORT PREV_HASH).
         */

        if (server->len >= 5
            && ngx_strncasecmp(server->data, (u_char *) "unix:", 5) == 0)
        {
            host = server->data + 5;
            host_len = server->len - 5;
            port = NULL;
            port_len = 0;
            goto done;
        }

        /* split the server member of each peer into host and port */
        for (j = 0; j < server->len; j++) {
            c = server->data[server->len - j - 1];

            if (c == ':') {
                host = server->data;
                ...
```
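The listing stops partway through the host/port loop. A minimal standalone sketch of the splitting idea, under stated assumptions: `split_server` and `has_unix_prefix` are hypothetical helpers; the scan over trailing digits up to a ':' follows the visible part of the loop; and treating a string with no ":port" suffix as a bare host with no port is our assumption, not taken from the truncated listing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check for the "unix:" prefix. */
static int has_unix_prefix(const char *s, size_t len)
{
    static const char prefix[] = "unix:";
    if (len < 5)
        return 0;
    for (size_t i = 0; i < 5; i++)
        if (tolower((unsigned char) s[i]) != prefix[i])
            return 0;
    return 1;
}

/* Split a server string into host and port (hypothetical helper). */
static void split_server(const char *s, size_t len,
                         const char **host, size_t *host_len,
                         const char **port, size_t *port_len)
{
    if (has_unix_prefix(s, len)) {        /* unix socket: path, no port */
        *host = s + 5;
        *host_len = len - 5;
        *port = NULL;
        *port_len = 0;
        return;
    }

    /* scan right to left over trailing digits looking for ':' */
    for (size_t j = 0; j < len; j++) {
        char c = s[len - j - 1];
        if (c == ':') {
            *host = s;
            *host_len = len - j - 1;
            *port = s + len - j;
            *port_len = j;
            return;
        }
        if (c < '0' || c > '9')
            break;
    }

    /* assumption: no ":port" suffix means the whole string is the host */
    *host = s;
    *host_len = len;
    *port = NULL;
    *port_len = 0;
}
```

Scanning from the right keeps bracketed IPv6 addresses intact: `[::1]:80` splits into host `[::1]` and port `80`, because the colons inside the brackets are never reached.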
