Those Load-Balancing Algorithms


Last week I sent out a questionnaire asking what you would like Lao Wang to write about, and many friends voted for practical material on programming techniques and server architecture. So the upcoming posts will cover algorithms related to programming and architecture, and then, perhaps in late June, I'll talk about interviewing (Lao Wang has sat in on hundreds of interviews and can discuss them from the interviewer's side). Lao Wang's tech talk has one trait: never empty theory, always grounded. So everything discussed here is tied to real practice and should be useful in everyday work.

Today we're talking about some algorithms related to load balancing. Back at Baidu (roughly 5-6 years ago), Lao Wang wrote a common base library (no idea whether any department still uses it) to do load balancing across different systems. The finer details are hard to recall now, but the basic algorithms are worth sharing.

The first question: what is load balancing?

Suppose I have two modules (or two systems), module-A and module-B, and A depends on B for a service. When a user request arrives, A asks B to do some processing based on the request (for example, looking up the word that corresponds to a word ID) and return the result, which A then handles. However, to keep the service stable, B may run on many machines, and now A is puzzled: which of B's machines should I fetch the data from?

The most common case is nginx: for example, our web logic servers are Jetty or Tomcat, there are several of them, and nginx has to be configured with all of these machines:

upstream simplemain.com {
    server  192.168.1.100:8080;
    server  192.168.1.101:8080;
}

So how are these machines chosen? That is exactly what a load-balancing algorithm decides.

In Lao Wang's understanding, load balancing covers two aspects:
1. Load: the carrying capacity of the backend systems. For example, under otherwise identical conditions, a machine with a 1-core CPU and 1 GB of memory generally carries less load than one with an 8-core CPU and 8 GB of memory; with identical hardware, a machine already at 80% CPU utilization generally has less spare capacity than one at 30%.
2. Balance: keeping backend requests balanced. For example, distributing the same kind of request evenly across multiple machines, or, in some scenarios, routing the same user to the same machine as much as possible.

So a load-balancing algorithm solves the problem of cross-system calls: taking the load condition of the backend machines into account, it distributes requests in a balanced and reasonable way.

That leads to the second question: why?
Why do we need load balancing?
1. Obviously, if we ignore the backend's carrying capacity, we may crush a machine outright (say, its CPU utilization is already at 80% and we keep dumping requests on it until it dies). Worse, this can trigger an avalanche: one machine dies under the pressure, its requests shift to and overwhelm the other machines, those die one by one... and the whole service is paralyzed.
2. If we choose a poor balancing algorithm, backend resources are wasted. For example, a consistent-hash algorithm makes good use of cache capacity, whereas random assignment may largely defeat the cache (nearly identical content ends up cached on every machine).

Therefore, the load-balancing strategy has to be chosen with care.

Now for the third question: how?
Following the earlier definition, we split the problem into two parts: load and balance.

1. First, the load algorithms:
Since we want to respect the carrying capacity of the backend systems, there are many approaches. The common ones are:

A. Simple, crude, and effective: manual configuration!
Does this sound amateurish? Not really. For small and medium systems it is the most effective and stable approach. We know the backend machines best: their hardware specs, which services are deployed on them, and how much load they can carry. So in the configuration we can tell the caller explicitly: you may put only this much pressure on this server, and no more!

For example, we often see nginx configurations like this:

upstream simplemain.com {
    server  192.168.1.100:8080 weight=30;
    server  192.168.1.101:8080 weight=70;
}

That is, although there are two backend servers, their capacities differ: we give the stronger one 70% of the pressure and the weaker one 30%. nginx will then send correspondingly more of the load to the second server.

This kind of configuration is simple and stable, and the allocation it produces basically never jitters. The problem is that the split is fixed and cannot be adjusted dynamically. If a backend server's performance fluctuates for a while (for example, another service on the machine disturbs its steady operation and its CPU utilization creeps up over time), the caller has no way to redistribute the request pressure according to the actual situation. Hence the second method.

B. Dynamic adjustment.
This scheme compares a machine's current state against its historical average: if the current state is worse than the history, the number of requests sent to it is dynamically reduced; if it is better, the request pressure can keep increasing until a new balance is reached.

How is this done concretely?
At the beginning, we measure every machine's response time and compute an average. Faster-responding machines get a few more requests; if a machine gets so many requests that its responses slow down, its share gradually levels off against the other machines, indicating it has reached its balance point.

Then, once this balance is reached, we keep tracking each machine's average response time. If a machine's responses slow down (becoming slower than the other machines'), we reduce its share of requests and shift the pressure to the other machines, until the whole system balances again.

Does this scheme look impressive? Its advantage is that it dynamically balances against the real processing power of the backend servers. But everything has two sides: under extreme conditions this scheme can cause a system avalanche! When one machine suffers a brief network jitter, its responses slow down, so the frontend shifts its requests to the other machines. If a lot gets shifted, some of those machines slow down too, so their requests get shifted to still others... and in the end the diligent machines are crushed to death by the redirected requests.

So a better scheme combines the two. On one hand, statically configure a load range for each machine and throw away whatever exceeds the maximum; on the other hand, dynamically monitor the backend machines' responses and make small-scale adjustments to the request distribution. A minimal sketch of this combination follows.
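
To make this concrete, here is a minimal sketch of such a combination in Java (this is not Lao Wang's original library: the class names, the smoothing factor, and the 20% adjustment thresholds are all made up for illustration). Each backend gets a static weight cap from configuration, a moving average of its response times is kept, and its effective weight is nudged up or down in small steps, never above the cap:

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: static weight caps plus small dynamic adjustments
// driven by each backend's average response time.
class Backend {
    final String addr;
    final int maxWeight;        // static cap from configuration
    volatile double effWeight;  // effective weight, adjusted dynamically
    volatile double avgMs;      // moving average of response time (ms)

    Backend(String addr, int maxWeight) {
        this.addr = addr;
        this.maxWeight = maxWeight;
        this.effWeight = maxWeight;
    }

    // Record one observed response time into the moving average.
    void record(double ms) {
        avgMs = (avgMs == 0) ? ms : 0.8 * avgMs + 0.2 * ms;
    }
}

class DynamicBalancer {
    private final List<Backend> backends;

    DynamicBalancer(List<Backend> backends) { this.backends = backends; }

    // Called periodically: shift a little weight away from backends that are
    // slower than average, never exceeding the static cap.
    void adjust() {
        double mean = backends.stream().mapToDouble(b -> b.avgMs).average().orElse(0);
        for (Backend b : backends) {
            if (b.avgMs > mean * 1.2) {
                b.effWeight = Math.max(1, b.effWeight * 0.9);           // back off a slow machine
            } else if (b.avgMs < mean * 0.8) {
                b.effWeight = Math.min(b.maxWeight, b.effWeight * 1.1); // reward a fast machine
            }
        }
    }

    // Weighted-random pick over the current effective weights.
    Backend pick() {
        double total = backends.stream().mapToDouble(b -> b.effWeight).sum();
        double r = ThreadLocalRandom.current().nextDouble(total);
        for (Backend b : backends) {
            r -= b.effWeight;
            if (r < 0) return b;
        }
        return backends.get(backends.size() - 1);
    }
}

Because adjust() moves weights only in small multiplicative steps and the static cap bounds every machine's share, a brief jitter on one machine cannot dump its whole load onto the others at once, which damps the avalanche scenario described above.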

2. The balancing algorithms
A balancing algorithm mainly decides how requests are distributed to the backend services. The following four are commonly used: random, round-robin, consistent hash, and master-standby (master-slave).

For example, when configuring nginx we often use something like:

upstream simplemain.com {
    ip_hash;
    server  192.168.1.100:8080;
    server  192.168.1.101:8080;
}

This configuration hashes the client's IP and assigns the request to the corresponding machine.

Let's take a closer look at how these algorithms work.

A. The random algorithm.
As the name implies, we choose a backend server at random. Before going into the algorithm itself, let's look at an example. We write the following C code:

#include <stdlib.h>
#include <stdio.h>

int main() {
    srand(1234);
    printf("%d\n", rand());
    return 0;
}

We use the srand function to plant the seed 1234 into the random generator, then draw a random number. Compile and link with: gcc rand.c -o rand

Ideally, every time we run this rand program we should get a different result, right? But...

Every time we run it, the result is exactly the same!! What's going on?

The "random" used in computer algorithms is usually pseudo-random: we first feed a seed to the algorithm, the algorithm computes on that seed according to some fixed procedure, and out comes a so-called random value. Make one small change to the program above, replacing 1234 with time(NULL), and the effect changes:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main() {
    srand((int)time(NULL));
    printf("%d\n", rand());
    return 0;
}

The time function returns the current time in seconds; with that value as the seed, the computed pseudo-random value varies as the seconds change.

Now let's look at how the Java source code implements this. The random class we use in Java is java.util.Random. It provides two constructors:

public Random() {
    this(seedUniquifier() ^ System.nanoTime());
}

public Random(long seed) {
    if (getClass() == Random.class)
        this.seed = new AtomicLong(initialScramble(seed));
    else {
        // subclass might have overriden setSeed
        this.seed = new AtomicLong();
        setSeed(seed);
    }
}

We can see that this class also needs a seed. Then, when we fetch random values, the next function is called:

protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}

This function runs a computation on the seed to produce the random value. So what looks like randomness is actually determined by the time (the seed) and by the algorithm's arithmetic. It is not truly random.

OK, back to the point: how do we use a random algorithm to balance requests? Take the nginx configuration from before:

upstream simplemain.com {
    server  192.168.1.100:8080 weight=30;
    server  192.168.1.101:8080 weight=70;
}

We have two machines that should take 30% and 70% of the pressure respectively. Then our algorithm can be written as (pseudo-code):

bool res = abs(rand()) % 100 < 30

What does this line mean?
1. Generate a pseudo-random number: rand()
2. Make it non-negative: abs(rand())
3. Take it modulo 100, mapping the value into the half-open interval [0, 100): abs(rand()) % 100
4. Check whether the number falls into the first 30 slots, [0, 30): abs(rand()) % 100 < 30
If the random numbers are uniform, they fall uniformly over [0, 100). So whenever the value lands in [0, 30) we dispatch the request to the first machine, and otherwise to the second.
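
As a sketch of the same idea in Java (the class name is made up; the addresses are the ones from the nginx example), a quick simulation shows the split converging to roughly 30/70:

import java.util.concurrent.ThreadLocalRandom;

// Minimal weighted-random selection matching the 30/70 nginx example above.
public class WeightedRandom {
    public static void main(String[] args) {
        String[] servers = {"192.168.1.100:8080", "192.168.1.101:8080"};
        int firstWeight = 30;  // the first server takes 30 of every 100 slots

        int[] hits = new int[2];
        for (int i = 0; i < 100000; i++) {
            // Uniform in [0, 100); values in [0, 30) pick the first machine.
            int r = ThreadLocalRandom.current().nextInt(100);
            int chosen = (r < firstWeight) ? 0 : 1;
            hits[chosen]++;
        }
        // Expect roughly 30000 vs 70000.
        System.out.printf("%s: %d, %s: %d%n", servers[0], hits[0], servers[1], hits[1]);
    }
}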

This is just one way to do it, of course; there are many other methods worth thinking through.

The random algorithm is the most common and most frequently used; most systems use it. First, probabilistically it keeps requests well scattered, achieving the balance we want. Second, it is stateless: no need to remember the previous choice, no need to maintain balance factors, and so on. All in all it is convenient, cheap, and effective, so we use it all the time!

B. The round-robin algorithm.
Round-robin works like counting off (1-2-3, 1-2-3, ...): the machines simply take turns.

upstream simplemain.com {
    server  192.168.1.100:8080 weight=30;
    server  192.168.1.101:8080 weight=70;
}

With this same configuration we can do the following (for convenience, call the first machine A and the second B):
1. Build a pre-arranged array for the two machines: array = [ABBABBABBB]
2. Keep a counting pointer marking the current position in the array, say idx = 3
3. When a request arrives, choose the machine at the pointer's position, then advance the pointer by one.
This way, out of every 10 requests, exactly 3 are guaranteed to go to A and 7 to B.

The round-robin algorithm is also used in practice, but because it maintains the idx pointer it is stateful. We often replace it with the random algorithm. A minimal sketch is shown below.
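
Here is a sketch of those three steps (the class name is made up; AtomicInteger keeps the shared pointer consistent when many threads pick at once):

import java.util.concurrent.atomic.AtomicInteger;

// Weighted round-robin over a pre-arranged array: 3 slots for A, 7 for B.
public class RoundRobin {
    private static final String[] SLOTS = {
        "A", "B", "B", "A", "B", "B", "A", "B", "B", "B"
    };
    private static final AtomicInteger idx = new AtomicInteger(0);

    // Return the machine at the pointer's position, then advance the pointer.
    static String pick() {
        int i = Math.floorMod(idx.getAndIncrement(), SLOTS.length);
        return SLOTS[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) System.out.print(pick());
        System.out.println();  // prints ABBABBABBB: 3 As and 7 Bs per 10 requests
    }
}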

C. The consistent hashing algorithm.
This is the most discussed, most studied, most mystique-laden algorithm of the lot. When Lao Wang first encountered it, he too spent a lot of effort studying it. Search Baidu for "consistent hash" and you get roughly 3.21 million related articles.

The version of the algorithm you'll find online generally goes like this: arrange all the integers in [0, 2^32) on a circle, then hash each machine's unique identifier (for example, its IP) to an integer and project it onto the circle (node-A, node-B). When a request comes in, hash the request's unique key (for example, the user ID) to an integer, project it onto the circle too (request-1, request-2), and walk clockwise until you hit the first machine, as in the familiar hash-ring diagram.
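
As a toy sketch of that ring (String.hashCode stands in for a real hash function, and the many virtual nodes per machine that production implementations add for smoother balance are omitted):

import java.util.TreeMap;

// Toy hash ring: TreeMap keys are points on the circle;
// ceilingKey() is the "walk clockwise to the first machine" step.
public class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String ip) {
        ring.put(ip.hashCode(), ip);  // project the machine onto the circle
    }

    void removeNode(String ip) {
        ring.remove(ip.hashCode());
    }

    // Project the request key onto the circle, then take the first node clockwise.
    String pick(String requestKey) {
        Integer point = ring.ceilingKey(requestKey.hashCode());
        if (point == null) point = ring.firstKey();  // wrap around the circle
        return ring.get(point);
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("192.168.1.100:8080");
        ring.addNode("192.168.1.101:8080");
        System.out.println(ring.pick("user-100"));  // the same key always maps the same way
    }
}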

At the time, Lao Wang read these articles and they all seemed reasonable, but after a while the details faded... He chewed on it for a long time, repeatedly asking himself: why do it this way?

After a long time, Lao Wang arrived at some insight. A consistent hash really solves two problems:
1. Hash invariance: the same request (for example, the same user ID) should land on the same machine as much as possible, not drift to a different machine over time or for other incidental reasons;
2. Dispersion on failure: when some machines go down (or new machines are added), the requests that used to land on one machine (for example, user IDs 1, 101, 201) should scatter across the other machines rather than all piling onto a single one. This minimizes the impact on the system.

With these two principles in hand, the code is easy to write. For example, we could do this (assume the request's user ID is 100):
1. Concatenate the ID with every server's IP and port into strings:

str1 = "192.168.1.100:8080-100"
str2 = "192.168.1.101:8080-100"

2. Hash these strings to get the corresponding integers:

hash(str1)
hash(str2)

3. Sort these integers from largest to smallest and select the first one.

OK, now let's check whether this algorithm satisfies the two principles above.
1. Hash invariance: obviously the procedure is deterministic; the same input always yields the same result;
2. Dispersion on failure: when a machine goes down, each request that ranked that machine first now goes to whichever machine ranked second for it. As long as the hash function disperses well, those second-ranked machines are themselves scattered.

So this algorithm does achieve the goal. Of course, you can write many, many algorithms with the same effect; feel free to work some out yourself. The essential thing is to satisfy the two principles above. A sketch of this variant follows.
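
Here is a minimal sketch of the concatenate-hash-and-sort variant just described (this is essentially what is elsewhere called rendezvous, or highest-random-weight, hashing; String.hashCode is again only a stand-in for a real hash function):

import java.util.Comparator;
import java.util.List;

// For each request key, rank every server by hash("ip:port-key") and take the largest.
public class RendezvousHash {
    private final List<String> servers;

    RendezvousHash(List<String> servers) { this.servers = servers; }

    String pick(String requestKey) {
        // If the winning server is removed from the list, each of its keys
        // falls back to its own second-ranked server, so they scatter.
        return servers.stream()
                .max(Comparator.comparingInt(s -> (s + "-" + requestKey).hashCode()))
                .orElseThrow();
    }

    public static void main(String[] args) {
        RendezvousHash h = new RendezvousHash(
                List.of("192.168.1.100:8080", "192.168.1.101:8080"));
        System.out.println(h.pick("100"));  // user ID 100, as in the example above
    }
}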

The most common scenario for consistent hashing is allocating cache services. Caching a given user's data on one fixed server means we mostly avoid caching the same data on several machines, which does a lot for cache utilization.

But a coin has two sides, and consistent hashing is no exception. When a machine fails, the cache on it is lost, and the requests it used to absorb are pushed onto the other machines. Since those machines hold no cache for these requests, the requests may go straight through to the database, causing a sudden spike in database pressure. If the spike is large enough, it can crush the database outright.

So before adopting consistent hashing, be sure to estimate whether the backend can withstand the resulting pressure when one machine goes down. If it cannot, the recommendation is to accept some wasted cache memory and use the random algorithm instead.

D. The master-standby algorithm.
The core idea is to send requests to one fixed machine as much as possible (note: as much as possible), while the other machines act as backups; when the fixed one has problems, switch to another.

This algorithm is comparatively niche, but it is used in some special cases. For example, suppose I run several message-queue services and, to preserve the ordering of submitted data, I want all requests to go to one fixed service as much as possible, falling back to the others only when that service has problems.

How do we do that? The simplest way is to hash each machine's ip:port, sort the hashes from largest to smallest, and take the first. If the first machine has problems, take the second:

head(sort(hash("ip:port1"), hash("ip:port2"), ...))

A sketch follows.
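
A minimal sketch of that head(sort(hash(...))) idea (the isAlive predicate is a hypothetical stand-in for the health-check layer mentioned at the end of this article):

import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Sort servers by hash, descending; every caller computes the same order,
// so they all agree on the master and on the fallback sequence.
public class MasterStandby {
    static String pick(List<String> servers, Predicate<String> isAlive) {
        return servers.stream()
                .sorted(Comparator.comparingInt((String s) -> s.hashCode()).reversed())
                .filter(isAlive)        // skip machines that have problems
                .findFirst()
                .orElseThrow();         // no server alive at all
    }

    public static void main(String[] args) {
        List<String> servers = List.of("192.168.1.100:8080", "192.168.1.101:8080");
        // Pretend everything is alive; a real predicate would come from health checks.
        System.out.println(pick(servers, s -> true));
    }
}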

Of course, there are other approaches. For example, the naming service Lao Wang built used a centralized lock service to determine the current primary server and lock it in place.

Well, that's roughly it for the algorithms related to load balancing. There is actually one related topic we haven't covered: health checking. Its job is to run liveness and health checks against all services, so that only viable machines are offered to the load balancer for selection. If the service on some machine has problems, the health check removes that machine from the service list, making it invisible to the load-balancing algorithm. Health checking underpins load balancing but sits outside its system proper, though some would count it as part of load balancing too. Since implementing it is actually fairly involved, Lao Wang won't cover it here; it can be analyzed in a later article.
