Introduction to Enterprise-Class Load Balancing

Original source: loveis715

In a previous article, "Put Your Password in Its Place – Lessons from the Sesame Finance Breach," a reader commented that "if the whole login process becomes slow, there will be a problem." Although I am quite confident in the correctness of that article, the comment made me think carefully about whether I had explained the issue clearly enough. In the process, I arrived at a topic well worth discussing: load balancing.

In that article, we said that to manage passwords securely, the number of iterations used to compute password hashes should be as large as is practical, making each hash calculation slower and therefore much harder for an attacker to brute-force. The flip side is that if 100 or 1,000 people perform the login operation at the same time, this expensive hash calculation will leave the login server unable to keep up. At that point, we need load balancing to spread the login requests across multiple login servers and reduce the load on any single one.

Introduction to Load Balancing

Perhaps some readers are still unfamiliar with the term load balancing, so let's take a moment to explain what it is.

A large web site may have thousands or even tens of thousands of users online at the same time. If each request takes the service 0.02 seconds to process, a single service instance can handle only 50 such requests per second, or 3,000 per minute. If the request is for a very common feature, such as browsing a site's product list, it is clear that a single service instance cannot keep the site running. In this case, we need to scale the service.

Scaling comes in two main flavors: scaling up, which increases the capacity of a single service instance, and scaling out, which increases the number of service instances. In some cases scaling up is the simpler operation, for example adding more memory to the server that hosts the service. However, any single server is ultimately limited by its physical hardware, and the higher a server's performance, the more expensive each unit of capacity becomes, so we need to scale out and spread the workload across multiple servers:

As shown above, a server is overloaded when it has more load than it can handle. A service in this state often appears slow or unresponsive. After scaling out, multiple servers process user requests at the same time. In this setup, we need a dedicated device to distribute the incoming requests across the individual servers. That device decides how to distribute requests based on the distribution logic it contains, so that no single server becomes overloaded. These request-distributing devices are the load-balancing servers (load balancers).

Of course, we should not wait until a server is actually overloaded to address the problem. In the day-to-day operation of a service, we should consider scaling it once the servers' average and peak loads reach certain thresholds.

Once a service sits behind a load-balancing system, its availability and scalability both improve greatly; this is the most important reason to use one. For example, in a load-balanced system with three servers, if one of them fails, the load balancer can detect the failure through the heartbeats it sends to each server and stop distributing work to the failed server:

And if the load on the current load-balancing system already exceeds its capacity threshold, we can simply solve the problem by adding more servers to the system:

Adding servers reduces the amount of work each one needs to handle, and therefore the burden on any single server.

DNS-Based Load Balancing

Now that we understand roughly how a load-balancing system is composed and used, let's look at the various load-balancing solutions.

The load balancing solutions most commonly used in industry today fall into three types: DNS-based load balancing, L3/4 load balancing, which works at the network and transport layers, and L7 load balancing, which works at the application layer. Of these, DNS-based load balancing is the simplest and one of the earliest to appear.

When we visit a website by typing its domain name into the browser's address bar, the browser first checks the local DNS cache for an IP address for that domain. If one is found, the browser accesses the site directly using that IP address. If the local DNS cache has no entry for the domain, the browser sends a request to a DNS server to obtain the corresponding IP address and adds it to the local cache.

In DNS, a domain name may be bound to multiple IP addresses. In that case, the DNS server returns the list of IP addresses in a round-robin order. For example, if you look up the IPs of a particular domain several times with commands such as nslookup or host, you may get results like the following (because of network restrictions in mainland China, you may need to go through a proxy or VPN to run this test):

$ host -t a google.com
google.com has address 72.14.207.99
google.com has address 64.233.167.99
google.com has address 64.233.187.99
$ host -t a google.com
google.com has address 64.233.187.99
google.com has address 72.14.207.99
google.com has address 64.233.167.99

As you can see, successive DNS responses rotate the address list in round-robin fashion, so different users end up connecting to different IP addresses, which balances the load across the servers.
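To see this behavior from the client's side, here is a minimal Python sketch (the domain name is only an example) that resolves a name and prints every address returned; running it repeatedly against a round-robin DNS entry would show the list rotating:

    import socket

    def resolve_all(hostname):
        # getaddrinfo returns one tuple per resolved address
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
        # the last element of each tuple is the (ip, port) pair
        return [info[4][0] for info in infos]

    print(resolve_all("google.com"))

Most clients simply use the first address in the list, which is why rotating the order spreads users across servers.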

While this load-balancing solution is very easy to implement, it has a fatal drawback: to reduce the number of DNS requests and speed up access, browsers often cache the results of DNS queries. If the service at one of the IPs fails, a browser may keep sending requests to the unavailable address based on what is recorded in its DNS cache (behavior differs between browsers). Even though only one of the service's IPs has failed, from that user's point of view the whole site is inaccessible. Therefore, DNS-based load balancing cannot guarantee high availability as a standalone solution; instead, it is used to complement other load-balancing solutions.

L3/4 Load Balancing

Another common approach is L3/4 load balancing. "L3/4" means that the load balancer makes its decisions based on the data found at layer 3 (the Network layer) and layer 4 (the Transport layer) of the OSI model. For such a load balancer, this data consists mainly of the packet's IP header and the protocol headers of TCP, UDP, and similar protocols:

An L3/4 load balancer works very simply: when data arrives, it decides, based on its own algorithm and the layer 3/4 information in the packet, which service instance should process the data, and forwards it there.

The whole load-balancing operation consists of three things: the load balancer needs to know which service instances are currently healthy, it uses its dispatch algorithm to decide which service instance should process the incoming data, and it forwards the data to the service instance chosen by that algorithm.

Let's start with how the load balancer determines whether a service instance is healthy. To make sure the packets it distributes can actually be handled by the server cluster behind it, the load balancer needs to send periodic status queries (health checks) to find out which service instances are working. These health checks are often more involved than people expect: if the service instance has crashed but the operating system hosting it is still running, the OS will still respond happily to a ping from the load balancer even though a TCP connection to the service will fail; and if the service instance has not crashed but merely hangs, it may still accept a TCP connection yet be unable to handle an HTTP request.

Because the right status query depends on how a particular service instance is implemented, many load balancers let users plug in custom scripts that run checks specific to that service. These checks often contain several tests and may even try to fetch real data from the service instance.
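As a concrete illustration, here is a minimal health-check sketch in Python (the URL, timeout, and expected marker string are assumptions made up for the example, not anything prescribed by a particular load balancer): it goes beyond a ping or a bare TCP connect by issuing a real HTTP request and verifying the body.

    import urllib.request

    def is_healthy(url="http://10.0.0.11/health", timeout=2.0, marker="OK"):
        """Return True only if the instance answers an HTTP request with the expected body."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", errors="replace")
                # Require both a 200 status and a meaningful body, so that a hung
                # process that merely accepts TCP connections still fails the check.
                return resp.status == 200 and marker in body
        except OSError:
            return False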

Once the load balancer discovers that one of the service instances it manages is no longer healthy, it stops forwarding data to it until the instance returns to a normal state. In the meantime, the remaining service instances have to share the work the failed server was doing.

One thing to note is that after a service instance fails, the system as a whole must still have enough capacity to handle the load. For example, suppose a load balancer manages three service instances of equal capability and each is running at about 80% load. If one of them fails, all of that load must be handled by the remaining two, so each would need to take on 120% of its capacity, far more than it can bear. The direct consequence is an unstable service with frequent timeouts and requests that fail to complete.
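The arithmetic behind this is simple; a tiny sketch with the numbers from the example above:

    # Capacity check after one instance fails (numbers from the example above).
    instances = 3
    load_per_instance = 0.80                            # each instance at 80% of its capacity
    total_load = instances * load_per_instance          # 2.4 "instance-capacities" of work
    load_after_failure = total_load / (instances - 1)   # 1.2 -> 120% per surviving instance
    print(load_after_failure > 1.0)                     # True: the survivors are overloaded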

Now, assuming our load balancer has a well-designed health check, it distributes load among the healthy service instances according to its load-balancing algorithm. The most common misconception among people new to load balancing is that the load balancer chooses the target service instance based on each instance's response speed or current load.

In reality, round robin is the most common and usually the best-performing load-balancing algorithm. If the service instances differ in capacity, the load balancer uses weighted round robin, which distributes load in proportion to each instance's actual capability. Some commercial load balancers do make small automatic adjustments to these allocations based on factors such as an instance's current load and response time, but those factors are not decisive.
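To make the idea concrete, here is a minimal weighted round-robin sketch in Python (the instance names and weights are invented for the example); it simply repeats each instance in the rotation according to its weight:

    import itertools

    def weighted_round_robin(instances):
        """instances: list of (name, weight) pairs; yields targets in proportion to weight."""
        expanded = [name for name, weight in instances for _ in range(weight)]
        return itertools.cycle(expanded)

    dispatcher = weighted_round_robin([("server-a", 3), ("server-b", 1)])
    for _ in range(8):
        print(next(dispatcher))   # server-a appears three times for every server-b

Production implementations typically interleave the targets more smoothly rather than clustering them, but the proportion of traffic each instance receives follows the same idea.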

With pure round robin, however, a series of related requests from the same user may be assigned to different service instances. For this reason, many load balancers can also allocate load based on specific characteristics of the data, for example by hashing the user's IP address and using the result of that calculation to pick the service instance.
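A minimal sketch of that idea in Python (the backend list is hypothetical); every request from the same IP lands on the same instance as long as the instance list does not change:

    import hashlib

    BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # assumed service instances

    def pick_backend(client_ip, backends=BACKENDS):
        # Hash the client IP and map the result onto the list of instances.
        digest = hashlib.md5(client_ip.encode("utf-8")).hexdigest()
        return backends[int(digest, 16) % len(backends)]

    print(pick_backend("203.0.113.7"))   # always the same backend for this IP

As the next paragraph explains, this mapping breaks as soon as the list of instances changes, because the modulus changes with it.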

Here too, we have to consider what happens when a service instance fails. If an instance in the load-balancing system fails, the hash space of the hashing algorithm changes and the original assignments are no longer valid, so requests end up being reassigned to different service instances. In addition, a user's IP address may change between requests, which also changes the instance they are mapped to. Don't worry, though: the discussion of L7 load balancing below offers a solution.

Once the packet's destination has been determined, the load balancer needs to forward it to the target instance. There are three main forwarding methods: direct routing, tunnelling, and IP address translation.

In direct routing mode, the load balancer and the service instances must be on the same network segment and use the same IP. When data arrives, the load balancer forwards the packets directly. After processing a packet, each service instance can either return its response through the load balancer or send the response straight back to the user without passing through the load balancer again. The latter is called Direct Server Return. It works as follows:

In this process, neither the load balancer nor the individual service instances need to modify the IP-layer data in order to forward it, so a load balancer using this method can achieve very high throughput. On the other hand, this arrangement requires whoever builds the cluster to have a solid understanding of TCP/IP.

Tunnelling, another forwarding method, is quite similar to direct routing. The only difference is that tunnels are established between the load balancer and the individual service instances. Developers can still use Direct Server Return to reduce the load on the load balancer.

IP address translation is very different from the first two methods. The destination address the user connects to is actually a virtual IP (VIP). When the load balancer receives a request, it translates the destination address into the real IP (RIP) of a service instance and changes the source address to the load balancer's own address. When the request has been processed, the service instance sends the response back to the load balancer, which rewrites the response's address back to the VIP and returns it to the user. In this forwarding mode, the process runs as follows:

An attentive reader may ask: if the user's IP is no longer in the message as it travels through the system, how does the load balancer know which user to return the response to? In this forwarding mode, the load balancer maintains a set of sessions that record information about every request currently passing through it. These sessions are a double-edged sword: if the session lifetime is set too long, a high-concurrency load balancer has to maintain far too many sessions; if it is set too short, an ACK storm can occur.
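A minimal sketch of such a session table in Python (the data structures and field names are invented for illustration; the 120-second lifetime mirrors the 2-minute example below): the load balancer records which client each forwarded connection belongs to, so that when the backend's response comes back it can be readdressed to the right user, and entries expire after a fixed lifetime.

    import time

    SESSION_LIFETIME = 120.0        # seconds; the 2-minute example used below

    # Maps (backend address, port the balancer used) back to the original client.
    sessions = {}

    def record_session(balancer_port, backend_addr, client_addr):
        sessions[(backend_addr, balancer_port)] = {
            "client": client_addr,
            "created": time.time(),
        }

    def restore_client(balancer_port, backend_addr):
        """Called when a response arrives from a backend: find the client it belongs to."""
        entry = sessions.get((backend_addr, balancer_port))
        if entry is None or time.time() - entry["created"] > SESSION_LIFETIME:
            sessions.pop((backend_addr, balancer_port), None)
            return None             # session expired or unknown: the mapping is lost
        return entry["client"]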

First consider a long session lifetime. If the load balancer receives 50,000 requests per second and its session expiration time is 2 minutes, it has to maintain 6 million sessions (50,000 × 120). These sessions consume a significant share of the load balancer's resources, and at peak load the consumption grows even further, putting still more pressure on the server.

Setting a short session lifetime is even more troublesome, because it can trigger an ACK storm between the user and the load balancer that wastes a great deal of bandwidth on both sides. In a TCP connection, client and server keep in sync through their respective sequence numbers. If a session on the load balancer expires too quickly, another TCP connection may reuse it, and the sequence numbers on the client and server sides of the reused session are regenerated. If the original user then sends another message, the load balancer replies with an ACK telling the client its sequence number is wrong; the client, on receiving that ACK, replies with its own ACK telling the server that its sequence number is wrong; the server then ACKs again, and so on. These meaningless ACK messages bounce back and forth between client and server until one of them happens to be lost in transit.

So although IP address translation looks like the easiest option at first glance, compared with the other two approaches it is actually the riskiest and carries the highest hidden costs.

L7 Load Balancing

Another common load-balancing solution is L7 load balancing. As the name implies, it decides how to distribute load mainly from the data at layer 7 (the Application layer) of the OSI model.

At run time, the L7 load balancer's operating system reassembles the individual packets it receives into a complete user request, and the load balancer decides which service instance will handle the request based on the data the request contains. Its processing flow looks roughly like this:

Compared with the data an L3/4 load balancer uses, the application-layer data used by an L7 load balancer is much closer to the service itself, so its load-balancing decisions can be more precise.

As described in the L3/4 section above, for requests that are related to one another an L3/4 load balancer picks the handling service instance with algorithms such as hashing the client IP. But that method is not very stable: when a service instance fails or the user's IP changes, the mapping between the user and the service instance changes, the user's existing session data is not present on the new service instance, and a series of problems follow.

The root cause is that the association between the user and the service instance is established by something in the external environment rather than by the user or the service instance themselves, so it cannot survive changes in that environment. To build a stable association between a user and a service instance, you need a stable piece of data carried between them. In web services, that piece of data is the cookie.

Simply put, cookie-based load balancing analyzes a particular cookie in the user's request and uses its value to decide where the request should be sent. It comes in two main flavors: cookie learning and cookie insertion.

Cookie learning is the non-intrusive option. It decides how to dispatch load by observing the cookies exchanged between the user and the service instance: the first time a user talks to the service, the load balancer finds no known cookie, so it assigns the request to a service instance using its load-balancing algorithm. When that instance responds, the load balancer records the cookie in the response together with the instance's address. When the user communicates with the service again, the load balancer looks up the cookie, finds the service instance that served the user last time, and forwards the request there.

Its biggest drawback is poor support for high availability. If the load balancer fails, the cookie-to-instance mappings it maintained are lost, so once the standby load balancer takes over, every user request is directed to an effectively random service instance.

Another problem is the memory consumed by session maintenance. Unlike session maintenance on an L3/4 load balancer, a cookie may take a very long time to expire, typically at least the few hours of a user's session. For a system handling tens of thousands of requests per second, the load balancer has to maintain an enormous number of sessions and may even exhaust its memory. Conversely, if the cookie expiration time on the load balancer is set too short, a returning user may be directed to the wrong service instance.

Besides cookie learning, the other common method is cookie insertion. It adds a cookie to the response to record which service instance the request was dispatched to, and uses the value stored in that cookie to route subsequent requests. The first time a user communicates with the server, the load balancer finds no such dispatch cookie, so it assigns a service instance using its load-balancing algorithm. After receiving the data returned by that instance, the load balancer inserts a cookie into the response recording the instance's ID. The next time the user sends a request, the load balancer distributes it according to the service instance ID recorded in the cookie.
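Here is a minimal sketch of the idea in Python (the cookie name, backend list, and request/response shapes are all invented for the example, not any particular product's behavior):

    import random

    BACKENDS = {"s1": "10.0.0.11", "s2": "10.0.0.12"}   # assumed instance IDs -> addresses
    COOKIE_NAME = "LB_BACKEND"                           # hypothetical cookie name

    def route(request_cookies):
        """Return (backend_id, set_cookie_header_or_None) for an incoming request."""
        backend_id = request_cookies.get(COOKIE_NAME)
        if backend_id in BACKENDS:
            # The cookie already names a valid instance: keep the user on it.
            return backend_id, None
        # First visit (or stale cookie): pick an instance and tell the browser to remember it.
        backend_id = random.choice(list(BACKENDS))
        return backend_id, f"{COOKIE_NAME}={backend_id}; Path=/"

    print(route({}))                        # first request: picks a backend, sets the cookie
    print(route({COOKIE_NAME: "s2"}))       # later request: sticks to s2, no new cookie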

Compared with cookie learning, cookie insertion does not need to keep a cookie-to-instance mapping in memory, and if the active load balancer fails, the standby load balancer can still distribute requests correctly based on the information recorded in the cookie.

Of course, cookie insertion has flaws too. The most common problem comes from restrictions on cookies imposed by the browser or the user. Cookie insertion requires adding an extra cookie to record the service instance assigned to the current user, but some browsers, especially mobile browsers, limit the number of cookies or even allow only one. To work around this, load balancers also offer other techniques, such as cookie modification, in which an existing cookie is altered to carry the service instance ID as well.

Cookie insertion stops working entirely when the user has disabled cookies. In that case the load balancer can only rely on information such as the JSESSIONID. Therefore, in an L7 load balancer, cookie learning and cookie insertion are often used together: cookie insertion does the work when the user has cookies enabled, while cookie learning, keyed on the JSESSIONID, maintains the association between the user and the service instance when cookies are disabled.

You may recall that an L3/4 load balancer handles related requests by hashing the client IP. Since these cookie-based solutions are so much more accurate, why not use them in the L3/4 load balancer as well? The answer is that an L3/4 load balancer works at the level of individual packets, and the cookie is buried inside the packet payload, so it is hard for an L3/4 load balancer to decide from a single packet how it should be forwarded.

For example, performing a cookie insertion shifts all of the data that follows the insertion point further back, so the load balancer has to receive all of the packets before the operation can be completed:

Think about what waiting for all the packets implies. When multiple packets are sent from one end of the network, they may arrive at the other end out of order, and when the network is congested some packets may be lost altogether, which stretches out the time needed to collect them all.

So compared with simply forwarding packets as they arrive, waiting for all of them and then inserting a cookie performs very poorly. As you will see in the solutions discussed below, L3/4 load balancers generally have very high performance requirements, whereas L7 load balancers can address their own performance limits by forming a cluster. DNS-based load balancing, L3/4 load balancers, and L7 load balancers often work together to form a highly available and highly scalable system.

SSL Farm

In the explanation above we skipped over one thing: SSL support on the L7 load balancer. An L7 load balancer often needs to read and write the cookies in requests and responses, but if the communication runs over an SSL connection, the load balancer cannot read or modify their contents.

One widely used solution is to run the load balancer in reverse proxy mode. In this setup the load balancer holds the service's certificate and can decrypt requests with the corresponding private key. Once a request is decrypted, the load balancer can read the cookie it contains and choose the target service instance from the information recorded there. When it forwards the request, it no longer needs to use an SSL connection, so individual service instances are spared from decrypting the request again, which improves their efficiency.

After the request has been processed, the service instance returns its response to the load balancer over the same non-SSL connection. The load balancer then encrypts the response and sends it back to the user over the SSL connection:
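To make the mechanism concrete, here is a heavily simplified SSL-termination sketch in Python (the certificate paths, addresses, and one-request-at-a-time handling are assumptions for the example; a real terminator would be a dedicated proxy device or reverse-proxy software): it accepts a TLS connection from the client, reads the decrypted request, forwards it to a backend over plain TCP, and lets TLS re-encrypt the backend's response on the way back.

    import socket
    import ssl

    BACKEND = ("10.0.0.11", 8080)            # assumed plain-HTTP service instance

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain("server.crt", "server.key")   # the service's certificate and key

    with socket.create_server(("0.0.0.0", 443)) as listener:
        with context.wrap_socket(listener, server_side=True) as tls_listener:
            conn, _addr = tls_listener.accept()           # TLS handshake happens here
            with conn:
                request = conn.recv(65536)                # already-decrypted plaintext HTTP
                # At this point the request (including its cookies) is readable, so a
                # real L7 balancer could choose the backend from a cookie value here.
                with socket.create_connection(BACKEND) as upstream:
                    upstream.sendall(request)             # forward over a non-SSL connection
                    response = upstream.recv(65536)
                conn.sendall(response)                    # re-encrypted automatically by TLS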

The problem is that if all SSL processing is concentrated on the L7 load balancer, it becomes a bottleneck for the system. The way around this is to place a set of reverse proxies in front of the L7 load balancer that handle the SSL encryption and decryption.

At this point, the architecture of the entire system takes on the following layered structure:

As you can see, the whole solution has four tiers. When a user request arrives at the first tier, the load balancer there forwards it, according to its own load-balancing algorithm, to a second-tier reverse proxy dedicated to SSL encoding and decoding. That proxy takes the request it received over the SSL connection and sends it on over a non-SSL connection. By the time the request reaches the third tier, the L7 load balancer can read the cookie in the request directly and use its contents to decide which service instance should handle the request.

This arrangement has many advantages. First, these reverse proxies are very cheap, often around 1/20 the price of a typical load balancer, yet they handle SSL connections almost as efficiently. They also provide very good scalability and availability: whenever the system starts struggling to keep up with SSL connections, we can add new reverse proxies at any time, and if one of them fails, the others keep the system running safely by taking on more of the load.

Issues to Consider

Before proposing concrete load-balancing solutions, let's first go over some of the things that need to be considered when designing a load-balancing system.

The first is the high availability and scalability of the load-balancing system itself. We mentioned at the start that, thanks to load balancing, a service composed of many server instances is both highly available and scalable: when one service instance fails, the others can take over part of its work, and when total capacity starts to feel tight, we can add new service instances to expand it.

However, since all traffic passes through the load balancer, the whole system becomes unusable if the load balancer itself fails. In other words, the availability of the load balancer determines the availability of the entire system.

The way to solve this depends on the type of load balancer. For L3/4 load balancers, the common industry practice is to deploy them in pairs so that the system as a whole never goes down: if one load balancer fails, the other can continue to provide load balancing for the entire system. Such a pair can run in either active-passive mode or active-active mode.

In active-passive mode, one load balancer stays in a semi-dormant standby state. It checks the availability of the other by sending it heartbeat messages. When the active load balancer stops responding to heartbeats, the heartbeat program wakes the standby from its semi-dormant state; the standby then takes over the failed load balancer's IP and starts performing load balancing.
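A minimal sketch of that heartbeat logic in Python (the peer address, interval, and failure threshold are made up for the example; real implementations such as keepalived/VRRP also handle the IP takeover itself, which is only marked by a placeholder here):

    import socket
    import time

    PEER = ("192.0.2.10", 9999)   # assumed address of the active load balancer
    INTERVAL = 1.0                # seconds between heartbeats
    MAX_MISSES = 3                # how many missed heartbeats mean "the peer is down"

    def peer_alive():
        """One heartbeat: send a probe and wait briefly for any reply."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(INTERVAL)
            try:
                s.sendto(b"ping", PEER)
                s.recvfrom(16)
                return True
            except OSError:
                return False

    misses = 0
    while True:
        misses = 0 if peer_alive() else misses + 1
        if misses >= MAX_MISSES:
            # Placeholder: here the standby would claim the shared IP
            # (e.g. via gratuitous ARP) and start balancing traffic itself.
            print("peer considered dead; taking over")
            break
        time.sleep(INTERVAL)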

In active-active mode, the two load-balanced servers work simultaneously. If one of the servers fails, then the other server will assume all the work:

Each mode has its merits. Active-active mode copes better with large traffic fluctuations. For example, under normal conditions each of the two servers might run at around 30% load; during peak hours traffic may double, pushing each server to about 60%, which the system can still handle. With active-passive mode, the single active server would already be at 60% under normal load, and at peak time it would need 120% of its capacity, so the service could no longer process all user requests.

On the other hand, active-active mode has a weakness: it invites operational complacency. For example, suppose the two load balancers in an active-active system run at around 60% load all year round. Then as soon as one of them fails, the remaining server cannot handle all of the user requests on its own.

You may ask: must there be exactly two L3/4 load balancers? This is largely determined by the load-balancing products themselves. As mentioned earlier, probing a load balancer's availability requires fairly complex test logic, so if a system contains too many L3/4 load balancers, the heartbeat tests exchanged between them consume a lot of resources. At the same time, many L3/4 load balancers are hardware-based and extremely fast, often able to match the full bandwidth of the network links they serve. For these reasons, L3/4 load balancers are generally deployed in pairs.

If the L3/4 load balancing server is really close to its load limit, we can also distribute requests through DNS load balancing:

This approach not only solves the scalability problem but also takes advantage of a DNS feature that improves the user experience: DNS can direct users to the server nearest to them based on their region. This is especially effective for a global service; after all, a user in China reaches a service instance hosted in China much faster than one hosted in the United States.

L7 load balancers, by contrast, are mostly software-based, and many of them let users build more elaborate load-balancing arrangements, for example a group of L7 load balancers with two active and one on standby.

Having covered high availability, let's turn to the scalability of the load balancers themselves. As just described, L3/4 load balancers offer such high performance that the load-balancing tier of an ordinary service rarely needs to scale. When expansion is needed, DNS load balancing provides good scalability. L7 load balancing is more flexible still, so scalability is not a problem there.

A load-balancing system should not, however, consist solely of L3/4 load balancers or solely of L7 load balancers, because the two differ greatly in both performance and price. An L3/4 load balancer is very expensive, often costing tens of thousands of dollars, whereas an L7 load balancer can be built from inexpensive servers. L3/4 load balancers deliver very high performance per device, while L7 load balancers usually achieve high aggregate performance by forming a cluster.

One more thing to consider when designing a load-balancing system is the separation of static and dynamic content. A service usually serves both dynamic and static requests, and the two have very different characteristics: a dynamic request typically involves a lot of computation but transfers little data, while a static request transfers a lot of data with little computation. Different service containers also handle the two kinds of request with very different efficiency. For this reason, many services split their service instances into two groups, one for static and one for dynamic requests, each running in the most suitable container. Static requests are then usually placed under a specific path, such as "/static", so that the load balancer can route dynamic and static requests appropriately based on the request path, as the sketch below illustrates.
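A minimal sketch of that path-based routing in Python (the pools, the "/static" prefix, and the round-robin choice are assumptions for the example):

    import itertools

    # Hypothetical pools: lightweight static file servers vs. application servers.
    STATIC_POOL = itertools.cycle(["static-1", "static-2"])
    DYNAMIC_POOL = itertools.cycle(["app-1", "app-2", "app-3"])

    def choose_backend(path):
        """Route by URL path: static assets to the static pool, everything else to the app pool."""
        if path.startswith("/static/"):
            return next(STATIC_POOL)
        return next(DYNAMIC_POOL)

    print(choose_backend("/static/logo.png"))   # served by a static file server
    print(choose_backend("/cart/checkout"))     # served by an application server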

The last thing worth mentioning is LVS (Linux Virtual Server), a software implementation of an L3/4 load balancer. Compared with a hardware implementation, the software has to do a lot of extra work, such as decoding packets and allocating memory to process them, so its performance is often only 1/5 to 1/10 that of a hardware L3/4 load balancer with comparable specifications. Given its limited performance but very low setup cost, for example reusing idle machines already in the lab, it is often used as a stopgap while the service is still small.

Load Balancing Solutions

At the end of the article, we will give you a list of common load balancing solutions for your reference.

In general, the load on a service tends to grow over time, and its load-balancing system correspondingly evolves from small to large. We will therefore introduce these load-balancing setups progressively, from small to large.

The first is the simplest setup, which contains just a pair of L7 load balancers:

As the service's load grows, the L7 load balancers in this setup can easily become a bottleneck. At that point we can solve the problem by adding an SSL farm and a server running LVS:

If the load keeps growing, we then replace LVS with genuine hardware L3/4 load balancers and increase the capacity of each tier:

Since the lower three tiers of this solution can, in theory, scale without limit, the component most likely to be overloaded is the L3/4 load balancer at the top. When that happens, we bring in DNS to distribute the load:

