Summary: For a large website, load balancing is a perennial topic. With the rapid development of hardware technology, more and more load-balancing hardware devices have emerged, such as F5 BIG-IP, Citrix NetScaler, and Radware, but their high prices are often prohibitive, so load-balancing software remains the first choice for most companies. As a rising star among web servers, nginx has attracted wide attention in the industry for its excellent reverse proxy capability and flexible load-balancing policies. This article introduces nginx's load-balancing policies in detail, from design and implementation to concrete application, drawing on real production use.
1. Preface
With the explosive growth of internet information, load balancing is no longer an unfamiliar topic. As the name suggests, load balancing distributes load across different service units; it ensures not only service availability but also fast response, giving users a good experience. The rapid growth of traffic and data volume has spawned a wide range of load-balancing products. Many dedicated load-balancing hardware devices offer good functionality but are expensive, which has made load-balancing software very popular, and nginx is one of them.
The first public version of nginx was released on October 4, 2004, and version 1.0.0 followed on April 12, 2011. nginx is characterized by high stability, rich functionality, and low resource consumption. Judging by its current market share, nginx is well placed to compete with Apache. One feature that has to be mentioned is its load-balancing capability, which has become the main reason many companies choose it. This article introduces nginx's built-in and extension load-balancing policies from the source-code perspective, compares the policies using actual production as a case study, and offers a reference for nginx users.
2. Source Code Analysis
Nginx load-balancing policies fall into two categories: built-in policies and extension policies. The built-in policies are weighted round robin and IP hash; by default they are compiled into the nginx core, and you only need to specify their parameters in the nginx configuration. There are many extension policies, such as fair, general hash, and consistent hash, which are not compiled into the nginx core by default. Since the load-balancing code has not changed substantially across nginx version upgrades, the following analysis uses the nginx 1.0.15 stable release as an example to examine each policy from the source-code perspective.
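As a quick illustration (a minimal sketch; the upstream names and addresses are placeholders), the two built-in policies are enabled in the configuration like this:

    upstream backend_wrr {              # weighted round robin
        server 192.168.0.1 weight=3;    # receives roughly 3x the requests
        server 192.168.0.2 weight=1;
    }

    upstream backend_iphash {           # IP hash
        ip_hash;                        # one directive switches the policy
        server 192.168.0.1;
        server 192.168.0.2;
    }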
2.1. Weighted Round Robin (WRR)
The principle of round robin is very simple; let's start with its basic flow. The following flowchart shows how a request is processed:
Two points in the figure deserve attention. First, weighted round robin can be implemented depth-first or breadth-first; nginx uses the depth-first variant, meaning requests are first distributed to the machine with the highest weight, and are not distributed to the next machine until the first machine's current weight drops below the others'. Second, when all backend machines are down, nginx immediately resets every machine's flags to the initial state, to avoid leaving all machines in a timeout state and stalling the entire front end.
Next, let's look at the source code. The directory structure of the nginx source is clear; weighted round robin lives in nginx-1.0.15/src/http/ngx_http_upstream_round_robin.[c|h]. I have annotated the important and hard-to-understand points. First, take a look at the key declarations in ngx_http_upstream_round_robin.h:
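An abridged sketch of the relevant declarations, based on nginx 1.0.15 (fields not discussed here are omitted):

    typedef struct {
        struct sockaddr  *sockaddr;        /* backend address */
        socklen_t         socklen;
        ngx_str_t         name;

        ngx_int_t         current_weight;  /* dynamic weight, updated per request */
        ngx_int_t         weight;          /* configured weight, used for reset */

        ngx_uint_t        fails;           /* recent failure count */
        time_t            accessed;        /* time of the last failure */
        ngx_uint_t        max_fails;
        time_t            fail_timeout;

        ngx_uint_t        down;            /* marked "down" in the config */
    } ngx_http_upstream_rr_peer_t;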
From the variable names we can roughly guess their purposes. The main difference between current_weight and weight is that the former is the effective weight used for peer selection and changes dynamically as requests are processed, while the latter is the configured weight, used to restore the initial state.
Next, let's look at how the round robin structures are created, sketched in the code below.
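As a stand-in for that listing, here is a minimal self-contained sketch of the creation step; the function and type names are illustrative, not the actual nginx identifiers. Each configured server is copied into a peers array, current_weight starts at the configured weight, and the array is ordered heaviest-first, which is what makes the depth-first behavior possible:

    #include <stdlib.h>

    typedef struct {
        const char *name;            /* "host:port" of the backend */
        int         weight;          /* configured weight */
        int         current_weight;  /* working weight, starts at weight */
        int         down;            /* 1 if marked unavailable */
    } peer_t;

    /* heavier servers first, so the depth-first scan reaches them first */
    static int
    cmp_peers(const void *a, const void *b)
    {
        return ((const peer_t *) b)->weight - ((const peer_t *) a)->weight;
    }

    static void
    init_round_robin(peer_t *peers, size_t n)
    {
        size_t i;

        for (i = 0; i < n; i++) {
            peers[i].current_weight = peers[i].weight;
        }

        qsort(peers, n, sizeof(peer_t), cmp_peers);
    }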
One variable worth explaining here is tried, a bitmap that records whether each server has already been tried. If there are no more than 32 servers (one machine word on a 32-bit platform), all their states fit into a single integer; if there are more, memory is allocated from the memory pool to hold the bitmap. The bitmap array is used as follows:
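The following fragment is condensed from ngx_http_upstream_get_round_robin_peer(); it shows the word/bit arithmetic used to query and set a server's bit (in the real code, rrp->tried points either at the single-word rrp->data or at the pool-allocated array):

    n = rrp->current / (8 * sizeof(uintptr_t));                  /* word index */
    m = (uintptr_t) 1 << rrp->current % (8 * sizeof(uintptr_t)); /* bit mask   */

    if (!(rrp->tried[n] & m)) {
        /* this server has not been tried for this request yet:
           check it, and mark it as tried */
        rrp->tried[n] |= m;
    }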
Finally, there is the actual policy code. The logic is very simple: it takes only about 30 lines to implement.
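The sketch below (reusing peer_t from the creation sketch, and mirroring the behavior described above rather than reproducing the verbatim nginx source) selects the available peer with the largest remaining current_weight, decrements it, and restores all weights once every peer is exhausted:

    static int
    get_peer(peer_t *peer, size_t n)
    {
        size_t  i;
        int     best, pass;

        for (pass = 0; pass < 2; pass++) {
            best = -1;

            /* depth-first: prefer the peer with the largest remaining weight */
            for (i = 0; i < n; i++) {
                if (peer[i].down || peer[i].current_weight <= 0) {
                    continue;
                }
                if (best < 0
                    || peer[i].current_weight > peer[best].current_weight)
                {
                    best = (int) i;
                }
            }

            if (best >= 0) {
                peer[best].current_weight--;
                return best;
            }

            /* all weights exhausted: restore the initial state and retry */
            for (i = 0; i < n; i++) {
                peer[i].current_weight = peer[i].weight;
            }
        }

        return -1;   /* every peer is down */
    }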
2.2. IP hash
IP hash is the other built-in nginx load-balancing policy. The overall flow is much the same as round robin; only the selection algorithm and the specific policy change, as shown below:
The core of the IP hash algorithm is implemented as follows:
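In condensed form (availability checks and bookkeeping omitted), the selection loop of ngx_http_upstream_get_ip_hash_peer() in nginx 1.0.15 looks like this:

    for ( ;; ) {

        /* fold the first three octets of the client IPv4 address into
           the hash; 89 is the initial value, and 113 and 6271 are the
           module's fixed constants */
        for (i = 0; i < 3; i++) {
            hash = (hash * 113 + iphp->addr[i]) % 6271;
        }

        p = hash % iphp->rrp.peers->number;   /* candidate backend index */

        /* ... skip p if it is down, already tried, or failing ... */

        if (++iphp->tries > 20) {
            /* protection mechanism: after 20 rehashes with no usable
               machine, degrade to plain round robin */
            return iphp->get_rr_peer(pc, &iphp->rrp);
        }
    }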
From the code we can see that the hash value depends on both the client IP address and the number of backend machines. Testing shows that the above algorithm can produce 1045 consecutive distinct values, which is a hard limit of the algorithm. nginx includes a protection mechanism: if no available machine has been found after 20 rounds of hashing, the algorithm degrades to round robin. In essence, then, IP hash is a disguised round robin algorithm. Note also that if the initial hash values of two IP addresses happen to coincide, requests from those two addresses will always land on the same server, which carries a hidden balance risk.
2.3. Fair
The fair policy is an extension policy and is not compiled into the nginx core by default. Its principle is to judge each backend's load by its response time and to route each request to the machine with the lightest current load. The policy is highly adaptive, but real network environments are often not so simple, so use it with caution.
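If the module has been compiled in, enabling it is a one-line change; a minimal sketch based on the HttpUpstreamFairModule documentation listed in the references:

    upstream backend_fair {
        fair;                 # route to the backend with the lightest load
        server 192.168.0.1;
        server 192.168.0.2;
    }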
2.4. General hash and consistent hash
These two are also extension policies and differ somewhat in implementation. General hash is relatively simple: any built-in nginx variable can serve as the hash key. Consistent hash implements a consistent-hashing ring and supports memcached.
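Minimal sketches of both, based on the third-party modules listed in the references; the hash key variable ($request_uri here) is freely choosable:

    upstream backend_hash {
        hash $request_uri;            # general hash: any nginx variable as key
        server 192.168.0.1;
        server 192.168.0.2;
    }

    upstream backend_chash {
        consistent_hash $request_uri; # consistent hash ring, memcached-friendly
        server 192.168.0.1;
        server 192.168.0.2;
    }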
3. Comparison and Test
This test compares the balance, consistency, and disaster tolerance of each policy, analyzes the differences, and suggests applicable scenarios accordingly. To test the nginx load-balancing policies comprehensively and objectively, we used two testing tools and tested under different scenarios, reducing the impact of the environment on the results. First, a brief introduction to the test tools, the test network topology, and the basic test flow.
3.1. Test Tools
3.1.1. easyabc
easyabc is a performance testing tool developed in-house at the company. It is implemented on the epoll model, is easy to use, can simulate GET/POST requests, and can generate load of tens of thousands of requests at most; it is widely used within the company. Because the object under test is a reverse proxy server, a stub server must be set up behind it; here nginx serves as the stub webserver, providing basic static file service.
3.1.2. Polygraph
Polygraph is a free performance testing tool widely used for testing caches, proxies, and switches. It has a standard configuration language, PGL (Polygraph Language), which gives the software great flexibility. Its working principle is illustrated below:
Polygraph provides both a client and a server, with the nginx under test placed between the two; the network interaction among them uses the HTTP protocol, and only IP + port needs to be configured. On the client side you can configure the number of virtual robots and the rate at which each robot sends requests; the robots then issue random static-file requests to the proxy server, and the server generates a static file of random size in response to each request URL. This is also the main reason for choosing this testing tool: it can generate random URLs to serve as keys for nginx's various hash policies.
In addition, polygraph provides feature-rich log analysis tools; for details, see the related materials in the references.
3.2. Test Environment
The test ran on five physical machines. The object under test was set up independently on one 8-core machine, and the other four 4-core machines ran easyabc, the stub webserver, and polygraph respectively, as shown below:
3.3. Test Plan
First, let's introduce the key test indicators:
Balance: whether requests are sent evenly to the backends
Consistency: whether requests with the same key are sent to the same machine
Disaster tolerance: whether the system can still work properly when some backends go down
These indicators are examined under the following four test scenarios, driven by easyabc and polygraph respectively:
Scenario 1: all server_* instances serve normally;
Scenario 2: server_4 goes down, the others are normal;
Scenario 3: server_3 and server_4 go down, the others are normal;
Scenario 4: all server_* instances recover and serve normally.
The four scenarios are executed in chronological order, each building on the previous one, and the object under test requires no manual intervention, so as to simulate the real situation as closely as possible. In addition, given the characteristics of the two tools, the test pressure is about 17,000 with easyabc and about 4,000 with polygraph. All tests verified that the object under test worked normally, with no logs at notice level or above (alert/error/warn); in each scenario, the QPS of each server_* was recorded for the final policy analysis.
3.4. Test Results
Table 1 and Figure 1 show the load of the round robin policy under the two testing tools. Comparing the results from the two tools, they are completely consistent, so the influence of the tools can be excluded. The chart shows that the round robin policy satisfies both balance and disaster tolerance.
Table 2 and Figure 2 show the load of the fair policy under the two testing tools. The fair policy is strongly affected by the environment; even after excluding interference from the testing tools, the results still jitter heavily. Bluntly put, this is utterly unbalanced. But viewed from another angle, it is precisely this adaptivity that keeps it usable in complex network environments. Therefore, before applying it in production, test it thoroughly in your specific environment.
The next table covers the various hash policies, which differ only in hash key or in the specific algorithm implementation, so we compare them together. Actual testing revealed that both general hash and consistent hash share a problem: when a backend machine fails, the traffic originally routed to that machine is lost. IP hash does not have this problem: as the earlier source-code analysis showed, when IP hash fails it degrades into the round robin policy, so no traffic is lost. In this respect, IP hash can also be viewed as an upgraded version of round robin.
Figure 5 concerns the IP hash policy, an nginx built-in policy that can be seen as a special case of the previous two: the source IP is the hash key. Because it is hard for a testing tool to simulate requests from a massive number of distinct IPs, we analyzed real online traffic here, as shown below:
Figure 5: IP hash policy
In the figure, the first third uses the round robin policy, the middle segment uses the IP hash policy, and the last third again uses round robin. The balance problem of IP hash is clearly visible, and the reason is not hard to find: in real network environments there are large numbers of aggregated nodes, such as the egress routers of universities and enterprises. These nodes often contribute hundreds of times the traffic of an ordinary user, and since the IP hash policy divides traffic exactly by IP address, the consequences above follow naturally.
4. Summary and Prospects
Through the actual comparative tests, we verified nginx's various load-balancing policies. The table below compares the policies in terms of balance, consistency, disaster tolerance, and applicable scenarios.
Starting from the source code and actual test data, this article has described nginx's load-balancing policies and suggested scenarios suitable for each. The analysis shows that no policy is a silver bullet: which one to choose in a specific scenario depends largely on how well the user understands these policies. We hope the analysis and test data in this article are helpful to readers, and that more load-balancing policies will emerge.
5. References
http://wiki.nginx.org/HttpUpstreamConsistentHash
http://wiki.nginx.org/HttpUpstreamFairModule
http://wiki.nginx.org/HttpUpstreamRequestHashModule
http://www.web-polygraph.org/
http://nginx.org/