Analysis of LVS system structure

Source: Internet
Author: User

Problem

Today, whether on a corporate network, a campus network, or a wide-area network such as the Internet, traffic growth has outstripped even the most optimistic estimates. At the same time, users keep demanding higher machine performance, and upgrading a single server system often means a large investment and maintenance cost with a price/performance ratio far below expectations. All of this places higher demands not only on the hardware but also on the software platform:

  • Scalability: a well-designed system should let performance grow roughly linearly with cost, and should be easy to scale down or expand.

  • 24x7 availability: a demanding business environment requires transparent, automated, adaptive availability support in both hardware and software to keep the system running around the clock.

  • Manageability: the system may be large, but it should remain easy to manage.

  • Cost/performance advantage: building such a system should be economical, and it should be easy to tailor the system to a given price target according to specific requirements.


Solution

The analysis above shows that a highly scalable cluster system is a cost-effective way to solve this problem. With a relatively low overall hardware cost, a cluster can deliver performance that no single system can provide. For Internet applications we therefore want a system with the following characteristics: high scalability, high availability, and high performance.

Such a system may be called a "three-high" system.


Introduction to LVS architecture

The Linux Virtual Server (LVS) project is a well-known open source project founded and led by Dr. Zhang Wensong, an open source and Linux kernel developer. He currently works at the National Key Laboratory for Parallel and Distributed Processing, where his research covers cluster technology, operating systems, object storage, and databases; he also devotes much of his time, with pleasure, to free software development. LVS is one solution for building the "three-high" system described above. It was designed to address a problem posed by fast-growing web commerce: how to maximize the potential service performance of a web site with a limited capital investment.

LVS is a software tool for the Linux platform. With LVS you can quickly and conveniently build a cluster system with Layer 4 load balancing, and, with the help of third-party toolkits, you can extend an LVS cluster with availability support. First let's look at the LVS architecture:
Figure I: three-layer architecture schematic diagram of LVS

The figure shows that the abstract architecture of LVS has three layers. The first layer is the load balancer, the sole entry point to the cluster. From the client's point of view, the cluster appears through this layer as a single system image (SSI) based on one IP address: the whole cluster shares this virtual IP address (VIP), the client sees the entire cluster as a single host with a legitimate IP address, and all client traffic is sent to this virtual address.

However, with only one load balancer, the balancer easily becomes a single point of failure and thus the most vulnerable part of the cluster. A fault-tolerance mechanism is therefore needed that can automatically detect a failed balancer and fail over smoothly; this is known as HA (high availability) technology. In the structure shown above, one node runs as a backup balancer, monitoring the running state of the active load balancer in real time and responding to the detected state with alarm, takeover, and recovery actions. The details are discussed in the HA section.
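The alarm/takeover/recovery loop described above can be modeled in a few lines. This is a hypothetical user-space sketch, not part of LVS: the probe function and the three action callbacks are placeholders for whatever heartbeat and VIP-migration mechanism an HA toolkit actually uses.

```python
# Sketch of the backup balancer's monitoring loop: poll the primary's
# state and react with alarm, takeover, or recovery. All names here are
# illustrative placeholders, not a real LVS/HA API.

def monitor(probe, on_alarm, on_takeover, on_recover, primary_alive=True):
    """Run one monitoring step; return the new believed state of the primary."""
    alive = probe()
    if primary_alive and not alive:
        on_alarm()      # primary just failed: raise an alarm ...
        on_takeover()   # ... and take over the virtual IP
    elif not primary_alive and alive:
        on_recover()    # primary came back: hand the VIP back
    return alive

events = []
state = True
# Simulated probe results: primary up, then down, then up again.
for result in [True, False, True]:
    state = monitor(lambda r=result: r,
                    lambda: events.append("alarm"),
                    lambda: events.append("takeover"),
                    lambda: events.append("recover"),
                    primary_alive=state)
print(events)  # -> ['alarm', 'takeover', 'recover']
```

A real implementation would run this loop on a timer and probe the primary over the network (for example with ICMP or a TCP health check) rather than from a canned list.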

The second tier is the server farm that provides the actual service. After a client's request is processed by the balancer, it is forwarded into the server pool, where a specific server responds to the request and returns the data. The service node pool typically provides web, FTP, or video-on-demand services. Because a single system cannot cope with peak data access, sharing the load across multiple servers is more economical and practical.

Server nodes can also fail temporarily, especially when a node provides several services: a random system fault or a sudden change in the external environment may make a service temporarily unavailable. The fault-tolerance mechanism built on top of load balancing should therefore recognize such errors and handle them promptly. Likewise, once the fault is cleared, the cluster should automatically recognize the recovery event and bring the healthy node back into service.

The third layer is the storage service system, which provides stable and consistent file access for the whole cluster. As an extension of the LVS cluster, this layer can present a single file-system entry point to the node pool: every service node shares the same root (/), while the underlying functions, such as file locking, load balancing, fault tolerance, content consistency, and read/write transactions, are handled automatically as different nodes access the file system. The application tier sees a transparent file access service.

An LVS cluster is a loosely coupled cluster system. Because LVS implements the SSI at the IP layer, no special middleware layer or OS extension needs to be deployed in the cluster, which gives good OS compatibility for the server nodes. Nodes inside an LVS deployment are compatible with most IP applications, require no complex porting or installation work, and can each be regarded as a relatively independent server system. Even on the load balancer itself, the core IPVS functionality is provided transparently to user space and does not interfere with the machine's normal network applications.

In fact, many technologies can be used to build such a system: load balancing at some layer splits incoming network requests across a large pool of cluster service nodes, a cluster technique that maximizes overall performance.


Load Balancing technology

Strictly speaking, load balancing is not "balancing" in the traditional sense; in general it merely takes load that would congest in one place and spreads it across many places. Perhaps "load sharing" would be the easier name to understand. In layman's terms, load balancing plays the same role in the network as a duty roster: tasks are handed out to everyone in turn so that no single person does all the sweating. Balancing in this sense, however, is usually static: a predetermined "rotation" strategy.

Unlike a fixed duty roster, dynamic load balancing analyzes packets in real time with certain tools, tracks traffic conditions in the network, and assigns tasks accordingly. Structurally it divides into local load balancing and geographic (global) load balancing: the former balances load within a local server cluster, the latter across server clusters placed in different geographic locations and on different networks.

In a load-balanced system, each service node runs a separate copy of the required server program, such as a web, FTP, Telnet, or e-mail server. For some services, such as those on a web server, a copy of the program runs on every host in the cluster, and network load balancing distributes the workload among those hosts. For other services, such as e-mail, only one host handles the workload at a time; for these, network load balancing directs traffic to one host and moves it to another when that host fails.
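The "rotation" strategy described above is the round-robin idea at its simplest: hand each new request to the next server in turn. LVS implements this (and weighted variants) inside the kernel; the user-space sketch below, with hypothetical addresses, is only meant to illustrate the idea.

```python
# Minimal round-robin sketch: requests are assigned to servers in turn.
# The server addresses are hypothetical examples.
from itertools import cycle

servers = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
rr = cycle(servers)  # endless iterator over the pool

# Six incoming requests get spread evenly: two full rounds over three servers.
assignments = [next(rr) for _ in range(6)]
print(assignments)
# -> ['192.168.0.1', '192.168.0.2', '192.168.0.3',
#     '192.168.0.1', '192.168.0.2', '192.168.0.3']
```

A dynamic balancer differs only in the selection step: instead of `next(rr)`, it would pick the node with, say, the fewest active connections.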


Implementation structures of load balancing

Overall, load balancing provides a cheap and effective way, on top of an existing network architecture, to extend server bandwidth, increase throughput, strengthen network data processing capacity, and improve network flexibility and availability. Its main tasks are:

  • relieving network congestion and serving requests from the nearest location, providing users with good access quality regardless of geography

  • improving server response times

  • improving the utilization of servers and other resources

  • avoiding single points of failure at critical parts of the network

For this kind of network load balancing we can start from the different layers at which it can be implemented and analyze the specific performance bottlenecks. Seen from the client application, load balancing technology can be implemented at several levels: client-side load balancing, application server technology, high-level protocol exchange, and network access protocol exchange:
Figure II: levels of load balancing

A large number of technologies implement load balancing at each of these levels, each with its own strengths and weaknesses; for the purpose of understanding LVS we only need to consider load balancing at the network access protocol level. Load balancing at this level has three characteristics. Execution is efficient: the underlying protocol can be handled by dedicated hardware or implemented in the OS kernel. Compatibility is strong: an access protocol such as IP in the IPv4 stack is compatible with most mainstream network applications. And the system is relatively simple to implement: compared with content-based high-level switching, it needs no complex pattern matching, exchanging data mainly through port mapping with simple rules.

Next, we analyze the LVS framework and how it implements this load balancing technique.


IP load balancing technology of LVS

Fundamentally, LVS is implemented through IP switching, the access-protocol switching technology described above. But the LVS architecture is extensible: it achieves high performance, high scalability, and ease of management, making it a true cluster system built around load balancing.

LVS supports three load balancing modes: network address translation (NAT), IP tunneling (IPIP), and direct routing (DR).

Address translation mode (NAT)
Figure: NAT structure and NAT packet processing



As the figure shows, the network structure of NAT resembles a private network behind a firewall, with the dotted line in the middle marking the isolation boundary. The service node pool is isolated from the Internet behind internal IP addresses. Service nodes cannot communicate with clients directly; both request and response data must pass through the load balancer for IP packet processing.

NAT's main job is to rewrite the source and destination addresses of IP packets: a request sent to the VIP is rewritten to point at an internal host, and likewise a response from an internal host is rewritten by the load balancer so that it reaches the requester with the VIP as its source address. This model is also called network address translation (or IP masquerading); it is used in proxy servers, iptables, and transparent gateways, and can be considered a mature technology.
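The two rewrites described above, one on the way in and one on the way out, can be sketched with packets modeled as plain dicts. All addresses are hypothetical examples; real NAT rewrites binary IP headers in the kernel.

```python
# Sketch of NAT-mode packet rewriting. The balancer rewrites the
# destination of inbound requests to a chosen internal node, and the
# source of outbound replies back to the VIP. Addresses are examples.

VIP = "203.0.113.10"          # the cluster's public virtual IP
REAL_SERVER = "192.168.1.2"   # chosen internal service node

def rewrite_request(pkt):
    """Client -> VIP becomes client -> real server."""
    return {**pkt, "dst": REAL_SERVER}

def rewrite_reply(pkt):
    """Real server -> client becomes VIP -> client."""
    return {**pkt, "src": VIP}

req = {"src": "198.51.100.7", "dst": VIP, "payload": "GET /"}
inbound = rewrite_request(req)                 # forwarded into the pool

reply = {"src": REAL_SERVER, "dst": req["src"], "payload": "200 OK"}
outbound = rewrite_reply(reply)                # sent back through the balancer

assert inbound["dst"] == REAL_SERVER
assert outbound["src"] == VIP   # the client only ever sees the VIP
```

Note that both directions pass through the balancer, which is exactly the property the next paragraph identifies as NAT's bottleneck.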

Because NAT must rewrite the headers of every packet entering and leaving the cluster, this work degrades overall cluster performance under heavy load, and the load balancer easily becomes the bottleneck.

IP tunneling mode (IPIP)
Figure: IPIP structure and IPIP packet processing



IPIP mode is an open network structure: each service node has a legitimate Internet IP address and can send reply packets directly to the client along the normal routing path. The load balancer therefore handles only the request packets entering the cluster; replies do not pass back through it. This pattern is accordingly called a one-way (simplex) connection mode. The load balancer and the service nodes can be on the same LAN or on different networks; the only requirement is that the balancer can deliver IP packets to the nodes.

The load balancer receives the client's request packet and, using the IPIP protocol, encapsulates it in a new IP packet whose destination is the selected service node; the original IP packet becomes the payload of the new one. When the service node receives the IPIP data from the balancer, it unwraps the packet, processes the request, and returns the result directly to the client at the original source address, with the cluster's virtual address (the VIP) as the source address of the reply.
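The encapsulate/decapsulate steps above can be sketched as follows. Packets are modeled as dicts and all addresses are hypothetical; real IPIP wraps one raw IP header inside another in the kernel.

```python
# Sketch of IP-in-IP encapsulation: the balancer wraps the original
# packet, unchanged, inside a new outer packet addressed to the chosen
# node; the node unwraps it and replies straight to the client.

VIP = "203.0.113.10"     # cluster virtual IP (example)
NODE = "198.51.100.20"   # selected service node (example)

def encapsulate(pkt, node):
    """Balancer: build an outer packet carrying the untouched inner packet."""
    return {"src": VIP, "dst": node, "inner": pkt}

def decapsulate(outer):
    """Service node: unwrap and recover the original client packet."""
    return outer["inner"]

client_pkt = {"src": "192.0.2.5", "dst": VIP, "payload": "GET /"}
tunneled = encapsulate(client_pkt, NODE)
recovered = decapsulate(tunneled)

# The node replies directly to the client, using the VIP as source address,
# so the reply never passes back through the balancer.
reply = {"src": VIP, "dst": recovered["src"], "payload": "200 OK"}
assert recovered == client_pkt
assert reply["dst"] == "192.0.2.5"
```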

IPIP technology appears in other areas as well; because the whole IP packet is encapsulated, the process remains transparent to the application layer. The PPTP protocol, for example, is an application of IP tunneling. At present, however, IPIP is implemented only on Linux. The kernel must be built with tunnel device support, the VIP is configured on the tunnel device, and the service node can then construct reply packets with the VIP as their source address.

Direct routing mode (DR)
Figure: DR structure and DR packet processing



Like IPIP mode, DR mode is a one-way connection: response data returns directly to the client without passing through the balancer, and each service node must have a legitimate IP address reachable from the client. In addition, in DR mode the load balancer and the service nodes must be on the same network segment.

After receiving a client request, the load balancer selects a suitable service node and rewrites the MAC address of the request frame to that node's MAC address, then sends the frame onto the segment it shares with the nodes. Each service node has a virtual network device (dummy0 or lo:0, for example) bound to the same VIP as the balancer, except that this device does not answer ARP requests for the VIP and therefore does not conflict with the balancer's VIP address. When a service node receives an IP packet addressed to its own MAC, it processes the request and returns the reply directly to the client, again with the VIP as the source address. From the client's point of view, both the request and the response always use the cluster's VIP.
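The key point above, that DR rewrites only the Ethernet frame's destination MAC while the IP packet (whose destination is the VIP) passes through untouched, can be sketched like this. MAC and IP values are hypothetical examples.

```python
# Sketch of DR-mode forwarding: the balancer retargets the Ethernet
# frame to the chosen node's MAC and leaves the IP layer alone.

NODE_MAC = "02:00:00:00:00:02"   # MAC of the chosen service node (example)

def forward_dr(frame):
    """Balancer: rewrite only the destination MAC; IP packet is untouched."""
    return {**frame, "dst_mac": NODE_MAC}

frame = {
    "dst_mac": "02:00:00:00:00:01",                      # balancer's own MAC
    "ip": {"src": "192.0.2.5", "dst": "203.0.113.10"},   # dst is the VIP
}
out = forward_dr(frame)

assert out["dst_mac"] == NODE_MAC
assert out["ip"]["dst"] == "203.0.113.10"   # VIP unchanged end to end
```

Because the IP destination is still the VIP, the trick only works if the node's loopback/dummy device holds the VIP without answering ARP for it, exactly as described above.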


Comprehensive comparison

Although LVS supports three load balancing modes, the analysis above shows that, in terms of how the load balancer handles IP traffic, LVS really has two models: simplex (one-way) and duplex (two-way) processing. NAT clearly belongs to the duplex model: the balancer must process both the IP packets entering the cluster and the reply packets returned by the internal nodes, so every request and its response pass through the core balancer, hence "duplex". In the other two modes the balancer handles only the IP request packets entering the cluster, while the internal nodes send their responses directly to the client along their own routes. Because the balancer processes only the request half of each connection and never touches the reply data, these are called simplex connection modes.

Why does this distinction matter? In today's web, most network requests are small: a URL page request, a GET or POST form, a few commands, typically hundreds of bytes to a few kilobytes. Processing such packets is easy. Reply data, on the other hand, is usually large: an ordinary web page runs to tens of kilobytes, to say nothing of returned video and audio streams, plus ever-growing download traffic. Even a powerful processor cannot keep up with that volume of IP packet processing.

Therefore, if duplex (NAT) mode is used for IP load balancing, the balancer must not only process the requests entering the cluster (rewriting each packet's source and destination addresses) but also do the same work on the far larger volume of data returned by the service nodes. As the node pool grows, the balancer's processing capacity quickly saturates, severely limiting the scalability of the LVS cluster. With IPIP or DR mode, the balancer handles only a relatively small number of request packets, while the bulk of the return data flows from the service nodes straight back to the clients through routers and switches. In terms of scalability, the simplex models therefore have the advantage.
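The asymmetry argument above is easy to quantify with a back-of-the-envelope calculation. The request and reply sizes below are assumed round figures for illustration, not measurements.

```python
# Rough illustration of why simplex modes scale better: with small
# requests and large replies, a duplex balancer touches far more bytes.
# The sizes below are assumptions, not measured values.

request_bytes = 500        # a small HTTP request
reply_bytes = 50_000       # a typical page with assets
connections = 10_000

duplex_load = connections * (request_bytes + reply_bytes)   # NAT: both directions
simplex_load = connections * request_bytes                  # IPIP/DR: requests only

print(duplex_load // simplex_load)
# -> 101: the duplex balancer moves roughly 100x the traffic
```

With these assumed sizes the NAT balancer processes about a hundred times the bytes of an IPIP or DR balancer, which matches the node-count gap in the comparison table below.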

Everything exists for a reason, and the author of these three modes clearly designed each with its own trade-offs in mind; NAT is not useless. NAT is weaker than the other two modes in performance, but its cluster nodes can run more operating systems, and its network security is relatively high. The three modes compare as follows:

Comparison of the three modes

  • Requirements on service nodes: NAT — any operating system; IPIP — must support the IP tunneling protocol (currently Linux only); DR — must support a virtual network device whose ARP response can be disabled.

  • Network requirements: NAT — a LAN with private IP addresses; IPIP — a LAN or WAN with legitimate IP addresses; DR — a LAN with legitimate IP addresses, with the nodes and the balancer on the same segment.

  • Typical number of nodes: NAT — 10 to 20, determined by the balancer's processing capacity; IPIP — higher, up to about 100; DR — higher, up to about 100.

  • Gateway: NAT — the balancer is the nodes' gateway; IPIP — each node uses its own gateway or router, bypassing the balancer; DR — each node uses its own gateway or router, bypassing the balancer.

  • Service node security: NAT — good; internal IPs keep the nodes hidden; IPIP — poor; public IPs fully expose the nodes; DR — poor; public IPs fully expose the nodes.

  • IP address requirements: NAT — only one legitimate IP address, used as the VIP; IPIP — besides the VIP, each node needs a legitimate IP address routable to the client; DR — besides the VIP, each node needs a legitimate IP address routable to the client.

To sum up, when choosing LVS as a cluster load balancing solution, first decide which IP load balancing structure fits your environment. If you have only one legitimate IP address, or you need a secure cluster and are not worried about performance, NAT mode will do; if you need high performance and run Linux applications, IPIP or DR mode will certainly surprise you.
