LVS: Three Working Modes and Eight Scheduling Algorithms


I. Cluster Introduction

What is a cluster

A computer cluster is a computer system formed by connecting a group of loosely integrated computer software and/or hardware to work closely together on computational tasks. In a sense, the cluster can be seen as a single computer. An individual computer in a cluster system is usually called a node; nodes are usually connected over a local area network, though other connection arrangements are possible. Cluster computers are often used to improve the computing speed and/or reliability available from a single computer, and they are generally far more cost-effective than individual computers of comparable performance, such as workstations or supercomputers.

In short, a cluster is a group of independent computers combined through a network to complete a task together.

Location of LVS in the Enterprise architecture:

The architecture above is just one of many used in enterprises. The green line is the data flow of the user's access request: user --> LVS load balancer --> Apache servers --> MySQL server, memcache server, and shared storage server. MySQL and the shared storage can themselves also be load balanced with LVS.

---------------Summary-------------------------

Cluster: a group of independent computers combined into a single computer system through a high-speed network, in which each cluster node is a separate server running its own processes. To network users, the back end of the website appears as a single system that cooperates to provide system resources and system services.

-------------------------------------------

Why use a cluster

Features of the cluster

1) High performance. Some tasks require very strong computational capability, such as weather forecasting and nuclear test simulation. These are not things a few computers can handle; they take thousands of machines working together.

2) Cost-effectiveness

A cluster architecture usually needs only a few, or a few dozen, server hosts, giving it a much better price/performance ratio than a dedicated supercomputer that can easily cost millions.

3) Scalability

When the load pressure on the servers grows, the system can be scaled out to meet the requirements without degrading the quality of service.

4) High Availability

Even if some hardware or software fails, the system as a whole must keep its services running 24x7.

Benefits of Clustering

1) Transparency

If some servers go down, the business is unaffected, because the coupling between nodes is loose and the dependencies are weak. By contrast, if an NFS server goes down, everything that mounts it fails with it; that kind of dependency is too strong.

2) High Performance

As traffic increases, the system scales easily.

3) Manageability

The whole system may be physically very large, but it is easy to manage.

4) Programmability

Applications are easy to develop on a cluster system; large portal sites require this.

Cluster classification and the characteristics of each type

The computer cluster architecture is generally divided into the following categories according to function and structure:

1) Load balancing clusters, LBC for short

2) High-availability clusters, HAC for short

3) High-performance computing clusters, HPC for short

4) Grid computing

Of the types above, load balancing clusters and high-availability clusters are the cluster architectures most commonly used in our internet industry.

(1) Load Balancing cluster

A load balancing cluster provides a practical and cost-effective system architecture for enterprises. It distributes the concentrated request load from many clients across the computer cluster as evenly as possible. Client request load typically includes application-level processing load and network traffic load. Such a system is ideal for serving a large number of users with the same set of applications. Each node can take on part of the access request load, and access requests can be dynamically allocated among the nodes to achieve load balancing.

When load balancing is running, client access requests are typically distributed to a back-end group of servers through one or more front-end load balancers, giving the whole system high performance and high availability. Such a computer cluster is sometimes called a server farm. High-availability clusters and load-balancing clusters generally use similar technology, or have both high availability and load balancing characteristics.

The role of a load balancing cluster

1) Distribute access traffic (load balancing)

2) Maintain continuity of business (high availability)

(2) High Availability cluster

In general, when any node in the cluster fails, all tasks on that node are automatically transferred to other, healthy nodes; this process does not affect the operation of the whole cluster or the delivery of the business.

Typically a cluster runs two or more identical nodes; when the master node fails, the other nodes act as slaves and take over the tasks of the master node. A slave node can take over the master's resources (IP address, identity in the architecture, and so on), so users never notice that the object providing the service has changed from the master node to a slave node.

The role of a high-availability cluster: when one machine goes down, another takes over. The more commonly used open source high-availability cluster software is Keepalived and Heartbeat.
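As a rough sketch of how this master/backup takeover is typically wired up with Keepalived (the VIP 10.0.0.100, the interface name eth0, and the priorities are illustrative assumptions, not details from this article), each director runs a VRRP instance, and the backup claims the floating IP as soon as the master stops advertising:

    # /etc/keepalived/keepalived.conf on the master node (sketch only).
    # The peer node would use "state BACKUP" and a lower priority, e.g. 90.
    vrrp_instance VI_1 {
        state MASTER
        interface eth0          # NIC carrying VRRP advertisements (assumed name)
        virtual_router_id 51
        priority 100
        advert_int 1            # advertise once per second
        virtual_ipaddress {
            10.0.0.100          # floating VIP a backup takes over on failure
        }
    }

After writing the file, restart the service (systemctl restart keepalived) on both nodes; taking the master off the network should move the VIP to the backup within a few seconds.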

(3) High-performance computing cluster

High-performance computing clusters improve computing power by distributing computational tasks across the cluster's compute nodes, and are therefore mainly used in scientific computing. Popular HPC setups use the Linux operating system and other free software to do parallel computation. This cluster configuration is often referred to as a Beowulf cluster. Such clusters typically run specific programs to exploit the cluster's parallel capability, usually written against a particular runtime library, such as an MPI library designed for scientific computing.

HPC clusters are particularly well suited for computing jobs in which compute nodes exchange large amounts of data during the computation, for example when one node's intermediate results affect the results of other nodes.

Common cluster hardware and software

Common open source cluster software: LVS, Keepalived, HAProxy, Nginx, Apache, Heartbeat.

Common commercial cluster hardware: F5, NetScaler, Radware, A10, etc.

II. Introduction to the LVS Load Balancing Cluster

The role of a load balancing cluster: to provide a cheap, efficient, and transparent way to extend the load bandwidth of network devices and servers, increase throughput, enhance network data processing capability, and improve the flexibility and availability of the network.

1) Large-scale concurrent access or data traffic that a single computer cannot sustain is split across multiple node devices for separate processing, reducing the time users wait for a response and improving the user experience.

2) A single heavy computation is divided among multiple node devices for parallel processing; when each node finishes, the results are summarized and returned to the user, greatly improving the system's processing capacity.

3) 24x7 service guarantee: the failure of any one or more node devices must not affect the business. In a load-balanced cluster, all computer nodes should provide the same service, and the cluster load balancer intercepts all inbound requests for that service.

LVS Introduction

LVS is short for Linux Virtual Server. It is a virtual server cluster system that can implement load balancing clusters on Unix/Linux platforms. The project was founded by Dr. Wensong Zhang in May 1998.

The following are four articles provided by the LVS official website (very detailed; if you are interested, the official documents are the most authentic source):

http://www.linuxvirtualserver.org/zh/lvs1.html

http://www.linuxvirtualserver.org/zh/lvs2.html

http://www.linuxvirtualserver.org/zh/lvs3.html

http://www.linuxvirtualserver.org/zh/lvs4.html

IPVS history

As early as the 2.2 kernel, IPVS already existed in the form of a kernel patch.

Starting with version 2.4.23, the IPVS software went from being a collection of kernel patches to being incorporated into the common release of the Linux kernel.

Since 2.4.24, IPVS has been part of the official standard Linux kernel.

As we can see, IPVS works at the kernel layer, so we cannot operate on IPVS directly; the LVS load-balancing scheduling technology is implemented inside the Linux kernel, which is why it is called a Linux virtual server. When we configure LVS, we do not configure IPVS in the kernel directly, but manage it with IPVS's management tool, ipvsadm. LVS can also be managed through Keepalived.
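To make this concrete, here is a minimal ipvsadm sketch (the VIP 10.0.0.100 and the real server addresses 192.168.1.11/12 are made-up examples, not values from this article). It creates one virtual service in the kernel's IPVS table and registers two node servers behind it:

    # Load the IPVS kernel module and create a virtual HTTP service on the VIP,
    # scheduled round-robin (-s rr).
    modprobe ip_vs
    ipvsadm -A -t 10.0.0.100:80 -s rr

    # Register two real servers behind the VIP (-m = NAT/masquerading forwarding).
    ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.11:80 -m
    ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.12:80 -m

    # Show the in-kernel IPVS rule table (numeric addresses).
    ipvsadm -L -n

All ipvsadm does here is edit the IPVS tables inside the kernel, which is exactly the kernel-layer relationship described above.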

A brief description of the structure and working principle of the LVS system

The LVS cluster load balancer accepts all inbound client requests for the service, and then decides, based on the scheduling algorithm, which cluster node should process the request and reply to the client.

As shown in the figure, in an LVS virtual server system a group of servers are interconnected through a high-speed LAN or a geographically distributed WAN, with a load scheduler (load balancer) in front of them. The load scheduler dispatches client requests to the real servers, so the structure of the server cluster is transparent to the user: clients access the cluster system just as they would a single high-performance, highly available server. Client programs are not affected by the server cluster and need no modification.

For example: when we go to a restaurant to order food, the customer only deals with the waiter; there is no need to know exactly how the work is divided up in the back, so internally it is transparent to us. The waiter assigns the orders in his hands to other staff according to certain rules. The waiter is the load balancer (LB), and the staff behind him are the server cluster that actually does the work.

Below is the structure chart provided by the official website:

The basic working process of LVS

The client sends a request to the load balancer server. The load balancer accepts the client's request and then decides, according to one of the LVS scheduling algorithms (there are 8), which node server to send the request to. Its working mode (one of 3) then determines how these client requests are forwarded to the node server, and how the node server sends the response packet back to the client.

So all we need to understand are LVS's 3 working modes and 8 scheduling algorithms.

The three working modes of LVS:

1) VS/NAT mode (network address translation)

2) VS/TUN mode (tunneling)

3) VS/DR mode (direct routing)

1. NAT mode (network address translation)

Virtual Server via Network Address Translation (VS/NAT)

This mode implements scheduling through network address translation. First, the scheduler (LB) receives the client's request packet (the destination IP of the request is the VIP) and decides, according to the scheduling algorithm, which back-end real server (RS) to send the request to. The scheduler then changes the destination IP address and port of the request packet sent by the client to the IP address (RIP) of the chosen back-end real server, so that the real server (RS) can receive the client's request packet. After the real server has processed the request, it looks up its default route (in NAT mode we need to set the RS's default route to the LB server) and sends the response packet to the LB. On receiving the response packet, the LB changes the packet's source address to the virtual address (VIP) and sends it back to the client.

Detailed diagram of the IP packets during the scheduling process:

Schematic Description:

1) The client sends request data whose target IP is the VIP.

2) The request reaches the LB server; the LB modifies the destination address to an RIP address and the corresponding port according to the scheduling algorithm (which RIP is chosen depends on the algorithm), and records the connection in its connection hash table.

3) The packet travels from the LB server to the RS web server, and the web server responds. The web server's gateway must be the LB, so the data returns to the LB server.

4) After receiving the return data from the RS, the LB modifies the source address to the VIP and the target address to the CIP, with the corresponding port 80, according to the connection hash table. The data then travels from the LB back to the client.

5) The client sees only the VIP/DIP information.

NAT mode pros and cons:

1. NAT requires both the request message and the response message to have their addresses rewritten as they pass through the LB, so when site traffic is large the LB load balancing scheduler becomes a significant bottleneck; generally at most 10-20 nodes are supported.

2. Only one public IP address needs to be configured on the LB.

3. The gateway address of each back-end node server must be the scheduler LB's intranet address.

4. NAT mode supports translation of both IP address and port, i.e. the port the user requests and the real server's port can differ (see the sketch below).
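To make these points concrete, here is a minimal VS/NAT sketch; the VIP 10.0.0.100, the director's intranet address 192.168.1.1, and the RS 192.168.1.11 are assumed values for illustration:

    # On the director (LB): NAT rewrites packets in both directions, so the
    # kernel must be allowed to forward them.
    sysctl -w net.ipv4.ip_forward=1
    ipvsadm -A -t 10.0.0.100:80 -s rr
    # -m selects NAT (masquerading). The RS port (8080) may differ from the
    # VIP port (80), since NAT mode can translate ports (point 4).
    ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.11:8080 -m

    # On each real server: the default gateway must be the director's intranet
    # address (point 3) so that replies return through the LB.
    ip route replace default via 192.168.1.1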

2. TUN mode

Virtual Server via IP Tunneling (VS/TUN). In NAT mode, because both request and response messages must be rewritten as they pass through the scheduler, the scheduler's processing capability becomes a bottleneck as client requests grow. To solve this problem, in TUN mode the scheduler forwards the request message to the real server over an IP tunnel, and the real server returns its response directly to the client, so the scheduler only processes the inbound request message. Since the response data of a typical network service is much larger than the request message, adopting VS/TUN can increase the maximum throughput of the cluster system by as much as a factor of ten.

The workflow of VS/TUN, shown below, differs from NAT mode in that the transfer between the LB and the RS does not rewrite the IP address. Instead, the client's request packet is encapsulated in an IP tunnel and then sent to the RS node server; after the node server receives it, it removes the IP tunnel header, processes the request, and sends the response packet directly to the client through its own external network address, without going back through the LB server.

Tunnel principle flowchart:

Brief description of the process:

1) The client sends a request packet with destination address VIP, which arrives at the LB.

2) The LB receives the client's request packet and performs IP tunnel encapsulation, i.e. it adds an IP tunnel header in front of the original header, and sends the packet out.

3) The RS node server receives the request packet according to the IP tunnel header information (at this point a logical, invisible tunnel exists between the LB and the RS only), then strips off the IP tunnel header to obtain the client's request packet and processes the response.

4) After processing the response, the RS server sends the response packet to the client over its own public network line; the source IP address is still the VIP address. (The RS node server needs the VIP configured on its local loopback interface, as discussed later and sketched below.)
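A minimal VS/TUN sketch following the steps above (the VIP 10.0.0.100 and the RS address 203.0.113.11, which could sit in another machine room, are assumed values):

    # On the director: -i forwards requests via IP-in-IP encapsulation.
    ipvsadm -A -t 10.0.0.100:80 -s rr
    ipvsadm -a -t 10.0.0.100:80 -r 203.0.113.11:80 -i

    # On each real server: load the ipip module and bind the VIP to the tunnel
    # interface so decapsulated packets addressed to the VIP are accepted, then
    # relax reverse-path filtering, which would otherwise drop them.
    modprobe ipip
    ip addr add 10.0.0.100/32 dev tunl0
    ip link set tunl0 up
    sysctl -w net.ipv4.conf.tunl0.rp_filter=0
    sysctl -w net.ipv4.conf.all.rp_filter=0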

3. DR mode (direct routing)

Virtual Server via Direct Routing (VS/DR)

DR mode sends the request to the real server by rewriting the destination MAC address of the request message, and the real server returns the processed response directly to the client. Like TUN mode, DR mode can greatly improve the scalability of the cluster system, and it has neither the overhead of IP tunneling nor any requirement that the real servers in the cluster support the IP tunneling protocol. However, it requires that the scheduler LB and the real servers RS each have a NIC on the same physical network segment, i.e. they must be in the same LAN environment.

DR mode is the most widely used mode on the internet.

Schematic diagram of DR mode:

Brief description of the DR mode process:

As the VS/DR workflow diagram shows, its connection scheduling and management are the same as in NAT and TUN modes, but its message forwarding method differs from the previous two. DR mode routes the message directly to the target real server: according to the load on each real server, the scheduler dynamically selects one server, and instead of modifying the target IP address and port or encapsulating the IP packet, it rewrites the target MAC address of the request message's data frame to the MAC address of the chosen real server, then sends the modified frame on the LAN of the server group. Because the frame carries the real server's MAC address and the machines are on the same LAN, by the principles of network communication the real server is bound to receive the packet sent by the LB. When the real server receives the request packet and opens the IP header, it sees that the target IP is the VIP; it will accept the packet only if its own IP matches the target IP, which is why we need to configure the VIP on the local loopback interface. In addition, because a network interface answers ARP broadcasts, and the other machines in the cluster also carry the VIP on their lo interfaces, the ARP responses would conflict, so we need to turn off ARP responses on the real servers' lo interfaces. The real server then processes the request and sends the response packet back to the client according to its own routing information, with the VIP as the source IP address.

DR mode summary:

1. Forwarding is implemented by rewriting the packet's destination MAC address on the scheduler LB. Note that the source address is still the CIP and the destination address is still the VIP.

2. The request message passes through the scheduler, but the RS's response message does not need to pass back through the scheduler LB, so efficiency under heavy concurrent access is very high (compared with NAT mode).

3. Because DR mode forwards by rewriting MAC addresses, all RS nodes and the scheduler LB must be in a single LAN.

4. The RS hosts need the VIP address bound on the lo interface, and ARP suppression must be configured.

5. The RS nodes' default gateway does not need to be the LB; it can be configured directly as the upstream router's gateway, letting the RS reach the external network directly.

6. Because the DR mode scheduler only rewrites MAC addresses, it cannot rewrite the target port, so the RS server must provide its service on the same port as the VIP service. A configuration sketch covering these points follows.
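A minimal VS/DR sketch covering points 1-6 (the VIP 10.0.0.100 and RS 192.168.1.11 are assumed values; director and RS share one LAN):

    # On the director: -g = gatewaying (direct routing, the ipvsadm default);
    # only the frame's destination MAC is rewritten, so the RS must listen on
    # the same port as the VIP service (point 6).
    ipvsadm -A -t 10.0.0.100:80 -s wrr
    ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.11:80 -g -w 1

    # On each real server: bind the VIP on lo so packets whose target IP is
    # the VIP are accepted (point 4)...
    ip addr add 10.0.0.100/32 dev lo
    # ...and suppress ARP so that only the director answers ARP queries for
    # the VIP (the "ARP suppression" of point 4).
    sysctl -w net.ipv4.conf.lo.arp_ignore=1
    sysctl -w net.ipv4.conf.lo.arp_announce=2
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2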

The official comparison summary table of the three load balancing technologies:

Working mode                 VS/NAT                        VS/TUN                            VS/DR
Real server (node server)    Configure director as GW      Tunneling support                 Non-ARP device, bind VIP
Server network               Private (LAN)                 LAN/WAN                           LAN
Number of nodes              Low (10-20)                   High (up to 100)                  High (up to 100)
Real server gateway          Load balancer                 Own router                        Own router
Advantages                   Address and port translation  WAN environments, encrypted data  Highest performance
Disadvantages                Low efficiency                Tunnel support required           Cannot cross a LAN

LVS scheduling algorithms

It is best to refer to this article: http://www.linuxvirtualserver.org/zh/lvs4.html

The LVS scheduling algorithm determines how the workload is distributed among the cluster nodes. When the director (the scheduler) receives an inbound request from a client for the cluster service on the VIP, it must decide which cluster node should handle the request. The director's scheduling methods fall into two basic categories:

Fixed scheduling algorithms: RR, WRR, DH, SH

Dynamic scheduling algorithms: WLC, LC, LBLC, LBLCR

Algorithm: Description

RR: Round-robin scheduling. Distributes requests to the RS nodes in turn, i.e. evenly across the RS nodes. The algorithm is simple, but it is only suitable when the RS nodes have roughly similar processing performance.

WRR: Weighted round-robin scheduling. Assigns tasks according to the weights of the different RSes: an RS with a higher weight is given tasks preferentially and is allocated more connections than an RS with a lower weight, while RSes with equal weights get equal numbers of connections.

WLC: Weighted least-connection scheduling. Suppose each RS has weight Wi and current TCP connection count Ti; the RS with the smallest Ti/Wi is chosen as the next RS to receive a request.

DH: Destination-address hashing (destination hashing). Looks up a static hash table keyed by the destination address to obtain the required RS.

SH: Source-address hashing (source hashing). Looks up a static hash table keyed by the source address to obtain the required RS.

LC: Least-connection scheduling (least-connection). The IPVS table stores all active connections; the LB forwards each new connection request to the RS with the fewest current connections.

LBLC: Locality-based least-connection scheduling (locality-based least-connection). Requests for the same destination address are assigned to the same RS as long as that server is not overloaded; otherwise the request is assigned to the RS with the fewest connections, which is then considered first for subsequent assignment.

LBLCR: Locality-based least-connection with replication. Like LBLC, but maintains a mapping from a destination address to a set of servers rather than to a single server; a request is assigned to the least-connection server in that set, and if the whole set is overloaded, the least-connection server in the cluster is added to the set.
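In ipvsadm, the scheduler is chosen per virtual service with the -s option, using the lowercase names from the table above (rr, wrr, wlc, lc, dh, sh, lblc, lblcr). A short sketch with assumed addresses:

    # Create a service using weighted least-connection (WLC).
    ipvsadm -A -t 10.0.0.100:80 -s wlc
    # Weights feed WRR/WLC: the first RS gets roughly twice the second's share.
    ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.11:80 -g -w 2
    ipvsadm -a -t 10.0.0.100:80 -r 192.168.1.12:80 -g -w 1
    # -E edits an existing virtual service in place, e.g. to switch to
    # source-address hashing without recreating the service.
    ipvsadm -E -t 10.0.0.100:80 -s sh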

Choosing an LVS scheduling algorithm for a production environment:

1. For general network services, such as HTTP, mail, and MySQL, the commonly used LVS scheduling algorithms are:

A. Basic round-robin scheduling, RR

B. Weighted least-connection scheduling, WLC

C. Weighted round-robin scheduling, WRR

2. Locality-based least-connection (LBLC) and locality-based least-connection with replication (LBLCR) are mainly suitable for web caches and DB caches.

3. Source-address hashing (SH) and destination-address hashing (DH) can be used together in firewall clusters to guarantee a unique entry and exit point for the whole system.

In practice these algorithms see many applications; it is best to study the principles of the connection scheduling algorithms as implemented in the kernel, and then make a reasonable choice according to the specific business needs.

-----------------Follow-up self-summary--------------------------------------------------

That covers the basic principles of LVS. Personally, I still feel that to gain a more comprehensive understanding of LVS you need to read the official documentation carefully. The main parts are the 3 working modes and the 8 scheduling algorithms, and knowing which scheduling algorithm is appropriate for the production environment of your actual work.

Reposted from http://www.it165.net/admin/html/201401/2248.html
