Using LVS to build a load-balanced cluster


Sometimes a single server cannot cope with large-scale service requests, and a failure can leave users without access for a period of time. Clustering technology delivers comparatively high performance, reliability, and flexibility at a lower cost.

A cluster is a group of independent, interconnected computers managed as a single system, used for load sharing, for enhanced reliability, or for high-speed computing.


I. Cluster Types

LB: Load Balancing cluster; distributes client requests across the cluster's nodes according to the configured load-balancing policy; common implementations: LVS, HAProxy, Nginx

HA: High Availability cluster; eliminates single points of failure by automatically and quickly switching to another node when a node can no longer provide service; common implementations: heartbeat, corosync+pacemaker, cman+rgmanager, cman+pacemaker, keepalived

HPC: High Performance Computing cluster; e.g. Hadoop


II. How LVS Works

LVS consists of two pieces of code: ipvsadm and ipvs. ipvsadm is a user-space program used to write rules (define cluster services and scheduling methods, specify which back-end servers to use, etc.) and pass them to ipvs. ipvs is the code integrated into the kernel that actually performs the scheduling; it works on the INPUT chain of netfilter. According to the specified scheduling method, it decides which back-end server a matching request message should go to, processes the request accordingly (modifying the destination IP, the destination MAC, etc.), and hands it to the POSTROUTING chain on its way to the chosen back-end server.
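To make the user-space/kernel split concrete, a minimal sketch (the VIP 172.16.100.7 and RIP 10.0.0.8 are hypothetical; requires root and the ip_vs module):

```shell
# Load the kernel half, then write one rule from user space with ipvsadm.
modprobe ip_vs
ipvsadm -A -t 172.16.100.7:80 -s rr
ipvsadm -a -t 172.16.100.7:80 -r 10.0.0.8:80 -m

# The table that ipvs actually schedules from lives in the kernel:
cat /proc/net/ip_vs
```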

[Figure: LVS scheduling workflow]


III. LVS Terminology

Director: the scheduler; the host running ipvsadm/ipvs, which receives client requests directly and performs the scheduling

Real server: a back-end server that actually handles client requests

CIP: client IP

VIP: virtual IP, the destination IP the client requests; configured on the director

DIP: director IP

RIP: real server IP


IV. LVS Forwarding Types

1. LVS-NAT

[Figure: LVS-NAT topology]

In this model, response traffic is forwarded back to the client through the director, so the real servers and the director must be on the same subnet and each real server's gateway must point to the director. The director performs DNAT (destination address translation) on request messages and SNAT (source address translation) on response messages.

Characteristics of the NAT type:

① The RSes can use private addresses, hiding the servers; each RS's gateway must point to the DIP;

② Both requests and responses pass through the director, so under high load the director becomes a performance bottleneck;

③ Port mapping is supported;

④ The RSes can run any OS;
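Putting the NAT model together, a minimal director-side sketch (the VIP 172.16.100.7 and RIPs 10.0.0.8/10.0.0.9 are hypothetical; run as root):

```shell
# The director relays traffic between the VIP subnet and the RS subnet,
# so IP forwarding must be on.
echo 1 > /proc/sys/net/ipv4/ip_forward

# Define the cluster service on the VIP and add two real servers
# in masquerading (NAT) mode with different weights.
ipvsadm -A -t 172.16.100.7:80 -s wlc
ipvsadm -a -t 172.16.100.7:80 -r 10.0.0.8:80 -m -w 2
ipvsadm -a -t 172.16.100.7:80 -r 10.0.0.9:80 -m -w 1

# On each RS, the default gateway must be the DIP, e.g.:
#   route add default gw 10.0.0.1
```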


2. LVS-DR

[Figure: LVS-DR topology]

In this model, response messages do not flow through the director; each RS responds to the client directly. The VIP must therefore also be configured on every real server, and hidden, i.e. the RS must neither advertise it nor answer ARP resolution requests for it; otherwise request messages might reach a real server directly instead of the director. The director cannot translate the request's IP addresses; it can only rewrite the destination MAC address before sending the message to a real server, so the director and the real servers must be on the same physical network.

Features of the DR type:

⑴ The front-end router must be made to send all messages destined for the VIP to the director, never to an RS;

Solutions:

① Static address binding: configure it on the front-end router (problem: you may not have permission to operate the router)

② arptables

③ Modify kernel parameters on the RSes: configure the VIP on an alias of the lo interface and prevent the RS from responding to ARP resolution requests for the VIP;

⑵ The RSes can use private addresses or public addresses; with public addresses they can also be reached directly over the Internet via their RIPs;

⑶ The RSes and the director must be on the same physical network;

⑷ Request messages pass through the director, but responses do not, so the director's load is greatly reduced compared with the NAT model;

⑸ Port mapping is not supported;

⑹ The RSes can run most common OSes;

⑺ An RS's gateway must never point to the DIP;

⑻ Disadvantage: binding the VIP on the RSes carries some risk


Configuration of LVS-DR:

⑴ First configure the kernel parameters on each RS:

echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore

echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce

echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore

echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce

Description:

arp_ignore: how to respond to received ARP requests. The default, 0, replies whenever the machine has the address on any interface; 1 replies only when the requested IP address is configured on the interface that received the request

arp_announce: how to advertise local addresses. The default, 0, may advertise any local interface address; 2 announces only the best local address for the target from the directly connected interface

The purpose of this step is to stop the RS from advertising the VIP and from answering ARP resolution requests for it

⑵ Then configure the VIP on the RS:

ifconfig lo:0 VIP netmask 255.255.255.255 broadcast VIP up

route add -host VIP dev lo:0  # the request actually arrives on a physical interface (e.g. eth0); adding a host route for the VIP via lo:0 (where the VIP is configured) lets the response use the VIP as its source address
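On the director side, the matching DR sketch (the VIP 172.16.100.7 on eth0:0 and the RIPs 172.16.100.8/172.16.100.9 are hypothetical):

```shell
# Put the VIP on an alias of the director's public-facing interface.
ifconfig eth0:0 172.16.100.7 netmask 255.255.255.255 broadcast 172.16.100.7 up
route add -host 172.16.100.7 dev eth0:0

# Define the service and add the real servers in DR mode (-g, the default);
# note: no port on the RS addresses, since DR cannot do port mapping.
ipvsadm -A -t 172.16.100.7:80 -s wlc
ipvsadm -a -t 172.16.100.7:80 -r 172.16.100.8 -g -w 2
ipvsadm -a -t 172.16.100.7:80 -r 172.16.100.9 -g -w 1
```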


3. LVS-TUN

LVS-TUN differs from LVS-DR in that it allows the director and the real servers to be on different networks, implemented through an IP tunneling mechanism. In this model, the director encapsulates the request by adding an IP header in front of the original one, so the request carries an inner IP header (source: CIP, destination: VIP) and an outer IP header (source: DIP, destination: RIP). When a real server receives the message, it first strips the outer header, obtaining a packet whose destination is the VIP; since the VIP is configured on the local IP tunnel device, the request is processed, and the response is then returned directly to the client according to the routing table.

[Figure: LVS-TUN topology]


Features of the TUN type:

① RIP, VIP, and DIP are all public network addresses;

② An RS's gateway cannot, and must not, point to the DIP;

③ Request messages pass through the director, but responses must not;

④ Port mapping is not supported;

⑤ The RS's OS must support tunneling, and a tunnel device generally needs to be set up;

⑥ Because this model allows the director and the RSes to be on different physical networks, it is often used in remote disaster-recovery scenarios

For LVS-TUN configuration, see http://www.linuxidc.com/Linux/2012-09/71340p3.htm
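A hedged real-server sketch of the TUN model (the VIP 172.16.100.7 is hypothetical; it assumes the ipip module provides the tunl0 device):

```shell
# Load the IP-in-IP tunnel module and configure the VIP on the tunnel device.
modprobe ipip
ifconfig tunl0 172.16.100.7 netmask 255.255.255.255 up
route add -host 172.16.100.7 dev tunl0

# Same ARP suppression as in the DR model, plus relaxed reverse-path
# filtering so decapsulated packets addressed to the VIP are accepted.
echo 1 > /proc/sys/net/ipv4/conf/tunl0/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/tunl0/arp_announce
echo 0 > /proc/sys/net/ipv4/conf/tunl0/rp_filter
```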


4. LVS-FULLNAT

[Figure: LVS-FULLNAT topology]

FULLNAT modifies both the source and destination addresses of the request (CIP->VIP ==> DIP->RIP), so the director and the RSes can communicate across subnets; both inbound and outbound traffic passes through the director.

FULLNAT mode is not integrated into the mainline Linux kernel; it requires patching the kernel and is not supported by ipvsadm, but keepalived can be used to generate the rules.


V. LVS Scheduling Methods

grep -i 'vs' /boot/config-$(uname -r)  # check which scheduling methods your kernel was built with

⑴ Static methods: schedule by the algorithm alone

rr: round robin

wrr: weighted round robin

sh: source hashing; requests from the same CIP are always directed to the same RS; provides session persistence

dh: destination hashing; the back ends are typically cache servers, and this improves the cache hit ratio

⑵ Dynamic methods: schedule by the algorithm plus the current load of each RS

lc: least connection

overhead = active*256 + inactive  # the smaller the overhead value, the higher the priority; on ties, servers are chosen top-down from the real server list

wlc: weighted LC; the default scheduling method

overhead = (active*256 + inactive) / weight

sed: shortest expected delay

WLC has a drawback: when overhead values are tied, a low-weight server near the top of the list is chosen first. With only one or a few requests in flight, that is a poor choice; SED's main purpose is to let higher-weight servers receive requests first.

overhead = (active + 1)*256 / weight

nq: never queue; an idle RS is chosen first, and when none is idle, selection falls back to SED's overhead value

lblc: locality-based least connection; a dynamic version of DH

lblcr: LBLC with replication
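The difference between WLC and SED on an idle cluster can be checked by plugging numbers into the two formulas; a small sketch (the weights 1 and 3 are hypothetical):

```shell
# overhead formulas from the text:
#   wlc: (active*256 + inactive) / weight
#   sed: (active+1) * 256 / weight      (lower overhead wins)
wlc() { awk -v a="$1" -v i="$2" -v w="$3" 'BEGIN { printf "%.1f\n", (a*256 + i)/w }'; }
sed_overhead() { awk -v a="$1" -v w="$2" 'BEGIN { printf "%.1f\n", (a + 1)*256/w }'; }

# Idle cluster: RS1 has weight 1, RS2 has weight 3.
wlc 0 0 1            # prints 0.0 -- tied with RS2 under WLC, so list order decides
wlc 0 0 3            # prints 0.0
sed_overhead 0 1     # prints 256.0
sed_overhead 0 3     # prints 85.3 -- SED breaks the tie toward the higher weight
```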


VI. Configuring the Director with ipvsadm

yum -y install ipvsadm

⑴ Define a cluster service:

ipvsadm -A|-E -t|-u|-f service-address [-s scheduler]

-A: add

-E: edit

-f: firewall mark

service-address:

-t|-u: VIP:port

-f: mark number (#)

Example: ipvsadm -A -t 172.16.100.7:80 -s wlc

⑵ Add an RS to an existing cluster service:

ipvsadm -a|-e -t|-u|-f service-address -r server-address [options]

-a: add an RS record

-e: edit an RS record

Options:

-w weight

-g, --gatewaying: DR mode (the default)

-i, --ipip: IPIP encapsulation (TUN mode)

-m, --masquerading: masquerading (NAT mode)

Example: ipvsadm -a -t 172.16.100.7:80 -r 10.0.0.8 -m -w 2

⑶ View the defined cluster services and RSes:

ipvsadm -L -n

-c: view individual connections

--stats: statistics

--rate: rates

--exact: exact values

⑷ Remove an RS from a cluster service: ipvsadm -d -t|-u|-f service-address -r server-address

⑸ Delete a cluster service: ipvsadm -D -t|-u|-f service-address

⑹ Clear all cluster services: ipvsadm -C

⑺ Save the cluster service definitions:

ipvsadm -S > /path/to/some_rule_file

ipvsadm-save > /path/to/some_rule_file

⑻ Apply the rules from a rule file:

ipvsadm -R < /path/from/some_rule_file

ipvsadm-restore < /path/from/some_rule_file
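A small save/restore workflow sketch, e.g. for persisting rules across reboots (the path /etc/sysconfig/ipvsadm is an assumption, matching the convention of the ipvsadm init service on Red Hat-family systems):

```shell
# Save the current rules; -n keeps output numeric so restore does no DNS lookups.
ipvsadm-save -n > /etc/sysconfig/ipvsadm

# Later (e.g. at boot): clear any existing table, then reload the saved rules.
ipvsadm -C
ipvsadm-restore < /etc/sysconfig/ipvsadm
```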

⑼ Define a cluster service based on a firewall mark

Purpose: treat services on several different ports that belong to the same application group as a single cluster service and schedule them together; for example, HTTP and HTTPS

This combines LVS with netfilter to provide another cluster-service definition mechanism:

① Define rules in the PREROUTING chain of the mangle table to set the firewall mark:

iptables -t mangle -A PREROUTING -d VIP -p {tcp|udp} --dport PORT -j MARK --set-mark #

② Define the cluster service based on that mark:

ipvsadm -A -f # [-s METHOD]

ipvsadm -a -f # -r RS [options]

For example:

iptables -t mangle -A PREROUTING -d 192.168.30.13 -p tcp -m multiport --dports 80,443 -j MARK --set-mark 2

ipvsadm -A -f 2 -s wlc

ipvsadm -a -f 2 -r 172.16.100.2 -g -w 3

⑽ LVS persistent connection feature

Regardless of the scheduling method, the persistent connection feature ensures that requests from the same client are directed to the same RS within a specified time window. With persistence enabled, the scheduler uses a connection-tracking table (the persistent connection template) to record the mapping between each client and the real server assigned to it.

-p, --persistent [timeout]: use persistent connections; the default timeout is 300 seconds

Persistent connection types:

① PCC (persistent client connections): also known as zero-port connections

With a TCP or UDP cluster service defined on port 0, requests from the same client to all ports are always directed to the previously selected RS

Example: ipvsadm -A -t 172.16.100.7:0 -s rr -p

② PPC (persistent port connections): requests to the same port are always directed to the previously selected RS; each cluster service is scheduled separately

Example: ipvsadm -A -t 172.16.100.7:21 -s rr -p

③ PFMC (persistent firewall mark connections): persistence for a firewall-mark-based cluster service

iptables -t mangle -A PREROUTING -d 192.168.30.13 -p tcp -m multiport --dports 80,443 -j MARK --set-mark 2

ipvsadm -A -f 2 -s wlc -p 600


VII. Session Persistence Mechanisms:

① Session binding: always direct requests from the same source IP to the same RS; no fault tolerance; degrades the balancing effect

② Session replication: synchronize sessions among the RSes so each holds every session in the cluster; not suitable for large clusters

③ Session server: use a dedicated server to manage the cluster's sessions


VIII. Real Server Health Checks

LVS has no built-in health checking for back-end RSes; it needs help from keepalived, which not only provides high availability for LVS but can also generate LVS rules.

1. Ways to check RS health:

Probe using the protocol the cluster service depends on (application layer)

Probe the port of the specified protocol with a port-scanning or probing tool (transport layer)

Probe at the network layer

2. Handling:

① Automatically take a failed RS offline;

online -> offline: after three or more consecutive failed probes;

offline -> online: after a single successful probe;

② If all RSes fail, a backup (sorry) server should be provided:

ipvsadm -a -t 192.168.30.13:80 -r 127.0.0.1 -m -w 0

