LVS (Linux virtual server), keepalived


Linux Cluster:

httpd: ab (ApacheBench), a benchmarking tool;

How a system scales:
Scale up: vertical scaling;
Replace the existing server with a more capable one;
Scale out: horizontal scaling;
Add more servers to meet the same demand;

Cluster: Organize multiple hosts to meet a specific requirement;

Cluster types:
1. LB: load balancing cluster;
Load balancer, scheduler;
Upstream server, back-end server, real server;
SPOF: Single Point of Failure

2. HA: high availability cluster;
Active: active server
Passive: standby server

Availability = MTBF / (MTBF + MTTR), i.e. mean time between failures / (mean time between failures + mean time to repair)
Increase the numerator: raise the mean time between failures;
Decrease the denominator: reduce the mean time to repair;
Redundancy:

90%, 95%, 99%, 99.9%, 99.99%, 99.999%

3. HP: high performance cluster
www.top500.org

DS: distributed system
Hadoop:
MapReduce
HDFS

Implementations of the LB cluster:
Hardware:
F5 BIG-IP
Citrix NetScaler
A10
Array
Radware
Software:
LVS: Linux Virtual Server
HAProxy
Nginx
ATS (Apache Traffic Server)
Perlbal

Classified by the protocol layer they work at:
Transport layer:
LVS, HAProxy (mode tcp)
Application layer:
HAProxy (mode http), Nginx, ATS, Perlbal

Implementations of the HA cluster:
keepalived: achieves address floating (VIP failover) by implementing the VRRP protocol;
AIS:
heartbeat
cman + rgmanager (RHCS: RedHat Cluster Suite)
corosync + pacemaker

System design: layering, partitioning
Distribution: applications, data, storage, computation

LVS: Linux Virtual Server

Layer 4: four-layer switching / four-layer routing;
Based on the destination IP and destination port of the request message, the packet is forwarded (according to the scheduling algorithm) to a server in the back-end host cluster;

LVS works at the INPUT hook of netfilter on the director; the packet flow is PREROUTING --> INPUT --> POSTROUTING

Terms of the LVS cluster:
VS: Virtual Server
Director, dispatcher, balancer
RS: Real Server

CIP: client IP
VIP: the director's virtual (client-facing) IP
DIP: the director's (RS-facing) IP
RIP: real server IP

Design essentials for a load-balanced cluster:
(1) session retention;
Session binding (sticky, e.g. source-IP hash);
Session cluster (session replication via multicast/broadcast/unicast);
Session server (a dedicated session store);
(2) data sharing;
Shared storage:
NAS: Network Attached Storage (file level);
SAN: Storage Area Network (block level);
DS: distributed storage;
Data synchronization:
rsync
...
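
For example, a minimal data-synchronization sketch with rsync (the paths and the peer host name rs2 are hypothetical):

rsync -avz --delete /var/www/html/ rs2:/var/www/html/    # push local web content to a peer real server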

Types of LVS:
lvs-nat
lvs-dr (direct routing)
lvs-tun (IP tunneling)
lvs-fullnat (changes both the source IP and the destination IP of the request message)

Note: the first three are the standard types; fullnat was added later and may not be supported by the stock kernel;

lvs-nat:
Multi-target DNAT: selects an RS by rewriting the destination address and destination port of the request message to the RIP and port of the chosen RS;

(1) RIP and DIP should use private addresses, and each RS's gateway must point to the DIP;
(2) Both request and response messages are forwarded through the director; under heavy load the director may become the system bottleneck;
(3) Port mapping is supported;
(4) The VS must be Linux; the RS can run any OS;
(5) The RIP of each RS must be on the same IP network as the director's DIP;

lvs-dr: direct routing
The request message is forwarded by rewriting its MAC address; the IP header is not changed (the source IP stays CIP and the destination IP stays VIP);

(1) Ensure that the front-end router sends request messages whose destination IP is the VIP to the director;
Solutions:
1. static binding on the router;
2. prevent the RSs from responding to ARP requests for the VIP;
(a) arptables;
(b) tune kernel parameters on each RS and configure the VIP on a specific interface (lo) so that the RS does not respond;
(2) The RIP of an RS may be a private or a public address;
(3) The RSs and the director must be on the same physical network;
(4) Request messages must be dispatched by the director, but response messages must not pass through the director;
(5) Port mapping is not supported;
(6) The RSs can run most operating systems;

lvs-tun: IP tunneling
Forwarding mode: the IP header of the request message is not modified (the source IP stays CIP, the destination IP stays VIP); instead a second IP header is encapsulated around it (source IP DIP, destination IP RIP);

(1) RIP, DIP and VIP are all public addresses;
(2) The RS's gateway cannot, and must not, point to the DIP;
(3) Request messages are dispatched by the director, but response messages are sent directly to the CIP;
(4) Port mapping is not supported;
(5) The OS of each RS must support IP tunneling;
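
A minimal RS-side sketch (assuming the kernel's ipip module and the VIP 172.20.120.71 used in the later examples; exact sysctl needs vary by distribution):

modprobe ipip                                      # load the IP-in-IP module, which creates the tunl0 interface
ip addr add 172.20.120.71/32 dev tunl0             # bind the VIP to the tunnel interface
ip link set tunl0 up
echo 0 > /proc/sys/net/ipv4/conf/tunl0/rp_filter   # relax reverse-path filtering for decapsulated packets
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore    # keep the RS from answering ARP for the VIP
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce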

lvs-fullnat:
Forwarding is achieved by rewriting both the source IP (CIP --> DIP) and the destination IP (VIP --> RIP) of the request message;

(1) The VIP is a public address; RIP and DIP are private addresses and need not be on the same IP network, but must be able to reach each other via routing;
(2) The source IP of the request message seen by the RS is the DIP, so its response message is sent back to the DIP;
(3) Both request and response messages pass through the director;
(4) Port mapping is supported;
(5) The RS can run any OS;

LVS scheduler: scheduling algorithms
A. Static methods: schedule based on the algorithm alone;
rr: round robin, polling;
wrr: weighted round robin (see the example after this list);
sh: source IP hash; a session-retention mechanism: requests from the same IP are always dispatched to the same RS;
dh: destination IP hash; improves cache hit rates (e.g. when scheduling cache servers behind a forward proxy): requests for the same destination are always sent to the same RS;

B. Dynamic methods: schedule based on the algorithm plus the current load (overhead) of each RS;
lc: least connection; overhead = active*256 + inactive
wlc: weighted least connection; overhead = (active*256 + inactive) / weight
sed: shortest expected delay; overhead = (active+1)*256 / weight
nq: never queue
lblc: locality-based least connection, i.e. the dynamic version of dh; used to schedule cache servers in forward-proxy scenarios;
lblcr: lblc with replication;
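
For instance, a weighted round-robin service where one RS receives twice the traffic of the other (addresses reuse the later examples and are illustrative) might be set up as:

ipvsadm -A -t 172.20.120.71:80 -s wrr                      # cluster service using the wrr scheduler
ipvsadm -a -t 172.20.120.71:80 -r 172.20.120.41 -g -w 2    # this RS gets twice the weight
ipvsadm -a -t 172.20.120.71:80 -r 172.20.120.42 -g -w 1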

LVS:
ipvsadm/ipvs
ipvsadm: a user-space command-line tool for managing cluster services and the RSs on them;
ipvs: the kernel code that works at the netfilter INPUT hook;
Its clustering function relies on the cluster service rules defined via ipvsadm;
Supports services based on TCP, UDP, SCTP, AH, ESP and AH_ESP;

(1) One ipvs host can define multiple cluster services at the same time;
(2) Each cluster service should have at least one real server;
When defining a service, specify the lvs type as well as the LVS scheduler;
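
A quick way to confirm that the running kernel was built with ipvs support (a sketch; the kernel config path varies by distribution):

grep -i ip_vs /boot/config-$(uname -r)    # expect CONFIG_IP_VS=m plus the scheduler modules
lsmod | grep ip_vs                        # the modules are loaded automatically once the first rule is added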

Manage cluster services:
ipvsadm -A|E -t|u|f service-address [-s scheduler]
ipvsadm -D -t|u|f service-address
-A: add
-E: modify
-D: delete

service-address:
TCP: -t vip:port
UDP: -u vip:port
FWM: -f MARK

-s scheduler: the default is wlc;

Manage RSs on a cluster service:
ipvsadm -a|e -t|u|f service-address -r server-address [-g|i|m] [-w weight]
ipvsadm -d -t|u|f service-address -r server-address
-a: add an RS
-e: modify an RS
-d: remove an RS

server-address:
rip[:port]

-g: lvs-dr model (default)
-i: lvs-tun model
-m: lvs-nat model

View:
ipvsadm -L|l [options]
-n: numeric; display addresses and ports in numeric form;
-c: connection; show ipvs connections;
--stats: statistics;
--rate: rates;
--exact: exact values;
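
For example, to list the current rules with numeric addresses and traffic counters:

ipvsadm -L -n            # numeric listing of services and real servers
ipvsadm -L -n --stats    # connection/packet/byte counters per service and per RS
ipvsadm -L -n -c         # current ipvs connection entries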

Purge rules:
ipvsadm -C

Save and reload:
Save:
ipvsadm -S > /path/to/some_rule_file
ipvsadm-save > /path/to/some_rule_file

Reload:
ipvsadm -R < /path/from/some_rule_file
ipvsadm-restore < /path/from/some_rule_file
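
On CentOS/RHEL the ipvsadm service script is assumed to reload its rules from /etc/sysconfig/ipvsadm at boot, so a typical persistence workflow looks like:

ipvsadm-save -n > /etc/sysconfig/ipvsadm    # dump the current rules in numeric form
systemctl enable ipvsadm                    # restore them from that file at boot
ipvsadm-restore < /etc/sysconfig/ipvsadm    # or reload them by hand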

Zero the counters:
ipvsadm -Z [-t|u|f service-address]

FWM: firewall mark

ipvsadm -A|E -t|u|f service-address [-s scheduler]
-t, -u: service-address is ip:port
-f: service-address is a firewall mark

Tables (functions) of iptables:
filter, nat, mangle, raw

mangle table
target: MARK
--set-mark value[/mask]

To define a cluster service based on a FWM:
(1) Mark the traffic

iptables -t mangle -A PREROUTING -d $VIP -p $protocol --dport $serviceport -j MARK --set-mark #

#: an integer
(2) Define the cluster service on that mark

ipvsadm -A -f # [-s scheduler]
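
An end-to-end sketch that groups HTTP and HTTPS into one cluster service via mark 10 (addresses reuse the other examples and are illustrative):

iptables -t mangle -A PREROUTING -d 172.20.120.71 -p tcp -m multiport --dports 80,443 -j MARK --set-mark 10
ipvsadm -A -f 10 -s rr                  # cluster service keyed on the firewall mark
ipvsadm -a -f 10 -r 172.20.120.41 -g
ipvsadm -a -f 10 -r 172.20.120.42 -g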

LVS persistence: persistent connections; enabled by appending -p [#] (a timeout) when defining the cluster service
Function: regardless of which scheduler ipvs uses, requests from the same IP address are always sent to the same RS within the specified time window; this is implemented through the LVS persistent-connection template and is independent of the scheduling method;

ipvs persistence modes (default timeout: 5 minutes):
Per-port persistence (PPC): persistence for a single service;
Per-FWM persistence (PFWMC): persistence for a single firewall mark (FWM);
Per-client persistence (PCC): the cluster service is defined on TCP or UDP port 0; the director then treats every request from a client as belonging to the cluster service and dispatches all of it to the same RS
Example:

ipvsadm -A -t 172.20.120.71:0 -s rr -p
ipvsadm -a -t 172.20.120.71:0 -r 172.20.120.41 -g
ipvsadm -a -t 172.20.120.71:0 -r 172.20.120.42 -g
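
By comparison, per-port (PPC) and per-FWM (PFWMC) persistence would be sketched as (mark 10 is illustrative):

ipvsadm -A -t 172.20.120.71:80 -s rr -p 300    # PPC: persistence only for this one service/port
ipvsadm -A -f 10 -s rr -p 300                  # PFWMC: persistence across everything tagged with mark 10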

Example, LVS-NAT model (topology diagram omitted). Key points: 1. request and response messages both pass through the director; 2. each RS's gateway must point to the DIP;

On the director:

ipvsadm -A -t 172.20.120.40:80 -s rr
ipvsadm -a -t 172.20.120.40:80 -r 192.168.20.11 -m
ipvsadm -a -t 172.20.120.40:80 -r 192.168.20.12 -m
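
Note that in the NAT model the director must also route between the VIP-facing and RIP-facing networks; the usual extra step (verify against your distribution) is enabling IP forwarding:

echo 1 > /proc/sys/net/ipv4/ip_forward    # let the director forward packets between the two networks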

Example, LVS-DR model (topology diagram omitted). Key points: 1. request messages pass through the director, while response messages are returned directly by each RS; 2. because every RS also holds the VIP, ARP broadcasts and ARP responses for the VIP must be dealt with; 3. when a request reaches the director, it is delivered to the selected real server by rewriting the MAC address; 4. the director and all RSs are on the same physical network;

On the director:

ifconfig eno16777736:0 172.20.120.71 netmask 255.255.255.255 broadcast 172.20.120.71 up
route add -host 172.20.120.71 dev eno16777736:0
ipvsadm -A -t 172.20.120.71:80 -s rr
ipvsadm -a -t 172.20.120.71:80 -r 172.20.120.41 -g
ipvsadm -a -t 172.20.120.71:80 -r 172.20.120.42 -g

On each RS, run the following script (lvs-dr.sh):
#!/bin/bash
vip=172.20.120.71
interface=lo

case $1 in
start)
    # arp_ignore=1: reply only to ARP requests for addresses configured on the incoming interface
    echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
    # arp_announce=2: always use the best local source address in ARP announcements
    echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
    echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce

    ifconfig $interface:0 $vip netmask 255.255.255.255 broadcast $vip up
    route add -host $vip dev $interface:0
    ;;
stop)
    route del -host $vip dev $interface:0
    ifconfig $interface:0 down

    echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo 0 > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
    echo 0 > /proc/sys/net/ipv4/conf/lo/arp_announce
    ;;
esac
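
Usage sketch on each RS:

chmod +x lvs-dr.sh
./lvs-dr.sh start    # configure the VIP on lo:0 and suppress ARP for it
./lvs-dr.sh stop     # undo the configuration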

Another situation: the VIP and the RIPs are not in the same network segment

HA features for LVS:
Director: node-level redundancy; HA cluster solution: keepalived;
Real servers: the director performs health checks on them and, based on the results, automatically adds or removes them;
Healthy: online
Unhealthy: offline

1. How to check real server health:
(1) network layer: probe whether the host is alive; ICMP ping
(2) transport layer: probe whether the port is reachable; "TCP ping"
(3) application layer: request a key resource; e.g. curl
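
A rough sketch of the three check levels run from the director (address and URL are illustrative):

ping -c 1 -W 1 172.20.120.41                                                # network layer: is the host alive?
(echo > /dev/tcp/172.20.120.41/80) && echo "port 80 open"                   # transport layer: bash-only "TCP ping"
curl -s -o /dev/null -w '%{http_code}\n' http://172.20.120.41/index.html    # application layer: key resource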

2. Check frequency:

3. State decision:
Offline: ok --> failure --> failure --> failure ==> RS down (marked down only after several consecutive failures)
Online: failure --> ok ==> RS up (a single success brings it back online)

4. backup_server (sorry_server): when all real servers are offline, it temporarily serves the page;

HA: provide redundant hosts to improve system availability;
Availability = MTBF / (MTBF + MTTR), i.e. mean time between failures / (mean time between failures + mean time to repair)
Measurements: 95%, 99%, 99.9%, 99.99%, 99.999%

Solutions for the HA cluster:
1. Implementations of the VRRP protocol: keepalived
2. AIS, a complete HA cluster stack: heartbeat, corosync

keepalived is an implementation of VRRP; its original design goal was to provide high availability for the ipvs service;
It implements the VRRP protocol as a daemon on Linux hosts, can generate ipvs rules from its configuration file, and performs health checks on each RS; vrrp_script, vrrp_track;
VRRP: Virtual Router Redundancy Protocol. Terminology:
virtual router, VRID (virtual router ID, 0-255), MASTER, BACKUP, VIP, VMAC (00-00-5e-00-01-VRID), priority, preemptive and non-preemptive modes; keepalived uses gratuitous ARP to announce address changes;
Working modes: preemptive, non-preemptive
Operating models: master/backup, master/master (by configuring multiple virtual routers)
Authentication: none, simple string (pre-shared key), AH

Components:
1. Control component: configuration file parser
2. Memory management
3. I/O multiplexer
4. Core components: VRRP stack, checkers, ipvs wrapper, watchdog

Configuration prerequisites for an HA cluster (keepalived):
1. Time on all nodes must be synchronized;
NTP protocol, chrony;
2. Make sure iptables and SELinux do not get in the way;
3. (optional for keepalived) nodes can reach each other by hostname, and the result of name resolution must match the output of "uname -n";
4. (optional for keepalived) the root user on each node can ssh to the other nodes using key-based authentication;
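
A minimal sketch of checking these prerequisites on each node (package and service names assume CentOS 7):

systemctl enable chronyd && systemctl start chronyd && chronyc sources    # 1. time synchronization
systemctl stop firewalld; setenforce 0                                    # 2. keep the firewall/SELinux out of the way (lab setting)
getent hosts $(uname -n)                                                  # 3. hostname resolution matches uname -n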

Multicast addresses: 224.0.0.0 - 239.255.255.255 (class D)

Installing keepalived: on CentOS 6.4+ the package is available in the base repository

yum install keepalived -y

Main configuration file: /etc/keepalived/keepalived.conf
Unit file: /usr/lib/systemd/system/keepalived.service; its environment file is /etc/sysconfig/keepalived
The configuration file is divided into three sections:
1. Global configuration: global_defs { ... }
2. VRRP configuration:
a. vrrp_sync_group GROUP_NAME { ... }
b. vrrp_instance INSTANCE_NAME { ... }
3. LVS configuration: virtual_server_group VSG_NAME { ... }
virtual_server IP PORT | fwmark INT {
protocol TCP
...
real_server <IPADDR> <PORT> {
...
}
real_server <IPADDR> <PORT> {
...
}
}

Global configuration:
global_defs {
notification_email {
...
}: recipient email address(es)
notification_email_from: sender email address
smtp_server: IP of the mail server;
smtp_connect_timeout: timeout for establishing a connection to the mail server;
router_id LVS_DEVEL: identifier of the physical node, conventionally the hostname;
vrrp_mcast_group4: IPv4 multicast address, default 224.0.0.18;
}

VRRP instance configuration:
vrrp_instance NAME {
...
}
Common configuration:
state MASTER|BACKUP: the initial state of this node in the current VRRP instance;
interface IFACE_NAME: the interface VRRP binds the VIP to;
virtual_router_id #: the VRID of the current VRRP instance, range 0-255, default 51;
priority #: priority of the current node, range 0-255;
advert_int 1: advertisement interval;
authentication {    # authentication mechanism
# auth_type PASS|AH; PASS - simple password (recommended), AH - IPsec AH (not recommended)
auth_type PASS
# the password used by vrrpd; it should be the same on all machines; only the first eight (8) characters are used
auth_pass 1234
}
virtual_ipaddress {
<IPADDR>/<MASK> brd <IPADDR> dev <STRING> scope <SCOPE> label <LABEL>
}
track_interface {    # interfaces to monitor
eth0
eth1
}

nopreempt: work in non-preemptive mode (preemptive mode is the default)
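
Putting the common options together, a minimal master-side instance might look like this (interface name and addresses reuse the earlier examples and are illustrative):

vrrp_instance VI_1 {
    state MASTER               # initial role on this node; the peer uses BACKUP
    interface eno16777736
    virtual_router_id 51
    priority 100               # the BACKUP node uses a lower value, e.g. 95
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        172.20.120.71/32 dev eno16777736 label eno16777736:0
    }
}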

To define notification scripts:
vrrp_instance <STRING> {
...
notify_master <STRING>|<QUOTED-STRING>
notify_backup <STRING>|<QUOTED-STRING>
notify_fault <STRING>|<QUOTED-STRING>
notify <STRING>|<QUOTED-STRING>
}

Sample script:
#!/bin/bash
# Author: MageEdu
# Description: an example notify script
#
contact='[email protected]'

notify() {
    mailsubject="$(hostname) to be $1: vip floating"
    mailbody="$(date +'%F %H:%M:%S'): vrrp transition, $(hostname) changed to be $1"
    echo "$mailbody" | mail -s "$mailsubject" $contact
}

case $1 in
master)
    notify master
    exit 0
    ;;
backup)
    notify backup
    exit 0
    ;;
fault)
    notify fault
    exit 0
    ;;
*)
    echo "Usage: $(basename $0) {master|backup|fault}"
    exit 1
    ;;
esac

Calling method:
vrrp_instance <STRING> {
...
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
Note: use double quotation marks;

ipvs service definition:
Virtual server:
virtual_server VIP PORT |
virtual_server fwmark INT {
...
}

Common parameters:
delay_loop <INT>: interval between health-check polling runs;
lb_algo rr|wrr|lc|wlc|lblc|sh|dh: load-balancing scheduling method;
lb_kind NAT|DR|TUN: cluster type;
persistence_timeout <INT>: duration of persistent connections;
protocol TCP: service protocol;
sorry_server <IPADDR> <PORT>: the "sorry server" used when all RSs have failed;

To define an RS:
real_server <IPADDR> <PORT> {
...
}
Common parameters:
weight <INT>: weight;
notify_up <STRING>|<QUOTED-STRING>: notification script called when the node comes online;
notify_down <STRING>|<QUOTED-STRING>: notification script called when the node goes offline;

HTTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|MISC_CHECK
the supported health-check methods;
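
An illustrative virtual_server block combining these parameters (addresses and the local sorry server are assumptions):

virtual_server 172.20.120.71 80 {
    delay_loop 3
    lb_algo wrr
    lb_kind DR
    protocol TCP
    sorry_server 127.0.0.1 80          # local fallback page when every RS is down
    real_server 172.20.120.41 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 172.20.120.42 80 {
        weight 2
        TCP_CHECK {
            connect_timeout 3
        }
    }
}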

Health-check mechanisms
1. Web application layer check
HTTP_GET|SSL_GET
{
...
}
Check parameters:
url {
path <STRING>: the URL of the resource requested during the health check;
status_code <INT>: judge health by the status code of the page;
digest <STRING>: judge health by the digest of the returned content; generate the digest with keepalived's /usr/bin/genhash;
}
nb_get_retry <INT>: number of GET retries;
delay_before_retry <INT>: interval between two retries;
connect_timeout <INT>: connection timeout, default 5s;
connect_ip <IP ADDRESS>: send the probe to this address;
connect_port <PORT>: send the probe to this port;
bindto <IP ADDRESS>: source IP of the probe request;
bind_port <PORT>: source port of the probe request;
warmup <INT>: delay before health checking starts;
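
For instance (path and thresholds are illustrative):

HTTP_GET {
    url {
        path /index.html
        status_code 200
    }
    connect_timeout 3
    nb_get_retry 3
    delay_before_retry 2
}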

2. Transport layer health check (TCP)
TCP_CHECK
{
...
}
Check parameters:
connect_timeout <INT>

Others:
connect_ip <IP ADDRESS>
connect_port <PORT>
bindto <IP ADDRESS>
bind_port <PORT>
Define an external script to monitor a resource that the highly available service depends on
vrrp_script NAME {
script
interval
weight
}

Reference the script inside an instance to make it part of that instance's monitoring of the current node
track_script {
NAME
}

Monitor the network interfaces of interest
track_interface {
IFACE_NAME
}

Use non-preemptive mode
nopreempt

Use delayed preemption
preempt_delay TIME

vrrp_script: defines a custom resource-monitoring script; the VRRP instance can act on the script's return value;
It is a public definition that can be referenced by multiple instances, so it is defined outside the vrrp_instance blocks;

track_script: calls a script defined by vrrp_script to monitor the resource;
Defined inside the instance, referencing the predefined vrrp_script;

Script example: use the script's return value to check whether the httpd service is healthy; on failure, lower the priority (weight -5) so that the VIP drifts to the other node.
vrrp_script chk_httpd {
    script "killall -0 httpd"
    interval 2
    weight -5
}
track_script {
    chk_httpd
}

You can then further extend the behaviour with the notification script, calling it via notify_master / notify_backup / notify_fault exactly as shown earlier (remember to use double quotation marks);

