Redis high availability and load balancing with LVS and keepalived

Last updated: 2014-07-16. Source: Internet

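Requirements

We use ES (Elasticsearch) for our log service. The pipeline is: upstream data sources => redis => logstash => ES.

Redis is currently a single point with no high availability, and the data volume keeps growing. As long as the downstream consumers are healthy, data leaves redis as soon as it arrives. But once the downstream breaks, the memory allotted to redis fills up within half an hour.

Redis 3.0 beta already provides clustering, but clients have to connect in cluster mode. With this many upstream users, we can't realistically ask all of them to change their code.

The company also has hardware load balancers; a colleague used one at company E. But getting access requires an application, and the redis topology is not settled yet and will keep changing, so going through the company process every time would be a hassle. I also wanted to tinker with this myself, so I chose LVS.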

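Goals

High availability. Each server runs one (or more) redis-server instances. If an instance dies or a server goes down, traffic moves seamlessly to another instance or server. Some data may be lost; if we need stronger durability later, we will add dumps and master/slave replication, but that is out of scope for now.
Load balancing. For HA alone, keepalived by itself would do: when a redis-server or a server dies, the VIP moves to the other machine. But then the backup machine sits idle, and we are a small company that can't afford that waste.
No changes on the client side and no client restarts; clients should not notice anything. Asking them to change things is a non-starter, and I can't even count how many clients there are, which is my own fault for not keeping documentation.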

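Design

Two real servers: 192.168.81.51 and 192.168.81.234.

One VIP: 192.168.81.229.

Each real server runs one redis-server instance on port 6379.

The VIP lives on the master, which forwards each connection round-robin to one of the servers. (Clients send different amounts of data, and redis connections are mostly long-lived, unlike HTTP, so the load is not perfectly balanced.)

Later we may add master/slave replication: for example, run an instance on port 17379 on machine B as a slave of the 6379 instance on machine A, and vice versa.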

Environment

# uname -a
Linux VMS02564 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/*release
CentOS release 5.8 (Final)

Implementation

Software preparation

Install the LVS kernel modules. These were already installed by default.

# modprobe -l | grep -i ipvs
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_dh.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_ftp.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_lblc.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_lblcr.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_lc.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_nq.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_rr.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_sed.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_sh.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_wlc.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_wrr.ko

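Install ipvsadm.

It can be installed with yum. It is optional: it is only the tool for managing LVS. I have not studied its usage closely yet and will look into it later.

Install keepalived.

The latest version is 1.2.13, but building it from source kept failing on a missing dependency that I could not resolve, so I gave up and used 1.2.8. One nasty trap with keepalived: if the config file contains errors, or does not exist at all, it reports no error at startup. The default config file is /etc/keepalived/keepalived.conf; if you installed keepalived somewhere else, copy your config to that path.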
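Since keepalived starts silently with a missing or broken config, a crude pre-flight check can catch the most common mistakes before starting it. The following is a minimal sketch, not part of keepalived: the function name `check_keepalived_conf` is hypothetical, and it only verifies that the file exists, is non-empty, and has balanced braces after stripping comments (assuming `#` and `!` start a comment). It does not validate keywords or values.

```shell
#!/bin/sh
# Hypothetical helper: crude sanity check for a keepalived config file.
# Returns 0 if the file is non-empty and its braces balance, 1 otherwise.
check_keepalived_conf() {
    conf="${1:-/etc/keepalived/keepalived.conf}"

    if [ ! -s "$conf" ]; then
        echo "missing or empty: $conf" >&2
        return 1
    fi

    # Drop everything after a comment marker, then count each brace type.
    opens=$(sed 's/[#!].*//' "$conf" | tr -cd '{' | wc -c | tr -d ' ')
    closes=$(sed 's/[#!].*//' "$conf" | tr -cd '}' | wc -c | tr -d ' ')

    if [ "$opens" -ne "$closes" ]; then
        echo "unbalanced braces in $conf: $opens opening vs $closes closing" >&2
        return 1
    fi
    return 0
}

# Usage: check_keepalived_conf && /etc/init.d/keepalived start
```

A password containing `#` or `!` would confuse the comment stripping, so treat a failure as a hint to look at the file rather than as proof of an error.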

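Configuration

Before configuring anything, make sure you understand how it works. The underlying mechanism is the most valuable thing I learned from this setup, and the main reason I am writing it down. Once you understand it, you can diagnose and fix problems quickly. Otherwise you are guessing at a black box, and a lucky guess is just a trap waiting to happen.

keepalived configuration

vrrp_instance VI_1 {
    state MASTER            # start as master; switches to backup if another node has a higher priority
    interface eth0
    virtual_router_id 51    # must be identical on all nodes
    priority 100            # the node with the highest priority becomes master
    advert_int 1
    authentication {
        auth_type PASS      # authentication method between nodes
        auth_pass 1111      # must be identical on all nodes
    }
    virtual_ipaddress {
        192.168.81.229
    }
}

# virtual server configuration
virtual_server 192.168.81.229 6379 {    # VIP and port
    delay_loop 6            # check real_server health every 6 seconds
    lb_algo rr              # LVS scheduling algorithm, plain round robin here; options: rr|wrr|lc|wlc|lblc|sh|dh
    lb_kind DR              # forwarding method: NAT|DR|TUN
    #persistence_timeout 60 # session persistence time
    protocol TCP            # TCP or UDP

    real_server 192.168.81.51 6379 {
        weight 50
        TCP_CHECK {                   # TCP health check
            #connect_timeout 3        # connection timeout
            #nb_get_retry 2           # number of retries
            #delay_before_retry 3     # delay between retries
            connect_port 6379         # port used for the health check
        }
    }
    real_server 192.168.81.234 6379 {
        weight 50
        TCP_CHECK {                   # TCP health check
            connect_port 6379         # port used for the health check
        }
    }
}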

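The file has two parts. The top half creates a VRRP (Virtual Router Redundancy Protocol) instance. Without the virtual server section below it, this alone gives you HA: redis clients connect to whichever node currently holds the VIP. If keepalived on the master dies, the backup becomes master and the VIP moves to the new master. But this provides no load balancing.

Configuring a virtual IP by hand
Add:
ifconfig eth0:1 VIP netmask 255.255.255.0
Remove:
ifconfig eth0:1 down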
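When testing failover it helps to check which node currently holds the VIP. Addresses that keepalived adds usually carry no interface label, so they may not show up in plain `ifconfig` output; `ip addr` from iproute2 lists them. The sketch below is an illustration only: `holds_vip` is a hypothetical helper that reads `ip -o -4 addr show` output on stdin and succeeds if the given address is present.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` output supplied on stdin.
holds_vip() {
    # Match " inet <addr>/" literally so 192.168.81.22 does not match 192.168.81.229.
    grep -qF -- " inet $1/"
}

# Usage on a node (restrict to eth0, because real servers also carry the VIP on lo):
#   ip -o -4 addr show dev eth0 | holds_vip 192.168.81.229 && echo "VIP is on this node"
```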

The bottom half is the load-balancing configuration. As far as I can tell, keepalived simply sets up LVS according to this section. You can see the result with ipvsadm.

# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.81.229:6379 rr
  -> 192.168.81.234:6379          Route   1      0          0
  -> VMS02245:6379                Local   1      0          0
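For comparison, the same LVS table can be built by hand with ipvsadm. keepalived programs IPVS itself rather than calling ipvsadm, so the commands below are only a rough manual equivalent of the virtual_server section. This sketch prints the commands instead of running them (running them needs root); the function name `print_ipvsadm_rules` and the fixed weight of 1 are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print ipvsadm commands for a round-robin DR service.
# Arguments: VIP, port, then one or more real server addresses.
print_ipvsadm_rules() {
    vip="$1"
    port="$2"
    shift 2

    # -A: add a virtual service, -t: TCP service address, -s rr: round-robin scheduler
    echo "ipvsadm -A -t $vip:$port -s rr"

    for rs in "$@"; do
        # -a: add a real server, -r: its address, -g: direct routing (DR), -w: weight
        echo "ipvsadm -a -t $vip:$port -r $rs:$port -g -w 1"
    done
}

print_ipvsadm_rules 192.168.81.229 6379 192.168.81.51 192.168.81.234
```

Piping the output to `sh` as root would apply the rules, and `ipvsadm -L -n` lists the resulting table with numeric addresses.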

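Is starting keepalived on both nodes enough? If you only need HA and not LB, yes: the top half of the config is all you need. But for LB you also need the system configuration below.

System configuration

Before going into the system configuration, you must understand how LVS works. If you just want to get it running in a hurry, stop reading here, or you will fall into a trap.

How DR forwarding works

I use the DR (direct routing) forwarding method of LVS.

When a client sends a web request to the VIP, the LVS server picks the real-server pool for that VIP, selects one real server from the pool according to the scheduling algorithm, records the connection in its hash table, and forwards the client's request packet to the chosen real server (rewriting only the packet's destination MAC address). The real server then sends the reply directly to the client. When the client sends further packets, LVS looks up the hash table entry it just recorded and forwards packets belonging to this connection to the same real server. When the connection closes or times out, the entry is removed from the hash table.
from "LVS的三種模式區別詳解" (a detailed comparison of the three LVS modes), Jason Wu's Thoughts and Writings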
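The scheduling and connection-table behaviour described above can be imitated with a toy model. This is only a sketch of the idea, not how the kernel implements it: `dr_schedule` is a hypothetical function that takes two real servers as arguments, reads one client endpoint (ip:port) per packet on stdin, assigns each new connection round-robin, and sends every later packet of a known connection to the server recorded for it.

```shell
#!/bin/sh
# Toy model of rr scheduling with a connection table (illustration only).
# $1, $2: real servers; stdin: one client "ip:port" per incoming packet.
dr_schedule() {
    awk -v a="$1" -v b="$2" '
        BEGIN { rs[0] = a; rs[1] = b; next_idx = 0 }
        {
            if (!($1 in conn)) {
                # New connection: pick the next real server in round-robin order
                # and remember the choice in the connection table.
                conn[$1] = rs[next_idx]
                next_idx = (next_idx + 1) % 2
            }
            # Known connection: reuse the recorded real server.
            print $1, "->", conn[$1]
        }'
}

# Usage:
#   printf 'c1:1000\nc2:2000\nc1:1000\n' | dr_schedule 192.168.81.51 192.168.81.234
```

The model also shows why long-lived redis connections balance unevenly: the choice is made once per connection, so a client that pushes a lot of data stays on one server however busy it gets.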

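Because DR forwarding only rewrites the destination MAC address, the destination IP is unchanged and is still the VIP. If the real server does not have the VIP configured, it simply drops the packet. So the real server must also have the VIP configured, with a 32-bit netmask, like this:

ifconfig lo:1 VIP netmask 255.255.255.255 up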

But this creates a problem: when someone asks who has 192.168.81.229, both NICs answer "me, me, me". Which one gets the packet? Whichever reply arrives first. See the capture:

# tcpdump -e -nn host 192.168.81.229
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
22:27:50.720431 00:50:56:92:05:b9 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: arp who-has 192.168.81.229 tell 192.168.81.156
22:27:50.720858 00:50:56:92:4d:6d > 00:50:56:92:05:b9, ethertype ARP (0x0806), length 60: arp reply 192.168.81.229 is-at 00:50:56:92:4d:6d
22:27:50.720881 00:50:56:92:05:b9 > 00:50:56:92:4d:6d, ethertype IPv4 (0x0800), length 98: 192.168.81.156 > 192.168.81.229: ICMP echo request, id 31307, seq 1, length 64
22:27:50.721040 00:50:56:92:36:44 > 00:50:56:92:05:b9, ethertype ARP (0x0806), length 60: arp reply 192.168.81.229 is-at 00:50:56:92:36:44
22:27:50.721130 00:50:56:92:4d:6d > 00:50:56:92:05:b9, ethertype IPv4 (0x0800), length 98: 192.168.81.229 > 192.168.81.156: ICMP echo reply, id 31307, seq 1, length 64

When a third host C pings 192.168.81.229, both nodes claim that the address is theirs, and host C sends its ICMP packet to whichever answered first. That is far too unreliable; we must make sure packets go to the host that really owns the VIP.
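A capture like the one above can be checked mechanically: if more than one MAC address answers ARP for the VIP, the real servers are still replying. The following is a minimal sketch that parses `tcpdump -e -nn` text output. The function name `arp_responders` is hypothetical, and the parsing assumes the "reply <ip> is-at <mac>" wording shown above (it also accepts a capitalized "Reply" and a trailing comma after the MAC, which some tcpdump versions print).

```shell
#!/bin/sh
# Hypothetical helper: list the distinct MAC addresses that answered ARP
# for the given IP. stdin: text output of `tcpdump -e -nn`.
arp_responders() {
    awk -v vip="$1" '
        {
            for (i = 1; i + 3 <= NF; i++) {
                if (tolower($i) == "reply" && $(i + 1) == vip && $(i + 2) == "is-at") {
                    mac = $(i + 3)
                    sub(/,$/, "", mac)          # drop a trailing comma, if any
                    if (!seen[mac]++) print mac # print each MAC only once
                }
            }
        }'
}

# Usage: save a capture to a file, then
#   arp_responders 192.168.81.229 < capture.txt
```

Run against the capture above, it lists two addresses; after ARP is configured correctly, only the node that holds the VIP should appear.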

Fortunately, Linux has settings that control how it responds to ARP requests:

echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce

About these settings and their meaning:

arp_announce - INTEGER
    Define different restriction levels for announcing the local
    source IP address from IP packets in ARP requests sent on
    interface:
    0 - (default) Use any local address, configured on any interface
    1 - Try to avoid local addresses that are not in the target's
    subnet for this interface. This mode is useful when target
    hosts reachable via this interface require the source IP
    address in ARP requests to be part of their logical network
    configured on the receiving interface. When we generate the
    request we will check all our subnets that include the
    target IP and will preserve the source address if it is from
    such subnet. If there is no such subnet we select source
    address according to the rules for level 2.
    2 - Always use the best local address for this target.
    In this mode we ignore the source address in the IP packet
    and try to select local address that we prefer for talks with
    the target host. Such local address is selected by looking
    for primary IP addresses on all our subnets on the outgoing
    interface that include the target IP address. If no suitable
    local address is found we select the first local address
    we have on the outgoing interface or on all other interfaces,
    with the hope we will receive reply for our request and
    even sometimes no matter the source IP address we announce.

    The max value from conf/{all,interface}/arp_announce is used.

    Increasing the restriction level gives more chance for
    receiving answer from the resolved target while decreasing
    the level announces more valid sender's information.

arp_ignore - INTEGER
    Define different modes for sending replies in response to
    received ARP requests that resolve local target IP addresses:
    0 - (default): reply for any local target IP address, configured
    on any interface
    1 - reply only if the target IP address is local address
    configured on the incoming interface
    2 - reply only if the target IP address is local address
    configured on the incoming interface and both with the
    sender's IP address are part from same subnet on this interface
    3 - do not reply for local addresses configured with scope host,
    only resolutions for global and link addresses are replied
    4-7 - reserved
    8 - do not reply for all local addresses

    The max value from conf/{all,interface}/arp_ignore is used
    when ARP request is received on the {interface}

from Using arp announce/arp ignore to disable ARP - LVSKB

Configure lo

ifconfig lo:1 VIP netmask 255.255.255.255 up

Configure ARP

echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore

echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce

echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore

echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
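Values written to /proc are lost on reboot. One way to keep them is to add the equivalent keys to a sysctl configuration file. The sketch below is a minimal illustration, not part of the original setup: the function name `persist_arp_settings` and the default path /etc/sysctl.conf are assumptions, and the function appends each key only if it is not already present.

```shell
#!/bin/sh
# Hypothetical helper: append the real-server ARP settings to a sysctl file,
# skipping any key that is already set there.
persist_arp_settings() {
    conf="${1:-/etc/sysctl.conf}"

    for entry in \
        "net.ipv4.conf.lo.arp_ignore = 1" \
        "net.ipv4.conf.lo.arp_announce = 2" \
        "net.ipv4.conf.all.arp_ignore = 1" \
        "net.ipv4.conf.all.arp_announce = 2"
    do
        key="${entry%% =*}"   # the part before " = value"
        grep -q "^$key[ =]" "$conf" 2>/dev/null || printf '%s\n' "$entry" >> "$conf"
    done
}

# Usage (as root): persist_arp_settings && sysctl -p
```

These are the values a node needs while it acts as a real server. If a script changes them at runtime, for example when the node becomes master, the persisted values only define the state after a reboot.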

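Only one node may run LVS forwarding at a time

If both nodes run keepalived with the same configuration, data that A forwards to B gets forwarded by B back to A, then by A to B again, and so on, and you never get a reply. So at any time only one node may run LVS.

I had hoped keepalived would support enabling certain sections (such as virtual_server) only when a node becomes master, but it does not seem to. So I had to take a somewhat roundabout approach. Without further ado, the final configuration: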

# cat master.conf

global_defs {
   router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.81.229/24
    }
    notify_master "/etc/keepalived/notify_master.sh"
    notify_backup "/etc/keepalived/notify_backup.sh"
}

virtual_server 192.168.81.229 6379 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP

    real_server 192.168.81.51 6379 {
        weight 1
        TCP_CHECK {
          connect_port    6379
          connect_timeout 3
        }
    }
    real_server 192.168.81.234 6379 {
        weight 1
        TCP_CHECK {
          connect_port    6379
          connect_timeout 3
        }
    }
}

# cat notify_master.sh

#!/bin/sh
# The master must answer ARP for the VIP, so restore the kernel defaults.
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce

# If the active config is not the master config, swap it in and reload keepalived.
diff /etc/keepalived/keepalived.conf /etc/keepalived/master.conf
if test "$?" != "0"; then
    cp /etc/keepalived/master.conf /etc/keepalived/keepalived.conf
    killall -HUP keepalived
fi

# cat notify_backup.sh

#!/bin/sh
# A backup acts only as a real server, so it must not answer ARP for the VIP.
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce

# If the active config is not the backup config, swap it in and reload keepalived.
diff /etc/keepalived/keepalived.conf /etc/keepalived/backup.conf
if test "$?" != "0"; then
    cp /etc/keepalived/backup.conf /etc/keepalived/keepalived.conf
    killall -HUP keepalived
fi
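The scripts refer to a backup.conf that is not shown. Given that only one node may run LVS, it is presumably master.conf without the virtual_server section, but that is an assumption. Under that assumption, the sketch below derives such a file by dropping every top-level virtual_server block. The function name `strip_virtual_server` is hypothetical, and the brace counting is naive: a brace inside a comment within the block would throw it off.

```shell
#!/bin/sh
# Hypothetical helper: print a keepalived config with all top-level
# virtual_server { ... } blocks removed (assumed shape of backup.conf).
strip_virtual_server() {
    awk '
        # A virtual_server line outside any skipped block starts a block to drop.
        depth == 0 && /^[ \t]*virtual_server[ \t]/ { skipping = 1 }

        skipping {
            opens  = gsub(/[{]/, "{")    # count opening braces on this line
            closes = gsub(/[}]/, "}")    # count closing braces on this line
            depth += opens - closes
            # The block ends on the line where the brace depth returns to zero.
            if (depth == 0 && (opens || closes)) skipping = 0
            next
        }

        { print }
    ' "$1"
}

# Usage:
#   strip_virtual_server /etc/keepalived/master.conf > /etc/keepalived/backup.conf
```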
