Requirements
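We use ES (Elasticsearch) as a log service. The pipeline is: upstream data sources => redis => logstash => ES.
Redis is currently a single point of failure with no high availability. Data volume keeps growing. As long as the downstream consumers are healthy, data passes through redis as soon as it arrives. Once downstream has a problem, though, the memory allocated to redis fills up in about half an hour.
Redis 3.0 beta already provides clustering, but clients have to connect in cluster mode. We have so many upstream users that it is not realistic to require all of them to change their clients.
The company also has hardware load balancers, and a colleague used hardware LBs when he was at company E. But access has to be requested, and our redis architecture is not settled yet and will keep changing. Going through the company process for every change would be tedious. Besides, I wanted to tinker with it myself, so I chose LVS.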
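Goals
- High availability. Each server runs one (or more) redis-server instances. If an instance dies or a whole server goes down, traffic fails over seamlessly to another instance or server. Some data may be lost. If we need stronger data reliability later, we will add dumps and master/slave replication, but that is out of scope for now.
- Load balancing. For HA alone, keepalived would be enough: if a redis-server or a server dies, the VIP moves to another machine. But then the backup machine sits idle, and we are a small company that cannot afford that kind of waste.
- No changes on the client side and no client restarts; clients should not notice anything. Asking them to change things is not an option. Besides, we can't even easily count how many clients we have. That is my own fault for never keeping documentation.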
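Design
Two real servers: 192.168.81.51 and 192.168.81.234
One VIP: 192.168.81.229
Each server runs one redis-server instance on port 6379.
The VIP lives on the master, which forwards each incoming connection round-robin to one of the servers. The load is not perfectly balanced, for two reasons: each client sends a very different amount of data, and redis connections are mostly long-lived rather than short-lived like HTTP.
Later we may add master/slave replication. For example, machine B could run an instance on port 17379 as a slave of the 6379 instance on machine A, and vice versa.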
Environment
#uname -a
Linux VMS02564 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
#cat /etc/*release
CentOS release 5.8 (Final)
Implementation
Preparing the software
Install the LVS kernel modules. They are already installed by default:
#modprobe -l | grep -i ipvs
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_dh.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_ftp.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_lblc.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_lblcr.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_lc.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_nq.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_rr.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_sed.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_sh.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_wlc.ko
/lib/modules/2.6.18-308.el5/kernel/net/ipv4/ipvs/ip_vs_wrr.ko
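Install ipvsadm.
It can be installed with yum. Strictly speaking it is optional, since it is only a management tool for LVS. I haven't studied its usage closely yet and will look into it later.
Install keepalived.
The latest version is 1.2.13, but building it from source kept failing on a missing dependency that I couldn't resolve. I gave up and used 1.2.8 instead. One nasty pitfall of keepalived: if the config file contains errors, or there is no config file at all, it still starts without reporting any error. The default config file is /etc/keepalived/keepalived.conf. If you installed keepalived under a different prefix, copy its config file to that path.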
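Because of that silent-start pitfall, a small pre-flight check before starting keepalived can save some head-scratching. The following is only a rough sketch. `check_keepalived_conf` and `KEEPALIVED_CONF` are hypothetical names. The check only verifies that the file exists, is non-empty, and has balanced braces, treating text after `#` or `!` as a comment (a simplification). It does not validate keepalived keywords.

```shell
#!/bin/sh
# Hypothetical pre-flight check: keepalived may start without complaint even
# if its config is missing or malformed. KEEPALIVED_CONF is an assumed
# variable; the default is the path keepalived reads when started without -f.
KEEPALIVED_CONF=${KEEPALIVED_CONF:-/etc/keepalived/keepalived.conf}

# count_chars CHAR FILE: count CHAR outside comments (text after # or !).
count_chars() {
    sed 's/[#!].*//' "$2" | tr -cd "$1" | wc -c | tr -d ' '
}

# check_keepalived_conf FILE: succeed only if FILE exists, is non-empty and
# has balanced braces; otherwise print the reason to stderr and fail.
check_keepalived_conf() {
    conf=$1
    if [ ! -s "$conf" ]; then
        echo "missing or empty: $conf" >&2
        return 1
    fi
    opens=$(count_chars '{' "$conf")
    closes=$(count_chars '}' "$conf")
    if [ "$opens" -ne "$closes" ]; then
        echo "unbalanced braces in $conf: $opens '{' vs $closes '}'" >&2
        return 1
    fi
}
```

Usage, assuming keepalived is started by hand: `check_keepalived_conf "$KEEPALIVED_CONF" && keepalived -f "$KEEPALIVED_CONF"`.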
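Configuring it
Before configuring anything, make sure you understand the underlying principles. Understanding them was the most valuable thing I learned from this setup, and it is the main reason I wrote this down. Once you understand the principles, you can diagnose and fix problems quickly. Otherwise you are just guessing at a black box, and a correct guess is luck, which is a trap in itself.
keepalived configuration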
vrrp_instance VI_1 {
    state MASTER              # start as master; become backup if another node has a higher priority
    interface eth0
    virtual_router_id 51      # must be identical on all nodes
    priority 100              # the node with the highest priority becomes master
    advert_int 1
    authentication {
        auth_type PASS        # authentication method between nodes
        auth_pass 1111        # must be identical on all nodes
    }
    virtual_ipaddress {
        192.168.81.229
    }
}
# virtual server configuration
virtual_server 192.168.81.229 6379 {   # VIP and port
    delay_loop 6              # check real_server health every 6 seconds
    lb_algo rr                # LVS scheduling algorithm, here plain round robin; options: rr|wrr|lc|wlc|lblc|sh|dh
    lb_kind DR                # forwarding method: NAT|DR|TUN
    #persistence_timeout 60   # session persistence timeout
    protocol TCP              # TCP or UDP
    real_server 192.168.81.51 6379 {
        weight 50
        TCP_CHECK {           # TCP health check
            #connect_timeout 3      # connection timeout
            #nb_get_retry 2         # number of retries
            #delay_before_retry 3   # delay between retries
            connect_port 6379 # health check port
        }
    }
    real_server 192.168.81.234 6379 {
        weight 50
        TCP_CHECK {           # TCP health check
            connect_port 6379 # health check port
        }
    }
}
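The config has two parts. The upper part creates a VRRP instance (VRRP is the Virtual Router Redundancy Protocol). Without the virtual server part below it, you get pure HA: redis clients connect to whichever node currently holds the VIP. If keepalived on the master dies, the backup becomes master and the VIP moves to the new master. But this gives you no load balancing.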
Configuring the virtual IP by hand
Add:
ifconfig eth0:1 VIP netmask 255.255.255.0
Remove:
ifconfig eth0:1 down
The lower part is the load-balancing configuration. As I understand it, keepalived simply programs LVS according to this section, which you can confirm with ipvsadm:
#ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.81.229:6379 rr
  -> 192.168.81.234:6379          Route   1      0          0
  -> VMS02245:6379                Local   1      0          0
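For reference, the virtual_server block corresponds roughly to a handful of ipvsadm commands. The sketch below only prints them, so they can be reviewed or piped to `sh` as root on a test node where keepalived is *not* managing LVS. Otherwise they would clash with the rules keepalived installs. The function name `ipvs_dr_commands` is hypothetical. The flags are standard ipvsadm options: `-A`/`-a` add a virtual service or real server, `-t` gives the TCP service address, `-s` the scheduler, `-r` the real server, `-g` selects direct routing, and `-w` sets the weight. The weight is fixed at 1 to match the listing above.

```shell
#!/bin/sh
# Print ipvsadm commands for a DR-mode TCP virtual service.
# ipvs_dr_commands VIP:PORT SCHEDULER REAL_SERVER:PORT...
ipvs_dr_commands() {
    vip_port=$1
    scheduler=$2
    shift 2
    # Create the virtual service on the VIP with the given scheduler.
    echo "ipvsadm -A -t $vip_port -s $scheduler"
    # Attach each real server using direct routing (-g) with weight 1.
    for rs in "$@"; do
        echo "ipvsadm -a -t $vip_port -r $rs -g -w 1"
    done
}
```

Example: `ipvs_dr_commands 192.168.81.229:6379 rr 192.168.81.51:6379 192.168.81.234:6379`. Afterwards, `ipvsadm -Ln` lists the table with numeric addresses instead of host names such as VMS02245.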
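So is it enough to start keepalived on both nodes? Not quite. If you only need HA and no LB, the upper part of the config is enough; just start keepalived. For LB, you also need the system configuration below.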
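System configuration
Before touching the system configuration, make sure you understand how LVS works. If you just want to get it running quickly, stop reading here; otherwise you will fall into the traps.
How DR forwarding works
I use LVS's DR (direct routing) forwarding mode.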
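When a client sends a web request to the VIP, the LVS server selects the pool of real servers for that VIP and picks one real server from the pool according to the scheduling algorithm. It records the connection in a hash table and forwards the client's request packet to the chosen real server, changing only the destination MAC address. The chosen real server then sends its reply directly to the client. When the client sends further packets, LVS uses the hash table entry it just recorded to send packets of this connection straight to the same real server. When the connection terminates or times out, the entry is removed from the hash table.
Source: "LVS的三種模式區別詳解" (a detailed comparison of LVS's three modes), Jason Wu's Thoughts and Writings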
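The dispatch logic in the quote can be illustrated with a toy model. This is not LVS code: the kernel keeps its connection table in a hash table. Here the table is just a newline-separated list of "client server" pairs, and `dispatch`, `close_conn`, `CONN_TABLE`, `RR_NEXT` and `SELECTED` are all hypothetical names. The model also shows why the load is uneven for redis: scheduling happens per connection, so a long-lived connection stays on one real server no matter how much data it carries.

```shell
#!/bin/sh
# Toy model of LVS dispatch: round robin for new connections, then a
# connection table so later packets of the same connection stick to
# the same real server. Results are returned in $SELECTED (not echoed)
# so that the table state survives between calls.
REAL_SERVERS="192.168.81.51 192.168.81.234"
CONN_TABLE=""
RR_NEXT=0

# dispatch CLIENT_IP:PORT -> sets SELECTED to the chosen real server
dispatch() {
    client=$1
    # Existing connection: reuse the recorded real server.
    hit=$(printf '%s\n' "$CONN_TABLE" | awk -v c="$client" '$1 == c { print $2; exit }')
    if [ -n "$hit" ]; then
        SELECTED=$hit
        return 0
    fi
    # New connection: pick the next real server round robin.
    set -- $REAL_SERVERS
    idx=$((RR_NEXT % $# + 1))
    eval "SELECTED=\${$idx}"
    RR_NEXT=$((RR_NEXT + 1))
    # Record the connection in the table.
    CONN_TABLE="$CONN_TABLE
$client $SELECTED"
}

# close_conn CLIENT_IP:PORT -> delete the entry (connection closed/timed out)
close_conn() {
    CONN_TABLE=$(printf '%s\n' "$CONN_TABLE" | awk -v c="$1" '$1 != c')
}
```

For example, `dispatch 10.0.0.1:5000` followed by `dispatch 10.0.0.2:5001` spreads the two connections over both servers. Repeating `dispatch 10.0.0.1:5000` then returns the first server again.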
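Because DR forwarding rewrites only the destination MAC address, the destination IP is still the VIP. If the real server does not have the VIP configured, it simply drops the packet. Therefore each real server must also carry the VIP, with a 32-bit netmask, as follows:
ifconfig lo:1 VIP netmask 255.255.255.255 up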
But this causes a nasty problem. When someone asks who has 192.168.81.229, both machines answer "me, me, me!" So which one gets the packet? Whichever reply arrives first. Look at this capture:
#tcpdump -e -nn host 192.168.81.229
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
22:27:50.720431 00:50:56:92:05:b9 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: arp who-has 192.168.81.229 tell 192.168.81.156
22:27:50.720858 00:50:56:92:4d:6d > 00:50:56:92:05:b9, ethertype ARP (0x0806), length 60: arp reply 192.168.81.229 is-at 00:50:56:92:4d:6d
22:27:50.720881 00:50:56:92:05:b9 > 00:50:56:92:4d:6d, ethertype IPv4 (0x0800), length 98: 192.168.81.156 > 192.168.81.229: ICMP echo request, id 31307, seq 1, length 64
22:27:50.721040 00:50:56:92:36:44 > 00:50:56:92:05:b9, ethertype ARP (0x0806), length 60: arp reply 192.168.81.229 is-at 00:50:56:92:36:44
22:27:50.721130 00:50:56:92:4d:6d > 00:50:56:92:05:b9, ethertype IPv4 (0x0800), length 98: 192.168.81.229 > 192.168.81.156: ICMP echo reply, id 31307, seq 1, length 64
When another host, C (192.168.81.156), pings 192.168.81.229, both nodes claim that .229 is theirs. Host C sends its ICMP packet to whichever host answered first (here 00:50:56:92:4d:6d). That is far too unreliable; we must make sure our packets reach the right host.
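A quick way to spot this "everyone answers" situation in a capture is to collect the distinct MAC addresses that send ARP replies for the VIP. This is a diagnostic sketch. `arp_claimants` is a hypothetical helper, and it parses the older `tcpdump -e -nn` output format shown above ("arp reply IP is-at MAC"). Newer tcpdump versions may print "ARP, Reply ..." instead, which this sketch does not handle. More than one line of output means several hosts are answering for the VIP.

```shell
#!/bin/sh
# arp_claimants IP: read `tcpdump -e -nn` output on stdin and print each
# distinct MAC address that sent an ARP reply claiming IP.
arp_claimants() {
    awk -v ip="$1" '
        / arp reply / {
            for (i = 1; i < NF; i++)
                if ($i == "reply" && $(i + 1) == ip && $(i + 2) == "is-at")
                    print $(i + 3)
        }' | sort -u
}
```

Usage (as root): `tcpdump -e -nn -c 20 host 192.168.81.229 | arp_claimants 192.168.81.229`, while another host pings the VIP.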
Fortunately, Linux has settings that control how ARP requests are answered:
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
What these settings mean:
arp_announce - INTEGER
    Define different restriction levels for announcing the local source IP address from IP packets in ARP requests sent on interface:
    0 - (default) Use any local address, configured on any interface
    1 - Try to avoid local addresses that are not in the target's subnet for this interface. This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface. When we generate the request we will check all our subnets that include the target IP and will preserve the source address if it is from such subnet. If there is no such subnet we select source address according to the rules for level 2.
    2 - Always use the best local address for this target. In this mode we ignore the source address in the IP packet and try to select local address that we prefer for talks with the target host. Such local address is selected by looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP address. If no suitable local address is found we select the first local address we have on the outgoing interface or on all other interfaces, with the hope we will receive reply for our request and even sometimes no matter the source IP address we announce.
    The max value from conf/{all,interface}/arp_announce is used.
    Increasing the restriction level gives more chance for receiving answer from the resolved target while decreasing the level announces more valid sender's information.
arp_ignore - INTEGER
    Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses:
    0 - (default): reply for any local target IP address, configured on any interface
    1 - reply only if the target IP address is local address configured on the incoming interface
    2 - reply only if the target IP address is local address configured on the incoming interface and both with the sender's IP address are part from same subnet on this interface
    3 - do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied
    4-7 - reserved
    8 - do not reply for all local addresses
    The max value from conf/{all,interface}/arp_ignore is used when ARP request is received on the {interface}
Source: Using arp announce/arp ignore to disable ARP - LVSKB
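To see why arp_ignore=1 on conf/all is what silences the real server, here is a toy model of the quoted rule. It is not kernel code and covers only modes 0, 1 and 8. The helper names are hypothetical. It follows the quoted sentence: the effective mode is the maximum of conf/all and the incoming interface's value. On a real server with the VIP on lo:1, an ARP request for the VIP arrives on eth0. With all.arp_ignore=1 and eth0 left at 0, the effective mode is 1. Because the VIP is not configured on eth0, the server ignores the request.

```shell
#!/bin/sh
# effective_arp_ignore ALL IFACE: the kernel uses max(conf/all, conf/IFACE).
effective_arp_ignore() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# would_reply MODE ON_INCOMING
#   MODE        effective arp_ignore value (only 0, 1 and 8 are modeled)
#   ON_INCOMING "yes" if the requested (local) IP is configured on the
#               interface the request arrived on, "no" if it is only on
#               another interface such as lo:1
# Prints "reply" or "ignore".
would_reply() {
    case $1 in
        0) echo reply ;;
        1) if [ "$2" = yes ]; then echo reply; else echo ignore; fi ;;
        8) echo ignore ;;
        *) echo "arp_ignore mode $1 not modeled" >&2; return 1 ;;
    esac
}
```

For the real-server case above: `would_reply "$(effective_arp_ignore 1 0)" no` prints `ignore`. With the default all.arp_ignore=0, the same call with `0` prints `reply`, which is exactly the double answer seen in the capture.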
Configure lo:
ifconfig lo:1 VIP netmask 255.255.255.255 up
Configure ARP:
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
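After these two steps, a quick check on each real server can confirm both parts are in place. This is a verification sketch with hypothetical helper names. `CONF_ROOT` is an assumed override that defaults to the live /proc path, so the check can also be tried against a scratch directory. `vip_on_lo` reads the output of `ip -o -4 addr show dev lo` and looks for the VIP with a /32 prefix. Note that on the node currently acting as master, notify_master.sh resets the ARP values to 0, so `arp_params_ok` is only expected to pass on a backup node.

```shell
#!/bin/sh
# CONF_ROOT: assumed override, defaults to the live kernel settings.
CONF_ROOT=${CONF_ROOT:-/proc/sys/net/ipv4/conf}

# arp_params_ok: succeed if lo and all have arp_ignore=1, arp_announce=2.
arp_params_ok() {
    for ifname in lo all; do
        val=$(cat "$CONF_ROOT/$ifname/arp_ignore" 2>/dev/null || true)
        [ "$val" = 1 ] || { echo "$ifname/arp_ignore is '$val', want 1" >&2; return 1; }
        val=$(cat "$CONF_ROOT/$ifname/arp_announce" 2>/dev/null || true)
        [ "$val" = 2 ] || { echo "$ifname/arp_announce is '$val', want 2" >&2; return 1; }
    done
}

# vip_on_lo VIP: read `ip -o -4 addr show dev lo` output on stdin and
# succeed only if VIP is configured with a /32 prefix.
vip_on_lo() {
    grep -qF "inet $1/32 "
}
```

Usage: `ip -o -4 addr show dev lo | vip_on_lo 192.168.81.229 && arp_params_ok && echo "real server OK"`.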
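Only one node may run LVS forwarding at a time
If both nodes run keepalived with the same configuration, then data that A forwards to B is forwarded by B back to A, then by A to B again, and so on, and the reply never makes it back to you. So only one node at a time may run LVS.
I had hoped keepalived would support enabling part of the configuration (for example, the virtual_server block) only when a node becomes master, but that does not seem to be possible. So I had to resort to a roundabout approach. Without further ado, here is the final configuration: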
#cat master.conf
global_defs {
    router_id LVS_DEVEL
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.81.229/24
    }
    notify_master "/etc/keepalived/notify_master.sh"
    notify_backup "/etc/keepalived/notify_backup.sh"
}
virtual_server 192.168.81.229 6379 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 192.168.81.51 6379 {
        weight 1
        TCP_CHECK {
            connect_port 6379
            connect_timeout 3
        }
    }
    real_server 192.168.81.234 6379 {
        weight 1
        TCP_CHECK {
            connect_port 6379
            connect_timeout 3
        }
    }
}
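The notify scripts also refer to /etc/keepalived/backup.conf, whose contents are not shown here. One plausible reading, and it is only an assumption, is that backup.conf is master.conf without the virtual_server block. A backup node would then keep VRRP running but not program LVS, which avoids the A->B->A forwarding loop. Under that assumption, backup.conf could be generated from master.conf with a brace-counting filter like the hypothetical `strip_virtual_servers` below. It ignores braces inside `#`/`!` comments, and it assumes the `virtual_server` keyword starts its own line, as in the listing above.

```shell
#!/bin/sh
# strip_virtual_servers: copy a keepalived config from stdin to stdout,
# dropping every top-level `virtual_server ... { ... }` block.
strip_virtual_servers() {
    awk '
        !skipping && $1 == "virtual_server" { skipping = 1; depth = 0; started = 0 }
        {
            line = $0
            sub(/[#!].*/, "", line)            # ignore braces in comments
            opens = gsub(/[{]/, "", line)
            closes = gsub(/[}]/, "", line)
            if (skipping) {
                depth += opens - closes
                if (opens > 0) started = 1
                # The block ends once its opening brace has been closed.
                if (started && depth <= 0) skipping = 0
                next
            }
            print
        }'
}
```

Usage: `strip_virtual_servers < /etc/keepalived/master.conf > /etc/keepalived/backup.conf`, then review the result by hand.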
#cat notify_master.sh
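#!/bin/sh
# This node now holds the VIP: answer ARP requests for it normally
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
# Switch to master.conf (with the virtual_server block) and reload keepalived,
# but only if it is not already the active config
diff /etc/keepalived/keepalived.conf /etc/keepalived/master.conf >/dev/null
if test "$?" != "0"; then
    cp /etc/keepalived/master.conf /etc/keepalived/keepalived.conf
    killall -HUP keepalived
fi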
#cat notify_backup.sh
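#!/bin/sh
# This node is now a plain real server: stop answering ARP for the VIP on lo
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
# Switch to backup.conf and reload keepalived, but only if it is not
# already the active config
diff /etc/keepalived/keepalived.conf /etc/keepalived/backup.conf >/dev/null
if test "$?" != "0"; then
    cp /etc/keepalived/backup.conf /etc/keepalived/keepalived.conf
    killall -HUP keepalived
fi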
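The two notify scripts differ only in the ARP values and the config they switch to, so they could be folded into one. This is a sketch under several assumptions. It would be hooked up through keepalived's generic `notify` directive in vrrp_instance, which passes "INSTANCE" or "GROUP", the instance name, and the new state as `$1`..`$3`. FAULT is treated like BACKUP, which is my choice and not something the original scripts do. `CONF_ROOT`, `KA_DIR` and `RELOAD_CMD` are hypothetical overrides so the logic can be tried without root. `cmp -s` replaces `diff` for the "is the active config already the right one?" test.

```shell
#!/bin/sh
# Combined notify script (sketch). Overrides are assumptions for testing.
CONF_ROOT=${CONF_ROOT:-/proc/sys/net/ipv4/conf}
KA_DIR=${KA_DIR:-/etc/keepalived}
RELOAD_CMD=${RELOAD_CMD:-"killall -HUP keepalived"}

# set_arp IGNORE ANNOUNCE: write the ARP settings for lo and all.
set_arp() {
    for ifname in lo all; do
        echo "$1" > "$CONF_ROOT/$ifname/arp_ignore"
        echo "$2" > "$CONF_ROOT/$ifname/arp_announce"
    done
}

# use_conf master|backup: activate KA_DIR/<name>.conf and reload keepalived,
# but only when the active config actually differs.
use_conf() {
    src="$KA_DIR/$1.conf"
    if ! cmp -s "$src" "$KA_DIR/keepalived.conf"; then
        cp "$src" "$KA_DIR/keepalived.conf" && $RELOAD_CMD
    fi
}

# on_state STATE: apply the settings for a VRRP state transition.
on_state() {
    case $1 in
        MASTER)       set_arp 0 0; use_conf master ;;
        BACKUP|FAULT) set_arp 1 2; use_conf backup ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
}

# When called by keepalived's notify hook, the new state is the third argument.
if [ $# -ge 3 ]; then
    on_state "$3"
fi
```

In keepalived.conf this would replace the two `notify_master`/`notify_backup` lines with something like `notify "/etc/keepalived/notify.sh"` (the file name is an assumption).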