Tags: HA, Spark, how it works
Spark High Availability (HA) in Practice
How a Spark Cluster Works
Here, "resources" mainly means memory and CPU.
If the Master is a single point and it fails, the cluster can no longer serve the outside world.
Spark uses Zookeeper for HA. HA is normally built around active and standby roles:
the active master is the one currently doing the work;
a standby master stands ready to take over as active the moment the active one dies.
It used to be common to use 2 machines, one active and one standby;
nowadays 3 machines are typical, one active and two standby, sometimes even more.
What does Zookeeper hold? The metadata of all Workers, Drivers, and Applications. When the active master dies, Zookeeper elects one of the standbys as leader, which then recovers the Worker, Driver, and Application state. Only after that recovery completes does the leader finish recovering and become active, and only once it is active can it serve the outside world again and accept job submissions normally.
Does a master switchover affect a running business application? Under coarse-grained scheduling it does not: the program already applied to the Master for resources before running, and from then on the job is purely interaction between Workers and Executors; the Master is out of the loop because the resources were allocated before the switch. Under fine-grained scheduling it does.
The drawback of fine-grained mode is that tasks start very slowly. In practice, normal big-data processing is almost always coarse-grained.
Hands-on:
========== Download Zookeeper ==========
http://zookeeper.apache.org/
After downloading, extract it into /usr/local/ and configure the environment variables:
export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6
export PATH=$PATH:${ZOOKEEPER_HOME}/bin
Zookeeper is installed separately.
Since this is a distributed setup, Zookeeper has to run on multiple machines,
so we also copy it onto Worker1 and Worker2.
Inside the zookeeper directory, create two directories, data and logs:
[email protected]:/usr/local/zookeeper-3.4.6# mkdir data
[email protected]:/usr/local/zookeeper-3.4.6# mkdir logs
[email protected]:/usr/local/zookeeper-3.4.6/conf# vi zoo_sample.cfg
dataDir must be changed; otherwise the data will be lost on restart.
[email protected]:/usr/local/zookeeper-3.4.6/conf# cp zoo_sample.cfg zoo.cfg
[email protected]:/usr/local/zookeeper-3.4.6/conf# vi zoo.cfg
Changes (for a 3-machine cluster):
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888
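For reference, the full zoo.cfg for this three-node ensemble can be sketched as below. The server.N lines map ID N (which must match each machine's myid file) to a host; 2888 is the quorum/follower port and 3888 the leader-election port. The tickTime/initLimit/syncLimit/clientPort values are the zoo_sample.cfg defaults:

```properties
# /usr/local/zookeeper-3.4.6/conf/zoo.cfg (sketch)
tickTime=2000          # base time unit in ms
initLimit=10           # ticks a follower may take to connect and sync
syncLimit=5            # ticks a follower may lag behind the leader
clientPort=2181        # port clients (e.g. the Spark masters) connect to
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888
```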
[email protected]:/usr/local/zookeeper-3.4.6/conf# cd ../data/
Number each machine:
[email protected]:/usr/local/zookeeper-3.4.6/data# echo 0 > myid
[email protected]:/usr/local/zookeeper-3.4.6/data# ls
myid
[email protected]:/usr/local/zookeeper-3.4.6/data# cat myid
0
[email protected]:/usr/local/zookeeper-3.4.6/data#
(Mind the space: `echo 0>myid` is parsed as a file-descriptor redirection and leaves myid empty.)
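Writing the myid file trips over a classic shell pitfall: `echo 0>myid` (no space before `>`) leaves the file empty, because bash reads `0>` as a file-descriptor redirection and echo gets no argument. A small sketch in a throwaway directory shows why the space matters:

```shell
# Why `echo 0>myid` fails: bash parses `0>` as a redirection of file
# descriptor 0, so echo receives no argument and the redirection merely
# creates an empty myid. Demonstrated in a temp dir, not the real data dir.
workdir=$(mktemp -d)
cd "$workdir"

echo 0>myid                      # buggy form: myid ends up empty
printf 'buggy myid: [%s]\n' "$(cat myid)"

echo 0 > myid                    # correct form: myid now contains 0
printf 'fixed myid: [%s]\n' "$(cat myid)"
```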
At this point one machine is fully configured.
With the Zookeeper ensemble in place, when Spark restarts it can pull back all of the previous cluster's state; without it, a Spark restart means starting over from scratch.
[email protected]:/usr/local# scp -r ./zookeeper-3.4.6 [email protected]:/usr/local
[email protected]:/usr/local# scp -r ./zookeeper-3.4.6 [email protected]:/usr/local
Then log in to Worker1 and Worker2 and change myid to 1 and 2 respectively.
[email protected]:/usr/local/zookeeper-3.4.6# cd bin
[email protected]:/usr/local/zookeeper-3.4.6/bin# ls
README.txt zkCli.cmd zkEnv.cmd zkServer.cmd
zkCleanup.sh zkCli.sh zkEnv.sh zkServer.sh
[email protected]:/usr/local/zookeeper-3.4.6/bin# zkServer.sh start
Then start it on each of the three machines.
The next step is to make Spark support HA via Zookeeper.
Configure it in spark-env.sh:
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# vi spark-env.sh
// The whole cluster's state is maintained in, and recovered from, Zookeeper; that is where all the state information lives
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
Since we are now configuring a master cluster, also comment out:
#export SPARK_MASTER_IP=Master
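Putting the HA settings together, the spark-env.sh fragment used here can be sketched as follows (the property names are Spark's standard standalone-HA settings; the line continuations are only for readability):

```shell
# spark-env.sh fragment for Zookeeper-based master HA (sketch).
# - spark.deploy.recoveryMode=ZOOKEEPER: persist and recover master state via Zookeeper
# - spark.deploy.zookeeper.url: the three-node ensemble on the default client port 2181
# - spark.deploy.zookeeper.dir: znode under which Spark stores its recovery state
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
 -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 \
 -Dspark.deploy.zookeeper.dir=/spark"

# With HA there is no single fixed master, so do NOT pin one:
# export SPARK_MASTER_IP=Master
```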
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh [email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
spark-env.sh 100% 500 0.5KB/s 00:00
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh [email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
spark-env.sh 100% 500 0.5KB/s 00:00
Then start Spark: ./start-all.sh
Why does the Master machine have a Master process while Worker1 and Worker2 do not?
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/sbin# cd ../conf/
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# cat slaves
Master
Worker1
Worker2
Because start-all.sh brings up only one master, we have to go to Worker1 and Worker2 and start the master ourselves:
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/bin# cd $SPARK_HOME/sbin
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/sbin# ./start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-Worker1.out
[email protected]:/usr/local/zookeeper-3.4.6/bin# cd $SPARK_HOME/sbin
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/sbin# ./start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-Worker2.out
Check http://master:8080/:
http://worker1:8080/
No workers attached; it is in standby state, purely a backup.
http://worker2:8080/
No workers attached; it is in standby state, purely a backup.
Use spark-shell to test HA:
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# cd $SPARK_HOME/bin
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/bin# ./spark-shell --master spark://Master:7077,Worker1:7077,Worker2:7077
The startup log shows it registering with all three masters:
16/02/04 21:12:29 INFO client.AppClient$ClientEndpoint: Connecting to master spark://Master:7077...
16/02/04 21:12:29 INFO client.AppClient$ClientEndpoint: Connecting to master spark://Worker1:7077...
16/02/04 21:12:29 INFO client.AppClient$ClientEndpoint: Connecting to master spark://Worker2:7077...
But in practice it only talks to whichever machine is active.
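The comma-separated --master URL lists every master, and the client tries each address until it finds the active one. A toy sketch of enumerating those candidates (pure bash string handling; the hostnames are this cluster's):

```shell
# Enumerate the candidate masters from a multi-master URL, the way a
# client works through the list until it reaches the active master.
masters="spark://Master:7077,Worker1:7077,Worker2:7077"
list=${masters#spark://}              # drop the scheme prefix
IFS=',' read -r -a hosts <<< "$list"  # split on commas (bash array)
for h in "${hosts[@]}"; do
  echo "candidate master: $h"
done
```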
Now stop the active master:
cd $SPARK_HOME/sbin
./stop-master.sh
Immediately there is activity in spark-shell:
scala> 16/02/04 21:17:11 WARN client.AppClient$ClientEndpoint: Connection to Master:7077 failed; waiting for master to reconnect...
16/02/04 21:17:11 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
16/02/04 21:17:11 WARN client.AppClient$ClientEndpoint: Connection to Master:7077 failed; waiting for master to reconnect...
16/02/04 21:17:44 INFO client.AppClient$ClientEndpoint: Master has changed, new master is at spark://Worker1:7077
It took about half a minute for Worker1 to become active; the switch is not instantaneous, and how long it takes to become active depends on the cluster size.
http://master:8080/ is no longer reachable.
http://worker1:8080/
http://worker2:8080/ still looks the same as before.
Run the Pi example:
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://Master:7077,Worker1:7077,Worker2:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
Then start the master process on the Master machine again.
This time the Master machine comes up in standby mode:
http://master:8080/
Production machines typically have 100-200 GB of memory.
Teacher Wang Jialin's card:
China's foremost Spark expert
Sina Weibo: http://weibo.com/ilovepains
WeChat public account: DT_Spark
Blog: http://blog.sina.com.cn/ilovepains
Phone: 18610086859
QQ: 1740415547
Email: [email protected]
This article comes from the "一枝花傲寒" blog; please do not repost.
How a Spark Cluster Works under HA (DT大資料夢工廠)