How a Spark Cluster Works under HA (DT Big Data Dream Factory)


Spark High Availability (HA) in Practice

How a Spark Cluster Works, in Detail

"Resources" here mainly means memory and CPU.


With a single-point Master, the cluster can no longer serve anything once the Master fails.

Spark achieves HA through ZooKeeper. An HA setup distinguishes one active node from one or more standbys:

the active Master is the one currently doing the work;

a standby stands ready at all times to take over as active once the active Master dies.

It used to be common to run two machines, one active and one standby;

nowadays three machines are typical, one active and two standbys, and sometimes even more than three.

What does ZooKeeper hold? The state of all Workers, Drivers, and Applications. When the active Master dies, ZooKeeper elects one of the standbys as leader; the leader then restores the Worker, Driver, and Application state, and only after that recovery completes does it become active. Only once it is active can it serve the cluster again and accept normal job submissions.

Will a Master switchover disturb a running application? Under coarse-grained resource allocation it will not: the application has already requested its resources from the Master before it runs, and everything afterwards is interaction between the Workers and Executors, with no involvement from the Master, because the resources were fully allocated before the switch. Under fine-grained allocation it will.

The drawback of fine-grained mode is that tasks start very slowly. In practice, real big data workloads almost always run coarse-grained, as in the sketch below.
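To make "resources granted up front" concrete, here is a hedged sketch of a coarse-grained submission; spark.cores.max and spark.executor.memory are standard Spark properties, and the jar path matches the one used later in this walkthrough (run from $SPARK_HOME/bin once the cluster below is up):

# spark.cores.max caps the total cores the Master grants the application up front;
# spark.executor.memory fixes each executor's memory at the same time.
./spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://Master:7077 \
  --conf spark.cores.max=4 \
  --conf spark.executor.memory=512m \
  ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100

A failover while this job is running only replaces the Master; the cores and memory already granted to the Executors stay in place.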

Time to get hands-on:

==========Download ZooKeeper============

http://zookeeper.apache.org/

After downloading, extract it under /usr/local/ and set up the environment variables:

export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6

export PATH=$PATH:${ZOOKEEPER_HOME}/bin
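For these exports to survive new shells, a minimal sketch, assuming you keep environment settings in ~/.bashrc (adjust to your shell profile of choice):

# append the ZooKeeper variables to the shell profile and reload it
echo 'export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6' >> ~/.bashrc
echo 'export PATH=$PATH:${ZOOKEEPER_HOME}/bin' >> ~/.bashrc
source ~/.bashrc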

ZooKeeper is installed separately from Spark.

Since we want a distributed ensemble, ZooKeeper has to go onto several machines;

we will put it on Worker1 and Worker2 as well.

Go into the ZooKeeper directory and create the data and logs directories:

[email protected]:/usr/local/zookeeper-3.4.6# mkdir data

[email protected]:/usr/local/zookeeper-3.4.6# mkdir logs

[email protected]:/usr/local/zookeeper-3.4.6/conf# vi zoo_sample.cfg

dataDir must be changed; the sample config points it at /tmp, so the data would be wiped after a reboot.

[email protected]:/usr/local/zookeeper-3.4.6/conf# cp zoo_sample.cfg zoo.cfg

[email protected]:/usr/local/zookeeper-3.4.6/conf# vi zoo.cfg

Make the following changes (we are building a three-machine ensemble):

dataDir=/usr/local/zookeeper-3.4.6/data

dataLogDir=/usr/local/zookeeper-3.4.6/logs

server.0=Master:2888:3888

server.1=Worker1:2888:3888

server.2=Worker2:2888:3888
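Put together, the resulting zoo.cfg would look roughly like this; tickTime, initLimit, syncLimit, and clientPort below are simply the defaults shipped in zoo_sample.cfg:

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
# 2888 is the follower-to-leader port, 3888 the leader-election port
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888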

[email protected]:/usr/local/zookeeper-3.4.6/conf# cd ../data/

Give each machine its ID:

[email protected]:/usr/local/zookeeper-3.4.6/data# echo 0 > myid

[email protected]:/usr/local/zookeeper-3.4.6/data# cat myid

0

(Mind the space before >. In the original session `echo 0>myid` was typed; bash reads 0> as a redirection of file descriptor 0, so nothing was written, myid came out empty, and it had to be fixed in vi afterwards.)

At this point one machine is fully configured.

With the ZooKeeper ensemble configured, Spark can pull back all of the previous cluster's state after a restart; without it, a Spark restart means starting over from scratch.

[email protected]:/usr/local# scp -r ./zookeeper-3.4.6 [email protected]:/usr/local

[email protected]:/usr/local# scp -r ./zookeeper-3.4.6 [email protected]:/usr/local

Then go onto Worker1 and Worker2 and change myid to 1 and 2 respectively, for example as sketched below.
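A quick sketch of that step, assuming root ssh access to both workers (otherwise just edit the file by hand on each machine):

ssh root@Worker1 'echo 1 > /usr/local/zookeeper-3.4.6/data/myid'
ssh root@Worker2 'echo 2 > /usr/local/zookeeper-3.4.6/data/myid'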

[email protected]:/usr/local/zookeeper-3.4.6# cd bin

[email protected]:/usr/local/zookeeper-3.4.6/bin# ls

README.txt    zkCli.cmd  zkEnv.cmd  zkServer.cmd

zkCleanup.sh  zkCli.sh   zkEnv.sh   zkServer.sh


[email protected]:/usr/local/zookeeper-3.4.6/bin# zkServer.sh start

Then start ZooKeeper on each of the three machines the same way.
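Before touching Spark, it is worth confirming that the ensemble actually formed a quorum. Running zkServer.sh status on each node should report Mode: leader on exactly one machine and Mode: follower on the other two:

# run on Master, Worker1 and Worker2 in turn
zkServer.sh status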

The next step is to make Spark support HA under ZooKeeper.

Configure it in spark-env.sh:

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# vi spark-env.sh

The maintenance and recovery of the entire cluster's state go through ZooKeeper; that is where all the state information is kept:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
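For readability, the same line split with continuations and a note on each property (these are the documented spark.deploy.* options for standalone ZooKeeper recovery; spark.deploy.zookeeper.dir would default to /spark even if omitted):

# recoveryMode=ZOOKEEPER : persist Master state in ZooKeeper and elect leaders through it
# zookeeper.url          : the clientPort (2181) of every ensemble member from zoo.cfg
# zookeeper.dir          : the znode under which the recovery state is stored
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 \
-Dspark.deploy.zookeeper.dir=/spark"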


Since the Masters are now coordinated through ZooKeeper, the hard-coded Master address must also be commented out:

#export SPARK_MASTER_IP=Master

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh [email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

spark-env.sh                                  100%  500     0.5KB/s   00:00   

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh [email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

spark-env.sh                                  100%  500     0.5KB/s   00:00

Then start Spark from $SPARK_HOME/sbin: ./start-all.sh
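To see which daemons actually came up, jps (shipped with the JDK) is the quickest check; given the conf/slaves shown just below and the ZooKeeper ensemble, you would expect roughly:

# on Master:          Master, Worker, QuorumPeerMain (ZooKeeper)
# on Worker1/Worker2: Worker, QuorumPeerMain
jps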


Why does the Master machine have a Master process while Worker1 and Worker2 do not?

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/sbin# cd ../conf/

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# cat slaves

Master

Worker1

Worker2

start-all.sh launches a Master only on the machine it is run from (plus the Workers listed in conf/slaves), so we still have just one Master and must start one by hand on Worker1 and Worker2:

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/bin# cd $SPARK_HOME/sbin

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/sbin# ./start-master.sh

starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-Worker1.out

[email protected]:/usr/local/zookeeper-3.4.6/bin# cd $SPARK_HOME/sbin

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/sbin# ./start-master.sh

starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-1.6.0-bin-hadoop2.6/logs/spark-root-org.apache.spark.deploy.master.Master-1-Worker2.out

Check how things look at http://master:8080/:


http://worker1:8080/


No Workers are registered here; this Master is in STANDBY state, a pure backup.

http://worker2:8080/


Again no Workers; this one is also in STANDBY state, a pure backup.

Now test HA with spark-shell:

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# cd $SPARK_HOME/bin

[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/bin# ./spark-shell --master spark://Master:7077,Worker1:7077,Worker2:7077

The startup log shows it registering with all three Masters:

16/02/04 21:12:29 INFO client.AppClient$ClientEndpoint: Connecting to master spark://Master:7077...

16/02/04 21:12:29 INFO client.AppClient$ClientEndpoint: Connecting to master spark://Worker1:7077...

16/02/04 21:12:29 INFO client.AppClient$ClientEndpoint: Connecting to master spark://Worker2:7077...

But it actually interacts only with the machine that is currently active.

Now cd $SPARK_HOME/sbin on the Master machine and stop the active Master:

./stop-master.sh

Soon there is activity in the spark-shell session:

scala> 16/02/04 21:17:11 WARN client.AppClient$ClientEndpoint: Connection to Master:7077 failed; waiting for master to reconnect...

16/02/04 21:17:11 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...

16/02/04 21:17:11 WARN client.AppClient$ClientEndpoint: Connection to Master:7077 failed; waiting for master to reconnect...

16/02/04 21:17:44 INFO client.AppClient$ClientEndpoint: Master has changed, new master is at spark://Worker1:7077

It took about half a minute for Worker1 to become active; the switch is not instantaneous, and how long it takes depends on the scale of the cluster.

http://master:8080/ is no longer reachable.

http://worker1:8080/


http://worker2:8080/ still looks the same as before.

Now run the Pi example:

./spark-submit  --class org.apache.spark.examples.SparkPi --master spark://Master:7077,Worker1:7077,Worker2:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
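If the failover has fully settled, the job runs to completion as usual and the driver prints a line of the form "Pi is roughly 3.14..." (the digits vary from run to run).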

Then start the Master daemon on the original Master machine again.
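That is the same script as before, this time run back on Master:

cd $SPARK_HOME/sbin
./start-master.sh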

This time the Master machine comes up in standby mode:

http://master:8080/


Production machines generally have 100 to 200 GB of memory.

Teacher Wang Jialin's card:

China's No. 1 Spark expert

Sina Weibo: http://weibo.com/ilovepains

WeChat public account: DT_Spark

Blog: http://blog.sina.com.cn/ilovepains

Mobile: 18610086859

QQ: 1740415547

Email: [email protected]


This article is from the "一枝花傲寒" blog. Please do not repost!
