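“Winning the Era of Cloud Computing and Big Data”
Spark Asia-Pacific Research Institute 100-Session Public Lecture Series [Session 15 Interactive Q&A]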
Q1: What is the relationship between AppClient, the Worker, and the Master?
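In Standalone mode, AppClient represents the application on the client machine when SparkContext.runJob is invoked; it carries out functions such as registerApplication, that is, registering the application with the Master.
Once the application is registered, the Master sends a message to the client through Akka to start the Driver.
The Driver manages the Tasks and controls the Executors on the Workers so that they work together.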
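To make this division of labor concrete, here is a toy, in-process Python model of the flow described above: an AppClient registers the application with the Master, the Master places executors on Workers, and the Driver hands tasks to those executors. It is only a sketch under stated assumptions: real Spark exchanges these messages over Akka/RPC rather than direct method calls, and the app-ID format, the one-executor-per-Worker placement, and the round-robin task assignment are hypothetical simplifications (Spark's real task scheduler also weighs data locality and available cores).

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Worker:
    """Toy stand-in for a Standalone Worker that hosts executors."""
    name: str
    executors: list = field(default_factory=list)

    def launch_executor(self, app_id: str) -> str:
        executor_id = f"{self.name}-exec-{len(self.executors)}"
        self.executors.append((app_id, executor_id))
        return executor_id


class Master:
    """Toy Master: registers applications and places executors on Workers."""

    def __init__(self, workers):
        self.workers = workers
        self.apps = {}

    def register_application(self, app_name: str) -> str:
        app_id = f"app-{len(self.apps):04d}"  # hypothetical ID format
        # Assumption: exactly one executor per Worker in this simplified model.
        self.apps[app_id] = [w.launch_executor(app_id) for w in self.workers]
        return app_id  # stands in for the "registered" reply to the client


class AppClient:
    """Toy AppClient: represents the application to the Master."""

    def __init__(self, master: Master, app_name: str):
        self.master = master
        self.app_name = app_name
        self.app_id = None

    def register(self) -> str:
        self.app_id = self.master.register_application(self.app_name)
        return self.app_id


class Driver:
    """Toy Driver: hands tasks to the application's executors round-robin."""

    def __init__(self, executors):
        self.executors = executors

    def run_tasks(self, tasks):
        assignment = {}
        # zip stops when the finite task list is exhausted.
        for task, executor in zip(tasks, cycle(self.executors)):
            assignment.setdefault(executor, []).append(task)
        return assignment


workers = [Worker("worker-1"), Worker("worker-2")]
master = Master(workers)
client = AppClient(master, "demo")
app_id = client.register()
driver = Driver(master.apps[app_id])
plan = driver.run_tasks([f"task-{i}" for i in range(5)])
for executor, tasks in plan.items():
    print(executor, tasks)
```

Running it prints one line per executor; with two Workers, worker-1's executor receives task-0, task-2, and task-4, and worker-2's receives task-1 and task-3.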
Q2: Is there a big difference between Spark's shuffle and Hadoop's shuffle?
Q3: How does Spark handle HA (high availability)?
In Standalone mode, Worker nodes are highly available automatically; HA for the Master is usually provided through ZooKeeper, as the Spark documentation describes:
Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected “leader” and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master’s state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications – applications that were already running during Master failover are unaffected.
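In YARN and Mesos modes, the cluster manager's own master (the YARN ResourceManager, or the Mesos master) is likewise usually made highly available with ZooKeeper.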
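The ZooKeeper-based failover in the quoted passage can be sketched as follows. This is an assumption-laden toy: FakeZooKeeper and MasterNode are hypothetical in-memory stand-ins, not Spark or ZooKeeper APIs. They model only the classic ZooKeeper leader-election recipe (each candidate creates an ephemeral sequential node and the lowest surviving number leads) plus a small shared area for persisted state; Spark's actual recovery code and the 1 to 2 minute failover delay are not modeled.

```python
import itertools


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper (no real client is used).

    Models two features the HA scheme relies on: ephemeral sequential nodes
    for leader election and a small key/value area for persisted state.
    """

    def __init__(self):
        self._seq = itertools.count()
        self.election = {}  # sequence number -> owner name (ephemeral nodes)
        self.state = {}     # persisted cluster state shared by all Masters

    def join_election(self, owner: str) -> int:
        seq = next(self._seq)
        self.election[seq] = owner
        return seq

    def session_expired(self, owner: str) -> None:
        # Ephemeral nodes disappear when their owner's session ends.
        self.election = {s: o for s, o in self.election.items() if o != owner}

    def leader(self):
        # The lowest surviving sequence number wins.
        return self.election[min(self.election)] if self.election else None


class MasterNode:
    """Hypothetical Master that is either the leader or in standby mode."""

    def __init__(self, name: str, zk: FakeZooKeeper):
        self.name = name
        self.zk = zk
        self.known_apps = []
        zk.join_election(name)

    def is_leader(self) -> bool:
        return self.zk.leader() == self.name

    def register_app(self, app_id: str) -> None:
        if not self.is_leader():
            raise RuntimeError(f"{self.name} is in standby mode")
        self.known_apps.append(app_id)
        self.zk.state["apps"] = list(self.known_apps)  # persist for failover

    def recover(self) -> None:
        # A newly elected leader reloads the state the old leader persisted.
        self.known_apps = list(self.zk.state.get("apps", []))


zk = FakeZooKeeper()
masters = [MasterNode(f"master-{i}", zk) for i in range(3)]
masters[0].register_app("app-0000")  # master-0 joined first, so it leads
zk.session_expired("master-0")       # the leader dies
new_leader = next(m for m in masters if m.is_leader())
new_leader.recover()
new_leader.register_app("app-0001")  # scheduling resumes on the new leader
print(new_leader.name, new_leader.known_apps)
```

This prints `master-1 ['app-0000', 'app-0001']`. In the model a standby Master refuses new registrations until it is elected, which loosely mirrors the quoted point that failover delays only the scheduling of new applications.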