Tags: yarn ha zookeeper resourcemanager failover
The ResourceManager (RM) is responsible for tracking the resources in a cluster, and scheduling applications (e.g., MapReduce jobs).
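Prior to Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.
At any point in time, only one ResourceManager is Active, and the remaining one or more are in Standby state. The state transition can be triggered either manually through the CLI or by the integrated failover-controller. Automatic failover requires ZooKeeper. The rest of this post walks through the configuration of the YARN HA automatic failover mode in detail.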
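1. First, edit the yarn-site.xml file. The newly added content was highlighted in blue in the original post; judging by the dated comments, it is the block between the "add start 20161012" and "add end 20161012" comments.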
    $ vi yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>

      <!-- Site specific YARN configuration properties -->

      <!--add start 20160627 -->
      <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop01:8032</value>
      </property>
      <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop01:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop01:8031</value>
      </property>
      <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop01:8033</value>
      </property>
      <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop01:8088</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <!-- add end 20160627 -->

      <!-- add start 20161012 -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>rmCluster</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop01:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop02:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
      </property>
      <!-- add end 20161012 -->

    </configuration>
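Before restarting anything, it can help to sanity-check the HA properties. The following is only a minimal sketch, not an official Hadoop tool: the function names load_properties and check_rm_ha and the trimmed SAMPLE string are made up for illustration, and the checks simply mirror the properties used in this post (other valid HA layouts, for example per-service addresses instead of hostname.<rm-id>, would need different checks).

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_rm_ha(props):
    """Return a list of problems; an empty list means these checks passed.

    Assumption: the checks follow the layout used in this post only.
    """
    problems = []
    if props.get("yarn.resourcemanager.ha.enabled", "false").lower() != "true":
        problems.append("yarn.resourcemanager.ha.enabled is not true")
    raw_ids = props.get("yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
    if len(rm_ids) < 2:
        problems.append("yarn.resourcemanager.ha.rm-ids should list at least two ids")
    for rm_id in rm_ids:
        key = "yarn.resourcemanager.hostname." + rm_id
        if key not in props:
            problems.append("missing " + key)
    if not props.get("yarn.resourcemanager.cluster-id"):
        problems.append("yarn.resourcemanager.cluster-id is not set")
    if not props.get("yarn.resourcemanager.zk-address"):
        problems.append("yarn.resourcemanager.zk-address is not set")
    return problems


# Hypothetical trimmed copy of the HA block from step 1.
SAMPLE = """<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>rmCluster</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>hadoop01</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>hadoop02</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value></property>
</configuration>"""

if __name__ == "__main__":
    for problem in check_rm_ha(load_properties(SAMPLE)):
        print(problem)
```

To check a real file, read its text and pass it to load_properties in place of SAMPLE.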
2. On the hadoop01 server, start the Hadoop cluster ("..." stands for an abbreviated path). The output shows that start-all.sh started only one ResourceManager.
    $ start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    16/07/04 12:17:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [hadoop01 hadoop02]
    hadoop02: starting namenode, logging to /.../hadoop-hadoop-namenode-hadoop02.out
    hadoop01: starting namenode, logging to /.../hadoop-hadoop-namenode-hadoop01.out
    hadoop02: starting datanode, logging to /.../hadoop-hadoop-datanode-hadoop02.out
    hadoop01: starting datanode, logging to /.../hadoop-hadoop-datanode-hadoop01.out
    hadoop03: starting datanode, logging to /.../hadoop-hadoop-datanode-hadoop03.out
    Starting journal nodes [hadoop01 hadoop02 hadoop03]
    hadoop02: starting journalnode, logging to /.../hadoop-hadoop-journalnode-hadoop02.out
    hadoop01: starting journalnode, logging to /.../hadoop-hadoop-journalnode-hadoop01.out
    hadoop03: starting journalnode, logging to /.../hadoop-hadoop-journalnode-hadoop03.out
    Starting ZK Failover Controllers on NN hosts [hadoop01 hadoop02]
    hadoop02: starting zkfc, logging to /.../hadoop-hadoop-zkfc-hadoop02.out
    hadoop01: starting zkfc, logging to /.../hadoop-hadoop-zkfc-hadoop01.out
    starting yarn daemons
    starting resourcemanager, logging to /.../yarn-hadoop-resourcemanager-hadoop01.out
    hadoop01: starting nodemanager, logging to /.../yarn-hadoop-nodemanager-hadoop01.out
    hadoop02: starting nodemanager, logging to /.../yarn-hadoop-nodemanager-hadoop02.out
    hadoop03: starting nodemanager, logging to /.../yarn-hadoop-nodemanager-hadoop03.out
3. Check the processes started for the Hadoop cluster. The hadoop01 machine runs the following processes.
    $ jps
    5239 NodeManager
    4839 JournalNode
    5288 Jps
    4632 DataNode
    5032 DFSZKFailoverController
    4521 NameNode
    5116 ResourceManager
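Eyeballing the jps listing works for one node, but the same check can be scripted. This is a rough sketch under stated assumptions: parse_jps, missing_daemons and EXPECTED_ON_HADOOP01 are hypothetical names, and the expected set is simply the daemons that appear in the hadoop01 listing from this step.

```python
# Assumption: the daemons expected on hadoop01 are the ones listed in step 3.
EXPECTED_ON_HADOOP01 = {
    "NameNode",
    "DataNode",
    "JournalNode",
    "DFSZKFailoverController",
    "ResourceManager",
    "NodeManager",
}

# The jps output from step 3, copied as text.
SAMPLE_JPS = """5239 NodeManager
4839 JournalNode
5288 Jps
4632 DataNode
5032 DFSZKFailoverController
4521 NameNode
5116 ResourceManager"""


def parse_jps(output):
    """Map each process name printed by jps to the list of its pids."""
    procs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank lines and entries without a name
        pid, name = parts
        procs.setdefault(name.strip(), []).append(int(pid))
    return procs


def missing_daemons(output, required):
    """Return the required daemon names that do not appear in the jps output."""
    running = set(parse_jps(output)) - {"Jps"}
    return sorted(set(required) - running)


if __name__ == "__main__":
    print(missing_daemons(SAMPLE_JPS, EXPECTED_ON_HADOOP01))
```

On a live node the text would come from running jps (for example through subprocess) rather than from SAMPLE_JPS.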
4. Start the ResourceManager on the hadoop02 machine.
    $ yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /home/hadoop/hadoop-2.7.2//logs/yarn-hadoop-resourcemanager-hadoop02.out
5. Check the state of each of the two ResourceManagers.
    $ yarn rmadmin -getServiceState rm1
    active
    $ yarn rmadmin -getServiceState rm2
    standby
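The two state checks above can be wrapped in a small script that also flags the case where no RM, or more than one, reports active. This is a sketch only: get_service_state and check_rm_states are hypothetical helper names, the only real command used is the yarn rmadmin -getServiceState call shown in this step, and taking the last non-empty line of stdout as the state is an assumption about how the output is laid out.

```python
import subprocess


def get_service_state(rm_id, run=subprocess.run):
    """Return the state printed by `yarn rmadmin -getServiceState <rm_id>`.

    Assumption: the state is the last non-empty line on stdout.
    The `run` parameter exists only so the call can be faked in tests.
    """
    result = run(
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def check_rm_states(rm_ids, get_state=get_service_state):
    """Return (states, active_ids) for the given ResourceManager ids."""
    states = {rm_id: get_state(rm_id) for rm_id in rm_ids}
    active_ids = [rm_id for rm_id, state in states.items() if state == "active"]
    return states, active_ids


if __name__ == "__main__":
    # rm1 and rm2 are the ids configured in yarn.resourcemanager.ha.rm-ids.
    states, active_ids = check_rm_states(["rm1", "rm2"])
    print(states)
    if len(active_ids) != 1:
        print("expected exactly one active ResourceManager, got", active_ids)
```

Running it requires the yarn command on the PATH of a cluster node; the injectable get_state argument keeps the logic testable without a cluster.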
6. Check the ResourceManager state through the web UI.
The ResourceManager on hadoop01 is shown in the active state.
[Screenshot: Snap1.jpg, http://s1.51cto.com/wyfs02/M00/89/34/wKiom1gLgiCzepWGAAJqge9s9kw421.jpg-wh_500x0-wm_3-wmp_4-s_319531072.jpg]
The ResourceManager on hadoop02 is shown in the standby state.
[Screenshot: Snap2.jpg, http://s1.51cto.com/wyfs02/M01/89/34/wKiom1gLgpbzyJk4AAJVfK8CxD4294.jpg-wh_500x0-wm_3-wmp_4-s_2000338235.jpg]
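7. On the hadoop02 server, manually simulate a failover test. Because automatic failover is enabled, the command refuses to change the HA state manually unless the --forcemanual flag is specified.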
    $ yarn rmadmin -transitionToStandby rm1
    Automatic failover is enabled for [email protected]
    Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state.
    If you are very sure you know what you are doing, please specify the --forcemanual flag.
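The yarn-site.xml from step 1 never sets yarn.resourcemanager.ha.automatic-failover.enabled, yet the command above reports that automatic failover is enabled. According to the Hadoop ResourceManager HA documentation, that property is enabled by default only when HA is enabled. The sketch below models that rule and the refusal seen above; the function names are hypothetical and the logic is a simplified illustration, not Hadoop's actual implementation.

```python
def automatic_failover_enabled(props):
    """Effective automatic-failover setting for a dict of yarn-site.xml properties.

    Simplified model: when the property is absent, it follows
    yarn.resourcemanager.ha.enabled (the documented default).
    """
    ha_enabled = props.get("yarn.resourcemanager.ha.enabled", "false").strip().lower() == "true"
    explicit = props.get("yarn.resourcemanager.ha.automatic-failover.enabled")
    if explicit is None:
        return ha_enabled
    return ha_enabled and explicit.strip().lower() == "true"


def manual_transition_refused(props, forcemanual=False):
    """Mirror the behavior in step 7: refuse unless --forcemanual is given."""
    return automatic_failover_enabled(props) and not forcemanual


if __name__ == "__main__":
    # Only the property relevant here, with the value from step 1.
    step1_props = {"yarn.resourcemanager.ha.enabled": "true"}
    print(manual_transition_refused(step1_props))
    print(manual_transition_refused(step1_props, forcemanual=True))
```

With the step 1 settings the first call reports a refusal and the second does not, which matches the message printed by yarn rmadmin.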
8. On the hadoop02 server, check the ResourceManager state again.
    $ yarn rmadmin -getServiceState rm2
    active
9. Check the ResourceManager state through the web page.
The ResourceManager on the hadoop01 server can no longer be reached.
[Screenshot: Snap4.jpg, http://s4.51cto.com/wyfs02/M00/89/31/wKioL1gLg_3z95qyAADv_hGLQrs783.jpg-wh_500x0-wm_3-wmp_4-s_1231144003.jpg]
The ResourceManager on the hadoop02 server is shown in the active state.
[Screenshot: Snap3.jpg, http://s2.51cto.com/wyfs02/M02/89/31/wKioL1gLhF_ADB96AAJtrjClYIU337.jpg-wh_500x0-wm_3-wmp_4-s_2954610084.jpg]
10. In addition, when we access a ResourceManager in the standby state, the system automatically redirects the page to the ResourceManager in the active state.
Assuming a standby RM is up and running, the Standby automatically redirects all web requests to the Active, except for the “About” page.
This article is from the “沈進群” blog; reposting is declined!
Big Data: From Getting Started to XX (Part 9)