The ResourceManager in YARN is responsible for resource management and scheduling for the entire system, and internally maintains the ApplicationMaster information, NodeManager information, and resource usage information of each application. Since version 2.4, Hadoop has also provided an HA function to solve the reliability and fault-tolerance problems of such a fundamental service.
Overview of ResourceManager High Availability
RM HA and NN HA have many similarities (see the NameNode HA configuration for details):
(1) Active/standby architecture. Only one RM is active at any time.
(2) Both rely on ZooKeeper. A manual switch is performed with the yarn rmadmin command (similar to the hdfs haadmin command), while automatic failover is handled by ZKFailoverController. The difference from HDFS is that the ZKFC runs only as a thread inside the RM rather than as an independent daemon. A sketch of the manual switch commands appears after this list.
(3) When there are multiple RMs, the yarn-site.xml used by clients needs to list all of them. Clients, ApplicationMasters (AMs), and NodeManagers (NMs) look for the active RM in round-robin fashion, which means AMs and NMs must provide their own fault-tolerance mechanism: if the currently active RM fails, they probe the configured RMs in round-robin order until they find the new active one. To implement this logic, yarn.client.failover-proxy-provider must be set to the class org.apache.hadoop.yarn.client.RMFailoverProxyProvider.
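As a sketch of the manual switch mentioned in item (2), the commands below query and flip the HA state, assuming the RM ids rm1 and rm2 used throughout this article; note that when automatic failover is enabled, manual transitions are refused unless --forcemanual is added.

# Query the current HA state of each RM
$ yarn rmadmin -getServiceState rm1
$ yarn rmadmin -getServiceState rm2
# Manually demote rm1 and promote rm2 (--forcemanual is required
# when automatic failover is enabled)
$ yarn rmadmin -transitionToStandby --forcemanual rm1
$ yarn rmadmin -transitionToActive --forcemanual rm2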
In addition, a new active RM can restore the state of the former RM (see ResourceManager Restart for details). With RM restart enabled, the restarting RM loads the state information left by the previous active RM and continues its operations, while applications periodically checkpoint so that work is not lost. The state shared between the active and standby RM must live in a store that both can access, either a filesystem-based one (FileSystemRMStateStore) or a ZooKeeper-based one (ZKRMStateStore); the latter allows only one RM to hold write permission at any given time.
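For a quick look at what ZKRMStateStore actually persists, the RM state znodes can be browsed with the ZooKeeper CLI. This is only an illustrative sketch; it assumes the default parent path /rmstore (yarn.resourcemanager.zk-state-store.parent-path) has not been changed, and the exact child znodes vary by Hadoop version.

# Connect to one of the ZooKeeper servers and list the RM state znodes
$ zkCli.sh -server debugo01:2181
[zk: debugo01:2181(CONNECTED) 0] ls /rmstore
[ZKRMStateRoot]
[zk: debugo01:2181(CONNECTED) 1] ls /rmstore/ZKRMStateRoot
[RMAppRoot, RMDTSecretManagerRoot, ...]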
A common YARN RM HA configuration in yarn-site.xml is as follows:
 
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>debugo01</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>debugo02</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>debugo01:2181,debugo02:2181,debugo03:2181</value>
  <description>For multiple ZooKeeper servers, separate them with commas</description>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-ha</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>Enable automatic failover; by default, it is enabled only when HA is enabled</description>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
  <description>Optional setting; the default value is /yarn-leader-election</description>
</property>
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.RMFailoverProxyProvider</value>
</property>
 
At the same time, the RM service listening addresses must be set for each RM id:
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>debugo01:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>debugo02:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>debugo01:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>debugo02:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>debugo01:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>debugo02:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>debugo01:8188</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>debugo02:8188</value>
</property>
Start RM
 
$ start-yarn.sh
 
Start the RM separately on the standby node (the start-yarn.sh script can also be used), as sketched below.
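A minimal sketch of that step, assuming the stock Hadoop 2.x sbin scripts are in the PATH on the standby node:

# On the standby node (debugo02): start only the ResourceManager daemon
$ yarn-daemon.sh start resourcemanager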
 
Check status:
 
$ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
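With automatic failover, the active RM is the one holding the leader-election lock in ZooKeeper. As an illustrative cross-check, the election znodes can be listed; the path below is derived from the zk-base-path and cluster-id configured above.

# List the leader-election znodes for the cluster-id yarn-ha
$ zkCli.sh -server debugo01:2181
[zk: debugo01:2181(CONNECTED) 0] ls /yarn-leader-election/yarn-ha
[ActiveBreadCrumb, ActiveStandbyElectorLock]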
 
A NodeManager or browser that accesses the rm2 node's web UI will get the prompt:
This is standby RM. Redirecting to the current active RM: http://debugo01:8188/cluster/apps
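The same redirect notice can be reproduced from the command line; a quick sketch (the exact response body may differ between Hadoop versions):

# Request the standby RM's web UI directly
$ curl http://debugo02:8188/cluster/apps
This is standby RM. Redirecting to the current active RM: http://debugo01:8188/cluster/apps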
Now kill the ResourceManager process on rm1 (debugo01), as sketched below.
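One way to simulate the crash, as a sketch (the pid shown is purely illustrative):

# On debugo01: find the ResourceManager JVM and kill it abruptly
$ jps | grep ResourceManager
21372 ResourceManager
$ kill -9 21372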
 
$ yarn rmadmin -getServiceState rm2
active
$ yarn rmadmin -getServiceState rm1
14/09/14 03:08:23 INFO ipc.Client: Retrying connect to server: debugo01/192.168.46.201:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From debugo01/192.168.46.201 to debugo01:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
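If the killed ResourceManager on rm1 is started again, it rejoins the cluster as standby, since rm2 already holds the active lock; a sketch:

# Back on debugo01: restart the daemon and re-check its state
$ yarn-daemon.sh start resourcemanager
$ yarn rmadmin -getServiceState rm1
standby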
Reference 
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-ha-in-cdh5/
 
Original article: Detailed Description of ResourceManager HA Configuration. Thanks to the original author for sharing.