The ResourceManager in YARN is responsible for resource management and scheduling for the entire system, and internally maintains the ApplicationMaster information, NodeManager information, and resource usage information of each application. Since version 2.4, Hadoop has also provided an HA function to solve the reliability and fault-tolerance problems of such a fundamental service.
Overview of ResourceManager High Availability
RM HA and NN HA have many similarities (see the NameNode HA configuration for details):
(1) Active/standby architecture. Only one RM is active at any time.
(2) Both rely on ZooKeeper. A manual switch is performed with the yarn rmadmin command (similar to the hdfs haadmin command), while automatic failover is handled by ZKFailoverController. The difference from HDFS is that the ZKFC runs only as a thread inside the RM rather than as an independent daemon. A sketch of the manual switch commands appears after this list.
(3) When there are multiple RMs, the yarn-site.xml used by clients needs to list all of them. Clients, ApplicationMasters (AMs), and NodeManagers (NMs) look for the active RM in round-robin fashion, which means AMs and NMs must provide their own fault-tolerance mechanism: if the currently active RM fails, they probe the configured RMs in round-robin order until they find the new active one. To implement this logic, yarn.client.failover-proxy-provider must be set to the class org.apache.hadoop.yarn.client.RMFailoverProxyProvider.
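As a sketch of the manual switch mentioned in item (2), the commands below query and flip the HA state, assuming the RM ids rm1 and rm2 used throughout this article; note that when automatic failover is enabled, manual transitions are refused unless --forcemanual is added.

# Query the current HA state of each RM
$ yarn rmadmin -getServiceState rm1
$ yarn rmadmin -getServiceState rm2
# Manually demote rm1 and promote rm2 (--forcemanual is required
# when automatic failover is enabled)
$ yarn rmadmin -transitionToStandby --forcemanual rm1
$ yarn rmadmin -transitionToActive --forcemanual rm2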
In addition, a new active RM can restore the state of the former RM (see ResourceManager Restart for details). With RM restart enabled, the restarting RM loads the state information left by the previous active RM and continues its operations, while applications periodically checkpoint so that work is not lost. The state shared between the active and standby RM must live in a store that both can access, either a filesystem-based one (FileSystemRMStateStore) or a ZooKeeper-based one (ZKRMStateStore); the latter allows only one RM to hold write permission at any given time.
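For a quick look at what ZKRMStateStore actually persists, the RM state znodes can be browsed with the ZooKeeper CLI. This is only an illustrative sketch; it assumes the default parent path /rmstore (yarn.resourcemanager.zk-state-store.parent-path) has not been changed, and the exact child znodes vary by Hadoop version.

# Connect to one of the ZooKeeper servers and list the RM state znodes
$ zkCli.sh -server debugo01:2181
[zk: debugo01:2181(CONNECTED) 0] ls /rmstore
[ZKRMStateRoot]
[zk: debugo01:2181(CONNECTED) 1] ls /rmstore/ZKRMStateRoot
[RMAppRoot, RMDTSecretManagerRoot, ...]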
A common YARN RM HA configuration in yarn-site.xml is as follows:
 
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>debugo01</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>debugo02</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>debugo01:2181,debugo02:2181,debugo03:2181</value>
  <description>For multiple ZooKeeper servers, separate them with commas</description>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-ha</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>Enable automatic failover; by default, it is enabled only when HA is enabled</description>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
  <description>Optional setting; the default value is /yarn-leader-election</description>
</property>
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.RMFailoverProxyProvider</value>
</property>
 
At the same time, the RM service listening addresses must be set for each RM id:
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>debugo01:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>debugo02:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>debugo01:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>debugo02:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>debugo01:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>debugo02:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>debugo01:8188</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>debugo02:8188</value>
</property>
Start RM
 
$ start-yarn.sh
 
Start the RM separately on the standby node (the start-yarn.sh script can also be used), as sketched below.
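A minimal sketch of that step, assuming the stock Hadoop 2.x sbin scripts are in the PATH on the standby node:

# On the standby node (debugo02): start only the ResourceManager daemon
$ yarn-daemon.sh start resourcemanager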
 
Check status:
 
$ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
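With automatic failover, the active RM is the one holding the leader-election lock in ZooKeeper. As an illustrative cross-check, the election znodes can be listed; the path below is derived from the zk-base-path and cluster-id configured above.

# List the leader-election znodes for the cluster-id yarn-ha
$ zkCli.sh -server debugo01:2181
[zk: debugo01:2181(CONNECTED) 0] ls /yarn-leader-election/yarn-ha
[ActiveBreadCrumb, ActiveStandbyElectorLock]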
 
A NodeManager or browser that accesses the rm2 node's web UI will get the prompt:
This is standby RM. Redirecting to the current active RM: http://debugo01:8188/cluster/apps
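The same redirect notice can be reproduced from the command line; a quick sketch (the exact response body may differ between Hadoop versions):

# Request the standby RM's web UI directly
$ curl http://debugo02:8188/cluster/apps
This is standby RM. Redirecting to the current active RM: http://debugo01:8188/cluster/apps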
Now kill the ResourceManager process on rm1 (debugo01), as sketched below.
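One way to simulate the crash, as a sketch (the pid shown is purely illustrative):

# On debugo01: find the ResourceManager JVM and kill it abruptly
$ jps | grep ResourceManager
21372 ResourceManager
$ kill -9 21372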
 
$ yarn rmadmin -getServiceState rm2
active
$ yarn rmadmin -getServiceState rm1
14/09/14 03:08:23 INFO ipc.Client: Retrying connect to server: debugo01/192.168.46.201:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From debugo01/192.168.46.201 to debugo01:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
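If the killed ResourceManager on rm1 is started again, it rejoins the cluster as standby, since rm2 already holds the active lock; a sketch:

# Back on debugo01: restart the daemon and re-check its state
$ yarn-daemon.sh start resourcemanager
$ yarn rmadmin -getServiceState rm1
standby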
Reference 
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-ha-in-cdh5/
 
Original article: Detailed Description of ResourceManager HA Configuration. Thanks to the original author for sharing.