Detailed description of ResourceManagerHA Configuration

Source: Internet
Author: User
ResourceManager (RM) in YARN is responsible for resource management and scheduling across the entire cluster. Internally it maintains ApplicationMaster information, NodeManager information, and the resource usage of each application. Since version 2.4, Hadoop also provides HA for the RM, which addresses the reliability and fault-tolerance problems of such a core service.

Overview of ResourceManager High Availability
RM HA and NameNode (NN) HA have a lot in common (see NameNode HA configuration details):
(1) Active/Standby architecture: only one RM is active at any given time.
(2) It relies on ZooKeeper. Manual failover is done with the yarn rmadmin command (similar to the hdfs haadmin command), while automatic failover uses a ZKFailoverController. The difference from NN HA is that this failover controller is not an independent daemon; it runs only as a thread inside the RM.
(3) When there are multiple RMs, the yarn-site.xml used by clients must list all of them. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try the RMs in round-robin fashion until they reach the active one; in other words, clients, AMs and NMs must provide their own failover logic. If the active RM fails, they resume the round-robin polling until they find the new active RM. This default logic is implemented by org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider, the default value of yarn.client.failover-proxy-provider. To customize it, implement the org.apache.hadoop.yarn.client.RMFailoverProxyProvider interface and set yarn.client.failover-proxy-provider to your class name.
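The round-robin lookup described in (3) can be sketched as follows. This is a toy Python model, not Hadoop's ConfiguredRMFailoverProxyProvider: `RMEndpoint`, `RMStandbyError` and `call_active_rm` are hypothetical names, and each RM is simulated by a state flag instead of a real RPC.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class RMStandbyError(Exception):
    """Raised when a request reaches an RM that is not active (toy model)."""


@dataclass
class RMEndpoint:
    rm_id: str
    state: str  # "active", "standby" or "down" in this toy model

    def call(self, request: Callable[[str], T]) -> T:
        # Only the active RM serves requests; any other state makes the
        # client move on to the next RM.
        if self.state != "active":
            raise RMStandbyError(f"{self.rm_id} is {self.state}")
        return request(self.rm_id)


def call_active_rm(rms: List[RMEndpoint], request: Callable[[str], T],
                   start: int = 0, max_attempts: int = 10) -> Tuple[T, int]:
    """Try the RMs in round-robin order beginning at index `start` until one
    serves the request. Returns the result and the index of the serving RM,
    so the next call can start from the last known active RM."""
    for attempt in range(max_attempts):
        idx = (start + attempt) % len(rms)
        try:
            return rms[idx].call(request), idx
        except RMStandbyError:
            continue  # not active: try the next RM in the list
    raise RuntimeError(f"no active RM found after {max_attempts} attempts")


# Usage: rm1 is active at first.
rms = [RMEndpoint("rm1", "active"), RMEndpoint("rm2", "standby")]
result, idx = call_active_rm(rms, lambda rm_id: f"served by {rm_id}")
print(result)  # served by rm1

# rm1 fails and rm2 becomes active; the client fails over to rm2.
rms[0].state, rms[1].state = "down", "active"
result, idx = call_active_rm(rms, lambda rm_id: f"served by {rm_id}", start=idx)
print(result)  # served by rm2
```

Remembering the index of the last active RM means the common case (no failover) costs a single call; the loop only walks the list when the remembered RM stops answering.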
In addition, the new active RM can restore the state of the previous RM (for details, see ResourceManager Restart). When RM Restart is enabled, the RM that becomes active loads the state information of the previous active RM and continues its work, and applications can periodically checkpoint to avoid losing work. In an Active/Standby setup, the state store must be accessible to both the active and the standby RM. Two implementations are available: FileSystemRMStateStore and the ZooKeeper-based ZKRMStateStore. The latter allows only one RM to write at a time, which makes it the recommended store for an HA cluster.
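The single-writer property of the state store can be illustrated with a toy model. ZKRMStateStore enforces it on the ZooKeeper side; the sketch below swaps in a simpler epoch (fencing-token) check to show the same idea. `ToyStateStore`, `FencedError` and the application ID are hypothetical and are not part of the Hadoop API.

```python
from typing import Dict


class FencedError(Exception):
    """Raised when a writer holding a stale epoch tries to write."""


class ToyStateStore:
    """In-memory stand-in for an RM state store that accepts writes only
    from the RM holding the newest epoch (i.e. the current active RM)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.apps: Dict[str, str] = {}

    def become_active(self) -> int:
        # Each newly active RM gets a higher epoch, fencing out older writers.
        self.epoch += 1
        return self.epoch

    def store_app(self, epoch: int, app_id: str, state: str) -> None:
        if epoch != self.epoch:
            raise FencedError(f"epoch {epoch} is stale (current {self.epoch})")
        self.apps[app_id] = state

    def load(self) -> Dict[str, str]:
        # A newly active RM recovers the previous RM's application state.
        return dict(self.apps)


store = ToyStateStore()
rm1_epoch = store.become_active()                 # rm1 is active
store.store_app(rm1_epoch, "application_1", "RUNNING")

rm2_epoch = store.become_active()                 # failover: rm2 takes over
print(store.load())                               # {'application_1': 'RUNNING'}
try:
    store.store_app(rm1_epoch, "application_1", "KILLED")  # stale rm1 write
except FencedError as e:
    print("rejected:", e)                         # rejected: epoch 1 is stale (current 2)
```

The point of fencing is that a former active RM which is still alive (for example after a network partition) cannot overwrite state that the new active RM now owns.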

A typical YARN RM HA configuration (in yarn-site.xml) is as follows:

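<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>debugo01</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>debugo02</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>debugo01:2181,debugo02:2181,debugo03:2181</value>
  <description>For multiple ZooKeeper servers, separate them with commas</description>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-ha</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>Enable automatic failover. By default, it is enabled only when HA is enabled.</description>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
  <description>Optional setting. The default value is /yarn-leader-election</description>
</property>
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>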
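Because a missing per-RM property is easy to overlook, a small checker can help before restarting the cluster. The following is a minimal sketch, assuming a Hadoop-style `<configuration>` document with `<property><name>/<value>` entries. `load_properties` and `check_rm_ha` are hypothetical helpers, and which keys count as required is this sketch's own choice, not an official Hadoop rule list.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def load_properties(xml_text: str) -> Dict[str, str]:
    """Collect <name>/<value> pairs from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_rm_ha(props: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the RM HA settings (empty if none).
    The set of checks is this sketch's own choice."""
    problems: List[str] = []
    if props.get("yarn.resourcemanager.ha.enabled", "false").lower() != "true":
        problems.append("yarn.resourcemanager.ha.enabled is not true")
    rm_ids = [i.strip() for i in props.get("yarn.resourcemanager.ha.rm-ids", "").split(",")
              if i.strip()]
    if len(rm_ids) < 2:
        problems.append("yarn.resourcemanager.ha.rm-ids lists fewer than two RMs")
    for rm_id in rm_ids:
        key = f"yarn.resourcemanager.hostname.{rm_id}"
        if not props.get(key):
            problems.append(f"missing {key}")
    if not props.get("yarn.resourcemanager.cluster-id"):
        problems.append("missing yarn.resourcemanager.cluster-id")
    store = props.get("yarn.resourcemanager.store.class", "")
    if store.endswith("ZKRMStateStore") and not props.get("yarn.resourcemanager.zk-address"):
        problems.append("ZKRMStateStore is used but yarn.resourcemanager.zk-address is empty")
    return problems


SAMPLE = """<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>debugo01</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>yarn-ha</value></property>
</configuration>"""

print(check_rm_ha(load_properties(SAMPLE)))
# ['missing yarn.resourcemanager.hostname.rm2']
```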

The RM service addresses must also be configured separately for each RM ID, as follows:

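<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>debugo01:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>debugo02:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>debugo01:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>debugo02:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>debugo01:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>debugo02:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>debugo01:8188</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>debugo02:8188</value>
</property>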
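Since every service address is repeated once per RM ID, the blocks can also be generated rather than typed by hand. A minimal sketch, assuming the hostnames and ports used in this article; `rm_address_properties` and `SERVICE_PORTS` are hypothetical names.

```python
from typing import Dict, List

# Ports used in this article's example; adjust them to your cluster.
SERVICE_PORTS: Dict[str, int] = {
    "address": 8132,
    "scheduler.address": 8130,
    "resource-tracker.address": 8131,
    "webapp.address": 8188,
}


def rm_address_properties(hosts: Dict[str, str],
                          ports: Dict[str, int] = SERVICE_PORTS) -> List[str]:
    """Render one <property> element per (service, RM ID) pair, in the
    order service by service, then RM by RM."""
    out: List[str] = []
    for service, port in ports.items():
        for rm_id, host in hosts.items():
            out.append(
                "<property>\n"
                f"  <name>yarn.resourcemanager.{service}.{rm_id}</name>\n"
                f"  <value>{host}:{port}</value>\n"
                "</property>"
            )
    return out


for block in rm_address_properties({"rm1": "debugo01", "rm2": "debugo02"}):
    print(block)
```

The article does not set yarn.resourcemanager.admin.address for each RM; the log later on shows `yarn rmadmin` using the default admin port 8033.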

Start RM

start-yarn.sh

Then start the RM on the standby node separately, for example with yarn-daemon.sh start resourcemanager (running start-yarn.sh there also works).

Check status:

$ yarn rmadmin -getServiceState rm1
active
$ yarn rmadmin -getServiceState rm2
standby
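The same check can be scripted, for example to find which RM is currently active. This is a minimal sketch that shells out to the `yarn rmadmin -getServiceState` command shown above; `get_service_state` and `find_active` are hypothetical helpers, and treating a non-zero exit status as "RM unreachable" is an assumption.

```python
import subprocess
from typing import Callable, Iterable, Optional


def get_service_state(rm_id: str) -> Optional[str]:
    """Run `yarn rmadmin -getServiceState <rm_id>` and return the reported
    state ("active" or "standby"), or None if the call fails or `yarn` is
    not on PATH."""
    try:
        proc = subprocess.run(["yarn", "rmadmin", "-getServiceState", rm_id],
                              capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None  # assumption: a failed call (e.g. RM down) exits non-zero
    # The state is expected on the last line of stdout.
    lines = proc.stdout.strip().splitlines()
    return lines[-1].strip() if lines else None


def find_active(rm_ids: Iterable[str],
                get_state: Callable[[str], Optional[str]] = get_service_state
                ) -> Optional[str]:
    """Return the first RM ID that reports "active", or None if there is none."""
    for rm_id in rm_ids:
        if get_state(rm_id) == "active":
            return rm_id
    return None
```

On a cluster node in the state shown above, `find_active(["rm1", "rm2"])` would be expected to return `"rm1"`. The `get_state` parameter lets the lookup be tested without a running cluster.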

Opening the web UI of rm2 (the standby RM) displays:
This is standby RM. Redirecting to the current active RM: http://debugo01:8188/cluster/apps
Kill the ResourceManager process on rm1. rm2 becomes active, and rm1 can no longer be reached:

$ yarn rmadmin -getServiceState rm2
active
$ yarn rmadmin -getServiceState rm1
14/09/14 03:08:23 INFO ipc.Client: Retrying connect to server: debugo01/192.168.46.201:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From debugo01/192.168.46.201 to debugo01:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
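When testing failover as above, it can help to wait until the surviving RM actually reports itself active instead of checking once. A minimal polling sketch: `wait_for_new_active` is a hypothetical helper, and `get_state` is any function mapping an RM ID to its state (for example, a wrapper around `yarn rmadmin -getServiceState`). The timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_new_active(rm_ids: Iterable[str],
                        get_state: Callable[[str], Optional[str]],
                        failed_rm: str,
                        timeout_s: float = 60.0,
                        interval_s: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Poll every RM except `failed_rm` until one reports "active".
    Returns that RM's ID, or None if none does within `timeout_s`."""
    candidates = [r for r in rm_ids if r != failed_rm]
    deadline = clock() + timeout_s
    while True:
        for rm_id in candidates:
            if get_state(rm_id) == "active":
                return rm_id
        if clock() >= deadline:
            return None  # no failover observed in time
        sleep(interval_s)
```

The killed RM is skipped because, as the log above shows, querying it only produces connection-refused retries. `sleep` and `clock` are injectable so the loop can be exercised without real waiting.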
Reference

http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-ha-in-cdh5/

Original article: Detailed description of ResourceManager HA configuration. Thanks to the original author for sharing it.
