Analysis of two kinds of common fault-tolerant scenarios in Hadoop MapReduce

Source: Internet
Author: User

Note that before you configure these parameters, you should fully understand what each of them means, so that you avoid the pitfalls caused by misconfiguring the cluster. In addition, these parameters need to be configured in yarn-site.xml.

1. ResourceManager-related configuration parameters

(1) yarn.resourcemanager.address

Parameter explanation: The address that the ResourceManager exposes to clients. Clients use this address to submit applications to the RM, kill applications, and so on.

Default value: ${yarn.resourcemanager.hostname}:8032

(2) yarn.resourcemanager.scheduler.address

Parameter explanation: The address that the ResourceManager exposes to ApplicationMasters. An ApplicationMaster uses this address to request resources from the RM, release resources, and so on.

Default value: ${yarn.resourcemanager.hostname}:8030

(3) yarn.resourcemanager.resource-tracker.address

Parameter explanation: The address that the ResourceManager exposes to NodeManagers. A NodeManager uses this address to send heartbeats to the RM, pick up tasks, and so on.

Default value: ${yarn.resourcemanager.hostname}:8031

(4) yarn.resourcemanager.admin.address

Parameter explanation: The address that the ResourceManager exposes to administrators. Administrators send administrative commands to the RM through this address.

Default value: ${yarn.resourcemanager.hostname}:8033

(5) yarn.resourcemanager.webapp.address

Parameter explanation: The ResourceManager's external web UI address. Users can view cluster information in a browser through this address.

Default value: ${yarn.resourcemanager.hostname}:8088
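
As a rough illustration of how the five addresses above relate, the following yarn-site.xml fragment sets only yarn.resourcemanager.hostname and overrides one port. The hostname rm.example.com and the port 18088 are made-up values for this sketch; every address that is not overridden keeps its ${yarn.resourcemanager.hostname}:port default listed above.

```xml
<configuration>
  <!-- Hypothetical hostname; the address defaults above expand from it. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
  <!-- Optional override (assumed port): move the web UI off the default 8088. -->
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:18088</value>
  </property>
</configuration>
```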

(6) yarn.resourcemanager.scheduler.class

Parameter explanation: The main class of the resource scheduler to enable. Currently available are FIFO, Capacity Scheduler, and Fair Scheduler.

Default value: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
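
For example, switching from the default to the Fair Scheduler might look like the fragment below. The class name is the Fair Scheduler class shipped with Hadoop 2.x rather than something stated in this article, so check it against your Hadoop version before using it.

```xml
<configuration>
  <!-- Replace the default scheduler with the Fair Scheduler. -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>
```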

(7) yarn.resourcemanager.resource-tracker.client.thread-count

Parameter explanation: The number of handlers that process RPC requests from NodeManagers.

Default value: 50

(8) yarn.resourcemanager.scheduler.client.thread-count

Parameter explanation: The number of handlers that process RPC requests from ApplicationMasters.

Default value: 50

(9) yarn.scheduler.minimum-allocation-mb/yarn.scheduler.maximum-allocation-mb

Parameter explanation: The minimum/maximum amount of memory that can be requested in a single request. For example, if these are set to 1024 and 3072, then when a MapReduce job runs, each task can request at least 1024 MB and at most 3072 MB of memory.

Default value: 1024/8192

(10) yarn.scheduler.minimum-allocation-vcores/yarn.scheduler.maximum-allocation-vcores

Parameter explanation: The minimum/maximum number of virtual CPUs that can be requested in a single request. For example, if these are set to 1 and 4, then when a MapReduce job runs, each task can request at least 1 and at most 4 virtual CPUs. For an explanation of what a virtual CPU is, see my article "YARN Resource Scheduler Profiler."

Default value: 1/32
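
The two examples above (1024/3072 MB and 1/4 virtual CPUs) could be written in yarn-site.xml roughly as follows. The values are taken from the examples in the text, not recommendations.

```xml
<configuration>
  <!-- Memory bounds for a single request, in MB. -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
  </property>
  <!-- Virtual CPU bounds for a single request. -->
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
  </property>
</configuration>
```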

(11) yarn.resourcemanager.nodes.include-path/yarn.resourcemanager.nodes.exclude-path

Parameter explanation: The NodeManager whitelist and blacklist. If some NodeManagers turn out to be problematic, for example because of a high fault rate or a high task failure rate, you can add them to the blacklist. Note that these two parameters can take effect dynamically by invoking a refresh command (for example, yarn rmadmin -refreshNodes).

Default value: ""
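
A minimal sketch of a whitelist/blacklist setup follows. Both file paths are hypothetical; each file is assumed to be a plain text file listing one NodeManager hostname per line.

```xml
<configuration>
  <!-- Hypothetical path: hosts allowed to register as NodeManagers. -->
  <property>
    <name>yarn.resourcemanager.nodes.include-path</name>
    <value>/etc/hadoop/conf/yarn.include</value>
  </property>
  <!-- Hypothetical path: hosts to blacklist, e.g. nodes with a high task failure rate. -->
  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/etc/hadoop/conf/yarn.exclude</value>
  </property>
</configuration>
```

After editing either file, the refresh command has to be invoked for the change to take effect without restarting the ResourceManager.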

(12) yarn.resourcemanager.nodemanagers.heartbeat-interval-ms

Parameter explanation: The NodeManager heartbeat interval.

Default value: 1000 (ms)

2. NodeManager-related configuration parameters

(1) yarn.nodemanager.resource.memory-mb

Parameter explanation: The total physical memory available on the NodeManager. Note that once this parameter is set, it cannot be changed dynamically for as long as the NodeManager is running. In addition, the default value is 8192 MB, and YARN will assume that much memory is available even if your machine has less than 8192 MB (silly, isn't it?), so this value must always be configured. However, Apache is already working on making this parameter dynamically modifiable.

Default value: 8192

(2) yarn.nodemanager.vmem-pmem-ratio

Parameter explanation: The maximum amount of virtual memory that can be used for each 1 MB of physical memory used.

Default value: 2.1

(3) yarn.nodemanager.resource.cpu-vcores

Parameter explanation: The total number of virtual CPUs available on the NodeManager.

Default value: 8
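
To make the three resource parameters above concrete, here is a sketch for a hypothetical small node with 4 GB of RAM and 4 cores. The values 3072 and 4 are assumptions chosen for illustration; they leave some memory for the operating system and other daemons.

```xml
<configuration>
  <!-- Assumed value: memory handed to YARN on a 4 GB machine, in MB. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3072</value>
  </property>
  <!-- Default ratio: 2.1 MB of virtual memory per 1 MB of physical memory. -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- Assumed value: one virtual CPU per core on a 4-core machine. -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
</configuration>
```

With a ratio of 2.1, a task using 1024 MB of physical memory may use up to roughly 2150 MB of virtual memory (1024 x 2.1).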

(4) yarn.nodemanager.local-dirs

Parameter explanation: The location where intermediate results are stored, similar to mapred.local.dir in 1.0. Note that this parameter is typically set to multiple directories so that the disk I/O load is spread across them.

Default value: ${hadoop.tmp.dir}/nm-local-dir

(5) yarn.nodemanager.log-dirs

Parameter explanation: The location where logs are stored (multiple directories can be configured).

Default value: ${yarn.log.dir}/userlogs
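
Multiple directories are given as a comma-separated list. The sketch below assumes two data disks mounted at the hypothetical paths /data1 and /data2.

```xml
<configuration>
  <!-- Hypothetical mount points: spread intermediate data over two disks. -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>
  <!-- Hypothetical mount points: spread logs over the same two disks. -->
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data1/yarn/logs,/data2/yarn/logs</value>
  </property>
</configuration>
```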

(6) yarn.nodemanager.log.retain-seconds

Parameter explanation: The maximum time that logs are kept on the NodeManager (only effective when log aggregation is not enabled).

Default value: 10800 (3 hours)

(7) yarn.nodemanager.aux-services

Parameter explanation: The auxiliary services running on the NodeManager. You need to configure mapreduce_shuffle in order to run MapReduce programs.

Default value: ""
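
A typical way to enable the shuffle service is sketched below. The second property and the ShuffleHandler class name are not mentioned in this article; they reflect how Hadoop 2.x usually wires up the service, so verify them against your version.

```xml
<configuration>
  <!-- Enable the MapReduce shuffle auxiliary service on each NodeManager. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Assumed companion setting: the class that implements the service. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```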
