In Hadoop 2.0, YARN takes over the management of resources (memory, CPU, and so on) that used to be part of MapReduce and packages those resources into Containers. This streamlines MapReduce, which can focus on the data processing it is good at without having to consider resource scheduling.
 
YARN manages the available computing resources of every machine in the cluster. Based on these resources, YARN schedules the resource requests sent by applications (such as MapReduce) and then allocates Containers to give each application processing capacity. A Container is the basic unit of processing capacity in YARN and encapsulates memory and CPU.
 
This article assumes that each node in the cluster has 48 GB of memory, 12 hard disks, and 2 hex-core CPUs (12 cores).
1. Configure YARN 
In a Hadoop cluster, it is important to balance memory and CPU usage so that the computing power of the whole cluster is not limited by any single resource. According to Hortonworks' recommendations, one to two Containers per hard disk and per core gives the best cluster balance. With 12 hard disks and 12 cores on each node, we allow at most 20 Containers per node.
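The range implied by the guideline above can be checked with a small sketch. The function name and the choice to let the scarcer of disks and cores bound the count are illustrative assumptions, not part of any Hortonworks tool.

```python
from typing import Tuple


def container_range(disks: int, cores: int) -> Tuple[int, int]:
    """Return the (low, high) Container count per node implied by the
    one-to-two-Containers-per-disk-and-core guideline."""
    # Assumption: the scarcer of the two resources bounds the count.
    limiting = min(disks, cores)
    return limiting, 2 * limiting


low, high = container_range(disks=12, cores=12)
print(low, high)          # 12 24
print(low <= 20 <= high)  # True: 20 Containers falls inside the range
```

For the node described in this article, the chosen maximum of 20 Containers sits within the 12 to 24 range.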
 
Each node has 48 GB of memory, and some of it must be reserved for the operating system. We therefore allocate 40 GB to YARN and keep 8 GB for the operating system. The following setting defines the maximum memory that YARN can use on each node.
 
In yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
 
Next we need to configure how these resources are divided among Containers. We do this by setting the minimum memory allocated to a Container. Because we allow up to 20 Containers per node, the minimum memory of each Container is 40 GB / 20 = 2 GB.
 
In yarn-site.xml:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
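The two YARN values above follow from simple arithmetic, sketched here. The constant names are illustrative; only the numbers come from the article.

```python
NODE_MEMORY_GB = 48   # physical memory per node
OS_RESERVED_GB = 8    # memory kept for the operating system
MAX_CONTAINERS = 20   # maximum Containers allowed per node

# Value for yarn.nodemanager.resource.memory-mb
yarn_memory_mb = (NODE_MEMORY_GB - OS_RESERVED_GB) * 1024
# Value for yarn.scheduler.minimum-allocation-mb
min_allocation_mb = yarn_memory_mb // MAX_CONTAINERS

print(yarn_memory_mb)     # 40960
print(min_allocation_mb)  # 2048
```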
 
 
2. Configure MapReduce2
MapReduce2 is built on top of YARN and uses YARN Containers to schedule and run its map and reduce tasks.
 
When configuring MapReduce resources on YARN, consider the following: 
 
 
- Physical memory limit for each map and reduce task
- JVM heap size for each task
- Virtual memory available to each task

You can set the maximum memory for each map and reduce task. The value must be greater than or equal to the minimum Container memory. In the example above we set the minimum Container memory (yarn.scheduler.minimum-allocation-mb) to 2 GB, so we can set the memory of a map task to 4 GB and the memory of a reduce task to 8 GB:
 
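In mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>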
 
Each Container runs a JVM for its map or reduce task. The JVM heap size should be set lower than the map and reduce memory defined above, so that the JVM stays within the Container memory allocated by YARN:
 
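In mapred-site.xml:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>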
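A quick sanity check for these heap settings can be sketched as follows. The helper `xmx_to_mb` is hypothetical and only understands `-Xmx` values with an `m` or `g` suffix; the Container sizes and JVM options are the ones used in this article.

```python
import re


def xmx_to_mb(java_opts: str) -> int:
    """Extract the -Xmx value from a JVM options string and return it in MB."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting with an m or g suffix found")
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


# (Container memory in MB, JVM options) for each task type
settings = {
    "map": (4096, "-Xmx3072m"),
    "reduce": (8192, "-Xmx6144m"),
}

for task, (container_mb, opts) in settings.items():
    heap_mb = xmx_to_mb(opts)
    # The heap must leave room for non-heap JVM memory inside the Container.
    assert heap_mb < container_mb, f"{task} heap does not fit in its Container"
    print(task, heap_mb, container_mb - heap_mb)
```

With the values above, the map task leaves 1024 MB and the reduce task 2048 MB of headroom between the heap and the Container limit.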
 
The preceding settings define the upper limit of physical memory that map and reduce tasks can use. The upper limit of virtual memory (physical memory + paged memory) for each task is determined by the virtual-to-physical memory ratio allowed for each Container. The default value is 2.1:
 
In yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
 
With all of the settings above, each map task is allocated the following memory:
 
 
- Physical memory = 4 GB
- JVM heap size within the map task Container = 3 GB
- Virtual memory upper limit = 4 * 2.1 = 8.4 GB
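The virtual memory limit is just the task's physical memory multiplied by the ratio, as this small sketch shows. The function name is illustrative.

```python
def vmem_limit_mb(physical_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory ceiling for a Container, in MB."""
    return physical_mb * vmem_pmem_ratio


map_vmem_gb = vmem_limit_mb(4096) / 1024
reduce_vmem_gb = vmem_limit_mb(8192) / 1024

print(round(map_vmem_gb, 1))     # 8.4
print(round(reduce_vmem_gb, 1))  # 16.8
```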
With YARN and MapReduce2, resources are no longer pre-configured as fixed slots for map and reduce tasks. The whole cluster is available for dynamic allocation of map and reduce tasks according to the needs of each job. In this example, YARN can run up to 10 (40/4) mappers or 5 (40/8) reducers on each node, or any other combination that fits.
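Whether a given mix of mappers and reducers fits on one node can be checked with a short sketch. It considers memory only and ignores CPU and any other Containers on the node, which is a simplifying assumption; the function name is illustrative.

```python
YARN_MEMORY_MB = 40960  # yarn.nodemanager.resource.memory-mb
MAP_MB = 4096           # mapreduce.map.memory.mb
REDUCE_MB = 8192        # mapreduce.reduce.memory.mb


def fits(mappers: int, reducers: int) -> bool:
    """True if the requested mix fits in the memory YARN manages on one node."""
    return mappers * MAP_MB + reducers * REDUCE_MB <= YARN_MEMORY_MB


print(YARN_MEMORY_MB // MAP_MB)     # 10 mappers at most
print(YARN_MEMORY_MB // REDUCE_MB)  # 5 reducers at most
print(fits(4, 3))                   # True: 16 GB + 24 GB = 40 GB
print(fits(6, 3))                   # False: 24 GB + 24 GB = 48 GB
```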
 
 
 
References: 
 
[1] How to Plan and Configure YARN and MapReduce 2 in HDP 2.0
 
 
 