Configuring memory resources in Hadoop 2.0

In Hadoop 2.0, YARN manages the cluster's resources (memory, CPU, etc.) for MapReduce and packages them into Containers. This way, MapReduce can focus on the data-processing work it is good at, without having to handle resource scheduling, as shown in the figure.

YARN manages the computing resources available on all machines in the cluster. Based on these resources, YARN schedules resource requests from applications (such as MapReduce) and allocates Containers to provide processing capacity to each application. The Container is the basic unit of processing capacity in YARN: an encapsulation of memory and CPU.

This article assumes that each node in the cluster has 48 GB of memory, 12 hard disks, and two hex-core CPUs (12 cores in total).
1. Configure YARN
In a Hadoop cluster, balancing memory and CPU usage is very important, so that the computing power of the cluster is not limited by any single resource. According to Hortonworks' recommendations, one or two Containers per hard disk and per core gives the best cluster balance. Since each node in this cluster has 12 hard disks and 12 cores, it is best to allow up to 20 Containers per node.
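The Hortonworks planning guide cited in the references condenses this into a simple heuristic. The sketch below (plain Python; the node specs are this article's example values, not defaults) shows how the 20-Container figure falls out:

```python
# Heuristic from the HDP planning guide (reference [1]):
# containers = min(2 * CORES, 1.8 * DISKS, total_yarn_mem / min_container_mem)
CORES = 12            # cores per node (this article's example)
DISKS = 12            # hard disks per node
YARN_MEM_GB = 40      # memory given to YARN (48 GB minus 8 GB for the OS)
MIN_CONTAINER_GB = 2  # smallest Container we want to allocate

containers = int(min(2 * CORES, 1.8 * DISKS, YARN_MEM_GB / MIN_CONTAINER_GB))
mem_per_container_gb = YARN_MEM_GB / containers

print(containers)            # 20 Containers per node
print(mem_per_container_gb)  # 2.0 GB per Container
```

Here the memory term (40 / 2 = 20) is the binding constraint, which is why the article settles on 20 Containers of 2 GB each.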

Because each node has 48 GB of memory, we reserve some of it for the operating system: 40 GB is allocated to YARN and 8 GB is left for the OS. The following property sets the maximum amount of memory YARN may use on each node.

In yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>

Next, we configure how these resources are divided into Containers by setting the minimum memory allocated to a Container. Since we allow up to 20 Containers per node, each Container gets 40 GB / 20 = 2 GB.

In yarn-site.xml:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>


2. Configure MapReduce2
MapReduce2 is built on top of YARN and uses YARN Containers to schedule and run its map and reduce tasks.

When configuring MapReduce resources on YARN, consider the following:
  1. Physical memory limit for each map and reduce task
  2. JVM heap size for each task
  3. Virtual memory limit for each task

You can set the maximum physical memory for map and reduce tasks; each value must be greater than or equal to the minimum Container memory. For example, since we set the minimum Container memory (yarn.scheduler.minimum-allocation-mb) to 2 GB, we can give map tasks 4 GB and reduce tasks 8 GB:

In mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>

Each Container runs a JVM for its map or reduce task. The JVM heap size (-Xmx) should be smaller than the map and reduce memory limits above, so that the JVM fits inside its Container:

In mapred-site.xml:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>

The preceding settings bound the physical memory available to map and reduce tasks. The upper limit on virtual memory (physical memory plus paged memory) is determined by each Container's virtual-to-physical memory ratio, which defaults to 2.1:

In yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>

With all of the previous settings, each map task is allocated:
  • Physical memory = 4 GB
  • JVM heap inside the map task's Container = 3 GB
  • Virtual memory limit = 4 × 2.1 = 8.4 GB
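As a quick sanity check, the per-map-task figures above follow directly from the configured values (plain Python; the constants are the property values set in this article):

```python
# Per-map-task limits implied by this article's configuration.
MAP_MEMORY_MB = 4096    # mapreduce.map.memory.mb
MAP_HEAP_MB = 3072      # -Xmx value in mapreduce.map.java.opts
VMEM_PMEM_RATIO = 2.1   # yarn.nodemanager.vmem-pmem-ratio

vmem_limit_mb = MAP_MEMORY_MB * VMEM_PMEM_RATIO

print(MAP_HEAP_MB < MAP_MEMORY_MB)     # True: the heap fits in the Container
print(round(vmem_limit_mb / 1024, 1))  # 8.4 (GB of virtual memory)
```

If a task exceeds its physical or virtual memory limit, the NodeManager kills its Container, so keeping the heap comfortably below the Container size matters.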

Unlike MapReduce1, YARN and MapReduce2 have no pre-configured, fixed slots for map and reduce tasks. The cluster dynamically allocates map and reduce tasks according to job requirements. In this example, each node can run up to 10 (40/4) mappers or 5 (40/8) reducers, or any other combination that fits within its resources.
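The per-node task counts above are just the node's YARN memory divided by each task's memory request (plain Python, using this article's values):

```python
# Concurrent tasks that fit on one node under this configuration.
NODE_YARN_MEM_MB = 40960    # yarn.nodemanager.resource.memory-mb
MAP_MEMORY_MB = 4096        # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 8192     # mapreduce.reduce.memory.mb

max_mappers = NODE_YARN_MEM_MB // MAP_MEMORY_MB
max_reducers = NODE_YARN_MEM_MB // REDUCE_MEMORY_MB

print(max_mappers)   # 10 concurrent map tasks per node
print(max_reducers)  # 5 concurrent reduce tasks per node
```

In practice the scheduler mixes mappers and reducers on a node, so these are per-resource upper bounds rather than fixed quotas.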



References:

[1] How to Plan and Configure YARN and MapReduce 2 in HDP 2.0


