2015-06-05 · javachen's Blog · Original: http://blog.javachen.com/2015/06/05/yarn-memory-and-cpu-configuration.html · Tag: YARN
Hadoop YARN schedules two kinds of resources, memory and CPU. This article describes how to configure memory and CPU usage for YARN.
As a resource scheduler, YARN keeps track of the computing resources of every machine in the cluster and allocates containers according to the resources requested by applications. The container is the basic unit of resource allocation in YARN; each container holds a certain amount of memory and CPU.
In a YARN cluster, it is important to keep memory, CPU, and disk resources in balance. As a rule of thumb, two containers per disk and per CPU core gives the best overall utilization of the cluster.
Memory Configuration
For memory-related configuration, you can refer to Hortonworks' documentation, Determine HDP Memory Configuration Settings, to configure your cluster.
The memory available to YARN and MapReduce should exclude what the operating system and other Hadoop components (such as HBase) need; that is, reserved memory = system memory + HBase memory.
You can refer to the following table to determine how much memory to reserve:
| Total RAM per machine | Reserved for the system | Reserved for HBase |
|---|---|---|
| 4GB | 1GB | 1GB |
| 8GB | 2GB | 1GB |
| 16GB | 2GB | 2GB |
| 24GB | 4GB | 4GB |
| 48GB | 6GB | 8GB |
| 64GB | 8GB | 8GB |
| 72GB | 8GB | 8GB |
| 96GB | 12GB | 16GB |
| 128GB | 24GB | 24GB |
| 255GB | 32GB | 32GB |
| 512GB | 64GB | 64GB |
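For reference, the table above can also be expressed as a small lookup. This is only a sketch: the helper name and the choice to round down to the nearest table row are my own.

```python
# Reserved memory per node, taken from the table above (values in GB):
# (total RAM on the machine, reserved for the system, reserved for HBase)
RESERVED_MEMORY_GB = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4),
    (48, 6, 8), (64, 8, 8), (72, 8, 8), (96, 12, 16),
    (128, 24, 24), (255, 32, 32), (512, 64, 64),
]

def reserved_memory_gb(total_ram_gb, with_hbase=True):
    """Memory (GB) to keep away from YARN/MapReduce on a node of the given size."""
    system, hbase = 1, 1
    for ram, system_gb, hbase_gb in RESERVED_MEMORY_GB:
        if total_ram_gb >= ram:          # use the closest row at or below this node's RAM
            system, hbase = system_gb, hbase_gb
    return system + hbase if with_hbase else system

# e.g. a 128 GB node without HBase -> 24 GB reserved, as in the example later on.
```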
To calculate the maximum number of containers per machine, you can use the following formula:
containers = min(2 * CORES, 1.8 * DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
where:
- CORES is the number of CPU cores on the machine
- DISKS is the number of disks mounted on the machine
- total available RAM is the machine's total memory minus the reserved memory
- MIN_CONTAINER_SIZE is the minimum container size, which should be chosen according to the available RAM; refer to the following table:
| RAM available per machine | Minimum container size |
|---|---|
| Less than 4GB | 256MB |
| 4GB to 8GB | 512MB |
| 8GB to 24GB | 1024MB |
| Greater than 24GB | 2048MB |
The memory allotted to each container is then calculated as follows:
ram-per-container = max(MIN_CONTAINER_SIZE, (total available RAM) / containers)
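As a minimal sketch, the two formulas above can be written out in Python. The function names are mine, and rounding 1.8 * DISKS up is an assumption made so that the result matches the 13 containers of the worked example further below (yarn-utils.py appears to round the same way).

```python
import math

def min_container_size_gb(available_ram_gb):
    """Minimum container size (GB), following the table above."""
    if available_ram_gb < 4:
        return 0.25   # 256 MB
    if available_ram_gb <= 8:
        return 0.5    # 512 MB
    if available_ram_gb <= 24:
        return 1      # 1024 MB
    return 2          # 2048 MB

def yarn_sizing(cores, disks, total_ram_gb, reserved_gb):
    """Apply the containers and ram-per-container formulas above."""
    available = total_ram_gb - reserved_gb
    min_size = min_container_size_gb(available)
    containers = int(min(2 * cores, math.ceil(1.8 * disks), available / min_size))
    ram_per_container = max(min_size, available // containers)
    return containers, ram_per_container

# The 128 GB / 32-core / 7-disk node used later in the post, with 24 GB reserved:
print(yarn_sizing(32, 7, 128, 24))   # -> (13, 8): 13 containers of 8 GB each
```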
Based on the calculations above, YARN and MapReduce can be configured as follows:
| Configuration file | Configuration setting | Default value | Calculated value |
|---|---|---|---|
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | 8192 MB | = containers * ram-per-container |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | 1024 MB | = ram-per-container |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | 8192 MB | = containers * ram-per-container |
| yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb | 1536 MB | = 2 * ram-per-container |
| yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts | -Xmx1024m | = 0.8 * 2 * ram-per-container |
| mapred-site.xml | mapreduce.map.memory.mb | 1024 MB | = ram-per-container |
| mapred-site.xml | mapreduce.reduce.memory.mb | 1024 MB | = 2 * ram-per-container |
| mapred-site.xml | mapreduce.map.java.opts | | = 0.8 * ram-per-container |
| mapred-site.xml | mapreduce.reduce.java.opts | | = 0.8 * 2 * ram-per-container |
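The last column of the table is purely mechanical, so it can also be written as a short helper. This is only an illustration of the mapping above; the function name is mine.

```python
def yarn_mapreduce_settings(containers, ram_per_container_gb):
    """Derive the settings of the table above; memory values in MB."""
    mb = int(ram_per_container_gb * 1024)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * mb,
        "yarn.scheduler.minimum-allocation-mb": mb,
        "yarn.scheduler.maximum-allocation-mb": containers * mb,
        "yarn.app.mapreduce.am.resource.mb": 2 * mb,
        "yarn.app.mapreduce.am.command-opts": "-Xmx%dm" % int(0.8 * 2 * mb),
        "mapreduce.map.memory.mb": mb,
        "mapreduce.reduce.memory.mb": 2 * mb,
        "mapreduce.map.java.opts": "-Xmx%dm" % int(0.8 * mb),
        "mapreduce.reduce.java.opts": "-Xmx%dm" % int(0.8 * 2 * mb),
    }

# With the 13 containers of 8 GB from the example below, this yields
# yarn.nodemanager.resource.memory-mb = 106496 and mapreduce.map.java.opts = -Xmx6553m,
# matching the yarn-utils.py output later in the post.
```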
For example, take a machine with 128G of memory and a 32-core CPU, with 7 disks mounted and no HBase. Following the guidance above, 24G is reserved for the system, leaving 104G of usable memory, and the number of containers is calculated as follows:
containers = min(2 * 32, 1.8 * 7, (128 - 24) / 2) = min(64, 12.6, 52) = 13 (12.6 rounded up)
The ram-per-container value is calculated as follows:
ram-per-container = max(2, (128 - 24) / 13) = max(2, 8) = 8
The parameters for this cluster are therefore configured as follows:
| Configuration file | Configuration setting | Calculated value |
|---|---|---|
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | = 13 * 8 = 104G |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = 8G |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = 13 * 8 = 104G |
| yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb | = 2 * 8 = 16G |
| yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 8 = 12.8G |
| mapred-site.xml | mapreduce.map.memory.mb | = 8G |
| mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * 8 = 16G |
| mapred-site.xml | mapreduce.map.java.opts | = 0.8 * 8 = 6.4G |
| mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * 8 = 12.8G |
You can also use the script yarn-utils.py to calculate the above values:
python yarn-utils.py -c 32 -m 128 -d 7 -k False
The returned results are as follows:
Using cores=32 memory=128GB disks=7 hbase=False
Profile: cores=32 memory=106496MB reserved=24GB usableMem=104GB disks=7
Num Container=13
Container Ram=8192MB
Used Ram=104GB
Unused Ram=24GB
yarn.scheduler.minimum-allocation-mb=8192
yarn.scheduler.maximum-allocation-mb=106496
yarn.nodemanager.resource.memory-mb=106496
mapreduce.map.memory.mb=8192
mapreduce.map.java.opts=-Xmx6553m
mapreduce.reduce.memory.mb=8192
mapreduce.reduce.java.opts=-Xmx6553m
yarn.app.mapreduce.am.resource.mb=8192
yarn.app.mapreduce.am.command-opts=-Xmx6553m
mapreduce.task.io.sort.mb=3276
The corresponding XML configuration is:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx6553m</value>
</property>
In addition, there are a few other memory-related parameters:
- yarn.nodemanager.vmem-pmem-ratio: how much virtual memory a task may use for each MB of physical memory allocated to it; defaults to 2.1.
- yarn.nodemanager.pmem-check-enabled: whether to start a thread that checks the amount of physical memory each task is using; a task that exceeds its allocation is killed outright. Defaults to true.
- yarn.nodemanager.vmem-check-enabled: whether to start a thread that checks the amount of virtual memory each task is using; a task that exceeds its allocation is killed outright. Defaults to true.
The first parameter means that for a map task allocated 8G of physical memory, the container's heap is limited to 6.4G and its virtual memory may grow up to 8 * 2.1 = 16.8G. Also, by this calculation each node can run at most 104/8 = 13 map tasks, which seems rather low; this is mainly because too few disks are mounted. It may be more reasonable to manually lower ram-per-container to 4G or even less, but of course this should be decided by monitoring how the cluster actually runs.
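To make the arithmetic of that paragraph explicit, here is the same calculation as a tiny sketch; the 0.8 heap factor comes from the java.opts rule in the memory table, and the 2.1 ratio is the default mentioned above.

```python
container_mb = 8 * 1024                        # mapreduce.map.memory.mb for one map task
heap_mb = 0.8 * container_mb                   # -Xmx in mapreduce.map.java.opts: 6553.6 MB (~6.4 GB)
vmem_mb = 2.1 * container_mb                   # vmem-pmem-ratio limit: 17203.2 MB (~16.8 GB)
maps_per_node = (104 * 1024) // container_mb   # 104 GB usable per node -> 13 containers

print(heap_mb, vmem_mb, maps_per_node)         # 6553.6 17203.2 13
```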
CPU Configuration
YARN currently measures CPU in terms of virtual cores (vcores). The virtual core is a concept introduced by YARN itself: the CPU performance of different nodes may differ (one physical CPU may be, say, twice as fast as another), and you can compensate for this difference by configuring more virtual cores for the faster physical CPU. When submitting a job, a user can specify how many virtual cores each task needs.
In YARN, the CPU-related configuration parameters are as follows:
- yarn.nodemanager.resource.cpu-vcores: the number of virtual CPUs YARN can use on this node; defaults to 8. It is recommended to set this to the number of physical CPU cores. If the node has fewer than 8 cores, you need to lower this value, since YARN does not probe the node's physical core count on its own.
- yarn.scheduler.minimum-allocation-vcores: the minimum number of virtual CPUs a single task may request; defaults to 1. A request for fewer vcores is rounded up to this value.
- yarn.scheduler.maximum-allocation-vcores: the maximum number of virtual CPUs a single task may request; defaults to 32.
For a cluster with many CPU cores, the default configuration above is clearly inappropriate. My test cluster has 4 nodes, each with 32 CPU cores, so it can be configured as:
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>128</value>
</property>
Summary
Following the guidelines above, I configured the nodes in my test cluster accordingly, including the physical memory, virtual memory, and number of CPU cores allocated on each node.
In an actual production environment, the configuration may not be exactly as above; for example, you might not allocate all of a node's CPU cores to Spark, leaving one core for the system to use, and you would likewise set an upper limit on memory.