YARN memory and CPU configuration


2015-06-05 · javachen's Blog · Original: http://blog.javachen.com/2015/06/05/yarn-memory-and-cpu-configuration.html

Hadoop YARN schedules two kinds of resources, memory and CPU. This article describes how to configure YARN's memory and CPU usage.

As the resource scheduler, YARN has to take into account the compute resources of every machine in the cluster and then allocate containers according to what each application requests. The container is the basic unit of resource allocation in YARN; it holds a certain amount of memory and CPU.

In a YARN cluster it is important to balance memory, CPU, and disk resources. As a rule of thumb, allowing two containers per disk and per CPU core gives the best overall utilization of the cluster.

Memory configuration

For memory-related configuration, you can refer to Hortonworks' documentation, Determine HDP Memory Configuration Settings, to configure your cluster.

The memory made available to YARN and MapReduce should exclude what the operating system and other Hadoop services (such as HBase) need, i.e. total reserved memory = system memory + HBase memory.

You can refer to the following table to determine how much memory should be reserved:

Total memory per machine | Reserved for the system | Reserved for HBase
4GB | 1GB | 1GB
8GB | 2GB | 1GB
16GB | 2GB | 2GB
24GB | 4GB | 4GB
48GB | 6GB | 8GB
64GB | 8GB | 8GB
72GB | 8GB | 8GB
96GB | 12GB | 16GB
128GB | 24GB | 24GB
256GB | 32GB | 32GB
512GB | 64GB | 64GB
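
For illustration, the reserved-memory table can be written as a small lookup. Below is a minimal Python sketch of the table above; the function name and the handling of machines larger than 512GB are my own assumptions, not part of any official tool:

# Reserved memory per node, taken from the table above.
# Returns (system_reserved_gb, hbase_reserved_gb) for a node's total RAM in GB.
def reserved_memory_gb(total_ram_gb, with_hbase=True):
    table = [
        (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4), (48, 6, 8),
        (64, 8, 8), (72, 8, 8), (96, 12, 16), (128, 24, 24),
        (256, 32, 32), (512, 64, 64),
    ]
    for threshold_gb, system_gb, hbase_gb in table:
        if total_ram_gb <= threshold_gb:
            return system_gb, (hbase_gb if with_hbase else 0)
    return 64, (64 if with_hbase else 0)  # assumption: above 512GB, reserve at least the last row

# Example: the 128GB machine used later in this article, without HBase.
print(reserved_memory_gb(128, with_hbase=False))  # -> (24, 0)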

To calculate the maximum number of containers per machine, use the following formula:

containers = min(2 * CORES, 1.8 * DISKS, (total available RAM) / MIN_CONTAINER_SIZE)

Where:

    • CORES: the number of CPU cores on the machine
    • DISKS: the number of disks mounted on the machine
    • Total available RAM: the machine's total memory minus the reserved memory
    • MIN_CONTAINER_SIZE: the minimum container size, which depends on the available RAM; refer to the following table:
RAM available per machine | Minimum container size
Less than 4GB | 256MB
4GB to 8GB | 512MB
8GB to 24GB | 1024MB
Greater than 24GB | 2048MB

The memory to allocate to each container is then calculated as follows:

RAM-per-container = max(MIN_CONTAINER_SIZE, (total available RAM) / containers)
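
Putting the two formulas together, here is a minimal Python sketch of the calculation. This is my own illustration of the formulas above, not Hortonworks' yarn-utils.py; in particular, rounding the container count to the nearest integer is an assumption made so that the worked example below comes out to 13 containers.

# Minimum container size in MB, from the table above.
def min_container_size_mb(available_ram_gb):
    if available_ram_gb < 4:
        return 256
    if available_ram_gb < 8:
        return 512
    if available_ram_gb <= 24:
        return 1024
    return 2048

# containers = min(2 * CORES, 1.8 * DISKS, available RAM / MIN_CONTAINER_SIZE)
# RAM-per-container = max(MIN_CONTAINER_SIZE, available RAM / containers)
def container_plan(cores, disks, total_ram_gb, reserved_gb):
    available_gb = total_ram_gb - reserved_gb
    min_size_gb = min_container_size_mb(available_gb) / 1024.0
    containers = int(round(min(2 * cores, 1.8 * disks, available_gb / min_size_gb)))
    ram_per_container_gb = max(min_size_gb, available_gb // containers)
    return containers, ram_per_container_gb

# Worked example from later in the article: 128GB RAM, 32 cores, 7 disks, 24GB reserved.
print(container_plan(32, 7, 128, 24))  # -> (13, 8)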

With the above calculations, YARN and MapReduce can be configured as follows:

Configuration File | Configuration Setting | Default Value | Calculated Value
yarn-site.xml | yarn.nodemanager.resource.memory-mb | 8192 MB | = containers * RAM-per-container
yarn-site.xml | yarn.scheduler.minimum-allocation-mb | 1024 MB | = RAM-per-container
yarn-site.xml | yarn.scheduler.maximum-allocation-mb | 8192 MB | = containers * RAM-per-container
yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb | 1536 MB | = 2 * RAM-per-container
yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts | -Xmx1024m | = 0.8 * 2 * RAM-per-container
mapred-site.xml | mapreduce.map.memory.mb | 1024 MB | = RAM-per-container
mapred-site.xml | mapreduce.reduce.memory.mb | 1024 MB | = 2 * RAM-per-container
mapred-site.xml | mapreduce.map.java.opts | | = 0.8 * RAM-per-container
mapred-site.xml | mapreduce.reduce.java.opts | | = 0.8 * 2 * RAM-per-container
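
To make the mapping concrete, the sketch below derives the nine settings from the table, given the containers and RAM-per-container values computed earlier (values in MB). The helper name is my own, and it follows the table's formulas exactly; note that the yarn-utils.py output shown later in this article chooses a smaller heap for the ApplicationMaster.

# Derive the YARN / MapReduce memory settings listed in the table above.
def memory_settings(containers, ram_per_container_mb):
    heap_mb = int(0.8 * ram_per_container_mb)  # JVM heap inside each container
    return {
        # yarn-site.xml
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container_mb,
        "yarn.app.mapreduce.am.resource.mb": 2 * ram_per_container_mb,
        "yarn.app.mapreduce.am.command-opts": "-Xmx%dm" % (2 * heap_mb),
        # mapred-site.xml
        "mapreduce.map.memory.mb": ram_per_container_mb,
        "mapreduce.reduce.memory.mb": 2 * ram_per_container_mb,
        "mapreduce.map.java.opts": "-Xmx%dm" % heap_mb,
        "mapreduce.reduce.java.opts": "-Xmx%dm" % (2 * heap_mb),
    }

# Example: 13 containers of 8192MB each, as in the worked example below.
for name, value in memory_settings(13, 8192).items():
    print("%s=%s" % (name, value))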

For example: a machine with 128GB of RAM, a 32-core CPU, and 7 mounted disks, not running HBase. Following the instructions above, the system reserves 24GB, leaving 104GB available, and the containers value is calculated as follows:

containers = min(2 * 32, 1.8 * 7, (128 - 24) / 2) = min(64, 12.6, 52) ≈ 13

The RAM-per-container value is then calculated as follows:

RAM-per-container = max(2, (128 - 24) / 13) = max(2, 8) = 8

This gives the following parameter values for the cluster:

Configuration File | Configuration Setting | Calculated Value
yarn-site.xml | yarn.nodemanager.resource.memory-mb | = 13 * 8 = 104G
yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = 8G
yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = 13 * 8 = 104G
yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb | = 2 * 8 = 16G
yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 8 = 12.8G
mapred-site.xml | mapreduce.map.memory.mb | = 8G
mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * 8 = 16G
mapred-site.xml | mapreduce.map.java.opts | = 0.8 * 8 = 6.4G
mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * 8 = 12.8G

You can also use the script yarn-utils.py to calculate the above values:

python yarn-utils.py -c 32 -m 128 -d 7 -k False

The returned results are as follows:

Using cores=32 memory=128GB disks=7 hbase=False
Profile: cores=32 memory=106496MB reserved=24GB usableMem=104GB disks=7
Num Container=13
Container Ram=8192MB
Used Ram=104GB
Unused Ram=24GB
yarn.scheduler.minimum-allocation-mb=8192
yarn.scheduler.maximum-allocation-mb=106496
yarn.nodemanager.resource.memory-mb=106496
mapreduce.map.memory.mb=8192
mapreduce.map.java.opts=-Xmx6553m
mapreduce.reduce.memory.mb=8192
mapreduce.reduce.java.opts=-Xmx6553m
yarn.app.mapreduce.am.resource.mb=8192
yarn.app.mapreduce.am.command-opts=-Xmx6553m
mapreduce.task.io.sort.mb=3276

The corresponding XML configuration is:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx6553m</value>
</property>

In addition, there are several parameters:

    • yarn.nodemanager.vmem-pmem-ratio: the amount of virtual memory a task may use for each MB of physical memory it is allocated; defaults to 2.1.
    • yarn.nodemanager.pmem-check-enabled: whether to run a thread that checks the physical memory each task is using and kills the task if it exceeds its allocation; defaults to true.
    • yarn.nodemanager.vmem-check-enabled: whether to run a thread that checks the virtual memory each task is using and kills the task if it exceeds its allocation; defaults to true.

The first parameter means that when a map task is allocated a total of 8GB of physical memory, its container may use up to 6.4GB of JVM heap and up to 8 * 2.1 = 16.8GB of virtual memory. By this calculation, each node can run 104 / 8 = 13 map tasks concurrently, which seems low; this is mainly because the number of disks we mounted is small. Would it be more reasonable to manually lower RAM-per-container to 4GB or less? That can only be decided by monitoring how the cluster actually runs.
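
As a quick sketch of the arithmetic in the previous paragraph (the 2.1 ratio and the 0.8 heap factor are the values discussed above; 104GB is this node's yarn.nodemanager.resource.memory-mb):

# Limits for a single 8GB map container under the settings above.
ram_per_container_mb = 8192
heap_mb = 0.8 * ram_per_container_mb        # ~6553MB (~6.4GB) of JVM heap
vmem_limit_mb = 2.1 * ram_per_container_mb  # ~17203MB (~16.8GB) of virtual memory
node_memory_mb = 104 * 1024                 # yarn.nodemanager.resource.memory-mb
concurrent_containers = node_memory_mb // ram_per_container_mb  # 13 containers per node
print(heap_mb, vmem_limit_mb, concurrent_containers)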

CPU Configuration

YARN currently measures CPU in virtual cores (vcores). The virtual core is a concept introduced by YARN itself, motivated by the fact that CPU performance may differ between nodes while each vcore should represent the same amount of compute. For example, if one physical CPU is twice as powerful as another, you can compensate for the difference by configuring more vcores on the node with the faster CPU. When submitting a job, a user can specify how many vcores each task requires.

In YARN, the CPU-related configuration parameters are as follows:

    • yarn.nodemanager.resource.cpu-vcores: the number of virtual cores YARN may use on this node; defaults to 8. It is recommended to set this to the number of physical CPU cores. If the node has fewer than 8 physical cores you must lower this value yourself, because YARN does not probe the node's physical core count.
    • yarn.scheduler.minimum-allocation-vcores: the minimum number of virtual cores a single task may request; defaults to 1. A request for fewer vcores is raised to this value.
    • yarn.scheduler.maximum-allocation-vcores: the maximum number of virtual cores a single task may request; defaults to 32.

For a cluster whose machines have many CPU cores, the defaults above are clearly unsuitable. In my test cluster, each of the 4 nodes has 32 CPU cores, which can be configured as:

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>128</value>
</property>
Summary

Following the guidance above, I configured the nodes of my test cluster, setting the physical memory, virtual memory, and number of CPU cores allocated on each node according to the formulas and parameters described in this article.

In a real production environment the settings may differ from the above; for example, you might not hand all of a node's CPU cores to Spark, leaving one core for the system, and you might likewise set a cap on memory.

