2015-06-05 · javachen's Blog · Original: http://blog.javachen.com/2015/06/05/yarn-memory-and-cpu-configuration.html · Tag: YARN
Hadoop YARN schedules two kinds of resources, memory and CPU. This article describes how to configure memory and CPU usage for YARN.
As a resource scheduler, YARN keeps track of the computing resources of every machine in the cluster and allocates containers according to the resources requested by applications. The container is the basic unit of resource allocation in YARN; each container holds a certain amount of memory and CPU.
In a YARN cluster, it is important to keep memory, CPU, and disk resources in balance. As a rule of thumb, two containers per disk and per CPU core gives the best overall utilization of the cluster.
Memory Configuration
For memory-related configuration, you can refer to Hortonworks' documentation, Determine HDP Memory Configuration Settings, to configure your cluster.
The memory available to YARN and MapReduce should exclude what the operating system and other Hadoop components (such as HBase) need; that is, reserved memory = system memory + HBase memory.
You can refer to the following table to determine how much memory to reserve:
| Total RAM per machine | Reserved for the system | Reserved for HBase |
|---|---|---|
| 4GB | 1GB | 1GB |
| 8GB | 2GB | 1GB |
| 16GB | 2GB | 2GB |
| 24GB | 4GB | 4GB |
| 48GB | 6GB | 8GB |
| 64GB | 8GB | 8GB |
| 72GB | 8GB | 8GB |
| 96GB | 12GB | 16GB |
| 128GB | 24GB | 24GB |
| 255GB | 32GB | 32GB |
| 512GB | 64GB | 64GB |
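For reference, the table above can also be expressed as a small lookup. This is only a sketch: the helper name and the choice to round down to the nearest table row are my own.

```python
# Reserved memory per node, taken from the table above (values in GB):
# (total RAM on the machine, reserved for the system, reserved for HBase)
RESERVED_MEMORY_GB = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4),
    (48, 6, 8), (64, 8, 8), (72, 8, 8), (96, 12, 16),
    (128, 24, 24), (255, 32, 32), (512, 64, 64),
]

def reserved_memory_gb(total_ram_gb, with_hbase=True):
    """Memory (GB) to keep away from YARN/MapReduce on a node of the given size."""
    system, hbase = 1, 1
    for ram, system_gb, hbase_gb in RESERVED_MEMORY_GB:
        if total_ram_gb >= ram:          # use the closest row at or below this node's RAM
            system, hbase = system_gb, hbase_gb
    return system + hbase if with_hbase else system

# e.g. a 128 GB node without HBase -> 24 GB reserved, as in the example later on.
```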
To calculate the maximum number of containers per machine, you can use the following formula:
containers = min(2 * CORES, 1.8 * DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
where:
- CORES is the number of CPU cores on the machine
- DISKS is the number of disks mounted on the machine
- total available RAM is the machine's total memory minus the reserved memory
- MIN_CONTAINER_SIZE is the minimum container size, which should be chosen according to the available RAM; refer to the following table:
| RAM available per machine | Minimum container size |
|---|---|
| Less than 4GB | 256MB |
| 4GB to 8GB | 512MB |
| 8GB to 24GB | 1024MB |
| Greater than 24GB | 2048MB |
The memory allotted to each container is then calculated as follows:
ram-per-container = max(MIN_CONTAINER_SIZE, (total available RAM) / containers)
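As a minimal sketch, the two formulas above can be written out in Python. The function names are mine, and rounding 1.8 * DISKS up is an assumption made so that the result matches the 13 containers of the worked example further below (yarn-utils.py appears to round the same way).

```python
import math

def min_container_size_gb(available_ram_gb):
    """Minimum container size (GB), following the table above."""
    if available_ram_gb < 4:
        return 0.25   # 256 MB
    if available_ram_gb <= 8:
        return 0.5    # 512 MB
    if available_ram_gb <= 24:
        return 1      # 1024 MB
    return 2          # 2048 MB

def yarn_sizing(cores, disks, total_ram_gb, reserved_gb):
    """Apply the containers and ram-per-container formulas above."""
    available = total_ram_gb - reserved_gb
    min_size = min_container_size_gb(available)
    containers = int(min(2 * cores, math.ceil(1.8 * disks), available / min_size))
    ram_per_container = max(min_size, available // containers)
    return containers, ram_per_container

# The 128 GB / 32-core / 7-disk node used later in the post, with 24 GB reserved:
print(yarn_sizing(32, 7, 128, 24))   # -> (13, 8): 13 containers of 8 GB each
```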
Based on the calculations above, YARN and MapReduce can be configured as follows:
| Configuration file | Configuration setting | Default value | Calculated value |
|---|---|---|---|
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | 8192 MB | = containers * ram-per-container |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | 1024 MB | = ram-per-container |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | 8192 MB | = containers * ram-per-container |
| yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb | 1536 MB | = 2 * ram-per-container |
| yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts | -Xmx1024m | = 0.8 * 2 * ram-per-container |
| mapred-site.xml | mapreduce.map.memory.mb | 1024 MB | = ram-per-container |
| mapred-site.xml | mapreduce.reduce.memory.mb | 1024 MB | = 2 * ram-per-container |
| mapred-site.xml | mapreduce.map.java.opts | | = 0.8 * ram-per-container |
| mapred-site.xml | mapreduce.reduce.java.opts | | = 0.8 * 2 * ram-per-container |
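The last column of the table is purely mechanical, so it can also be written as a short helper. This is only an illustration of the mapping above; the function name is mine.

```python
def yarn_mapreduce_settings(containers, ram_per_container_gb):
    """Derive the settings of the table above; memory values in MB."""
    mb = int(ram_per_container_gb * 1024)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * mb,
        "yarn.scheduler.minimum-allocation-mb": mb,
        "yarn.scheduler.maximum-allocation-mb": containers * mb,
        "yarn.app.mapreduce.am.resource.mb": 2 * mb,
        "yarn.app.mapreduce.am.command-opts": "-Xmx%dm" % int(0.8 * 2 * mb),
        "mapreduce.map.memory.mb": mb,
        "mapreduce.reduce.memory.mb": 2 * mb,
        "mapreduce.map.java.opts": "-Xmx%dm" % int(0.8 * mb),
        "mapreduce.reduce.java.opts": "-Xmx%dm" % int(0.8 * 2 * mb),
    }

# With the 13 containers of 8 GB from the example below, this yields
# yarn.nodemanager.resource.memory-mb = 106496 and mapreduce.map.java.opts = -Xmx6553m,
# matching the yarn-utils.py output later in the post.
```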
For example, take a machine with 128G of memory and a 32-core CPU, with 7 disks mounted and no HBase. Following the guidance above, 24G is reserved for the system, leaving 104G of usable memory, and the number of containers is calculated as follows:
containers = min(2 * 32, 1.8 * 7, (128 - 24) / 2) = min(64, 12.6, 52) = 13 (12.6 rounded up)
The ram-per-container value is calculated as follows:
ram-per-container = max(2, (128 - 24) / 13) = max(2, 8) = 8
The parameters for this cluster are therefore configured as follows:
| Configuration file | Configuration setting | Calculated value |
|---|---|---|
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | = 13 * 8 = 104G |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = 8G |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = 13 * 8 = 104G |
| yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb | = 2 * 8 = 16G |
| yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 8 = 12.8G |
| mapred-site.xml | mapreduce.map.memory.mb | = 8G |
| mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * 8 = 16G |
| mapred-site.xml | mapreduce.map.java.opts | = 0.8 * 8 = 6.4G |
| mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * 8 = 12.8G |
You can also use the script yarn-utils.py to calculate the above values:
python yarn-utils.py -c 32 -m 128 -d 7 -k False
The returned results are as follows:
Using cores=32 memory=128GB disks=7 hbase=False
Profile: cores=32 memory=106496MB reserved=24GB usableMem=104GB disks=7
Num Container=13
Container Ram=8192MB
Used Ram=104GB
Unused Ram=24GB
yarn.scheduler.minimum-allocation-mb=8192
yarn.scheduler.maximum-allocation-mb=106496
yarn.nodemanager.resource.memory-mb=106496
mapreduce.map.memory.mb=8192
mapreduce.map.java.opts=-Xmx6553m
mapreduce.reduce.memory.mb=8192
mapreduce.reduce.java.opts=-Xmx6553m
yarn.app.mapreduce.am.resource.mb=8192
yarn.app.mapreduce.am.command-opts=-Xmx6553m
mapreduce.task.io.sort.mb=3276
The corresponding XML configuration is:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx6553m</value>
</property>
In addition, there are a few other memory-related parameters:
- yarn.nodemanager.vmem-pmem-ratio: how much virtual memory a task may use for each MB of physical memory allocated to it; defaults to 2.1.
- yarn.nodemanager.pmem-check-enabled: whether to start a thread that checks the amount of physical memory each task is using; a task that exceeds its allocation is killed outright. Defaults to true.
- yarn.nodemanager.vmem-check-enabled: whether to start a thread that checks the amount of virtual memory each task is using; a task that exceeds its allocation is killed outright. Defaults to true.
The first parameter means that for a map task allocated 8G of physical memory, the container's heap is limited to 6.4G and its virtual memory may grow up to 8 * 2.1 = 16.8G. Also, by this calculation each node can run at most 104/8 = 13 map tasks, which seems rather low; this is mainly because too few disks are mounted. It may be more reasonable to manually lower ram-per-container to 4G or even less, but of course this should be decided by monitoring how the cluster actually runs.
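To make the arithmetic of that paragraph explicit, here is the same calculation as a tiny sketch; the 0.8 heap factor comes from the java.opts rule in the memory table, and the 2.1 ratio is the default mentioned above.

```python
container_mb = 8 * 1024                        # mapreduce.map.memory.mb for one map task
heap_mb = 0.8 * container_mb                   # -Xmx in mapreduce.map.java.opts: 6553.6 MB (~6.4 GB)
vmem_mb = 2.1 * container_mb                   # vmem-pmem-ratio limit: 17203.2 MB (~16.8 GB)
maps_per_node = (104 * 1024) // container_mb   # 104 GB usable per node -> 13 containers

print(heap_mb, vmem_mb, maps_per_node)         # 6553.6 17203.2 13
```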
CPU Configuration
YARN currently measures CPU in terms of virtual cores (vcores). The virtual core is a concept introduced by YARN itself: the CPU performance of different nodes may differ (one physical CPU may be, say, twice as fast as another), and you can compensate for this difference by configuring more virtual cores for the faster physical CPU. When submitting a job, a user can specify how many virtual cores each task needs.
In YARN, the CPU-related configuration parameters are as follows:
- yarn.nodemanager.resource.cpu-vcores: the number of virtual CPUs YARN can use on this node; defaults to 8. It is recommended to set this to the number of physical CPU cores. If the node has fewer than 8 cores, you need to lower this value, since YARN does not probe the node's physical core count on its own.
- yarn.scheduler.minimum-allocation-vcores: the minimum number of virtual CPUs a single task may request; defaults to 1. A request for fewer vcores is rounded up to this value.
- yarn.scheduler.maximum-allocation-vcores: the maximum number of virtual CPUs a single task may request; defaults to 32.
For a cluster with many CPU cores, the default configuration above is clearly inappropriate. My test cluster has 4 nodes, each with 32 CPU cores, so it can be configured as:
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>128</value>
</property>
Summary
Following the guidelines above, I configured the nodes in my test cluster accordingly, including the physical memory, virtual memory, and number of CPU cores allocated on each node.
In an actual production environment, the configuration may not be exactly as above; for example, you might not allocate all of a node's CPU cores to Spark, leaving one core for the system to use, and you would likewise set an upper limit on memory.