YARN memory configuration involves many settings; this article only gives recommendations and reference values, and the final configuration should be tuned to the actual business workload.
First, be clear that in YARN the resources of the whole cluster are determined by three factors: memory, disk, and CPU (number of cores), and the three must be kept in balance. In real production environments the disks are usually large enough that they are rarely a limiting factor, so disk is treated here only as a secondary reference factor.
When calculating a node's memory, you need to account for the memory required by the operating system, by the NodeManager (NM), and by any other services running on the node (HBase is used as the example below).
So: YARN available memory = total node memory − memory reserved for the operating system − memory reserved for HBase.
Reference values for the operating system and HBase reservations are listed below:
| Total memory per node | Reserved for the operating system | Reserved for HBase |
| --- | --- | --- |
| 4 GB | 1 GB | 1 GB |
| 8 GB | 2 GB | 1 GB |
| 16 GB | 2 GB | 2 GB |
| 24 GB | 4 GB | 4 GB |
| 48 GB | 6 GB | 8 GB |
| 64 GB | 8 GB | 8 GB |
| 72 GB | 8 GB | 8 GB |
| 96 GB | 12 GB | 16 GB |
| 128 GB | 24 GB | 24 GB |
| 256 GB | 32 GB | 32 GB |
| 512 GB | 64 GB | 64 GB |
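As a sketch, the reservation table and the availability formula can be expressed in Python. The dictionary values come from the table above; the fall-back lookup for node sizes not listed in the table is an assumption of this sketch, not part of the original recommendation:

```python
# Reference reservations in GB, keyed by total node memory (GB),
# taken from the table above: (reserved for OS, reserved for HBase).
RESERVATIONS = {
    4: (1, 1), 8: (2, 1), 16: (2, 2), 24: (4, 4), 48: (6, 8),
    64: (8, 8), 72: (8, 8), 96: (12, 16), 128: (24, 24),
    256: (32, 32), 512: (64, 64),
}

def yarn_available_gb(total_gb, hbase_installed):
    """YARN available memory = total - OS reservation - HBase reservation."""
    # For sizes not in the table, fall back to the closest row at or
    # below the node's size (an assumption made by this sketch).
    key = max(k for k in RESERVATIONS if k <= total_gb)
    os_gb, hbase_gb = RESERVATIONS[key]
    return total_gb - os_gb - (hbase_gb if hbase_installed else 0)
```

For the 64 GB node without HBase used in the worked example later in this article, this yields 64 − 8 = 56 GB of YARN-available memory.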
Next, the maximum number of containers per node can be estimated with the following formula:

containers = min(2 × CORES, 1.8 × DISKS, YARN available memory / minimum container memory)
The minimum container memory depends on the YARN available memory; the recommended values are:

| Available memory per node | Recommended minimum container memory |
| --- | --- |
| Less than 4 GB | 256 MB |
| Between 4 GB and 8 GB | 512 MB |
| Between 8 GB and 24 GB | 1024 MB |
| Above 24 GB | 2048 MB |
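The lookup above can be written as a small helper. This is a sketch; the handling of the exact boundary values (4, 8 and 24 GB) is an assumption, since the table does not say which side the boundaries fall on:

```python
def min_container_mb(available_gb):
    """Recommended minimum container memory (MB) for a node's
    YARN-available memory, per the table above."""
    if available_gb < 4:
        return 256
    if available_gb < 8:
        return 512
    if available_gb < 24:
        return 1024
    return 2048
```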
Using the reference values and the formula above, we can calculate the number of containers per node; the memory available to each container is then given by:

memory per container = max(minimum container memory, YARN available memory / number of containers)
With the calculations above, the recommended YARN and MapReduce memory configuration is as follows:
| Configuration file | Configuration item | Value |
| --- | --- | --- |
| yarn-site.xml | yarn.nodemanager.resource.memory-mb | = number of containers × memory per container |
| yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = memory per container |
| yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = number of containers × memory per container |
| mapred-site.xml | mapreduce.map.memory.mb | = memory per container |
| mapred-site.xml | mapreduce.reduce.memory.mb | = 2 × memory per container |
| mapred-site.xml | mapreduce.map.java.opts | = 0.8 × memory per container |
| mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 × 2 × memory per container |
| yarn-site.xml | yarn.app.mapreduce.am.resource.mb | = 2 × memory per container |
| yarn-site.xml | yarn.app.mapreduce.am.command-opts | = 0.8 × 2 × memory per container |
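The table above can be turned into a sketch that derives all nine values from the container count and per-container memory. The property names are real YARN/MapReduce settings; the helper itself is illustrative:

```python
def recommended_settings(containers, container_mb):
    """Derive the recommended configuration values from the table above."""
    # Heap size (-Xmx) is 0.8 x the container's memory allocation.
    heap = lambda mb: f"-Xmx{int(0.8 * mb)}m"
    return {
        "yarn.nodemanager.resource.memory-mb": containers * container_mb,
        "yarn.scheduler.minimum-allocation-mb": container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * container_mb,
        "mapreduce.map.memory.mb": container_mb,
        "mapreduce.reduce.memory.mb": 2 * container_mb,
        "mapreduce.map.java.opts": heap(container_mb),
        "mapreduce.reduce.java.opts": heap(2 * container_mb),
        "yarn.app.mapreduce.am.resource.mb": 2 * container_mb,
        "yarn.app.mapreduce.am.command-opts": heap(2 * container_mb),
    }
```

Note that the sample yarn-util.py output later in this article uses 1 × container memory for the reduce settings rather than the 2 × recommended in the table above.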
HDP also publishes a Python script, yarn-util.py, to simplify the calculation. It takes four parameters:

| Parameter | Description |
| --- | --- |
| -c CORES | Number of CPU cores per node |
| -m MEMORY | Total memory per node (in GB) |
| -d DISKS | Number of disks per node |
| -k HBASE | True if HBase is installed, otherwise False |
For example, for a node with a 16-core CPU, 64 GB of memory, 4 disks, and no HBase installed, the recommended configuration is calculated as follows:
Using cores=16 memory=64GB disks=4 hbase=False
Profile: cores=16 memory=57344MB reserved=8GB usableMem=56GB disks=4
Num Container=8
Container Ram=7168MB
Used Ram=56GB
Unused Ram=8GB
yarn.scheduler.minimum-allocation-mb=7168
yarn.scheduler.maximum-allocation-mb=57344
yarn.nodemanager.resource.memory-mb=57344
mapreduce.map.memory.mb=7168
mapreduce.map.java.opts=-Xmx5734m
mapreduce.reduce.memory.mb=7168
mapreduce.reduce.java.opts=-Xmx5734m
yarn.app.mapreduce.am.resource.mb=7168
yarn.app.mapreduce.am.command-opts=-Xmx5734m
mapreduce.task.io.sort.mb=2867
The script can be downloaded here: yarn-util.py