YARN memory allocation and management mechanism and related parameter configuration
Understanding YARN's memory management and allocation mechanism is particularly important when building and deploying clusters and when developing and maintaining applications. I have done some research on it, summarized here for your reference.
I. Related configurations
YARN memory allocation and management mainly involves the ResourceManager (RM), ApplicationMaster (AM), and NodeManager (NM), and the related tuning revolves closely around these components. There is also the concept of a Container; for now, think of it as the container in which a map/reduce task runs, described in detail later.
1.1 RM memory resource configuration, which is related to resource scheduling
RM1: yarn.scheduler.minimum-allocation-mb
Minimum memory that can be requested for a single container
RM2: yarn.scheduler.maximum-allocation-mb
Maximum memory that can be requested for a single container
Note:
- The minimum value can be used to calculate the maximum number of containers on a node (see the sketch at the end of section 1.2).
- Once set, the values cannot be changed dynamically.
1.2 NM memory resource configuration, which is related to hardware resources
NM1: yarn.nodemanager.resource.memory-mb
Maximum memory available to YARN on the node
NM2: yarn.nodemanager.vmem-pmem-ratio
Ratio of virtual memory to physical memory, 2.1 by default
Note:
- Neither RM1 nor RM2 may be greater than NM1.
- NM1 determines the maximum number of containers on a node: max(containers) = NM1 / RM1 (see the sketch below).
- Once set, the values cannot be changed dynamically.
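As a quick illustration, the formula in the note above can be checked with a few lines of Python (the values are examples only, not recommendations):

```python
# Illustrative check of the note above; the values are examples, not recommendations.
RM1 = 1024    # yarn.scheduler.minimum-allocation-mb (MB)
NM1 = 24576   # yarn.nodemanager.resource.memory-mb (MB)

# A node can host at most NM1 / RM1 containers of the minimum size.
max_containers = NM1 // RM1
print(max_containers)  # 24
```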
1.3 AM memory configuration parameters, which are task-related
AM1: mapreduce.map.memory.mb
Memory size allocated to each map task Container
AM2: mapreduce.reduce.memory.mb
Memory size allocated to each reduce task Container
Note:
- Both values should fall between RM1 and RM2.
- AM2 is recommended to be twice AM1.
- Both values can be changed when a job is submitted.
AM3: mapreduce.map.java.opts
JVM parameters for running a map task, such as -Xmx and -Xms
AM4: mapreduce.reduce.java.opts
JVM parameters for running a reduce task, such as -Xmx and -Xms
Note:
- These heap settings should be smaller than AM1 and AM2 respectively, since the JVM runs inside the Container (a summary sketch of these constraints follows this list).
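The relationships among the parameters in sections 1.1 through 1.3 can be summarized as a few checks. The following is a minimal sketch with placeholder values; the *.xmx keys are a made-up shorthand for the -Xmx value embedded in mapreduce.map(reduce).java.opts:

```python
# A minimal sketch of the constraints described in sections 1.1-1.3.
# All values are placeholders; the "*.xmx" keys are a made-up shorthand
# for the -Xmx value embedded in mapreduce.map(reduce).java.opts.
conf = {
    "yarn.scheduler.minimum-allocation-mb": 1024,  # RM1
    "yarn.scheduler.maximum-allocation-mb": 8192,  # RM2
    "mapreduce.map.memory.mb": 1536,               # AM1
    "mapreduce.reduce.memory.mb": 3072,            # AM2
    "mapreduce.map.java.opts.xmx": 1024,           # heap inside the map Container
    "mapreduce.reduce.java.opts.xmx": 2560,        # heap inside the reduce Container
}

rm1 = conf["yarn.scheduler.minimum-allocation-mb"]
rm2 = conf["yarn.scheduler.maximum-allocation-mb"]
am1 = conf["mapreduce.map.memory.mb"]
am2 = conf["mapreduce.reduce.memory.mb"]

assert rm1 <= am1 <= rm2 and rm1 <= am2 <= rm2  # AM1/AM2 lie between RM1 and RM2
assert am2 == 2 * am1                           # reduce Container twice the map Container
assert conf["mapreduce.map.java.opts.xmx"] < am1     # JVM heap fits inside its Container
assert conf["mapreduce.reduce.java.opts.xmx"] < am2
print("all constraints satisfied")
```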
II. Understanding these configuration concepts
Knowing the parameters is not enough; you also need to understand how they come into play when memory is allocated. The following figure shows the meaning of each parameter.
As shown in the figure, first look at the brown part at the bottom:
The AM parameter mapreduce.map.memory.mb=1536MB indicates that the AM requests 1536 MB for each map Container, but the memory actually allocated by the RM is 2048 MB, because yarn.scheduler.minimum-allocation-mb=1024MB defines a minimum allocation of 1024 MB and requests are rounded up to a multiple of it; 1536 MB exceeds one multiple, so the actual allocation is 2048 MB (this involves the normalization factor, introduced at the end of this article).
The AM parameter mapreduce.map.java.opts=-Xmx1024m indicates that the JVM heap for running the map task is 1024 MB. Because the map task runs inside the Container, this value should be slightly smaller than mapreduce.map.memory.mb=1536MB.
The NM parameter yarn.nodemanager.vmem-pmem-ratio=2.1 indicates that the NodeManager grants each map/reduce Container 2.1 times its physical memory as virtual memory, so the virtual memory limit for the map Container above is 2048 * 2.1 = 4300.8 MB. If the memory actually used exceeds this value, the NM kills the Container and the task fails with an exception.
The AM parameter mapreduce.reduce.memory.mb=3072MB sets the reduce Container size to 3072 MB, while the map Container size is 1536 MB, following the recommendation that the reduce Container be twice the size of the map Container.
The NM parameter yarn.nodemanager.resource.memory-mb=24576MB indicates the memory available to the NodeManager, that is, the memory on this node that can be used to run YARN tasks. It should be set according to the actual server memory; for example, if a machine in the Hadoop cluster has 128 GB of memory, we can allocate 80% of it to YARN, roughly 102 GB.
The two RM parameters are 8192 MB and 1024 MB respectively, the maximum and minimum Container sizes that the AM can request for map/reduce tasks.
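To make the arithmetic of this walkthrough concrete, here is a small sketch of the calculation. It assumes the FIFO/Capacity scheduler convention, under which the normalization factor equals yarn.scheduler.minimum-allocation-mb (see the appendix):

```python
import math

# Reproduces the arithmetic of the walkthrough above. Assumes the FIFO/Capacity
# scheduler convention, where the normalization factor equals the minimum
# allocation (see the appendix on the normalization factor).
minimum_allocation_mb = 1024  # yarn.scheduler.minimum-allocation-mb
vmem_pmem_ratio = 2.1         # yarn.nodemanager.vmem-pmem-ratio

requested_mb = 1536           # mapreduce.map.memory.mb
allocated_mb = math.ceil(requested_mb / minimum_allocation_mb) * minimum_allocation_mb
print(allocated_mb)           # 2048 -- the RM rounds 1536 up to a multiple of 1024

vmem_limit_mb = allocated_mb * vmem_pmem_ratio
print(vmem_limit_mb)          # ~4300.8 -- using more virtual memory gets the Container killed
```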
III. Task submission process
3.1 Task submission process
3.2 About the Container
(1) The Container is YARN's abstraction of resources. It encapsulates a certain amount of resources on a node (CPU and memory). It has nothing to do with Linux Containers; it is simply a concept proposed by YARN (in the implementation it can be seen as a Java class that can be serialized/deserialized).
(2) Containers are requested by the ApplicationMaster from the ResourceManager, and the resource scheduler inside the ResourceManager allocates them to the ApplicationMaster asynchronously;
(3) Containers are launched by the ApplicationMaster, which contacts the NodeManager where the allocated resources reside. To launch a Container, the AM must provide the command to execute inside it (any command will do, such as a Java, Python, or C++ process start command), along with the environment variables and external resources (such as dictionary files, executables, and jar packages) that the command needs.
In addition, the Containers required by an application fall into two categories:
(1) The Container that runs the ApplicationMaster: this one is requested (from the internal resource scheduler) and started by the ResourceManager; when submitting an application, the user can specify the resources required by this unique ApplicationMaster Container;
(2) The Containers that run the application's tasks: these are requested by the ApplicationMaster from the ResourceManager, and the ApplicationMaster communicates with the NodeManager to start them.
Both types of Container may be placed on any node, and their locations are generally random; that is, the ApplicationMaster may end up on the same node as the tasks it manages.
The Container is one of the most important concepts in YARN; understanding it is essential to understanding YARN's resource model.
Note: map/reduce tasks run inside the Container, which is why the mapreduce.map(reduce).memory.mb values mentioned above must be larger than the corresponding mapreduce.map(reduce).java.opts values.
IV. HDP platform parameter optimization suggestions
Based on the knowledge above, we can set the relevant parameters according to our actual situation, and of course we still need to verify and adjust them continually during testing.
The following are the configuration suggestions provided by Hortonworks:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_installing_manually_book/content/rpm-chap1-11.html
4.1 Memory Allocation
Reserved Memory = Reserved for stack memory + Reserved for HBase Memory (If HBase is on the same node)
The total system memory is 126 GB; 24 GB is reserved for the operating system, and if HBase is present another 24 GB is reserved for HBase.
The following calculation assumes that HBase is deployed on the DataNode nodes.
4.2 Container count calculation:
MIN_CONTAINER_SIZE = 2048 MB
# of containers = min(2 * CORES, 1.8 * DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)
# of containers = min(2 * 12, 1.8 * 12, (78 * 1024) / 2048)
# of containers = min(24, 21.6, 39)
# of containers = 22
Container memory calculation:
RAM-per-container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
RAM-per-container = max(2048, (78 * 1024) / 22)
RAM-per-container = 3630 MB
4.3 YARN and MapReduce parameter configuration:
yarn.nodemanager.resource.memory-mb = containers * RAM-per-container
yarn.scheduler.minimum-allocation-mb = RAM-per-container
yarn.scheduler.maximum-allocation-mb = containers * RAM-per-container
mapreduce.map.memory.mb = RAM-per-container
mapreduce.reduce.memory.mb = 2 * RAM-per-container
mapreduce.map.java.opts = 0.8 * RAM-per-container
mapreduce.reduce.java.opts = 0.8 * 2 * RAM-per-container

With the values calculated above:

yarn.nodemanager.resource.memory-mb = 22 * 3630 MB = 79860 MB
yarn.scheduler.minimum-allocation-mb = 3630 MB
yarn.scheduler.maximum-allocation-mb = 22 * 3630 MB = 79860 MB
mapreduce.map.memory.mb = 3630 MB
mapreduce.reduce.memory.mb = 2 * 3630 MB = 7260 MB
mapreduce.map.java.opts = 0.8 * 3630 MB = 2904 MB
mapreduce.reduce.java.opts = 0.8 * 2 * 3630 MB = 5808 MB
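For convenience, the recipe can be scripted. The sketch below uses the same assumptions as the example above (12 cores, 12 disks, 78 GB available to YARN); its rounding choices are mine and may differ slightly from Hortonworks' companion utility:

```python
# A sketch of the Hortonworks sizing recipe above, under the same assumptions
# as the example (12 cores, 12 disks, 78 GB of RAM left for YARN). The rounding
# choices here are illustrative and may differ from Hortonworks' own utility.
CORES = 12
DISKS = 12
total_available_ram_mb = 78 * 1024
MIN_CONTAINER_SIZE = 2048  # MB

containers = round(min(2 * CORES, 1.8 * DISKS,
                       total_available_ram_mb / MIN_CONTAINER_SIZE))  # 22
ram_per_container = max(MIN_CONTAINER_SIZE,
                        total_available_ram_mb // containers)         # 3630

settings = {
    "yarn.nodemanager.resource.memory-mb": containers * ram_per_container,
    "yarn.scheduler.minimum-allocation-mb": ram_per_container,
    "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container,
    "mapreduce.map.memory.mb": ram_per_container,
    "mapreduce.reduce.memory.mb": 2 * ram_per_container,
    "mapreduce.map.java.opts": f"-Xmx{int(0.8 * ram_per_container)}m",
    "mapreduce.reduce.java.opts": f"-Xmx{int(0.8 * 2 * ram_per_container)}m",
}
for name, value in settings.items():
    print(f"{name} = {value}")
```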
Appendix: introduction to the normalization factor
To simplify resource management and scheduling, Hadoop YARN has a built-in resource normalization algorithm. It specifies the minimum allocatable resource, the maximum allocatable resource, and a resource normalization factor. If the amount an application requests is less than the minimum, YARN raises it to the minimum; in other words, an application never obtains fewer resources than it requested, but not necessarily exactly as many. If the amount requested is greater than the maximum, an exception is thrown and the request cannot succeed. The normalization factor is used to round up the requested resources: if the request is not an integer multiple of the factor, it is changed to the smallest integer multiple above it, i.e. ceil(a/b) * b, where a is the requested resource and b is the normalization factor.
For example, in yarn-site.xml the related parameters are as follows:
- yarn.scheduler.minimum-allocation-mb: minimum memory that can be requested; the default is 1024
- yarn.scheduler.minimum-allocation-vcores: minimum number of vcores that can be requested; the default is 1
- yarn.scheduler.maximum-allocation-mb: maximum memory that can be requested; the default is 8192
- yarn.scheduler.maximum-allocation-vcores: maximum number of vcores that can be requested; the default is 4
The normalization factor differs between schedulers, as follows:
- FIFO and Capacity Scheduler: the normalization factor equals the minimum amount of resources that can be requested and cannot be configured separately.
- Fair Scheduler: the normalization factor is configured through the parameters yarn.scheduler.increment-allocation-mb and yarn.scheduler.increment-allocation-vcores; the defaults are 1024 and 1 respectively.
As described above, the resources an application actually obtains may be greater than the resources it requested. For example, if YARN's minimum allocatable memory is 1024 and the normalization factor is 1024, an application requesting 1500 MB of memory will get 2048 MB; if the normalization factor is 512, it will get 1536 MB.
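The ceil(a/b) * b rule and this example can be captured in a short sketch of the described behavior (not YARN's actual implementation):

```python
import math

def normalize(requested_mb, minimum_mb, maximum_mb, factor_mb):
    """Round a memory request the way the normalization rule above describes:
    raise it to the minimum, then round up to a multiple of the factor;
    requests above the maximum are rejected."""
    if requested_mb > maximum_mb:
        raise ValueError("request exceeds the maximum allocatable resource")
    requested_mb = max(requested_mb, minimum_mb)
    return math.ceil(requested_mb / factor_mb) * factor_mb

print(normalize(1500, 1024, 8192, 1024))  # 2048
print(normalize(1500, 1024, 8192, 512))   # 1536
```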