MapReduce on YARN: A Simple Explanation of Memory Allocation

Source: Internet
Author: User

How memory is allocated when a MapReduce program runs on YARN has long confused me, and no single source I read made it clear on its own. Recently I went through a lot of material, pieced the explanations together, and finally reached a reasonably clear understanding, so I am writing it down here as a simple record in case I forget.
First, here are the MapReduce and YARN parameters related to memory allocation:
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.vmem-pmem-ratio
yarn.scheduler.increment-allocation-mb
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.java.opts
In my view, for MapReduce tasks these parameters can only really be understood when they are considered together; looked at one at a time, they stay unclear. Each one is explained in detail below.
I. Understanding the parameters yarn.nodemanager.resource.memory-mb and yarn.nodemanager.vmem-pmem-ratio
yarn.nodemanager.resource.memory-mb is simple: it is the amount of memory on the node that you make available for YARN to allocate.
yarn.nodemanager.vmem-pmem-ratio is usually described online as "for every 1 MB of physical memory used, the maximum amount of virtual memory that may be used; default 2.1". I still do not fully understand its role, so if you know more, please explain it in detail.
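Reading the quoted definition literally, a container's virtual-memory ceiling is its physical-memory limit multiplied by the ratio. The sketch below assumes exactly that model and assumes the NodeManager's virtual-memory check is turned on; the function names are hypothetical and do not come from Hadoop.

```python
def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual-memory ceiling for a container, assuming limit = physical limit * ratio."""
    return container_mb * vmem_pmem_ratio


def exceeds_vmem_limit(used_vmem_mb: float, container_mb: int,
                       vmem_pmem_ratio: float = 2.1) -> bool:
    """True if observed virtual memory is above the assumed ceiling.

    Under this model, such a container would be killed by the NodeManager.
    """
    return used_vmem_mb > vmem_limit_mb(container_mb, vmem_pmem_ratio)


# A 1024 MB container with the default ratio of 2.1 may use about 2150 MB of virtual memory.
print(vmem_limit_mb(1024))
print(exceeds_vmem_limit(2500, 1024))  # True: 2500 MB is above roughly 2150 MB
```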
II. Understanding the parameters yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb
When a program runs on YARN, each task runs in its own container. These two parameters set the minimum and maximum memory that a single container may request. Note that they do not determine how much memory a single container requests; they only define the allowed range.
III. Understanding YARN's memory normalization factor and normalization algorithm
Setting aside for now which parameter is involved, let's first understand the concept with an example.
Suppose the normalization factor is B = 512 MB, yarn.scheduler.minimum-allocation-mb (mentioned above) is 1024, and yarn.scheduler.maximum-allocation-mb is 8192. I then plan to request an amount of memory A for a single map task (mapreduce.map.memory.mb):
When the request is A = 1000 MB, the container actually gets 1024 MB (a request smaller than yarn.scheduler.minimum-allocation-mb is automatically raised to yarn.scheduler.minimum-allocation-mb);
When the request is A = 1500 MB, the container actually gets 1536 MB, computed as ceiling(A / B) × B: ceiling(1500 / 512) = 3, and 3 × 512 = 1536. If B were 1024 instead, the container would get 2048 MB.
In other words, the actual container size is at least yarn.scheduler.minimum-allocation-mb, grows in steps of the normalization factor B, and never exceeds yarn.scheduler.maximum-allocation-mb.
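The rule just stated can be sketched as a small function. This is a minimal sketch of the behavior described in this section, not Hadoop's source code; the names round_up and normalize are hypothetical. It also assumes an over-large request is simply capped, as described here, whereas a real ResourceManager may reject a request above the maximum instead.

```python
def round_up(value_mb: int, step_mb: int) -> int:
    """Round value_mb up to the next multiple of step_mb, i.e. ceiling(A / B) * B."""
    return -(-value_mb // step_mb) * step_mb


def normalize(requested_mb: int, min_mb: int, max_mb: int, step_mb: int) -> int:
    """Container size granted for a request, following the rules described above."""
    at_least_min = max(requested_mb, min_mb)   # below the minimum: raised to the minimum
    rounded = round_up(at_least_min, step_mb)  # then rounded up to a multiple of B
    return min(rounded, max_mb)                # never more than the maximum


# Cases from the example above (minimum 1024 MB; 8192 MB used as the cap):
print(normalize(1000, 1024, 8192, 512))   # 1024
print(normalize(1500, 1024, 8192, 512))   # 1536
print(normalize(1500, 1024, 8192, 1024))  # 2048
```

Integer floor division is used for the ceiling so that no floating-point rounding can creep into memory sizes.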
IV. Understanding mapreduce.map.memory.mb and mapreduce.reduce.memory.mb
The amount A that I "planned to request for a single map task" in Section III is in fact mapreduce.map.memory.mb (or mapreduce.reduce.memory.mb for a reduce task). Note that this value must not exceed yarn.scheduler.maximum-allocation-mb.
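To connect this with Section I, the sketch below checks a task's request against the maximum and then estimates how many containers of a given size fit into a node's yarn.nodemanager.resource.memory-mb. The helper names are hypothetical; the estimate considers memory only, assumes the container size has already been normalized as in Section III, and ignores CPU (vcores) and any other limits the scheduler may apply.

```python
def check_task_request(task_mb: int, max_alloc_mb: int) -> None:
    """Reject a mapreduce.{map,reduce}.memory.mb value above yarn.scheduler.maximum-allocation-mb."""
    if task_mb > max_alloc_mb:
        raise ValueError(
            f"task requests {task_mb} MB, above the {max_alloc_mb} MB maximum")


def containers_per_node(node_mb: int, container_mb: int) -> int:
    """Memory-only estimate of how many containers of this size fit on one node."""
    return node_mb // container_mb


check_task_request(1500, 8192)          # within range, so nothing is raised
print(containers_per_node(8192, 1536))  # 5 containers of 1536 MB fit in 8192 MB
```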
V. Understanding mapreduce.map.java.opts and mapreduce.reduce.java.opts
Take a map task as an example. The container actually executes a script file, and that script launches a Java child process, which is the real map task. mapreduce.map.java.opts holds the startup options passed to that JVM; its default, -Xmx200m, sets the maximum heap the Java program may use. If the program needs more heap than this, the JVM throws an OutOfMemoryError and the process terminates. mapreduce.map.memory.mb, by contrast, is the container's memory limit. It is read and enforced by the NodeManager: when the container's memory usage exceeds this value, the NodeManager kills the container. How the NodeManager monitors container memory (both virtual and physical) and kills containers is tied to the yarn.nodemanager.vmem-pmem-ratio parameter from Section I.
In other words, the heap size set in mapreduce.map.java.opts must be smaller than mapreduce.map.memory.mb.
The same reasoning applies to mapreduce.reduce.java.opts and mapreduce.reduce.memory.mb.
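Because the JVM also needs memory outside the heap (thread stacks, class metadata, native buffers), the -Xmx value should leave some headroom below the container limit. The sketch below, with hypothetical helper names, parses -Xmx from a java.opts string and checks it against mapreduce.map.memory.mb (or the reduce equivalents). The 0.8 ratio is only a common rule of thumb; newer Hadoop releases expose a similar default through mapreduce.job.heap.memory-mb.ratio, but check the documentation for your version.

```python
import re
from typing import Optional


def parse_xmx_mb(java_opts: str) -> Optional[int]:
    """Return the first -Xmx heap size in MB from a java.opts string, or None if absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "m":
        return value
    if unit == "k":
        return value // 1024
    return value // (1024 * 1024)  # no suffix: the JVM reads the value as bytes


def heap_fits(java_opts: str, container_mb: int) -> bool:
    """True if the configured -Xmx is strictly below the container limit."""
    heap_mb = parse_xmx_mb(java_opts)
    if heap_mb is None:
        raise ValueError("no -Xmx found in java.opts")
    return heap_mb < container_mb


def suggested_heap_mb(container_mb: int, ratio: float = 0.8) -> int:
    """Rule-of-thumb heap size: a fraction of the container, leaving room for non-heap memory."""
    return int(container_mb * ratio)


print(heap_fits("-Xmx1228m", 1536))  # True
print(heap_fits("-Xmx2g", 1536))     # False: a 2048 MB heap exceeds a 1536 MB container
print(suggested_heap_mb(1536))       # 1228
```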
VI. Understanding which parameter the normalization factor refers to
Which parameter the normalization factor B from Section III refers to depends on the scheduler YARN uses. There are three schedulers: the Capacity Scheduler (the default), the Fair Scheduler, and the FIFO Scheduler.
With the Capacity Scheduler or the FIFO Scheduler, the normalization factor is yarn.scheduler.minimum-allocation-mb itself and cannot be configured separately; that is, yarn.scheduler.increment-allocation-mb has no effect.
With the Fair Scheduler, the normalization factor is yarn.scheduler.increment-allocation-mb.
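Putting Sections III and VI together, the sketch below picks B according to the scheduler and applies it to the 1500 MB request from the earlier example. The scheduler labels "capacity", "fifo", and "fair" and the plain-dict configuration are hypothetical stand-ins, not Hadoop APIs; the cap at the maximum allocation is left out for brevity.

```python
from typing import Mapping


def normalization_factor_mb(scheduler: str, conf: Mapping[str, int]) -> int:
    """Normalization factor B for the given scheduler, following the rules above."""
    if scheduler in ("capacity", "fifo"):
        # Capacity and FIFO: B is the minimum allocation; the increment setting is ignored.
        return conf["yarn.scheduler.minimum-allocation-mb"]
    if scheduler == "fair":
        # Fair Scheduler: B is the separately configurable increment.
        return conf["yarn.scheduler.increment-allocation-mb"]
    raise ValueError(f"unknown scheduler: {scheduler!r}")


conf = {
    "yarn.scheduler.minimum-allocation-mb": 1024,
    "yarn.scheduler.increment-allocation-mb": 512,
}
request_mb = 1500  # A, e.g. mapreduce.map.memory.mb
for name in ("capacity", "fifo", "fair"):
    b = normalization_factor_mb(name, conf)
    at_least_min = max(request_mb, conf["yarn.scheduler.minimum-allocation-mb"])
    rounded = -(-at_least_min // b) * b  # ceiling(A / B) * B
    print(name, b, rounded)  # capacity and fifo: B=1024 -> 2048; fair: B=512 -> 1536
```

The same request therefore lands in a 2048 MB container under the Capacity or FIFO Scheduler but in a 1536 MB container under the Fair Scheduler with this configuration.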
That concludes the memory configuration of YARN and MapReduce tasks, at least as far as my current understanding goes.
