YARN Memory Configuration Guide

Source: Internet
Author: User
Tags python script

YARN has many memory-related settings. This article offers only general recommendations; the actual values should be tuned to your specific workload.

First, note that in YARN the resources of the whole cluster are determined by three factors: memory, disk, and CPU (number of cores). The three must be balanced against each other. In production environments the disks are usually large enough that they are rarely the constraint, so here the number of disks is treated only as a reference factor.

When calculating the memory available on a node, account for the memory needed by the operating system, by the NodeManager (NM), and by any other services running on that node. HBase is used as the example below.

So: YARN available memory = total system memory - memory reserved for the operating system - memory reserved for HBase

Reference values for the memory to reserve for the operating system and for HBase are as follows:

Total memory per node    Reserved for operating system    Reserved for HBase
4 GB                     1 GB                             1 GB
8 GB                     2 GB                             1 GB
16 GB                    2 GB                             2 GB
24 GB                    4 GB                             4 GB
48 GB                    6 GB                             8 GB
64 GB                    8 GB                             8 GB
72 GB                    8 GB                             8 GB
96 GB                    12 GB                            16 GB
128 GB                   24 GB                            24 GB
256 GB                   32 GB                            32 GB
512 GB                   64 GB                            64 GB
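
The subtraction described above can be sketched in Python. This is only an illustration: the function name `yarn_available_memory_gb` is hypothetical, and the reserved amounts are passed in by the caller rather than looked up from the table.

```python
def yarn_available_memory_gb(total_gb, os_reserved_gb, hbase_reserved_gb=0):
    """Return the memory (in GB) left for YARN on one node.

    hbase_reserved_gb defaults to 0 for nodes that do not run HBase.
    """
    available = total_gb - os_reserved_gb - hbase_reserved_gb
    if available <= 0:
        raise ValueError("reserved memory exceeds total memory")
    return available


# A 64 GB node with 8 GB reserved for the operating system and no HBase.
print(yarn_available_memory_gb(64, 8))      # 56
# An 8 GB node with 2 GB for the operating system and 1 GB for HBase.
print(yarn_available_memory_gb(8, 2, 1))    # 5
```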

Next, the maximum number of containers per node can be calculated with the following formula:

Containers = min(2 * CPU cores, 1.8 * disks, YARN available memory / minimum container memory)

The minimum memory per container depends on the YARN available memory. The recommended relationship between the two is as follows:
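
A minimal Python sketch of this formula follows. The function name `max_containers` is hypothetical. The formula itself does not say how to round, so rounding the disk term up and the memory term down are assumptions here.

```python
import math


def max_containers(cores, disks, available_mb, min_container_mb):
    """Maximum containers per node: the smallest of the three limits."""
    by_cpu = 2 * cores
    by_disk = math.ceil(1.8 * disks)             # assumption: round up
    by_memory = available_mb // min_container_mb  # assumption: round down
    return min(by_cpu, by_disk, by_memory)


# 16 cores, 4 disks, 56 GB (57344 MB) available, 2048 MB minimum container.
print(max_containers(16, 4, 57344, 2048))  # min(32, 8, 28) = 8
```

In this example the disk count is the limiting factor, which shows why the three resources need to be balanced.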

Available memory per node    Recommended minimum container memory
Less than 4 GB               256 MB
Between 4 GB and 8 GB        512 MB
Between 8 GB and 24 GB       1024 MB
Above 24 GB                  2048 MB
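
The lookup above could be written as a small Python function. The name `min_container_mb` is hypothetical. The 512 MB value and the 24 GB boundary follow the commonly published version of this table; treat them, and the way exact boundary values (4, 8, 24 GB) are assigned, as assumptions.

```python
def min_container_mb(available_gb):
    """Recommended minimum container size (MB) for a node's available memory (GB)."""
    if available_gb < 4:
        return 256
    if available_gb <= 8:     # assumption: 8 GB falls in the 512 MB band
        return 512
    if available_gb <= 24:    # assumption: 24 GB falls in the 1024 MB band
        return 1024
    return 2048


for gb in (2, 6, 16, 56):
    print(gb, "GB ->", min_container_mb(gb), "MB")
```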

Using the reference values and formula above, we can calculate the number of containers per node. The memory available to each container can then be obtained with the following formula:

Memory per container = max(minimum container memory, YARN available memory / number of containers)

Based on the calculations above, the recommended YARN and MapReduce (MR) memory settings are as follows:
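
As a short Python sketch of this step (the function name `memory_per_container_mb` is hypothetical, and integer division is an assumption, since the formula does not specify rounding):

```python
def memory_per_container_mb(available_mb, containers, min_container_mb):
    """Memory per container: an even share, but never below the minimum."""
    even_share = available_mb // containers  # assumption: round down
    return max(min_container_mb, even_share)


# 57344 MB available, 8 containers, 2048 MB minimum container size.
print(memory_per_container_mb(57344, 8, 2048))  # 7168
```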

Configuration file    Configuration item                      Value
yarn-site.xml         yarn.nodemanager.resource.memory-mb     = number of containers * memory per container
yarn-site.xml         yarn.scheduler.minimum-allocation-mb    = memory per container
yarn-site.xml         yarn.scheduler.maximum-allocation-mb    = number of containers * memory per container
mapred-site.xml       mapreduce.map.memory.mb                 = memory per container
mapred-site.xml       mapreduce.reduce.memory.mb              = 2 * memory per container
mapred-site.xml       mapreduce.map.java.opts                 = 0.8 * memory per container
mapred-site.xml       mapreduce.reduce.java.opts              = 0.8 * 2 * memory per container
mapred-site.xml       yarn.app.mapreduce.am.resource.mb       = 2 * memory per container
mapred-site.xml       yarn.app.mapreduce.am.command-opts      = 0.8 * 2 * memory per container
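
The table can be turned into a small Python helper that produces all nine values at once. This is a sketch, not an official tool: the function name `recommended_settings` is hypothetical, and writing the Java options as `-Xmx<N>m` heap flags with the fraction truncated is an assumption about how the "0.8 *" values are meant to be applied.

```python
def recommended_settings(containers, container_mb):
    """Build the recommended settings from the container count and size (MB)."""
    def heap(mb):
        # Assumption: express the JVM heap as an -Xmx flag in whole megabytes.
        return f"-Xmx{int(0.8 * mb)}m"

    return {
        "yarn.nodemanager.resource.memory-mb": containers * container_mb,
        "yarn.scheduler.minimum-allocation-mb": container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * container_mb,
        "mapreduce.map.memory.mb": container_mb,
        "mapreduce.reduce.memory.mb": 2 * container_mb,
        "mapreduce.map.java.opts": heap(container_mb),
        "mapreduce.reduce.java.opts": heap(2 * container_mb),
        "yarn.app.mapreduce.am.resource.mb": 2 * container_mb,
        "yarn.app.mapreduce.am.command-opts": heap(2 * container_mb),
    }


for name, value in recommended_settings(8, 7168).items():
    print(f"{name}={value}")
```

Keeping the heap at 80% of the container size leaves headroom for non-heap JVM memory, so the process stays within the container limit that YARN enforces.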

HDP also provides a Python script, yarn-util.py, to simplify the calculation. It takes four parameters:

Parameter     Description
-c CORES      Number of CPU cores per node
-m MEMORY     Total memory per node (in GB)
-d DISKS      Number of disks per node
-k HBASE      True if HBase is installed, False otherwise

For example, for a node with a 16-core CPU, 64 GB of memory, and 4 disks, without HBase installed, the script recommends the following configuration:

Using cores=16 memory=64GB disks=4 hbase=False
Profile: cores=16 memory=57344MB reserved=8GB usableMem=56GB disks=4
Num container=8
Container RAM=7168MB
Used RAM=56GB
Unused RAM=8GB
yarn.scheduler.minimum-allocation-mb=7168
yarn.scheduler.maximum-allocation-mb=57344
yarn.nodemanager.resource.memory-mb=57344
mapreduce.map.memory.mb=7168
mapreduce.map.java.opts=-Xmx5734m
mapreduce.reduce.memory.mb=7168
mapreduce.reduce.java.opts=-Xmx5734m
yarn.app.mapreduce.am.resource.mb=7168
yarn.app.mapreduce.am.command-opts=-Xmx5734m
mapreduce.task.io.sort.mb=2867
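
The numbers in this output can be checked by hand with the formulas given earlier. The Python snippet below is a rough verification, not the script itself. Two points are assumptions inferred from the output rather than stated in the article: the disk term is rounded up, and mapreduce.task.io.sort.mb is 0.4 times the container memory.

```python
import math

# Inputs from the example: 16 cores, 64 GB of memory, 4 disks, no HBase.
cores, total_gb, disks = 16, 64, 4
reserved_gb = 8           # reserved memory reported in the output above
min_container_mb = 2048   # recommended minimum for the largest memory band

available_mb = (total_gb - reserved_gb) * 1024
containers = min(
    2 * cores,                         # CPU limit: 32
    math.ceil(1.8 * disks),            # disk limit: 8 (assumption: rounded up)
    available_mb // min_container_mb,  # memory limit: 28
)
container_mb = max(min_container_mb, available_mb // containers)
heap_mb = int(0.8 * container_mb)
sort_mb = int(0.4 * container_mb)      # assumption: ratio inferred from the output

print(available_mb, containers, container_mb, heap_mb, sort_mb)
# 57344 8 7168 5734 2867
```

Note that the script reports the same memory for map and reduce tasks, whereas the table of recommendations above suggests twice the container memory for reduce tasks.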


The script can be downloaded here: yarn-util.py
