CDH Cluster Tuning: Memory, Vcores, and DRF

Original address: http://blog.selfup.cn/1631.html?utm_source=tuicool&utm_medium=referral

Rant

Recently "idle" to have nothing to do, through the CM to vcores use situation to look at a glance, found that no matter how many tasks in the cluster running, the allocated vcores will never exceed 120. The available vcores for the cluster are 360 (15 machines x24 virtual cores). That's equivalent to 1/3 of CPU resources, and as a semi-obsessive-compulsive disorder, this is something that can never be tolerated.

I will not go into the analysis process here; in the end it came down to just a few parameters. I had assumed CM would be smart enough to set these things sensibly on its own, but apparently not. The conclusions are recorded below.

DRF and related parameters

DRF: Dominant Resource Fairness, a policy that schedules resources fairly according to both CPU and memory. It is the scheduling policy that CDH's dynamic resource pools use by default. A simple way to understand it: when memory runs out, leftover CPU will not be assigned to tasks and sits idle; and when CPU runs out, leftover memory will not be used to start more tasks either.
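To make that concrete with this cluster's numbers: if a NodeManager offers 24 vcores but only 8 GB of memory to YARN, and every container requests 1 vcore plus 1 GB, then only 8 containers can start on that node; the other 16 vcores sit idle because memory, not CPU, runs out first.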

With this scheduling policy in mind, I went back over the resource-related parameters YARN uses when starting tasks and found that the following ones are relevant:

mapreduce.map.memory.mb - map task memory, CDH default 1G
mapreduce.map.cpu.vcores - map task virtual CPU cores, CDH default 1
mapreduce.reduce.memory.mb - reduce task memory, CDH default 1G
mapreduce.reduce.cpu.vcores - reduce task virtual CPU cores, CDH default 1
yarn.nodemanager.resource.memory-mb - container memory per NodeManager, CDH default 8G
yarn.nodemanager.resource.cpu-vcores - container virtual CPU cores per NodeManager, CDH default 8, although CM detects the actual core count and adjusts this automatically; mine had been set to 24.

You can see that with the default configuration, tasks are started at a ratio of 1 CPU core to 1 GB of memory.
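For reference, here is a rough sketch of these settings expressed as plain Hadoop properties, using the values described above (on CDH they are normally managed through CM rather than by editing mapred-site.xml or yarn-site.xml by hand):

<!-- mapred-site.xml: per-task resource requests (CDH defaults) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>1</value>
</property>

<!-- yarn-site.xml: resources each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>  <!-- still the 8 GB default -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>24</value>  <!-- auto-detected by CM on this cluster -->
</property>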

Then I looked at the memory allocated to YARN: it really was 8G (container memory per node) × 15 (cluster nodes) = 120G. The available memory was therefore far smaller than the available vcores (360), so at the 1:1G ratio a maximum of 120 vcores could ever be allocated. At this point, though, this was still just a guess.

Test

To confirm the guess, I raised yarn.nodemanager.resource.memory-mb to 16G (the machines have 128G of memory, so that is plenty), restarted YARN, and ran the MapReduce jobs again.
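A minimal sketch of the equivalent change expressed as a yarn-site.xml property (on CDH this is normally made through the NodeManager configuration in CM rather than by hand):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>  <!-- 16 GB per NodeManager available to containers -->
</property>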

After the adjustment, YARN's available memory went from 120G to 240G, and the vcores in use went from 120 to 240. So the guess was correct.

So for this cluster, since each machine has 128G of memory and 24 cores, yarn.nodemanager.resource.memory-mb can be raised to 24G (24 vcores × 1 GB per container = 24 GB), which lets all of the CPU be used.

Test Results

When yarn.nodemanager.resource.memory-mb is 8G:

Time taken: 3794.17 seconds
Total MapReduce CPU Time Spent: 3 days, hours, minutes, seconds, 640 msec

When yarn.nodemanager.resource.memory-mb is 16G:

Time taken: 2077.138 seconds
Total MapReduce CPU Time Spent: 3 days, hours, minutes, seconds

You can see it really is a lot faster. (PS: the two runs used different data, so that caching would not make a second run of the same task faster than the first; the amount of data involved in each run was roughly the same, about 650G.)

Other

SQL to view vcores:

select allocated_vcores_cumulative, available_vcores where category = YARN_POOL and serviceName = "YARN" and queueName = root
SQL to view the memory allocated to YARN:
select allocated_memory_mb_cumulative, available_memory_mb where category = YARN_POOL and serviceName = "YARN" and queueName = root
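(These look like Cloudera Manager tsquery statements; they can be pasted into the Chart Builder on CM's Charts page to plot the corresponding metrics.)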

Of course, the simplest way to check all of this is CM's "Dynamic Resource Pool" page.

------------------------------Split Line--------------------------

Blogger's note: this post makes it easy to understand the relationship between a Spark/YARN cluster's memory configuration and the number of vcores, and it is a very good write-up. From it we learn that the original author's cluster could initially use only a third of its vcores because the container memory setting in YARN was too small: every vcore in use needs 1G of memory to go with it, so the container memory capped the number of vcores that could be allocated.
