Hadoop Hierarchical Queue Organization

Author: Dong | Sina Weibo: XI Cheng understands | This article may be reprinted, provided the original source, the author information, and the copyright statement are indicated in the form of a hyperlink.
Website: http://dongxicheng.org/mapreduce/hadoop-hierarchy-queues/

In Hadoop 0.20.x and earlier versions, Hadoop uses a single-level queue organization. In this mode, administrators divide users into several flat queues. For each queue, one or more queue administrators can be designated to manage its users, for example by killing any user's jobs or changing job priorities. From a resource management perspective, however, organizing users into queues is not enough. Resources must also be divided among the queues and allocated according to some policy, which requires support from the Hadoop job scheduler. In short, queue organization is the foundation of queue management and resource allocation in Hadoop.

As Hadoop became more widely used, some users asked for support for hierarchical queue organization. A typical scenario: in a Hadoop cluster, the administrator divides all computing resources into several queues, each corresponding to an organization. One organization, "Org1", is allocated 60% of the resources and runs three types of jobs:

(1) Production jobs

(2) Experimental jobs, belonging to three separate projects: Proj1, Proj2, and Proj3

(3) Other types of jobs

The Org1 administrator wants finer control over this 60% of resources. For example, most resources should go to production jobs, while experimental jobs and other jobs still receive a minimum resource guarantee. Because production jobs are submitted infrequently, a production job must obtain resources immediately when it is submitted, and the remaining resources go to the other job types. Once the production job completes, experimental jobs and other jobs must immediately be able to obtain the resources it no longer uses. One possible configuration is as follows:

grid {
    Org1 min=60% {
        priority min=90% {
            production min=82%
            proj1 min=6% max=10%
            proj2 min=6%
            proj3 min=6%
        }
        miscellaneous min=10%
    }
    Org2 min=40%
}
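
Since each minimum capacity is a percentage of the parent queue's capacity, the share of the whole cluster that a leaf queue is entitled to is the product of the percentages along its path. The following is a minimal sketch of that arithmetic for the example configuration. The nested-dict layout and the function name absolute_shares are assumptions made for illustration and are not part of any Hadoop API.

```python
from fractions import Fraction

# Hypothetical in-memory form of the example configuration:
# name -> (minimum capacity in percent of the parent, sub-queues).
TREE = {
    "Org1": (60, {
        "priority": (90, {
            "production": (82, {}),
            "proj1": (6, {}),
            "proj2": (6, {}),
            "proj3": (6, {}),
        }),
        "miscellaneous": (10, {}),
    }),
    "Org2": (40, {}),
}


def absolute_shares(tree, parent_share=Fraction(1)):
    """Return {leaf queue name: guaranteed share of the whole cluster}."""
    shares = {}
    for name, (min_percent, children) in tree.items():
        # A queue's share is its percentage applied to its parent's share.
        share = parent_share * Fraction(min_percent, 100)
        if children:
            shares.update(absolute_shares(children, share))
        else:
            shares[name] = share
    return shares


if __name__ == "__main__":
    for name, share in absolute_shares(TREE).items():
        print(f"{name}: {float(share) * 100:.2f}% of the cluster")
```

Under these assumptions, production is entitled to 60% x 90% x 82% = 44.28% of the cluster, and the leaf shares add up to 100%.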

This leads to the hierarchical queue organization mode.

(1) Sub-queue

1) Queues can be nested, and each queue can contain sub-queues.

2) Jobs can only be submitted to the bottom-level queues, that is, the leaf queues.

(2) Minimum capacity

1) Each sub-queue has a "minimum capacity ratio" attribute, indicating the percentage of the parent queue's capacity that the sub-queue may use.

2) The scheduler always gives priority to the queue with the lowest resource usage relative to its minimum capacity and allocates resources to it first. For example, if two queues Q1 and Q2 at the same level both have a minimum capacity of 30, and Q1 is using 10 while Q2 is using 12, the scheduler allocates resources to Q1 first.

3) The minimum capacity is not "a minimum that is always reserved". That is, if a queue has a minimum capacity of 20 and the jobs in that queue use only 5, the remaining 15 may be allocated to other queues that need it.

4) The minimum capacity cannot be less than 0 and cannot be greater than the "maximum capacity".

(3) Maximum capacity

1) To prevent a queue from consuming too many resources, you can set a maximum capacity for it. This is an upper limit on resource usage: the total amount of resources the queue uses at any time cannot exceed this value.

2) By default, the maximum capacity of a queue is unlimited. This means that a queue that is allocated only 20% of resources may use 100% of resources when all other queues have no jobs. When jobs are later submitted to other queues, the extra resources are gradually returned.
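
The rules for minimum and maximum capacity can be summarized in a small sketch. It checks the constraint 0 <= minimum <= maximum and computes how much more a queue could take, given the free resources and its maximum. The function names validate_capacity and headroom are hypothetical, and capacities are treated as plain numbers in the same unit. This is an illustration of the rules described here, not scheduler code.

```python
import math


def validate_capacity(min_capacity, max_capacity=math.inf):
    """Check 0 <= minimum <= maximum; the maximum defaults to unlimited."""
    if min_capacity < 0:
        raise ValueError("minimum capacity cannot be less than 0")
    if min_capacity > max_capacity:
        raise ValueError("minimum capacity cannot exceed maximum capacity")
    return min_capacity, max_capacity


def headroom(used, max_capacity=math.inf, cluster_free=0.0):
    """Extra capacity a queue may still take.

    The queue is limited only by what is currently free and by its own
    maximum capacity; its minimum capacity is not an upper bound.
    """
    return max(0.0, min(cluster_free, max_capacity - used))
```

For example, a queue using its 20% minimum on an otherwise idle cluster has headroom(20, cluster_free=80) of 80 and can grow to 100%. With a maximum capacity of 50, headroom(20, max_capacity=50, cluster_free=80) is 30.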

How are a queue's resources allocated to its sub-queues?

When a TaskTracker sends a heartbeat requesting a new task, the scheduler selects a task according to the following policy:

1) Sort all sub-queues by the ratio {used capacity}/{minimum capacity}.

2) Select the queue with the smallest {used capacity}/{minimum capacity} ratio:

If it is a leaf queue with pending tasks, select one of its tasks (without exceeding the maximum capacity);

Otherwise, select a task recursively from the sub-queues of that queue.

3) If no task is found, move on to the next queue.
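
The selection policy can be sketched as a short recursive function. This is a simplified illustration under stated assumptions: the Queue class, its field names, and pick_task are hypothetical, the maximum capacity is checked only at leaf queues, and usage is not updated after a task is handed out. It is not the Capacity Scheduler's actual implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    min_capacity: float
    used: float = 0.0
    max_capacity: float = math.inf
    pending: list = field(default_factory=list)   # tasks, leaf queues only
    children: list = field(default_factory=list)  # sub-queues

    def ratio(self):
        """{used capacity}/{minimum capacity}; smaller means served first."""
        if self.min_capacity <= 0:
            return math.inf
        return self.used / self.min_capacity


def pick_task(queue):
    """Return (leaf queue name, task) for the next task, or None."""
    if not queue.children:
        # Leaf queue: hand out a pending task unless the maximum is reached.
        if queue.pending and queue.used < queue.max_capacity:
            return queue.name, queue.pending.pop(0)
        return None
    # Inner queue: try sub-queues in order of increasing ratio, moving on
    # to the next one whenever a sub-queue yields no task.
    for child in sorted(queue.children, key=Queue.ratio):
        found = pick_task(child)
        if found is not None:
            return found
    return None
```

With two sibling queues Q1 (minimum 30, used 10) and Q2 (minimum 30, used 12), each holding one pending task, pick_task on their parent returns Q1's task first, matching the earlier example.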

Hierarchical queue organization was introduced in 0.21.x and 0.22.x, but only the Capacity Scheduler supports it (https://issues.apache.org/jira/browse/MAPREDUCE-824). The latest YARN (Hadoop 0.23.x and 2.0.x-alpha) also adds hierarchical queue support to the Fair Scheduler. For more information, see https://issues.apache.org/jira/browse/yarn-187.

How is it configured?

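Taking 0.21.x as an example, the administrator can configure hierarchical queues in the configuration file mapred-queues.xml as follows:

[mapred-queues.xml listing: the XML markup did not survive extraction. The queue names that remain, in nesting order:]

Org1
    production
    miscellaneous
    ......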

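In capacity-scheduler.xml, administrators can set default values for some parameters, along with configuration specific to the Capacity Scheduler:

<configuration>
  <property>
    <name>mapred.capacity-scheduler.default-supports-priority</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-poll-interval</name>
    <value>5000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-worker-threads</name>
    <value>5</value>
  </property>
</configuration>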
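
Hadoop configuration files such as capacity-scheduler.xml use a configuration root element containing property elements, each with a name and a value. As a quick way to inspect such a file outside Hadoop, the sketch below reads the properties into a dictionary using Python's standard library. The function name load_properties and the shortened SAMPLE text are assumptions for illustration. Hadoop itself reads these files through its own Configuration class.

```python
import xml.etree.ElementTree as ET

# Shortened sample using two of the property names listed in this article.
SAMPLE = """<configuration>
  <property>
    <name>mapred.capacity-scheduler.default-supports-priority</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-poll-interval</name>
    <value>5000</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # Values are kept as strings; callers convert as needed.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    for key, value in load_properties(SAMPLE).items():
        print(f"{key} = {value}")
```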

