Author: those things | This article may be reproduced, provided the original source and author information are credited with a hyperlink.
Web: http://www.cnblogs.com/panfeng412/archive/2013/03/22/hadoop-capacity-scheduler-configuration.html
This post summarizes the configuration parameters of the Capacity Scheduler, drawing on the Capacity Scheduler Guide and on practical experience. The notes flagged below (marked in red in the original post) are mostly pitfalls I have fallen into myself; I hope they help.
mapred.capacity-scheduler.queue.<queue-name>.capacity: sets the capacity of each queue in the scheduler, expressed as the percentage of the cluster's slots that the queue occupies. Note that the sum of this setting across all queues must be less than or equal to 100; otherwise the jobtracker may fail to start.
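As a quick sanity check before restarting the jobtracker, the sum rule above can be verified with a few lines of Python. This is only a sketch: the queue names and percentages are made up for illustration, and check_capacities is a hypothetical helper, not part of Hadoop.

```python
# Hypothetical queue layout: queue name -> capacity (percent of cluster slots).
queue_capacities = {"default": 50, "etl": 30, "adhoc": 20}


def check_capacities(capacities):
    """Return the summed capacity, raising if the queues overcommit the cluster."""
    total = sum(capacities.values())
    if total > 100:
        raise ValueError(f"queue capacities sum to {total}, which exceeds 100")
    return total


total_capacity = check_capacities(queue_capacities)
print(f"total configured capacity: {total_capacity}%")
```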
mapred.capacity-scheduler.queue.<queue-name>.maximum-capacity: sets the maximum capacity each queue in the scheduler may grow to. The default value is -1, which means the queue may use up to 100% of the cluster's resources; this is equivalent to setting it to 100.
mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent: limits the percentage of slots each user can hold when several users in the queue compete for slots. For example, when minimum-user-limit-percent is set to 25 and four or more users in the queue submit jobs at the same time, the scheduler ensures that no user occupies more than 25% of the queue's slots. The default value is 100, which means users are not limited.
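The per-user limit can be worked out for different numbers of competing users. The sketch below assumes, as the Capacity Scheduler Guide describes, that with fewer users the queue is split equally among them and that minimum-user-limit-percent acts as the floor of that share; per_user_limit_percent is a hypothetical helper name.

```python
def per_user_limit_percent(active_users, minimum_user_limit_percent):
    """Upper bound on one user's share of the queue's slots, in percent.

    Assumption: the queue is divided equally among active users, but the
    share never drops below minimum-user-limit-percent.
    """
    if active_users < 1:
        raise ValueError("at least one active user is required")
    equal_share = 100.0 / active_users
    return max(equal_share, float(minimum_user_limit_percent))


# With minimum-user-limit-percent = 25, show the limit for 1 to 6 users.
for users in range(1, 7):
    print(users, round(per_user_limit_percent(users, 25), 1))
```

With the setting at 25, once four or more users compete the limit stays at 25%, which matches the example in the text.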
mapred.capacity-scheduler.queue.<queue-name>.user-limit-factor: sets the multiple of the queue's capacity that a single user in the queue may use. The default value is 1, which means a single user can occupy at most the queue's capacity (that is, mapred.capacity-scheduler.queue.<queue-name>.capacity). Note, therefore, that if only one user submits jobs to the queue and you want that user to be able to expand up to the number of slots given by mapred.capacity-scheduler.queue.<queue-name>.maximum-capacity when the cluster is not busy, you must increase user-limit-factor accordingly.
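To see how large user-limit-factor has to be, divide the queue's maximum-capacity by its capacity. The following is a minimal sketch under that reading of the text; required_user_limit_factor is a hypothetical helper and the numbers are illustrative.

```python
def required_user_limit_factor(capacity, maximum_capacity=-1):
    """Smallest user-limit-factor that lets one user reach maximum-capacity.

    capacity and maximum_capacity are percentages of cluster slots;
    maximum_capacity = -1 is treated as 100, as described in the text.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ceiling = 100 if maximum_capacity == -1 else maximum_capacity
    return ceiling / capacity


# A queue with capacity 20 and maximum-capacity 60 needs a factor of 3.
print(required_user_limit_factor(20, 60))
# With maximum-capacity left at -1, a queue with capacity 25 needs a factor of 4.
print(required_user_limit_factor(25))
```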
mapred.capacity-scheduler.queue.<queue-name>.supports-priority: sets whether the queue takes job priority into account when scheduling.
mapred.capacity-scheduler.maximum-system-jobs: sets the total number of jobs, across all queues in the scheduler, that can be initialized and run concurrently. Note that each queue derives its own limit on concurrently running jobs from its share of the cluster's slots (that is, mapred.capacity-scheduler.queue.<queue-name>.capacity). For example, if maximum-system-jobs is 20 and queue1 holds 10% of the cluster's resources, queue1 can run at most two jobs concurrently. If those two happen to be long-running jobs, every newly submitted job is held back by the jobtracker and cannot be initialized.
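The per-queue limit is easy to tabulate for a whole cluster. In this sketch the queue names and capacities are hypothetical and chosen so that the products are whole numbers, because the source does not say how the scheduler rounds fractional results.

```python
def max_initialized_jobs(maximum_system_jobs, capacity_percent):
    """Jobs a queue may initialize and run concurrently: the system-wide
    limit scaled by the queue's capacity (percent of cluster slots)."""
    return maximum_system_jobs * capacity_percent / 100


maximum_system_jobs = 20
# Hypothetical queues: name -> capacity in percent.
capacities = {"queue1": 10, "queue2": 30, "queue3": 60}

per_queue_limits = {
    name: max_initialized_jobs(maximum_system_jobs, pct)
    for name, pct in capacities.items()
}
for name, limit in per_queue_limits.items():
    print(name, limit)
```

A small queue such as queue1 ends up with room for only two jobs, so raising maximum-system-jobs or the queue's capacity is the way to avoid the blocking described above.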
mapred.capacity-scheduler.queue.<queue-name>.maximum-initialized-active-tasks: sets the upper limit on the total number of tasks across all jobs running concurrently in the queue. Once this limit is exceeded, new jobs submitted to the queue are queued and cached on disk.
mapred.capacity-scheduler.queue.<queue-name>.maximum-initialized-active-tasks-per-user: sets the upper limit on the total number of tasks across the jobs that a single user runs concurrently in the queue. Once this limit is exceeded, further jobs submitted by that user to the queue are queued and cached on disk.
mapred.capacity-scheduler.queue.<queue-name>.init-accept-jobs-factor: sets the multiplier applied to the queue's concurrent job limit (maximum-system-jobs * queue-capacity) to obtain the total number of jobs the queue will accept. For example, if maximum-system-jobs is 20, queue-capacity is 10%, and init-accept-jobs-factor is 10, then once the total number of jobs in the queue reaches 10 * (20 * 10%) = 20, new jobs are rejected by the jobtracker.
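The acceptance rule in the example above can be written out as a small check. This is a sketch of the arithmetic only, not of the jobtracker's code; accepted_jobs_limit and would_accept are hypothetical names.

```python
def accepted_jobs_limit(maximum_system_jobs, capacity_percent, init_accept_jobs_factor):
    """Total jobs a queue will hold before new submissions are rejected:
    init-accept-jobs-factor * (maximum-system-jobs * queue-capacity)."""
    concurrent_limit = maximum_system_jobs * capacity_percent / 100
    return init_accept_jobs_factor * concurrent_limit


def would_accept(jobs_in_queue, limit):
    """A new job is accepted only while the queue is below its limit."""
    return jobs_in_queue < limit


# Values from the example: 10 * (20 * 10%) = 20.
limit = accepted_jobs_limit(20, 10, 10)
print(limit)
print(would_accept(19, limit))  # still room for one more job
print(would_accept(20, limit))  # the limit is reached, so the job is rejected
```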