Hadoop Fair Scheduler Guide


I was recently looking into schedulers and noticed that the Fair Scheduler Guide and Capacity Scheduler Guide in the official Hadoop documentation have not yet been translated into Chinese, and Google turned up no Chinese version either. So I decided to take a first pass at it myself, starting with this Chinese version of the Fair Scheduler guide. Since I have been using Cloudera Hadoop 0.20.1+152, this version is based on the documentation shipped with it. The official Fair Scheduler has since changed:

    • The quota file has been removed; its parameters are all set in mapred-site.xml instead.
    • The task preemption feature has also been removed.
    • Some parameters have been added or removed.

However, the basic principles and settings are the same, so this document also applies to the new version of the Fair Scheduler.

============================================================

Purpose

This document describes the Fair Scheduler, a pluggable Map/Reduce scheduler for Hadoop that provides a way to share large clusters.

 

Introduction

Fair scheduling is a method of assigning resources to jobs such that, on average over time, all jobs get an equal share of resources. When a single job is running, it uses the entire cluster. When other jobs are submitted, task slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time. Unlike the default Hadoop scheduler, which maintains a single queue of jobs, this lets short jobs finish in a reasonable time without being "starved" behind long-running large jobs. It is also a simple way to share a cluster among multiple users. Fair sharing can also work with job priorities: the priorities are used as weights to determine the fraction of total compute time that each job should get.

The Fair Scheduler organizes jobs into resource pools and divides resources fairly between these pools. By default, there is a separate pool for each user, so that each user gets the same share of the cluster regardless of how many jobs they submit. It is also possible to set a job's pool based on the user's Unix group or any job configuration (jobconf) property. Within each pool, fair sharing is used to share capacity between the running jobs. Pools can also be given weights, to share the cluster non-proportionally.

In addition to fair sharing, the Fair Scheduler allows pools to be guaranteed minimum shares, which is useful for ensuring that certain users, groups, or production applications always get sufficient resources. When a pool contains jobs, it gets at least its minimum share; when the pool does not fully need its guaranteed share, the excess is split between the other pools.
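As a rough illustration of the minimum-share behaviour just described, here is a simplified sketch, not the scheduler's actual code; the pool fields `min` and `demand` are hypothetical names chosen for this example:

```python
def allocate_with_min_share(total_slots, pools):
    """Toy model of minimum shares: each pool first gets its guaranteed
    minimum (capped by what it actually demands), then leftover slots are
    handed out round-robin to pools whose demand is not yet satisfied."""
    alloc = {}
    remaining = total_slots
    for name, pool in pools.items():
        # A pool never receives more than it demands, even if its minimum is higher.
        alloc[name] = min(pool["min"], pool["demand"])
        remaining -= alloc[name]
    # Redistribute the excess, one slot at a time, to still-hungry pools.
    while remaining > 0:
        hungry = [n for n, p in pools.items() if alloc[n] < p["demand"]]
        if not hungry:
            break  # every pool is satisfied; remaining slots stay idle
        for name in hungry:
            if remaining == 0:
                break
            alloc[name] += 1
            remaining -= 1
    return alloc
```

For example, a pool guaranteed 5 slots but demanding only 3 yields its unused guarantee to the other pools.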

Under normal operation, when a new job is submitted, the Fair Scheduler waits for tasks in the running jobs to finish in order to free up slots for the new job. However, the scheduler also supports preemption of running jobs after configurable timeouts. If a new job has not received its minimum share within a certain time, it is allowed to kill tasks in running jobs to obtain the resources it needs. Preemption can thus be used to guarantee that "production" jobs run within a specified time while still letting the Hadoop cluster be used for experimental or research jobs. In addition, a job that has been below half of its fair share for a configurable timeout (generally larger than the minimum-share timeout) is also allowed to preempt tasks. When choosing tasks to kill, the scheduler picks the most recently launched tasks across all jobs, to minimize wasted computation. Preempted jobs do not fail, because Hadoop jobs tolerate losing tasks; they simply take longer to run.
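The two preemption conditions described above can be sketched as follows; this is a simplified model, and every field and function name here is illustrative rather than part of the scheduler's real API:

```python
def tasks_to_preempt(job, now, min_share_timeout, fair_share_timeout):
    """Toy model of the preemption rules: a job may preempt enough tasks to
    reach its minimum share once it has been below that share for
    min_share_timeout seconds, or enough to reach half its fair share once
    it has been below that level for fair_share_timeout seconds (the
    longer of the two timeouts)."""
    due = 0
    # Rule 1: below the pool's guaranteed minimum share for too long.
    if job["running"] < job["min_share"]:
        if now - job["below_min_share_since"] >= min_share_timeout:
            due = max(due, job["min_share"] - job["running"])
    # Rule 2: below half of the fair share for too long.
    if job["running"] < job["fair_share"] / 2:
        if now - job["below_half_fair_share_since"] >= fair_share_timeout:
            due = max(due, job["fair_share"] // 2 - job["running"])
    return due
```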

Finally, the Fair Scheduler can limit the number of concurrent running jobs per user and per pool. This is useful when a user must submit hundreds of jobs at once, or to ensure that intermediate data does not fill up the cluster's disk space when too many jobs run concurrently. Jobs over the limit wait in the scheduler's queue until earlier jobs from the same user/pool finish. Jobs within each user/pool run in order of priority and then submission time.

 

Install

To enable the Fair Scheduler in your Hadoop installation, you need to put it on the classpath. The simplest way is to copy hadoop-*-fairscheduler.jar from HADOOP_HOME/build/contrib/fairscheduler to HADOOP_HOME/lib. Alternatively, you can modify HADOOP_CLASSPATH in HADOOP_CONF_DIR/hadoop-env.sh to include the Fair Scheduler's jar.

You also need to set the following property in the Hadoop configuration file HADOOP_CONF_DIR/mapred-site.xml to have Hadoop use the Fair Scheduler:

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>

After restarting the cluster, you can check that the Fair Scheduler is running by visiting http://<jobtracker URL>/scheduler. You should see a "job scheduler administration" page. This page is described in the Management section.

If you want to compile the Fair Scheduler from source, run ant package in the HADOOP_HOME directory. This builds build/contrib/fair-scheduler/hadoop-*-fairscheduler.jar.

 

Configuration

The Fair Scheduler has two places for configuration: algorithm parameters are set in mapred-site.xml, while a separate XML file called the quota file (allocation file) can be used to configure pools, minimum shares, running-job limits, and preemption timeouts. The quota file is reloaded periodically at runtime, which lets you change pool settings without restarting the Hadoop cluster.

For a minimal installation that simply needs equal sharing between users, you do not need to configure the quota file. If you do configure one, you need to set the mapred.fairscheduler.allocation.file parameter in mapred-site.xml to tell the scheduler where to find it.

 

Scheduler parameters in mapred-site.xml

You can set the following parameters in mapred-site.xml to affect the behavior of the Fair Scheduler:

Basic parameters

mapred.fairscheduler.allocation.file

Specifies the absolute path to an XML file containing each pool's minimum shares, the per-pool limits on running jobs, and the preemption timeouts. If this property is not set, these features are not used. The quota file format is described later.

mapred.fairscheduler.preemption

Boolean property for enabling preemption. The default is false.

mapred.fairscheduler.poolnameproperty

Specifies which job configuration property is used to determine the pool that a job belongs to. String; the default is user.name (that is, one pool per user). Another useful value is group.name (one pool per Unix group). A common setting is to use a non-standard property, such as pool.name, as the pool name property, and then add the following to make user.name the default:

<property>
  <name>pool.name</name>
  <value>${user.name}</value>
</property>

This way, you can explicitly set the pool name for some jobs through their job configuration properties (for example, by passing -Dpool.name=<name> to bin/hadoop jar), while every other job falls into its default per-user pool.

Advanced Parameters

mapred.fairscheduler.sizebasedweight

Take job sizes into account when calculating a job's fair-share weight. By default, weights are based only on job priority. If this flag is set to true, weights also depend on job size (the number of tasks needed), though not linearly (the weight is proportional to the logarithm of the number of tasks needed). This lets larger jobs get larger fair shares while still leaving enough resources for small jobs to finish quickly. Boolean; the default is false.

mapred.fairscheduler.preemption.only.log

This flag makes the scheduler simply log, whenever it runs its preemption computations, which tasks it would choose to preempt, without actually preempting them. Boolean; the default is false. This property is useful for a preemption "dry run" before enabling preemption, to make sure you have not set the timeouts too aggressively. You will see output in the JobTracker's log (HADOOP_LOG_DIR/hadoop-jobtracker-*.log) similar to the following:

Should preempt 2 tasks for job_20090101337_0001: tasksDueToMinShare = 2, tasksDueToFairShare = 0

mapred.fairscheduler.update.interval

Interval at which the fair-share calculations are updated. The default of 500 ms works for clusters with fewer than 500 nodes, but a larger value reduces JobTracker load on bigger clusters. Integer, in milliseconds; the default is 500.

mapred.fairscheduler.preemption.interval

Interval at which to check for tasks to preempt. The default is 15 seconds. Timeouts shorter than this value are not recommended, but if you have set such timeouts, you can lower this value to run the preemption computations more often. Values below 5 seconds are too small, however, as they are shorter than the heartbeat interval. Integer, in milliseconds; the default is 15000.

mapred.fairscheduler.weightadjuster

An extension point that lets you specify a class to adjust the weights of running jobs. This class should implement the WeightAdjuster interface. One implementation is currently provided, NewJobWeightBooster, which increases a job's weight during the first five minutes of its lifetime so that small jobs finish more quickly. To use it, set the weightadjuster property to the fully qualified class name, org.apache.hadoop.mapred.NewJobWeightBooster. NewJobWeightBooster itself provides two parameters for setting the boost factor and duration:

  • mapred.newjobweightbooster.factor: factor by which a new job's weight is boosted. The default is 3.
  • mapred.newjobweightbooster.duration: duration of the boost, in milliseconds. The default is 300000 (5 minutes).
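As a sketch of the boost NewJobWeightBooster applies (a hypothetical Python rendering of the documented behaviour, not the actual Java implementation; the defaults mirror the two parameters above):

```python
def boosted_weight(base_weight, job_age_ms, factor=3.0, duration_ms=300_000):
    """Multiply a job's weight by `factor` while the job is younger than
    `duration_ms` (by default, a 3x boost for the first 5 minutes)."""
    if job_age_ms < duration_ms:
        return base_weight * factor
    return base_weight
```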

mapred.fairscheduler.loadmanager

An extension point that lets you specify a class to determine how many map and reduce tasks may run on a given TaskTracker. This class should implement the LoadManager interface. By default, the task load settings in the Hadoop configuration files are used, but this option can be used to base the load on, for example, available memory and CPU utilization.

mapred.fairscheduler.taskselector

An extension point that lets you specify a class to determine which task from a job should run on a given TaskTracker. This can be used to change the locality policy (for example, to keep some jobs within a particular rack) or the speculative-execution algorithm (which decides when to launch a speculative task). The default implementation uses Hadoop's default algorithms from JobInProgress.

 

Quota File Format

The quota file configures each pool's minimum shares, running-job limits, weights, and preemption timeouts. An example is provided in HADOOP_HOME/conf/fair-scheduler.xml.template. The quota file can contain the following types of elements:

    • pool elements, which configure each pool. They may contain the following child elements:
        ◦ minMaps and minReduces, to set the pool's minimum share of task slots.
        ◦ maxRunningJobs, to limit the number of jobs from the pool that may run at once (the default is unlimited).
        ◦ weight, to share the cluster non-proportionally with other pools (the default is 1.0).
        ◦ minSharePreemptionTimeout, the number of seconds the pool will wait before killing other pools' tasks if it is below its minimum share (the default is unlimited).
    • user elements, which may contain a maxRunningJobs element to limit that user's jobs. By default each user has their own pool, so per-user limits are usually unnecessary.
    • poolMaxJobsDefault, which sets the running-job limit for any pools whose limit is not specified.
    • userMaxJobsDefault, which sets the running-job limit for any users whose limit is not specified.
    • defaultMinSharePreemptionTimeout, which sets the minimum-share preemption timeout for any pools where it is not specified.
    • fairSharePreemptionTimeout, which sets the preemption timeout used when a job is below half its fair share.

    Pool and user elements are required only when setting non-default values for that pool or user. That is, you do not need to declare every user and pool in the configuration file before running the Fair Scheduler. If a user or pool is not listed, the default values for the running-job limit, preemption timeouts, and so on are used.

    The following is an example of a quota file:

    <?xml version="1.0"?>
    <allocations>
      <pool name="sample_pool">
        <minMaps>5</minMaps>
        <minReduces>5</minReduces>
        <weight>2.0</weight>
      </pool>
      <user name="sample_user">
        <maxRunningJobs>6</maxRunningJobs>
      </user>
      <userMaxJobsDefault>3</userMaxJobsDefault>
    </allocations>

    In this example, a pool named sample_pool is created and guaranteed a minimum of 5 map slots and 5 reduce slots. The pool is also given a weight of 2.0, meaning it receives twice the cluster share of a pool with the default weight of 1.0. The example also limits every user other than sample_user to three running jobs (sample_user may run six). Any pool not defined in the quota file receives no guaranteed capacity and a weight of 1.0, and any pool or user without a maximum-running-jobs setting in the quota file is allowed to run an unlimited number of jobs.

    HADOOP_HOME/conf/fair-scheduler.xml.template is a more elaborate example file that also sets preemption timeouts.

     

    Management

    The fair scheduler provides runtime management support in two ways:

    1. By editing the quota file at runtime, you can modify minimum shares, running-job limits, weights, and preemption timeouts. The scheduler reloads the quota file 10 to 15 seconds after detecting that it has been modified.

    2. A web interface on the JobTracker at http://<jobtracker URL>/scheduler shows the current jobs, pools, and fair shares. On this interface you can change a job's priority or move a job from one pool to another, and see the effect on fair shares (this requires JavaScript).

    The following job fields are displayed on the web interface:

      • Submitted: the date and time the job was submitted.
      • JobID, User, Name: the job's identifiers, the same as on the standard web interface.
      • Pool: the job's current pool. Select another value to move the job to another pool.
      • Priority: the current priority. Select another value to change the job's priority.
      • Maps/Reduces Finished: the number of tasks finished out of the total number of tasks.
      • Maps/Reduces Running: the number of tasks currently running.
      • Map/Reduce Fair Share: the average number of task slots this job should have at any given time according to fair sharing. The actual number of slots fluctuates depending on how much compute time the job has already received, but on average the job receives its fair share.

    In addition, an "advanced" version of the web interface is available at http://<jobtracker URL>/scheduler?advanced. It shows two more columns:

      • Maps/Reduce Weight: the job's weight in the fair-share calculations. This depends on priority and, if sizebasedweight or NewJobWeightBooster is enabled, also on job size and age.
      • Map/Reduce Deficit: the job's scheduling deficit in machine-seconds: the amount of resources it should have received according to its fair share, minus the amount it actually received. A positive deficit means the job will be scheduled soon, because it needs more resources to reach its fair share. The scheduler schedules the jobs with the highest deficit first. See the Implementation section for details.

     

    Implementation

    Fair scheduling has two aspects: computing each job's fair share, and choosing which job to run when a task slot becomes available.

    While jobs are running, the scheduler tracks each job's "deficit": the difference between the amount of compute time the job should have received on an ideal scheduler and the amount it actually received. This is a measure of how "unfairly" the job has been treated. Every few hundred milliseconds, the scheduler updates each job's deficit by checking how many tasks the job ran during that interval against its fair share. Whenever a task slot becomes available, it is assigned to the job with the highest deficit. There is one exception: if one or more jobs are failing to meet the guaranteed capacity of their pool, the scheduler chooses only among these "needy" jobs (again based on their deficit), to ensure that pool guarantees are met as soon as possible.
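The deficit bookkeeping and slot assignment described above could be modelled like this (a toy sketch with hypothetical field names, not the scheduler's implementation):

```python
def update_deficits(jobs, interval_ms):
    """Each interval, a job's deficit grows by the slot-time it deserved
    (its fair share) minus the slot-time it actually used (running tasks)."""
    for job in jobs:
        job["deficit"] += (job["fair_share"] - job["running"]) * interval_ms

def pick_next_job(jobs):
    """Give a free slot to the neediest job: jobs below their pool's
    minimum share take priority; among the candidates, choose the one
    with the highest deficit."""
    needy = [j for j in jobs if j["running"] < j["min_share"]]
    candidates = needy if needy else jobs
    return max(candidates, key=lambda j: j["deficit"])
```

Note how a job below its pool's guarantee is chosen even when another job has a larger deficit, mirroring the exception described above.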

    Fair shares are calculated by dividing cluster capacity among runnable jobs according to each job's "weight". The default weight is based on job priority, with each level of priority having twice the weight of the level below it (for example, VERY_HIGH has four times the weight of NORMAL). As described in the Configuration section, weights can also be based on job size and age. For jobs in a pool, fair shares also take the pool's minimum guarantee into account; capacity is then divided among the pool's jobs according to their weights.
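The priority-to-weight doubling can be written out as follows (a sketch; the level names follow Hadoop's JobPriority values):

```python
def priority_weight(priority):
    """Each priority level has twice the weight of the level below it,
    so VERY_HIGH = 2^2 = 4x the weight of NORMAL."""
    levels = ["VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH"]
    return 2.0 ** (levels.index(priority) - levels.index("NORMAL"))
```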

    When limits on a user's or pool's running jobs are not yet reached, jobs are run in the same order used by the standard Hadoop scheduler: sorted first by job priority, then by submission time. Jobs further down this sorted list that exceed their user/pool limit are queued and wait for free slots until they are allowed to run. During this time they are ignored by the fair-sharing calculations and neither gain nor lose deficit (their fair share is set to zero).

    Preemption is implemented by periodically checking whether jobs are below their minimum share, or below half their fair share. If a job has been below its share for longer than its preemption timeout, it is allowed to kill other jobs' tasks. The tasks chosen are the most recently launched tasks across all over-scheduled jobs, to minimize the amount of wasted computation.

    Finally, the Fair Scheduler provides several extension points where its basic functionality can be extended. For example, the weight calculation can be changed to give a further boost to new jobs, implementing a "shortest job first" policy that further reduces response times for interactive jobs. These extension points are listed among the advanced mapred-site.xml properties.

     
