Hadoop backup task scheduling mechanism

Source: Internet
Author: User

From: http://dongxicheng.org/mapreduce/hadoop-schedulers/

With the popularity of MapReduce, its open-source implementation, Hadoop, has become increasingly popular as well. One very important component of a Hadoop system is the scheduler, whose role is to allocate idle resources in the system to jobs according to a given policy. In Hadoop the scheduler is a pluggable module, so you can design your own scheduler based on the requirements of your application. Three schedulers are commonly used in Hadoop:

(1) Default scheduler: FIFO

This is Hadoop's default scheduler. It picks the next job to run first by job priority and then by arrival time.
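
The ordering rule above can be sketched in a few lines of Python. This is only an illustration, not Hadoop's code: the Job class, its field names, and the convention that a larger priority value means a more urgent job are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int       # assumption: larger value = higher priority
    submit_time: float  # arrival time, in seconds


def fifo_order(jobs):
    """Order jobs by priority first, then by arrival time."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_time))


jobs = [
    Job("etl", priority=1, submit_time=10.0),
    Job("report", priority=2, submit_time=30.0),
    Job("index", priority=1, submit_time=5.0),
]
print([j.name for j in fifo_order(jobs)])  # ['report', 'index', 'etl']
```

The high-priority job runs first even though it arrived last; the two jobs with equal priority are ordered by arrival time.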

(2) Capacity Scheduler

Multiple queues are supported, and each queue can be configured with a certain amount of resources. Each queue uses a FIFO scheduling policy. To prevent the jobs of a single user from monopolizing the resources in a queue, the scheduler limits the amount of resources that jobs submitted by the same user may occupy. During scheduling, a queue is selected first: the scheduler computes, for each queue, the ratio of the number of running tasks to the computing resources the queue should be allocated, and selects the queue with the smallest ratio. A job is then selected from that queue: jobs are ordered by priority and submission time, subject to the user resource limit and the memory limit.
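
A minimal sketch of the two selection steps follows. The Queue and Job classes, the field names, and the single per-user task limit are hypothetical simplifications for illustration; the memory limit is left out, and real queues carry far more configuration.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    capacity: int       # slots the queue is entitled to (must be > 0)
    running_tasks: int  # tasks currently running in the queue


@dataclass
class Job:
    name: str
    user: str
    priority: int       # assumption: larger value = higher priority
    submit_time: float


def pick_queue(queues):
    """Step 1: choose the queue with the smallest running/capacity ratio."""
    return min(queues, key=lambda q: q.running_tasks / q.capacity)


def pick_job(jobs, running_by_user, user_limit):
    """Step 2: first job in (priority, submit time) order whose user is under the limit."""
    for job in sorted(jobs, key=lambda j: (-j.priority, j.submit_time)):
        if running_by_user.get(job.user, 0) < user_limit:
            return job
    return None  # every candidate job is blocked by the user limit


queues = [Queue("prod", capacity=60, running_tasks=45),
          Queue("dev", capacity=40, running_tasks=10)]
jobs = [Job("a", "alice", 1, 1.0), Job("b", "bob", 1, 2.0)]

print(pick_queue(queues).name)                         # dev (0.25 < 0.75)
print(pick_job(jobs, {"alice": 5}, user_limit=5).name) # b (alice is at her limit)
```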

(3) Fair Scheduler

Like the Capacity Scheduler, it supports multiple queues and multiple users, and the amount of resources in each queue is configurable. Jobs in the same queue share all of the queue's resources fairly. For the algorithm, see my blog article "Hadoop fair scheduler algorithm analysis".

In fact, Hadoop has more than these three schedulers. Recently, many Hadoop schedulers aimed at new kinds of applications have emerged.

(4) LATE scheduler for heterogeneous clusters

The existing Hadoop schedulers are built on the assumption of a homogeneous cluster. Specifically, they assume that:

1) The performance of every node in the cluster is identical.

2) Each reduce task goes through three phases (copy, sort, and reduce), and each phase takes 1/3 of the total time.

3) Tasks of the same type in the same job finish in batches and take roughly the same amount of time.
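
Assumption 2) implies a simple progress score for a reduce task: each finished phase contributes 1/3, plus 1/3 of the fraction completed in the current phase. The sketch below illustrates that arithmetic; the function name and the string phase labels are chosen for this example only.

```python
REDUCE_PHASES = ("copy", "sort", "reduce")


def reduce_progress(phase, fraction_done):
    """Progress score in [0, 1], assuming each phase is 1/3 of the total time."""
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    completed_phases = REDUCE_PHASES.index(phase)  # phases already finished
    return (completed_phases + fraction_done) / len(REDUCE_PHASES)


# A task halfway through its sort phase: 1/3 (copy done) + 1/2 * 1/3 = 0.5
print(reduce_progress("sort", 0.5))  # 0.5
```

If the copy phase actually dominates the running time, as it can on a slow network, this score overstates the task's real progress, which is one way the homogeneity assumptions mislead the scheduler.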

The existing Hadoop schedulers have a major flaw, which shows up mainly in the algorithm for detecting straggler tasks: if a task's progress lags 20% behind the progress of tasks of the same type, it is treated as a straggler (such a task determines the job's completion time, so its execution time should be shortened as much as possible), and a backup task (speculative task) is started for it. In a heterogeneous cluster, the execution time of the same task differs significantly from node to node, so a large number of backup tasks are easily generated.
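
The detection rule can be sketched as follows. It assumes that "lags 20% behind" means a progress score more than 0.2 below the average score of tasks of the same type; the function name and the dictionary input are hypothetical.

```python
def find_stragglers(progress, lag=0.2):
    """Return ids of tasks whose progress is more than `lag` below the average.

    `progress` maps task id -> progress score in [0, 1] for tasks of one type.
    """
    if not progress:
        return []
    average = sum(progress.values()) / len(progress)
    return sorted(t for t, p in progress.items() if p < average - lag)


progress = {"m1": 0.9, "m2": 0.85, "m3": 0.8, "m4": 0.3}
print(find_stragglers(progress))  # ['m4'] (average 0.7125, cutoff 0.5125)
```

On a heterogeneous cluster, many healthy tasks on slower machines can fall under this fixed cutoff at once, and each of them triggers a backup task.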

The LATE (Longest Approximate Time to End, reference [4]) scheduler solves this problem of the existing schedulers to some extent. It defines three thresholds:

SpeculativeCap: the maximum number of speculative tasks that may run in the system at the same time (the recommended value is 10% of the total number of slots).

SlowNodeThreshold (the authors give a recommended value in the paper): no speculative task is started on a node whose score (for the score calculation, see the paper) is below this threshold, that is, on a slow node.

SlowTaskThreshold (the authors give a recommended value in the paper): a speculative task is started for a task only when its progress falls below this threshold relative to the progress of tasks of the same type in the same batch.

Its scheduling policy is as follows. When a node has idle resources and the total number of backup tasks in the system is smaller than SpeculativeCap: (1) if the node is a slow node (its score is below SlowNodeThreshold), ignore the request; (2) sort the currently running tasks by estimated remaining completion time; (3) select the task with the largest remaining completion time whose progress is below SlowTaskThreshold, and start a backup task for it.
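
The sketch below illustrates this policy under several stated assumptions: it follows the LATE paper's rule that requests from slow nodes (score below SlowNodeThreshold) are ignored, it estimates time left as (1 - progress) / progress rate, it compares a task's progress rate against SlowTaskThreshold, and it treats both thresholds as absolute numbers rather than percentiles. The Task class and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    progress: float   # progress score in [0, 1]
    elapsed: float    # seconds since the task started
    has_backup: bool = False

    @property
    def rate(self):
        """Progress per second; 0 if the task has only just started."""
        return self.progress / self.elapsed if self.elapsed > 0 else 0.0

    @property
    def time_left(self):
        """Estimated seconds to completion at the current rate."""
        return (1.0 - self.progress) / self.rate if self.rate > 0 else float("inf")


def pick_speculative_task(tasks, node_score, running_backups,
                          speculative_cap, slow_node_threshold,
                          slow_task_threshold):
    """Return the task to back up on the requesting node, or None."""
    if running_backups >= speculative_cap:
        return None                      # too many backups already running
    if node_score < slow_node_threshold:
        return None                      # (1) never speculate on a slow node
    slow = [t for t in tasks
            if not t.has_backup and t.rate < slow_task_threshold]
    if not slow:
        return None
    return max(slow, key=lambda t: t.time_left)  # (2)+(3) longest time to end


tasks = [Task("t1", 0.9, 90), Task("t2", 0.2, 100), Task("t3", 0.5, 100)]
choice = pick_speculative_task(tasks, node_score=0.5, running_backups=1,
                               speculative_cap=2, slow_node_threshold=0.3,
                               slow_task_threshold=0.006)
print(choice.task_id)  # t2 (about 400 s left, versus 100 s for t3)
```

Ranking by estimated time left, rather than by raw progress, is what lets the scheduler back up the task that will hurt the job's completion time the most.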

(5) Deadline Scheduler and Constraint-based Scheduler for real-time jobs

These schedulers target jobs with a time limit (deadline jobs), that is, jobs that are given a deadline and must finish by it. They fall into two classes: soft real-time job schedulers (a job may overrun its deadline by some margin) and hard real-time job schedulers (a job must finish strictly on time).

The Deadline Scheduler (reference [5]) mainly targets soft real-time jobs. It dynamically adjusts the resources given to a job according to the job's progress and remaining time, so that the job finishes within its deadline whenever possible.
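
One simple way to picture this kind of dynamic adjustment is to recompute, at intervals, how many slots a job needs in order to finish its remaining tasks in the remaining time. The sketch below is an illustration of that idea only and is not taken from reference [5]; the function name, the uniform task duration, and the wave-based estimate are all assumptions.

```python
import math


def slots_needed(remaining_tasks, avg_task_seconds, seconds_to_deadline):
    """Slots required to finish the remaining tasks before the deadline.

    Returns None when not even one task can finish in the time left.
    """
    if remaining_tasks == 0:
        return 0
    # Number of back-to-back task rounds ("waves") that fit before the deadline.
    waves = int(seconds_to_deadline // avg_task_seconds)
    if waves <= 0:
        return None
    return math.ceil(remaining_tasks / waves)


# At submission: 100 tasks of 60 s each, 600 s to the deadline.
print(slots_needed(100, 60, 600))  # 10
# Later the job has fallen behind: 80 tasks left, only 300 s to go.
print(slots_needed(80, 60, 300))   # 16
```

In this example the job's allocation would be raised from 10 to 16 slots once it falls behind schedule.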

The Constraint-based Scheduler (reference [6]) mainly targets hard real-time jobs. Based on the job's deadline and on the real-time jobs currently running in the system, the scheduler predicts whether a newly submitted real-time job can be completed within its deadline. If it cannot, the scheduler reports this to the user and asks the user to adjust the job's deadline.
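
An admission check of this kind might be sketched as follows. The estimate used here (tasks run in waves over the slots not already committed to running real-time jobs, with a uniform task duration) is an assumption for illustration and is not the model from reference [6]; the function names are hypothetical.

```python
import math


def can_meet_deadline(num_tasks, avg_task_seconds, free_slots, seconds_to_deadline):
    """Predict whether a new job can finish before its deadline."""
    if free_slots <= 0:
        return False
    waves = math.ceil(num_tasks / free_slots)   # rounds of tasks needed
    return waves * avg_task_seconds <= seconds_to_deadline


def submit(num_tasks, avg_task_seconds, free_slots, seconds_to_deadline):
    """Accept the job, or reject it and ask the user for a later deadline."""
    if can_meet_deadline(num_tasks, avg_task_seconds, free_slots, seconds_to_deadline):
        return "accepted"
    return "rejected: please adjust the deadline"


# 100 tasks of 60 s on 20 free slots need 5 waves, i.e. 300 s.
print(submit(100, 60, 20, 400))  # accepted
print(submit(100, 60, 20, 200))  # rejected: please adjust the deadline
```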
