Introduction to the three job scheduling algorithms in a Hadoop cluster


There are three job scheduling algorithms in a Hadoop cluster: FIFO, the fair scheduling algorithm, and the capacity scheduling algorithm.
First-Come, First-Served (FIFO)
FIFO is Hadoop's default scheduler. It chooses the next job to execute first by job priority and then by arrival time.
FIFO is simple: Hadoop maintains a single job queue, submitted jobs are queued in order, and new jobs are appended to the tail. When a job finishes, the scheduler always takes the job at the head of the queue to run next. The advantages of this strategy are that it is simple, easy to implement, and keeps the load on the JobTracker low. Its shortcomings are just as obvious: it treats all jobs alike and ignores their urgency, and it is unfavorable to small jobs, which can sit behind large ones.
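As a rough illustration of this ordering rule, here is a minimal, self-contained Java sketch; the Job record and its fields are invented for the example and are not Hadoop's actual classes:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical job record: higher priority runs first; ties broken by arrival time.
record Job(String name, int priority, long arrivalMillis) {}

public class FifoSchedulerSketch {
    public static void main(String[] args) {
        // Order by descending priority, then by ascending arrival time (the FIFO tie-break).
        Comparator<Job> fifoOrder = Comparator
                .comparingInt(Job::priority).reversed()
                .thenComparingLong(Job::arrivalMillis);

        PriorityQueue<Job> queue = new PriorityQueue<>(fifoOrder);
        queue.add(new Job("big-etl", 1, 1000));
        queue.add(new Job("urgent-report", 5, 2000));
        queue.add(new Job("small-count", 1, 500));

        // When a slot frees up, the head of the queue runs next.
        while (!queue.isEmpty()) {
            System.out.println("run: " + queue.poll().name());
        }
        // Prints: urgent-report, small-count, big-etl
    }
}
```

Note how the small job still waits behind every earlier arrival of equal priority; nothing about this ordering favors short jobs.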
Fair Scheduling Strategy
This strategy configures task slots in the system; a task slot runs one task, where a task is a slice of a job (a map task or a reduce task). When a user submits multiple jobs, each job can be assigned some task slots in which to run its tasks.
If Hadoop cluster job scheduling is compared with operating system job scheduling, FIFO corresponds to the early single-stream batch systems, in which only one job runs at a time, while fair scheduling corresponds to a multiprogrammed batch system, which keeps multiple jobs running concurrently.
Like Linux, Hadoop is multi-user, so what happens when several users submit multiple jobs at the same time? In this strategy each user is assigned a job pool, and each job pool is given a minimum share of task slots. What does "minimum" mean here? That whenever the pool has demand, the scheduler guarantees it at least that many task slots. Guaranteeing this requires an empty task slot to be available. One way would be to pin a fixed number of slots, at least the minimum share, to each pool; but slots pinned to a pool that is not using them would be wasted. What the strategy actually does is this: while a pool's demand has not reached its minimum share, the slots that are nominally its own but unused are lent to other pools that need them; and when a pool applies for a task slot and none is free, the scheduler does not preempt anyone's running tasks; instead, the next task slot released anywhere in the cluster is immediately assigned to that pool.
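The hand-off of a freed slot can be sketched as follows; the Pool record, its fields, and the selection rule are simplifications invented for this example, not the real Fair Scheduler code:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical pool state: a guaranteed minimum share of slots, the tasks
// currently running, and the pool's total demand (tasks waiting to run).
record Pool(String user, int minShare, int runningTasks, int demand) {
    boolean belowMinShare() { return runningTasks < minShare && runningTasks < demand; }
    boolean wantsSlot()     { return runningTasks < demand; }
}

public class FairSlotHandoff {
    // When a task slot is released, offer it first to a pool still below its
    // minimum share, otherwise to any pool with unmet demand. Nothing is preempted.
    static Optional<Pool> assignFreedSlot(List<Pool> pools) {
        return pools.stream()
                .filter(Pool::wantsSlot)
                .min(Comparator.comparing((Pool p) -> !p.belowMinShare())         // min-share pools first
                        .thenComparingInt(p -> p.runningTasks() - p.minShare())); // most deprived first
    }

    public static void main(String[] args) {
        List<Pool> pools = List.of(
                new Pool("alice", 4, 2, 10),  // below its min share: should win the slot
                new Pool("bob",   2, 5, 8),   // above its min share but still hungry
                new Pool("carol", 3, 3, 3));  // demand fully satisfied
        System.out.println(assignFreedSlot(pools).map(Pool::user).orElse("none"));
        // Prints: alice
    }
}
```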
Within a user's job pool, how slots are allocated among that user's jobs can itself be chosen, for example FIFO. So this scheduling strategy works at two levels (see the sketch after the list):
First level: slots are allocated between pools, and in a multi-user scenario each user is assigned one job pool.
Second level: within a job pool, each user can apply a scheduling policy of their own choosing.
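To make the two levels concrete, here is a hedged sketch in the same spirit; all names are invented, and in real Hadoop this choice is made through the Fair Scheduler's pool configuration rather than Java code:

```java
import java.util.Comparator;
import java.util.List;

// Level 2: each pool orders its own jobs with a policy it chooses itself.
record PendingJob(String name, int priority, long sizeInTasks) {}

record UserPool(String user, List<PendingJob> jobs, Comparator<PendingJob> intraPoolPolicy) {
    // Once this pool is granted a slot, its own policy picks the job to serve.
    PendingJob next() { return jobs.stream().min(intraPoolPolicy).orElseThrow(); }
}

public class TwoLevelScheduling {
    public static void main(String[] args) {
        // Two pools, two different intra-pool policies.
        Comparator<PendingJob> byPriority    = Comparator.comparingInt(PendingJob::priority).reversed();
        Comparator<PendingJob> smallestFirst = Comparator.comparingLong(PendingJob::sizeInTasks);

        UserPool alice = new UserPool("alice",
                List.of(new PendingJob("a1", 1, 100), new PendingJob("a2", 3, 100)), byPriority);
        UserPool bob = new UserPool("bob",
                List.of(new PendingJob("b1", 1, 5), new PendingJob("b2", 1, 500)), smallestFirst);

        // Level 1 (choosing which pool gets the next slot, e.g. by min-share deficit
        // as sketched above) runs first; level 2 then runs inside the chosen pool.
        System.out.println(alice.next().name() + " / " + bob.next().name()); // a2 / b1
    }
}
```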
Capacity Scheduling
Capacity scheduling is somewhat similar to fair scheduling. Where the fair strategy allocates task slots among job pools, capacity scheduling allocates TaskTrackers (cluster nodes) among queues: multiple queues are configured, and each queue is configured with a minimum number of TaskTrackers. As in the fair strategy, when a queue has idle TaskTrackers, the scheduler lends them to other queues. When a TaskTracker becomes idle, several queues that have not received their minimum number of TaskTrackers may all be applying for it, so the idle TaskTracker is given to the hungriest queue first. How is hunger measured? By the ratio of the number of tasks running in a queue to the compute resources allocated to it: the lower the ratio, the hungrier the queue.
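A minimal sketch of this selection rule, with invented queue fields (the real Capacity Scheduler tracks considerably more state):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical queue state; the field names are invented for this sketch.
record JobQueue(String name, int runningTasks, int allocatedCapacity) {
    // Hunger metric from the text: running tasks divided by allocated capacity.
    // The lower the ratio, the hungrier the queue.
    double utilization() { return (double) runningTasks / allocatedCapacity; }
}

public class CapacitySchedulerSketch {
    // An idle TaskTracker goes to the queue with the lowest utilization ratio.
    static Optional<JobQueue> pickHungriestQueue(List<JobQueue> queues) {
        return queues.stream().min(Comparator.comparingDouble(JobQueue::utilization));
    }

    public static void main(String[] args) {
        List<JobQueue> queues = List.of(
                new JobQueue("etl",     8, 10),   // ratio 0.8
                new JobQueue("adhoc",   1, 10),   // ratio 0.1: hungriest
                new JobQueue("reports", 5, 10));  // ratio 0.5
        System.out.println(pickHungriestQueue(queues).map(JobQueue::name).orElse("none"));
        // Prints: adhoc
    }
}
```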
The capacity scheduling policy organizes jobs into queues, so one user's jobs may end up in several queues, and if users were not restricted, serious unfairness between users could arise. Therefore, when a new job is selected to run, the scheduler also checks whether the user who owns the job has exceeded their resource limit; if so, the job is not selected.
For jobs within the same queue, this policy uses priority-based FIFO, but it does not preempt running tasks.
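Putting the last two rules together, a hedged sketch of selecting the next job from one queue, with invented names and a hypothetical per-user slot limit:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

record QueuedJob(String name, String user, int priority, long arrivalMillis) {}

public class CapacityJobSelection {
    // Within one queue: skip jobs whose owner is over their resource limit,
    // then apply priority-based FIFO. Running tasks are never preempted.
    static Optional<QueuedJob> selectNext(List<QueuedJob> queue,
                                          Map<String, Integer> slotsInUseByUser,
                                          int perUserSlotLimit) {
        return queue.stream()
                .filter(j -> slotsInUseByUser.getOrDefault(j.user(), 0) < perUserSlotLimit)
                .min(Comparator.comparingInt(QueuedJob::priority).reversed()
                        .thenComparingLong(QueuedJob::arrivalMillis));
    }

    public static void main(String[] args) {
        List<QueuedJob> queue = List.of(
                new QueuedJob("j1", "alice", 5, 100),  // highest priority, but alice is at her limit
                new QueuedJob("j2", "bob",   3, 200),
                new QueuedJob("j3", "bob",   3, 150));
        Map<String, Integer> inUse = Map.of("alice", 4, "bob", 1);
        System.out.println(selectNext(queue, inUse, 4).map(QueuedJob::name).orElse("none"));
        // Prints: j3 (bob's earlier-arriving job at priority 3)
    }
}
```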
