Distributed Systems: Hadoop Source Code Reading and Analysis (I): Job Scheduler Implementation Mechanism

Source: Internet
Author: User

In the previous blog post, we introduced the Hadoop job scheduler. We know that JobTracker and TaskTracker are the two core parts of the Hadoop job scheduling process: the former is responsible for scheduling and dispatching Map/Reduce jobs, and the latter is responsible for actually executing them. The two communicate through the RPC mechanism. The job scheduling source code in Hadoop version 0.20.2 is analyzed below; the scheduling code in JobTracker and TaskTracker is outlined rather than described in detail.

1. JobTracker

1.1 JobTracker

JobTracker is the control class for job scheduling. It implements the InterTrackerProtocol and TaskTrackerManager interfaces (there are other interfaces as well, but here we only care about the scheduling-related ones), maintains a TaskScheduler, and hosts its lifecycle.

When JobTracker starts, a JobTracker instance is constructed first; in this process, the TaskScheduler instance is constructed together with the JobTracker. In addition, for RPC, the RPC server interTrackerServer is constructed. Once the JobTracker has been constructed, the TaskTrackerManager field of its TaskScheduler-typed variable taskScheduler is set to the current JobTracker instance. Then JobTracker's offerService method is called to start providing services.
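The startup order described above can be sketched roughly as follows. This is a minimal illustration, not the Hadoop source: every class name here (StartupJobTracker, StartupScheduler, StartupManager, and so on) is a hypothetical stand-in, and the log list exists only to make the order of the steps visible.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point that mirrors the startup order described in the text.
class StartupSketch {
    public static void main(String[] args) {
        StartupJobTracker tracker = StartupJobTracker.startTracker();
        tracker.offerService();
        tracker.log.forEach(System.out::println);
    }
}

// Hypothetical stand-in for TaskTrackerManager (the real interface has more methods).
interface StartupManager {
    String name();
}

// Hypothetical stand-in for the abstract TaskScheduler base class.
abstract class StartupScheduler {
    protected StartupManager manager;

    void setTaskTrackerManager(StartupManager manager) {
        this.manager = manager;
    }

    abstract void start();
}

class StartupFifoScheduler extends StartupScheduler {
    @Override
    void start() {
        // The scheduler cannot work until it knows its manager.
        if (manager == null) {
            throw new IllegalStateException("manager must be set before start");
        }
    }
}

// Hypothetical stand-in for JobTracker: it is itself the manager handed to the scheduler.
class StartupJobTracker implements StartupManager {
    final List<String> log = new ArrayList<>();
    final StartupScheduler taskScheduler;

    StartupJobTracker() {
        // The scheduler is constructed together with the tracker.
        taskScheduler = new StartupFifoScheduler();
        log.add("scheduler constructed");
        // A real tracker would construct its RPC server here.
        log.add("RPC server constructed");
    }

    static StartupJobTracker startTracker() {
        StartupJobTracker tracker = new StartupJobTracker();
        // After construction, the scheduler's manager is set to the tracker itself.
        tracker.taskScheduler.setTaskTrackerManager(tracker);
        tracker.log.add("manager set on scheduler");
        return tracker;
    }

    void offerService() {
        taskScheduler.start();
        log.add("offering service");
    }

    @Override
    public String name() {
        return "sketch-jobtracker";
    }
}
```

Setting the manager after construction, rather than inside the constructor, avoids handing a half-built tracker to the scheduler.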

JobTracker.heartbeat is the main method that TaskTracker calls remotely. It assigns specific tasks: each Task is encapsulated in a LaunchTaskAction, and the encapsulated Task is then distributed to the TaskTracker.
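A minimal sketch of this idea, under simplifying assumptions: a heartbeat reports how many free slots the caller has, and the reply wraps each assigned task in a launch action. The classes HbJobTracker, HbTask, HbLaunchAction, and HbResponse and the freeSlots parameter are hypothetical stand-ins, not the real Hadoop signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class HeartbeatSketch {
    public static void main(String[] args) {
        HbJobTracker jobTracker = new HbJobTracker();
        jobTracker.submit("m_0");
        jobTracker.submit("m_1");
        HbResponse response = jobTracker.heartbeat("tracker_1", 1);
        for (HbLaunchAction action : response.actions) {
            System.out.println("launch " + action.task.id);
        }
    }
}

// Hypothetical stand-in for a Task.
class HbTask {
    final String id;

    HbTask(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for LaunchTaskAction: it simply wraps one task.
class HbLaunchAction {
    final HbTask task;

    HbLaunchAction(HbTask task) {
        this.task = task;
    }
}

// Hypothetical stand-in for the heartbeat reply.
class HbResponse {
    final List<HbLaunchAction> actions;

    HbResponse(List<HbLaunchAction> actions) {
        this.actions = actions;
    }
}

class HbJobTracker {
    private final Deque<HbTask> pending = new ArrayDeque<>();

    void submit(String taskId) {
        pending.add(new HbTask(taskId));
    }

    // Hands out at most freeSlots pending tasks, each wrapped in a launch action.
    // trackerName is unused here; a real tracker would record per-tracker status.
    HbResponse heartbeat(String trackerName, int freeSlots) {
        List<HbLaunchAction> actions = new ArrayList<>();
        while (actions.size() < freeSlots && !pending.isEmpty()) {
            actions.add(new HbLaunchAction(pending.poll()));
        }
        return new HbResponse(actions);
    }
}
```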

1.2 TaskScheduler

TaskScheduler is the abstract base class of the job scheduler. It implements the Configurable interface; its concrete implementations include JobQueueTaskScheduler and LimitTasksPerJobTaskScheduler.

TaskScheduler maintains Configuration and TaskTrackerManager member variables in order to schedule all jobs. For the scheduling process itself, see the assignTasks method.

1.3 JobQueueTaskScheduler

JobQueueTaskScheduler inherits from TaskScheduler and is the default scheduler in Hadoop. It maintains a FIFO scheduling queue: jobs in the queue are sorted by priority and then by submission time.
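The ordering rule above (priority first, then submission time) can be expressed as a comparator. This is a sketch only: FifoJob and FifoPriority are hypothetical stand-ins, and the priority levels are listed from highest to lowest so that the enum's natural order puts the most urgent jobs first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FifoOrderSketch {
    public static void main(String[] args) {
        List<FifoJob> queue = new ArrayList<>();
        queue.add(new FifoJob("job_a", FifoPriority.NORMAL, 100L));
        queue.add(new FifoJob("job_b", FifoPriority.HIGH, 300L));
        queue.add(new FifoJob("job_c", FifoPriority.NORMAL, 50L));
        queue.sort(FifoJob.FIFO_ORDER);
        for (FifoJob job : queue) {
            System.out.println(job.id);
        }
    }
}

// Hypothetical priority levels, declared from highest to lowest.
enum FifoPriority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

// Hypothetical stand-in for a queued job.
class FifoJob {
    final String id;
    final FifoPriority priority;
    final long submitTime;

    FifoJob(String id, FifoPriority priority, long submitTime) {
        this.id = id;
        this.priority = priority;
        this.submitTime = submitTime;
    }

    // Higher priority first; among equal priorities, earlier submission first.
    static final Comparator<FifoJob> FIFO_ORDER =
        Comparator.comparing((FifoJob job) -> job.priority)
                  .thenComparingLong(job -> job.submitTime);
}
```

With the sample data in main, the HIGH job sorts ahead of both NORMAL jobs even though it was submitted last, and the two NORMAL jobs are ordered by submission time.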

1.4 LimitTasksPerJobTaskScheduler

LimitTasksPerJobTaskScheduler inherits from JobQueueTaskScheduler and, on top of it, limits the total number of tasks for each job.
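One way to picture a per-job limit is a queue walk that skips any job already at its cap. The sketch below assumes a map from job name to running task count, iterated in queue order; the pickJob helper and the maxTasksPerJob parameter are hypothetical and are not taken from the Hadoop source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LimitSketch {
    // Returns the first job (in queue order) still under the per-job cap,
    // or null when every job has reached it.
    static String pickJob(Map<String, Integer> runningTasksPerJob, int maxTasksPerJob) {
        for (Map.Entry<String, Integer> entry : runningTasksPerJob.entrySet()) {
            if (entry.getValue() < maxTasksPerJob) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, standing in for the job queue order.
        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("job_a", 4);
        running.put("job_b", 1);
        System.out.println(pickJob(running, 4));
    }
}
```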

1.5 TaskTrackerManager

The TaskTrackerManager interface is used to manage information about all the TaskTrackers in the running cluster. It is implemented by JobTracker and is used by TaskScheduler for scheduling.

1.6 InterTrackerProtocol

The InterTrackerProtocol interface is the protocol interface for communication between JobTracker and TaskTracker. JobTracker, as the server side of the RPC call, implements the InterTrackerProtocol interface, and TaskTracker calls it remotely.

1.7 Configurable

The Configurable interface is for objects that can be configured with a Configuration object.

2. TaskTracker

2.1 TaskTracker

The actual execution of a Task is initiated by TaskTracker. At a regular interval (default: 3 seconds, see the HEARTBEAT_INTERVAL variable in the MRConstants class), TaskTracker communicates with JobTracker, reports the execution status of its own Tasks, and receives instructions from JobTracker. TaskTracker repeats this in a loop.

2.2 TaskTrackerStatus

TaskTrackerStatus is the status class of TaskTracker; it records the TaskTracker's resource usage.

2.3 HeartbeatResponse

HeartbeatResponse is the heartbeat reply class; it records the response information that JobTracker produces after processing a heartbeat sent by a TaskTracker.

2.4 TaskTrackerAction

TaskTrackerAction records an action for the TaskTracker to perform. It has four subclasses: KillJobAction, KillTaskAction, LaunchTaskAction, and CommitTaskAction.

2.5 TaskTracker$TaskInProgress

TaskInProgress is an inner class of TaskTracker; it is mainly responsible for monitoring and scheduling a specific Task.

2.6 TaskTracker$TaskLauncher

TaskLauncher is an inner class of TaskTracker that inherits from Thread. It holds a linked list of TaskInProgress objects:

private List<TaskInProgress> tasksToLaunch;

Each TaskInProgress instance corresponds to one Task.

Its run method is the core of the class. It loops, checking whether tasksToLaunch contains new tasks; if it does, it removes one from the list and calls TaskTracker.startNewTask(TaskInProgress) to start the new task.
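The loop described above is a classic single-consumer queue. The sketch below is an illustration under stated assumptions rather than the Hadoop code: task identifiers are plain strings, startNewTask only records the identifier, and the class name LauncherThread plus the helpers addToTaskQueue, awaitStarted, and shutdown are hypothetical additions for the demonstration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class LauncherSketch {
    public static void main(String[] args) {
        LauncherThread launcher = new LauncherThread();
        launcher.start();
        launcher.addToTaskQueue("task_1");
        launcher.addToTaskQueue("task_2");
        launcher.awaitStarted(2, 5000);
        launcher.shutdown();
        System.out.println(launcher.started);
    }
}

class LauncherThread extends Thread {
    // Pending tasks, guarded by the list's own monitor.
    private final List<String> tasksToLaunch = new LinkedList<>();
    // Tasks that have been handed to startNewTask, in order.
    final List<String> started = Collections.synchronizedList(new ArrayList<>());

    void addToTaskQueue(String task) {
        synchronized (tasksToLaunch) {
            tasksToLaunch.add(task);
            tasksToLaunch.notifyAll(); // wake the launcher if it is waiting
        }
    }

    @Override
    public void run() {
        while (!Thread.interrupted()) {
            String task;
            try {
                synchronized (tasksToLaunch) {
                    // Sleep until there is something to launch.
                    while (tasksToLaunch.isEmpty()) {
                        tasksToLaunch.wait();
                    }
                    task = tasksToLaunch.remove(0);
                }
            } catch (InterruptedException e) {
                return; // interrupted while idle: stop the thread
            }
            startNewTask(task);
        }
    }

    // Stand-in for TaskTracker.startNewTask: only records the task.
    private void startNewTask(String task) {
        started.add(task);
    }

    // Demo helper: polls until n tasks have started or the timeout elapses.
    boolean awaitStarted(int n, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (started.size() < n) {
                if (System.currentTimeMillis() > deadline) {
                    return false;
                }
                Thread.sleep(5);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Demo helper: stops the launcher and waits for it to exit.
    void shutdown() {
        interrupt();
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Removing the task inside the synchronized block but starting it outside keeps the lock held only briefly, so producers are not blocked while a task is being started.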

2.7 Task

Task is an abstract class that represents a task. It has two subclasses: MapTask and ReduceTask.

2.8 TaskRunner

TaskRunner is responsible for running a Task. It has two subclasses: MapTaskRunner and ReduceTaskRunner.

3. RPC Process

3.1 RPC.getServer()

This is the server side of RPC. For an instance of the specified protocol, it creates a server that provides the service on the specified address and port.

3.2 RPC.waitForProxy()

This is the client side of RPC. It creates a proxy for the specified server.
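The proxy idea can be illustrated with the JDK's java.lang.reflect.Proxy: the caller holds an object that implements the protocol interface, and every call on it is forwarded to the implementation. This sketch stays inside one process and forwards calls by reflection instead of over the network, so it shows only the shape of the mechanism. ProxyProtocol, ProxyServerImpl, and ProxyRpc.getProxy are hypothetical names, not Hadoop's RPC API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class RpcProxySketch {
    public static void main(String[] args) {
        ProxyProtocol server = new ProxyServerImpl();
        ProxyProtocol proxy = ProxyRpc.getProxy(ProxyProtocol.class, server);
        System.out.println(proxy.heartbeat("tracker_1"));
    }
}

// Hypothetical protocol interface shared by client and server.
interface ProxyProtocol {
    String heartbeat(String trackerName);
}

// Hypothetical server-side implementation of the protocol.
class ProxyServerImpl implements ProxyProtocol {
    @Override
    public String heartbeat(String trackerName) {
        return "ack:" + trackerName;
    }
}

class ProxyRpc {
    // Builds a dynamic proxy whose calls are forwarded to target.
    // A real RPC layer would serialize the call and send it to a remote server instead.
    @SuppressWarnings("unchecked")
    static <T> T getProxy(Class<T> protocol, T target) {
        InvocationHandler handler =
            (proxyObject, method, methodArgs) -> method.invoke(target, methodArgs);
        return (T) Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
    }
}
```

Because the caller depends only on the protocol interface, it cannot tell whether the call was served locally or remotely, which is the property a proxy-based RPC client relies on.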
