In the previous post, we introduced the Hadoop job scheduler. JobTracker and TaskTracker are the two core components of Hadoop job scheduling: the former schedules and dispatches MapReduce jobs, the latter actually executes their tasks, and the two communicate through the RPC mechanism. This post looks at the job scheduling source code in Hadoop 0.20.2. It is an overview of the classes involved on the JobTracker and TaskTracker sides and does not go through their code in detail.
1. JobTracker
1.1 JobTracker
JobTracker is the control class for job scheduling. It implements the InterTrackerProtocol and TaskTrackerManager interfaces (it implements other interfaces as well, but here we only care about the scheduling-related ones), and it maintains a TaskScheduler and hosts its lifecycle.
When the JobTracker starts, a JobTracker instance is constructed first. During construction, the TaskScheduler instance is created along with it, and the RPC server interTrackerServer is built. Once the JobTracker has been constructed, the taskTrackerManager field of its TaskScheduler-typed member taskScheduler is set to the current JobTracker instance. Finally, JobTracker's offerService method is called to start providing services.
JobTracker.heartbeat is the main method here. It is the method that TaskTrackers call remotely, and it is where specific tasks are assigned: each Task is wrapped in a LaunchTaskAction and handed back to the TaskTracker.
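The start-up order and the heartbeat hand-off can be sketched roughly as follows. This is only an illustration: every type below (TrackerManagerSketch, SchedulerSketch, OneTaskScheduler, JobTrackerSketch) is a simplified, hypothetical stand-in rather than the real Hadoop class, tasks and actions are plain strings, and the RPC server is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the TaskTrackerManager interface.
interface TrackerManagerSketch {
    int getNumberOfTrackers();
}

// Hypothetical stand-in for the abstract TaskScheduler base class.
abstract class SchedulerSketch {
    protected TrackerManagerSketch taskTrackerManager;

    void setTaskTrackerManager(TrackerManagerSketch manager) {
        this.taskTrackerManager = manager;
    }

    // Returns the names of the tasks to launch on the given tracker.
    abstract List<String> assignTasks(String trackerName);
}

// Trivial scheduler that hands out exactly one new task per call.
class OneTaskScheduler extends SchedulerSketch {
    private int nextTaskId = 0;

    @Override
    List<String> assignTasks(String trackerName) {
        List<String> tasks = new ArrayList<>();
        tasks.add("task_" + (nextTaskId++) + "@" + trackerName);
        return tasks;
    }
}

class JobTrackerSketch implements TrackerManagerSketch {
    final SchedulerSketch taskScheduler;
    private final List<String> knownTrackers = new ArrayList<>();

    // Step 1: the scheduler is created together with the tracker.
    JobTrackerSketch() {
        this.taskScheduler = new OneTaskScheduler();
    }

    // Step 2: after construction, the scheduler is pointed back at the tracker.
    static JobTrackerSketch start() {
        JobTrackerSketch tracker = new JobTrackerSketch();
        tracker.taskScheduler.setTaskTrackerManager(tracker);
        return tracker;
    }

    @Override
    public int getNumberOfTrackers() {
        return knownTrackers.size();
    }

    // Called by a task tracker: registers it and wraps each assigned task
    // in a launch action (represented here as a string).
    List<String> heartbeat(String trackerName) {
        if (!knownTrackers.contains(trackerName)) {
            knownTrackers.add(trackerName);
        }
        List<String> actions = new ArrayList<>();
        for (String task : taskScheduler.assignTasks(trackerName)) {
            actions.add("LaunchTaskAction(" + task + ")");
        }
        return actions;
    }
}
```

Calling JobTrackerSketch.start() and then heartbeat("tracker_a") returns a single launch action for that tracker, which mirrors the flow described above in miniature.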
1.2 TaskScheduler
TaskScheduler is the abstract base class of the job scheduler. It implements the Configurable interface; concrete implementations include JobQueueTaskScheduler and LimitTasksPerJobTaskScheduler.
TaskScheduler holds a Configuration and a TaskTrackerManager as member variables and uses them to schedule all jobs. For the scheduling process itself, see the assignTasks method.
1.3 JobQueueTaskScheduler
JobQueueTaskScheduler extends TaskScheduler and is Hadoop's default scheduler. It uses a FIFO scheduling queue: jobs in the queue are sorted by priority and then by submission time.
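The ordering rule (priority first, submission time second) can be illustrated with a comparator. This is a minimal sketch under assumptions: PrioritySketch, QueuedJob, and FifoQueueSketch are hypothetical names invented for the example, not Hadoop classes, and the real scheduler may apply further tie-breakers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical priority levels; earlier constants sort first.
enum PrioritySketch { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

// Hypothetical job record holding only what the ordering needs.
class QueuedJob {
    final String id;
    final PrioritySketch priority;
    final long submitTimeMillis;

    QueuedJob(String id, PrioritySketch priority, long submitTimeMillis) {
        this.id = id;
        this.priority = priority;
        this.submitTimeMillis = submitTimeMillis;
    }
}

class FifoQueueSketch {
    // Higher priority first; among equal priorities, earlier submission first.
    static final Comparator<QueuedJob> FIFO_ORDER =
        Comparator.comparing((QueuedJob j) -> j.priority)
                  .thenComparingLong(j -> j.submitTimeMillis);

    // Returns the job ids in the order the queue would serve them.
    static List<String> order(List<QueuedJob> jobs) {
        List<QueuedJob> sorted = new ArrayList<>(jobs);
        sorted.sort(FIFO_ORDER);
        List<String> ids = new ArrayList<>();
        for (QueuedJob job : sorted) {
            ids.add(job.id);
        }
        return ids;
    }
}
```

With this rule, a HIGH job submitted late still runs before NORMAL jobs submitted earlier, while jobs of equal priority run in submission order.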
1.4 LimitTasksPerJobTaskScheduler
LimitTasksPerJobTaskScheduler extends JobQueueTaskScheduler. On top of JobQueueTaskScheduler, it limits the total number of tasks for each job.
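The idea of a per-job cap can be sketched as a simple counter check. This is an assumption-laden illustration, not the actual Hadoop logic: PerJobLimitSketch and tryAssign are hypothetical names, and the sketch ignores task completion and slot availability.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that enforces a maximum number of tasks per job.
class PerJobLimitSketch {
    private final int maxTasksPerJob;
    private final Map<String, Integer> assignedTasks = new HashMap<>();

    PerJobLimitSketch(int maxTasksPerJob) {
        this.maxTasksPerJob = maxTasksPerJob;
    }

    // Records one more task for the job and returns true if the job is
    // still under its cap; returns false without recording otherwise.
    boolean tryAssign(String jobId) {
        int assigned = assignedTasks.getOrDefault(jobId, 0);
        if (assigned >= maxTasksPerJob) {
            return false;
        }
        assignedTasks.put(jobId, assigned + 1);
        return true;
    }
}
```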
1.5 TaskTrackerManager
The TaskTrackerManager interface is used to manage information about all TaskTrackers in the running cluster. It is implemented by JobTracker and used by TaskScheduler for scheduling.
1.6 InterTrackerProtocol
The InterTrackerProtocol interface is the protocol interface for communication between JobTracker and TaskTracker. JobTracker acts as the server end of the RPC calls and implements the InterTrackerProtocol interface; TaskTracker calls it remotely.
1.7 Configurable
The Configurable interface is used to configure an object with a Configuration object.
2. TaskTracker
2.1 TaskTracker
The actual execution of a Task is initiated by the TaskTracker. At a regular interval (3 seconds by default; see the HEARTBEAT_INTERVAL variable in the MRConstants class), the TaskTracker communicates with the JobTracker, reports the execution status of its own tasks, and receives the JobTracker's instructions. The TaskTracker repeats this exchange in a loop.
2.2 TaskTrackerStatus
TaskTrackerStatus is the status class of a TaskTracker; it records the TaskTracker's resource usage.
2.3 HeartbeatResponse
HeartbeatResponse is the heartbeat reply class; it records the response information that the JobTracker produces after processing a heartbeat sent by a TaskTracker.
2.4 TaskTrackerAction
TaskTrackerAction records an action for the TaskTracker to perform. It has four subclasses: KillJobAction, KillTaskAction, LaunchTaskAction, and CommitTaskAction.
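On the receiving side, a tracker has to look at each action in a heartbeat reply and react according to its concrete type. The sketch below shows one way such a dispatch could look; the class names (ActionSketch and its four subclasses, ActionDispatcherSketch) are hypothetical stand-ins, and the handling is reduced to appending a log line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base type for an action sent to a task tracker.
abstract class ActionSketch {
    final String targetId;

    ActionSketch(String targetId) {
        this.targetId = targetId;
    }
}

class LaunchTaskSketch extends ActionSketch {
    LaunchTaskSketch(String taskId) { super(taskId); }
}

class KillTaskSketch extends ActionSketch {
    KillTaskSketch(String taskId) { super(taskId); }
}

class KillJobSketch extends ActionSketch {
    KillJobSketch(String jobId) { super(jobId); }
}

class CommitTaskSketch extends ActionSketch {
    CommitTaskSketch(String taskId) { super(taskId); }
}

class ActionDispatcherSketch {
    // Records what the tracker would do for each action, in order.
    final List<String> log = new ArrayList<>();

    void handle(List<ActionSketch> actions) {
        for (ActionSketch action : actions) {
            if (action instanceof LaunchTaskSketch) {
                log.add("launch " + action.targetId);
            } else if (action instanceof KillTaskSketch) {
                log.add("kill task " + action.targetId);
            } else if (action instanceof KillJobSketch) {
                log.add("kill job " + action.targetId);
            } else if (action instanceof CommitTaskSketch) {
                log.add("commit " + action.targetId);
            }
        }
    }
}
```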
2.5 TaskTracker$TaskInProgress
TaskInProgress is an inner class of TaskTracker; it is mainly responsible for monitoring a Task and for its specific scheduling.
2.6 TaskTracker$TaskLauncher
TaskLauncher is an inner class of TaskTracker that extends Thread. It holds a linked list of TaskInProgress objects:
private List<TaskInProgress> tasksToLaunch;
Each TaskInProgress instance corresponds to one Task.
Its run method is the core of the class. It loops, checking whether tasksToLaunch contains new tasks to run. If so, it removes the task from the list and calls TaskTracker.startNewTask(TaskInProgress) to start it.
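The launcher loop is essentially a single consumer thread draining a shared list. The following is a minimal sketch of that pattern, assuming tasks are plain strings and ignoring slot accounting; LauncherSketch, addToTaskQueue, and awaitStarted are hypothetical names, and startNewTask here only records the task instead of launching anything.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a launcher thread that drains a task list.
class LauncherSketch extends Thread {
    private final List<String> tasksToLaunch = new LinkedList<>();
    private final List<String> started = new ArrayList<>();

    LauncherSketch() {
        setDaemon(true); // do not keep the JVM alive for this sketch
    }

    // Producer side: queue a task and wake the launcher thread.
    void addToTaskQueue(String task) {
        synchronized (tasksToLaunch) {
            tasksToLaunch.add(task);
            tasksToLaunch.notifyAll();
        }
    }

    // Stands in for starting the task; here it only records the name.
    private void startNewTask(String task) {
        synchronized (started) {
            started.add(task);
            started.notifyAll();
        }
    }

    // Waits until 'count' tasks were started or the timeout expires,
    // then returns a snapshot of the started tasks.
    List<String> awaitStarted(int count, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (started) {
            while (started.size() < count) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;
                }
                try {
                    started.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return new ArrayList<>(started);
        }
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            String task;
            synchronized (tasksToLaunch) {
                // Sleep until there is something to launch.
                while (tasksToLaunch.isEmpty()) {
                    try {
                        tasksToLaunch.wait();
                    } catch (InterruptedException e) {
                        return; // interrupted while idle: shut down
                    }
                }
                task = tasksToLaunch.remove(0);
            }
            startNewTask(task);
        }
    }
}
```

Because the list is only ever drained from the head by one thread, tasks are started in the order they were queued.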
2.7 Task
Task is an abstract class that represents a task. It has two subclasses: MapTask and ReduceTask.
2.8 TaskRunner
TaskRunner is the class that runs a Task. It has two subclasses: MapTaskRunner and ReduceTaskRunner.
3. RPC Process
3.1 RPC.getServer()
The server-side RPC interface. For an instance of the specified protocol, it constructs a server that listens on the specified address and port.
3.2 RPC.waitForProxy()
The client-side RPC interface. It creates a proxy for the specified protocol on the specified server.
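The notion of a client-side proxy for a protocol interface can be illustrated with Java's dynamic proxies (java.lang.reflect.Proxy). The sketch below is not Hadoop's RPC code: HeartbeatProtocolSketch and RpcSketch are hypothetical names, and the invocation handler calls the server object directly in the same process, where a real RPC layer would serialize the call and send it over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical protocol interface shared by client and server.
interface HeartbeatProtocolSketch {
    String heartbeat(String trackerName);
}

class RpcSketch {
    // Plays the role of the server-side protocol implementation.
    static class ServerImpl implements HeartbeatProtocolSketch {
        @Override
        public String heartbeat(String trackerName) {
            return "ack:" + trackerName;
        }
    }

    // Builds a client-side proxy for the protocol. Every method call on
    // the proxy is routed through the handler, which forwards it to the
    // server object in-process (no network in this sketch).
    static HeartbeatProtocolSketch proxyFor(final HeartbeatProtocolSketch server) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                return method.invoke(server, args);
            }
        };
        return (HeartbeatProtocolSketch) Proxy.newProxyInstance(
            HeartbeatProtocolSketch.class.getClassLoader(),
            new Class<?>[] { HeartbeatProtocolSketch.class },
            handler);
    }
}
```

The caller only sees the protocol interface, so code written against the proxy does not need to know whether the implementation is local or remote.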