One of JobTracker's most important functions is state monitoring, which covers the runtime state of TaskTrackers, jobs, and tasks. Monitoring a TaskTracker is relatively simple: JobTracker only needs to record its most recent heartbeat time and its health status. The health status is detected by a monitoring script on the TaskTracker side and reported to JobTracker through the heartbeat.
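The TaskTracker bookkeeping described above amounts to two values per tracker. The following is a minimal sketch under stated assumptions, not JobTracker's actual code: the classes `TrackerStatusSketch` and `TrackerMonitorSketch`, their methods, and the `expiryMs` timeout parameter are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-TaskTracker record holding only what the text says
// JobTracker keeps: the latest heartbeat time and the health status.
final class TrackerStatusSketch {
    long lastHeartbeatMs;    // time of the most recent heartbeat
    boolean healthy = true;  // result of the TaskTracker-side health script

    // Called for every heartbeat; the health flag travels inside the heartbeat.
    void onHeartbeat(long nowMs, boolean healthScriptOk) {
        lastHeartbeatMs = nowMs;
        healthy = healthScriptOk;
    }

    // Assumption of this sketch: a tracker counts as lost when no heartbeat
    // has arrived within expiryMs.
    boolean isLost(long nowMs, long expiryMs) {
        return nowMs - lastHeartbeatMs > expiryMs;
    }
}

// Hypothetical registry keyed by tracker name.
final class TrackerMonitorSketch {
    private final Map<String, TrackerStatusSketch> trackers = new HashMap<>();

    void heartbeat(String trackerName, long nowMs, boolean healthy) {
        trackers.computeIfAbsent(trackerName, k -> new TrackerStatusSketch())
                .onHeartbeat(nowMs, healthy);
    }

    // Hypothetical helper: a tracker is usable only if known, healthy, and alive.
    boolean isUsable(String trackerName, long nowMs, long expiryMs) {
        TrackerStatusSketch s = trackers.get(trackerName);
        return s != null && s.healthy && !s.isLost(nowMs, expiryMs);
    }
}
```

Keeping only the last heartbeat timestamp, rather than a history, is enough to answer the one question monitoring needs here: has this tracker been heard from recently?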
Job Description Model
As shown in the figure, JobTracker describes and tracks the running state of each job internally with a "three-tier multi-tree" model. JobTracker creates a JobInProgress object for each job to track and monitor its running state. This object exists for the job's entire lifetime: it is created when the job is submitted and destroyed when the job finishes. Following a divide-and-conquer strategy, JobTracker also splits each job into several tasks and creates a TaskInProgress object for each task to track and monitor its running state. Because a task may fail for reasons such as software bugs or hardware faults, JobTracker reruns failed tasks according to a certain policy; that is, each task may be run several times until it either succeeds or fails for good after exceeding the maximum number of attempts. JobTracker calls each run of a task a "task run attempt", or task attempt. As soon as one attempt of a task succeeds, the corresponding TaskInProgress object marks the task as successful, and once every TaskInProgress has marked its task as successful, the JobInProgress object marks the whole job as successful.
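The success rules of the three-tier model (one successful attempt makes a task succeed; all tasks succeeding makes the job succeed) can be sketched as follows. This is a simplified illustration, not Hadoop's JobInProgress or TaskInProgress: the classes `JobSketch`, `TipSketch`, and `AttemptSketch` and the `maxAttempts` retry limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bottom tier: one run of a task.
final class AttemptSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }
    State state = State.RUNNING;
}

// Hypothetical middle tier: one task, tracking all of its attempts.
final class TipSketch {
    final int maxAttempts;  // retry limit (an assumed parameter)
    final List<AttemptSketch> attempts = new ArrayList<>();

    TipSketch(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // One successful attempt is enough for the task to succeed.
    boolean isSucceeded() {
        return attempts.stream().anyMatch(a -> a.state == AttemptSketch.State.SUCCEEDED);
    }

    // The task fails once it has used up its attempts without any success.
    boolean isFailed() {
        long failures = attempts.stream()
                .filter(a -> a.state == AttemptSketch.State.FAILED)
                .count();
        return !isSucceeded() && failures >= maxAttempts;
    }

    // Start a new attempt, for example to rerun the task after a failure.
    AttemptSketch newAttempt() {
        AttemptSketch a = new AttemptSketch();
        attempts.add(a);
        return a;
    }
}

// Hypothetical top tier: the job succeeds only when every task has succeeded.
final class JobSketch {
    final List<TipSketch> tasks = new ArrayList<>();

    boolean isSucceeded() {
        return !tasks.isEmpty() && tasks.stream().allMatch(TipSketch::isSucceeded);
    }
}
```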
To distinguish jobs from one another, JobTracker assigns each job a unique ID. This ID consists of three parts: the job prefix string, the JobTracker start time (in yyyyMMddHHmm form), and the job's submission sequence number, joined with "_" into the complete job ID. For example, the three parts of job_201208071706_0009 are "job", "201208071706", and "0009" (the 9th job since JobTracker started). Each task ID inherits the job ID and extends it; it consists of three parts: the job ID (with the prefix string changed to "task"), the task type (m for map, r for reduce), and the task number (from 000000 to 999999). For example, task_201208071706_0009_m_000000 denotes map task number 000000 of job job_201208071706_0009. Each task attempt ID in turn inherits the task ID and consists of two parts: the task ID (with the prefix string changed to "attempt") and the attempt number (starting from 0). For example, attempt_201208071706_0009_m_000000_0 denotes attempt number 0 of task task_201208071706_0009_m_000000.
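The ID layout above can be reproduced with plain string formatting. This sketch only illustrates the format: Hadoop has its own JobID, TaskID, and TaskAttemptID classes, and the `IdFormatSketch` class and its methods are hypothetical names that do not mirror their APIs.

```java
// Hypothetical helper that builds IDs in the layout described above.
final class IdFormatSketch {
    // job_<JobTracker start time>_<4-digit submission number>
    static String jobId(String jtStartTime, int jobNumber) {
        return String.format("job_%s_%04d", jtStartTime, jobNumber);
    }

    // task_<start time>_<job number>_<m|r>_<6-digit task number>
    static String taskId(String jobId, boolean isMap, int taskNumber) {
        String body = jobId.substring("job".length()); // keeps "_<time>_<num>"
        return String.format("task%s_%s_%06d", body, isMap ? "m" : "r", taskNumber);
    }

    // attempt_<task ID body>_<attempt number starting at 0>
    static String attemptId(String taskId, int attemptNumber) {
        String body = taskId.substring("task".length());
        return "attempt" + body + "_" + attemptNumber;
    }
}
```

For example, `IdFormatSketch.jobId("201208071706", 9)` yields job_201208071706_0009, and feeding that into `taskId(..., true, 0)` and then `attemptId(..., 0)` yields the task and attempt IDs quoted in the text. Each level is built by swapping the prefix and appending a suffix, which is why any attempt ID identifies its task and job without a lookup.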
JobInProgress
The JobInProgress class monitors and tracks the running state of a job and provides the lowest-level scheduling interface to the task scheduler.
JobInProgress maintains two kinds of job information: static information, which is fixed when the job is submitted, and dynamic information, which changes as the job runs.
(1) Job static information
Static job information consists of the attributes that are already fixed when the job is submitted, including the following:
org.apache.hadoop.mapred.JobInProgress.java
// TaskInProgress objects for the map, reduce, cleanup, and setup tasks
TaskInProgress maps[] = new TaskInProgress[0];
TaskInProgress reduces[] = new TaskInProgress[0];
TaskInProgress cleanup[] = new TaskInProgress[0];
TaskInProgress setup[] = new TaskInProgress[0];
int numMapTasks = 0;                 // number of map tasks
int numReduceTasks = 0;              // number of reduce tasks
final long memoryPerMap;             // memory required by each map task
final long memoryPerReduce;          // memory required by each reduce task
volatile int numSlotsPerMap = 1;     // slots required by each map task
volatile int numSlotsPerReduce = 1;  // slots required by each reduce task
/* Number of task failures allowed on each TaskTracker (default 4, set by
 * mapred.max.tracker.failures). When the job's task failures on a TaskTracker
 * exceed this value, the node is added to the job's blacklist, and the scheduler
 * no longer assigns this job's tasks to that node. */
final int maxTaskFailuresPerTracker;
// Reduce tasks may be scheduled once 5% of the map tasks have completed
private static float DEFAULT_COMPLETED_MAPS_PERCENT_FOR_REDUCE_SLOWSTART = 0.05f;
// Number of map tasks that must complete before reduce tasks are scheduled
int completedMapsForReduceSlowstart = 0;
// Maximum percentage of map tasks allowed to fail, set by mapred.max.map.failures.percent
final int mapFailuresPercent;
// Maximum percentage of reduce tasks allowed to fail, set by mapred.max.reduce.failures.percent
final int reduceFailuresPercent;
JobPriority priority = JobPriority.NORMAL;  // job priority
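Two of the static fields above drive simple threshold checks: the reduce slow-start point and the tolerated share of failed map tasks. The sketch below illustrates one plausible way to use them; the class `StaticInfoSketch`, the round-up in the slow-start computation, and the exact form of the failure comparison are assumptions of this example, not a copy of JobInProgress.

```java
// Hypothetical sketch using field names from the listing above.
final class StaticInfoSketch {
    static final float DEFAULT_SLOWSTART = 0.05f;  // 5% of maps, as in the listing

    final int numMapTasks;
    final int mapFailuresPercent;               // tolerated percentage of failed map tasks
    final int completedMapsForReduceSlowstart;  // derived once, at construction

    StaticInfoSketch(int numMapTasks, float slowstart, int mapFailuresPercent) {
        this.numMapTasks = numMapTasks;
        this.mapFailuresPercent = mapFailuresPercent;
        // Assumption: round up, so at least one map must finish when numMapTasks > 0.
        this.completedMapsForReduceSlowstart = (int) Math.ceil(slowstart * numMapTasks);
    }

    // Reduce tasks become schedulable once enough maps have finished.
    boolean canScheduleReduces(int finishedMapTasks) {
        return finishedMapTasks >= completedMapsForReduceSlowstart;
    }

    // Assumption: the job fails once failed map TaskInProgress objects exceed
    // mapFailuresPercent percent of all map tasks (integer arithmetic, no rounding).
    boolean tooManyFailedMaps(int failedMapTips) {
        return failedMapTips * 100 > mapFailuresPercent * numMapTasks;
    }
}
```

With 100 map tasks and the 5% default, reduces become schedulable after the 5th map finishes; with `mapFailuresPercent` set to 10, the 11th failed map task would fail the job. Comparing `failed * 100` against `percent * total` avoids floating-point division.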
(2) Job dynamic information
Dynamic job information is updated continuously while the job runs. It helps detect TaskTracker, job, and task faults, and it also provides the basis for task scheduling.
int runningMapTasks = 0;         // number of running map tasks
int runningReduceTasks = 0;      // number of running reduce tasks
int finishedMapTasks = 0;        // number of completed map tasks
int finishedReduceTasks = 0;     // number of completed reduce tasks
int failedMapTasks = 0;          // number of failed map task attempts
int failedReduceTasks = 0;       // number of failed reduce task attempts
int speculativeMapTasks = 0;     // number of running backup (speculative) map tasks
int speculativeReduceTasks = 0;  // number of running backup (speculative) reduce tasks
// Number of failed map TaskInProgress objects; the corresponding input data
// is discarded without producing a final result
int failedMapTIPs = 0;
int failedReduceTIPs = 0;        // number of failed reduce TaskInProgress objects
private volatile boolean launchedCleanup = false;  // whether the cleanup task has been launched
private volatile boolean launchedSetup = false;    // whether the setup task has been launched
private volatile boolean jobKilled = false;        // whether the job has been killed
private volatile boolean jobFailed = false;        // whether the job has failed

// NetworkTopology Node to the set of TIPs
// Maps each node to the non-running TIPs whose input data resides on it
Map<Node, List<TaskInProgress>> nonRunningMapCache;

// Map of NetworkTopology Node to set of running TIPs
// Maps each node to the TIPs currently running on it
Map<Node, Set<TaskInProgress>> runningMapCache;

// A list of non-local, non-running maps
// Map tasks for which data locality need not be considered: if a map task's
// InputSplit has no location, locality is ignored when scheduling it
final List<TaskInProgress> nonLocalMaps;

// Set of failed, non-running maps sorted by #failures
final SortedSet<TaskInProgress> failedMaps;

// A set of non-local running maps
Set<TaskInProgress> nonLocalRunningMaps;

// A list of non-running reduce TIPs
Set<TaskInProgress> nonRunningReduces;

// A set of running reduce TIPs
Set<TaskInProgress> runningReduces;

// A list of cleanup tasks for the map task attempts, to be launched
// (for example, attempts killed directly by the user with "bin/hadoop job -kill")
List<TaskAttemptID> mapCleanupTasks = new LinkedList<TaskAttemptID>();

// A list of cleanup tasks for the reduce task attempts, to be launched
List<TaskAttemptID> reduceCleanupTasks = new LinkedList<TaskAttemptID>();

long startTime;   // job submission time
long launchTime;  // time the job started executing
long finishTime;  // job completion time
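The roles of `nonRunningMapCache` and `nonLocalMaps` become clearer with a small locality lookup. This is a simplified sketch, not Hadoop's scheduling code: the class `LocalityCacheSketch` is hypothetical, nodes are plain host-name strings instead of NetworkTopology nodes, TIPs are plain task-ID strings, rack-level locality is ignored, and the fallback order (local, then non-local, then any other host) is this example's own choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of nonRunningMapCache and nonLocalMaps.
final class LocalityCacheSketch {
    // host -> non-running map TIPs whose input split has a replica on that host
    final Map<String, List<String>> nonRunningMapCache = new HashMap<>();
    // map TIPs whose input split has no location information
    final List<String> nonLocalMaps = new ArrayList<>();

    void addMap(String tip, List<String> splitHosts) {
        if (splitHosts.isEmpty()) {
            nonLocalMaps.add(tip);
            return;
        }
        for (String host : splitHosts) {
            nonRunningMapCache.computeIfAbsent(host, h -> new ArrayList<>()).add(tip);
        }
    }

    // Returns a map TIP for the requesting host, or null if none remain.
    String obtainMapFor(String host) {
        String tip = null;
        List<String> local = nonRunningMapCache.get(host);
        if (local != null && !local.isEmpty()) {
            tip = local.get(0);                  // data-local choice
        } else if (!nonLocalMaps.isEmpty()) {
            tip = nonLocalMaps.get(0);           // locality irrelevant for this TIP
        } else {
            for (List<String> tips : nonRunningMapCache.values()) {
                if (!tips.isEmpty()) {           // any TIP, data stored elsewhere
                    tip = tips.get(0);
                    break;
                }
            }
        }
        if (tip != null) {
            markRunning(tip);
        }
        return tip;
    }

    // A TIP that starts running is no longer "non-running" on any host.
    private void markRunning(String tip) {
        for (List<String> tips : nonRunningMapCache.values()) {
            tips.remove(tip);
        }
        nonLocalMaps.remove(tip);
    }
}
```

Indexing TIPs by the hosts that store their input lets the scheduler answer "what can run locally on this node?" with a single map lookup per heartbeat instead of scanning every task.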
TaskInProgress
The TaskInProgress class maintains all information about a task while it runs. In Hadoop, a task may be executed speculatively or re-executed, so it can have multiple task attempts, several of which may even run at the same time. All of these attempts are managed and tracked by the same TaskInProgress object, and as soon as any one of them succeeds, TaskInProgress marks the task as successfully executed.
private final TaskSplitMetaInfo splitInfo;  // input split information processed by the task
private int numMaps;                        // number of map tasks; only used by reduce tasks
private int partition;                      // index of the task in the task list
private JobTracker jobtracker;              // JobTracker object, used to read the global clock
private TaskID id;                          // task ID; appending a suffix forms the task attempt ID
private JobInProgress job;                  // the JobInProgress this TaskInProgress belongs to
private final int numSlotsRequired;         // number of slots required to run the task

// Status of the TIP
private int successEventNumber = -1;
private int numTaskFailures = 0;            // number of failed task attempts
private int numKilledTasks = 0;             // number of killed task attempts
private double progress = 0;                // task progress
private String state = "";                  // running state
private long startTime = 0;                 // creation time of this TaskInProgress object
private long execStartTime = 0;             // start time of the first task attempt
private long execFinishTime = 0;            // finish time of the last successful task attempt
private int completes = 0;                  // number of completed attempts; in practice only 0 or 1
private boolean failed = false;             // whether this TaskInProgress has failed
private boolean killed = false;             // whether this TaskInProgress has been killed
private boolean jobCleanup = false;         // whether this TaskInProgress is a cleanup task
private boolean jobSetup = false;           // whether this TaskInProgress is a setup task

// The 'next' usable taskid of this tip
int nextTaskId = 0;                         // next available task attempt number

// The taskid that took this TIP to SUCCESS
private TaskAttemptID successfulTaskId;     // ID of the attempt that made this task succeed

// The first taskid of this tip
private TaskAttemptID firstTaskId;          // ID of the first task attempt run

// Map from task Id -> TaskTracker Id, contains tasks that are currently running
private TreeMap<TaskAttemptID, String> activeTasks = new TreeMap<TaskAttemptID, String>();

// All attempt Ids of this TIP, both completed and running
private TreeSet<TaskAttemptID> tasks = new TreeSet<TaskAttemptID>();

/**
 * Map from taskId -> TaskStatus
 */
private TreeMap<TaskAttemptID, TaskStatus> taskStatuses = new TreeMap<TaskAttemptID, TaskStatus>();

// Map from taskId -> TaskTracker Id,
// contains cleanup attempts and where they ran, if any
private TreeMap<TaskAttemptID, String> cleanupTasks = new TreeMap<TaskAttemptID, String>();

// Nodes on which attempts of this task have failed
private TreeSet<String> machinesWhereFailed = new TreeSet<String>();

// Once one task attempt succeeds, all other running attempts are recorded here
private TreeSet<TaskAttemptID> tasksReportedClosed = new TreeSet<TaskAttemptID>();

// list of tasks to kill, <taskid> <shouldFail>
private TreeMap<TaskAttemptID, Boolean> tasksToKill = new TreeMap<TaskAttemptID, Boolean>();

// task to commit, <taskattemptid>
// The attempt waiting to commit; it is the one that finally makes this task succeed
private TaskAttemptID taskToCommit;
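The following sketch shows how a few of these fields could cooperate when attempts are launched and one of them succeeds. It is an illustration under assumptions, not TaskInProgress itself: the class `TipBookkeepingSketch` and its methods are hypothetical, attempt IDs are plain strings, and the "first success wins, other running attempts are marked for killing" rule is a simplification of the behavior the comments above describe.

```java
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical bookkeeping using field names from the listing above.
final class TipBookkeepingSketch {
    private final String taskId;  // e.g. task_201208071706_0009_m_000000
    int nextTaskId = 0;           // next attempt number to hand out
    final TreeMap<String, String> activeTasks = new TreeMap<>();   // attempt -> tracker
    final TreeSet<String> tasks = new TreeSet<>();                 // every attempt started
    final TreeMap<String, Boolean> tasksToKill = new TreeMap<>();  // attempt -> shouldFail
    int completes = 0;            // 0 or 1
    String successfulTaskId;

    TipBookkeepingSketch(String taskId) { this.taskId = taskId; }

    // Launch a new attempt on the given tracker and return its attempt ID.
    String launchAttempt(String trackerName) {
        String attemptId = "attempt" + taskId.substring("task".length()) + "_" + nextTaskId++;
        activeTasks.put(attemptId, trackerName);
        tasks.add(attemptId);
        return attemptId;
    }

    // First success wins: record it, then mark every other running attempt
    // to be killed rather than failed (shouldFail = false).
    void attemptSucceeded(String attemptId) {
        activeTasks.remove(attemptId);
        if (completes > 0) {
            return;  // another attempt already made this task succeed
        }
        completes = 1;
        successfulTaskId = attemptId;
        for (String other : activeTasks.keySet()) {
            tasksToKill.put(other, false);
        }
    }
}
```

Marking the losing attempts as killed instead of failed keeps them from counting toward the task's failure limit, which matters when the extra attempts were only speculative backups.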
Maintenance of job and task run-time information