Maintenance of job and task run-time information

Source: Internet
Author: User

One of JobTracker's most important functions is state monitoring, which covers the runtime state of TaskTrackers, jobs, and tasks. TaskTracker monitoring is relatively simple: JobTracker only records each TaskTracker's latest heartbeat time and health status (the health status is detected by a monitoring script on the TaskTracker side and sent to JobTracker in the heartbeat).
Job Description Model

JobTracker describes and tracks the running state of each job internally with a "three-level multi-way tree". JobTracker creates a JobInProgress object for each job to track and monitor its running state. The object exists for the entire lifetime of the job: it is created when the job is submitted and destroyed when the job finishes. Following a divide-and-conquer strategy, JobTracker splits each job into several tasks and creates a TaskInProgress object for each task to track and monitor its running state. A task may fail while running because of a software bug, a hardware fault, or similar causes, in which case JobTracker reruns it according to a certain policy. Each task may therefore be attempted several times, until it either succeeds or fails because the maximum number of attempts has been exceeded. JobTracker calls each run of a task a "task run attempt", or task attempt. As soon as one task attempt of a task succeeds, the corresponding TaskInProgress object marks the task as successful, and once all TaskInProgress objects have marked their tasks as successful, the JobInProgress object marks the entire job as successful.
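The job, task, and task attempt hierarchy described above can be sketched as a small tree of objects. This is a simplified illustration, not Hadoop's code: the class names JobModelSketch, Job, Task, and Attempt are hypothetical, the retry limit is passed in by hand, and the sketch assumes that a single failed task fails the whole job, whereas Hadoop can be configured to tolerate a percentage of failed tasks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the job -> task -> attempt tree (not Hadoop's classes). */
class JobModelSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    /** Third level: one run attempt of a task. */
    static class Attempt {
        final int number;
        State state = State.RUNNING;
        Attempt(int number) { this.number = number; }
    }

    /** Second level: a task that owns all of its attempts. */
    static class Task {
        final int maxAttempts; // assumed retry limit
        final List<Attempt> attempts = new ArrayList<>();
        Task(int maxAttempts) { this.maxAttempts = maxAttempts; }

        /** Starts a new attempt, or returns null once the retry limit is reached. */
        Attempt newAttempt() {
            if (attempts.size() >= maxAttempts) {
                return null;
            }
            Attempt attempt = new Attempt(attempts.size());
            attempts.add(attempt);
            return attempt;
        }

        /** One successful attempt is enough; the task fails only when every allowed attempt failed. */
        State state() {
            int failed = 0;
            for (Attempt a : attempts) {
                if (a.state == State.SUCCEEDED) return State.SUCCEEDED;
                if (a.state == State.FAILED) failed++;
            }
            return failed >= maxAttempts ? State.FAILED : State.RUNNING;
        }
    }

    /** First level: a job that owns all of its tasks. */
    static class Job {
        final List<Task> tasks = new ArrayList<>();

        /** The job succeeds only when every task has succeeded. */
        State state() {
            boolean allSucceeded = true;
            for (Task t : tasks) {
                State s = t.state();
                if (s == State.FAILED) return State.FAILED; // simplification, see lead-in
                if (s != State.SUCCEEDED) allSucceeded = false;
            }
            return allSucceeded ? State.SUCCEEDED : State.RUNNING;
        }
    }
}
```

In this sketch the state is recomputed bottom-up on every query; the real classes instead keep counters that are updated as status reports arrive.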
To distinguish jobs from one another, JobTracker gives each job a unique ID. The ID consists of three parts: the job prefix string, the JobTracker start time, and the job submission sequence number, joined by "_" to form the complete job ID. For example, in job_201208071706_0009 the three parts are "job", "201208071706", and "0009" (the 9th job since JobTracker started). The ID of each task inherits the job's ID and extends it. It consists of three parts: the job ID (with the prefix string changed to "task"), the task type (m for map or r for reduce), and the task number (from 000000 to 999999). For example, task_201208071706_0009_m_000000 denotes a task whose job ID is job_201208071706_0009, whose type is map, and whose number is 000000. The ID of each task attempt in turn inherits the task's ID. It consists of two parts: the task ID (with the prefix string changed to "attempt") and the attempt number (starting at 0). For example, attempt_201208071706_0009_m_000000_0 denotes attempt 0 of task task_201208071706_0009_m_000000.
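The ID scheme can be illustrated with plain string formatting. The helper class IdFormatSketch and its method names are hypothetical and exist only for this example; Hadoop has its own ID classes, which are not used here.

```java
import java.util.Locale;

/** Hypothetical helpers that rebuild the ID strings described in the text. */
class IdFormatSketch {
    /** job_<JobTracker start time>_<4-digit submission number> */
    static String jobId(String trackerStartTime, int jobNumber) {
        return String.format(Locale.ROOT, "job_%s_%04d", trackerStartTime, jobNumber);
    }

    /** task_<job ID without its prefix>_<m|r>_<6-digit task number> */
    static String taskId(String jobId, boolean isMap, int taskNumber) {
        String suffix = jobId.substring("job_".length());
        return String.format(Locale.ROOT, "task_%s_%s_%06d", suffix, isMap ? "m" : "r", taskNumber);
    }

    /** attempt_<task ID without its prefix>_<attempt number, starting at 0> */
    static String attemptId(String taskId, int attemptNumber) {
        String suffix = taskId.substring("task_".length());
        return String.format(Locale.ROOT, "attempt_%s_%d", suffix, attemptNumber);
    }
}
```

Each level only swaps the prefix and appends its own fields, so the job and task an attempt belongs to can be read directly from the attempt ID.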

JobInProgress
The JobInProgress class is mainly used to monitor and track the running state of a job, and it provides the scheduler with the lowest-level scheduling interface.
JobInProgress maintains two kinds of job information: static information, which is fixed when the job is submitted, and dynamic information, which changes as the job runs.
(1) Job static information
Job static information is the set of properties that are already determined when the job is submitted. It includes the following fields:
org.apache.hadoop.mapred.JobInProgress.java

    // TaskInProgress objects for the map, reduce, cleanup and setup tasks
    TaskInProgress maps[] = new TaskInProgress[0];
    TaskInProgress reduces[] = new TaskInProgress[0];
    TaskInProgress cleanup[] = new TaskInProgress[0];
    TaskInProgress setup[] = new TaskInProgress[0];
    int numMapTasks = 0;                    // number of map tasks
    int numReduceTasks = 0;                 // number of reduce tasks
    final long memoryPerMap;                // memory required by each map task
    final long memoryPerReduce;             // memory required by each reduce task
    volatile int numSlotsPerMap = 1;        // slots required by each map task
    volatile int numSlotsPerReduce = 1;     // slots required by each reduce task
    /* Number of task failures allowed on each TaskTracker. The default is 4 and it is
       set by the parameter mapred.max.tracker.failures. Once the job's failures on a
       TaskTracker exceed this value, the node is added to the job's blacklist and the
       scheduler no longer assigns the job's tasks to that node. */
    final int maxTaskFailuresPerTracker;
    // Reduce tasks can be scheduled once 5% of the map tasks have completed
    private static float DEFAULT_COMPLETED_MAPS_PERCENT_FOR_REDUCE_SLOWSTART = 0.05f;
    int completedMapsForReduceSlowstart = 0; // number of map tasks that must complete before reduce tasks are scheduled
    final int mapFailuresPercent;           // upper limit on the percentage of failed map tasks, set by mapred.max.map.failures.percent
    final int reduceFailuresPercent;        // upper limit on the percentage of failed reduce tasks, set by mapred.max.reduce.failures.percent
    JobPriority priority = JobPriority.NORMAL; // job priority
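Two of the static fields are thresholds rather than plain settings, and a short sketch may make their meaning concrete. The class StaticInfoSketch and its method names are hypothetical, and the arithmetic below is an assumption about how such thresholds could be derived from the configured values, not a copy of the Hadoop implementation.

```java
/** Hypothetical helpers showing how two of the static thresholds could be applied. */
class StaticInfoSketch {
    /**
     * Number of map tasks that must finish before reduce tasks may be scheduled,
     * given the slow-start fraction (0.05 by default) and the number of map tasks.
     * Rounding up is an assumption of this sketch.
     */
    static int completedMapsForReduceSlowstart(float slowstartFraction, int numMapTasks) {
        return (int) Math.ceil(slowstartFraction * numMapTasks);
    }

    /**
     * True while the number of failed tasks is still within the allowed percentage
     * (for example mapFailuresPercent for map tasks).
     */
    static boolean withinFailureLimit(int failedTips, int totalTips, int failuresPercent) {
        // Compare in integer arithmetic to avoid floating-point rounding
        return failedTips * 100 <= failuresPercent * totalTips;
    }
}
```

For example, with 10 map tasks and the default fraction of 0.05, at least one map task has to finish before any reduce task is scheduled, and with a failure limit of 10 percent a single failed map task out of 10 is still tolerated.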

(2) Job dynamic information
Job dynamic information is the information that is updated continuously while the job runs. It is useful for detecting TaskTracker, job, and task faults, and it also gives the scheduler a basis for its task scheduling decisions.

    int runningMapTasks = 0;        // number of map tasks currently running
    int runningReduceTasks = 0;     // number of reduce tasks currently running
    int finishedMapTasks = 0;       // number of map tasks that have finished
    int finishedReduceTasks = 0;    // number of reduce tasks that have finished
    int failedMapTasks = 0;         // number of failed map task attempts
    int failedReduceTasks = 0;      // number of failed reduce task attempts
    int speculativeMapTasks = 0;    // number of running backup (speculative) map tasks
    int speculativeReduceTasks = 0; // number of running backup (speculative) reduce tasks

    int failedMapTIPs = 0;          // number of failed map TaskInProgress objects; their input data is discarded and contributes nothing to the final result
    int failedReduceTIPs = 0;       // number of failed reduce TaskInProgress objects

    private volatile boolean launchedCleanup = false; // whether the cleanup task has been launched
    private volatile boolean launchedSetup = false;   // whether the setup task has been launched
    private volatile boolean jobKilled = false;       // whether the job has been killed
    private volatile boolean jobFailed = false;       // whether the job has failed

    // NetworkTopology Node to the set of TIPs: maps each node to the
    // TaskInProgress objects whose input data is located on that node
    Map<Node, List<TaskInProgress>> nonRunningMapCache;
    // Map of NetworkTopology Node to set of running TIPs: maps each node to
    // the tasks currently running on it
    Map<Node, Set<TaskInProgress>> runningMapCache;
    // A list of non-local, non-running maps: map tasks for which data locality
    // need not be considered. If the InputSplit location of a map task is empty,
    // locality is ignored when the task is scheduled
    final List<TaskInProgress> nonLocalMaps;
    // Set of failed, non-running maps sorted by #failures
    final SortedSet<TaskInProgress> failedMaps;
    // A set of non-local running maps
    Set<TaskInProgress> nonLocalRunningMaps;
    // A list of non-running reduce TIPs
    Set<TaskInProgress> nonRunningReduces;
    // A set of running reduce TIPs
    Set<TaskInProgress> runningReduces;
    // A list of cleanup tasks for the map task attempts, to be launched, for
    // example attempts that the user killed directly with "bin/hadoop job -kill"
    List<TaskAttemptID> mapCleanupTasks = new LinkedList<TaskAttemptID>();
    // A list of cleanup tasks for the reduce task attempts, to be launched
    List<TaskAttemptID> reduceCleanupTasks = new LinkedList<TaskAttemptID>();
    long startTime;   // job submission time
    long launchTime;  // time at which the job started executing
    long finishTime;  // job completion time
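To show how a node-to-task mapping of this kind can support scheduling, here is a minimal sketch of a locality lookup. It is an illustration under simplifying assumptions: LocalityCacheSketch and its methods are hypothetical, hosts and tasks are plain strings instead of Node and TaskInProgress objects, and only node-level locality is modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of picking a pending map task by data locality. */
class LocalityCacheSketch {
    // Host name -> pending map tasks whose input split is stored on that host
    private final Map<String, List<String>> nonRunningMapCache = new HashMap<>();
    // Pending map tasks whose input split has no location
    private final List<String> nonLocalMaps = new ArrayList<>();

    /** Registers a pending task under every host that holds its input split. */
    void addPending(String taskId, List<String> splitHosts) {
        if (splitHosts.isEmpty()) {
            nonLocalMaps.add(taskId);
            return;
        }
        for (String host : splitHosts) {
            nonRunningMapCache.computeIfAbsent(host, k -> new ArrayList<>()).add(taskId);
        }
    }

    /** Returns a task with local input if one exists, else a non-local task, else null. */
    String pickFor(String host) {
        List<String> local = nonRunningMapCache.get(host);
        if (local != null && !local.isEmpty()) {
            String taskId = local.remove(0);
            // The task is no longer pending, so drop it from every other host's list
            for (List<String> pending : nonRunningMapCache.values()) {
                pending.remove(taskId);
            }
            return taskId;
        }
        return nonLocalMaps.isEmpty() ? null : nonLocalMaps.remove(0);
    }
}
```

A TaskTracker asking for work on host h2 would thus first be offered a task whose split is stored on h2, and only then a task without any location preference.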

TaskInProgress
The TaskInProgress class maintains all of the information about a task over the course of its run. In Hadoop, a task may be executed speculatively or re-executed, so it can have multiple task attempts, and several attempts of the same task may be running at the same time. All of these attempts are managed and tracked by the same TaskInProgress object. As soon as any one of them succeeds, TaskInProgress marks the task as successfully executed.

    private final TaskSplitMetaInfo splitInfo; // split information the task has to process
    private int numMaps;                  // number of map tasks, only useful for reduce tasks
    private int partition;                // index of the task in the task list
    private JobTracker jobtracker;        // JobTracker object, used to obtain the global clock
    private TaskID id;                    // task ID; appending an attempt number to it yields a task attempt ID
    private JobInProgress job;            // the JobInProgress this TaskInProgress belongs to
    private final int numSlotsRequired;   // number of slots required to run the task

    // Status of the TIP
    private int successEventNumber = -1;
    private int numTaskFailures = 0;      // number of failed task attempts
    private int numKilledTasks = 0;       // number of killed task attempts
    private double progress = 0;          // task progress
    private String state = "";            // running state
    private long startTime = 0;           // creation time of the TaskInProgress object
    private long execStartTime = 0;       // start time of the first task attempt
    private long execFinishTime = 0;      // finish time of the last successful task attempt
    private int completes = 0;            // number of completed task attempts; in practice only 0 or 1
    private boolean failed = false;       // whether the TaskInProgress has failed
    private boolean killed = false;       // whether the TaskInProgress has been killed
    private boolean jobCleanup = false;   // whether the TaskInProgress is a cleanup task
    private boolean jobSetup = false;     // whether the TaskInProgress is a setup task

    // The 'next' usable taskid of this tip
    int nextTaskId = 0;
    // The taskid that took this TIP to SUCCESS
    private TaskAttemptID successfulTaskId;
    // The first taskid of this tip
    private TaskAttemptID firstTaskId;
    // Map from task ID to TaskTracker ID, contains the tasks that are currently running
    private TreeMap<TaskAttemptID, String> activeTasks = new TreeMap<TaskAttemptID, String>();
    // All attempt IDs of this TIP, both completed and running
    private TreeSet<TaskAttemptID> tasks = new TreeSet<TaskAttemptID>();
    // Map from task ID to TaskStatus
    private TreeMap<TaskAttemptID, TaskStatus> taskStatuses = new TreeMap<TaskAttemptID, TaskStatus>();
    // Map from task ID to TaskTracker ID, contains cleanup attempts and where they ran, if any
    private TreeMap<TaskAttemptID, String> cleanupTasks = new TreeMap<TaskAttemptID, String>();
    // Nodes on which attempts of this task have already failed
    private TreeSet<String> machinesWhereFailed = new TreeSet<String>();
    // After one task attempt succeeds, all other running task attempts are stored in this set
    private TreeSet<TaskAttemptID> tasksReportedClosed = new TreeSet<TaskAttemptID>();
    // List of tasks to kill, <taskid> -> <shouldFail>
    private TreeMap<TaskAttemptID, Boolean> tasksToKill = new TreeMap<TaskAttemptID, Boolean>();
    // Task to commit, <taskattemptid>: the attempt waiting to commit, which will
    // eventually make the TaskInProgress succeed
    private TaskAttemptID taskToCommit;
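The "first successful attempt wins" behaviour of a TaskInProgress can be sketched with a few of the fields listed above. This is a minimal illustration, not the Hadoop implementation: the class TipSketch and its methods are hypothetical, attempt IDs are plain strings, and status reports, commits, and failures are left out.

```java
import java.util.TreeMap;

/** Hypothetical sketch: several concurrent attempts of one task, the first success wins. */
class TipSketch {
    private final String taskIdSuffix; // task ID without its "task_" prefix
    private final TreeMap<String, String> activeTasks = new TreeMap<>();  // attempt ID -> TaskTracker
    private final TreeMap<String, Boolean> tasksToKill = new TreeMap<>(); // attempt ID -> shouldFail
    private String successfulTaskId;   // the attempt that made the task succeed
    private int nextTaskId = 0;        // number of the next attempt

    TipSketch(String taskIdSuffix) {
        this.taskIdSuffix = taskIdSuffix;
    }

    /** Launches a new (possibly speculative) attempt on the given TaskTracker. */
    String launch(String tracker) {
        String attemptId = "attempt_" + taskIdSuffix + "_" + nextTaskId++;
        activeTasks.put(attemptId, tracker);
        return attemptId;
    }

    /** Records a successful attempt; later successes of other attempts are ignored. */
    void completed(String attemptId) {
        if (activeTasks.remove(attemptId) == null || successfulTaskId != null) {
            return;
        }
        successfulTaskId = attemptId;
        // Every attempt still running is now redundant: kill it without counting it as failed
        for (String other : activeTasks.keySet()) {
            tasksToKill.put(other, Boolean.FALSE);
        }
    }

    boolean isComplete() { return successfulTaskId != null; }
    String successfulTaskId() { return successfulTaskId; }
    boolean shouldKill(String attemptId) { return tasksToKill.containsKey(attemptId); }
}
```

Marking the redundant attempts with shouldFail set to false reflects the idea that they are killed rather than failed, so they should not count against the task's failure limit.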
