Job submission and initialization mainly prepares the ground for the subsequent execution of the MR program.
It is divided into four steps: (1) configure the MR job environment, (2) upload the job information, (3) submit the job, and (4) initialize the job.
1. Add the job to the waitingJobs map of the CapacitySchedulerQueue it targets. The queue is determined by the MR program's "mapred.job.queue.name" configuration and defaults to "default".
2. Tell the scheduler that a job has been submitted to the queue.
Inform the queue:
queue.jobAdded(job); // increments the count of jobs the submitting user has in this queue by 1
Set up scheduler-specific job information:
preInitializeJob(job);
Prepare for job initialization. This mainly uses the cluster configuration "mapred.cluster.map.memory.mb" (the amount of memory per map slot, slotSizePerMap) and the MR program configuration "mapred.task.maxvmem" (the maximum amount of memory per task, getMemoryForMapTask()) to determine how many of the cluster's slots each map task will occupy.
The formula is: (int) Math.ceil((float) getMemoryForMapTask() / (float) slotSizePerMap)
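To make the calculation concrete, here is a minimal, self-contained sketch of the same arithmetic. The class name and the literal memory values are illustrative assumptions, not Hadoop defaults:
public class MapSlotCalc {
    public static void main(String[] args) {
        long slotSizePerMap = 1024;      // assumed value of mapred.cluster.map.memory.mb (MB per map slot)
        long memoryForMapTask = 2560;    // assumed per-task memory derived from mapred.task.maxvmem (MB)
        // same formula as above: round up so a task never gets less memory than it requested
        int slotsPerMapTask = (int) Math.ceil((float) memoryForMapTask / (float) slotSizePerMap);
        System.out.println("Each map task occupies " + slotsPerMapTask + " slots"); // prints 3
    }
}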
Reduce tasks are handled in the same way, using the cluster's "mapred.cluster.reduce.memory.mb" configuration.
1.4.2 The scheduler dispatches job initialization
Several background service threads are started along with the scheduler. One of them is the JobInitializationPoller thread; the following is what its run() method executes:
JobInitializationPoller.run() code:
while (running) {
    // remove jobs that are already in the RUNNING state, or that have completed, from the initialization set
    cleanupInitializedJobsList();
    selectJobsToInitialize();
    if (!this.isInterrupted()) {
        Thread.sleep(sleepInterval); // "mapred.capacity-scheduler.init-poll-interval", default 3000 ms
    }
}
1.4.2.1 Selecting the jobs to initialize
selectJobsToInitialize() code:
for (String queue : jobQueueManager.getAllQueues()) {
    ArrayList<JobInProgress> jobsToInitialize = getJobsToInitialize(queue);
    JobInitializationThread t = threadsToQueueMap.get(queue);
    for (JobInProgress job : jobsToInitialize) {
        t.addJobsToQueue(queue, job);
    }
}
First, iterate over all the JobInProgress instances in every queue of the cluster and collect those that are in the JobStatus.PREP state.
Then look up the JobInitializationThread responsible for initializing jobs of that queue (the threadsToQueueMap map records which worker thread serves each queue; the number of worker threads is read at startup from the capacity-scheduler.xml configuration "mapred.capacity-scheduler.init-worker-threads" and defaults to 5 threads that check for and initialize jobs across the queues).
Note: when the cluster has a large number of jobs, this configuration can be increased moderately so that more threads perform job initialization (a sketch of reading these settings follows below).
Finally, the uninitialized jobs of each queue are added to that queue's JobInitializationThread, which initializes them one by one.
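As a quick illustration of the two poller-related configuration keys mentioned above, the sketch below reads them with the standard Hadoop Configuration API. The class name, the main() wrapper, and loading "capacity-scheduler.xml" as a classpath resource are assumptions for the example; only the property names and defaults come from the text above.
import org.apache.hadoop.conf.Configuration;

public class PollerSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("capacity-scheduler.xml"); // assumed to be available on the classpath
        int workerThreads = conf.getInt("mapred.capacity-scheduler.init-worker-threads", 5);
        long pollInterval = conf.getLong("mapred.capacity-scheduler.init-poll-interval", 3000L);
        System.out.println(workerThreads + " init worker threads, polling every " + pollInterval + " ms");
    }
}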
JobInitializationThread.run() code:
public void run() {
    while (startIniting) {
        initializeJobs();
        Thread.sleep(sleepInterval); // "mapred.capacity-scheduler.init-poll-interval", default 3000 ms
    }
}
initializeJobs() iterates over all the uninitialized jobs and calls the JobTracker's initJob() on each of them:
setInitializingJob(job);
ttm.initJob(job); // ttm is the TaskTrackerManager, which here is the JobTracker
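Put together, the loop looks roughly like the following sketch. This is a simplified illustration based only on the description above; the jobsToInitialize collection name and the method shape are assumptions, not the scheduler's actual source.
// illustrative sketch, not CapacityScheduler source
void initializeJobs() {
    for (JobInProgress job : jobsToInitialize) { // assumed per-thread list of PREP jobs
        setInitializingJob(job);                 // remember which job this thread is currently working on
        ttm.initJob(job);                        // ttm (TaskTrackerManager) is the JobTracker
    }
}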
1.4.2.2 Initializing the job
TaskInProgress
The TaskInProgress class maintains all the information about a task while it runs. In Hadoop, a task can be speculatively executed or rerun, so it can have multiple task attempts, and several attempts of the same task may even execute at the same time. All of these attempts are managed and tracked by the same TaskInProgress object.
Primary properties when it is initialized:
private final TaskSplitMetaInfo splitInfo;   // the split this map task will process
private JobInProgress job;                   // the JobInProgress this TaskInProgress belongs to
// mapping from running task attempt IDs to the TaskTracker IDs they run on
private TreeMap<TaskAttemptID, String> activeTasks = new TreeMap<TaskAttemptID, String>();
// all task attempt IDs that have ever run, both completed and still running
private TreeSet<TaskAttemptID> tasks = new TreeSet<TaskAttemptID>();
// mapping from task attempt IDs to their TaskStatus
private TreeMap<TaskAttemptID, TaskStatus> taskStatuses = new TreeMap<TaskAttemptID, TaskStatus>();
// mapping from cleanup task attempt IDs to their TaskTrackers
private TreeMap<TaskAttemptID, String> cleanupTasks = new TreeMap<TaskAttemptID, String>();
// nodes on which attempts of this task have failed
private TreeSet<String> machinesWhereFailed = new TreeSet<String>();
// task attempts marked to be killed
private TreeMap<TaskAttemptID, Boolean> tasksToKill = new TreeMap<TaskAttemptID, Boolean>();
// the task attempt waiting to be committed
private TaskAttemptID taskToCommit;
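To make the attempt bookkeeping above a bit more concrete, here is a small, hedged sketch using Hadoop's TaskAttemptID class; the ID strings and the class wrapper are made up for illustration:
import org.apache.hadoop.mapreduce.TaskAttemptID;

public class AttemptDemo {
    public static void main(String[] args) {
        // two attempts of the same map task, e.g. the original and a speculative or retried one
        TaskAttemptID first  = TaskAttemptID.forName("attempt_201101011200_0001_m_000005_0");
        TaskAttemptID second = TaskAttemptID.forName("attempt_201101011200_0001_m_000005_1");
        // both attempts belong to the same task, so the same TaskInProgress tracks them
        System.out.println(first.getTaskID().equals(second.getTaskID())); // true
    }
}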
The JobTracker's initJob(job) is called in the context of the JobInitializationThread. It does two main things:
1. Initialize the job's tasks
TaskSplitMetaInfo[] splits = createSplits(jobId);
numMapTasks = splits.length;
maps = new TaskInProgress[numMapTasks];
// "mapreduce.job.locality.wait.factor"
localityWaitFactor = conf.getFloat(LOCALITY_WAIT_FACTOR, DEFAULT_LOCALITY_WAIT_FACTOR);
this.reduces = new TaskInProgress[numReduceTasks];
// the fraction of map tasks that must complete before reduce tasks start being scheduled, default 5%
completedMapsForReduceSlowstart = conf.getFloat("mapred.reduce.slowstart.completed.maps", 0.05f) * numMapTasks;
cleanup = new TaskInProgress[2];
setup = new TaskInProgress[2];
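As a small worked example of the slow-start threshold (the job size is an assumed number; the 5% comes from the default noted above):
public class SlowstartDemo {
    public static void main(String[] args) {
        int numMapTasks = 200;   // assumed number of map tasks in the job
        float slowstart = 0.05f; // mapred.reduce.slowstart.completed.maps default (5%)
        int completedMapsForReduceSlowstart = (int) (slowstart * numMapTasks);
        // reduce tasks only start being scheduled after this many map tasks have completed
        System.out.println("Reduces start after " + completedMapsForReduceSlowstart + " completed maps"); // 10
    }
}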
1.1 Map TaskInProgress
Read the split meta information from the job's submission directory, parse it into the TaskSplitMetaInfo[] array, and create a corresponding number of map TaskInProgress instances, assigning one split to each of them in preparation for execution. In other words, each map TaskInProgress instance is told which source data it will process.
1.2 Reduce TaskInProgress
The reduce TaskInProgress instances are then created according to the MR configuration "mapred.reduce.tasks".
1.3 Cleanup and setup TaskInProgress
Cleanup-type and setup-type tasks are created for the map side and the reduce side, respectively.
2. Notify the scheduler to update the job
If the job status is found to differ before and after the job's tasks are initialized, the scheduler is notified to update the job:
JobStatusChangeEvent event = new JobStatusChangeEvent(job, EventType.RUN_STATE_CHANGED, prevStatus, newStatus);
synchronized (JobTracker.this) {
    for (JobInProgressListener listener : jobInProgressListeners) {
        listener.jobUpdated(event);
    }
}
For this first initialization, a job that has not yet been scheduled by the scheduler is still in the PREP state.