About the job cleanup stage of Hadoop MapReduce

Source: Internet
Author: User
Tags: hadoop, mapreduce

Recently we found that many of our analysis MR jobs were being delayed by one to two hours, even though such a job may need only 20 minutes to run. Analyzing the job status showed that the delay occurs in the cleanup stage of the job.

Because of user growth and soaring data volumes, more and more jobs are being submitted to the cluster, and each job occupies more slots than before. The cluster therefore runs short of slots and jobs queue up, which is normal. However, when the setup, map, and reduce stages of a job have all completed and only the (extremely lightweight) cleanup stage is left, a delay of 1-2 hours caused by that stage alone is hard to accept!

Consider the status data of one job (note: this job is very small, and its actual work finishes within 1-2 minutes).

The status data shows when setup, map, and reduce ran and when cleanup started: cleanup began long after the other stages had finished, yet its execution took only 8 s!

In other words, the job needed nearly two hours to finish because of those 8 s of work!

Check the JobTracker (JT) log:

12:39:06,580 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201212261519_3151096_m_000002_0' to tip task_201212261519_3151096_m_000002, for tracker...

This indicates that once the MR stages had finished, the cleanup task was assigned and submitted to the TaskTracker (TT) without delay.

Check the log on the TT and search for the attempt:

14:31:45,064 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201212261519_3151096_m_000002_0 which needs 1 slots
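The gap between the JobTracker entry and this TaskTracker entry can be computed directly from the two timestamps. The following is a minimal sketch, assuming both log lines were written on the same calendar day (the log excerpts carry no date); the class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Measures how long the cleanup attempt sat between assignment and launch. */
class CleanupDelay {
    // The log timestamps use a comma before the milliseconds.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Assumption: both timestamps fall on the same calendar day. */
    static Duration between(String assignedAt, String launchedAt) {
        LocalTime assigned = LocalTime.parse(assignedAt, LOG_TIME);
        LocalTime launched = LocalTime.parse(launchedAt, LOG_TIME);
        return Duration.between(assigned, launched);
    }

    public static void main(String[] args) {
        // JT "Adding task" time versus TT "Trying to launch" time.
        Duration delay = between("12:39:06,580", "14:31:45,064");
        System.out.println(delay); // prints PT1H52M38.484S
    }
}
```

The result, roughly 1 hour 53 minutes, matches the "nearly two hours" observed in the job status.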

This shows that the task had been waiting on the TT the whole time, from 12:39 until 14:31. See the relevant code below:

In the run() method of org.apache.hadoop.mapred.TaskTracker:

    try {
      State osState = offerService();
      if (osState == State.STALE) {
        staleState = true;
      } else if (osState == State.DENIED) {
        denied = true;
      }
    } catch (Exception ex) {
      ...
    }

The offerService() call is the main method through which the TT provides its service. It contains the following:

    TaskTrackerAction[] actions = heartbeatResponse.getActions();
    ...
    for (TaskTrackerAction action : actions) {
      if (action instanceof LaunchTaskAction) {
        addToTaskQueue((LaunchTaskAction) action);
      } else if (action instanceof CommitTaskAction) {
        ...
      }
    }

After the TT obtains the JT's commands through the heartbeat, it processes them. For a LaunchTaskAction, that is, a new task, the action is added to the task queue. The code is as follows:

    private void addToTaskQueue(LaunchTaskAction action) {
      // Route the action to the launcher that matches the task type.
      if (action.getTask().isMapTask()) {
        mapLauncher.addToTaskQueue(action);
      } else {
        reduceLauncher.addToTaskQueue(action);
      }
    }

A map task is added to mapLauncher and a reduce task is added to reduceLauncher. A cleanup task is normally a map task (note the _m_ in the attempt ID above), so it goes to mapLauncher.

Here mapLauncher and reduceLauncher are two threads initialized at TT startup:

    mapLauncher = new TaskLauncher(TaskType.MAP, maxMapSlots);
    reduceLauncher = new TaskLauncher(TaskType.REDUCE, maxReduceSlots);
    mapLauncher.start();
    reduceLauncher.start();

Adding to the queue means appending to a linked list, which is created in the thread's constructor:

    public TaskLauncher(TaskType taskType, int numSlots) {
      this.maxSlots = numSlots;
      this.numFreeSlots = new IntWritable(numSlots);
      this.tasksToLaunch = new LinkedList<TaskInProgress>();
      setDaemon(true);
      setName("TaskLauncher for " + taskType + " tasks");
    }

Adding to the linked list:

    public void addToTaskQueue(LaunchTaskAction action) {
      synchronized (tasksToLaunch) {
        TaskInProgress tip = registerTask(action, this);
        tasksToLaunch.add(tip);
        // Wake up the launcher thread waiting on this list.
        tasksToLaunch.notifyAll();
      }
    }

The launcher thread starts together with the TT and keeps watching the linked list. Whenever the list is not empty, it takes the first entry:

    //get the TIP
    tip = tasksToLaunch.remove(0);
    task = tip.getTask();
    LOG.info("Trying to launch : " + tip.getTask().getTaskID() +
             " which needs " + task.getNumSlotsRequired() + " slots");
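The add and remove excerpts above together form a classic producer/consumer queue guarded by the list's own monitor. The sketch below reproduces only that pattern in a self-contained form. `MiniTaskQueue`, `takeNext`, and the string task IDs are hypothetical stand-ins, not Hadoop classes, and the blocking `wait()` loop is an assumption about how the launcher idles, since the excerpt does not show it.

```java
import java.util.LinkedList;
import java.util.List;

/** Simplified stand-in for the launcher's queue; not the Hadoop class. */
class MiniTaskQueue {
    // Same role as tasksToLaunch: a plain list guarded by its own monitor.
    private final List<String> tasksToLaunch = new LinkedList<>();

    /** Producer side: append and wake any thread blocked in takeNext(). */
    void addToTaskQueue(String attemptId) {
        synchronized (tasksToLaunch) {
            tasksToLaunch.add(attemptId);
            tasksToLaunch.notifyAll();
        }
    }

    /** Consumer side: block until an entry exists, then take the oldest. */
    String takeNext() {
        synchronized (tasksToLaunch) {
            try {
                while (tasksToLaunch.isEmpty()) {
                    tasksToLaunch.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null; // interrupted while waiting
            }
            return tasksToLaunch.remove(0);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MiniTaskQueue queue = new MiniTaskQueue();
        Thread launcher = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                System.out.println("Trying to launch : " + queue.takeNext());
            }
        });
        launcher.start();
        queue.addToTaskQueue("map_1");
        queue.addToTaskQueue("map_2");
        queue.addToTaskQueue("job_cleanup");
        launcher.join();
    }
}
```

Because entries are always removed from index 0, the cleanup entry is only reached after everything that was queued before it.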

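Comparing the logs confirms that all map tasks assigned to the TT are processed strictly in the order in which they arrived. But this leads to a problem: a lightweight cleanup task has to wait behind every map task that was queued before it.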
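To see how strict in-order processing can turn an 8 s task into a long wait, here is a toy model rather than Hadoop code. It assumes that every task needs exactly one slot, that all entries are already queued at time 0, and that the launcher does not move on to the next entry until the current one has obtained a slot. The class name, slot count, and durations are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of a FIFO launcher with a fixed number of one-task slots. */
class FifoLaunchSimulation {
    /**
     * Returns the launch time (in seconds) of each queued task, in queue
     * order. durations[i] is how long task i holds its slot once launched.
     */
    static List<Long> launchTimes(int slots, long[] durations) {
        // Each element is the time at which one slot becomes free.
        PriorityQueue<Long> slotFreeAt = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) {
            slotFreeAt.add(0L);
        }
        List<Long> launches = new ArrayList<>();
        for (long duration : durations) {
            // The head of the queue waits for the earliest free slot.
            long start = slotFreeAt.poll();
            launches.add(start);
            slotFreeAt.add(start + duration);
        }
        return launches;
    }

    public static void main(String[] args) {
        // Six 30-minute map tasks queued ahead of one 8 s cleanup task.
        long[] durations = {1800, 1800, 1800, 1800, 1800, 1800, 8};
        List<Long> launches = launchTimes(2, durations);
        long cleanupStart = launches.get(launches.size() - 1);
        System.out.println("cleanup launched after " + cleanupStart + " s");
        // prints: cleanup launched after 5400 s
    }
}
```

In this model the cleanup task starts 90 minutes after it was queued, although it needs a slot for only 8 s.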



Optimization proposal: cleanup tasks should be distinguished from ordinary map and reduce tasks, and they should be given a higher priority than ordinary map tasks!
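One way to picture this proposal is a launch queue that orders entries by a cleanup flag first and by arrival order second. This is only a sketch of the idea, not a patch to TaskTracker: `CleanupFirstQueue`, its `cleanup` flag, and the string task IDs are hypothetical.

```java
import java.util.PriorityQueue;

/** Hypothetical launch queue: cleanup entries first, FIFO otherwise. */
class CleanupFirstQueue {
    private static final class Entry {
        final String attemptId;
        final boolean cleanup;
        final long seq; // arrival order, used as a tie-breaker

        Entry(String attemptId, boolean cleanup, long seq) {
            this.attemptId = attemptId;
            this.cleanup = cleanup;
            this.seq = seq;
        }
    }

    private long nextSeq = 0;

    private final PriorityQueue<Entry> queue = new PriorityQueue<>((a, b) -> {
        if (a.cleanup != b.cleanup) {
            return a.cleanup ? -1 : 1; // cleanup sorts ahead of other tasks
        }
        return Long.compare(a.seq, b.seq); // otherwise keep arrival order
    });

    synchronized void add(String attemptId, boolean cleanup) {
        queue.add(new Entry(attemptId, cleanup, nextSeq++));
    }

    /** Returns the next attempt to launch, or null if the queue is empty. */
    synchronized String next() {
        Entry e = queue.poll();
        return e == null ? null : e.attemptId;
    }

    public static void main(String[] args) {
        CleanupFirstQueue q = new CleanupFirstQueue();
        q.add("map_1", false);
        q.add("map_2", false);
        q.add("job_cleanup", true);
        System.out.println(q.next()); // prints job_cleanup
        System.out.println(q.next()); // prints map_1
    }
}
```

Moving to the head of the queue only removes the wait behind other queued tasks. The cleanup task would still need a free slot, so a complete solution may also have to consider how slots are reserved for it.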

Some of these points may be inaccurate. Corrections and criticism are welcome.
