Analysis: Error handling in the map/reduce working mechanism

Source: Internet
Author: User

Objective

In a Hadoop cluster, node failure is a very common occurrence.

A major feature of Hadoop is that the failure of a single node does not affect the operation of the distributed job as a whole.

The following is an analysis of how the Hadoop platform achieves this.

Hardware failure

Hardware failures can be divided into two types: JobTracker node failure and TaskTracker node failure.

1. JobTracker node failure

This is the most serious error in a Hadoop cluster.

When it occurs, the only remedy is to select a new JobTracker node. During the selection period all tasks must stop, and tasks that have already completed must be run again from the start.

2. TaskTracker node failure

This is the most common error in a Hadoop cluster. For this type of error, Hadoop has a good error-handling mechanism.

The heartbeat mechanism between the JobTracker and the TaskTrackers requires each TaskTracker to report its progress to the JobTracker within 1 minute.

If the JobTracker receives no report within that time, the TaskTracker is removed from the set of TaskTrackers waiting to be scheduled.

If the JobTracker receives a report of a failed task, it moves that TaskTracker to the end of the waiting queue, where it waits for its turn again. However, if a TaskTracker reports four failures in a row, it is also removed from the waiting queue.
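The three rules above (heartbeat timeout, move to the end of the queue on a failed task, removal after four failures in a row) can be sketched as a small toy model. This is only an illustration of the behaviour described in this article, not Hadoop's actual code: the class SchedulerQueue, its methods, and both constants are hypothetical names chosen for the example, and the 60-second and 4-failure values are simply the figures quoted above.

```python
from collections import deque

# Assumed values, taken from the description above (not read from Hadoop).
HEARTBEAT_TIMEOUT_S = 60        # a report must arrive within 1 minute
MAX_CONSECUTIVE_FAILURES = 4    # four failures in a row remove the tracker


class SchedulerQueue:
    """Toy model of the JobTracker's queue of schedulable TaskTrackers."""

    def __init__(self):
        self.queue = deque()    # tracker names; the head is scheduled next
        self.last_report = {}   # name -> time of the last progress report
        self.failures = {}      # name -> consecutive failed-task reports

    def register(self, name, now):
        """Add a new tracker to the end of the waiting queue."""
        self.queue.append(name)
        self.last_report[name] = now
        self.failures[name] = 0

    def report(self, name, now, task_failed=False):
        """Handle a progress report carried by a heartbeat."""
        if name not in self.queue:
            return  # tracker was already removed; ignore its report
        self.last_report[name] = now
        if not task_failed:
            self.failures[name] = 0  # a good report breaks the failure streak
            return
        self.failures[name] += 1
        self.queue.remove(name)
        if self.failures[name] < MAX_CONSECUTIVE_FAILURES:
            self.queue.append(name)  # wait at the end of the queue again

    def expire(self, now):
        """Remove every tracker whose last report is older than the timeout."""
        for name in list(self.queue):
            if now - self.last_report[name] > HEARTBEAT_TIMEOUT_S:
                self.queue.remove(name)
```

In this sketch a caller would invoke report() whenever a heartbeat arrives and expire() periodically; a tracker that stays silent for longer than the timeout, or that reports four failed tasks without a successful report in between, no longer appears in the queue.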

Summary

Fault handling and maintenance are usually the responsibility of dedicated operations staff.

For that reason, this article does not dig any deeper into the topic.

One question remains open: when one of several map tasks on a map node fails, why do all the other map tasks on that node have to be re-executed,

while on a reduce node only the task that failed has to be re-executed?

I have posted this question on CSDN and expect an answer soon.
