Summary of MapReduce Task Failure, Retry, and Speculative Execution Mechanisms

Source: Internet
Author: User

In MapReduce, the mapper and reducer programs we write may hit errors and exit abnormally while they run. The JobTracker tracks the status of every task throughout the job, and MapReduce defines a set of rules for handling failed tasks.

The first thing to understand is how MapReduce decides that a task has failed. A task is considered failed in three cases: it returns a non-zero exit code, it throws a Java exception, or it times out (makes no progress for a long time). The first case applies mainly to Streaming programs: if your mapper or reducer exits with a non-zero value, MapReduce treats the task as failed. The second case applies mainly to programs written in Java. The third case is less widely known. MapReduce monitors each running task (for a Streaming task, this includes the records it writes to standard output). If the task neither reads input, writes output, nor updates its status within a set period, MapReduce considers the task failed and kills it. This period is controlled by the mapred.task.timeout option (600,000 ms, or ten minutes, by default). So when writing a MapReduce program, check whether any step can legitimately run for a long time without producing output; if it can, the task may be killed by mistake.

After a task fails, MapReduce runs it again. The maximum number of attempts per task is configurable (mapred.map.max.attempts and mapred.reduce.max.attempts) and defaults to four.

Finally, note MapReduce's speculative execution mechanism. If a task runs noticeably longer than expected (the expectation is based on the running times of the job's other tasks), MapReduce launches a duplicate attempt of that task in parallel. Whichever attempt finishes first is kept, and the attempts still running are killed. The purpose is to keep one task, slowed down by a faulty machine or other abnormal running conditions, from holding back the progress of the whole job. However, this mechanism can also cause problems in some situations.
For example, if two instances of your reduce program running concurrently on the same input would conflict with each other, speculative execution poses a serious risk. Fortunately, speculative execution can be disabled.
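The three failure conditions and the bounded retry loop described above can be sketched as a small Python model. This is a simulation under stated assumptions, not Hadoop code. AttemptResult, failure_reason and run_with_retries are hypothetical names. The constants only mirror the defaults of mapred.task.timeout (600,000 ms) and the maximum attempt count (4). A real cluster kills a silent task while it is still running instead of inspecting it afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumed values mirroring Hadoop 1.x defaults; this is a model, not Hadoop code.
TASK_TIMEOUT_MS = 600_000  # default of mapred.task.timeout (10 minutes)
MAX_ATTEMPTS = 4           # default maximum number of attempts per task


@dataclass
class AttemptResult:
    """Hypothetical summary of how one task attempt ended."""
    exit_code: int = 0                          # Streaming: exit status of the mapper/reducer process
    exception: Optional[BaseException] = None   # Java: uncaught exception thrown by the task
    longest_silence_ms: int = 0                 # longest stretch with no input read, output written, or status update


def failure_reason(result: AttemptResult, timeout_ms: int = TASK_TIMEOUT_MS) -> Optional[str]:
    """Return why the attempt counts as failed, or None if it succeeded."""
    if result.exit_code != 0:
        return "non-zero exit"
    if result.exception is not None:
        return "exception"
    # As with mapred.task.timeout, a timeout of 0 disables the check.
    if timeout_ms > 0 and result.longest_silence_ms > timeout_ms:
        return "timeout"
    return None


def run_with_retries(run_attempt: Callable[[int], AttemptResult],
                     max_attempts: int = MAX_ATTEMPTS) -> Tuple[bool, List[str]]:
    """Re-run the task until one attempt succeeds or max_attempts attempts have failed."""
    failures: List[str] = []
    for attempt_number in range(1, max_attempts + 1):
        reason = failure_reason(run_attempt(attempt_number))
        if reason is None:
            return True, failures
        failures.append(reason)
    return False, failures  # every attempt failed: the task itself is marked as failed


# Usage: attempt 1 exits with status 1, attempt 2 goes silent for 15 minutes, attempt 3 succeeds.
outcomes = {1: AttemptResult(exit_code=1),
            2: AttemptResult(longest_silence_ms=900_000),
            3: AttemptResult()}
print(run_with_retries(lambda n: outcomes[n]))  # (True, ['non-zero exit', 'timeout'])
```

A legitimately slow step shows up here as a large longest_silence_ms. This is the case where raising the timeout keeps a healthy task from being killed by mistake.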
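Speculative execution, and the switch that turns it off, can be modeled the same way. The sketch below is hypothetical. The names, the simulated clock, and the rule "launch a backup once a task has run longer than 1.5 times the mean duration of completed tasks" are all assumptions for illustration; Hadoop's scheduler uses its own progress-based heuristics. The speculative parameter stands in for the real per-phase switches: mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution in Hadoop 1.x, and mapreduce.map.speculative and mapreduce.reduce.speculative in later versions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

SLOWNESS_FACTOR = 1.5  # hypothetical threshold; Hadoop's real heuristics differ


@dataclass
class Attempt:
    attempt_id: str      # hypothetical ID such as "r3_attempt0"
    finish_time: float   # simulated clock time at which this attempt would complete


def needs_backup(elapsed: float, completed_durations: List[float],
                 factor: float = SLOWNESS_FACTOR) -> bool:
    """A running task is 'slow' if it has run longer than factor x the mean completed duration."""
    if not completed_durations:
        return False  # no finished tasks yet, so there is no expectation to compare against
    return elapsed > factor * mean(completed_durations)


def race(attempts: List[Attempt]) -> Dict[str, str]:
    """Keep whichever attempt finishes first and kill every other attempt."""
    winner = min(attempts, key=lambda a: a.finish_time)
    return {a.attempt_id: ("SUCCEEDED" if a is winner else "KILLED") for a in attempts}


def schedule(task_id: str, original_finish: float, check_time: float,
             completed_durations: List[float], backup_duration: float,
             speculative: bool = True) -> Dict[str, str]:
    """Simulate one task started at time 0 and inspected once at check_time."""
    attempts = [Attempt(f"{task_id}_attempt0", original_finish)]
    still_running = original_finish > check_time
    if speculative and still_running and needs_backup(check_time, completed_durations):
        # Launch a duplicate attempt (on another node in a real cluster) starting now.
        attempts.append(Attempt(f"{task_id}_attempt1", check_time + backup_duration))
    return race(attempts)


done = [10, 12, 11]  # durations of sibling tasks that already finished
print(schedule("r3", original_finish=60, check_time=20,
               completed_durations=done, backup_duration=11))
# {'r3_attempt0': 'KILLED', 'r3_attempt1': 'SUCCEEDED'}
print(schedule("r3", original_finish=60, check_time=20,
               completed_durations=done, backup_duration=11, speculative=False))
# {'r3_attempt0': 'SUCCEEDED'}
```

When a backup is launched, both attempts process the same input at the same time until one is killed. A reducer with external side effects can conflict with itself in exactly this window, which is why disabling speculation (speculative=False here) is the safer choice for such jobs.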
