Hadoop's Speculative Execution

Source: Internet
Author: User

Recently, while running a job in the test environment, some of its tasks lagged far behind the others, which led me to look into Hadoop's speculative execution.


Speculative execution refers to the following behavior of MapReduce in a cluster: because of a bug, uneven load, or some other problem, the tasks of a single job may progress at very different speeds; for example, some tasks have already finished while others are only 10% complete. By the barrel (weakest-link) principle, these stragglers determine the completion time of the whole job. If speculative execution is enabled on the cluster, Hadoop mitigates the stragglers by launching a backup task for each slow task: the speculative task processes the same slice of data in parallel with the original, whichever attempt finishes first supplies the final result, and the other attempt is killed once the first completes.


Speculative execution is an optimization strategy that trades extra resources for time, but when resources are constrained it does not necessarily shorten execution and can even make things worse. Suppose that in the test environment the DataNodes have 40 GB of total memory and each task container requests 1 GB. A job with 5 GB of input, split into 128 MB HDFS blocks, produces 40 map tasks, which essentially saturates all the DataNodes. If speculative attempts are now launched because the map tasks are running slowly, they may starve the reduce tasks of resources, delay the reduce phase, and increase the total runtime of the job. Whether to enable speculative execution should therefore be decided from the resource situation: if resources are already scarce, running speculative attempts on top will prevent subsequently launched tasks from obtaining containers and hinder execution.
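The back-of-the-envelope arithmetic above can be sketched in plain Java (all numbers are the article's assumptions, not measured values):

```java
public class SpeculationBudget {
    // Number of map tasks for a given input size and HDFS split size
    // (ceiling division, since a partial final split still needs a task).
    static long numMapTasks(long inputMb, long splitMb) {
        return (inputMb + splitMb - 1) / splitMb;
    }

    public static void main(String[] args) {
        long maps = numMapTasks(5 * 1024, 128); // 5 GB input, 128 MB splits
        long slots = 40 / 1;                    // 40 GB cluster memory, 1 GB per task
        System.out.println(maps);  // 40 map tasks
        System.out.println(slots); // 40 concurrent containers
        // The 40 map tasks already fill all 40 slots, so any speculative
        // attempt must wait for, or delay, other containers.
    }
}
```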

The default speculator is org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator. To change the speculation strategy, you can write your own class modeled on it: extend org.apache.hadoop.service.AbstractService and implement the org.apache.hadoop.mapreduce.v2.app.speculate.Speculator interface.

The DefaultSpeculator constructor:

  public DefaultSpeculator(Configuration conf, AppContext context,
      TaskRuntimeEstimator estimator, Clock clock) {
    super(DefaultSpeculator.class.getName());

    this.conf = conf;
    this.context = context;
    this.estimator = estimator;
    this.clock = clock;
    this.eventHandler = context.getEventHandler();
    this.soonestRetryAfterNoSpeculate =
        conf.getLong(MRJobConfig.SPECULATIVE_RETRY_AFTER_NO_SPECULATE,
            MRJobConfig.DEFAULT_SPECULATIVE_RETRY_AFTER_NO_SPECULATE);
    this.soonestRetryAfterSpeculate =
        conf.getLong(MRJobConfig.SPECULATIVE_RETRY_AFTER_SPECULATE,
            MRJobConfig.DEFAULT_SPECULATIVE_RETRY_AFTER_SPECULATE);
    this.proportionRunningTasksSpeculatable =
        conf.getDouble(MRJobConfig.SPECULATIVECAP_RUNNING_TASKS,
            MRJobConfig.DEFAULT_SPECULATIVECAP_RUNNING_TASKS);
    this.proportionTotalTasksSpeculatable =
        conf.getDouble(MRJobConfig.SPECULATIVECAP_TOTAL_TASKS,
            MRJobConfig.DEFAULT_SPECULATIVECAP_TOTAL_TASKS);
    this.minimumAllowedSpeculativeTasks =
        conf.getInt(MRJobConfig.SPECULATIVE_MINIMUM_ALLOWED_TASKS,
            MRJobConfig.DEFAULT_SPECULATIVE_MINIMUM_ALLOWED_TASKS);
  }


mapreduce.map.speculative: if true, map tasks may be executed speculatively; that is, a running map task can have a backup attempt started that processes the same slice of data in parallel with the original, and whichever attempt finishes first supplies the result. Default: true.

mapreduce.reduce.speculative: the same, but for reduce tasks. Default: true.
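For reference, a minimal sketch of how these two flags might be set in mapred-site.xml (or in the per-job configuration); the values here are illustrative, for a case where reducers write to an external store and duplicate attempts would be harmful:

```xml
<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```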

mapreduce.job.speculative.speculative-cap-running-tasks: the proportion of a single job's running tasks that may be speculated on. Default: 0.1

mapreduce.job.speculative.speculative-cap-total-tasks: the proportion of a single job's total tasks that may be speculated on. Default: 0.01

mapreduce.job.speculative.minimum-allowed-tasks: the minimum number of tasks that are allowed to be speculatively re-executed. Default: 10

First, take the maximum of mapreduce.job.speculative.minimum-allowed-tasks and mapreduce.job.speculative.speculative-cap-total-tasks multiplied by the total number of tasks.

Then take the maximum of that value and mapreduce.job.speculative.speculative-cap-running-tasks multiplied by the number of running tasks; the result is the number of tasks that may be speculatively executed at once.
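The two max() steps above can be sketched in plain Java (method and parameter names are mine, not Hadoop's):

```java
public class SpeculativeTaskCap {
    static int allowed(int minimumAllowedTasks,
                       double capTotal, int totalTasks,
                       double capRunning, int runningTasks) {
        // Step 1: max of the absolute floor and the total-task proportion.
        int cap = (int) Math.max(minimumAllowedTasks, capTotal * totalTasks);
        // Step 2: max of step 1 and the running-task proportion.
        return (int) Math.max(cap, capRunning * runningTasks);
    }

    public static void main(String[] args) {
        // Defaults: minimum 10, 1% of all tasks, 10% of running tasks.
        // 5000 total tasks, 300 running: max(max(10, 50), 30) = 50.
        System.out.println(allowed(10, 0.01, 5000, 0.1, 300)); // 50
    }
}
```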

mapreduce.job.speculative.retry-after-no-speculate: how long (in milliseconds) to wait before the next speculation round when the current round found no task to speculate on. Default: 1000 (ms)

mapreduce.job.speculative.retry-after-speculate: how long (in milliseconds) to wait before the next speculation round when the current round did start a speculative task. Default: 15000 (ms)

mapreduce.job.speculative.slowtaskthreshold: the number of standard deviations by which a task's progress rate must fall below the mean of all running tasks before the task is considered slow enough to speculate on. Default: 1.0
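A minimal sketch of this threshold test, assuming a simple mean/standard-deviation rule over per-task progress rates (Hadoop's actual estimator is more involved; names here are illustrative):

```java
public class SlowTaskCheck {
    // A task is "too slow" when its rate is more than `threshold`
    // standard deviations below the mean rate of all running tasks.
    static boolean isTooSlow(double rate, double[] allRates, double threshold) {
        double mean = 0;
        for (double r : allRates) mean += r;
        mean /= allRates.length;

        double var = 0;
        for (double r : allRates) var += (r - mean) * (r - mean);
        double stddev = Math.sqrt(var / allRates.length);

        return rate < mean - threshold * stddev;
    }

    public static void main(String[] args) {
        double[] rates = {1.0, 1.0, 1.0, 0.2};
        System.out.println(isTooSlow(0.2, rates, 1.0)); // true
        System.out.println(isTooSlow(1.0, rates, 1.0)); // false
    }
}
```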

Starting the service:

  @Override
  protected void serviceStart() throws Exception {
    Runnable speculationBackgroundCore = new Runnable() {
      @Override
      public void run() {
        while (!stopped && !Thread.currentThread().isInterrupted()) {
          long backgroundRunStartTime = clock.getTime();
          try {
            // computeSpeculations() traverses mapContainerNeeds and
            // reduceContainerNeeds by task type (map and reduce) and
            // starts a speculative task when the conditions are met.
            int speculations = computeSpeculations();
            long mininumRecomp = speculations > 0
                ? soonestRetryAfterSpeculate : soonestRetryAfterNoSpeculate;
            long wait = Math.max(mininumRecomp,
                clock.getTime() - backgroundRunStartTime);
            if (speculations > 0) {
              LOG.info("We launched " + speculations
                  + " speculations.  Sleeping " + wait + " milliseconds.");
            }
            Object pollResult = scanControl.poll(wait, TimeUnit.MILLISECONDS);
          } catch (InterruptedException e) {
            if (!stopped) {
              LOG.error("Background thread returning, interrupted", e);
            }
            return;
          }
        }
      }
    };
    speculationBackgroundThread = new Thread(speculationBackgroundCore,
        "DefaultSpeculator background processing");
    speculationBackgroundThread.start();

    super.serviceStart();
  }

Finally, let's look at how the source code decides to start a speculative task:

  private int maybeScheduleASpeculation(TaskType type) {
    int successes = 0;

    long now = clock.getTime();

    ConcurrentMap<JobId, AtomicInteger> containerNeeds =
        type == TaskType.MAP ? mapContainerNeeds : reduceContainerNeeds;

    // traverse all jobs
    for (ConcurrentMap.Entry<JobId, AtomicInteger> jobEntry
        : containerNeeds.entrySet()) {
      // This race conditon is okay.  If we skip a speculation attempt we
      // should have tried because the event that lowers the number of
      // containers needed to zero hasn't come through, it will next time.
      // Also, if we miss the fact that the number of containers needed is
      // zero but increased due to a failure it's not too bad to launch one
      // container prematurely.
      if (jobEntry.getValue().get() > 0) {
        continue;
      }

      int numberSpeculationsAlready = 0;
      int numberRunningTasks = 0;

      Job job = context.getJob(jobEntry.getKey());
      // get the job's tasks of this kind
      Map<TaskId, Task> tasks = job.getTasks(type);

      // the first max() step introduced above
      int numberAllowedSpeculativeTasks =
          (int) Math.max(minimumAllowedSpeculativeTasks,
              proportionTotalTasksSpeculatable * tasks.size());

      TaskId bestTaskID = null;
      long bestSpeculationValue = -1L;

      // this loop is potentially pricey.
      // TODO track the tasks that are potentially worth looking at
      for (Map.Entry<TaskId, Task> taskEntry : tasks.entrySet()) {
        // traverse all tasks and compute each one's speculation value
        long mySpeculationValue = speculationValue(taskEntry.getKey(), now);

        if (mySpeculationValue == ALREADY_SPECULATING) {
          ++numberSpeculationsAlready;
        }

        if (mySpeculationValue != NOT_RUNNING) {
          ++numberRunningTasks;
        }

        if (mySpeculationValue > bestSpeculationValue) {
          bestTaskID = taskEntry.getKey();
          bestSpeculationValue = mySpeculationValue;
        }
      }
      // the second max() step introduced above
      numberAllowedSpeculativeTasks =
          (int) Math.max(numberAllowedSpeculativeTasks,
              proportionRunningTasksSpeculatable * numberRunningTasks);

      // If we found a speculation target, fire it off
      if (bestTaskID != null
          && numberAllowedSpeculativeTasks > numberSpeculationsAlready) {
        // the allowed count exceeds the number already speculating, so
        // create a speculative attempt: sends a T_ADD_SPEC_ATTEMPT event
        // to start another task attempt.
        addSpeculativeAttempt(bestTaskID);
        ++successes;
      }
    }
    return successes;
  }



  private long speculationValue(TaskId taskID, long now) {
    Job job = context.getJob(taskID.getJobId());
    Task task = job.getTask(taskID);
    Map<TaskAttemptId, TaskAttempt> attempts = task.getAttempts();
    long acceptableRuntime = Long.MIN_VALUE;
    long result = Long.MIN_VALUE;

    // only consider tasks not already in the set of speculated tasks
    if (!mayHaveSpeculated.contains(taskID)) {
      acceptableRuntime = estimator.thresholdRuntime(taskID); // runtime threshold
      if (acceptableRuntime == Long.MAX_VALUE) {
        return ON_SCHEDULE;
      }
    }

    TaskAttemptId runningTaskAttemptID = null;
    int numberRunningAttempts = 0;

    for (TaskAttempt taskAttempt : attempts.values()) {
      // attempt in the running or starting state
      if (taskAttempt.getState() == TaskAttemptState.RUNNING
          || taskAttempt.getState() == TaskAttemptState.STARTING) {
        if (++numberRunningAttempts > 1) {
          // more than one running attempt: the task is already speculating,
          // so the caller increments numberSpeculationsAlready
          return ALREADY_SPECULATING;
        }

        runningTaskAttemptID = taskAttempt.getID();

        // estimated run time of the current attempt
        long estimatedRunTime = estimator.estimatedRuntime(runningTaskAttemptID);

        // start time of the attempt
        long taskAttemptStartTime =
            estimator.attemptEnrolledTime(runningTaskAttemptID);
        if (taskAttemptStartTime > now) {
          // This background process ran before we could process the task
          // attempt status change that chronicles the attempt start
          return TOO_NEW;
        }

        // estimated run time + start time = estimated finish time
        long estimatedEndTime = estimatedRunTime + taskAttemptStartTime;

        // estimated finish time of a freshly started backup attempt
        long estimatedReplacementEndTime =
            now + estimator.estimatedNewAttemptRuntime(taskID);

        float progress = taskAttempt.getProgress();
        TaskAttemptHistoryStatistics data =
            runningTaskAttemptStatistics.get(runningTaskAttemptID);
        if (data == null) {
          runningTaskAttemptStatistics.put(runningTaskAttemptID,
              new TaskAttemptHistoryStatistics(estimatedRunTime, progress, now));
        } else {
          if (estimatedRunTime == data.getEstimatedRunTime()
              && progress == data.getProgress()) {
            // Previous stats are same as current stats
            if (data.notHeartbeatedInAWhile(now)) {
              // Stats have stagnated for a while, simulate heart-beat.
              TaskAttemptStatus taskAttemptStatus = new TaskAttemptStatus();
              taskAttemptStatus.id = runningTaskAttemptID;
              taskAttemptStatus.progress = progress;
              taskAttemptStatus.taskState = taskAttempt.getState();
              // Now simulate the heart-beat
              handleAttempt(taskAttemptStatus);
            }
          } else {
            // Stats have changed - update our data structure
            data.setEstimatedRunTime(estimatedRunTime);
            data.setProgress(progress);
            data.resetHeartbeatTime(now);
          }
        }

        if (estimatedEndTime < now) {
          // estimated finish time is earlier than the current time
          return PROGRESS_IS_GOOD;
        }

        if (estimatedReplacementEndTime >= estimatedEndTime) {
          // a new attempt would not finish earlier than the current one
          return TOO_LATE_TO_SPECULATE;
        }

        result = estimatedEndTime - estimatedReplacementEndTime;
      }
    }

    // If we are here, there's at most one task attempt.
    if (numberRunningAttempts == 0) {
      // the task is not running
      return NOT_RUNNING;
    }

    if (acceptableRuntime == Long.MIN_VALUE) {
      acceptableRuntime = estimator.thresholdRuntime(taskID);
      if (acceptableRuntime == Long.MAX_VALUE) {
        return ON_SCHEDULE;
      }
    }

    return result;
  }
DefaultSpeculator relies on a runtime estimator, which is LegacyTaskRuntimeEstimator by default. MRv2 also provides another implementation, ExponentiallySmoothedTaskRuntimeEstimator, which applies exponential smoothing to the estimated values.
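The idea behind exponential smoothing can be sketched in a few lines of plain Java: each new sample is blended with the previous smoothed value. The alpha parameter here is illustrative; Hadoop's estimator derives its weight from a configurable time constant.

```java
public class SmoothedRate {
    private double smoothed;
    private boolean initialized = false;
    private final double alpha; // weight of the newest sample, in (0, 1]

    SmoothedRate(double alpha) { this.alpha = alpha; }

    // Blend the new sample with the running estimate; the first sample
    // initializes the estimate directly.
    double update(double sample) {
        smoothed = initialized
            ? alpha * sample + (1 - alpha) * smoothed
            : sample;
        initialized = true;
        return smoothed;
    }

    public static void main(String[] args) {
        SmoothedRate r = new SmoothedRate(0.5);
        r.update(1.0);
        System.out.println(r.update(3.0)); // 2.0: a noisy jump is damped
    }
}
```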

Copyright notice: this is the blogger's original article; please do not reproduce it without permission.

