Improving Spark Task Stability (1): The Blacklist Mechanism

Scenario

A typical Spark application goes through this lifecycle: gather the requirements, write the Spark code, pass testing, then hand it over to the platform for scheduled runs.

Often the application runs fine for a while, then suddenly fails one day, or fails once before a rerun succeeds.

From the developer's point of view: "My code is fine, it passed testing, and it has been running well for a while. Why did it suddenly fail? And why does it run normally after I simply reschedule it? Isn't your platform just unstable?" So what actually causes this?

In a distributed cluster, especially under high load, many unexpected problems can occur, for example:

- A bad disk or a full disk causes creation of the /path/to/usercache directory to fail.
- A certain number of task failures within one stage (spark.task.maxFailures) causes the entire job to fail.
- An executor times out while registering with the external shuffle service.
- An executor times out while fetching data from the external shuffle service, and the repeated task failures eventually fail the whole stage.
- Environment or dependency issues, such as "package XXX does not exist" or "package XXX is not installed".
- DNS is not configured, the network is down, and so on.

Why is a failed task scheduled again onto the same node or executor?

Because of data locality: Spark prefers to schedule a task on the node that holds the task's data.

Are we left with no option but to reschedule the job after every failure? And what if the job has an SLA to meet?

Introduction

Spark 2.1 introduced the blacklist mechanism, which is still an experimental feature as of the current version (2.3.0). It lets you set thresholds on the number of task failures per executor/node, so that scheduling does not keep going back to the same problematic executor or node. :)

Related Parameters

Configuration | Default | Description
spark.blacklist.enabled | false | Whether to enable the blacklist mechanism.
spark.blacklist.timeout | 1h | How long an executor/node stays in the application blacklist before it is unconditionally removed and allowed to run new tasks again.
spark.blacklist.task.maxTaskAttemptsPerExecutor | 1 | Retry threshold for the same task on one executor. Once reached, the executor is blacklisted for that task.
spark.blacklist.task.maxTaskAttemptsPerNode | 2 | Retry threshold for the same task on one node. Once reached, the node is blacklisted for that task.
spark.blacklist.stage.maxFailedTasksPerExecutor | 2 | Threshold for the number of different tasks that fail on one executor within a stage. Once reached, the executor is blacklisted for that stage.
spark.blacklist.stage.maxFailedExecutorsPerNode | 2 | Threshold for the number of different executors on one node that are blacklisted within a stage. Once reached, the node is blacklisted for that stage.
spark.blacklist.application.maxFailedTasksPerExecutor | 2 | Threshold for the number of different tasks that fail on one executor. Once reached, the executor is blacklisted for the entire application, and is removed automatically once it has been blacklisted longer than spark.blacklist.timeout. Note that with dynamic allocation enabled, such executors may be reclaimed because they sit idle too long.
spark.blacklist.application.maxFailedExecutorsPerNode | 2 | Threshold for the number of executors on one node that are added to the application blacklist. Once reached, the node is blacklisted for the entire application, and is removed automatically once it has been blacklisted longer than spark.blacklist.timeout. Note that with dynamic allocation enabled, the executors on that node may be reclaimed because they sit idle too long.
spark.blacklist.killBlacklistedExecutors | false | If enabled, Spark automatically kills and re-creates blacklisted executors; if an entire node is blacklisted, all executors on that node are killed.
spark.blacklist.application.fetchFailure.enabled | false | If enabled, an executor is blacklisted immediately when a fetch failure occurs; if the external shuffle service is enabled, the whole node is blacklisted.
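As a minimal sketch of how these settings might be applied in driver code (the application name and the threshold values below are illustrative, not recommendations):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative values only; tune the thresholds for your own cluster.
val conf = new SparkConf()
  .setAppName("blacklist-demo")                        // hypothetical app name
  .set("spark.task.maxFailures", "4")                  // overall task retry budget mentioned above
  .set("spark.blacklist.enabled", "true")              // turn the blacklist mechanism on
  .set("spark.blacklist.timeout", "1h")
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.application.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.killBlacklistedExecutors", "false")

val spark = SparkSession.builder().config(conf).getOrCreate()

The same settings can equally be passed as --conf options to spark-submit.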
Implementation Details

Because this is still an experimental feature, the code may change at any time.

Only the core parts of the code are shown below.

TaskSetBlacklist

Blacklist bookkeeping:

// K: executor id, V: the failures of each task on that executor (failure count and last failure time)
val execToFailures = new HashMap[String, ExecutorFailuresInTaskSet]()

// K: node, V: executors on that node that have task failures
private val nodeToExecsWithFailures = new HashMap[String, HashSet[String]]()
// K: node, V: task indexes for which that node is blacklisted
private val nodeToBlacklistedTaskIndexes = new HashMap[String, HashSet[Int]]()

// blacklisted executors
private val blacklistedExecs = new HashSet[String]()
// blacklisted nodes
private val blacklistedNodes = new HashSet[String]()

// Check whether an executor is blacklisted for the given task
def isExecutorBlacklistedForTask(executorId: String, index: Int): Boolean = {
  execToFailures.get(executorId).exists { execFailures =>
    execFailures.getNumTaskFailures(index) >= MAX_TASK_ATTEMPTS_PER_EXECUTOR
  }
}

// Check whether a node is blacklisted for the given task
def isNodeBlacklistedForTask(node: String, index: Int): Boolean = {
  nodeToBlacklistedTaskIndexes.get(node).exists(_.contains(index))
}

When a task fails, TaskSetManager calls updateBlacklistForFailedTask to update the blacklist:
1. Based on the task index, update the failure count and last failure time for this task on the executor.
2. Check whether this task also has failure records on other executors of the same node; sum up those failure counts, and if the total is >= MAX_TASK_ATTEMPTS_PER_NODE, blacklist the node for this task.
3. Check whether the number of failed tasks on this executor within the stage is >= MAX_FAILURES_PER_EXEC_STAGE; if so, blacklist the executor for this stage.
4. Check whether the number of blacklisted executors on the same node within the stage is >= MAX_FAILED_EXEC_PER_NODE_STAGE; if so, blacklist the node for this stage.

Threshold parameters:
MAX_TASK_ATTEMPTS_PER_EXECUTOR: maximum number of retries of a task on one executor
MAX_TASK_ATTEMPTS_PER_NODE: maximum number of retries of a task on one node
MAX_FAILURES_PER_EXEC_STAGE: within one stage, the maximum number of task failures on one executor
MAX_FAILED_EXEC_PER_NODE_STAGE: within one stage, the maximum number of blacklisted executors on one node

  private[scheduler] def updateBlacklistForFailedTask(
      host: String,
      exec: String,
      index: Int,
      failureReason: String): Unit = {
    latestFailureReason = failureReason
    val execFailures = execToFailures.getOrElseUpdate(exec, new ExecutorFailuresInTaskSet(host))
    execFailures.updateWithFailure(index, clock.getTimeMillis())

    // Check whether this task has also failed on other executors on the same host;
    // if it has gone over the limit, blacklist the whole host for this task.
    val execsWithFailuresOnNode = nodeToExecsWithFailures.getOrElseUpdate(host, new HashSet())
    execsWithFailuresOnNode += exec
    val failuresOnHost = execsWithFailuresOnNode.toIterator.flatMap { exec =>
      execToFailures.get(exec).map { failures =>
        failures.getNumTaskFailures(index)
      }
    }.sum
    if (failuresOnHost >= MAX_TASK_ATTEMPTS_PER_NODE) {
      nodeToBlacklistedTaskIndexes.getOrElseUpdate(host, new HashSet()) += index
    }

    // Check whether enough tasks have failed on this executor to blacklist it for the stage.
    if (execFailures.numUniqueTasksWithFailures >= MAX_FAILURES_PER_EXEC_STAGE) {
      if (blacklistedExecs.add(exec)) {
        logInfo(s"Blacklisting executor ${exec} for stage $stageId")
        // If that pushes enough executors on this node into the blacklist,
        // blacklist the whole node for this stage as well.
        val blacklistedExecutorsOnNode = execsWithFailuresOnNode.filter(blacklistedExecs.contains(_))
        if (blacklistedExecutorsOnNode.size >= MAX_FAILED_EXEC_PER_NODE_STAGE) {
          if (blacklistedNodes.add(host)) {
            logInfo(s"Blacklisting ${host} for stage $stageId")
          }
        }
      }
    }
  }
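To see the task-level bookkeeping in isolation, here is a small self-contained sketch that imitates the logic above outside of Spark. It ignores the stage-level blacklist, and the class name, default thresholds, and example IDs are made up for illustration; this is not Spark's actual class.

import scala.collection.mutable.{HashMap, HashSet}

// A simplified, standalone imitation of TaskSetBlacklist's task-level bookkeeping.
class SimpleTaskBlacklist(maxAttemptsPerExecutor: Int = 1, maxAttemptsPerNode: Int = 2) {
  // executor -> (task index -> failure count)
  private val execToFailures = new HashMap[String, HashMap[Int, Int]]()
  // node -> executors with failures on that node
  private val nodeToExecsWithFailures = new HashMap[String, HashSet[String]]()
  // node -> task indexes blacklisted on that node
  private val nodeToBlacklistedTaskIndexes = new HashMap[String, HashSet[Int]]()

  def recordFailure(host: String, exec: String, index: Int): Unit = {
    val failures = execToFailures.getOrElseUpdate(exec, new HashMap[Int, Int]())
    failures(index) = failures.getOrElse(index, 0) + 1
    val execsOnNode = nodeToExecsWithFailures.getOrElseUpdate(host, new HashSet[String]())
    execsOnNode += exec
    // Sum this task's failures across all executors on the node.
    val failuresOnHost = execsOnNode.iterator.map { e =>
      execToFailures.get(e).map(_.getOrElse(index, 0)).getOrElse(0)
    }.sum
    if (failuresOnHost >= maxAttemptsPerNode) {
      nodeToBlacklistedTaskIndexes.getOrElseUpdate(host, new HashSet[Int]()) += index
    }
  }

  def isExecutorBlacklistedForTask(exec: String, index: Int): Boolean =
    execToFailures.get(exec).exists(_.getOrElse(index, 0) >= maxAttemptsPerExecutor)

  def isNodeBlacklistedForTask(node: String, index: Int): Boolean =
    nodeToBlacklistedTaskIndexes.get(node).exists(_.contains(index))
}

// Example: task 0 fails once on exec-1 and once on exec-2, both on node-a.
val bl = new SimpleTaskBlacklist()
bl.recordFailure("node-a", "exec-1", 0)
bl.recordFailure("node-a", "exec-2", 0)
println(bl.isExecutorBlacklistedForTask("exec-1", 0)) // true: 1 failure >= maxAttemptsPerExecutor (1)
println(bl.isNodeBlacklistedForTask("node-a", 0))     // true: 2 failures on node-a >= maxAttemptsPerNode (2)

The point of the two thresholds is that the same task's failures are counted per executor and also summed per node, which is exactly what the executor-level and node-level checks distinguish.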
BlacklistTracker

The implementation follows the same principle as TaskSetBlacklist, so the blacklist checks, blacklist bookkeeping objects, and similar code are not repeated below.

Unlike TaskSetBlacklist, BlacklistTracker only learns about task failures after a TaskSet has completed successfully.

Core code:

When a TaskSet completes successfully, the following code is called. The process is:
1. Accumulate the task failures of each executor, dropping any failures whose last failure time is more than BLACKLIST_TIMEOUT_MILLIS ago.
2. If the number of failures on an executor reaches the threshold and the executor is not yet blacklisted:
* Add the executor and its expiry time to the application blacklist, remove it from the executor failure list, and update nextExpiryTime so that expired entries can be detected the next time tasks are launched.
* Decide whether to kill the executor according to spark.blacklist.killBlacklistedExecutors.
* Update the count of blacklisted executors on the node.
* If the number of blacklisted executors on the node reaches the threshold and the node is not yet blacklisted:
* Add the node and its expiry time to the application blacklist.
* If spark.blacklist.killBlacklistedExecutors is enabled, kill all executors on that node.

Threshold parameters:
BLACKLIST_TIMEOUT_MILLIS: how long an entry stays in the blacklist before it expires
MAX_FAILURES_PER_EXEC: maximum number of task failures per executor
MAX_FAILED_EXEC_PER_NODE: maximum number of blacklisted executors per node

  def updateBlacklistForSuccessfulTaskSet(
      stageId: Int,
      stageAttemptId: Int,
      failuresByExec: HashMap[String, ExecutorFailuresInTaskSet]): Unit = {
    val now = clock.getTimeMillis()
    failuresByExec.foreach { case (exec, failuresInTaskSet) =>
      // Accumulate this TaskSet's failures into the per-executor failure list,
      // dropping failures that are older than the blacklist timeout.
      val appFailuresOnExecutor =
        executorIdToFailureList.getOrElseUpdate(exec, new ExecutorFailureList)
      appFailuresOnExecutor.addFailures(stageId, stageAttemptId, failuresInTaskSet)
      appFailuresOnExecutor.dropFailuresWithTimeoutBefore(now)
      val newTotal = appFailuresOnExecutor.numUniqueTaskFailures

      val expiryTimeForNewBlacklists = now + BLACKLIST_TIMEOUT_MILLIS
      if (newTotal >= MAX_FAILURES_PER_EXEC && !executorIdToBlacklistStatus.contains(exec)) {
        logInfo(s"Blacklisting executor id: $exec because it has $newTotal" +
          s" task failures in successful task sets")
        val node = failuresInTaskSet.node
        // Blacklist the executor for the whole application.
        executorIdToBlacklistStatus.put(exec, BlacklistedExecutor(node, expiryTimeForNewBlacklists))
        listenerBus.post(SparkListenerExecutorBlacklisted(now, exec, newTotal))
        executorIdToFailureList.remove(exec)
        updateNextExpiryTime()
        killBlacklistedExecutor(exec)

        // Update the blacklisted executors on this node, and blacklist the node if needed.
        val blacklistedExecsOnNode = nodeToBlacklistedExecs.getOrElseUpdate(node, HashSet[String]())
        blacklistedExecsOnNode += exec
        if (blacklistedExecsOnNode.size >= MAX_FAILED_EXEC_PER_NODE &&
            !nodeIdToBlacklistExpiryTime.contains(node)) {
          logInfo(s"Blacklisting node $node because it has ${blacklistedExecsOnNode.size} " +
            s"executors blacklisted: ${blacklistedExecsOnNode}")
          nodeIdToBlacklistExpiryTime.put(node, expiryTimeForNewBlacklists)
          listenerBus.post(SparkListenerNodeBlacklisted(now, node, blacklistedExecsOnNode.size))
          _nodeBlacklist.set(nodeIdToBlacklistExpiryTime.keySet.toSet)
          killExecutorsOnBlacklistedNode(node)
        }
      }
    }
  }
When the blacklist decision is made

The call chain when a stage is submitted:

TaskSchedulerImpl.submitTasks
CoarseGrainedSchedulerBackend.reviveOffers
CoarseGrainedSchedulerBackend.makeOffers
TaskSchedulerImpl.resourceOffers
TaskSchedulerImpl.resourceOfferSingleTaskSet
CoarseGrainedSchedulerBackend.launchTasks

The application-level blacklist check is done in TaskSchedulerImpl.resourceOffers, while the stage/task-level blacklist checks are done in TaskSchedulerImpl.resourceOfferSingleTaskSet. What happens if all the nodes end up blacklisted?

This can occur if the task retry count is set relatively high, and it would stall the execution of the stage, so Spark aborts the TaskSet instead.

In TaskSchedulerImpl.resourceOffers:

if (!launchedAnyTask) {
  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
}
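From the driver's point of view, such an abort surfaces as a failed job. As a minimal, hedged sketch (the object name and application code here are hypothetical, and the exact abort message comes from Spark itself), a driver program might handle it like this:

import org.apache.spark.SparkException
import org.apache.spark.sql.SparkSession

object BlacklistAbortHandlingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("blacklist-abort-sketch").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000, 100)
    try {
      rdd.count()
    } catch {
      // If a task cannot run anywhere because every candidate executor/node is
      // blacklisted, the TaskSet is aborted and the job fails with a SparkException.
      case e: SparkException =>
        System.err.println(s"Job aborted: ${e.getMessage}")
        // e.g. alert, then resubmit with adjusted blacklist thresholds or timeout.
        throw e
    } finally {
      spark.stop()
    }
  }
}

The catch block is only illustrative; in a scheduled pipeline the same signal usually arrives as a non-zero exit code from spark-submit.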
Conclusion

Simply put, for an application there are three levels of blacklist for an executor/node: the task blacklist, the stage blacklist, and the application blacklist.

With these blacklist settings you can avoid having a task repeatedly scheduled onto a problematic executor/node (bad disk, full disk, shuffle fetch failures, broken environment, and so on), which would otherwise cause the entire application to fail.

Tip: BlacklistTracker.updateBlacklistForFetchFailure has a bug (SPARK-24021) in the current version (2.3.0); it will be fixed in 2.3.1. You are only affected if spark.blacklist.application.fetchFailure.enabled is turned on.
