Scenario
A typical Spark application goes through this cycle: gather requirements, write the Spark code, pass the tests, and hand the job over to the platform scheduler.
Often the application runs fine for a while, then fails out of the blue one day, or fails once and then succeeds after a rerun.
From the developer's point of view: "My code is fine, the tests passed, and it has been running well for a long time. Why did it suddenly fail? And why does it succeed when I simply resubmit it? Isn't your platform just unstable?" So what actually causes this behavior?
In a distributed cluster, especially under high load, many unexpected problems can occur, for example:
* A bad disk, or a full disk, causes creation of the /path/to/usercache directory to fail.
* A task fails too many times within a stage (spark.task.maxFailures), which fails the entire job.
* An executor times out while registering with the external shuffle service.
* An executor times out fetching data from the external shuffle service, so the task fails repeatedly and eventually the whole stage fails.
* Environment and dependency problems, such as a required package that does not exist or is not installed.
* DNS is not configured, or the network drops.
* And so on.
Why is a failed task scheduled onto the same node or executor again?
Because of data locality: Spark prefers to schedule a task on a node that already holds the task's data (see the sketch below).
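As a rough, self-contained illustration (not from the original post): SparkContext.makeRDD lets you attach preferred hosts to each partition, and the scheduler tries to honor those preferences first, which is also why a retried task tends to land on the same node. The host names here are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch, assuming this is submitted to a cluster that actually has
// hosts named "node-1" and "node-2".
val sc = new SparkContext(new SparkConf().setAppName("locality-demo"))

// Each element is (partition data, preferred hosts for that partition); the
// scheduler tries to run the corresponding task on a preferred host first.
val rdd = sc.makeRDD(Seq(
  (Seq(1, 2, 3), Seq("node-1")),
  (Seq(4, 5, 6), Seq("node-2"))
))
rdd.count()
```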
Is resubmitting after every failure really the only option? And what do you do when the job has an SLA to meet?

Introduction
Spark 2.1 added a blacklist mechanism, still an experimental feature as of the current version (2.3.0). It lets you set thresholds on the number of task failures per executor/node, so a job stops being dragged all the way down by a problematic executor or node. :)

Related Parameters
| Configuration | Default | Description |
| --- | --- | --- |
| spark.blacklist.enabled | false | Whether to enable the blacklist mechanism. |
| spark.blacklist.timeout | 1h | How long an executor/node stays in the application blacklist before it is unconditionally removed and may run new tasks again. |
| spark.blacklist.task.maxTaskAttemptsPerExecutor | 1 | Retry threshold for one task on one executor. Once reached, the executor is blacklisted for that task. |
| spark.blacklist.task.maxTaskAttemptsPerNode | 2 | Retry threshold for one task on one node. Once reached, the node is blacklisted for that task. |
| spark.blacklist.stage.maxFailedTasksPerExecutor | 2 | Threshold of distinct failed tasks on one executor within a stage. Once reached, the executor is blacklisted for that stage. |
| spark.blacklist.stage.maxFailedExecutorsPerNode | 2 | Threshold of distinct blacklisted executors on one node within a stage. Once reached, the node is blacklisted for that stage. |
| spark.blacklist.application.maxFailedTasksPerExecutor | 2 | Threshold of distinct failed tasks on one executor. Once reached, the executor is blacklisted for the whole application and is removed automatically after spark.blacklist.timeout. Note that with dynamic allocation enabled, such executors may still be reclaimed for being idle too long. |
| spark.blacklist.application.maxFailedExecutorsPerNode | 2 | Threshold of distinct executors on one node that are in the application blacklist. Once reached, the node is blacklisted for the whole application and is removed automatically after spark.blacklist.timeout. Note that with dynamic allocation enabled, the executors on that node may still be reclaimed for being idle too long. |
| spark.blacklist.killBlacklistedExecutors | false | If enabled, Spark automatically shuts down and restarts blacklisted executors; if an entire node is blacklisted, all executors on that node are shut down. |
| spark.blacklist.application.fetchFailure.enabled | false | If enabled, an executor is blacklisted immediately when a fetch failure occurs; if the external shuffle service is enabled, the entire node is blacklisted. |
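These are ordinary Spark configuration entries, so they can be set on the SparkConf (or passed with --conf to spark-submit). A minimal sketch, simply turning the mechanism on and spelling out the thresholds from the table above (the values are illustrative, not recommendations):

```scala
import org.apache.spark.SparkConf

// Enable the blacklist mechanism and make the documented thresholds explicit.
val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")
  .set("spark.blacklist.timeout", "1h")
  .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  .set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.stage.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.application.maxFailedTasksPerExecutor", "2")
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
  .set("spark.blacklist.killBlacklistedExecutors", "false")
```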
Implementation Details
Because this is still an experimental feature, the code may change at any time.
Only part of the core code is shown here.

TaskSetBlacklist
Blacklist ledgers:

// K: executor id, V: the failures of each task on that executor (failure count and last failure time)
val execToFailures = new HashMap[String, ExecutorFailuresInTaskSet]()
// K: node, V: the executors on that node that have failed tasks
private val nodeToExecsWithFailures = new HashMap[String, HashSet[String]]()
// K: node, V: the task indexes for which the node is blacklisted
private val nodeToBlacklistedTaskIndexes = new HashMap[String, HashSet[Int]]()
// blacklisted executors
private val blacklistedExecs = new HashSet[String]()
// blacklisted nodes
private val blacklistedNodes = new HashSet[String]()
// Determine whether an executor is blacklisted for a given task
def isExecutorBlacklistedForTask(executorId: String, index: Int): Boolean = {
  execToFailures.get(executorId).exists { execFailures =>
    execFailures.getNumTaskFailures(index) >= MAX_TASK_ATTEMPTS_PER_EXECUTOR
  }
}

// Determine whether a node is blacklisted for a given task
def isNodeBlacklistedForTask(node: String, index: Int): Boolean = {
  nodeToBlacklistedTaskIndexes.get(node).exists(_.contains(index))
}
When a task fails, TaskSetManager calls the following method to update the blacklist:
1. Using the task index, update the failure count and last failure time for that task on the executor.
2. Check whether the same task also has failure records on other executors of the same node; if the summed attempts reach MAX_TASK_ATTEMPTS_PER_NODE, add the node to the blacklist for that task index.
3. If the number of distinct failed tasks on the executor in this stage reaches MAX_FAILURES_PER_EXEC_STAGE, add the executor to the blacklist for this stage.
4. If the number of blacklisted executors on the same node in this stage reaches MAX_FAILED_EXEC_PER_NODE_STAGE, add the node to the blacklist for this stage.

Threshold parameters:
* MAX_TASK_ATTEMPTS_PER_EXECUTOR: maximum attempts of a single task on one executor
* MAX_TASK_ATTEMPTS_PER_NODE: maximum attempts of a single task on one node
* MAX_FAILURES_PER_EXEC_STAGE: within one stage, the maximum number of failed tasks on one executor
* MAX_FAILED_EXEC_PER_NODE_STAGE: within one stage, the maximum number of blacklisted executors on one node
private[scheduler] def updateBlacklistForFailedTask(
    host: String,
    exec: String,
    index: Int,
    failureReason: String): Unit = {
  latestFailureReason = failureReason
  val execFailures = execToFailures.getOrElseUpdate(exec, new ExecutorFailuresInTaskSet(host))
  execFailures.updateWithFailure(index, clock.getTimeMillis())

  val execsWithFailuresOnNode = nodeToExecsWithFailures.getOrElseUpdate(host, new HashSet())
  execsWithFailuresOnNode += exec
  val failuresOnHost = execsWithFailuresOnNode.toIterator.flatMap { exec =>
    execToFailures.get(exec).map { failures =>
      failures.getNumTaskFailures(index)
    }
  }.sum
  if (failuresOnHost >= MAX_TASK_ATTEMPTS_PER_NODE) {
    nodeToBlacklistedTaskIndexes.getOrElseUpdate(host, new HashSet()) += index
  }

  if (execFailures.numUniqueTasksWithFailures >= MAX_FAILURES_PER_EXEC_STAGE) {
    if (blacklistedExecs.add(exec)) {
      logInfo(s"Blacklisting executor ${exec} for stage $stageId")
      val blacklistedExecutorsOnNode = execsWithFailuresOnNode.filter(blacklistedExecs.contains(_))
      if (blacklistedExecutorsOnNode.size >= MAX_FAILED_EXEC_PER_NODE_STAGE) {
        if (blacklistedNodes.add(host)) {
          logInfo(s"Blacklisting ${host} for stage $stageId")
        }
      }
    }
  }
}
BlacklistTracker
The principle is the same as in TaskSetBlacklist, so the blacklist checks, blacklist collections, and similar code are not repeated below.
Unlike TaskSetBlacklist, BlacklistTracker only learns about task failures after a task set has completed successfully.
Core code:
When a task set completes successfully, the following method is called. The process is:
1. Accumulate the task failures on each executor, dropping an executor's failures whose last failure time is older than BLACKLIST_TIMEOUT_MILLIS.
2. If the number of failed tasks on an executor reaches the threshold and the executor is not yet blacklisted:
* Add the executor and its expiry time to the application blacklist, remove it from the executor failure list, and update nextExpiryTime so that expired entries can be removed the next time tasks are launched.
* Kill the executor if spark.blacklist.killBlacklistedExecutors is enabled.
* Update the count of blacklisted executors on the executor's node.
* If the number of blacklisted executors on that node reaches the threshold and the node is not yet blacklisted:
  * Add the node and its expiry time to the application blacklist.
  * If spark.blacklist.killBlacklistedExecutors is enabled, kill all executors on that node.

Threshold parameters:
* BLACKLIST_TIMEOUT_MILLIS: how long an entry stays in the blacklist before it expires
* MAX_FAILURES_PER_EXEC: maximum number of failed tasks per executor
* MAX_FAILED_EXEC_PER_NODE: maximum number of blacklisted executors per node
def updateBlacklistForSuccessfulTaskSet(
    stageId: Int,
    stageAttemptId: Int,
    failuresByExec: HashMap[String, ExecutorFailuresInTaskSet]): Unit = {
  val now = clock.getTimeMillis()
  failuresByExec.foreach { case (exec, failuresInTaskSet) =>
    val appFailuresOnExecutor =
      executorIdToFailureList.getOrElseUpdate(exec, new ExecutorFailureList)
    appFailuresOnExecutor.addFailures(stageId, stageAttemptId, failuresInTaskSet)
    appFailuresOnExecutor.dropFailuresWithTimeoutBefore(now)
    val newTotal = appFailuresOnExecutor.numUniqueTaskFailures

    val expiryTimeForNewBlacklists = now + BLACKLIST_TIMEOUT_MILLIS
    if (newTotal >= MAX_FAILURES_PER_EXEC && !executorIdToBlacklistStatus.contains(exec)) {
      logInfo(s"Blacklisting executor id: $exec because it has $newTotal" +
        s" task failures in successful task sets")
      val node = failuresInTaskSet.node
      executorIdToBlacklistStatus.put(exec, BlacklistedExecutor(node, expiryTimeForNewBlacklists))
      listenerBus.post(SparkListenerExecutorBlacklisted(now, exec, newTotal))
      executorIdToFailureList.remove(exec)
      updateNextExpiryTime()
      killBlacklistedExecutor(exec)

      val blacklistedExecsOnNode = nodeToBlacklistedExecs.getOrElseUpdate(node, HashSet[String]())
      blacklistedExecsOnNode += exec
      if (blacklistedExecsOnNode.size >= MAX_FAILED_EXEC_PER_NODE &&
          !nodeIdToBlacklistExpiryTime.contains(node)) {
        logInfo(s"Blacklisting node $node because it has ${blacklistedExecsOnNode.size}" +
          s" executors blacklisted: ${blacklistedExecsOnNode}")
        nodeIdToBlacklistExpiryTime.put(node, expiryTimeForNewBlacklists)
        listenerBus.post(SparkListenerNodeBlacklisted(now, node, blacklistedExecsOnNode.size))
        _nodeBlacklist.set(nodeIdToBlacklistExpiryTime.keySet.toSet)
        killExecutorsOnBlacklistedNode(node)
      }
    }
  }
}
When the Blacklist Decision Is Made
The call chain when a stage is submitted:
TaskSchedulerImpl.submitTasks
CoarseGrainedSchedulerBackend.reviveOffers
CoarseGrainedSchedulerBackend.makeOffers
TaskSchedulerImpl.resourceOffers
TaskSchedulerImpl.resourceOffersSingleTaskSet
CoarseGrainedSchedulerBackend.launchTasks
The application-level blacklist is checked in TaskSchedulerImpl.resourceOffers, while the stage/task-level blacklists are checked in TaskSchedulerImpl.resourceOffersSingleTaskSet. If every node and executor ends up blacklisted for a task set (which can happen when the task retry count is set relatively high), the task set is aborted and the whole stage fails:
// TaskSchedulerImpl.resourceOffers
if (!launchedAnyTask) {
  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
}
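For reference, resourceOffers also filters out offers from blacklisted executors and nodes before any task set sees them, which is where the application-level decision mentioned above takes effect. The snippet below is a paraphrase of the 2.3.0 TaskSchedulerImpl.resourceOffers source, so the exact code may differ between versions:

```scala
// Paraphrased from TaskSchedulerImpl.resourceOffers (Spark 2.3.0): drop offers
// coming from blacklisted executors or nodes before offering resources to task sets.
val filteredOffers = blacklistTrackerOpt.map { blacklistTracker =>
  offers.filter { offer =>
    !blacklistTracker.isNodeBlacklisted(offer.host) &&
      !blacklistTracker.isExecutorBlacklisted(offer.executorId)
  }
}.getOrElse(offers)
```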
Conclusion
Simply put, for one application there are three levels of blacklists for executors/nodes: the task blacklist, the stage blacklist, and the application blacklist.
Configuring these blacklists prevents tasks from being scheduled over and over onto problematic executors/nodes (bad disks, full disks, shuffle fetch failures, broken environments, and so on), which would otherwise cause the whole application to fail.
Tip: BlacklistTracker.updateBlacklistForFetchFailure has a bug (SPARK-24021) in the current version (2.3.0); it will be fixed in 2.3.1. You are affected only if spark.blacklist.application.fetchFailure.enabled is turned on.