| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.locality.wait | 3s | How long to wait to launch a data-local task before giving up and launching it on a less-local node. The same wait is used to step through multiple locality levels (process-local, node-local, rack-local and then any). It is also possible to customize the waiting time for each level by setting spark.locality.wait.node, etc. You should increase this setting if your tasks are long and see poor locality, but the default usually works well. |
| spark.locality.wait.node | spark.locality.wait | Customize the locality wait for node locality. For example, you can set this to 0 to skip node locality and search immediately for rack locality (if your cluster has rack information). |
| spark.locality.wait.process | spark.locality.wait | Customize the locality wait for process locality. This affects tasks that attempt to access cached data in a particular executor process. |
| spark.locality.wait.rack | spark.locality.wait | Customize the locality wait for rack locality. |
| spark.scheduler.maxRegisteredResourcesWaitingTime | 30s | Maximum amount of time to wait for resources to register before scheduling begins. |
| spark.scheduler.minRegisteredResourcesRatio | 0.8 for YARN mode; 0.0 for standalone mode and Mesos coarse-grained mode | The minimum ratio of registered resources (registered resources / total expected resources) to wait for before scheduling begins (resources are executors in YARN mode, and CPU cores in standalone mode and Mesos coarse-grained mode; the spark.cores.max value is the total expected resources for Mesos coarse-grained mode). Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by spark.scheduler.maxRegisteredResourcesWaitingTime. |
| spark.scheduler.mode | FIFO | The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services. |
| spark.scheduler.revive.interval | 1s | The interval length for the scheduler to revive the worker resource offers to run tasks. |
| spark.speculation | false | If set to "true", performs speculative execution of tasks. This means if one or more tasks are running slowly in a stage, they will be re-launched. |
| spark.speculation.interval | 100ms | How often Spark will check for tasks to speculate. |
| spark.speculation.multiplier | 1.5 | How many times slower a task is than the median to be considered for speculation. |
| spark.speculation.quantile | 0.75 | Fraction of tasks which must be complete before speculation is enabled for a particular stage. |
| spark.task.cpus | 1 | Number of cores to allocate for each task. |
| spark.task.maxFailures | 4 | Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1. |
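As a quick sketch of how the scheduling and speculation properties above can be applied (the values and app name here are only illustrative assumptions, not recommendations), a SparkConf might be built like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Example values only; the defaults from the table above usually work well
val conf = new SparkConf()
  .setAppName("scheduler-config-example")     // hypothetical application name
  .set("spark.scheduler.mode", "FAIR")        // FAIR sharing instead of FIFO
  .set("spark.speculation", "true")           // re-launch slow tasks speculatively
  .set("spark.speculation.quantile", "0.75")  // fraction of tasks that must finish first
  .set("spark.task.maxFailures", "4")         // allowed retries per task = 4 - 1 = 3

val sc = new SparkContext(conf)
```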
At runtime, the locality level of each task is printed in the logs:
16:49:56 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 8, localhost, PROCESS_LOCAL, 841218 bytes)
16:49:56 INFO Executor: Running task 0.0 in stage 12.0 (TID 8)
# What locality levels are there
PROCESS_LOCAL: process-local; code and data are in the same process, that is, in the same executor. The task that computes the data is executed by that executor, and the data sits in that executor's BlockManager; performance is best (see the sketch after this list).
NODE_LOCAL: node-local; code and data are on the same node. For example, the data is on the node as an HDFS block and the task runs in an executor on that node, or the data and the task are in different executors on the same node; data needs to be transferred between processes.
NO_PREF: for the task, it makes no difference where the data is obtained from; no location is better or worse.
RACK_LOCAL: rack-local; data and task are on two different nodes of the same rack; data needs to be transmitted across the network between nodes.
ANY: data and task may be anywhere in the cluster, not even on the same rack; worst performance.
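As a hedged illustration of how PROCESS_LOCAL typically arises (the HDFS path here is made up), caching an RDD keeps its blocks in the executors' BlockManagers, so later tasks on that RDD can be scheduled process-local:

```scala
// Hypothetical input path, for illustration only
val lines = sc.textFile("hdfs:///data/events.log")

// First action: blocks are read from HDFS, so tasks are typically NODE_LOCAL or RACK_LOCAL
lines.cache()
lines.count()

// Second action: the data now sits in the executors' BlockManagers,
// so tasks can be scheduled PROCESS_LOCAL
lines.count()
```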
# Locality wait time
spark.locality.wait, default is 3s; it can be increased, for example to 6s or 10s.
How to set it up:
By default, the following three per-level waits are the same as the one above, all 3s:
spark.locality.wait.process
spark.locality.wait.node
spark.locality.wait.rack
new SparkConf()
  .set("spark.locality.wait", "10")