Data Locality Levels

Source: Internet
Author: User

Name | Default | Meaning
spark.locality.wait | 3s | How long to wait to launch a data-local task before giving up and launching it on a less-local node. The same wait is used to step through multiple locality levels (process-local, node-local, rack-local and then any). It is also possible to customize the waiting time for each level by setting spark.locality.wait.node, etc. You should increase this setting if your tasks are long and see poor locality, but the default usually works well.
spark.locality.wait.node | spark.locality.wait | Customize the locality wait for node locality. For example, you can set this to 0 to skip node locality and search immediately for rack locality (if your cluster has rack information).
spark.locality.wait.process | spark.locality.wait | Customize the locality wait for process locality. This affects tasks that attempt to access cached data in a particular executor process.
spark.locality.wait.rack | spark.locality.wait | Customize the locality wait for rack locality.
spark.scheduler.maxRegisteredResourcesWaitingTime | 30s | Maximum amount of time to wait for resources to register before scheduling begins.
spark.scheduler.minRegisteredResourcesRatio | 0.8 for YARN mode; 0.0 for standalone mode and Mesos coarse-grained mode | The minimum ratio of registered resources (registered resources / total expected resources) to wait for before scheduling begins. Resources are executors in YARN mode and CPU cores in standalone mode and Mesos coarse-grained mode (in Mesos coarse-grained mode, the spark.cores.max value is the total expected resources). Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by spark.scheduler.maxRegisteredResourcesWaitingTime.
spark.scheduler.mode | FIFO | The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services.
spark.scheduler.revive.interval | 1s | The interval length for the scheduler to revive the worker resource offers to run tasks.
spark.speculation | false | If set to "true", performs speculative execution of tasks. This means that if one or more tasks are running slowly in a stage, they will be re-launched.
spark.speculation.interval | 100ms | How often Spark checks for tasks to speculate.
spark.speculation.multiplier | 1.5 | How many times slower than the median a task must be to be considered for speculation.
spark.speculation.quantile | 0.75 | Fraction of tasks that must be complete before speculation is enabled for a particular stage.
spark.task.cpus | 1 | Number of cores to allocate for each task.
spark.task.maxFailures | 4 | Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
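The three speculation settings in the table combine into a simple rule: once spark.speculation.quantile of a stage's tasks have finished, a still-running task becomes a candidate for re-launch when it has run longer than spark.speculation.multiplier times the median duration of the finished tasks. The Scala sketch below models only that rule; the object and method names are hypothetical, it does not call Spark, and Spark's real scheduler applies further details that are not modeled here.

```scala
// Simplified model of the speculation rule (hypothetical names, no Spark dependency).
object SpeculationSketch {
  /** True once at least `quantile` of the stage's `total` tasks have finished. */
  def speculationEnabled(finished: Int, total: Int, quantile: Double = 0.75): Boolean =
    total > 0 && finished >= math.max((quantile * total).floor.toInt, 1)

  /** Median of the finished tasks' durations in milliseconds (input must be non-empty). */
  def median(durations: Seq[Long]): Double = {
    val sorted = durations.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2).toDouble
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  /** IDs of running tasks slower than `multiplier` times the median finished duration. */
  def candidates(finishedMs: Seq[Long], runningMs: Map[String, Long], total: Int,
                 quantile: Double = 0.75, multiplier: Double = 1.5): Set[String] =
    if (!speculationEnabled(finishedMs.length, total, quantile)) Set.empty
    else {
      val threshold = multiplier * median(finishedMs)
      runningMs.collect { case (id, elapsed) if elapsed > threshold => id }.toSet
    }
}

// A stage of 8 tasks: 6 have finished (75%), 2 are still running.
val finished = Seq(1000L, 1100L, 900L, 1000L, 1200L, 1000L)
val running = Map("task-6" -> 1400L, "task-7" -> 1600L)
println(SpeculationSketch.candidates(finished, running, total = 8)) // Set(task-7)
```

With the default multiplier, the threshold here is 1.5 times the 1000 ms median, i.e. 1500 ms, so only task-7 is flagged.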

When a job runs, Spark logs the locality level at which each task is launched, for example:

16:49:56 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 8, localhost, PROCESS_LOCAL, 841218 bytes)

16:49:56 INFO Executor: Running task 0.0 in stage 12.0 (TID 8)
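To judge locality across a whole job rather than one line at a time, the "Starting task" lines can be tallied by level. The sketch below is hypothetical: it assumes lines shaped like the TaskSetManager sample included in the code, and since the exact fields vary between Spark versions, the regular expression may need adjusting for other logs.

```scala
// Hypothetical log tally: counts the locality level of each "Starting task" line.
object LocalityLogSketch {
  final case class TaskStart(task: String, stage: String, tid: Long, locality: String)

  // Assumes the TaskSetManager line format in the sample below; the level is the
  // last locality name that appears after "(TID n,".
  private val StartingTask =
    """Starting task (\S+) in stage (\S+) \(TID (\d+),.*\b(PROCESS_LOCAL|NODE_LOCAL|NO_PREF|RACK_LOCAL|ANY)\b""".r

  def parse(line: String): Option[TaskStart] =
    StartingTask.findFirstMatchIn(line).map { m =>
      TaskStart(m.group(1), m.group(2), m.group(3).toLong, m.group(4))
    }

  /** Number of launched tasks per locality level; non-matching lines are skipped. */
  def localityCounts(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(parse).groupBy(_.locality).map { case (level, starts) => level -> starts.size }
}

val log = Seq(
  "16:49:56 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 8, localhost, PROCESS_LOCAL, 841218 bytes)",
  "16:49:56 INFO Executor: Running task 0.0 in stage 12.0 (TID 8)")

println(LocalityLogSketch.localityCounts(log)) // Map(PROCESS_LOCAL -> 1)
```

A large share of RACK_LOCAL or ANY in such a tally is the kind of poor locality for which raising spark.locality.wait is suggested.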


PROCESS_LOCAL: process-local. The code and the data are in the same process, that is, in the same executor: the task that computes on the data runs in the executor whose BlockManager holds that data. Performance is best.
NODE_LOCAL: node-local. The code and the data are on the same node. For example, the data is stored on the node as an HDFS block and the task runs in an executor on that node, or the data and the task are in different executors on the same node. The data must be transferred between processes.
NO_PREF: for the task, it makes no difference where the data is fetched from; no location is better or worse than another.
RACK_LOCAL: rack-local. The data and the task are on two different nodes in the same rack; the data must be transferred across the network between the nodes.
ANY: the data and the task may be anywhere in the cluster, not necessarily in the same rack. Performance is worst.
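The five levels form a strict order from best (PROCESS_LOCAL) to worst (ANY), and a task may only be launched at a level at least as good as the one the scheduler currently allows. The standalone Scala object below models that ordering. Spark keeps a similar internal enumeration, but this is an illustrative copy rather than an import of Spark's class, and the name isAllowed is chosen here for the example.

```scala
// Standalone model of the five locality levels (not Spark's own class).
object Locality extends Enumeration {
  // Declaration order is preference order: earlier values are better.
  val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value

  /** True if a task that can run at `level` may be launched while the
    * scheduler currently accepts levels up to `maxAllowed`. */
  def isAllowed(maxAllowed: Value, level: Value): Boolean = level <= maxAllowed
}

// While only NODE_LOCAL or better is allowed, a process-local task may start
// but a rack-local one must keep waiting.
println(Locality.isAllowed(Locality.NODE_LOCAL, Locality.PROCESS_LOCAL)) // true
println(Locality.isAllowed(Locality.NODE_LOCAL, Locality.RACK_LOCAL))    // false
```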


spark.locality.wait defaults to 3s; if tasks often run at a poor locality level, try raising it step by step, for example to 6s or 10s.
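In a simplified model that assumes no task is launched in the meantime, the waits add up as the scheduler steps down the levels: with 3s at every level, a task set that cannot get process-local slots would accept node-local placement after 3s, rack-local after 6s and any placement after 9s. The Scala sketch below computes that fallback from the per-level waits, each of which defaults to spark.locality.wait. The object and method names are hypothetical, and the model leaves out NO_PREF as well as levels that a particular task set may not have.

```scala
// Simplified model of stepping down locality levels (hypothetical names).
// Assumes no task launches while waiting and leaves out NO_PREF.
object DelaySchedulingSketch {
  /** Per-level waits in ms; each defaults to the global spark.locality.wait value. */
  def waits(global: Long, process: Option[Long] = None, node: Option[Long] = None,
            rack: Option[Long] = None): Seq[(String, Long)] =
    Seq(
      "PROCESS_LOCAL" -> process.getOrElse(global),
      "NODE_LOCAL" -> node.getOrElse(global),
      "RACK_LOCAL" -> rack.getOrElse(global))

  /** Worst level the scheduler accepts after `elapsedMs` without launching a task. */
  def allowedLevel(elapsedMs: Long, levelWaits: Seq[(String, Long)]): String = {
    val deadlines = levelWaits.map(_._2).scanLeft(0L)(_ + _).tail // cumulative waits
    levelWaits.map(_._1).zip(deadlines)
      .collectFirst { case (level, deadline) if elapsedMs < deadline => level }
      .getOrElse("ANY")
  }
}

val defaults = DelaySchedulingSketch.waits(global = 3000L)
println(DelaySchedulingSketch.allowedLevel(0L, defaults))     // PROCESS_LOCAL
println(DelaySchedulingSketch.allowedLevel(4000L, defaults))  // NODE_LOCAL
println(DelaySchedulingSketch.allowedLevel(10000L, defaults)) // ANY

// Setting the node-level wait to 0 skips straight from process-local to rack-local.
val skipNode = DelaySchedulingSketch.waits(global = 3000L, node = Some(0L))
println(DelaySchedulingSketch.allowedLevel(3000L, skipNode))  // RACK_LOCAL
```

Raising the global wait pushes every deadline later, which is why a longer spark.locality.wait trades some scheduling delay for better locality.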

How to set it:
By default, the three per-level waits, spark.locality.wait.process, spark.locality.wait.node and spark.locality.wait.rack, take the same value as spark.locality.wait, i.e. 3s.

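import org.apache.spark.SparkConf
// A value without a unit is read as milliseconds, so write "10s" for 10 seconds
new SparkConf()
  .set("spark.locality.wait", "10s")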
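The unit matters because time-valued settings such as spark.locality.wait treat a bare number as milliseconds. The helper below is hypothetical and is not Spark's parser; it mimics that rule for a few common suffixes so the difference between "10" and "10s" is easy to see, and it accepts only the suffixes listed in it.

```scala
// Hypothetical helper, not Spark's parser: mimics the rule that a time-valued
// setting without a unit suffix (e.g. "10") is read as milliseconds.
object TimeConfSketch {
  // Number followed by an optional unit; the Regex extractor must match the whole string.
  private val TimeValue = """(\d+)(ms|s|min|m|h|d)?""".r

  private val millisPerUnit = Map(
    "ms" -> 1L, "s" -> 1000L, "m" -> 60000L, "min" -> 60000L,
    "h" -> 3600000L, "d" -> 86400000L)

  def toMillis(value: String): Long = value.trim.toLowerCase match {
    case TimeValue(n, null) => n.toLong // no unit: milliseconds
    case TimeValue(n, unit) => n.toLong * millisPerUnit(unit)
    case other => throw new IllegalArgumentException(s"not a time value: $other")
  }
}

println(TimeConfSketch.toMillis("10"))  // 10
println(TimeConfSketch.toMillis("10s")) // 10000
```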

