Data localization Level

Last Update:2018-07-25 Source: Internet

Author: User


Property

Name	Default	Meaning

spark.locality.wait	3s	How long to wait to launch a data-local task before giving up and launching it on a less-local node. The same wait is used to step through multiple locality levels (process-local, node-local, rack-local and then any). It is also possible to customize the waiting time for each level by setting spark.locality.wait.node, etc. You should increase this setting if your tasks are long and see poor locality, but the default usually works well.
spark.locality.wait.node	spark.locality.wait	Customize the locality wait for node locality. For example, you can set this to 0 to skip node locality and search immediately for rack locality (if your cluster has rack information).
spark.locality.wait.process	spark.locality.wait	Customize the locality wait for process locality. This affects tasks that attempt to access cached data in a particular executor process.
spark.locality.wait.rack	spark.locality.wait	Customize the locality wait for rack locality.
spark.scheduler.maxRegisteredResourcesWaitingTime	30s	Maximum amount of time to wait for resources to register before scheduling begins.
spark.scheduler.minRegisteredResourcesRatio	0.8 for YARN mode; 0.0 for standalone mode and Mesos coarse-grained mode	The minimum ratio of registered resources (registered resources / total expected resources) to wait for before scheduling begins. Resources are executors in YARN mode, and CPU cores in standalone mode and Mesos coarse-grained mode (the 'spark.cores.max' value is the total expected resources for Mesos coarse-grained mode). Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by the config spark.scheduler.maxRegisteredResourcesWaitingTime.
spark.scheduler.mode	FIFO	The scheduling mode between jobs submitted to the same SparkContext. Can be set to FAIR to use fair sharing instead of queueing jobs one after another. Useful for multi-user services.
spark.scheduler.revive.interval	1s	The interval length for the scheduler to revive the worker resource offers to run tasks.
spark.speculation	false	If set to "true", performs speculative execution of tasks. This means if one or more tasks are running slowly in a stage, they will be re-launched.
spark.speculation.interval	100ms	How often Spark will check for tasks to speculate.
spark.speculation.multiplier	1.5	How many times slower a task is than the median to be considered for speculation.
spark.speculation.quantile	0.75	Fraction of tasks which must be complete before speculation is enabled for a particular stage.
spark.task.cpus	1	Number of cores to allocate for each task.
spark.task.maxFailures	4	Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
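
To make the interaction between spark.speculation.quantile and spark.speculation.multiplier more concrete, the following is a minimal sketch in plain Scala. It is a simplified model and not Spark's actual scheduler code: the object name SpeculationSketch, the method names and the exact median calculation are assumptions made for illustration, and it ignores details such as any minimum threshold Spark applies internally.

```scala
// Simplified model of the speculation check; not Spark's actual implementation.
object SpeculationSketch {
  // Values mirroring the documented defaults of
  // spark.speculation.quantile and spark.speculation.multiplier.
  val Quantile = 0.75
  val Multiplier = 1.5

  /** Median of a non-empty sequence of durations in milliseconds. */
  def median(xs: Seq[Long]): Double = {
    val s = xs.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2).toDouble else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  /**
   * Returns the indices (into runningMs) of tasks that would be candidates
   * for speculation in this model.
   *
   * finishedMs: durations of tasks in the stage that already completed.
   * runningMs:  elapsed time of tasks in the stage that are still running.
   */
  def candidates(finishedMs: Seq[Long], runningMs: Seq[Long]): Seq[Int] = {
    val total = finishedMs.length + runningMs.length
    // Speculation is only considered once this many tasks have finished.
    val minFinished = math.floor(Quantile * total).toInt
    if (finishedMs.isEmpty || finishedMs.length < minFinished) Seq.empty
    else {
      // A running task is a candidate once it is Multiplier times slower than the median.
      val threshold = Multiplier * median(finishedMs)
      runningMs.zipWithIndex.collect { case (t, i) if t > threshold => i }
    }
  }
}
```

For example, with three finished tasks of 1000 ms each and two running tasks at 1200 ms and 2000 ms, only the second running task exceeds the 1500 ms threshold in this model.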

At run time, Spark prints the locality level of each task in its logs:

16:49:56 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 8, localhost, PROCESS_LOCAL, 841218 bytes)

16:49:56 INFO Executor: Running task 0.0 in stage 12.0 (TID 8)
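
To see how often tasks achieve each locality level, log lines like the ones above can be parsed and counted. The sketch below is plain Scala with no Spark dependency. The object name LocalityLogSketch and its methods are hypothetical, and the regular expression assumes the "(TID n, host, LEVEL, n bytes)" layout shown in this article; other Spark versions may log additional fields, in which case the pattern would need adjusting.

```scala
import java.util.Locale

object LocalityLogSketch {
  // Matches the "(TID n, host, LEVEL, n bytes)" part of a "Starting task" line.
  // Case-insensitive, because log text is sometimes re-cased when copied.
  private val StartingTask =
    """(?i)\(TID (\d+), ([^,]+), ([A-Za-z_]+), (\d+) bytes\)""".r

  /** Extracts the locality level (upper-cased) from a log line, if present. */
  def localityOf(line: String): Option[String] =
    StartingTask.findFirstMatchIn(line).map(_.group(3).toUpperCase(Locale.ROOT))

  /** Counts tasks per locality level across many log lines. */
  def countByLevel(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(localityOf).groupBy(identity).map { case (k, v) => k -> v.size }
}
```

A stage in which most tasks report RACK_LOCAL or ANY is a hint that the locality waits may be worth tuning.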

#What locality levels are there

PROCESS_LOCAL: process locality. Code and data are in the same process, that is, in the same executor. The task that computes the data runs in the executor, and the data is in that executor's BlockManager. Performance is best.
NODE_LOCAL: node locality. Code and data are on the same node. For example, the data is on the node as an HDFS block and the task runs in an executor on that node, or the data and the task are in different executors on the same node. Data needs to be transferred between processes.
NO_PREF: for the task, the data is equally quick to fetch from anywhere, so there is no locality preference.
RACK_LOCAL: rack locality. Data and task are on two different nodes of the same rack, so data needs to be transferred over the network between nodes.
ANY: data and task may be anywhere in the cluster and not in the same rack. Performance is worst.
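
The way the scheduler steps down through these levels as time passes can be sketched as follows. This is a simplified model in plain Scala and not Spark's scheduler code: the object name LocalityWaitSketch and the function allowedLevel are hypothetical, the model assumes every level has pending tasks, and it assumes a wait of 0 for NO_PREF.

```scala
object LocalityWaitSketch {
  // Levels from most local to least local.
  val Levels = Vector("PROCESS_LOCAL", "NODE_LOCAL", "NO_PREF", "RACK_LOCAL", "ANY")

  // Assumed waits in milliseconds: 3s for process, node and rack; 0 for NO_PREF.
  // ANY is the last level, so it has no wait of its own.
  val DefaultWaits = Vector(3000L, 3000L, 0L, 3000L)

  /**
   * Returns the least-local level at which a task may be launched after the
   * scheduler has gone elapsedMs without launching a task.
   * waitsMs(i) is how long to wait before giving up on Levels(i).
   */
  def allowedLevel(elapsedMs: Long, waitsMs: Vector[Long] = DefaultWaits): String = {
    require(waitsMs.length == Levels.length - 1, "one wait per level except ANY")
    var remaining = elapsedMs
    var i = 0
    // Each exhausted wait moves the scheduler one level down.
    while (i < waitsMs.length && remaining >= waitsMs(i)) {
      remaining -= waitsMs(i)
      i += 1
    }
    Levels(i)
  }
}
```

With the assumed 3s waits, this model stays at PROCESS_LOCAL for the first three seconds, allows NODE_LOCAL after 3s, RACK_LOCAL after 6s, and ANY after 9s.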

#Locality wait time

spark.locality.wait: the default is 3s; it can be raised, for example to 6s or 10s.

How to set it up:
By default, the following three per-level waits take the value of spark.locality.wait above, so all are 3s:
spark.locality.wait.process
spark.locality.wait.node
spark.locality.wait.rack

import org.apache.spark.SparkConf
// Give the unit explicitly: a bare "10" would be read as 10 ms.
new SparkConf()
  .set("spark.locality.wait", "10s")
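
The per-level properties listed above can be set in the same way. The following is a minimal sketch that assumes spark-core is on the classpath. The object name, the application name and the chosen values are illustrative only and are not recommendations from the original article.

```scala
import org.apache.spark.SparkConf

object LocalityConfSketch {
  // The values below are illustrative; tune them against your own workload.
  val conf: SparkConf = new SparkConf()
    .setAppName("locality-wait-example")        // hypothetical application name
    .set("spark.locality.wait", "6s")           // base wait, used by any level not set explicitly
    .set("spark.locality.wait.process", "10s")  // wait longer for data cached in the same executor
    .set("spark.locality.wait.node", "6s")      // wait for a slot on the node that holds the data
    .set("spark.locality.wait.rack", "3s")      // fall back from rack-local to any node sooner
}
```

Longer waits favor locality at the cost of idle executor time, so it is usually worth checking the locality levels reported in the logs before and after a change.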

