Spark Scheduling-Related Configuration (Repost)

Most scheduling-related parameters are straightforward and do not need much extra explanation. However, because these parameters are so commonly used (they are presumably among the first things you will configure for your cluster), here are some notes on their internal mechanisms.

spark.cores.max

CPU cores are, of course, one of a cluster's most important compute resources. spark.cores.max determines the number of CPU cores a Spark application can request in standalone and Mesos modes. If you do not need to run multiple Spark applications concurrently, you do not need to set this parameter: by default the value of spark.deploy.defaultCores is used (and spark.deploy.defaultCores itself defaults to Int.MaxValue, meaning unrestricted), so the application can use all currently available CPU resources.
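For illustration, here is a minimal sketch of capping an application's total cores via SparkConf on a standalone cluster; the app name, master URL, and the value 24 are arbitrary examples, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master:7077")  // hypothetical standalone master URL
      .setAppName("cores-max-example")
      .set("spark.cores.max", "24")      // cap this app at 24 cores cluster-wide
    val sc = new SparkContext(conf)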

Note that this parameter has no effect in YARN mode. In YARN mode, resources are scheduled and managed by YARN itself, and the number of CPU cores requested when an application starts is determined by two other directly configured values: the number of executors and the number of cores per executor. (For historical reasons, some startup parameters differ between the run modes; personally I think they need further unification.)
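A hedged sketch of the YARN-mode equivalent, using the corresponding configuration keys (the matching spark-submit flags are --num-executors and --executor-cores); the numbers are arbitrary:

    import org.apache.spark.{SparkConf, SparkContext}

    // 10 executors * 4 cores each = 40 cores requested from YARN in total.
    val conf = new SparkConf()
      .setAppName("yarn-cores-example")
      .set("spark.executor.instances", "10")  // number of executors (YARN mode)
      .set("spark.executor.cores", "4")       // cores per executor
    val sc = new SparkContext(conf)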

In addition, regarding how CPU resources are allocated in the background in standalone mode: in the current implementation, within the limit allowed by spark.cores.max, the scheduler basically asks each worker in turn for the maximum number of CPU cores it can provide for an executor. As a result, if you manually limit the requested max cores to fewer than the total CPUs managed by the standalone or Mesos cluster, the application may run on only a subset of the cluster's nodes (because the CPU resources available on the first few nodes already satisfy the application's requirements) rather than being spread evenly across the cluster. This is usually not a big problem, but where data locality is involved, some data may have to be read remotely. In theory this can be solved in two ways: first, have the resource management modules of standalone and Mesos automatically spread out and start executors according to node resources; second, as in YARN mode, let the user specify and limit the number of cores per executor. There is a PR in the community that tries to take the second approach to solve a similar problem, but as of the time this document was written (August 2014) it had not been merged yet.
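To make the effect concrete, here is a simplified model of that greedy, worker-by-worker allocation. It is not the actual standalone Master code, only a sketch of the behavior described above:

    // Take as many cores as possible from each worker in turn until
    // the spark.cores.max budget is exhausted.
    case class Worker(id: String, freeCores: Int)

    def greedyAssign(workers: Seq[Worker], maxCores: Int): Map[String, Int] = {
      var remaining = maxCores
      val assigned = scala.collection.mutable.LinkedHashMap[String, Int]()
      for (w <- workers if remaining > 0) {
        val take = math.min(w.freeCores, remaining)
        if (take > 0) assigned(w.id) = take
        remaining -= take
      }
      assigned.toMap
    }

    // Four workers with 8 cores each but maxCores = 16: only w1 and w2 get
    // executors, which is the "subset of the nodes" effect described above.
    greedyAssign(Seq(Worker("w1", 8), Worker("w2", 8), Worker("w3", 8), Worker("w4", 8)), 16)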

spark.task.cpus

Literally, this parameter is the number of CPUs assigned to each task; it defaults to 1. In fact, it does not control the number of CPUs a task actually uses: for example, a task can use more CPUs by creating additional worker threads internally (at least for now, the task's execution environment is not confined by LXC or similar technologies). Its role is to account for CPU resources as "used" each time a task is assigned during job scheduling. In other words, it is only used for bookkeeping so that scheduling decisions can be made conveniently. So if you expect to speed up tasks by changing this parameter, look elsewhere. In my view, the significance of this parameter is as an auxiliary means to make scheduling more accurate when your tasks really do, by whatever means, occupy more CPU resources.
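For example, if each task spawns one extra worker thread of its own (say, through a multi-threaded library), you could tell the scheduler about it so that slot accounting stays honest. A sketch, with arbitrary values:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("task-cpus-example")
      .set("spark.task.cpus", "2")  // the scheduler now reserves 2 cores per task
    val sc = new SparkContext(conf)
    // An executor with N cores will run at most N / 2 such tasks concurrently;
    // the tasks themselves are not actually confined to 2 CPUs.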

spark.scheduler.mode

This parameter determines whether a single Spark application schedules its work internally in FIFO mode or FAIR mode. Yes, you read that right: this parameter only governs the scheduling policy among multiple independent jobs inside one Spark application.

If you need a scheduling strategy across multiple Spark applications: in standalone mode it depends on how many CPU cores each application requests and obtains (applications that cannot yet get resources simply wait there pending), which is essentially FIFO: whoever requests and obtains the resources first occupies them until it finishes. In YARN mode, the scheduling policy among multiple Spark applications is determined by YARN's own policy configuration.

So what is this internal scheduling logic good for? If your Spark application is run as a service to which multiple users submit jobs, you can configure the FAIR-mode-related parameters to adjust the scheduling and resource allocation priorities of different users' jobs.
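A sketch of that setup, assuming FAIR mode and a hypothetical pool name "user_a" (pool weights and minShare would live in a fairscheduler.xml file referenced by spark.scheduler.allocation.file, not shown here):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("fair-scheduling-example")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread are tagged with the pool "user_a",
    // so they are scheduled according to that pool's fair-share settings.
    sc.setLocalProperty("spark.scheduler.pool", "user_a")
    sc.parallelize(1 to 1000).count()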

spark.locality.wait

spark.locality.wait, together with spark.locality.wait.process, spark.locality.wait.node, and spark.locality.wait.rack, controls the details of the locality strategy used when assigning tasks.

Task scheduling in Spark needs to take into account the locality of the data involved. There are basically two cases: one is that the data source is a HadoopRDD; the other is that the RDD's data comes from the RDD cache (that is, it is read from the BlockManager via the CacheManager, or it is an RDD backed by a streaming data source). In other cases, an RDD that does not involve a shuffle operation does not form a boundary for dividing stages and tasks, so the question of judging locality does not arise; and for a ShuffledRDD, the locality preference is always "no prefer", so locality does not really matter there either.

Ideally, a task of course performs best when it is assigned to a node where it can read its data locally (within the same JVM, or at least on the same physical machine). However, the execution time of each task cannot be estimated accurately, so it is hard to compute a globally optimal execution plan in advance. When the Spark application obtains a computing resource and no task that satisfies the best locality requirement is ready to run on it, should it fall back and run a task with slightly worse locality, or should it wait for the next available computing resource in the hope of a better locality match?

These parameters determine how the Spark task scheduler resolves that choice: when assigning tasks, it will temporarily hold off and wait, up to the configured maximum wait time, for resources that satisfy process-local, node-local, or rack-local locality before falling back to a lower level. The default is 3000 milliseconds.
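A sketch of raising those waits; the values are arbitrary examples in milliseconds, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("locality-wait-example")
      .set("spark.locality.wait", "6000")          // fallback wait used for all levels
      .set("spark.locality.wait.process", "3000")  // wait at the process-local level
      .set("spark.locality.wait.node", "6000")     // wait at the node-local level
      .set("spark.locality.wait.rack", "6000")     // wait at the rack-local level
    val sc = new SparkContext(conf)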

Basically, if you have a large number of tasks and each task runs for a relatively long time, then whether a task runs local to its data can make a significant difference in cost, and if data locality is currently not ideal, tuning these parameters may bring some performance benefit. Conversely, if the cost of waiting outweighs the benefit, don't bother.

In particular, for the first batch of tasks submitted right after the application starts, the executors that will process them may not all have registered yet when the job scheduling module begins working, so some tasks are placed in the "no prefer" queue. The priority of these tasks is second only to tasks that satisfy process-level data locality, and they are assigned to non-local nodes for execution. If it were really the case that no executor runs on the corresponding node, or the task genuinely has no preference (such as a ShuffledRDD), this would indeed be the better choice; but the reality here is that those executors simply have not registered yet. In this situation, even increasing the values of the parameters in this section does not help. There are a number of completed and ongoing PRs trying to provide a smarter solution, for example by dynamically adjusting the "no prefer" queue, monitoring the node registration ratio, and so on. However, you can also work around it simply by sleeping for a few seconds after creating the SparkContext, depending on how your cluster starts up.
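A sketch of that crude workaround; the sleep length and the input path are placeholders you would adapt to your own cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("warmup-example"))

    // Give executors a few seconds to register before the first job is
    // submitted, so early tasks are not dumped into the "no prefer" queue.
    Thread.sleep(5000)

    sc.textFile("hdfs:///some/input/path").count()  // hypothetical input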

spark.speculation

spark.speculation, together with spark.speculation.interval, spark.speculation.quantile, spark.speculation.multiplier and related parameters, controls the details of speculative execution. Speculation means that during task scheduling, when there is no task satisfying the current locality requirements to run, slow-running tasks are scheduled again on idle computing resources. These parameters adjust how often this happens and the criteria used to identify slow tasks; speculation is disabled by default.
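A sketch of enabling speculation and the knobs mentioned above; the values shown are illustrative, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("speculation-example")
      .set("spark.speculation", "true")
      .set("spark.speculation.interval", "100")    // how often to check for slow tasks (ms)
      .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
      .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
    val sc = new SparkContext(conf)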

It is often hard to judge correctly whether speculation is needed. The situations where it really helps are usually ones where certain nodes run tasks slowly because of their operating environment, for example CPU resources occupied for some other reason, or a damaged disk. The precondition, of course, is that your partitioned tasks are not restricted to being executed only once or unable to run multiple copies at the same time. The yardstick that speculative tasks use is usually the execution time of the other tasks, while the actual tasks may differ in duration because partition sizes are uneven, plus a certain randomness in scheduling and IO. So if the criteria are too strict, speculation may fail to catch the real problem and only add unnecessary task overhead; if they are too loose, it is probably basically useless.

Personally, if your cluster is large, the operating environment is complex enough that execution anomalies really do occur frequently, and the differences in data partition sizes are not small, then for the sake of stable run times you can consider tuning these parameters carefully. Otherwise, consider eliminating, by other means, the factors that cause tasks to run abnormally slowly in the first place.

Of course, I have not actually run Spark on a very large cluster, so if any of these views are biased, corrections from readers with hands-on experience are welcome.
