Spark Chapter: Spark Resource Scheduling and Task Scheduling (Spark Summary)

Source: Internet
Author: User

First, a preface

Spark resource scheduling is a very important module; once you understand its principles, you can understand concretely how Spark works, so it is especially important to grasp.

For resource requests, this article covers the coarse-grained and fine-grained models separately.

Second, the Spark resource scheduling and task scheduling flow


The flow of Spark resource scheduling and task scheduling:

1. When the cluster starts, the Worker nodes report their resource status to the Master node, so the Master knows the resource situation of the whole cluster.

2. When a Spark application is submitted, a DAG (directed acyclic graph) is built from the dependencies between RDDs. After the application is submitted, Spark creates two objects on the driver side: DAGScheduler and TaskScheduler.

3. DAGScheduler is the high-level scheduler for task scheduling; it is an object. Its main function is to divide the DAG into stages according to the wide and narrow dependencies between RDDs, and then submit each stage as a TaskSet to the TaskScheduler (the low-level scheduler for task scheduling). A TaskSet is simply a collection of tasks; the number of tasks in a stage equals the stage's parallelism.

4. The TaskScheduler traverses the TaskSet and sends each task to an Executor on a compute node to run (in fact, the task is sent to the Executor's thread pool, ThreadPool, for execution).

5. Tasks running in the Executor's thread pool report their status back to the TaskScheduler.

6. When a task fails, the TaskScheduler is responsible for retrying it, sending the task back to an Executor for execution; by default it retries 3 times. If the task still fails after 3 retries, the stage that the task belongs to fails.

7. When a stage fails, the DAGScheduler is responsible for retrying it, resending the TaskSet to the TaskScheduler; by default a stage is retried 4 times. If the stage still fails after 4 retries, the job fails; and when a job fails, the application fails.
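These retry limits are configurable, a sketch using standard Spark properties (note that in current Spark, `spark.task.maxFailures` counts total attempts, so its default of 4 corresponds to the 3 retries described above; `spark.stage.maxConsecutiveAttempts` defaults to 4):

```shell
# Hypothetical submission; the application jar and class are placeholders.
spark-submit \
  --conf spark.task.maxFailures=4 \
  --conf spark.stage.maxConsecutiveAttempts=4 \
  --class com.example.MyApp \
  myapp.jar
```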

8. The TaskScheduler not only retries failed tasks, it can also retry straggling (lagging, slow) tasks, i.e., tasks running much more slowly than the other tasks. If a task is running slowly, the TaskScheduler starts a new task to run the same processing logic; whichever of the two tasks finishes first, its result is used. This is Spark's speculative execution mechanism. In Spark, speculative execution is off by default; it can be enabled through the `spark.speculation` property.
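As a sketch of how speculative execution might be enabled and tuned (these are standard Spark properties; the defaults shown in the comments are from recent Spark releases and may differ in your version):

```shell
# Hypothetical submission; the application jar and class are placeholders.
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.interval=100ms \
  --conf spark.speculation.multiplier=1.5 \
  --conf spark.speculation.quantile=0.75 \
  --class com.example.MyApp \
  myapp.jar
# interval:   how often to check for tasks to speculate (default 100ms)
# multiplier: how many times slower than the median a task must be (default 1.5)
# quantile:   fraction of tasks that must finish before speculation starts (default 0.75)
```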

Summary:

1. For ETL-type jobs that write to a database, turn off speculative execution; otherwise speculatively relaunched tasks may write duplicate data.

2. If there is data skew, enabling speculative execution may keep restarting tasks that process the same (skewed) logic, and those tasks may remain in a never-finishing state. (So speculative execution is generally left off.)

3. If an application has multiple actions, there will be multiple jobs; generally one action corresponds to one job. If an application has multiple jobs, they execute one after another; even if a later job fails, the jobs that have already finished are not rolled back.
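A minimal sketch of this behavior (the object and variable names are hypothetical, and a Spark dependency is required to run it): each call to an action such as `count` or `collect` triggers its own job, and the jobs run sequentially.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MultiJobExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("MultiJobExample")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 100)

    // Action 1: triggers job 0.
    val n = rdd.count()

    // Action 2: triggers job 1, which starts only after job 0 has finished.
    val doubled = rdd.map(_ * 2).collect()

    // If a later action failed here, the completed results of the
    // earlier jobs would not be rolled back.
    sc.stop()
  }
}
```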

4. Whichever side holds the SparkContext is the driver side.

5. Generally, once the following lines have run, the resources have been requested; the code after them is the processing logic:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local").setAppName("Pipeline")
val sc = new SparkContext(conf)

Third, coarse-grained resource requests and fine-grained resource requests

Coarse-grained resource request (Spark)

Before the application executes, all resources are requested up front; tasks are then scheduled against these resources, and the resources are not released until all tasks have finished executing.

Advantages: all resources are requested before the application executes, so every task runs with resources already in place and does not need to request resources itself before executing. Tasks start fast, so tasks execute fast, stages execute fast, jobs are fast, and the application as a whole executes fast.

Disadvantages: resources are not released until the last task completes, so the cluster's resources cannot be fully utilized. This is more serious when the data is skewed.
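As a sketch of a coarse-grained request in Spark standalone mode (standard spark-submit options; the master URL, class, and jar are placeholders): the executors and their cores and memory are fixed for the whole lifetime of the application.

```shell
# All executors are requested up front and held until the application exits.
spark-submit \
  --master spark://master:7077 \
  --executor-memory 2g \
  --total-executor-cores 8 \
  --class com.example.MyApp \
  myapp.jar
```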

Fine-grained resource request (MapReduce)

The application does not request resources before it executes; instead, each task in a job requests its own resources before executing, and releases them when it finishes.

Advantages: The resources of the cluster can be fully utilized.

Disadvantages: each task must request resources itself, so task startup is slow, and the application correspondingly runs slower.
