Inside the Spark Runtime: The TaskScheduler Mechanism (DT Big Data DreamWorks)

Source: Internet
Author: User

Content:

1. How TaskScheduler works;

2. TaskScheduler source code analysis;

A stage contains a set of tasks that are computed in parallel; their logic is exactly the same, but each task processes a different partition of the data.

The DAGScheduler submits each stage to the TaskScheduler as a TaskSet.


========== TaskScheduler Working Principle Demystified ==========

1. When DAGScheduler submits a TaskSet to the underlying scheduler, it programs against the TaskScheduler interface. This follows the object-oriented principle of depending on abstractions rather than concrete implementations, which makes the underlying resource scheduler pluggable: Spark can run under many resource managers, such as Standalone, YARN, Mesos, Local, EC2, or other custom resource schedulers. In Standalone mode, the concrete implementation we focus on is TaskSchedulerImpl;
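The pluggability described above can be illustrated with a minimal sketch. The names below are hypothetical stand-ins, not Spark's real classes; the point is only that the caller depends on the trait, so any backend can be swapped in.

```scala
// DAGScheduler-style code depends only on this abstraction,
// so any resource scheduler can be plugged in underneath.
trait TaskScheduler {
  def submitTasks(taskSet: Seq[String]): String
}

// A stand-in for TaskSchedulerImpl in Standalone mode.
class LocalTaskScheduler extends TaskScheduler {
  def submitTasks(taskSet: Seq[String]): String =
    s"running ${taskSet.size} tasks locally"
}

// Written against the trait, not the concrete class: swapping in a
// YARN- or Mesos-style scheduler requires no change to this caller.
def submitStage(scheduler: TaskScheduler, tasks: Seq[String]): String =
  scheduler.submitTasks(tasks)

val result = submitStage(new LocalTaskScheduler, Seq("task0", "task1"))
println(result)
```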

2. When SparkContext is instantiated, it calls createTaskScheduler to create TaskSchedulerImpl and SparkDeploySchedulerBackend:

case SPARK_REGEX(sparkUrl) =>
  val scheduler = new TaskSchedulerImpl(sc)
  val masterUrls = sparkUrl.split(",").map("spark://" + _)
  val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
  scheduler.initialize(backend)
  (backend, scheduler)

In TaskSchedulerImpl's initialize method, the SparkDeploySchedulerBackend is passed in and held by TaskSchedulerImpl. When TaskSchedulerImpl's start method is called, it calls backend.start, and it is inside that start method that the application is eventually registered with the Master;
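The initialize/start handshake just described can be sketched as follows. These are hypothetical demo classes, not Spark's real ones; they only show the delegation order: initialize hands over the backend, and the scheduler's start delegates to backend.start, where registration would happen.

```scala
class DemoBackend {
  var started = false
  // The real backend registers the application with the Master here.
  def start(): Unit = { started = true }
}

class DemoTaskScheduler {
  private var backend: Option[DemoBackend] = None
  // initialize receives and holds the backend...
  def initialize(b: DemoBackend): Unit = { backend = Some(b) }
  // ...and start delegates to backend.start.
  def start(): Unit = backend.foreach(_.start())
}

val backend = new DemoBackend
val scheduler = new DemoTaskScheduler
scheduler.initialize(backend)
scheduler.start()
println(backend.started) // the backend was started via the scheduler
```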

3. TaskScheduler's core job is to submit TaskSets to the cluster for execution and report the results:

1) It creates and maintains a TaskSetManager for each TaskSet and tracks each task's locality and error information;

2) When it encounters a straggler task, it re-launches the task on another node for retry;

3) TaskScheduler must report execution status back to DAGScheduler, including events such as fetch failed when shuffle output is lost;
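The kinds of status events reported back to DAGScheduler can be sketched as a small ADT. This is a simplified, hypothetical model; Spark's real TaskEndReason hierarchy is much richer.

```scala
// Simplified model of task-end reasons reported to DAGScheduler.
sealed trait TaskEndReason
case object TaskSuccess extends TaskEndReason
// Shuffle output was lost on the map side; the stage must be resubmitted.
case class FetchFailed(shuffleId: Int, mapId: Int) extends TaskEndReason

def report(reason: TaskEndReason): String = reason match {
  case TaskSuccess       => "task succeeded"
  case FetchFailed(s, m) => s"fetch failed: shuffle $s map $m output lost, resubmit stage"
}

println(report(TaskSuccess))
println(report(FetchFailed(1, 2)))
```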

4. TaskScheduler internally holds a SchedulerBackend; in Standalone mode, the concrete implementation is SparkDeploySchedulerBackend;

5. When SparkDeploySchedulerBackend starts, it constructs an AppClient instance, and when that instance starts it creates the ClientEndpoint message loop body. On startup, ClientEndpoint registers the current application with the Master. Meanwhile, SparkDeploySchedulerBackend's parent class, CoarseGrainedSchedulerBackend, instantiates a message loop body of type DriverEndpoint at start time (this is the driver, in the classic sense, when we run a program). SparkDeploySchedulerBackend is specifically responsible for collecting resource information from the Workers: when an ExecutorBackend starts, it sends a RegisterExecutor message to DriverEndpoint to register itself. At that point SparkDeploySchedulerBackend knows what computing resources the current application has, and TaskScheduler runs tasks concretely through the computing resources owned by SparkDeploySchedulerBackend;
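The RegisterExecutor handshake can be sketched as follows. The classes and fields here are hypothetical simplifications of the real driver-side endpoint: each starting executor announces itself, the endpoint records its resources in memory, and the scheduler later draws on that total when placing tasks.

```scala
import scala.collection.mutable

// Message an ExecutorBackend sends when it starts (simplified).
case class RegisterExecutor(executorId: String, host: String, cores: Int)

class DemoDriverEndpoint {
  // In-memory record of registered executors and their resources.
  val executors = mutable.Map.empty[String, RegisterExecutor]
  def receive(msg: RegisterExecutor): Unit = executors(msg.executorId) = msg
  // The scheduler consults totals like this when deciding where tasks can run.
  def totalCores: Int = executors.values.map(_.cores).sum
}

val driver = new DemoDriverEndpoint
driver.receive(RegisterExecutor("0", "worker-1", 4))
driver.receive(RegisterExecutor("1", "worker-2", 8))
println(driver.totalCores) // 12
```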

6. SparkContext, DAGScheduler, TaskSchedulerImpl, and SparkDeploySchedulerBackend are each instantiated only once, when the application starts, and these objects exist for the entire lifetime of the application;

Big summary: When SparkContext is instantiated, it calls createTaskScheduler to create TaskSchedulerImpl and SparkDeploySchedulerBackend. During SparkContext's startup, TaskSchedulerImpl's start method is called; that start calls SparkDeploySchedulerBackend's start, which creates an AppClient object and calls the AppClient object's start method. Inside that start method, a ClientEndpoint is created. When the ClientEndpoint is created, a Command is passed in that names the entry class of the executor process to be launched for the current application: CoarseGrainedExecutorBackend. ClientEndpoint then starts and registers the current application with the Master via tryRegisterAllMasters. When the Master receives the registration, if the program can run, it generates a job ID for the program and allocates computing resources through schedule(). The concrete allocation is determined by the application's run configuration: memory, cores, and so on. Finally, the Master sends instructions to the Workers. When a Worker allocates computing resources for the current application, it creates an ExecutorRunner; internally, ExecutorRunner uses a thread to build a ProcessBuilder and launch a separate JVM process. The class whose main method is loaded when that JVM process starts is exactly the class named in the Command passed in when the ClientEndpoint was created: CoarseGrainedExecutorBackend.
In the JVM launched by ProcessBuilder, CoarseGrainedExecutorBackend is loaded and its main method is invoked. In main, CoarseGrainedExecutorBackend is itself instantiated as a message loop body. On instantiation, via its onStart callback, it sends RegisterExecutor to DriverEndpoint to register the current CoarseGrainedExecutorBackend. DriverEndpoint receives the registration and saves it in an in-memory data structure inside the SparkDeploySchedulerBackend instance, and with that the driver has obtained its computing resources.
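The "launch a separate JVM" step above can be sketched as follows. This is an illustrative assembly of the executor command line; the flags and the jar path are hypothetical, not Spark's real ones. Only the main-class name follows the text: it was fixed earlier, when the Command for the ClientEndpoint was created.

```scala
// Build the command line for the separate executor JVM (illustrative flags).
def executorCommand(mainClass: String, driverUrl: String, executorId: Int): Seq[String] =
  Seq("java", "-cp", "app.jar", mainClass,
      "--driver-url", driverUrl, "--executor-id", executorId.toString)

val cmd = executorCommand(
  "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  "spark://driver@host:7077", 0)

// An ExecutorRunner-style component would hand this to java.lang.ProcessBuilder
// in a worker thread:
//   new ProcessBuilder(cmd: _*).start()
println(cmd.mkString(" "))
```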


Homework:

Draw a flowchart of the big summary above.

Liaoliang Teacher's card:

The "first person" of Spark in China

Sina Weibo: http://weibo.com/ilovepains

WeChat public account: DT_Spark

Blog: http://blog.sina.com.cn/ilovepains

Mobile: 18610086859

QQ: 1740415547

Email: [Email protected]


This article is from the "A Flower Proud of the Cold" blog; reprinting is declined!

