Spark Runtime (Driver, Master, Worker, Executor) Insider Decryption (DT Big Data Dream Factory)

Source: Internet
Author: User
Tags: shuffle

Content:

1. More on Spark cluster deployment

2. Job submission decryption

3. Job generation and acceptance

4. How tasks run

5. More on shuffle

A view of the Spark runtime from an operational perspective, through Master, Driver, and Executor.

========== On Spark Cluster Deployment ==========

The cluster deployment diagram from the official website (the image was lost in extraction).

By default, there is one Executor on each Worker, which maximizes memory and CPU usage.

Master sends instructions to the Workers to allocate resources; it does not check whether a Worker can actually provide those resources, or how much it ends up providing.

1. From the point of view of the Spark runtime, there are five core objects: Master, Worker, Executor, Driver, and CoarseGrainedExecutorBackend.

2. In its distributed cluster design, Spark maximizes functional independence, packaging specific responsibilities into modular, independent objects: high cohesion, loose coupling.


3. When the Driver initializes its SparkContext, the program is submitted to Master. If Master accepts the program to run on Spark, it assigns the program an AppID and allocates specific computing resources. It is important to note that Master allocates those resources on the cluster's Workers according to the configuration submitted with the program, but Master does not check whether the resources were actually allocated: once the allocation instructions are issued, Master simply records the resources as allocated. If the client then submits another program, those resources are no longer available to it.

Disadvantage: other submitted programs may be unable to obtain computing resources that were nominally allocated but never actually used, so they cannot run. The most important advantage: it keeps the loosely coupled Spark distributed system running as fast as possible (otherwise, if Master had to wait for the final resource allocation to succeed before notifying the Driver, the Driver would block and parallel computing resources could not be utilized to the fullest).

This is because, by default, Spark queues programs for execution.

It should be added that, by default, Spark's resource-allocation policy has no obvious drawback in practice, because there is typically only one application running in the cluster.
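The fire-and-forget bookkeeping described above can be sketched in plain Scala. This is a toy model, not Spark source: the class and method names (`ToyMaster`, `submitApplication`) are illustrative assumptions. The point is that the Master records resources as used the moment it issues allocation instructions, without waiting for any Worker to confirm, so a second application immediately sees fewer free resources.

```scala
// Toy model (not Spark source code): a Master that deducts resources
// from its bookkeeping as soon as it grants them, before any Worker
// has confirmed the allocation actually succeeded.
class ToyMaster(var freeCores: Int) {
  // Returns true if the application is granted resources.
  def submitApplication(requestedCores: Int): Boolean = {
    if (requestedCores <= freeCores) {
      // Bookkeeping happens immediately; the Master does not wait
      // for Workers to report that executors really started.
      freeCores -= requestedCores
      true
    } else {
      false // later applications must wait until resources are released
    }
  }
}

val master = new ToyMaster(freeCores = 8)
val firstGranted  = master.submitApplication(6) // granted; 2 cores recorded free
val secondGranted = master.submitApplication(4) // refused: bookkeeping says only 2 left
```

Even if the first application never manages to use its 6 cores, the second one is still refused, which is exactly the drawback (and the speed advantage) discussed above.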

========== Job Submission Process Source Decryption ==========

First, run a program and look at the log.

1. A very important skill: run a job in spark-shell to understand the job submission process, then verify that process against the source code.

scala> sc.textFile("/library/dataforsortedshuffle").flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).saveAsTextFile("/library/dataoutput2")

Log information: (the log screenshots from the original post were lost in extraction).

2. Every action in Spark triggers at least one job.

The log above shows that a job is triggered when saveAsTextFile is called.
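The transformation/action split can be illustrated without a cluster using Scala's own lazy views. This is an analogy in plain Scala, not Spark API: like RDD transformations, operations on a view build a pipeline but compute nothing; forcing the view plays the role of an action such as saveAsTextFile.

```scala
// Plain-Scala analogy for lazy transformations vs. actions (no Spark needed).
val lines = List("spark runs jobs", "jobs have stages")

// "Transformations": building the pipeline computes nothing yet.
val pairs = lines.view
  .flatMap(_.split(" "))
  .map(word => (word, 1))

// "Action": forcing the view evaluates the whole pipeline, the way an
// action like saveAsTextFile triggers a Spark job.
val result = pairs.toList
  .groupBy(_._1)
  .map { case (word, ps) => (word, ps.map(_._2).sum) }
```

In real Spark the same shape holds: flatMap/map/reduceByKey only record lineage, and the job runs when an action forces it.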

3. When SparkContext is instantiated, it constructs SparkDeploySchedulerBackend, DAGScheduler, TaskSchedulerImpl, MapOutputTrackerMaster, and other objects. SparkDeploySchedulerBackend is responsible for managing and scheduling the cluster's computing resources; DAGScheduler is responsible for high-level scheduling (such as dividing a job into stages and determining data locality); TaskSchedulerImpl is responsible for low-level scheduling within a stage (such as scheduling each task and task fault tolerance); MapOutputTrackerMaster is responsible for managing the output and reading of shuffle data.

4. TaskSchedulerImpl internal dispatch (part of the full log; the screenshot was lost in extraction).

========== Task Operation Decryption ==========

1. Tasks run inside an Executor, and the Executor lives inside a CoarseGrainedExecutorBackend; CoarseGrainedExecutorBackend and Executor correspond one to one.

The process is as follows (the diagram from the original post was lost in extraction).

2. When CoarseGrainedExecutorBackend receives the LaunchTask message sent by TaskSetManager, it deserializes the TaskDescription, then uses the single Executor inside CoarseGrainedExecutorBackend to run the task.

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    logInfo("Successfully registered with driver")
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)

  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val taskDesc = ser.deserialize[TaskDescription](data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId,
        attemptNumber = taskDesc.attemptNumber, taskDesc.name, taskDesc.serializedTask)
    }
}

In the cluster message protocol, CoarseGrainedClusterMessages:

private[spark] object CoarseGrainedClusterMessages {

  case object RetrieveSparkProps extends CoarseGrainedClusterMessage

  // Driver to executors
  case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage

  case class KillTask(taskId: Long, executor: String, interruptThread: Boolean)
    extends CoarseGrainedClusterMessage

  sealed trait RegisterExecutorResponse

  case class RegisteredExecutor(hostname: String) extends CoarseGrainedClusterMessage
    with RegisterExecutorResponse

  case class RegisterExecutorFailed(message: String) extends CoarseGrainedClusterMessage
    with RegisterExecutorResponse
}

Supplemental note: LaunchTask is a case class.
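Why case classes matter here: they give the message protocol free pattern matching and field extraction, which is exactly what lets `receive` destructure `LaunchTask(data)` directly. A minimal self-contained sketch (the `Toy*` names are illustrative, not Spark's):

```scala
// Minimal sketch of case-class message dispatch, mirroring how
// CoarseGrainedExecutorBackend.receive matches on cluster messages.
sealed trait ToyClusterMessage
case class ToyLaunchTask(taskId: Long, name: String) extends ToyClusterMessage
case class ToyKillTask(taskId: Long) extends ToyClusterMessage

def handle(msg: ToyClusterMessage): String = msg match {
  // Pattern matching destructures the case class fields for free.
  case ToyLaunchTask(id, name) => s"launching task $id ($name)"
  case ToyKillTask(id)         => s"killing task $id"
}

val launched = handle(ToyLaunchTask(7, "wordcount"))
val killed   = handle(ToyKillTask(7))
```

Because the trait is sealed, the compiler can also warn about unhandled message types, a useful property for an RPC protocol.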

Looking inside Executor:

// Start worker thread pool
private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
private val executorSource = new ExecutorSource(threadPool, executorId)

So tasks are executed in a thread pool!
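What a "daemon cached thread pool" amounts to can be sketched with the standard java.util.concurrent API. This is an approximation of the idea behind ThreadUtils.newDaemonCachedThreadPool, not Spark's implementation: a cached pool whose threads are daemons (so they never block JVM shutdown) with a recognizable name prefix.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Sketch: a cached thread pool with named daemon threads.
def newDaemonCachedThreadPool(prefix: String): ExecutorService = {
  val counter = new AtomicInteger(0)
  val factory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
      t.setDaemon(true) // daemon threads do not keep the JVM alive
      t
    }
  }
  Executors.newCachedThreadPool(factory)
}

val pool = newDaemonCachedThreadPool("Executor task launch worker")
val done = new CountDownLatch(1)
// Submitting a task, the way Executor.launchTask hands a TaskRunner to its pool.
pool.execute(new Runnable { override def run(): Unit = done.countDown() })
val finished = done.await(5, TimeUnit.SECONDS)
pool.shutdown()
```

A cached pool fits the Executor's workload: it grows threads on demand when many tasks arrive and reclaims idle ones, rather than pinning a fixed thread count.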

Homework:

Summarize the job submission and execution process.

Teacher Liaoliang's contact card:

China Spark first person

Sina Weibo: Http://weibo.com/ilovepains

WeChat official account: Dt_spark

Blog: http://blog.sina.com.cn/ilovepains

Mobile: 18610086859

qq:1740415547

Email: [Email protected]


This article is from the "A Flower Proud Cold" blog; reprinting is declined.
