Content:
1. Spark cluster deployment, revisited;
2. Job submission process decryption;
3. Job generation and acceptance;
4. Task operation;
5. Shuffle, revisited.
A look at the Spark runtime from an operational perspective, through the Master, Driver, and Executor.
========== Spark Cluster Deployment, Revisited ============
[Figure: the cluster deployment diagram from the official website]
By default there is one Executor per Worker, which maximizes use of the Worker's memory and CPU.
The Master sends instructions to the Workers to allocate resources; it does not track whether a Worker can actually provide those resources, nor how much is actually handed out.
1. From the point of view of the Spark runtime, there are five core objects: Master, Worker, Executor, Driver, and CoarseGrainedExecutorBackend.
2. In its distributed cluster design, Spark maximizes functional independence, packaging specific responsibilities into independent, modular objects: high cohesion, loose coupling.
3. When the Driver initializes its SparkContext, the program is submitted to the Master. If the Master accepts the program to run on Spark, it assigns the program an AppID and allocates concrete compute resources. It is important to note that the Master allocates resources on the cluster's Workers according to the configuration information submitted with the current program (a minimal configuration sketch follows below); it does not track whether those resources are actually obtained. Once the allocation instructions are issued, the Master simply records the resources as allocated, and a client that submits another program afterwards cannot use them.
Disadvantage: a program submitted later may find the compute resources it would otherwise be allocated already taken, and fail to run. The most important advantage: on the basis of weak functional coupling, the Spark distributed system runs as fast as possible (otherwise, if the Master had to wait until resource allocation finally succeeded before notifying the Driver, the Driver would block, and parallel compute resources could not be utilized to the maximum).
By default, Spark queues submitted programs and executes them one after another.
It should be added that, by default, this allocation policy has no obvious drawback in practice, because typically only one application runs in a cluster at a time.
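As a minimal sketch of the "configuration information" the Master consults (the app name, master URL, and resource figures below are illustrative placeholders, not values from this post), an application declares its resource demands through SparkConf before the SparkContext registers with the Master:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical values for illustration: the Master hands out Executors on
// Workers based on these requested figures, not on a live availability check.
val conf = new SparkConf()
  .setAppName("ResourceDemoApp")          // placeholder application name
  .setMaster("spark://master:7077")       // placeholder standalone Master URL
  .set("spark.executor.memory", "2g")     // memory requested per Executor
  .set("spark.cores.max", "4")            // total cores this application may claim
val sc = new SparkContext(conf)           // registration with the Master happens here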
========== Job Submission Process: Source Decryption ============
First run a program and look at its log.
1. A very important skill: understand the job submission process by running a job in spark-shell, and then verify that process against the source code.
scala> sc.textFile("/library/dataForSortedShuffle").flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).saveAsTextFile("/library/dataOutput2")
Log information: [screenshots of the spark-shell driver log]
2. Every action in Spark triggers at least one job.
The log above shows that a job is triggered when saveAsTextFile is called.
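As a minimal sketch of this rule (the output path is a placeholder, and the job numbering assumes a fresh spark-shell), each action triggers its own job, while transformations such as map and flatMap are lazy and trigger nothing by themselves:

val rdd = sc.parallelize(1 to 100)
val sum = rdd.reduce(_ + _)               // action #1: triggers job 0
val cnt = rdd.count()                     // action #2: triggers job 1
rdd.saveAsTextFile("/tmp/demoOutput")     // action #3: triggers job 2 (placeholder path)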
3. When SparkContext is instantiated, it constructs SparkDeploySchedulerBackend, DAGScheduler, TaskSchedulerImpl, MapOutputTrackerMaster, and other objects. SparkDeploySchedulerBackend is responsible for managing and scheduling cluster compute resources; DAGScheduler is responsible for high-level scheduling (such as dividing a job into Stages and computing data locality); TaskSchedulerImpl is responsible for the lower-level scheduling within a Stage (such as scheduling each task, and task fault tolerance); MapOutputTrackerMaster is responsible for managing the writing and reading of shuffle data. (A lineage sketch illustrating Stage division follows this list.)
4. TaskSchedulerImpl internal dispatch (part of the overall log): [log screenshot]
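As a hedged illustration of the Stage division that DAGScheduler performs (reusing the word-count pipeline above; toDebugString is a standard RDD method, though its exact output format varies across Spark versions), the shuffle introduced by reduceByKey is where the job splits into a ShuffleMapStage and a ResultStage:

val counts = sc.textFile("/library/dataForSortedShuffle")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
// The indented lineage printed here shows the ShuffledRDD boundary,
// which is exactly where DAGScheduler cuts the job into two Stages.
println(counts.toDebugString)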
========== Task Operation Decryption ============
1. A task runs inside an Executor, and the Executor in turn lives inside a CoarseGrainedExecutorBackend; CoarseGrainedExecutorBackend and Executor correspond one to one.
2. When CoarseGrainedExecutorBackend receives the LaunchTask message (carrying a task that TaskSetManager has scheduled), it deserializes the TaskDescription and then executes the task with the single Executor instance it holds:
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    logInfo("Successfully registered with driver")
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)

  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val taskDesc = ser.deserialize[TaskDescription](data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
        taskDesc.name, taskDesc.serializedTask)
    }
}
These messages are defined in CoarseGrainedClusterMessages:

private[spark] object CoarseGrainedClusterMessages {

  case object RetrieveSparkProps extends CoarseGrainedClusterMessage

  // Driver to executors
  case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage

  case class KillTask(taskId: Long, executor: String, interruptThread: Boolean)
    extends CoarseGrainedClusterMessage

  sealed trait RegisterExecutorResponse

  case class RegisteredExecutor(hostname: String) extends CoarseGrainedClusterMessage
    with RegisterExecutorResponse

  case class RegisterExecutorFailed(message: String) extends CoarseGrainedClusterMessage
    with RegisterExecutorResponse
}
Supplemental note: LaunchTask is a case class.
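Why that matters: case classes and case objects can be deconstructed by pattern matching, which is exactly how the receive method above dispatches on message type. A minimal sketch with toy types (DemoMessage, LaunchDemo, and StopDemo are hypothetical names, not Spark's):

sealed trait DemoMessage extends Serializable
case class LaunchDemo(taskId: Long) extends DemoMessage
case object StopDemo extends DemoMessage

val receiveDemo: PartialFunction[Any, Unit] = {
  case LaunchDemo(id) => println("launching task " + id)   // fields bound by the match
  case StopDemo       => println("stopping")
}

receiveDemo(LaunchDemo(0L))   // prints: launching task 0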
Looking inside Executor:

// Start worker thread pool
private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")

private val executorSource = new ExecutorSource(threadPool, executorId)
Note that it is a thread pool: the Executor launches each task on a pool thread, so a single Executor runs many tasks concurrently.
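As a hedged approximation of what ThreadUtils.newDaemonCachedThreadPool amounts to (a simplified sketch, not Spark's actual implementation), it is a cached thread pool whose named worker threads are daemons, so lingering tasks cannot keep the JVM alive at shutdown:

import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

val factory = new ThreadFactory {
  private val count = new AtomicInteger(0)
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, "Executor task launch worker-" + count.getAndIncrement())
    t.setDaemon(true)   // daemon threads do not block JVM exit
    t
  }
}
val pool = Executors.newCachedThreadPool(factory)
pool.execute(new Runnable {
  override def run(): Unit = println(Thread.currentThread().getName + " is running a task")
})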
Homework:
Write a summary of the job submission and execution process.
Liaoliang's contact card:
Known as "China's first person in Spark"
Sina Weibo: http://weibo.com/ilovepains
WeChat public account: DT_Spark
Blog: http://blog.sina.com.cn/ilovepains
Mobile: 18610086859
QQ: 1740415547
Email: [Email protected]