Spark Executor Internals Thoroughly Decoded (DT Big Data Dream Factory)


Contents:

1. Spark Executor working-principle diagram;

2. ExecutorBackend registration source code decoded;

3. Executor instantiation internals;

4. How does the Executor actually work?

1. The Master sends an instruction to the Worker to start an Executor;

2. The Worker receives the Master's instruction and, through ExecutorRunner, launches another process to run the Executor;

3. That process starts the coarse-grained ExecutorBackend, i.e. CoarseGrainedExecutorBackend;

4. CoarseGrainedExecutorBackend registers with the Driver by sending a RegisterExecutor message;

5. After the Executor has registered successfully, the Driver returns a RegisteredExecutor message to CoarseGrainedExecutorBackend (a simplified sketch of these messages follows this list).
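As a rough sketch (not the exact Spark definitions -- the real message classes live in DeployMessages and CoarseGrainedClusterMessages and carry more fields, which vary by version), the messages exchanged in this startup sequence can be modeled like this:

// Simplified sketch of the launch/registration protocol; field lists are trimmed.

// Master -> Worker: ask the Worker to launch an Executor for an application
case class LaunchExecutor(appId: String, execId: Int, cores: Int, memoryMb: Int)

// CoarseGrainedExecutorBackend -> DriverEndpoint: register the newly started backend
case class RegisterExecutor(executorId: String, hostPort: String, cores: Int)

// DriverEndpoint -> CoarseGrainedExecutorBackend: registration succeeded; the backend
// may now create its Executor instance
case class RegisteredExecutor(hostname: String)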

==========Spark Executor Working-Principle Diagram============

1. Pay special attention: when CoarseGrainedExecutorBackend starts, what it registers with the Driver is in essence an ExecutorBackend instance; at that point it has no direct relationship with an Executor instance;

override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[RegisterExecutorResponse](
      RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) => Utils.tryLogNonFatalError {
      Option(self).foreach(_.send(msg)) // msg must be RegisterExecutorResponse
    }
    case Failure(e) => {
      logError(s"Cannot register with driver: $driverUrl", e)
      System.exit(1)
    }
  }(ThreadUtils.sameThread)
}

2. CoarseGrainedExecutorBackend is the process in which the Executor runs; the Executor is the object that actually processes Tasks, and internally it completes task computation through a thread pool;

3. CoarseGrainedExecutorBackend and Executor correspond one to one;

4. CoarseGrainedExecutorBackend is a message communication body (it implements ThreadSafeRpcEndpoint): it can send messages to the Driver and receive instructions sent by the Driver, such as starting a Task;

5. Inside the Driver process there are two critical endpoints:

1) ClientEndpoint: mainly responsible for registering the current program with the Master; it is an internal member of AppClient;

2) DriverEndpoint: the driving endpoint of the whole program at run time; it is an internal member of CoarseGrainedSchedulerBackend, and it is here that the RegisterExecutor message is received and registration on the Driver side is completed;

6. On the Driver side, the registered ExecutorBackend information is wrapped in ExecutorData and written into the Driver's in-memory data structure executorDataMap (a member of CoarseGrainedSchedulerBackend);

7. In fact, at execution time DriverEndpoint writes this information into CoarseGrainedSchedulerBackend's in-memory data structure executorDataMap, so the registration is ultimately with CoarseGrainedSchedulerBackend. In other words, CoarseGrainedSchedulerBackend holds all of the ExecutorBackend processes allocated to the current program, and inside each ExecutorBackend instance the Executor object is responsible for executing concrete tasks. The synchronized keyword is used to guarantee safe concurrent writes to executorDataMap at run time;

8. When CoarseGrainedExecutorBackend receives the RegisteredExecutor message sent back by DriverEndpoint, it creates the Executor instance object; that Executor instance is what actually performs task computation (a sketch of this handler appears after the thread-pool code below);

9. When the Executor is created, it builds a thread pool (threadPool) so that the tasks Spark sends over can be executed efficiently through multi-threaded concurrency and thread reuse; after a task-execution command is received, the task is first wrapped in a TaskRunner.

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {

  case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) {
      context.reply(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
    } else {
      // If the executor's RPC env is not listening for incoming connections, `hostPort`
      // will be null, and the client connection should be used to contact the executor.
      val executorAddress = if (executorRef.address != null) {
          executorRef.address
        } else {
          context.senderAddress
        }
      logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
      addressToExecutorId(executorAddress) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      val data = new ExecutorData(executorRef, executorRef.address, executorAddress.host,
        cores, cores, logUrls)
      // This must be synchronized because variables mutated
      // in this block are read when requesting executors
      CoarseGrainedSchedulerBackend.this.synchronized {
        executorDataMap.put(executorId, data)
        if (numPendingExecutors > 0) {
          numPendingExecutors -= 1
          logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
        }
      }
      // Note: some tests expect the reply to come after we put the executor in the map
      context.reply(RegisteredExecutor(executorAddress.host))
      listenerBus.post(
        SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
      makeOffers()
    }

Private[Cluster]classExecutordata (
ValExecutorendpoint:rpcendpointref,
   ValExecutoraddress:rpcaddress,
   override ValExecutorhost:String,
   varFreecores:Int,
   override ValTotalcores:Int,
   override ValLogurlmap:Map[String, String]
)extendsExecutorinfo (executorhost, Totalcores, LOGURLMAP)

// Start worker thread pool
private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
private val executorSource = new ExecutorSource(threadPool, executorId)
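
Point 8 above is the backend-side half of this handshake: when CoarseGrainedExecutorBackend receives the RegisteredExecutor reply, only then does it create the Executor object that owns the thread pool shown above. A minimal sketch of that receive handler, assuming Spark 1.x names and the surrounding fields of CoarseGrainedExecutorBackend (constructor arguments differ slightly between versions):

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    logInfo("Successfully registered with driver")
    // Only now is the real worker object created; it builds the task-launch thread pool.
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)
}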

==========How the Executor Works============

1. When the Driver sends over a task, it is actually sent to the CoarseGrainedExecutorBackend RpcEndpoint, not directly to the Executor (the Executor is not a message loop body, so it can never receive messages sent from remote processes);

2. When ExecutorBackend receives the message sent over by the Driver, it calls launchTask, which hands the task to the Executor for execution, and the Executor in turn hands it to a thread-pool thread for processing;

3. TaskRunner is in fact an implementation of Java's Runnable interface; the real work is handed to the thread pool, and a thread in the pool runs it, at which point the thread calls the run method to execute the task;

4. TaskRunner's run method invokes the task's run method, and the task's run method calls runTask; the concrete Task subclasses are ShuffleMapTask and ResultTask (see the simplified sketch after the launchTask code below).

case LaunchTask(data) =>
  if (executor == null) {
    logError("Received LaunchTask command but executor was null")
    System.exit(1)
  } else {
    val taskDesc = ser.deserialize[TaskDescription](data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
      taskDesc.name, taskDesc.serializedTask)
  }

def launchTask(
    context: ExecutorBackend,
    taskId: Long,
    attemptNumber: Int,
    taskName: String,
    serializedTask: ByteBuffer): Unit = {
  val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
    serializedTask)
  runningTasks.put(taskId, tr)
  threadPool.execute(tr)
}
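
To make points 3 and 4 concrete, here is a heavily simplified sketch (not the real Spark classes) of the TaskRunner / Task call chain: TaskRunner is a Runnable handed to the thread pool; its run method runs the Task, and the Task's final run method delegates to runTask, which ShuffleMapTask and ResultTask each override:

// Heavily simplified sketch -- illustrates the call chain only, not the actual Spark classes.
abstract class Task[T](val stageId: Int, val partitionId: Int) {
  // ShuffleMapTask and ResultTask each supply their own runTask implementation.
  def runTask(): T
  // The real Task.run also builds a TaskContext and collects metrics before delegating.
  final def run(): T = runTask()
}

class TaskRunner[T](taskId: Long, task: Task[T], onFinish: T => Unit) extends Runnable {
  // Executed by one of the Executor's thread-pool threads.
  override def run(): Unit = {
    val result = task.run()
    onFinish(result) // in Spark this becomes a statusUpdate message back to the driver
  }
}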


Liaoliang's card:

The first person of Spark in China

Sina Weibo: http://weibo.com/ilovepains

WeChat public account: DT_Spark

Blog: http://blog.sina.com.cn/ilovepains

Mobile: 18610086859

QQ: 1740415547

Email: [Email protected]


This article is from the "a Flower proud Cold" blog; reprinting is not permitted.
