The Way of Spark Cultivation (Advanced): Spark Source Reading, Section 10: Standalone Run Mode Analysis


Spark standalone uses the Master/slave architecture, which includes the following classes:

Class: org.apache.spark.deploy.master.Master
Description: Responsible for resource scheduling and application management for the entire cluster.
Message types:
  Received from Worker: 1. RegisterWorker 2. ExecutorStateChanged 3. WorkerSchedulerStateResponse 4. Heartbeat
  Sent to Worker: 1. RegisteredWorker 2. RegisterWorkerFailed 3. ReconnectWorker 4. KillExecutor 5. LaunchExecutor 6. LaunchDriver 7. KillDriver 8. ApplicationFinished
  Sent to AppClient: 1. RegisteredApplication 2. ExecutorAdded 3. ExecutorUpdated 4. ApplicationRemoved
  Received from AppClient: 1. RegisterApplication 2. UnregisterApplication 3. MasterChangeAcknowledged 4. RequestExecutors 5. KillExecutors
  Sent to Driver Client: 1. SubmitDriverResponse 2. KillDriverResponse 3. DriverStatusResponse
  Received from Driver Client: 1. RequestSubmitDriver 2. RequestKillDriver 3. RequestDriverStatus

Class: org.apache.spark.deploy.worker.Worker
Description: Registers itself with Master and, at run time, starts CoarseGrainedExecutorBackend, which starts the Executor that runs tasks.
Message types:
  Sent to Master: 1. RegisterWorker 2. ExecutorStateChanged 3. WorkerSchedulerStateResponse 4. Heartbeat
  Received from Master: 1. RegisteredWorker 2. RegisterWorkerFailed 3. ReconnectWorker 4. KillExecutor 5. LaunchExecutor 6. LaunchDriver 7. KillDriver 8. ApplicationFinished

Class: org.apache.spark.deploy.client.AppClient.ClientEndpoint
Description: Registers the application with Master, monitors it, and requests or kills executors.
Message types:
  Sent to Master: 1. RegisterApplication 2. UnregisterApplication 3. MasterChangeAcknowledged 4. RequestExecutors 5. KillExecutors
  Received from Master: 1. RegisteredApplication 2. ExecutorAdded 3. ExecutorUpdated 4. ApplicationRemoved

Class: org.apache.spark.scheduler.cluster.DriverEndpoint
Description: At run time, registers executors, launches tasks, and handles the status updates sent by executors.
Message types:
  Sent to Executor: 1. LaunchTask 2. KillTask 3. RegisteredExecutor 4. RegisterExecutorFailed
  Received from Executor: 1. RegisterExecutor 2. StatusUpdate

Class: org.apache.spark.deploy.ClientEndpoint
Description: Manages drivers: submitting a driver, killing a driver, and getting driver status.
Message types:
  Sent to Master: 1. RequestSubmitDriver 2. RequestKillDriver 3. RequestDriverStatus
  Received from Master: 1. SubmitDriverResponse 2. KillDriverResponse 3. DriverStatusResponse

All of the above classes extend org.apache.spark.rpc.ThreadSafeRpcEndpoint, and the underlying RPC implementation is currently based on Akka.

The interactions between these classes are described in the following sections.
1. Interaction between AppClient and Master

When SparkContext is created, it calls the createTaskScheduler method to create the corresponding TaskScheduler and SchedulerBackend:

// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()

In standalone run mode, the source code that creates the TaskScheduler and SchedulerBackend is as follows:

/**
 * Create a task scheduler based on a given master URL.
 * Return a 2-tuple of the scheduler backend and the task scheduler.
 */
private def createTaskScheduler(
    sc: SparkContext,
    master: String): (SchedulerBackend, TaskScheduler) = {
  // Omit other non-critical code
  case SPARK_REGEX(sparkUrl) =>
    val scheduler = new TaskSchedulerImpl(sc)
    val masterUrls = sparkUrl.split(",").map("spark://" + _)
    val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
    scheduler.initialize(backend)
    (backend, scheduler)
  // Omit other non-critical code
}

After the TaskScheduler and SchedulerBackend are created, the TaskScheduler's start method is called, which in turn starts the SchedulerBackend (in standalone mode this is SparkDeploySchedulerBackend).

// The start method in TaskSchedulerImpl
override def start() {
  // Invoke the SchedulerBackend start method
  backend.start()
  // Omit other non-critical code
}

The source code of the start method in the corresponding SparkDeploySchedulerBackend is as follows:

override def start() {
  super.start()

  // Omit other non-critical code
  // Application-related information (application name, executor memory, etc.)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
  // Create the AppClient, passing in the corresponding startup parameters
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
  waitForRegistration()
}

The source code of the start method in the AppClient class is as follows:

// AppClient start method
def start() {
  // Just launch an rpcEndpoint; it will call back into the listener.
  // ClientEndpoint is an inner class of AppClient;
  // it is AppClient's RpcEndpoint
  endpoint = rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv))
}

// ClientEndpoint registers the application with Master at startup
override def onStart(): Unit = {
  try {
    registerWithMaster(1)
  } catch {
    case e: Exception =>
      logWarning("Failed to connect to master", e)
      markDisconnected()
      stop()
  }
}

The registerWithMaster method registers the application with Master. Its source code is as follows:

/**
 * Register with all masters asynchronously. It will call `registerWithMaster` every
 * REGISTRATION_TIMEOUT_SECONDS seconds until exceeding REGISTRATION_RETRIES times.
 * Once we connect to a master successfully, all scheduling work and Futures will be cancelled.
 *
 * nthRetry means this is the nth attempt to register with master.
 */
private def registerWithMaster(nthRetry: Int) {
  registerMasterFutures = tryRegisterAllMasters()
  // Retry if registration failed
  registrationRetryTimer = registrationRetryThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = {
      Utils.tryOrExit {
        if (registered) {
          registerMasterFutures.foreach(_.cancel(true))
          registerMasterThreadPool.shutdownNow()
        } else if (nthRetry >= REGISTRATION_RETRIES) {
          markDead("All masters are unresponsive! Giving up.")
        } else {
          registerMasterFutures.foreach(_.cancel(true))
          registerWithMaster(nthRetry + 1)
        }
      }
    }
  }, REGISTRATION_TIMEOUT_SECONDS, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS)
}
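
The retry decision in registerWithMaster can be illustrated without Spark or timers. The following is a minimal sketch, not the AppClient implementation: RetryOutcome, RegistrationRetry.run, and the attemptSucceeds callback are hypothetical names, and each recursive step stands in for one firing of the retry timer.

```scala
// Hypothetical model of the retry decision; no Spark classes are used.
sealed trait RetryOutcome
case class Registered(attempt: Int) extends RetryOutcome
case class GaveUp(attempts: Int) extends RetryOutcome

object RegistrationRetry {
  /**
   * attemptSucceeds(n) says whether a master answered during the n-th attempt.
   * maxRetries plays the role of REGISTRATION_RETRIES.
   */
  @annotation.tailrec
  def run(attemptSucceeds: Int => Boolean, maxRetries: Int, nthRetry: Int = 1): RetryOutcome =
    if (attemptSucceeds(nthRetry)) Registered(nthRetry)   // registered: pending work is cancelled
    else if (nthRetry >= maxRetries) GaveUp(nthRetry)     // corresponds to markDead(...)
    else run(attemptSucceeds, maxRetries, nthRetry + 1)   // corresponds to registerWithMaster(nthRetry + 1)
}
```

For example, RegistrationRetry.run(_ == 2, 3) models a master that answers on the second attempt, while RegistrationRetry.run(_ => false, 3) models the case where every master stays silent and the client gives up after the third attempt.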

The tryRegisterAllMasters method registers with all masters because Master may be deployed for high availability (HA), for example ZooKeeper-based HA. In that case there are multiple masters, but only the active master eventually responds. The source code is as follows:

/**
 * Register with all masters asynchronously and returns an array `Future`s for cancellation.
 */
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  for (masterAddress <- masterRpcAddresses) yield {
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = try {
        if (registered) {
          return
        }
        logInfo("Connecting to master " + masterAddress.toSparkURL + "...")
        // Get the Master RpcEndpoint reference
        val masterRef =
          rpcEnv.setupEndpointRef(Master.SYSTEM_NAME, masterAddress, Master.ENDPOINT_NAME)
        // Send the RegisterApplication message to Master
        masterRef.send(RegisterApplication(appDescription, self))
      } catch {
        case ie: InterruptedException => // Cancelled
        case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
      }
    })
  }
}
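
The idea that the request goes to every master while only the active one answers can be sketched in plain Scala. This is an illustrative model under stated assumptions: FakeMaster, MasterState, and MultiMasterRegistration are hypothetical names, the calls are synchronous rather than submitted to a thread pool, and the application id format is invented for the example.

```scala
// Hypothetical stand-ins for the master endpoints; these are not Spark classes.
sealed trait MasterState
case object Active extends MasterState
case object Standby extends MasterState

case class FakeMaster(url: String, state: MasterState) {
  // A standby master ignores the request; the active one returns an application id.
  def register(appName: String): Option[String] = state match {
    case Active  => Some(s"app-$appName@$url")
    case Standby => None
  }
}

object MultiMasterRegistration {
  // Send the registration to every master and keep whichever replies arrive.
  def registerAll(masters: Seq[FakeMaster], appName: String): Seq[String] =
    masters.flatMap(_.register(appName))
}
```

With one standby and one active master in the list, registerAll returns a single reply, the one produced by the active master.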

Master receives the RegisterApplication message from AppClient. The relevant source code is as follows:

// The org.apache.spark.deploy.master.Master.receive method handles the
// RegisterApplication message sent by AppClient
override def receive: PartialFunction[Any, Unit] = {
  case RegisterApplication(description, driver) => {
    // TODO Prevent repeated registrations from some driver
    if (state == RecoveryState.STANDBY) {
      // ignore, don't send response
    } else {
      logInfo("Registering app " + description.name)
      // Create ApplicationInfo
      val app = createApplication(description, driver)
      // Register the application
      registerApplication(app)
      logInfo("Registered app " + description.name + " with ID " + app.id)
      persistenceEngine.addApplication(app)
      // Send the RegisteredApplication message to AppClient
      driver.send(RegisteredApplication(app.id, self))
      schedule()
    }
  }
  // Omit other non-critical code
}

AppClient's inner class ClientEndpoint receives the RegisteredApplication message from Master:

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // FIXME How to handle the following cases?
    // 1. A master receives multiple registrations and sends back multiple
    // RegisteredApplications due to an unstable network.
    // 2. Receive multiple RegisteredApplication from different masters because the master is
    // changing.
    appId = appId_
    registered = true
    master = Some(masterRef)
    listener.connected(appId)
  // Omit other non-critical code
}
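
The registration handshake walked through so far (AppClient sends RegisterApplication, the active Master replies with RegisteredApplication, AppClient records the id and the master reference) can be sketched as a small in-memory model. This is a sketch under stated assumptions, not Spark code: Endpoint, MiniMaster, and MiniAppClient are hypothetical, the messages are simplified (the real ones carry ApplicationDescription and RpcEndpointRef values), delivery is a direct method call instead of RPC, and the id format is invented.

```scala
import scala.collection.mutable

// Minimal endpoint abstraction: something that can be sent a message.
trait Endpoint { def send(msg: Msg): Unit }

// Simplified, hypothetical versions of the two registration messages.
sealed trait Msg
case class RegisterApplication(appName: String, driver: Endpoint) extends Msg
case class RegisteredApplication(appId: String, master: Endpoint) extends Msg

class MiniMaster(standby: Boolean = false) extends Endpoint {
  val apps = mutable.LinkedHashMap.empty[String, String] // appId -> appName
  def send(msg: Msg): Unit = msg match {
    case RegisterApplication(name, driver) =>
      if (!standby) {                    // a standby master sends no response
        val id = s"app-${apps.size}"     // invented id scheme for the sketch
        apps(id) = name
        driver.send(RegisteredApplication(id, this))
      }
    case _ => // other messages are ignored in this sketch
  }
}

class MiniAppClient(appName: String) extends Endpoint {
  var appId: Option[String] = None
  var master: Option[Endpoint] = None
  def registered: Boolean = appId.isDefined
  // Plays the role of onStart: send the registration request.
  def start(m: Endpoint): Unit = m.send(RegisterApplication(appName, this))
  def send(msg: Msg): Unit = msg match {
    case RegisteredApplication(id, ref) =>
      appId = Some(id)
      master = Some(ref)
    case _ => // other messages are ignored in this sketch
  }
}
```

Calling start on a MiniAppClient with an active MiniMaster leaves the client registered with an application id, while a standby MiniMaster leaves it unregistered, which is the situation the retry timer exists to handle.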

The process above completes application registration. The other messages exchanged are as follows:

//------------------ Messages sent from AppClient to Master ------------------//

// AppClient registers the application with Master
case class RegisterApplication(appDescription: ApplicationDescription, driver: RpcEndpointRef)
  extends DeployMessage

// AppClient unregisters the application from Master
case class UnregisterApplication(appId: String)

// After recovering from a failure, Master sends a MasterChanged message to AppClient.
// AppClient updates its saved master information and then sends
// MasterChangeAcknowledged to Master
case class MasterChangeAcknowledged(appId: String)

// Request that the application's total number of executors be requestedTotal
case class RequestExecutors(appId: String, requestedTotal: Int)

// Kill the executors belonging to the application
case class KillExecutors(appId: String, executorIds: Seq[String])

//------------------ Messages sent from Master to AppClient ------------------//

// Tells AppClient that the application was registered successfully
case class RegisteredApplication(appId: String, master: RpcEndpointRef) extends DeployMessage

// TODO(matei): replace hostPort with host
// After a worker starts an executor, this message is sent to notify AppClient
case class ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int) {
  Utils.checkHostPort(hostPort, "Required hostPort")
}

// Sent to AppClient after an executor's state is updated
case class ExecutorUpdated(id: Int, state: ExecutorState, message: Option[String],
  exitStatus: Option[Int])

// When the application finishes successfully or fails, Master sends this message to AppClient.
// AppClient stops the application after receiving it
case class ApplicationRemoved(message: String)

// When the master changes, the MasterChanged message notifies Worker and AppClient
case class MasterChanged(master: RpcEndpointRef, masterWebUiUrl: String)
2. Interaction between Master and Worker

Only the basic messages are listed here; a detailed analysis is left for later.

//------------------ Messages sent from Worker to Master ------------------//

// Registers the worker with Master. After completing the registration, Master sends a
// RegisteredWorker message to the worker, which can then be scheduled by Master
case class RegisterWorker(
    id: String,
    host: String,
    port: Int,
    worker: RpcEndpointRef,
    cores: Int,
    memory: Int,
    webUiPort: Int,
    publicAddress: String)
  extends DeployMessage {
  Utils.checkHost(host, "Required hostname")
  assert (port > 0)
}

// Reports an executor state change to Master
case class ExecutorStateChanged(
    appId: String,
    execId: Int,
    state: ExecutorState,
    message: Option[String],
    exitStatus: Option[Int])
  extends DeployMessage

// Reports a driver state change to Master
case class DriverStateChanged(
    driverId: String,
    state: DriverState,
    exception: Option[Exception])
  extends DeployMessage

// Worker reports its running executors and drivers to Master
case class WorkerSchedulerStateResponse(id: String, executors: List[ExecutorDescription],
  driverIds: Seq[String])

// Heartbeat sent from Worker to Master, mainly to report that the worker is alive
case class Heartbeat(workerId: String, worker: RpcEndpointRef) extends DeployMessage

//------------------ Messages sent from Master to Worker ------------------//

// Worker sends RegisterWorker to register itself. On success, Master replies to the
// worker with RegisteredWorker
case class RegisteredWorker(master: RpcEndpointRef, masterWebUiUrl: String) extends DeployMessage

// Worker sends RegisterWorker to register itself. On failure, Master replies to the
// worker with RegisterWorkerFailed
case class RegisterWorkerFailed(message: String) extends DeployMessage

// After a worker's heartbeat times out, Master sends ReconnectWorker to tell the
// worker node that it needs to re-register
case class ReconnectWorker(masterUrl: String) extends DeployMessage

// After the application finishes, Master sends KillExecutor to the worker. The worker
// then removes the executor identified by execId
case class KillExecutor(masterUrl: String, appId: String, execId: Int) extends DeployMessage

// Tells the worker node to launch an executor
case class LaunchExecutor(
    masterUrl: String,
    appId: String,
    execId: Int,
    appDesc: ApplicationDescription,
    cores: Int,
    memory: Int)
  extends DeployMessage

// Tells the worker node to launch a driver
case class LaunchDriver(driverId: String, driverDesc: DriverDescription) extends DeployMessage

// Kill the corresponding driver
case class KillDriver(driverId: String) extends DeployMessage

case class ApplicationFinished(id: String)
3. Message interaction between Driver Client and Master

Driver Client mainly manages drivers, including submitting a driver to Master and requesting that a driver be killed. Its source code is located in the org.apache.spark.deploy.Client.scala source file, and the class name is org.apache.spark.deploy.ClientEndpoint. Note that it is a different class from org.apache.spark.deploy.client.AppClient.ClientEndpoint.

//------------------ Messages exchanged between Driver Client and Master ------------------//

// Driver Client asks Master to submit a driver
case class RequestSubmitDriver(driverDescription: DriverDescription) extends DeployMessage

// Master tells Driver Client whether the submission succeeded
case class SubmitDriverResponse(
    master: RpcEndpointRef, success: Boolean, driverId: Option[String], message: String)
  extends DeployMessage

// Driver Client asks Master to kill a driver
case class RequestKillDriver(driverId: String) extends DeployMessage

// Master replies whether the driver was killed successfully
case class KillDriverResponse(
    master: RpcEndpointRef, driverId: String, success: Boolean, message: String)
  extends DeployMessage

// Driver Client asks Master for a driver's status
case class RequestDriverStatus(driverId: String) extends DeployMessage

// Master returns the requested status information to Driver Client
case class DriverStatusResponse(found: Boolean, state: Option[DriverState],
  workerId: Option[String], workerHostPort: Option[String], exception: Option[Exception])
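
The three request/response pairs between Driver Client and Master can be exercised with a toy model. This is only a sketch under stated assumptions: MiniDriverMaster is hypothetical, the messages are reduced to a few fields, the state strings "SUBMITTED" and "KILLED" and the reply texts are invented for the example, and a kill is applied immediately, which the real Master does not necessarily do.

```scala
import scala.collection.mutable

// Simplified, hypothetical request and response messages.
sealed trait DriverRequest
case class RequestSubmitDriver(mainClass: String) extends DriverRequest
case class RequestKillDriver(driverId: String) extends DriverRequest
case class RequestDriverStatus(driverId: String) extends DriverRequest

sealed trait DriverResponse
case class SubmitDriverResponse(success: Boolean, driverId: Option[String], message: String)
  extends DriverResponse
case class KillDriverResponse(driverId: String, success: Boolean, message: String)
  extends DriverResponse
case class DriverStatusResponse(found: Boolean, state: Option[String]) extends DriverResponse

class MiniDriverMaster {
  private val drivers = mutable.Map.empty[String, String] // driverId -> state

  // Each request type maps to exactly one response type.
  def ask(req: DriverRequest): DriverResponse = req match {
    case RequestSubmitDriver(mainClass) =>
      val id = s"driver-${drivers.size}"
      drivers(id) = "SUBMITTED"
      SubmitDriverResponse(true, Some(id), s"Submitted $mainClass")
    case RequestKillDriver(id) =>
      if (drivers.contains(id)) {
        drivers(id) = "KILLED"
        KillDriverResponse(id, true, "Kill request submitted")
      } else {
        KillDriverResponse(id, false, s"Driver $id not found")
      }
    case RequestDriverStatus(id) =>
      DriverStatusResponse(drivers.contains(id), drivers.get(id))
  }
}
```

Splitting requests and responses into two sealed traits lets the compiler check that ask handles every request type.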
4. Message interaction between Driver and Executor

//------------------ Messages sent from Driver to Executor ------------------//

// Launch a task
case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage

// Kill a task
case class KillTask(taskId: Long, executor: String, interruptThread: Boolean)
  extends CoarseGrainedClusterMessage

// The executor registered successfully
case object RegisteredExecutor extends CoarseGrainedClusterMessage

// The executor failed to register
case class RegisterExecutorFailed(message: String) extends CoarseGrainedClusterMessage

//------------------ Messages sent from Executor to Driver ------------------//

// Register the executor with Driver
case class RegisterExecutor(
    executorId: String,
    executorRef: RpcEndpointRef,
    hostPort: String,
    cores: Int,
    logUrls: Map[String, String])
  extends CoarseGrainedClusterMessage {
  Utils.checkHostPort(hostPort, "Expected host port")
}

// Report a task status change to Driver
case class StatusUpdate(executorId: String, taskId: Long, state: TaskState,
  data: SerializableBuffer) extends CoarseGrainedClusterMessage

object StatusUpdate {
  /** Alternate factory method that takes a ByteBuffer directly for the data field */
  def apply(executorId: String, taskId: Long, state: TaskState, data: ByteBuffer)
    : StatusUpdate = {
    StatusUpdate(executorId, taskId, state, new SerializableBuffer(data))
  }
}
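
The StatusUpdate companion object relies on a common Scala pattern: an overloaded apply that accepts a raw value and wraps it before calling the generated case-class factory. A minimal standalone sketch of that pattern follows; BufferWrapper and TaskStatus are hypothetical stand-ins for SerializableBuffer and StatusUpdate, and the task state is a plain String here.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for Spark's SerializableBuffer wrapper.
class BufferWrapper(val buffer: ByteBuffer)

case class TaskStatus(executorId: String, taskId: Long, state: String, data: BufferWrapper)

object TaskStatus {
  /** Alternate factory that accepts a raw ByteBuffer and wraps it. */
  def apply(executorId: String, taskId: Long, state: String, data: ByteBuffer): TaskStatus =
    // The BufferWrapper argument selects the compiler-generated apply, so this does not recurse.
    TaskStatus(executorId, taskId, state, new BufferWrapper(data))
}
```

Callers can then write TaskStatus("exec-1", 7L, "RUNNING", ByteBuffer.wrap(bytes)) without constructing the wrapper themselves.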

Author: Zhou Zhi Lake
Network name: Swing Teen Dream
