First, let's look at a Spark architecture diagram to understand the worker's role and position in Spark:
The worker has the following roles:
1. Receive commands from the master to start or kill an executor.
2. Receive commands from the master to start or kill a driver.
3. Report the status of executors/drivers to the master.
4. Send heartbeats to the master; if a heartbeat times out, the master considers the worker dead and stops scheduling work on it.
5. Report the worker's status to the web UI.
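The heartbeat timeout in role 4 can be sketched as follows. This is an illustrative model only, not Spark's actual implementation; the names `HeartbeatSketch`, `lastHeartbeat`, and the 60-second timeout are assumptions:

```scala
import scala.collection.mutable.HashMap

// Minimal sketch of the master-side heartbeat bookkeeping (illustrative only).
object HeartbeatSketch {
  // The master records the last heartbeat time per worker id.
  val lastHeartbeat = new HashMap[String, Long]
  val timeoutMs = 60 * 1000L // hypothetical timeout value

  def recordHeartbeat(workerId: String, now: Long): Unit =
    lastHeartbeat(workerId) = now

  // Workers whose last heartbeat is older than the timeout are considered dead.
  def deadWorkers(now: Long): Set[String] =
    lastHeartbeat.collect { case (id, t) if now - t > timeoutMs => id }.toSet
}
```

In this model, a worker that stops calling `recordHeartbeat` eventually shows up in `deadWorkers`, which mirrors how the master decides a worker has crashed.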
To put it bluntly, the worker is the node that does the actual work in the cluster. First, let's look at the worker's important data structures:
```scala
val executors = new HashMap[String, ExecutorRunner]
val finishedExecutors = new HashMap[String, ExecutorRunner]
val drivers = new HashMap[String, DriverRunner]
val finishedDrivers = new HashMap[String, DriverRunner]
```
These hash maps store the mapping between a name and the corresponding object, so that an object can be looked up directly by name and invoked.
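To make the lookup-by-name pattern concrete, here is a tiny runnable sketch. `StubRunner` and `LookupSketch` are stand-ins invented for illustration; the real `ExecutorRunner` manages an executor process. The key follows the `"appId/execId"` convention used in the worker code:

```scala
import scala.collection.mutable.HashMap

// Stub standing in for Spark's ExecutorRunner (illustrative only).
class StubRunner(val name: String)

object LookupSketch {
  val executors = new HashMap[String, StubRunner]

  def main(args: Array[String]): Unit = {
    // Keys follow the "appId/execId" convention from the worker code.
    executors("app-1/0") = new StubRunner("app-1/0")
    // Look the object up directly by name and use it.
    executors.get("app-1/0") match {
      case Some(runner) => println(s"found ${runner.name}")
      case None         => println("unknown executor")
    }
  }
}
```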
Let's take a look at how an executor is started:
```scala
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
      val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_,
        self, workerId, host,
        appDesc.sparkHome.map(userSparkHome => new File(userSparkHome)).getOrElse(sparkHome),
        workDir, akkaUrl, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      masterLock.synchronized {
        master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
      }
    } catch {
      case e: Exception => {
        logError("Failed to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        masterLock.synchronized {
          master ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED, None, None)
        }
      }
    }
  }
```
The first check verifies that the command came from the active master. Then an `ExecutorRunner` is constructed: there is no class named `Executor` on the worker side; what we call an executor is actually managed by an `ExecutorRunner`, which is an apt name. The new runner is recorded in the `executors` hash map mentioned above and then started, and the cores and memory it occupies are added to the worker's usage counters. Finally, the worker reports the executor's state to the master; access to the master reference must be guarded by `masterLock`.
If an exception is thrown during this process, the worker checks whether the executor was already added to the hash map. If so, it is killed first and then removed from the map. The worker then reports `ExecutorState.FAILED` to the master so that the master can launch a replacement executor.
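The register-then-launch-then-clean-up-on-failure pattern can be sketched independently of Spark. `Runner` and `LaunchSketch` below are stub types invented for illustration (the real `ExecutorRunner` forks and monitors an OS process, and the state would be reported to the master over Akka rather than returned):

```scala
import scala.collection.mutable.HashMap

// Stub runner: start() fails for ids containing "bad" to simulate a launch error.
class Runner(val fullId: String) {
  var killed = false
  def start(): Unit =
    if (fullId.contains("bad")) throw new RuntimeException("launch failed")
  def kill(): Unit = killed = true
}

object LaunchSketch {
  val executors = new HashMap[String, Runner]

  // Returns the final state, mirroring the RUNNING / FAILED report to the master.
  def launch(appId: String, execId: Int): String = {
    val fullId = appId + "/" + execId
    try {
      val runner = new Runner(fullId)
      executors(fullId) = runner // register first, like the real worker
      runner.start()
      "RUNNING"
    } catch {
      case _: Exception =>
        // On failure: kill and deregister before reporting FAILED.
        executors.get(fullId).foreach(_.kill())
        executors -= fullId
        "FAILED"
    }
  }
}
```

Registering before `start()` is why the catch block has to check the map and clean up: a runner may be in the map even though it never launched successfully.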
Next, let's look at how the `drivers` hash map is used, taking `KillDriver` as an example:
```scala
case KillDriver(driverId) => {
  logInfo(s"Asked to kill driver $driverId")
  drivers.get(driverId) match {
    case Some(runner) =>
      runner.kill()
    case None =>
      logError(s"Asked to kill unknown driver $driverId")
  }
}
```
The `KillDriver` command is issued by the master, which in turn receives the kill request from a client. The pattern match on the `Option` returned by the hash-map lookup also shows the conciseness of Scala.
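The worker-side handling can be modeled with plain classes standing in for the actor machinery. `DriverRunnerStub` and `KillSketch` are illustrative names, and the handler returns a string where the real code only logs and sends messages over Akka:

```scala
import scala.collection.mutable.HashMap

// Message stand-in; Spark sends KillDriver between actors over Akka.
case class KillDriver(driverId: String)

// Stub for DriverRunner (illustrative only).
class DriverRunnerStub(val id: String) {
  var killed = false
  def kill(): Unit = killed = true
}

object KillSketch {
  val drivers = new HashMap[String, DriverRunnerStub]

  // The worker's handler: look the driver up by id, kill if present, log otherwise.
  def receive(msg: KillDriver): String =
    drivers.get(msg.driverId) match {
      case Some(runner) =>
        runner.kill()
        s"killed ${msg.driverId}"
      case None =>
        s"unknown driver ${msg.driverId}"
    }
}
```

The `Option` match cleanly separates the known-driver and unknown-driver paths without any null checks, which is the conciseness the text refers to.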