Apache Spark Source Code Reading, Part 19 -- Requesting and releasing resources in standalone cluster mode

Reprinting is welcome; please credit the source, huichiro.

Summary

This article describes how resources (mainly CPU cores and memory) are requested and released over the entire lifetime of a Spark application in standalone cluster deployment mode.

The standalone cluster deployment mode consists of four components: master, worker, executor, and driver, each of which runs as an independent JVM process.

From the perspective of resource management:

  • The master manages the resources of the entire cluster, mainly CPU cores and memory, but does not itself own any of them.
  • The worker is the actual contributor of computing resources. It reports the number of CPU cores and the amount of memory it owns to the master, and starts executors under the master's instructions.
  • The executor does the hard work of actually computing; the number of cores and the amount of memory it may use are determined by the master.
  • The driver is the actual consumer of resources. It submits one or more jobs; each job is split into multiple tasks, which are distributed to the executors for execution.
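
To make this bookkeeping concrete, here is a minimal sketch of the per-worker ledger the master keeps. WorkerLedger and its members are hypothetical simplified stand-ins, not Spark's actual WorkerInfo, though the free = total - used arithmetic mirrors it:

    // Hypothetical simplified ledger, not Spark's actual WorkerInfo.
    case class WorkerLedger(id: String, cores: Int, memoryMb: Int) {
      var coresUsed = 0
      var memoryUsedMb = 0

      // What the master may still hand out on this worker
      def coresFree: Int = cores - coresUsed
      def memoryFreeMb: Int = memoryMb - memoryUsedMb

      // Called when an executor is assigned to this worker
      def addExecutor(execCores: Int, execMemoryMb: Int) {
        coresUsed += execCores
        memoryUsedMb += execMemoryMb
      }

      // Called when the executor (or its application) goes away: resources return to the pool
      def removeExecutor(execCores: Int, execMemoryMb: Int) {
        coresUsed -= execCores
        memoryUsedMb -= execMemoryMb
      }
    }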

These topics also came up in the fault tolerance analysis of standalone cluster mode. Today we focus on how resources, once allocated, are smoothly reclaimed in various scenarios.

Resource reporting and aggregation process

In a standalone cluster, the master must be started before the workers and the driver program.

After the master node has started successfully, the workers can be started. On startup, each worker registers with the master; the registration message contains the worker node's CPU core count and memory size.

The calling sequence is: preStart -> registerWithMaster -> tryRegisterAllMasters.

Let's take a look at the code of tryRegisterAllMasters:

    def tryRegisterAllMasters() {
      for (masterUrl <- masterUrls) {
        logInfo("Connecting to master " + masterUrl + "...")
        val actor = context.actorSelection(Master.toAkkaUrl(masterUrl))
        actor ! RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress)
      }
    }

Our question is: where do the memory and cores values required by the RegisterWorker message come from?

Note that the main function in Worker creates a WorkerArguments instance:

    def main(argStrings: Array[String]) {
      SignalLogger.register(log)
      val args = new WorkerArguments(argStrings)
      val (actorSystem, _) = startSystemAndActor(args.host, args.port, args.webUiPort, args.cores,
        args.memory, args.masters, args.workDir)
      actorSystem.awaitTermination()
    }

memory is obtained through the inferDefaultMemory function, and cores through inferDefaultCores:

    def inferDefaultCores(): Int = {
      Runtime.getRuntime.availableProcessors()
    }

    def inferDefaultMemory(): Int = {
      val ibmVendor = System.getProperty("java.vendor").contains("IBM")
      var totalMb = 0
      try {
        val bean = ManagementFactory.getOperatingSystemMXBean()
        if (ibmVendor) {
          val beanClass = Class.forName("com.ibm.lang.management.OperatingSystemMXBean")
          val method = beanClass.getDeclaredMethod("getTotalPhysicalMemory")
          totalMb = (method.invoke(bean).asInstanceOf[Long] / 1024 / 1024).toInt
        } else {
          val beanClass = Class.forName("com.sun.management.OperatingSystemMXBean")
          val method = beanClass.getDeclaredMethod("getTotalPhysicalMemorySize")
          totalMb = (method.invoke(bean).asInstanceOf[Long] / 1024 / 1024).toInt
        }
      } catch {
        case e: Exception => {
          totalMb = 2 * 1024
          System.out.println("Failed to get total physical memory. Using " + totalMb + " MB")
        }
      }
      // Leave out 1 GB for the operating system, but don't return a negative memory size
      math.max(totalMb - 1024, 512)
    }

If a worker's cores and memory are explicitly specified in the configuration, the configured values are used instead. The relevant configuration parameters are SPARK_WORKER_CORES and SPARK_WORKER_MEMORY.
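
For example, both can be set in conf/spark-env.sh on each worker node; the values below are purely illustrative:

    # conf/spark-env.sh -- illustrative values, size these to your hardware
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=8g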

After receiving the RegisterWorker message, the master creates a WorkerInfo for the worker based on the reported information:

    case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) =>
    {
      logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
        workerHost, workerPort, cores, Utils.megabytesToString(memory)))
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else if (idToWorker.contains(id)) {
        sender ! RegisterWorkerFailed("Duplicate worker ID")
      } else {
        val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
          sender, workerUiPort, publicAddress)
        if (registerWorker(worker)) {
          persistenceEngine.addWorker(worker)
          sender ! RegisteredWorker(masterUrl, masterWebUiUrl)
          schedule()
        } else {
          val workerAddress = worker.actor.path.address
          logWarning("Worker registration failed. Attempted to re-register worker at same " +
            "address: " + workerAddress)
          sender ! RegisterWorkerFailed("Attempted to re-register worker at same address: "
            + workerAddress)
        }
      }
    }

Resource allocation process

If, by the time the worker registers, some driver application has already registered but has not yet been allocated all the resources it needs, the corresponding executors should be started for it on the new worker.

The WorkerInfo is used in the schedule function. The processing logic of schedule is as follows:

  1. Check whether the remaining memory of each currently alive worker meets the application's minimum memory requirement per executor; if so, add the worker to the queue of assignable workers.
  2. If the distribution policy is to spread the load out, take one core from each assignable worker in round-robin order until the assignable resources are exhausted or the driver's core requirement is met (see the sketch after this list).
  3. If the distribution policy is to use as few workers as possible, consume all assignable cores on one worker before moving to the next, until the driver's core requirement is met.
  4. Start the corresponding executors on the workers chosen in step 2 or 3. The processing function is addExecutor.
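
Here is a minimal, self-contained sketch of the spread-out policy from step 2. The Worker type and spreadOut function are simplified stand-ins rather than Spark's actual Master.schedule, but the round-robin core handout is the same idea:

    // Simplified stand-ins, not Spark's actual scheduling code.
    object SpreadOutSketch {
      case class Worker(id: String, var coresFree: Int)

      def spreadOut(workers: Seq[Worker], coresNeeded: Int): Map[String, Int] = {
        val assigned = scala.collection.mutable.Map[String, Int]().withDefaultValue(0)
        // Never try to assign more cores than the cluster has free
        var toAssign = math.min(coresNeeded, workers.map(_.coresFree).sum)
        var pos = 0
        while (toAssign > 0) {
          val w = workers(pos % workers.length)
          if (w.coresFree > 0) {   // take a single core here, then move to the next worker
            w.coresFree -= 1
            assigned(w.id) += 1
            toAssign -= 1
          }
          pos += 1
        }
        assigned.toMap
      }

      def main(args: Array[String]) {
        val ws = Seq(Worker("w1", 4), Worker("w2", 2), Worker("w3", 3))
        println(spreadOut(ws, 6))   // Map(w1 -> 2, w2 -> 2, w3 -> 2)
      }
    }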

To briefly illustrate the other policy, here is the branch that packs each application onto as few workers as possible (step 3):

    for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
      for (app <- waitingApps if app.coresLeft > 0) {
        if (canUse(app, worker)) {
          val coresToUse = math.min(worker.coresFree, app.coresLeft)
          if (coresToUse > 0) {
            val exec = app.addExecutor(worker, coresToUse)
            launchExecutor(worker, exec)
            app.state = ApplicationState.RUNNING
          }
        }
      }
    }

launchExecutor is mainly responsible for two tasks:

  1. Record the CPU cores and memory used by the newly added executor; this bookkeeping happens in WorkerInfo.addExecutor.
  2. Send the LaunchExecutor command to the worker.

    def launchExecutor(worker: WorkerInfo, exec: ExecutorInfo) {
      logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
      worker.addExecutor(exec)
      worker.actor ! LaunchExecutor(masterUrl,
        exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)
      exec.application.driver ! ExecutorAdded(
        exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)
    }

After receiving the LaunchExecutor command, the worker does its own bookkeeping as well: it subtracts the CPU cores and memory about to be used from its available resources, and then uses an ExecutorRunner to spawn the executor process. Note that the executor runs in an independent process. The code is as follows:

    case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
      if (masterUrl != activeMasterUrl) {
        logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
      } else {
        try {
          logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
          val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_,
            self, workerId, host,
            appDesc.sparkHome.map(userSparkHome => new File(userSparkHome)).getOrElse(sparkHome),
            workDir, akkaUrl, conf, ExecutorState.RUNNING)
          executors(appId + "/" + execId) = manager
          manager.start()
          coresUsed += cores_
          memoryUsed += memory_
          masterLock.synchronized {
            master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
          }
        } catch {
          case e: Exception => {
            logError("Failed to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
            if (executors.contains(appId + "/" + execId)) {
              executors(appId + "/" + execId).kill()
              executors -= appId + "/" + execId
            }
            masterLock.synchronized {
              master ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED, None, None)
            }
          }
        }
      }

Note that during resource allocation, if multiple driver applications are waiting, resources are allocated on a FIFO basis: first come, first served.

Resource reclamation process

The resources reported by the workers are ultimately occupied by the tasks of the jobs submitted by driver applications. When an application ends, whether it exits normally or abnormally, the resources it occupied should be reclaimed smoothly and returned to the pool of assignable resources.

The problem thus becomes: how do the master and the executors learn that a driver application has exited?

There are two cases to handle: in one, the application says goodbye before leaving; in the other, it leaves without a word. The two cases are described separately below.

Saying goodbye before leaving means the driver application explicitly notifies the master and the executors that its work is done and it is about to exit. The application does this by explicitly calling SparkContext.stop:

    def stop() {
      postApplicationEnd()
      ui.stop()
      // Do this only if not stopped already - best case effort.
      // prevent NPE if stopped more than once.
      val dagSchedulerCopy = dagScheduler
      dagScheduler = null
      if (dagSchedulerCopy != null) {
        metadataCleaner.cancel()
        cleaner.foreach(_.stop())
        dagSchedulerCopy.stop()
        taskScheduler = null
        // TODO: Cache.stop()?
        env.stop()
        SparkEnv.set(null)
        ShuffleMapTask.clearCache()
        ResultTask.clearCache()
        listenerBus.stop()
        eventLogger.foreach(_.stop())
        logInfo("Successfully stopped SparkContext")
      } else {
        logInfo("SparkContext already stopped")
      }
    }

One of the main effects of explicitly calling SparkContext.stop is to stop the executors explicitly. The code that issues the StopExecutor command can be found in the stop function of CoarseGrainedSchedulerBackend:

    override def stop() {
      stopExecutors()
      try {
        if (driverActor != null) {
          val future = driverActor.ask(StopDriver)(timeout)
          Await.ready(future, timeout)
        }
      } catch {
        case e: Exception =>
          throw new SparkException("Error stopping standalone scheduler's driver actor", e)
      }
    }

So how does the master know that a driver application has exited? This relies on Akka's communication mechanism: when one party of a remote connection exits unexpectedly, the other party receives a DisassociatedEvent. In its handler for this message, the master removes the stopped driver application:

    case DisassociatedEvent(_, address, _) => {
      // The disconnected client could've been either a worker or an app; remove whichever it was
      logInfo(s"$address got disassociated, removing it.")
      addressToWorker.get(address).foreach(removeWorker)
      addressToApp.get(address).foreach(finishApplication)
      if (state == RecoveryState.RECOVERING && canCompleteRecovery) { completeRecovery() }
    }
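
For context, here is a minimal sketch of how an actor receives such events in the Akka 2.2/2.3-era remoting that Spark used at the time. PeerWatcher is a hypothetical illustration, not Spark code; the subscription call and event types are standard Akka API:

    import akka.actor.{Actor, ActorLogging}
    import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent}

    // Hypothetical illustration, not Spark code.
    class PeerWatcher extends Actor with ActorLogging {
      override def preStart() {
        // Subscribe so remoting lifecycle events (including DisassociatedEvent)
        // are delivered to this actor's receive method
        context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
      }

      def receive = {
        case DisassociatedEvent(_, remoteAddress, _) =>
          log.info(s"$remoteAddress got disassociated; reclaim whatever it owned")
      }
    }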

From the other side, how does an executor know that the application it serves has finished its mission? Like the master, it uses DisassociatedEvent to sense this. For details, see the receive function in CoarseGrainedExecutorBackend:

    case x: DisassociatedEvent =>
      logError(s"Driver $x disassociated! Shutting down.")
      System.exit(1)

Reclaiming resources in exceptional circumstances

Thanks to the heartbeat mechanism between master and worker, if a worker exits unexpectedly, the master detects its death through missed heartbeats and then removes the resources that worker had reported.
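
A minimal sketch of such heartbeat-based death detection, using hypothetical names rather than Spark's actual Master code, might look like this:

    import scala.collection.mutable

    // Hypothetical names, not Spark's actual Master code.
    object HeartbeatSweepSketch {
      val timeoutMs = 60 * 1000L
      val lastHeartbeat = mutable.Map[String, Long]()   // workerId -> last heartbeat time

      // Invoked whenever a heartbeat message arrives from a worker
      def heartbeat(workerId: String) {
        lastHeartbeat(workerId) = System.currentTimeMillis()
      }

      // Invoked periodically, e.g. from a scheduled task on the master
      def timeOutDeadWorkers() {
        val now = System.currentTimeMillis()
        val dead = lastHeartbeat.filter { case (_, seen) => now - seen > timeoutMs }.keys.toList
        for (id <- dead) {
          println(s"worker $id missed heartbeats; removing it and its reported resources")
          lastHeartbeat -= id   // a real master would also return the cores/memory to the pool
        }
      }
    }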

When an executor exits abnormally, the ExecutorRunner monitoring thread in the worker notices immediately and reports it to the master. The master reclaims the resources and then asks the worker to start a new executor.
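
The monitoring idea itself is small enough to sketch. The launchAndWatch helper below is hypothetical, not the actual ExecutorRunner (and the example command assumes a Unix sleep binary): spawn the child process, block in a dedicated thread until it dies, then report the exit code, which the worker would turn into an ExecutorStateChanged message to the master.

    // Hypothetical helper, not the actual ExecutorRunner.
    object ProcessMonitorSketch {
      def launchAndWatch(command: Seq[String])(onExit: Int => Unit): Thread = {
        val t = new Thread("executor-monitor") {
          override def run() {
            val process = new ProcessBuilder(command: _*).start()
            val exitCode = process.waitFor()   // blocks until the child process dies
            onExit(exitCode)                   // the worker would report this to the master
          }
        }
        t.start()
        t
      }

      def main(args: Array[String]) {
        launchAndWatch(Seq("sleep", "1")) { code =>
          println(s"executor exited with code $code; its resources can now be reclaimed")
        }.join()
      }
    }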
