Spark Source Code Series (4): A Graphic Look at the Job Lifecycle


In the previous chapters we explored parts of how a Spark job runs, but never depicted the whole process. In this chapter, let's put the pieces together.

Let's review this figure. The driver program is the program we write, and its core is SparkContext. Recall that, from the API perspective, every RDD must be obtained through it.

The following sections describe how SparkContext interacts with the other components, going beyond the API-level view above.

Application Registration process from driver to master

After SparkContext is instantiated, it internally instantiates two important classes: DAGScheduler and TaskScheduler.

In standalone mode, the implementation class of TaskScheduler is TaskSchedulerImpl. When initializing it, SparkContext passes in a SparkDeploySchedulerBackend.

An AppClient is started in the start method of SparkDeploySchedulerBackend:

    val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        args, sc.executorEnvs, classPathEntries, libraryPathEntries, extraJavaOpts)
    val sparkHome = sc.getSparkHome()
    val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
        command, sparkHome, sc.ui.appUIAddress, sc.eventLogger.map(_.logDir))
    client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
    client.start()

maxCores is specified by spark.cores.max, and executorMemory is specified by spark.executor.memory.

After the AppClient starts, it registers the application with the master. I will use the diagram to express the subsequent process.

The figure above involves communication among three parties. The specific process is as follows:

1. The driver sends a RegisterApplication message to the master to register the application. Upon receiving it, the master replies with a RegisteredApplication message to tell the driver that registration succeeded. On the driver side, the receiving class is AppClient.

2. After the master receives RegisterApplication, the scheduling process is triggered. When resources are sufficient, the master sends a LaunchExecutor message to the worker and an ExecutorAdded message to the driver.

3. After the worker receives the LaunchExecutor message, it executes the command carried in the message and runs the CoarseGrainedExecutorBackend class (represented in the figure by ExecutorBackend, the interface it implements). Once that is done, it sends an ExecutorStateChanged message to the master.

4. After the master receives the ExecutorStateChanged message, it immediately sends an ExecutorUpdated message to notify the driver.

5. The AppClient in the driver receives the ExecutorAdded and ExecutorUpdated messages sent by the master and handles them accordingly.

6. After CoarseGrainedExecutorBackend starts, it sends a RegisterExecutor message to the driver.

7. In the driver, SparkDeploySchedulerBackend (the actual code is in CoarseGrainedSchedulerBackend) receives the RegisterExecutor message, replies with a RegisteredExecutor message to the executor backend, and prepares to send it tasks.

8. After CoarseGrainedExecutorBackend receives the RegisteredExecutor message, it instantiates an Executor and waits for tasks to arrive.
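The handshake in steps 1, 6 and 7 can be sketched with plain case classes. This is a hypothetical simplification for illustration, not Spark's real code (the real messages live in org.apache.spark.deploy.DeployMessages):

```scala
// Toy model of the registration handshake; names mirror the messages above.
sealed trait DeployMessage
case class RegisterApplication(appName: String) extends DeployMessage
case class RegisteredApplication(appId: String) extends DeployMessage

// Toy master: acknowledges a registration with a generated application id.
def masterHandle(msg: DeployMessage): DeployMessage = msg match {
  case RegisterApplication(name) => RegisteredApplication(s"app-$name-0001")
  case other                     => other // everything else passes through unchanged
}
```

In the real code each message is delivered asynchronously over the actor system, but the request/reply shape is the same.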

Resource Scheduling

Now that we have covered the communication process for registering an application, one of the most important remaining pieces is scheduling. How is the application scheduled? This is why I emphasized maxCores and executorMemory earlier.

If you read the first chapter, "The spark-submit job submission process", you may recall that scheduling was only touched on there, because at that point we did not yet know what an application was. Now we do, so instead of posting the code again, let's summarize it.

1. The master schedules drivers first and then applications.

2. Applications are scheduled first-in, first-out, so don't be surprised if your app has to wait its turn; much like queueing at a busy hospital, whoever arrives first is seen first.

3. There are two executor allocation strategies: one spreads executors across as many nodes as possible, the other packs them onto as few nodes as possible. This is controlled by the spark.deploy.spreadOut parameter, which defaults to true, i.e. spread across multiple nodes.

The master traverses all pending applications and assigns them workers on which to run executors. The default allocation works as follows:

1. From all workers, select those whose free memory is greater than executorMemory, and sort them by the number of free CPU cores in descending order.

2. Traverse the sorted workers and allocate the required CPUs from them, one core per worker per round, until either no cores are left or maxCores have been allocated.

3. Send launch messages to the selected workers to start the executors. The memory occupied by each executor is the executorMemory we set.
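The steps above can be sketched as a small function. This is a hypothetical simplification of the master's allocation loop; the class and parameter names are invented for illustration:

```scala
// Toy model of a standalone worker's free resources.
case class WorkerInfo(id: String, freeCores: Int, freeMemoryMB: Int)

def assignCores(workers: Seq[WorkerInfo],
                executorMemoryMB: Int,
                maxCores: Int,
                spreadOut: Boolean): Map[String, Int] = {
  // Step 1: keep workers with enough memory, sorted by free cores, descending.
  val usable = workers.filter(_.freeMemoryMB >= executorMemoryMB)
                      .sortBy(-_.freeCores).toArray
  val assigned = Array.fill(usable.length)(0)
  // Step 2: hand out one core at a time until maxCores (or capacity) is reached.
  var toAssign = math.min(maxCores, usable.map(_.freeCores).sum)
  var pos = 0
  while (toAssign > 0) {
    if (usable(pos).freeCores - assigned(pos) > 0) {
      assigned(pos) += 1
      toAssign -= 1
    }
    // spreadOut: rotate to the next worker every round; otherwise fill up
    // the current worker before moving on.
    if (spreadOut || usable(pos).freeCores == assigned(pos))
      pos = (pos + 1) % usable.length
  }
  usable.map(_.id).zip(assigned).filter(_._2 > 0).toMap
}
```

With spreadOut = true the cores end up spread evenly across workers; with spreadOut = false the largest worker is filled first.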

That is roughly the resource scheduling process. At this point, some readers may be confused: where do the FIFO/FAIR scheduling modes fit in? The task scheduler schedules not applications, but all the tasks parsed from your code, as mentioned in the previous chapter.

For this reason, when a SparkContext is shared, as in Shark or the Spark JobServer, the role of the task scheduler becomes obvious.

The process in which the driver publishes tasks to the executors

Next, let's look at the process of sending a task from the driver to an executor. This was described in the previous chapter; now we can show the figure.

1. The driver program's code runs until it hits an action, which triggers SparkContext's runJob method.

2. SparkContext delegates the work to DAGScheduler.

3. DAGScheduler divides the job into stages, converts each stage into a set of tasks, and submits them to TaskScheduler.

4. TaskScheduler adds the tasks to the task queue and hands them over to SchedulerBackend.

5. The scheduler assigns executors to the tasks, and SchedulerBackend is responsible for launching them.
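The hand-off chain above can be sketched as a toy pipeline. All class names and signatures here are invented stand-ins for illustration, not Spark's real API:

```scala
// A task for one partition of one stage.
case class ToyTask(stageId: Int, partition: Int)

// Step 5: the backend ultimately launches tasks on executors.
class ToyBackend {
  val launched = scala.collection.mutable.Buffer.empty[ToyTask]
  def launchTasks(tasks: Seq[ToyTask]): Unit = launched ++= tasks
}

// Step 4: the task scheduler queues tasks and hands them to the backend.
class ToyTaskScheduler(backend: ToyBackend) {
  def submitTasks(tasks: Seq[ToyTask]): Unit = backend.launchTasks(tasks)
}

// Step 3: one task per partition of the final stage; real Spark also splits
// the job into stages at shuffle boundaries first.
class ToyDAGScheduler(scheduler: ToyTaskScheduler) {
  def runJob(stageId: Int, numPartitions: Int): Unit =
    scheduler.submitTasks((0 until numPartitions).map(p => ToyTask(stageId, p)))
}
```

An action on an RDD with three partitions would thus turn into three tasks flowing down this chain.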

Task status update

Task execution is performed by TaskRunner, which communicates with the driver through ExecutorBackend. The communication message is StatusUpdate:

1. Before a task runs, the driver is notified that the task state is TaskState.RUNNING.

2. After a task finishes, the driver is notified that the task state is TaskState.FINISHED, and the computation result is returned.

3. If an error occurs while a task runs, the driver is notified that the task state is TaskState.FAILED, and the cause of the error is returned.

4. If a task is killed midway, the driver is notified that the task state is TaskState.KILLED.
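As a toy illustration, the mapping from a task's outcome to the state reported via StatusUpdate might look like this (the outcome types are invented for illustration; real Spark encodes the states in the TaskState enumeration):

```scala
// Possible outcomes of running a task, as seen by the TaskRunner.
sealed trait TaskOutcome
case class TaskSuccess(result: Array[Byte]) extends TaskOutcome
case class TaskError(cause: String) extends TaskOutcome
case object TaskKilled extends TaskOutcome

// Map each outcome to the state name sent back to the driver.
def reportedState(outcome: TaskOutcome): String = outcome match {
  case TaskSuccess(_) => "FINISHED"
  case TaskError(_)   => "FAILED"
  case TaskKilled     => "KILLED"
}
```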

The following describes a successful run. The figure is too large, so it is placed at the end.

1. After a task completes, it calls ExecutorBackend's statusUpdate method to return the result. If the result exceeds 10 MB, it is saved in the BlockManager and only a blockId is returned; the driver later uses that blockId to fetch the result from the BlockManager when needed.

2. ExecutorBackend sends a StatusUpdate message directly to the driver to report the task's state.

3. After receiving the StatusUpdate message, the driver (SchedulerBackend) calls TaskScheduler's statusUpdate method and sends the next batch of tasks to ExecutorBackend.

4. TaskScheduler finds the task's TaskSetManager (the class responsible for managing a batch of tasks) by taskId, removes the task from the TaskSetManager, and inserts it into the success queue of TaskResultGetter (the class responsible for fetching task results).

5. After TaskResultGetter fetches the result, it calls TaskScheduler's handleSuccessfulTask method to hand it back.

6. TaskScheduler calls TaskSetManager's handleSuccessfulTask method to process the successful task.

7. TaskSetManager calls DAGScheduler's taskEnded method to tell it that the task is finished. If all tasks have succeeded at this point, the TaskSetManager is marked as finished.

8. In the taskEnded method, DAGScheduler triggers a CompletionEvent. The CompletionEvent is handled differently depending on whether the finished task is a ResultTask or a ShuffleMapTask:

1) ResultTask: increment the job's numFinished by 1. If numFinished equals the number of partitions of the job, the final stage is over; mark the stage as finished and call the JobListener's (concretely, JobWaiter's) taskSucceeded method, in which the result is handed to resultHandler (the wrapped anonymous function) for processing. If the number of completed tasks equals the total number of tasks, the job exits.

2) ShuffleMapTask:

(1) Call the stage's addOutputLoc method to add the result to the stage's outputLocs list.

(2) If the stage has no pending tasks left, the stage is finished.

(3) Register the stage's outputLocs with MapOutputTracker for the next stage to use.

(4) If the stage's outputLocs is empty, the computation failed and the stage is resubmitted.

(5) Find the next waiting stage that has no parent and submit it.
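The size check in step 1 can be sketched as follows. The names here are simplified stand-ins (real Spark distinguishes direct and indirect task results), with the 10 MB threshold mentioned above:

```scala
// A result is either sent back inline, or stored in the BlockManager and
// referenced by a block id.
sealed trait ToyTaskResult
case class DirectResult(bytes: Array[Byte]) extends ToyTaskResult
case class IndirectResult(blockId: String) extends ToyTaskResult

val MaxDirectResultBytes = 10 * 1024 * 1024 // the 10 MB limit from step 1

def packageResult(taskId: Long, bytes: Array[Byte]): ToyTaskResult =
  if (bytes.length > MaxDirectResultBytes)
    IndirectResult(s"taskresult_$taskId") // stored in the BlockManager; only the id travels
  else
    DirectResult(bytes)
```

This keeps large results out of the control-plane messages between executor and driver.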

 

 

CEN yuhai

Please indicate the source for reprinting. Thank you!
