Spark Analysis: DAGScheduler

Last Update:2014-07-05 Source: Internet

Author: User



Main functions of DAGScheduler
1. Receive jobs submitted by users;
2. Split each job into stages according to the types of its RDD dependencies (a wide dependency starts a new stage), generate a set of tasks for each stage, and wrap them in a TaskSet;
3. Submit the TaskSet to TaskScheduler.

The job submission process is described as follows:

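    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // pair-RDD implicits needed by reduceByKey in Spark versions of this era

    val sc = new SparkContext("local[2]", "WordCount", System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val textFile = sc.textFile("xxx")
    val result = textFile.flatMap(line => line.split("\t")).map(word => (word, 1)).reduceByKey(_ + _)
    result.collect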

rdd.collect

==> sc.runJob

########################## The RDD is now submitted to DAGScheduler ########################

==> dagScheduler.runJob

==> dagScheduler.submitJob

==> eventProcessActor ! JobSubmitted
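
The last step of the call chain hands the job over as a message rather than as a direct method call. The following is a minimal sketch of that handoff, not Spark's implementation: `EventLoop`, `post`, `drain` and `handled` are hypothetical names, and a plain in-memory queue stands in for the actor mailbox.

```scala
import scala.collection.mutable

object EventHandoffSketch {
  // Toy event type; the real JobSubmitted message carries more fields.
  sealed trait SchedulerEvent
  final case class JobSubmitted(jobId: Int, partitions: Seq[Int]) extends SchedulerEvent

  final class EventLoop {
    private val inbox = mutable.Queue[SchedulerEvent]()
    // Job ids in the order they were processed (stands in for handleJobSubmitted).
    val handled = mutable.ArrayBuffer[Int]()

    // Plays the role of `eventProcessActor ! JobSubmitted(...)`: enqueue and return.
    def post(event: SchedulerEvent): Unit = inbox.enqueue(event)

    // Plays the role of the actor's receive loop: process queued events one by one.
    def drain(): Unit =
      while (inbox.nonEmpty) {
        inbox.dequeue() match {
          case JobSubmitted(jobId, _) => handled += jobId
        }
      }
  }

  def main(args: Array[String]): Unit = {
    val loop = new EventLoop
    loop.post(JobSubmitted(0, Seq(0, 1)))
    println(loop.handled.toList) // List(): nothing is processed until the loop runs
    loop.drain()
    println(loop.handled.toList) // List(0)
  }
}
```

The point of the sketch is the decoupling: submitting a job only enqueues an event, and the scheduler processes events one at a time.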

    def receive = {
      case JobSubmitted(jobId, rdd, func, partitions, allowLocal ...) =>
        dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal ...)
    }

    // Completes the transition from job to stage: builds finalStage and submits it.
    private[scheduler] def handleJobSubmitted(
        jobId: Int,
        finalRDD: RDD[_],
        func: (TaskContext, Iterator[_]) => _,
        partitions: Array[Int],
        allowLocal: Boolean ...) {
      // Note: only the final RDD is passed in, not the whole chain of RDDs.
      // newStage() creates either a result stage or a shuffle map stage; the
      // stage's isShuffleMap flag records which kind it is.
      var finalStage: Stage = newStage(finalRDD, partitions.size, None, jobId, Some(callSite))

      // Build the job around finalStage.
      val job = new ActiveJob(jobId, finalStage, func, partitions, callSite, listener, properties)

      // A simple job (no parent stages, a single partition) runs in a local
      // thread instead of being handed to TaskScheduler.
      if (allowLocal && finalStage.parents.size == 0 && partitions.length == 1) {
        runLocally(job)
      } else {
        submitStage(finalStage)
      }
    }

The handleJobSubmitted method converts a job into stages by building its finalStage. Each job has exactly one finalStage.
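
The local-versus-scheduler decision at the end of handleJobSubmitted can be isolated as a small predicate. This is a minimal sketch rather than Spark's code: `JobShape` and `runsLocally` are hypothetical names, and only the condition itself is taken from the method.

```scala
object LocalRunSketch {
  // Hypothetical summary of the three facts the condition inspects.
  final case class JobShape(allowLocal: Boolean, parentStages: Int, partitions: Int)

  // Local execution requires the caller's permission, a final stage
  // without parent stages, and exactly one partition.
  def runsLocally(job: JobShape): Boolean =
    job.allowLocal && job.parentStages == 0 && job.partitions == 1

  def main(args: Array[String]): Unit = {
    println(runsLocally(JobShape(allowLocal = true, parentStages = 0, partitions = 1))) // true
    println(runsLocally(JobShape(allowLocal = true, parentStages = 1, partitions = 1))) // false
  }
}
```

In the word-count example the reduceByKey step introduces a shuffle, so the final stage has a parent stage and the job takes the submitStage path.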

newStage() method analysis: generate finalStage from finalRDD

    private def newStage(
        rdd: RDD[_],
        numTasks: Int,  // the number of tasks equals the number of partitions
        shuffleDep: Option[ShuffleDependency[_, _]],
        jobId: Int,
        callSite: Option[String] = None): Stage = {
      val id = nextStageId.getAndIncrement()
      val stage = new Stage(id, rdd, numTasks, shuffleDep, getParentStages(rdd, jobId), jobId, callSite)
      ......
    }

    private def getParentStages(rdd: RDD[_], jobId: Int): List[Stage] = {
      val parents = new HashSet[Stage]
      val visited = new HashSet[RDD[_]]
      def visit(r: RDD[_]) {
        if (!visited(r)) {
          visited += r
          for (dep <- r.dependencies) {
            dep match {
              // Wide dependency: the parent RDD belongs to a separate shuffle map stage.
              case shufDep: ShuffleDependency[_, _] =>
                parents += getShuffleMapStage(shufDep, jobId)
              // Narrow dependency: same stage, keep walking up the lineage.
              case _ =>
                visit(dep.rdd)
            }
          }
        }
      }
      visit(rdd)
      parents.toList
    }

    private def getShuffleMapStage(shuffleDep: ShuffleDependency[_, _], jobId: Int): Stage = {
      shuffleToMapStage.get(shuffleDep.shuffleId) match {
        case Some(stage) => stage
        case None =>
          val stage = newOrUsedStage(shuffleDep.rdd, shuffleDep.rdd.partitions.size, shuffleDep, jobId)
          shuffleToMapStage(shuffleDep.shuffleId) = stage
          stage  // return the newly registered stage
      }
    }

The finalStage returned by newStage() holds references to all of its parent stages; these stage dependencies are built by the getParentStages() method.

newStage() creates a Stage instance. The stage ID is the current value of nextStageId, which getAndIncrement() then increases by one. The number of tasks equals the number of partitions.
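
The ID allocation relies on the semantics of `AtomicInteger.getAndIncrement()`, which returns the value held before the increment. A minimal sketch follows, assuming the counter is a `java.util.concurrent.atomic.AtomicInteger`; the `StageIds` wrapper and the starting value are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

object StageIdSketch {
  // Hypothetical wrapper around the counter; `start` is an assumption for the demo.
  final class StageIds(start: Int) {
    private val nextStageId = new AtomicInteger(start)

    // Returns the current value, then advances the counter by one.
    def newId(): Int = nextStageId.getAndIncrement()

    // The value the next call to newId() will return.
    def peek: Int = nextStageId.get()
  }

  def main(args: Array[String]): Unit = {
    val ids = new StageIds(0)
    println(ids.newId()) // 0
    println(ids.newId()) // 1
    println(ids.peek)    // 2
  }
}
```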

There are two types of stages: shuffle map stages and result stages.

Each stage carries an isShuffleMap flag that identifies whether it is a shuffle map stage or a result stage.

Spark splits stages at wide dependencies: getParentStages() creates a shuffle map stage for every ShuffleDependency it meets while walking the RDD's dependencies, and follows narrow dependencies without starting a new stage.
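
The traversal can be tried out on a toy lineage graph. The sketch below is not Spark code: `Node`, `Narrow`, `Shuffle` and `parentStageRoots` are hypothetical stand-ins for RDDs, dependency types and getParentStages(), and the word-count lineage is simplified to one node per transformation.

```scala
import scala.collection.mutable

object StageSplitSketch {
  // Toy stand-ins for Spark's RDD and dependency classes.
  sealed trait Dep { def rdd: Node }
  final case class Narrow(rdd: Node) extends Dep
  final case class Shuffle(rdd: Node) extends Dep
  final case class Node(name: String, deps: List[Dep] = Nil)

  // Names of the RDDs that end a parent stage of `finalRdd`: walk through
  // narrow dependencies and stop at each shuffle dependency.
  def parentStageRoots(finalRdd: Node): List[String] = {
    val parents = mutable.LinkedHashSet[String]()
    val visited = mutable.HashSet[Node]()
    def visit(r: Node): Unit =
      if (!visited(r)) {
        visited += r
        r.deps.foreach {
          case Shuffle(parent) => parents += parent.name // stage boundary
          case Narrow(parent)  => visit(parent)          // same stage, keep walking
        }
      }
    visit(finalRdd)
    parents.toList
  }

  def main(args: Array[String]): Unit = {
    val textFile = Node("textFile")
    val words    = Node("flatMap", List(Narrow(textFile)))
    val pairs    = Node("map", List(Narrow(words)))
    val counts   = Node("reduceByKey", List(Shuffle(pairs)))
    println(parentStageRoots(counts)) // List(map): one parent stage, ending at `map`
    println(parentStageRoots(pairs))  // List(): only narrow dependencies, no parent stage
  }
}
```

With the shuffle in place the job has two stages; without it, the whole lineage collapses into the final stage.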

submitStage() method analysis: work out the dependencies between stages (the stage DAG) and submit the stages in dependency order.

    private def submitStage(stage: Stage) {
      if (!waiting(stage) && !running(stage) && !failed(stage)) {
        // Find the parent stages of this stage that still have to run.
        val missing = getMissingParentStages(stage).sortBy(_.id)
        if (missing == Nil) {
          // No parents, or every parent is already available: submit the tasks of this stage.
          submitMissingTasks(stage, jobId.get)
          running += stage  // mark the stage as running
        } else {
          // Parent stages must run first, so submit them recursively.
          for (parent <- missing) {
            submitStage(parent)
          }
          // Stages run in dependency order: park this stage in the waiting
          // set until its parents have finished.
          waiting += stage
        }
      }
    }

    // Starting from the stage's RDD, find the parent stages that are not yet available.
    private def getMissingParentStages(stage: Stage): List[Stage] = {
      ......
      dep match {
        // ShuffleDependency: look up (or create) the shuffle map stage and add
        // it to the missing list if its output is not yet available.
        case shufDep: ShuffleDependency[_, _] =>
          val mapStage = getShuffleMapStage(shufDep, stage.jobId)
          if (!mapStage.isAvailable) {
            missing += mapStage
          }
        // NarrowDependency: same stage, keep walking up the lineage.
        case narrowDep: NarrowDependency[_] =>
          visit(narrowDep.rdd)
      }
      ......
    }

getMissingParentStages(stage) processing steps:

1. Starting from the stage's RDD, walk the RDD's dependencies to find the parents of the stage. Parent stages are derived from the dependencies of the RDD.

2. If a dependency is wide (a ShuffleDependency), a shuffle map stage is created or looked up as a parent of the current stage. A job that needs a shuffle therefore has at least one map stage in addition to its finalStage.

3. If a dependency is narrow, no new stage is created. A job that needs no shuffle therefore consists of the finalStage alone.

Note: getMissingParentStages() does not sort its result itself. submitStage() sorts it with sortBy(_.id), that is, in ascending order of stage ID.
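
For reference, Scala's `sortBy(_.id)` with the default `Int` ordering returns elements from the lowest ID to the highest. A minimal sketch, in which `StageStub` and `submissionOrder` are hypothetical names:

```scala
object SortOrderSketch {
  // Hypothetical stand-in for Stage; only the id matters here.
  final case class StageStub(id: Int)

  // Same call shape as in submitStage: sort the missing parents by id.
  def submissionOrder(missing: List[StageStub]): List[Int] =
    missing.sortBy(_.id).map(_.id)

  def main(args: Array[String]): Unit = {
    println(submissionOrder(List(StageStub(3), StageStub(1), StageStub(2)))) // List(1, 2, 3)
  }
}
```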

submitStage() processing steps:

1. Call getMissingParentStages() for the stage. If the stage has no parents, or all of its parents are already available, submit the tasks of the stage.

2. Otherwise, submit every missing parent stage first (in the ascending stage-ID order produced by sortBy(_.id)) and put the current stage into the waiting set.
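
The recursion can be traced with a small stand-alone model. The sketch below is not Spark's scheduler: `StageStub`, `Scheduler` and the `submitted` log are hypothetical, the `failed` check is left out, and parent stages are given directly instead of being derived from RDD dependencies.

```scala
import scala.collection.mutable

object SubmitStageSketch {
  // Hypothetical stand-in for Stage: an id, its parent stages, and an "output ready" flag.
  final class StageStub(val id: Int, val parents: List[StageStub], var available: Boolean = false)

  final class Scheduler {
    val waiting   = mutable.LinkedHashSet[Int]()
    val running   = mutable.LinkedHashSet[Int]()
    val submitted = mutable.ArrayBuffer[Int]() // order in which task sets would be sent

    // Parents whose output is not ready yet, lowest id first.
    private def missingParents(s: StageStub): List[StageStub] =
      s.parents.filterNot(_.available).sortBy(_.id)

    def submitStage(s: StageStub): Unit =
      if (!waiting(s.id) && !running(s.id)) {
        val missing = missingParents(s)
        if (missing.isEmpty) {
          submitted += s.id // stands in for submitMissingTasks
          running += s.id
        } else {
          missing.foreach(submitStage) // parents go first
          waiting += s.id              // this stage waits for them
        }
      }
  }

  def main(args: Array[String]): Unit = {
    val mapStage   = new StageStub(1, Nil)
    val finalStage = new StageStub(0, List(mapStage))
    val scheduler  = new Scheduler
    scheduler.submitStage(finalStage)
    println(scheduler.submitted.toList) // List(1): only the map stage is submitted
    println(scheduler.waiting.toList)   // List(0): the final stage waits for its parent
  }
}
```

Submitting the final stage therefore results in the map stage being sent first, while the final stage stays in the waiting set until its parent has produced output.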

submitMissingTasks() method analysis: split the stage into tasks by partition, wrap them in a TaskSet, and submit it to TaskScheduler

    private def submitMissingTasks(stage: Stage, jobId: Int) {
      // One task is generated per partition of the stage's RDD that still has to be computed.
      var tasks = ArrayBuffer[Task[_]]()
      // The task type depends on the stage type, which isShuffleMap reports.
      if (stage.isShuffleMap) {
        // Map stage (other stages depend on its output): generate ShuffleMapTasks
        // for the partitions that have no output location yet.
        for (p <- 0 until stage.numPartitions if stage.outputLocs(p) == Nil) {
          // Preferred locations, used to place the task close to its data.
          val locs = getPreferredLocs(stage.rdd, p)
          tasks += new ShuffleMapTask(stage.id, stage.rdd, stage.shuffleDep.get, p, locs)
        }
      } else {
        // Final stage: it produces the job's result directly, so generate ResultTasks.
        val job = resultStageToJob(stage)
        for (id <- 0 until job.numPartitions if !job.finished(id)) {
          val partition = job.partitions(id)
          val locs = getPreferredLocs(stage.rdd, partition)
          // A ResultTask also needs job.func, the function that turns a partition into a result.
          tasks += new ResultTask(stage.id, stage.rdd, job.func, partition, locs, id)
        }
      }
      // Tasks are submitted to TaskScheduler one stage at a time: one stage corresponds to one TaskSet.
      taskSched.submitTasks(
        new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
    }

submitMissingTasks() processing steps:

1. Use stage.isShuffleMap to decide whether to generate ShuffleMapTasks or ResultTasks.

2. For a shuffle map stage, one ShuffleMapTask is generated for each partition of the stage's RDD whose output is not yet available. For a result stage, one ResultTask is generated for each unfinished partition of the job. Each task carries the preferred locations of its partition so that it can be scheduled close to its data.

3. Wrap all tasks generated for the stage in a TaskSet and pass it to the submitTasks() method of TaskScheduler for scheduling.
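
The task-generation rule can be modelled without Spark. In the sketch below, `TaskStub`, its two subclasses and `missingTasks` are hypothetical; the `done` predicate stands in for both the "output location already known" check of a map stage and the "already finished" check of a result stage, and preferred locations are omitted.

```scala
object TaskGenSketch {
  // Toy stand-ins for ShuffleMapTask and ResultTask.
  sealed trait TaskStub
  final case class ShuffleMapTaskStub(stageId: Int, partition: Int) extends TaskStub
  final case class ResultTaskStub(stageId: Int, partition: Int, outputId: Int) extends TaskStub

  // One task per partition that still needs work; isShuffleMap selects the task type.
  def missingTasks(
      stageId: Int,
      isShuffleMap: Boolean,
      partitions: Seq[Int],
      done: Int => Boolean): Seq[TaskStub] =
    if (isShuffleMap)
      for (p <- partitions if !done(p)) yield ShuffleMapTaskStub(stageId, p)
    else
      for ((p, id) <- partitions.zipWithIndex if !done(id)) yield ResultTaskStub(stageId, p, id)

  def main(args: Array[String]): Unit = {
    // Map stage with 3 partitions; partition 1 already has output.
    println(missingTasks(1, isShuffleMap = true, 0 until 3, _ == 1))
    // Result stage over partitions 0 and 1; nothing finished yet.
    println(missingTasks(0, isShuffleMap = false, Seq(0, 1), _ => false))
  }
}
```

The returned sequence corresponds to the contents of one TaskSet, since each stage is submitted as a single unit.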

########################## DAGScheduler has now submitted the TaskSet to TaskScheduler ########################

This article is an English translation of an article originally published in Chinese on aliyun.com.
