You are welcome to reprint this article; please credit the source, huichiro.
Summary
This article mainly describes how the business logic of a task executed in TaskRunner is invoked. In addition, it tries to clarify where a running task obtains its input data, and where and how it returns its processing results.
Preparation
- Spark has been installed
- Spark runs in local mode or local-cluster mode
Local-cluster mode
The local-cluster mode is also known as pseudo-distributed mode. Start it with the following command:
MASTER=local[1,2,1024] bin/spark-shell
The three values in [1,2,1024] are the executor number, core number, and memory size (in MB), respectively. The memory size should not be smaller than the default 512 MB.
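Once the shell starts, a quick sanity check (a minimal sketch, assuming the shell created the usual sc SparkContext) shows which master URL was used and confirms that tasks can actually run:

// sc is the SparkContext pre-created by spark-shell
println(sc.master)                          // prints the master URL used at launch
println(sc.parallelize(1 to 100).count())   // a trivial job; expected result: 100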
Analysis of the driver program initialization process
The main source files involved in the initialization process:
- SparkContext.scala: the entry point to the entire initialization process
- SparkEnv.scala: creates the BlockManager, MapOutputTrackerMaster, ConnectionManager, and CacheManager
- DAGScheduler.scala: the entry point for job submission; the key work of dividing a job into stages happens here
- TaskSchedulerImpl.scala: decides how many tasks each stage can run and on which executor each task runs
- SchedulerBackend
- For the simplest local running mode, see LocalBackend.scala
- For cluster mode, see the source file SparkDeploySchedulerBackend.scala
Detailed steps of initialization
Step 1: Generate SparkConf from the initialization input parameters, and then create SparkEnv from SparkConf. SparkEnv mainly contains the following key components: BlockManager, MapOutputTracker, ShuffleFetcher, and ConnectionManager.
private[spark] val env = SparkEnv.create(
  conf,
  "<driver>",
  conf.get("spark.driver.host"),
  conf.get("spark.driver.port").toInt,
  isDriver = true,
  isLocal = isLocal)
SparkEnv.set(env)
Step 2: Create the TaskScheduler, select the corresponding SchedulerBackend, and start the TaskScheduler. This step is critical.
private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master, appName)
taskScheduler.start()
taskScheduler.start() starts the corresponding SchedulerBackend and also starts a timer for periodic checks (speculative task detection).
override def start() {
  backend.start()

  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    import sc.env.actorSystem.dispatcher
    sc.env.actorSystem.scheduler.schedule(SPECULATION_INTERVAL milliseconds,
          SPECULATION_INTERVAL milliseconds) {
      checkSpeculatableTasks()
    }
  }
}
Step 3: Use the TaskScheduler instance created in the previous step as the input parameter to create the DAGScheduler, and then start it.
@volatile private[spark] var dagScheduler = new DAGScheduler(taskScheduler)
dagScheduler.start()
Step 4: Start the Web UI
ui.start()
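For reference, here is a minimal driver program (only a sketch, written against the roughly Spark 0.9/1.0-era API discussed in this article); constructing the SparkContext is what triggers the four initialization steps described above:

import org.apache.spark.{SparkConf, SparkContext}

object MinimalDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MinimalDriver").setMaster("local[2]")
    // Steps 1 to 4 (SparkEnv, TaskScheduler, DAGScheduler, web UI) all run
    // inside this constructor.
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).count())
    sc.stop()
  }
}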
RDD transformation process
The simplest WordCount is used as an example to describe the RDD transformation process.
sc.textFile("README.md").flatMap(line=>line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
This short line of code actually involves a rather complex chain of RDD transformations. The following describes the transformation performed and the result produced at each step.
Step 1: val rawFile = sc.textFile("README.md")
textFile first generates a HadoopRDD and then, through a map operation, generates a MappedRDD. If you execute the statement above in spark-shell, the output confirms this analysis.
scala> sc.textFile("README.md")
14/04/23 13:11:48 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
14/04/23 13:11:48 INFO MemoryStore: ensureFreeSpace(119741) called with curMem=0, maxMem=311387750
14/04/23 13:11:48 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 116.9 KB, free 296.8 MB)
14/04/23 13:11:48 DEBUG BlockManager: Put block broadcast_0 locally took 277 ms
14/04/23 13:11:48 DEBUG BlockManager: Put for block broadcast_0 without replication took 281 ms
res0: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:13
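For reference, the definition of textFile in SparkContext.scala of this era is roughly as follows (an abridged sketch; parameter names may differ slightly between versions): a hadoopFile call that produces the HadoopRDD, followed by a map that produces the MappedRDD.

// Abridged sketch of SparkContext.textFile:
// hadoopFile creates the HadoopRDD, the trailing map creates the MappedRDD.
def textFile(path: String, minSplits: Int = defaultMinSplits): RDD[String] =
  hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], minSplits)
    .map(pair => pair._2.toString)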
Step 2: val splittedText = rawFile.flatMap(line => line.split(" "))
flatMap converts the original MappedRDD into a FlatMappedRDD.
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] = new FlatMappedRDD(this, sc.clean(f))
Step 3: val wordCount = splittedText.map(word => (word, 1))
Each word is turned into a corresponding key-value pair, and the FlatMappedRDD from the previous step is converted into a MappedRDD.
Step 4: val reduceJob = wordCount.reduceByKey(_ + _); this step is the most complex.
The operations used in steps 2 and 3 are all defined in RDD.scala, but reduceByKey is not; its definition appears in the source file PairRDDFunctions.scala.
Careful readers will surely ask: since reduceByKey is not an attribute or method of MappedRDD, how can it be called on a MappedRDD? In fact, an implicit conversion is at work behind the scenes, converting the MappedRDD into PairRDDFunctions.
implicit def rddToPairRDDFunctions[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) = new PairRDDFunctions(rdd)
This implicit conversion is a syntactic feature of Scala. If you want to know more, search for the keywords "Scala implicit conversion"; many articles describe it in detail.
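As a standalone illustration (the names below are invented for this example and do not come from Spark's source), this is how an implicit conversion lets a method defined in a wrapper class be called directly on another type, just as rddToPairRDDFunctions makes reduceByKey callable on an RDD[(K, V)]:

import scala.language.implicitConversions

object ImplicitDemo {
  // A wrapper class providing an "extra" method that Seq[Int] itself lacks.
  class RichCounter(xs: Seq[Int]) {
    def countOccurrences(): Map[Int, Int] =
      xs.groupBy(identity).map { case (k, v) => (k, v.size) }
  }

  // The implicit conversion: when countOccurrences is called on a Seq[Int],
  // the compiler silently wraps the sequence in a RichCounter.
  implicit def seqToRichCounter(xs: Seq[Int]): RichCounter = new RichCounter(xs)

  def main(args: Array[String]): Unit = {
    println(Seq(1, 2, 2, 3).countOccurrences())  // Map(1 -> 1, 2 -> 2, 3 -> 1)
  }
}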
Next, let's take a look at the definition of reduceByKey.
def reduceByKey(func: (V, V) => V): RDD[(K, V)] = {
  reduceByKey(defaultPartitioner(self), func)
}

def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)] = {
  combineByKey[V]((v: V) => v, func, func, partitioner)
}

def combineByKey[C](createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C,
    partitioner: Partitioner,
    mapSideCombine: Boolean = true,
    serializerClass: String = null): RDD[(K, C)] = {
  if (getKeyClass().isArray) {
    if (mapSideCombine) {
      throw new SparkException("Cannot use map-side combining with array keys.")
    }
    if (partitioner.isInstanceOf[HashPartitioner]) {
      throw new SparkException("Default partitioner cannot partition array keys.")
    }
  }
  val aggregator = new Aggregator[K, V, C](createCombiner, mergeValue, mergeCombiners)
  if (self.partitioner == Some(partitioner)) {
    self.mapPartitionsWithContext((context, iter) => {
      new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
    }, preservesPartitioning = true)
  } else if (mapSideCombine) {
    val combined = self.mapPartitionsWithContext((context, iter) => {
      aggregator.combineValuesByKey(iter, context)
    }, preservesPartitioning = true)
    val partitioned = new ShuffledRDD[K, C, (K, C)](combined, partitioner)
      .setSerializer(serializerClass)
    partitioned.mapPartitionsWithContext((context, iter) => {
      new InterruptibleIterator(context, aggregator.combineCombinersByKey(iter, context))
    }, preservesPartitioning = true)
  } else {
    // Don't apply map-side combiner.
    val values = new ShuffledRDD[K, V, (K, V)](self, partitioner).setSerializer(serializerClass)
    values.mapPartitionsWithContext((context, iter) => {
      new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
    }, preservesPartitioning = true)
  }
}
reduceByKey eventually calls combineByKey. In this function, the PairRDDFunctions is converted into a ShuffledRDD, and after mapPartitionsWithContext is called, the ShuffledRDD is converted into a MapPartitionsRDD.
The log output confirms our analysis:
res1: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[8] at reduceByKey at <console>:13
RDD transformation summary
A summary of the entire RDD transformation process:
HadoopRDD -> MappedRDD -> FlatMappedRDD -> MappedRDD -> PairRDDFunctions -> ShuffledRDD -> MapPartitionsRDD
The entire transformation chain is long, and all of these transformations happen before any task is submitted.
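One way to see this chain for yourself (a small sketch, assuming a running spark-shell) is to print the RDD lineage with toDebugString:

// In spark-shell: build the WordCount RDD and print its lineage.
val wordCounts = sc.textFile("README.md")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// toDebugString lists the chain of parent RDDs, ending at the HadoopRDD.
println(wordCounts.toDebugString)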
Running process analysis
Categories of dataset operations
Before analyzing the function-call relationships during task execution, let's first discuss something slightly theoretical: why are the transformations acting on RDDs the way they are?
The answer is related to mathematics. From an abstract, theoretical point of view, all task processing can be reduced to "input -> processing -> output", where the input and output both correspond to datasets.
On this basis, we can make a simple classification (a concrete example of each category follows the list):
- one-to-one: a dataset is transformed into another dataset and its size stays the same, such as map
- one-to-one (size changed): a dataset is transformed into another dataset, but its size changes, either growing or shrinking; for example, flatMap increases the size, while subtract decreases it
- many-to-one: multiple datasets are merged into one dataset, such as combine and join
- one-to-many: one dataset is split into multiple datasets, such as groupBy
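The following sketch (assuming a SparkContext sc is available) gives one concrete RDD operation for each category:

val nums = sc.parallelize(1 to 10)

val doubled  = nums.map(_ * 2)                // one-to-one, size unchanged
val expanded = nums.flatMap(n => Seq(n, n))   // one-to-one, size grows
val pairsA   = nums.map(n => (n % 2, n))
val pairsB   = nums.map(n => (n % 2, n * 10))
val joined   = pairsA.join(pairsB)            // many-to-one: two datasets merged
val grouped  = nums.groupBy(_ % 3)            // one-to-many: one dataset split into groups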
Function calls during Task Runtime
For more information about the task submission process, see the second article in this series. This section describes how, at runtime, a task works its way down, step by step, to each operation on the RDD.
- TaskRunner.run
- Task.run
- Task.runTask (Task is a base class with two subclasses: ShuffleMapTask and ResultTask)
- RDD.iterator
- RDD.computeOrReadCheckpoint
- RDD.compute
Perhaps when we look at the RDD.compute function definition, we still feel that f is never called. Take the compute definition of MappedRDD as an example:
override def compute(split: Partition, context: TaskContext) = firstParent[T].iterator(split, context).map(f)
Note: the map function is the easiest place to be misled here. This map is not the map of RDD; it is the member function map of Iterator as defined in Scala. See http://www.scala-lang.org/api/2.10.4/index.html#scala.collection.Iterator
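A tiny standalone example of Scala's Iterator.map (plain Scala, not Spark code) shows the behavior that MappedRDD.compute relies on: f is applied lazily, element by element, only when the iterator is consumed:

val parentIter = Iterator("a", "b", "c")
val mapped = parentIter.map(_.toUpperCase)  // nothing is computed yet
println(mapped.toList)                      // List(A, B, C); f runs only here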
Stack output
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:111)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:154)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
ResultTask
For ShuffleMapTask, the compute process is rather complex and takes many detours; for ResultTask, it is much more direct.
override def runTask(context: TaskContext): U = {
  metrics = Some(context.taskMetrics)
  try {
    func(context, rdd.iterator(split, context))
  } finally {
    context.executeOnCompleteCallbacks()
  }
}
Transfer of computation results
The analysis above shows that after the WordCount job is submitted, the DAGScheduler divides it into two stages: the first stage uses ShuffleMapTasks and the second stage uses ResultTasks.
So how does the ResultTask obtain the computation results of the ShuffleMapTask? The process is as follows:
- The ShuffleMapTask packs its computation status (not the actual data) as a MapStatus and returns it to the DAGScheduler.
- The DAGScheduler saves the MapStatus in the MapOutputTrackerMaster.
- When the ResultTask executes the ShuffledRDD, it calls the fetch method of BlockStoreShuffleFetcher to obtain the data.
- The fetch method first asks the MapOutputTrackerMaster for the location of the data it needs.
- Based on the returned results, it calls blockManager.getMultiple to obtain the actual data.
Pseudocode of the fetch function in BlockStoreShuffleFetcher:
val blockManager = SparkEnv.get.blockManager

val startTime = System.currentTimeMillis
val statuses = SparkEnv.get.mapOutputTracker.getServerStatuses(shuffleId, reduceId)
logDebug("Fetching map output location for shuffle %d, reduce %d took %d ms".format(
  shuffleId, reduceId, System.currentTimeMillis - startTime))

val blockFetcherItr = blockManager.getMultiple(blocksByAddress, serializer)
val itr = blockFetcherItr.flatMap(unpackBlock)
Note getServerStatuses and getMultiple: one queries the location of the data, and the other obtains the actual data.
For a detailed description of shuffle, see "Exploring Spark's shuffle implementation in detail": http://jerryshao.me/architecture/2014/01/04/spark-shuffle-detail-investigation/