Spark Shuffle Module: Shuffle Read Process Analysis


Before reading this article, it is recommended to first read the Spark Sort Based Shuffle memory analysis.

The Spark shuffle read call stack is as follows:
1. org.apache.spark.rdd.ShuffledRDD#compute()
2. org.apache.spark.shuffle.ShuffleManager#getReader()
3. org.apache.spark.shuffle.hash.HashShuffleReader#read()
4. org.apache.spark.storage.ShuffleBlockFetcherIterator#initialize()
5. org.apache.spark.storage.ShuffleBlockFetcherIterator#splitLocalRemoteBlocks()
   org.apache.spark.storage.ShuffleBlockFetcherIterator#sendRequest()
   org.apache.spark.storage.ShuffleBlockFetcherIterator#fetchLocalBlocks()

The following are the classes and methods involved in the execution of the fetchLocalBlocks() method:
6. org.apache.spark.storage.BlockManager#getBlockData()
   org.apache.spark.shuffle.ShuffleManager#shuffleBlockResolver()

ShuffleManager has two subclasses. For hash shuffle, the call goes to org.apache.spark.shuffle.hash.HashShuffleManager#shuffleBlockResolver(), which returns an org.apache.spark.shuffle.FileShuffleBlockResolver; FileShuffleBlockResolver#getBlockData() is then called to return the block data. For sort shuffle, the call goes to org.apache.spark.shuffle.sort.SortShuffleManager#shuffleBlockResolver(), which returns an org.apache.spark.shuffle.IndexShuffleBlockResolver; IndexShuffleBlockResolver#getBlockData() is then called to return the block data.
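For context, the block ids passed to these getBlockData() calls are ShuffleBlockIds, which encode which shuffle, map task and reduce partition a block belongs to. A small standalone sketch, only to illustrate the naming scheme (the object name is arbitrary):

import org.apache.spark.storage.ShuffleBlockId

object ShuffleBlockIdDemo {
  def main(args: Array[String]): Unit = {
    // A shuffle block is identified by (shuffleId, mapId, reduceId); its name follows
    // the pattern "shuffle_<shuffleId>_<mapId>_<reduceId>".
    val blockId = ShuffleBlockId(0, 5, 2)
    println(blockId.name)       // shuffle_0_5_2
    println(blockId.isShuffle)  // true
  }
}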

The following are the classes and methods involved in the execution of the org.apache.spark.storage.ShuffleBlockFetcherIterator#sendRequest() method:
7. org.apache.spark.network.shuffle.ShuffleClient#fetchBlocks()

org.apache.spark.network.shuffle.ShuffleClient has two subclasses, ExternalShuffleClient and BlockTransferService. org.apache.spark.network.shuffle.BlockTransferService in turn has two subclasses, NettyBlockTransferService and NioBlockTransferService, which correspond to two different ways of fetching block data from remote machines. NioBlockTransferService was marked as deprecated in Spark 1.5.2 and will be removed in a later version.

The methods in the call stack above are explained below in order. Only the overall flow is covered here; the details are left for a later discussion.

ShuffledRDD#compute()

When a task executes, ShuffledRDD's compute() method is invoked. Its code is as follows:

override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
    val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
    // Obtain the reader via the org.apache.spark.shuffle.ShuffleManager#getReader() method.
    // Whether sort shuffle or hash shuffle is used, the reader is
    // org.apache.spark.shuffle.hash.HashShuffleReader.
    SparkEnv.get.shuffleManager.getReader(dep.shuffleHandle, split.index, split.index + 1, context)
      .read()
      .asInstanceOf[Iterator[(K, C)]]
  }

As can be seen, its core logic is to obtain a HashShuffleReader object by calling ShuffleManager#getReader(), and then call HashShuffleReader#read() to read the shuffle data generated by the ShuffleMapTasks of the previous stage. Note that whether hash shuffle or sort shuffle is in use, the reader is always HashShuffleReader.
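To connect this with user code: a wide dependency such as reduceByKey produces a ShuffledRDD, and the reduce-side tasks go through exactly the compute()/getReader()/read() path described above. A minimal, illustrative job (the object name and sample data are arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleReadDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shuffle-read-demo").setMaster("local[2]"))

    // The map stage writes shuffle output; reduceByKey introduces a ShuffleDependency,
    // so the resulting RDD is a ShuffledRDD.
    val counts = sc.parallelize(Seq("a", "b", "a", "c"), numSlices = 2)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Each task of the final stage calls ShuffledRDD#compute(), which obtains a
    // HashShuffleReader via ShuffleManager#getReader() and read()s the map output.
    counts.collect().foreach(println)
    sc.stop()
  }
}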

HashShuffleReader#read()

Jumping into the HashShuffleReader#read() method, the source code is as follows:

/** Read the combined key-values for this reduce task */
override def read(): Iterator[Product2[K, C]] = {
    // Create a ShuffleBlockFetcherIterator; its constructor calls initialize(), which runs
    // splitLocalRemoteBlocks() to decide the read strategy for the data:
    // remote data is read via sendRequest(), local data via fetchLocalBlocks().
    val blockFetcherItr = new ShuffleBlockFetcherIterator(
      context,
      blockManager.shuffleClient,
      blockManager,
      mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition),
      // Note: we use getSizeAsMb when no suffix is provided for backwards compatibility
      SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024)

    // Wrap the streams for compression based on configuration
    val wrappedStreams = blockFetcherItr.map { case (blockId, inputStream) =>
      blockManager.wrapForCompression(blockId, inputStream)
    }

    val ser = Serializer.getSerializer(dep.serializer)
    val serializerInstance = ser.newInstance()

    // Create a key/value iterator for each stream
    val recordIter = wrappedStreams.flatMap { wrappedStream =>
      // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
      // NextIterator. The NextIterator makes sure that close() is called on the
      // underlying InputStream when all records have been read.
      serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
    }

    // Update the context task metrics for each record read.
    val readMetrics = context.taskMetrics.createShuffleReadMetricsForDependency()
    val metricIter = CompletionIterator[(Any, Any), Iterator[(Any, Any)]](
      recordIter.map(record => {
        readMetrics.incRecordsRead(1)
        record
      }),
      context.taskMetrics().updateShuffleReadMetrics())

    // An interruptible iterator must be used here in order to support task cancellation
    val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)

    val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        // The data was already combined on the map side; merge the combiners here
        val combinedKeyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, C)]]
        dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context)
      } else {
        // No map-side combine; aggregate the raw values on the reduce side
        val keyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, Nothing)]]
        dep.aggregator.get.combineValuesByKey(keyValuesIterator, context)
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
    }

    // Sort the output if a key ordering is defined
    dep.keyOrdering match {
      case Some(keyOrd: Ordering[K]) =>
        // Create an ExternalSorter to sort the data. Note that if spark.shuffle.spill is
        // disabled, the ExternalSorter won't spill to disk.
        val sorter = new ExternalSorter[K, C, C](ordering = Some(keyOrd), serializer = Some(ser))
        sorter.insertAll(aggregatedIter)
        context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
        context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
        context.internalMetricsToAccumulators(
          InternalAccumulator.PEAK_EXECUTION_MEMORY).add(sorter.peakMemoryUsedBytes)
        sorter.iterator
      case None =>
        aggregatedIter
    }
  }
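Which branch of the aggregation and sorting logic is taken depends on how the ShuffleDependency was built by the user-facing operator. As a rough illustration (a sketch of common cases, assuming sc is an existing SparkContext):

import org.apache.spark.SparkContext

object ReadBranchesDemo {
  def demo(sc: SparkContext): Unit = {
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKey combines on the map side by default, so read() sees partially combined
    // records and takes the combineCombinersByKey branch (dep.mapSideCombine == true).
    val counts = pairs.reduceByKey(_ + _)

    // groupByKey disables map-side combine, so read() receives raw values and takes
    // the combineValuesByKey branch (dep.mapSideCombine == false).
    val grouped = pairs.groupByKey()

    // sortByKey sets dep.keyOrdering but no aggregator, so read() falls through to the
    // ExternalSorter branch at the end.
    val sorted = pairs.sortByKey()

    counts.collect()
    grouped.collect()
    sorted.collect()
  }
}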
ShuffleBlockFetcherIterator#splitLocalRemoteBlocks()

The splitLocalRemoteBlocks() method determines the read strategy for the data: the localBlocks variable records the block ids stored on the local machine, and the remoteBlocks variable records the block ids stored on remote machines. The remote blocks are further split into FetchRequests of size at most maxSizeInFlight, in order to limit the amount of data in flight:

val remoteRequests = new ArrayBuffer[FetchRequest]

The source code of the splitLocalRemoteBlocks() method is as follows:

private[this] def splitLocalRemoteBlocks(): ArrayBuffer[FetchRequest] = {
    // Make remote requests at most maxBytesInFlight / 5 in length; the reason to keep them
    // smaller than maxBytesInFlight is to allow multiple, parallel fetches from up to 5
    // nodes, rather than blocking on reading output from one node.
    // maxBytesInFlight is the maximum amount of data in flight; the default is 48m, set via
    // SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024
    val targetRequestSize = math.max(maxBytesInFlight / 5, 1L)
    logDebug("maxBytesInFlight: " + maxBytesInFlight + ", targetRequestSize: " + targetRequestSize)

    // Split local and remote blocks. Remote blocks are further split into FetchRequests of
    // size at most maxBytesInFlight in order to limit the amount of data in flight.
    val remoteRequests = new ArrayBuffer[FetchRequest]

    // Tracks total number of blocks (including zero sized blocks)
    var totalBlocks = 0
    for ((address, blockInfos) <- blocksByAddress) {
      totalBlocks += blockInfos.size
      // Data that lives on this executor is fetched locally
      if (address.executorId == blockManager.blockManagerId.executorId) {
        // Filter out zero-sized blocks and record the block ids stored locally
        localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1)
        numBlocksToFetch += localBlocks.size
      } else {
        // Data that lives on a remote machine
        val iterator = blockInfos.iterator
        var curRequestSize = 0L
        var curBlocks = new ArrayBuffer[(BlockId, Long)]
        while (iterator.hasNext) {
          val (blockId, size) = iterator.next()
          // Skip empty blocks
          if (size > 0) {
            curBlocks += ((blockId, size))
            // Record the block ids stored on remote machines
            remoteBlocks += blockId
            numBlocksToFetch += 1
            curRequestSize += size
          } else if (size < 0) {
            throw new BlockException(blockId, "Negative block size " + size)
          }
          if (curRequestSize >= targetRequestSize) {
            // Add this FetchRequest
            remoteRequests += new FetchRequest(address, curBlocks)
            curBlocks = new ArrayBuffer[(BlockId, Long)]
            logDebug(s"Creating fetch request of $curRequestSize at $address")
            curRequestSize = 0
          }
        }
        // Add in the final request
        if (curBlocks.nonEmpty) {
          remoteRequests += new FetchRequest(address, curBlocks)
        }
      }
    }
    logInfo(s"Getting $numBlocksToFetch non-empty blocks out of $totalBlocks blocks")
    remoteRequests
  }
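To make the sizing concrete: with the default spark.reducer.maxSizeInFlight of 48m, targetRequestSize is max(48 MB / 5, 1) ≈ 9.6 MB, so roughly five requests can be outstanding in parallel while staying under the in-flight limit. The grouping of remote blocks into requests can be illustrated with a simplified, standalone sketch of the same logic (not Spark's actual code; the types and names here are stand-ins):

import scala.collection.mutable.ArrayBuffer

object SplitBlocksSketch {
  // Simplified stand-in for Spark's FetchRequest, purely for illustration.
  case class FetchRequest(address: String, blocks: Seq[(String, Long)])

  def splitIntoRequests(
      address: String,
      blockSizes: Seq[(String, Long)],
      maxBytesInFlight: Long): Seq[FetchRequest] = {
    val targetRequestSize = math.max(maxBytesInFlight / 5, 1L)
    val requests = new ArrayBuffer[FetchRequest]
    var curBlocks = new ArrayBuffer[(String, Long)]
    var curRequestSize = 0L
    for ((blockId, size) <- blockSizes if size > 0) {
      curBlocks += ((blockId, size))
      curRequestSize += size
      if (curRequestSize >= targetRequestSize) {
        // Close the current request once it reaches ~maxBytesInFlight / 5
        requests += FetchRequest(address, curBlocks)
        curBlocks = new ArrayBuffer[(String, Long)]
        curRequestSize = 0
      }
    }
    if (curBlocks.nonEmpty) requests += FetchRequest(address, curBlocks)
    requests
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val blocks = (1 to 10).map(i => (s"shuffle_0_${i}_0", 4 * mb))
    // With the default 48 MB limit, targetRequestSize is ~9.6 MB, so 4 MB blocks are
    // grouped three per request (12 MB >= 9.6 MB closes a request).
    splitIntoRequests("host1:7337", blocks, 48 * mb).foreach(println)
  }
}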
ShuffleBlockFetcherIterator#fetchLocalBlocks()

The fetchLocalBlocks() method reads the local blocks and calls BlockManager's getBlockData method. Its source code is as follows:

private[this] def fetchLocalBlocks() {
    val iter = localBlocks.iterator
    while (iter.hasNext) {
      val blockId = iter.next()
      try {
        // Call BlockManager's getBlockData method
        val buf = blockManager.getBlockData(blockId)
        shuffleMetrics.incLocalBlocksFetched(1)
        shuffleMetrics.incLocalBytesRead(buf.size)
        buf.retain()
        results.put(new SuccessFetchResult(blockId, blockManager.blockManagerId, 0, buf))
      } catch {
        case e: Exception =>
          // If we see an exception, stop immediately.
          logError(s"Error occurred while fetching local blocks", e)
          results.put(new FailureFetchResult(blockId, blockManager.blockManagerId, e))
          return
      }
    }
  }

Jumping into BlockManager's getBlockData method, its source code is as follows:

override def getBlockData(blockId: BlockId): ManagedBuffer = {
    if (blockId.isShuffle) {
      // First call the ShuffleManager's shuffleBlockResolver method to get the
      // ShuffleBlockResolver, then call its getBlockData method
      shuffleManager.shuffleBlockResolver.getBlockData(blockId.asInstanceOf[ShuffleBlockId])
    } else {
      val blockBytesOpt = doGetLocal(blockId, asBlockResult = false)
        .asInstanceOf[Option[ByteBuffer]]
      if (blockBytesOpt.isDefined) {
        val buffer = blockBytesOpt.get
        new NioManagedBuffer(buffer)
      } else {
        throw new BlockNotFoundException(blockId.toString)
      }
    }
  }
  

The org.apache.spark.shuffle.ShuffleManager#shuffleBlockResolver() method obtains the corresponding ShuffleBlockResolver: for hash shuffle it is org.apache.spark.shuffle.FileShuffleBlockResolver, and for sort shuffle it is org.apache.spark.shuffle.IndexShuffleBlockResolver. The resolver's getBlockData method is then called and the corresponding FileSegment is returned.

The FileShuffleBlockResolver#getBlockData method source code is as follows:

override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = {
    if (consolidateShuffleFiles) {
      // File produced by hash shuffle with the consolidate-files mechanism enabled.
      // Search all file groups associated with this shuffle.
      val shuffleState = shuffleStates(blockId.shuffleId)
      val iter = shuffleState.allFileGroups.iterator
      while (iter.hasNext) {
        val segmentOpt = iter.next.getFileSegmentFor(blockId.mapId, blockId.reduceId)
        if (segmentOpt.isDefined) {
          val segment = segmentOpt.get
          return new FileSegmentManagedBuffer(
            transportConf, segment.file, segment.offset, segment.length)
        }
      }
      throw new IllegalStateException("Failed to find shuffle block: " + blockId)
    } else {
      // File produced by the normal hash shuffle mechanism
      val file = blockManager.diskBlockManager.getFile(blockId)
      new FileSegmentManagedBuffer(transportConf, file, 0, file.length)
    }
  }
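The consolidateShuffleFiles flag above corresponds to the hash shuffle's file-consolidation mechanism, which in Spark 1.x is controlled by the spark.shuffle.consolidateFiles setting. A minimal sketch of enabling it (the object name is arbitrary, shown only as an illustration):

import org.apache.spark.SparkConf

object HashShuffleConsolidation {
  // Select the hash shuffle manager and enable the consolidate-files mechanism, so that
  // getBlockData has to look the segment up inside a file group (the if-branch above)
  // instead of opening a dedicated file per (mapId, reduceId).
  def conf(): SparkConf = new SparkConf()
    .set("spark.shuffle.manager", "hash")
    .set("spark.shuffle.consolidateFiles", "true")
}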

The IndexShuffleBlockResolver#getBlockData method source code is as follows:

override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = {
    // The block is actually going to be a range of a single map output file for this map,
    // so figure out the consolidated file, then the offset within that from our index.
    // Use shuffleId and mapId to get the corresponding index file
    val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId)

    val in = new DataInputStream(new FileInputStream(indexFile))
    try {
      // Navigate to the data location for this block
      ByteStreams.skipFully(in, blockId.reduceId * 8)
      // Start position of the data
      val offset = in.readLong()
      // End position of the data
      val nextOffset = in.readLong()
      // Return the FileSegment
      new FileSegmentManagedBuffer(
        transportConf,
        getDataFile(blockId.shuffleId, blockId.mapId),
        offset,
        nextOffset - offset)
    } finally {
      in.close()
    }
  }
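The lookup works because the index file is, in effect, a sequence of 8-byte cumulative offsets into the data file: for a map output with N reduce partitions it stores N+1 longs, starting at 0. Skipping reduceId * 8 bytes therefore lands on the start offset of that partition, and the following long is its end offset. A standalone sketch of the same layout (not Spark's actual writer, just an illustration; names are arbitrary):

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object IndexFileSketch {
  // Write an index in the same layout: N+1 cumulative offsets for N partition lengths.
  def writeIndex(partitionLengths: Array[Long]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    var offset = 0L
    out.writeLong(offset)
    for (length <- partitionLengths) {
      offset += length
      out.writeLong(offset)
    }
    out.close()
    bytes.toByteArray
  }

  // Read back (offset, nextOffset) for one reduce partition, mirroring getBlockData:
  // skip reduceId * 8 bytes, then read two longs.
  def segmentFor(index: Array[Byte], reduceId: Int): (Long, Long) = {
    val in = new DataInputStream(new ByteArrayInputStream(index))
    in.skipBytes(reduceId * 8)
    val offset = in.readLong()
    val nextOffset = in.readLong()
    in.close()
    (offset, nextOffset)
  }

  def main(args: Array[String]): Unit = {
    val index = writeIndex(Array(100L, 0L, 250L)) // three reduce partitions
    println(segmentFor(index, 0))                 // (0,100)   -> segment of length 100
    println(segmentFor(index, 1))                 // (100,100) -> empty partition
    println(segmentFor(index, 2))                 // (100,350) -> segment of length 250
  }
}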
ShuffleBlockFetcherIterator#sendRequest()

The sendRequest() method is used to fetch data from a remote machine:

private[this] def sendRequest(req: FetchRequest) {
    logDebug("Sending request for %d blocks (%s) from %s".format(
      req.blocks.size, Utils.bytesToString(req.size), req.address.hostPort))
    bytesInFlight += req.size

    // So we can look up the size of each blockId
    val sizeMap = req.blocks.map { case (blockId, size) => (blockId.toString, size) }.toMap
    val blockIds = req.blocks.map(_._1.toString)
    val address = req.address

    // Use the ShuffleClient's fetchBlocks method to fetch the data.
    // There are two kinds of ShuffleClient, ExternalShuffleClient and BlockTransferService;
    // the default is BlockTransferService.
    shuffleClient.fetchBlocks(address.host, address.port, address.executorId, blockIds.toArray,
      new BlockFetchingListener {
        override def onBlockFetchSuccess(blockId: String, buf: ManagedBuffer): Unit = {
          // Only add the buffer to the results queue if the iterator is not zombie,
          // i.e. cleanup() has not been called yet.
          if (!isZombie) {
            // Increment the ref count because we need to pass this to a different thread.
            // This needs to be released after use.
            buf.retain()
            results.put(new SuccessFetchResult(BlockId(blockId), address, sizeMap(blockId), buf))
            shuffleMetrics.incRemoteBytesRead(buf.size)
            shuffleMetrics.incRemoteBlocksFetched(1)
          }
          logTrace("Got remote block " + blockId + " after " + Utils.getUsedTimeMs(startTime))
        }

        override def onBlockFetchFailure(blockId: String, e: Throwable): Unit = {
          logError(s"Failed to get block(s) from ${req.address.host}:${req.address.port}", e)
          results.put(new FailureFetchResult(BlockId(blockId), address, e))
        }
      }
    )
  }

As can be seen from the code above, shuffleClient.fetchBlocks is used to fetch remote block data. org.apache.spark.network.shuffle.ShuffleClient has two subclasses, ExternalShuffleClient and BlockTransferService, and org.apache.spark.network.shuffle.BlockTransferService in turn has two subclasses, NettyBlockTransferService and NioBlockTransferService. The shuffleClient object is defined in org.apache.spark.storage.BlockManager; its source code is as follows:

shuffleClient, defined in org.apache.spark.storage.BlockManager:

private[spark] val shuffleClient = if (externalShuffleServiceEnabled) {
    // Use ExternalShuffleClient to fetch remote block data
    val transConf = SparkTransportConf.fromSparkConf(conf, numUsableCores)
    new ExternalShuffleClient(transConf, securityManager, securityManager.isAuthenticationEnabled(),
      securityManager.isSaslEncryptionEnabled())
  } else {
    // Use NettyBlockTransferService or NioBlockTransferService to fetch remote block data
    blockTransferService
  }

The blockTransferService in the code above is initialized in SparkEnv, as follows:

blockTransferService, defined in org.apache.spark.SparkEnv:

val blockTransferService =
    conf.get("spark.shuffle.blockTransferService", "netty").toLowerCase match {
      case "netty" =>
        new NettyBlockTransferService(conf, securityManager, numUsableCores)
      case "nio" =>
        logWarning("NIO-based block transfer service is deprecated, " +
          "and will be removed in Spark 1.6.0.")
        new NioBlockTransferService(conf, securityManager)
    }
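Since the transfer service is chosen from the spark.shuffle.blockTransferService setting shown above, pinning Netty explicitly (or temporarily falling back to the deprecated NIO implementation) is a plain configuration change. A minimal sketch (the object name is arbitrary):

import org.apache.spark.SparkConf

object TransferServiceConf {
  // "netty" is already the default in Spark 1.5; "nio" still matches here but is
  // deprecated and scheduled for removal in Spark 1.6.0, as the warning above shows.
  def withNetty(conf: SparkConf): SparkConf =
    conf.set("spark.shuffle.blockTransferService", "netty")
}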
