9. Shuffle Read-Write Source Analysis

Source: Internet
Author: User
Tags: shuffle
Let's start with the schematic diagram. In the unoptimized hash-based shuffle, after a ShuffleMapTask computes its data, it creates one bucket cache per ResultTask and writes each bucket to a corresponding ShuffleBlockFile on disk. When the ShuffleMapTask finishes, information about its output is recorded in a MapStatus, which is ultimately sent to the MapOutputTracker on the driver (alongside the DAGScheduler). Each ResultTask then uses BlockStoreShuffleFetcher to query the MapOutputTracker for the MapStatus entries describing the data it needs, and pulls that data through the underlying BlockManager. The pulled data is wrapped in an internal RDD called ShuffledRDD, which is cached in memory and spilled to disk when memory is insufficient. Finally, the ResultTask aggregates the data to produce a MapPartitionsRDD, which is the result RDD we get after calling an action in our program.
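The cost of this unoptimized scheme is easy to quantify: every ShuffleMapTask writes one file per ResultTask, so the file count is the product of the two. A quick plain-Scala sketch of the arithmetic (no Spark dependency; function names are illustrative only):

```scala
// Unoptimized hash shuffle: each of M ShuffleMapTasks writes one
// ShuffleBlockFile per ResultTask, so M * R files in total.
def unoptimizedFileCount(mapTasks: Int, reduceTasks: Int): Int =
  mapTasks * reduceTasks

// Consolidated shuffle: files are shared per CPU core instead of per
// map task, so only cores * R files are created.
def consolidatedFileCount(cores: Int, reduceTasks: Int): Int =
  cores * reduceTasks

println(unoptimizedFileCount(100, 100))  // 10000
println(consolidatedFileCount(8, 100))   // 800
```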

Optimized shuffle analysis schematic diagram:





The principle of the optimized (consolidated) shuffle is that the number of disk files written on the map side is tied to the number of CPU cores rather than the number of tasks: a set of output files is created per core, and when a new ShuffleMapTask runs on that core it appends to the same files instead of creating new ones. An index records where each ShuffleMapTask's computed data lies within the ShuffleBlockFile; the portion written by one ShuffleMapTask is called a segment. In other words, where 100 ShuffleMapTasks feeding 100 ResultTasks would originally create 100 * 100 = 10,000 disk files, consolidation needs only (number of cores) * (number of ResultTasks) files, greatly reducing disk file creation and read/write overhead. Enabling the optimized shuffle is simply a matter of setting a configuration parameter when creating the SparkContext.
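As a hedged sketch of that configuration: in older Spark 1.x releases the relevant flag was `spark.shuffle.consolidateFiles` (it was removed in later versions along with the hash shuffle manager); assuming such a version, enabling it would look like:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch for older Spark 1.x releases, where the hash shuffle manager
// supported file consolidation; this flag no longer exists in current Spark.
val conf = new SparkConf()
  .setAppName("shuffle-consolidation-demo")
  .set("spark.shuffle.consolidateFiles", "true") // enable consolidated shuffle files

val sc = new SparkContext(conf)
```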
In the last chapter's source analysis of task execution, we saw the writer invocation:
    writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])

In fact, this writer is by default a HashShuffleWriter. The source of its write method is as follows:

       
       
        
    /** Write a bunch of records to this task's output */
    // Writes each ShuffleMapTask's partition data for the new RDD to local disk.
    override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
      // First, decide whether a map-side (local) aggregation is needed.
      // For operators such as reduceByKey, dep.aggregator.isDefined is true,
      // and dep.mapSideCombine is true as well.
      val iter = if (dep.aggregator.isDefined) {
        if (dep.mapSideCombine) {
          // Local aggregation happens here: for example, (hi, 1), (hi, 1)
          // is combined into (hi, 2).
          dep.aggregator.get.combineValuesByKey(records, context)
        } else {
          records
        }
      } else {
        require(!dep.mapSideCombine, "Map-side combine without aggregator specified!")
        records
      }

      // After any local aggregation, iterate over the data. The partitioner
      // (HashPartitioner by default) generates a bucketId for each record,
      // determining which bucket each piece of data is written to.
      for (elem <- iter) {
        val bucketId = dep.partitioner.getPartition(elem._1)
        // With the bucketId, the writer obtained via
        // ShuffleBlockManager.forMapTask() writes the record into that bucket.
        shuffle.writers(bucketId).write(elem)
      }
    }
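To make the map-side combine step concrete, here is a minimal plain-Scala model (no Spark dependency; `combineLocally` is a hypothetical stand-in for what `combineValuesByKey` does with a reduceByKey-style sum aggregator, not Spark's actual Aggregator class):

```scala
// Minimal sketch of map-side combine for a sum aggregator, assuming
// reduceByKey(_ + _) semantics.
def combineLocally(records: Iterator[(String, Int)]): Iterator[(String, Int)] = {
  val combined = scala.collection.mutable.Map.empty[String, Int]
  for ((k, v) <- records) {
    // Merge each value into the running combiner for its key.
    combined(k) = combined.getOrElse(k, 0) + v
  }
  combined.iterator
}

// ("hi",1), ("hi",1), ("spark",1) collapses to ("hi",2), ("spark",1).
val out = combineLocally(Iterator("hi" -> 1, "hi" -> 1, "spark" -> 1)).toMap
println(out == Map("hi" -> 2, "spark" -> 1))  // true
```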

Here, shuffle is a member variable of HashShuffleWriter. The writer for each bucketId is obtained through the ShuffleBlockManager's forMapTask method, whose source is as follows:
       
       
        
    /**
     * Get a ShuffleWriterGroup for each map task.
     */
    def forMapTask(shuffleId: Int, mapId: Int, numBuckets: Int, serializer: Serializer,
        
        
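The segment bookkeeping described earlier can be sketched in plain Scala (hypothetical names, no Spark dependency): each consolidated file records, per ShuffleMapTask, the offset and length of that task's segment, which is what lets a reader locate one task's output inside a shared ShuffleBlockFile.

```scala
// Hypothetical model of a consolidated shuffle file's index: each map
// task that appends to the file records its segment's (offset, length),
// so readers can slice out just that task's data.
final case class Segment(mapId: Int, offset: Long, length: Long)

final class ConsolidatedFileIndex {
  private var nextOffset = 0L
  private val segments = scala.collection.mutable.ArrayBuffer.empty[Segment]

  // Called when a ShuffleMapTask finishes appending `length` bytes.
  def append(mapId: Int, length: Long): Segment = {
    val seg = Segment(mapId, nextOffset, length)
    segments += seg
    nextOffset += length
    seg
  }

  // Look up where a given map task's data lives in the shared file.
  def locate(mapId: Int): Option[Segment] = segments.find(_.mapId == mapId)
}

val index = new ConsolidatedFileIndex()
index.append(mapId = 0, length = 100)
index.append(mapId = 1, length = 50)
println(index.locate(1))  // Some(Segment(1,100,50))
```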
