9. Shuffle Read-Write Source Analysis

Source: Internet
Author: User
Tags: shuffle
Let's start with the schematic diagram. In the unoptimized hash-based shuffle, after a ShuffleMapTask computes its data, it creates one bucket cache per ResultTask and writes each bucket to a corresponding ShuffleBlockFile on disk. When the ShuffleMapTask finishes, information about its output is recorded in a MapStatus, which is ultimately sent to the MapOutputTracker on the driver (alongside the DAGScheduler). Each ResultTask then uses BlockStoreShuffleFetcher to query the MapOutputTracker for the MapStatus entries describing the data it needs, and pulls that data through the underlying BlockManager. The pulled data is wrapped in an internal RDD called ShuffledRDD, which is cached in memory and spilled to disk when memory is insufficient. Finally, the ResultTask aggregates the data to produce a MapPartitionsRDD, which is the result RDD we get after calling an action in our program.
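The cost of this unoptimized scheme is easy to quantify: every ShuffleMapTask writes one file per ResultTask, so the file count is the product of the two. A quick plain-Scala sketch of the arithmetic (no Spark dependency; function names are illustrative only):

```scala
// Unoptimized hash shuffle: each of M ShuffleMapTasks writes one
// ShuffleBlockFile per ResultTask, so M * R files in total.
def unoptimizedFileCount(mapTasks: Int, reduceTasks: Int): Int =
  mapTasks * reduceTasks

// Consolidated shuffle: files are shared per CPU core instead of per
// map task, so only cores * R files are created.
def consolidatedFileCount(cores: Int, reduceTasks: Int): Int =
  cores * reduceTasks

println(unoptimizedFileCount(100, 100))  // 10000
println(consolidatedFileCount(8, 100))   // 800
```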

Optimized shuffle analysis schematic diagram:





The principle of the optimized (consolidated) shuffle is that the number of disk files written on the map side is tied to the number of CPU cores rather than the number of tasks: a set of output files is created per core, and when a new ShuffleMapTask runs on that core it appends to the same files instead of creating new ones. An index records where each ShuffleMapTask's computed data lies within the ShuffleBlockFile; the portion written by one ShuffleMapTask is called a segment. In other words, where 100 ShuffleMapTasks feeding 100 ResultTasks would originally create 100 * 100 = 10,000 disk files, consolidation needs only (number of cores) * (number of ResultTasks) files, greatly reducing disk file creation and read/write overhead. Enabling the optimized shuffle is simply a matter of setting a configuration parameter when creating the SparkContext.
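As a hedged sketch of that configuration: in older Spark 1.x releases the relevant flag was `spark.shuffle.consolidateFiles` (it was removed in later versions along with the hash shuffle manager); assuming such a version, enabling it would look like:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch for older Spark 1.x releases, where the hash shuffle manager
// supported file consolidation; this flag no longer exists in current Spark.
val conf = new SparkConf()
  .setAppName("shuffle-consolidation-demo")
  .set("spark.shuffle.consolidateFiles", "true") // enable consolidated shuffle files

val sc = new SparkContext(conf)
```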
In the last chapter's source analysis of task execution, we saw the writer invocation:
    writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])

In fact, this writer is by default a HashShuffleWriter. The source of its write method is as follows:

       
       
        
    /** Write a bunch of records to this task's output */
    // Writes each ShuffleMapTask's partition data for the new RDD to local disk.
    override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
      // First, decide whether a map-side (local) aggregation is needed.
      // For operators such as reduceByKey, dep.aggregator.isDefined is true,
      // and dep.mapSideCombine is true as well.
      val iter = if (dep.aggregator.isDefined) {
        if (dep.mapSideCombine) {
          // Local aggregation happens here: for example, (hi, 1), (hi, 1)
          // is combined into (hi, 2).
          dep.aggregator.get.combineValuesByKey(records, context)
        } else {
          records
        }
      } else {
        require(!dep.mapSideCombine, "Map-side combine without aggregator specified!")
        records
      }

      // After any local aggregation, iterate over the data. The partitioner
      // (HashPartitioner by default) generates a bucketId for each record,
      // determining which bucket each piece of data is written to.
      for (elem <- iter) {
        val bucketId = dep.partitioner.getPartition(elem._1)
        // With the bucketId, the writer obtained via
        // ShuffleBlockManager.forMapTask() writes the record into that bucket.
        shuffle.writers(bucketId).write(elem)
      }
    }
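To make the map-side combine step concrete, here is a minimal plain-Scala model (no Spark dependency; `combineLocally` is a hypothetical stand-in for what `combineValuesByKey` does with a reduceByKey-style sum aggregator, not Spark's actual Aggregator class):

```scala
// Minimal sketch of map-side combine for a sum aggregator, assuming
// reduceByKey(_ + _) semantics.
def combineLocally(records: Iterator[(String, Int)]): Iterator[(String, Int)] = {
  val combined = scala.collection.mutable.Map.empty[String, Int]
  for ((k, v) <- records) {
    // Merge each value into the running combiner for its key.
    combined(k) = combined.getOrElse(k, 0) + v
  }
  combined.iterator
}

// ("hi",1), ("hi",1), ("spark",1) collapses to ("hi",2), ("spark",1).
val out = combineLocally(Iterator("hi" -> 1, "hi" -> 1, "spark" -> 1)).toMap
println(out == Map("hi" -> 2, "spark" -> 1))  // true
```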

Here, shuffle is a member variable of HashShuffleWriter. The writer for each bucketId is obtained through the ShuffleBlockManager's forMapTask method, whose source is as follows:
       
       
        
    /**
     * Get a ShuffleWriterGroup for each map task.
     */
    def forMapTask(shuffleId: Int, mapId: Int, numBuckets: Int, serializer: Serializer,
        
        
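The segment bookkeeping described earlier can be sketched in plain Scala (hypothetical names, no Spark dependency): each consolidated file records, per ShuffleMapTask, the offset and length of that task's segment, which is what lets a reader locate one task's output inside a shared ShuffleBlockFile.

```scala
// Hypothetical model of a consolidated shuffle file's index: each map
// task that appends to the file records its segment's (offset, length),
// so readers can slice out just that task's data.
final case class Segment(mapId: Int, offset: Long, length: Long)

final class ConsolidatedFileIndex {
  private var nextOffset = 0L
  private val segments = scala.collection.mutable.ArrayBuffer.empty[Segment]

  // Called when a ShuffleMapTask finishes appending `length` bytes.
  def append(mapId: Int, length: Long): Segment = {
    val seg = Segment(mapId, nextOffset, length)
    segments += seg
    nextOffset += length
    seg
  }

  // Look up where a given map task's data lives in the shared file.
  def locate(mapId: Int): Option[Segment] = segments.find(_.mapId == mapId)
}

val index = new ConsolidatedFileIndex()
index.append(mapId = 0, length = 100)
index.append(mapId = 1, length = 50)
println(index.locate(1))  // Some(Segment(1,100,50))
```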
