/** Spark SQL Source Code Analysis series Article */
Following the previous article, Spark SQL Catalyst Source Code Analysis: Physical Plan, this article walks through the implementation details of the physical plan's toRdd.
We all know that a SQL statement does not actually run until collect() is called on it: that is when the Spark job is submitted and the final RDD is computed.
lazy val toRdd: RDD[Row] = executedPlan.execute()
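For orientation, here is a minimal usage sketch showing where this laziness shows up. It assumes a SQLContext named sqlContext and an already-registered table named src; both names are placeholders, not anything from the source above. Nothing runs when sql() is called; the job is triggered by collect().

  // Hypothetical setup: sqlContext is an org.apache.spark.sql.SQLContext and a table "src" is registered.
  val query = sqlContext.sql("SELECT key, value FROM src WHERE key > 10")
  // Nothing has executed yet: toRdd / executedPlan.execute() is lazy.
  val rows = query.collect()   // collect() materializes the plan and runs the Spark job
  rows.foreach(println)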
The SparkPlan operators basically fall into four categories: the BasicOperator basic operators, plus the somewhat more complex Join, Aggregate, and Sort operators.
First, BasicOperator
1.1 Project
The general idea of Project is: given a list of expressions Seq[NamedExpression] and an input Row, evaluate each expression (via the expression's eval) against the row and produce a new Row. The implementation of Project calls its child.execute() and then uses mapPartitions to apply a conversion function to each partition.
That function f is simply a new MutableProjection, which is applied to every row of the partition to perform the conversion.
case class Project(projectList: Seq[NamedExpression], child: SparkPlan) extends UnaryNode {
  override def output = projectList.map(_.toAttribute)

  override def execute() = child.execute().mapPartitions { iter =>   // the mapping function f applied to each partition
    @transient val reusableProjection = new MutableProjection(projectList)
    iter.map(reusableProjection)
  }
}
Looking at the definition of MutableProjection, we can see that it is the process of binding references to a schema and then evaluating: it converts an input Row into a Row whose columns follow the defined schema. If the input Row already carries a schema, the incoming Seq[Expression] is bound against that schema.
case class MutableProjection(expressions: Seq[Expression]) extends (Row => Row) {
  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
    this(expressions.map(BindReferences.bindReference(_, inputSchema)))   // bind the expressions to the schema

  private[this] val exprArray = expressions.toArray
  private[this] val mutableRow = new GenericMutableRow(exprArray.size)    // the new row
  def currentValue: Row = mutableRow

  def apply(input: Row): Row = {
    var i = 0
    while (i < exprArray.length) {
      mutableRow(i) = exprArray(i).eval(input)   // evaluate each expression against the input row to fill the new row
      i += 1
    }
    mutableRow   // return the new row
  }
}
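To see what Project does per partition without dragging in Catalyst, here is a simplified standalone analogy (not the Catalyst API): rows are plain Seq[Any], "expressions" are plain functions, and one mutable buffer is reused across the partition just like GenericMutableRow.

  // Simplified analogy of Project/MutableProjection in plain Scala.
  type SimpleRow = Seq[Any]

  def project(exprs: Seq[SimpleRow => Any])(iter: Iterator[SimpleRow]): Iterator[SimpleRow] = {
    val buffer = new Array[Any](exprs.length)   // reused per partition, like GenericMutableRow
    iter.map { row =>
      var i = 0
      while (i < exprs.length) {
        buffer(i) = exprs(i)(row)               // eval each "expression" against the input row
        i += 1
      }
      buffer.toVector                           // copy the projected values out into an immutable row
    }
  }

  // usage: keep column 0 and upper-case column 1
  val projected = project(Seq(_(0), r => r(1).toString.toUpperCase))(
    Iterator(Seq(1, "a"), Seq(2, "b")))
  projected.foreach(println)   // Vector(1, A), Vector(2, B)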
1.2 Filter
The implementation of Filter evaluates the given condition expression against each input Row, returning a Boolean. If the expression evaluates to true, the row in that partition is kept; otherwise it is filtered out.
case class Filter(condition: Expression, child: SparkPlan) extends UnaryNode {
  override def output = child.output

  override def execute() = child.execute().mapPartitions { iter =>
    iter.filter(condition.eval(_).asInstanceOf[Boolean])   // evaluate the condition expression against each input row
  }
}
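In plain RDD terms, the same shape is just mapPartitions plus iter.filter; a small sketch with made-up data (the condition v > 3 stands in for condition.eval):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("filter-sketch").setMaster("local[2]"))
  val rows = sc.parallelize(Seq(("a", 1), ("b", 5), ("c", 10)))
  // mapPartitions + iter.filter, mirroring Filter.execute(): keep rows where the predicate is true
  val kept = rows.mapPartitions(iter => iter.filter { case (_, v) => v > 3 })
  kept.collect().foreach(println)   // (b,5), (c,10)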
1.3 Sample
The Sample operation simply takes the RDD returned by child.execute() and calls the RDD's native sample method on it.
case class Sample(fraction: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends UnaryNode {
  override def output = child.output

  // TODO: How to pick seed?
  override def execute() = child.execute().sample(withReplacement, fraction, seed)
}
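So Sample reduces entirely to the public RDD sample API; a minimal standalone sketch with made-up data:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("sample-sketch").setMaster("local[2]"))
  val child = sc.parallelize(1 to 1000)
  // The Sample operator forwards (withReplacement, fraction, seed) straight to RDD.sample
  val sampled = child.sample(withReplacement = false, fraction = 0.1, seed = 42L)
  println(sampled.count())   // roughly 100 elements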
1.4 Union
The Union operation supports a union over multiple sub-queries, so the child passed in is a Seq[SparkPlan]. Its execute() calls execute() on every child, i.e. it collects the result RDD of each sub-query's select, and then merges all of them by calling SparkContext's union method.
case class Union(children: Seq[SparkPlan])(@transient sqlContext: SQLContext) extends SparkPlan {
  // TODO: attributes output by union should be distinct for nullability purposes
  override def output = children.head.output
  override def execute() = sqlContext.sparkContext.union(children.map(_.execute()))   // union of the sub-query result RDDs
  override def otherCopyArgs = sqlContext :: Nil
}
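A quick sketch of the same call using only public APIs, where each small RDD stands in for one child sub-query's execute() result:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("union-sketch").setMaster("local[2]"))
  // Each element stands in for one child's execute() result
  val childResults = Seq(sc.parallelize(Seq(1, 2)), sc.parallelize(Seq(3, 4)), sc.parallelize(Seq(5)))
  // Union.execute() is essentially SparkContext.union over the children's RDDs
  val unioned = sc.union(childResults)
  println(unioned.collect().toList)   // List(1, 2, 3, 4, 5)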
1.5 Limit
The Limit operation is also available in the RDD's native API, namely take(). However, the implementation of Limit distinguishes two cases. In the first, Limit is the final operator, as in select xxx from yyy limit zzz, and is driven through executeCollect(), so take() is used directly on the driver. In the second, Limit is not the final operator, i.e. further operators consume its output; in that case a local limit is applied in each partition, the result is repartitioned into a single partition, and the global limit is computed there.
case class Limit(limit: Int, child: SparkPlan)(@transient sqlContext: SQLContext) extends UnaryNode {
  // TODO: Implement a partition local limit, and use a strategy to generate the proper limit plan:
  // partition local limit -> exchange into one partition -> partition local limit again

  override def otherCopyArgs = sqlContext :: Nil
  override def output = child.output

  override def executeCollect() = child.execute().map(_.copy()).take(limit)   // call take directly on the driver

  override def execute() = {
    val rdd = child.execute().mapPartitions { iter =>
      val mutablePair = new MutablePair[Boolean, Row]()
      iter.take(limit).map(row => mutablePair.update(false, row))   // local limit in each partition
    }
    val part = new HashPartitioner(1)
    val shuffled = new ShuffledRDD[Boolean, Row, Row, MutablePair[Boolean, Row]](rdd, part)   // shuffle to repartition into one partition
    shuffled.setSerializer(new SparkSqlSerializer(new SparkConf(false)))
    shuffled.mapPartitions(_.take(limit).map(_._2))   // finally take the limit on the single partition
  }
}
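In public RDD API terms, the two code paths roughly correspond to the following sketch; repartition(1) stands in for the ShuffledRDD with HashPartitioner(1) used by the real operator.

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("limit-sketch").setMaster("local[4]"))
  val child = sc.parallelize(1 to 100, numSlices = 4)
  val limit = 5

  // Path 1: LIMIT is the final operator -> take() directly on the driver
  val viaCollect = child.take(limit)

  // Path 2: LIMIT feeds further operators -> per-partition take, shuffle to one partition, take again
  val viaExecute = child
    .mapPartitions(_.take(limit))   // local limit in each partition
    .repartition(1)                 // stands in for ShuffledRDD + HashPartitioner(1)
    .mapPartitions(_.take(limit))   // global limit on the single partition

  println(viaCollect.toList)
  println(viaExecute.collect().toList)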
1.6 TakeOrdered
TakeOrdered is a sorted limit N, which typically appears as a limit after a sort by operator. It can simply be understood as a top-N operator.
case class TakeOrdered(limit: Int, sortOrder: Seq[SortOrder], child: SparkPlan)(@transient sqlContext: SQLContext)
  extends UnaryNode {

  override def otherCopyArgs = sqlContext :: Nil
  override def output = child.output

  @transient lazy val ordering = new RowOrdering(sortOrder)   // the sorting is driven by a RowOrdering

  override def executeCollect() = child.execute().map(_.copy()).takeOrdered(limit)(ordering)

  // TODO: Terminal split should be implemented differently from non-terminal split.
  // TODO: Pick num splits based on |limit|.
  override def execute() = sqlContext.sparkContext.makeRDD(executeCollect(), 1)
}
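The heavy lifting is the RDD's own takeOrdered; a small sketch where an Ordering on the second tuple element stands in for RowOrdering(sortOrder):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("takeordered-sketch").setMaster("local[2]"))
  val child = sc.parallelize(Seq(("b", 3), ("a", 1), ("c", 2)))
  val byValue = Ordering.by[(String, Int), Int](_._2)   // stands in for RowOrdering
  // top-N: the smallest `limit` rows under the ordering, computed like executeCollect()
  println(child.takeOrdered(2)(byValue).toList)   // List((a,1), (c,2))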
1.7 Sort
Sort also relies on the RowOrdering class to define the ordering. child.execute() is mapped over each partition, and within each partition the rows are copied into an array and sorted according to the RowOrdering, producing a new ordered collection. The actual sorting is done by calling Scala's sorted method on that array, passing in the ordering.
case class Sort(sortOrder: Seq[SortOrder], global: Boolean, child: SparkPlan) extends UnaryNode {
  override def requiredChildDistribution =
    if (global) OrderedDistribution(sortOrder) :: Nil else UnspecifiedDistribution :: Nil

  @transient lazy val ordering = new RowOrdering(sortOrder)   // the sort order

  override def execute() = attachTree(this, "sort") {
    // TODO: Optimize sorting operation?
    child.execute().mapPartitions(
      iterator => iterator.map(_.copy()).toArray.sorted(ordering).iterator,   // each partition calls sorted with the ordering rule
      preservesPartitioning = true)
  }

  override def output = child.output
}
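The per-partition sorting can be sketched with public RDD and Scala APIs only; here Ordering.Int stands in for RowOrdering(sortOrder):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("sort-sketch").setMaster("local[2]"))
  val child = sc.parallelize(Seq(5, 3, 9, 1, 7), numSlices = 2)
  val ordering = Ordering.Int   // stands in for RowOrdering(sortOrder)

  // Each partition is materialized into an array and sorted with Scala's sorted(ordering),
  // the same shape as Sort.execute(); the partitioning is preserved.
  val sortedPerPartition = child.mapPartitions(
    iter => iter.toArray.sorted(ordering).iterator,
    preservesPartitioning = true)

  sortedPerPartition.glom().collect().foreach(a => println(a.toList))   // each printed list is sorted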
1.8 ExistingRdd
ExistingRdd converts an existing RDD of Scala products (for example case class instances) into an RDD[Row]: each product's fields are converted to Catalyst values and copied into a reusable mutable row.
object ExistingRdd {
  def convertToCatalyst(a: Any): Any = a match {
    case o: Option[_] => o.orNull
    case s: Seq[Any] => s.map(convertToCatalyst)
    case p: Product => new GenericRow(p.productIterator.map(convertToCatalyst).toArray)
    case other => other
  }

  def productToRowRdd[A <: Product](data: RDD[A]): RDD[Row] = {
    data.mapPartitions { iterator =>
      if (iterator.isEmpty) {
        Iterator.empty
      } else {
        val bufferedIterator = iterator.buffered
        val mutableRow = new GenericMutableRow(bufferedIterator.head.productArity)

        bufferedIterator.map { r =>
          var i = 0
          while (i < mutableRow.length) {
            mutableRow(i) = convertToCatalyst(r.productElement(i))
            i += 1
          }
          mutableRow
        }
      }
    }
  }

  def fromProductRdd[A <: Product: TypeTag](productRdd: RDD[A]) = {
    ExistingRdd(ScalaReflection.attributesFor[A], productToRowRdd(productRdd))
  }
}
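A sketch of what this conversion does, using a plain case class and only public APIs; the row type is approximated by an IndexedSeq[Any], whereas the real code reuses one GenericMutableRow per partition and converts values to Catalyst types:

  import org.apache.spark.{SparkConf, SparkContext}

  case class Person(name: String, age: Int)

  val sc = new SparkContext(new SparkConf().setAppName("existingrdd-sketch").setMaster("local[2]"))
  val data = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))

  // Like productToRowRdd: walk each Product's fields and copy them into a row-like structure.
  val rows = data.mapPartitions { iter =>
    iter.map(p => (0 until p.productArity).map(p.productElement))
  }
  rows.collect().foreach(println)   // Vector(alice, 30), Vector(bob, 25)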
Second, Join-related operators
HashJoin: before walking through the join operators, we need to look at the HashJoin trait defined in Joins.scala under the execution package.
The join operators mainly include BroadcastHashJoin, LeftSemiJoinHash, and ShuffledHashJoin, all of which implement the HashJoin trait.
The main members of the HashJoin trait are: buildSide, which marks whether the left or the right side is the build (reference) side; leftKeys, the join-key expressions of the left child; rightKeys, the join-key expressions of the right child; left, the left child's physical plan; and right, the right child's physical plan.
buildSideKeyGenerator is a Projection that evaluates the build side's key expressions against an incoming Row. streamSideKeyGenerator produces a MutableProjection that evaluates the stream side's key expressions against an incoming Row. If buildSide is BuildLeft, the build side is the left table, and the table being joined against it, the right table, is the stream side.
The key operation of HashJoin is joinIterators, which, in a nutshell, joins the two tables given their respective Iterator[Row]. The approach is: 1) traverse the build side, compute each buildKey, and fill a HashMap of the form (buildKey, rows); 2) traverse the streamed side, compute each streamedKey, and look it up in the HashMap to find the matching rows; 3) produce a JoinedRow, which concatenates the two rows.
See the code comments:
trait HashJoin {
  val leftKeys: Seq[Expression]
  val rightKeys: Seq[Expression]
  val buildSide: BuildSide
  val left: SparkPlan
  val right: SparkPlan

  lazy val (buildPlan, streamedPlan) = buildSide match {   // pattern match: wrap the two physical plans into a Tuple2; BuildLeft gives (left, right), otherwise (right, left)
    case BuildLeft => (left, right)
    case BuildRight => (right, left)
  }

  lazy val (buildKeys, streamedKeys) = buildSide match {   // same pattern match, wrapping the key expressions into a Tuple2
    case BuildLeft => (leftKeys, rightKeys)
    case BuildRight => (rightKeys, leftKeys)
  }

  def output = left.output ++ right.output

  @transient lazy val buildSideKeyGenerator = new Projection(buildKeys, buildPlan.output)   // computes the build-side key for a row from its expressions
  @transient lazy val streamSideKeyGenerator =                                              // computes the stream-side key for a row from its expressions
    () => new MutableProjection(streamedKeys, streamedPlan.output)

  def joinIterators(buildIter: Iterator[Row], streamIter: Iterator[Row]): Iterator[Row] = {
    // joins the build-side Iterator[Row] with the stream-side Iterator[Row] and returns the joined Iterator[Row]
    // TODO: Use Spark's HashMap implementation.
    val hashTable = new java.util.HashMap[Row, ArrayBuffer[Row]]()   // matching is implemented with a HashMap
    var currentRow: Row = null

    // Create a mapping of buildKeys -> rows
    while (buildIter.hasNext) {   // iterate over the build iterator once, forming (rowKey, rows); similar to wordcount, but collecting rows instead of accumulating a count
      currentRow = buildIter.next()
      val rowKey = buildSideKeyGenerator(currentRow)   // compute rowKey as the HashMap key
      if (!rowKey.anyNull) {
        val existingMatchList = hashTable.get(rowKey)
        val matchList = if (existingMatchList == null) {
          val newMatchList = new ArrayBuffer[Row]()
          hashTable.put(rowKey, newMatchList)          // (rowKey, matchedRowList)
          newMatchList
        } else {
          existingMatchList
        }
        matchList += currentRow.copy()
      }
    }

    new Iterator[Row] {   // finally, probe the build-side HashMap with the streamed-side row keys
      private[this] var currentStreamedRow: Row = _
      private[this] var currentHashMatches: ArrayBuffer[Row] = _
      private[this] var currentMatchPosition: Int = -1

      // Mutable per row objects.
      private[this] val joinRow = new JoinedRow
      private[this] val joinKeys = streamSideKeyGenerator()

      override final def hasNext: Boolean =
        (currentMatchPosition != -1 && currentMatchPosition < currentHashMatches.size) ||
          (streamIter.hasNext && fetchNext())

      override final def next() = {
        val ret = buildSide match {
          case BuildRight => joinRow(currentStreamedRow, currentHashMatches(currentMatchPosition))   // build right: the streamed row goes on the left, the matched build-side row on the right
          case BuildLeft => joinRow(currentHashMatches(currentMatchPosition), currentStreamedRow)    // build left: the other way around
        }
        currentMatchPosition += 1
        ret
      }

      /**
       * Searches the streamed iterator for the next row that has at least one match in hashtable.
       *
       * @return true if the search is successful, and false if the streamed iterator runs out of tuples.
       */
      private final def fetchNext(): Boolean = {
        currentHashMatches = null
        currentMatchPosition = -1

        while (currentHashMatches == null && streamIter.hasNext) {
          currentStreamedRow = streamIter.next()
          if (!joinKeys(currentStreamedRow).anyNull) {
            currentHashMatches = hashTable.get(joinKeys.currentValue)   // probe the build-side hashTable with the streamed row's key
          }
        }

        if (currentHashMatches == null) {
          false
        } else {
          currentMatchPosition = 0
          true
        }
      }
    }
  }
}
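Stripped of the Catalyst types, the core of joinIterators is an ordinary build-then-probe hash join over two iterators. A self-contained sketch in plain Scala, with explicit (key, value) pairs instead of Rows and key generators:

  import scala.collection.mutable.ArrayBuffer

  // Simplified joinIterators: build a key -> rows map from the build side, then probe it with the stream side.
  def hashJoin[K, V](buildIter: Iterator[(K, V)], streamIter: Iterator[(K, V)]): Iterator[(K, V, V)] = {
    val hashTable = new java.util.HashMap[K, ArrayBuffer[V]]()

    // 1. build phase: group the build-side rows by key (like wordcount, but collecting rows)
    buildIter.foreach { case (k, v) =>
      var matches = hashTable.get(k)
      if (matches == null) {
        matches = new ArrayBuffer[V]()
        hashTable.put(k, matches)
      }
      matches += v
    }

    // 2. probe phase: for every stream-side row, emit one joined tuple per matching build-side row
    streamIter.flatMap { case (k, sv) =>
      val matches = hashTable.get(k)
      if (matches == null) Iterator.empty
      else matches.iterator.map(bv => (k, bv, sv))
    }
  }

  val build = Iterator("a" -> 1, "b" -> 2, "a" -> 3)
  val stream = Iterator("a" -> 10, "c" -> 30)
  hashJoin(build, stream).foreach(println)   // (a,1,10), (a,3,10)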
JoinedRow implements the concatenation of the two Rows: its copy() effectively produces a new array that merges the two underlying rows.
class JoinedRow extends Row {
  private[this] var row1: Row = _
  private[this] var row2: Row = _
  ...
  def copy() = {
    val totalSize = row1.size + row2.size
    val copiedValues = new Array[Any](totalSize)
    var i = 0
    while (i < totalSize) {
      copiedValues(i) = apply(i)
      i += 1
    }
    new GenericRow(copiedValues)   // returns a new merged row
  }
}
2.1 LeftSemiJoinHash
Left semi join: not much to say here; in earlier Hive versions it was the construct used to replace IN and EXISTS sub-queries. The join keys of the right table are put into a HashSet, and the left table is then traversed to check whether each left-table join key has a match.
case class LeftSemiJoinHash(leftKeys: Seq[Expression], rightKeys: Seq[Expression], left: SparkPlan, right: SparkPlan)
  extends BinaryNode with HashJoin {

  val buildSide = BuildRight   // the right table is the build side

  override def requiredChildDistribution =
    ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil

  override def output = left.output

  def execute() = {
    buildPlan.execute().zipPartitions(streamedPlan.execute()) { (buildIter, streamIter) =>
      // once the right table's physical plan has produced its RDD, zipPartitions pairs its partitions
      // with the left table's, and the semi join is computed as follows
      val hashSet = new java.util.HashSet[Row]()
      var currentRow: Row = null

      // Create a Hash set of buildKeys
      while (buildIter.hasNext) {
        currentRow = buildIter.next()
        val rowKey = buildSideKeyGenerator(currentRow)
        if (!rowKey.anyNull) {
          val keyExists = hashSet.contains(rowKey)
          if (!keyExists) {
            hashSet.add(rowKey)
          }
        }
      }

      val joinKeys = streamSideKeyGenerator()
      streamIter.filter(current => {
        !joinKeys(current).anyNull && hashSet.contains(joinKeys.currentValue)
      })
    }
  }
}
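A simplified sketch of the same idea in plain Scala, with (key, value) pairs instead of Rows: collect the right table's keys into a HashSet, then filter the left table against it.

  // Simplified left semi join: keep a left-table row if its key appears in the right table at all.
  val right = Seq(("a", 100), ("b", 200))          // build side: the right table
  val left  = Seq(("a", 1), ("b", 2), ("c", 3))    // stream side: the left table

  val rightKeys = new java.util.HashSet[String]()
  right.foreach { case (k, _) => rightKeys.add(k) }   // hash set of the right table's join keys

  val result = left.filter { case (k, _) => rightKeys.contains(k) }
  println(result)   // List((a,1), (b,2)) -- the output columns come only from the left table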
2.2 BroadcastHashJoin
BroadcastHashJoin is, as the name suggests, a broadcast hash join; it is an implementation of the inner hash join. It uses a concurrent Future to asynchronously broadcast the collected result of running buildPlan. Once the broadcast table is available, the streamedPlan side is matched against it. The implementation is a mapPartitions over the streamed RDD, with HashJoin's joinIterators producing the final joined result in each partition.
case class BroadcastHashJoin(leftKeys: Seq[Expression], rightKeys: Seq[Expression], buildSide: BuildSide,
    left: SparkPlan, right: SparkPlan)(@transient sqlContext: SQLContext) extends BinaryNode with HashJoin {

  override def otherCopyArgs = sqlContext :: Nil

  override def outputPartitioning: Partitioning = left.outputPartitioning

  override def requiredChildDistribution =
    UnspecifiedDistribution :: UnspecifiedDistribution :: Nil

  @transient
  lazy val broadcastFuture = future {   // broadcast the build-side table via SparkContext
    sqlContext.sparkContext.broadcast(buildPlan.executeCollect())
  }

  def execute() = {
    val broadcastRelation = Await.result(broadcastFuture, 5.minute)

    streamedPlan.execute().mapPartitions { streamedIter =>
      joinIterators(broadcastRelation.value.iterator, streamedIter)   // call joinIterators in each partition's map
    }
  }
}
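With public Spark APIs only, the same pattern (broadcast the small build side, probe it inside mapPartitions of the streamed side) can be sketched as follows, using a plain Map in place of the build-side row table:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("broadcast-join-sketch").setMaster("local[2]"))

  val small = Map("a" -> 100, "b" -> 200)       // the collected build-side table
  val broadcastRelation = sc.broadcast(small)   // broadcast it once to all executors

  val streamed = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
  // Probe the broadcast table inside each partition, like joinIterators(broadcastRelation.value.iterator, streamedIter)
  val joined = streamed.mapPartitions { iter =>
    val table = broadcastRelation.value
    iter.flatMap { case (k, v) => table.get(k).map(bv => (k, v, bv)) }
  }
  joined.collect().foreach(println)   // (a,1,100), (b,2,200)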
2.3 ShuffledHashJoin
ShuffledHashJoin, as the name implies, needs to shuffle its data. Its outputPartitioning is the left child's partitioning, and the data will be shuffled according to that partitioning. The RDD's zipPartitions method is then used to zip each pair of partitions together. Note that requiredChildDistribution here is ClusteredDistribution, which will be matched against HashPartitioning. The partitioning details are not repeated here; see Partitioning under org.apache.spark.sql.catalyst.plans.physical.
case class ShuffledHashJoin(leftKeys: Seq[Expression], rightKeys: Seq[Expression], buildSide: BuildSide,
    left: SparkPlan, right: SparkPlan) extends BinaryNode with HashJoin {

  override def outputPartitioning: Partitioning = left.outputPartitioning

  override def requiredChildDistribution =
    ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil

  def execute() = {
    buildPlan.execute().zipPartitions(streamedPlan.execute()) {
      (buildIter, streamIter) => joinIterators(buildIter, streamIter)
    }
  }
}
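A sketch with public APIs only: hash-partition both sides on the join key so that equal keys land in the same partition index, then zipPartitions and hash-join each pair of partitions locally (a groupBy stands in for the local hash table of joinIterators).

  import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
  import org.apache.spark.SparkContext._   // pair-RDD implicits, needed on older Spark versions

  val sc = new SparkContext(new SparkConf().setAppName("shuffled-join-sketch").setMaster("local[2]"))
  val numPartitions = 4

  // Shuffle both sides with the same HashPartitioner so matching keys are co-located
  val build    = sc.parallelize(Seq(("a", 100), ("b", 200))).partitionBy(new HashPartitioner(numPartitions))
  val streamed = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3))).partitionBy(new HashPartitioner(numPartitions))

  // zipPartitions pairs up partition i of both RDDs; each pair is joined with a local hash map
  val joined = build.zipPartitions(streamed) { (buildIter, streamIter) =>
    val table = buildIter.toSeq.groupBy(_._1)
    streamIter.flatMap { case (k, v) => table.getOrElse(k, Nil).map { case (_, bv) => (k, bv, v) } }
  }
  joined.collect().foreach(println)   // (a,100,1), (b,200,2)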
To be continued:)
Original article; please give attribution when reposting:
Reposted from the blog of OopsOutOfMemory (Shengli).
Link to this article: http://blog.csdn.net/oopsoom/article/details/38274621
Note: this article is released under the Attribution-NonCommercial-NoDerivs 2.5 China Mainland (CC BY-NC-ND 2.5 CN) license. You are welcome to repost, share, and comment, but please keep the author's attribution and the article link. Please contact me about commercial use or other licensing questions.