Spark SQL Source Analysis: In-Memory Columnar Storage's cache table


/** Spark SQL Source Analysis series Article */

Spark SQL can cache data in memory: by invoking cache table tableName, a table can be cached in memory, which greatly improves query efficiency.

This involves how the data is stored in memory. We know that relational data can be kept in a row-based storage structure, a column-based storage structure, or a hybrid of the two, that is, row-based storage, column-based storage, and PAX storage.

How is Spark SQL's in-memory data organized?

Spark SQL loads data into memory in a columnar storage structure, called In-Memory Columnar Storage.

Storing Java objects directly creates significant memory overhead and amounts to a row-based storage structure: queries that touch only some columns are slower and less efficient than with a column-oriented structure, even though the data is already loaded in memory.

Row-based Java object storage:

Large memory overhead, prone to full GC, and slow column-oriented queries.


Column-based ByteBuffer storage (Spark SQL):

Small memory overhead and faster column-oriented queries.
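
To make the contrast concrete, here is a minimal sketch (my own illustration, not Spark source; Record, keyColumn and the other names are hypothetical) of the same records held as per-row Java objects versus as one ByteBuffer per column:

// Minimal sketch: row objects vs. one compact buffer per column,
// which is roughly the idea behind Spark SQL's in-memory columnar cache.
import java.nio.ByteBuffer

object RowVsColumnSketch extends App {
  // Row-based: one JVM object per record (object headers, boxed fields, pointers).
  case class Record(key: Int, value: String)
  val rows = Seq(Record(238, "val_238"), Record(86, "val_86"), Record(311, "val_311"))

  // Column-based: one compact buffer per column; scanning `key` never touches `value`.
  val keyColumn = ByteBuffer.allocate(4 * rows.length)
  rows.foreach(r => keyColumn.putInt(r.key))
  keyColumn.flip()

  // A query over a single column reads one contiguous buffer, with no per-row objects.
  var sum = 0L
  while (keyColumn.hasRemaining) sum += keyColumn.getInt()
  println(s"sum(key) = $sum") // 635
}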


Spark SQL's In-Memory Columnar Storage is located in the org.apache.spark.sql.columnar package under Spark's sql module:

The core classes are ColumnBuilder, InMemoryColumnarTableScan, ColumnAccessor, and ColumnType.

If a column uses compression, the compression sub-package provides the corresponding column builder and column accessor classes.

I. Primer

When we call the cache table command in Spark SQL, a CacheCommand is generated, and this command is a physical plan.

scala> val cached = sql("cache table src")
cached: org.apache.spark.sql.SchemaRDD =
SchemaRDD[0] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
CacheCommand src, true

The output shows the tableName, src, together with a Boolean flag indicating whether to cache.

Let's look at the structure of CacheCommand:

CacheCommand supports two operations: one loads the data source into memory, and the other unloads the data source from memory.

These correspond to cacheTable and uncacheTable in SQLContext.
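
As a quick illustration (a hedged usage sketch: a SQLContext named sqlContext and a registered table named src are assumed to already exist), the SQL command and the SQLContext API achieve the same thing:

sqlContext.cacheTable("src")     // programmatic equivalent of the cache table command
sqlContext.sql("SELECT count(*) FROM src").collect() // now served from the columnar cache
sqlContext.uncacheTable("src")   // removes the table's data from memory again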

case class CacheCommand(tableName: String, doCache: Boolean)(@transient context: SQLContext)
  extends LeafNode with Command {

  override protected[sql] lazy val sideEffectResult = {
    if (doCache) {
      context.cacheTable(tableName)    // cache the table into memory
    } else {
      context.uncacheTable(tableName)  // remove this table's data from memory
    }
    Seq.empty[Any]
  }

  override def execute(): RDD[Row] = {
    sideEffectResult
    context.emptyResult
  }

  override def output: Seq[Attribute] = Seq.empty
}
When cached.collect() is called, the cache or uncache operation is performed according to the command; here we perform the cache operation.

cached.collect() will call the cacheTable function in SQLContext:


First, the table relation is looked up through the catalog to construct a SchemaRDD.

/** Returns the specified table as a SchemaRDD */
def table(tableName: String): SchemaRDD =
  new SchemaRDD(this, catalog.lookupRelation(None, tableName))

It then takes the analyzed plan of that SchemaRDD and pattern-matches on it to construct an InMemoryRelation:

/** Caches the specified table in-memory. */
def cacheTable(tableName: String): Unit = {
  // constructs a SchemaRDD and runs the analyzer on its plan
  val currentTable = table(tableName).queryExecution.analyzed
  val asInMemoryRelation = currentTable match {
    case _: InMemoryRelation =>
      // already an InMemoryRelation, return its logical plan as-is
      currentTable.logicalPlan
    case _ =>
      // otherwise (the usual case when caching for the first time), build an InMemoryRelation
      InMemoryRelation(useCompression, columnBatchSize, executePlan(currentTable).executedPlan)
  }
  // register the built InMemoryRelation with the catalog
  catalog.registerTable(None, tableName, asInMemoryRelation)
}

II. InMemoryRelation

InMemoryRelation inherits from LogicalPlan; it is a new TreeNode (i.e. a Catalyst plan node) introduced in Spark 1.1's Spark SQL. TreeNode now comes in four kinds:

1. BinaryNode: binary node (two children)

2. LeafNode: leaf node

3. UnaryNode: unary node (single child)

4. InMemoryRelation: in-memory relation node


The class diagram is as follows:

It is worth noting that _cachedColumnBuffers is a private field of type RDD[Array[ByteBuffer]].

What it wraps is the column-oriented ByteBuffer storage. As mentioned earlier, compared with storing records as plain Java objects, using ByteBuffers significantly improves storage efficiency and reduces memory consumption, and column-oriented queries are fast.


The concrete implementation of InMemoryRelation is as follows:

Constructing an InMemoryRelation requires the relation's output attributes, a useCompression flag indicating whether to compress (default false), a batchSize specifying how many rows are processed at a time, and a child, which is a SparkPlan.

private[sql] case class InMemoryRelation(
    output: Seq[Attribute],   // output attributes, e.g. [key, value] for the src table
    useCompression: Boolean,  // whether to compress when building the columns, default false
    batchSize: Int,           // batch size (rows per column batch)
    child: SparkPlan)         // the concrete child SparkPlan

These can be configured via:

spark.sql.inMemoryColumnarStorage.compressed: set to true to compress the in-memory columnar store.

spark.sql.inMemoryColumnarStorage.batchSize: how many rows to process per batch.

spark.sql.defaultSizeInBytes: the default size used when initializing a column's byte buffer; it is just one of the parameters involved.

These parameters can all be found in the source, in SQLConf:

private[spark] object SQLConf {
  val COMPRESS_CACHED = "spark.sql.inMemoryColumnarStorage.compressed"
  val COLUMN_BATCH_SIZE = "spark.sql.inMemoryColumnarStorage.batchSize"
  val DEFAULT_SIZE_IN_BYTES = "spark.sql.defaultSizeInBytes"
  // ...
}
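
For example (a usage sketch, assuming sqlContext has already been created; the values chosen here are arbitrary), these keys can be set through setConf before caching a table:

sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true") // enable compression
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000") // rows per column batch
sqlContext.cacheTable("src") // the settings take effect for tables cached afterwards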

Back to case class InMemoryRelation:

_cachedColumnBuffers is the handle to the table's final in-memory storage; it is an RDD[Array[ByteBuffer]].

The main caching process:

1. Check whether _cachedColumnBuffers is null. If it is not null, the table has already been cached, and caching it again does not trigger the cache operation.

2. child is a SparkPlan, here a Hive table scan. For testing I use the src table in sbt/sbt hive/console as an example; the operation scans this table. The table has two fields: key is Int and value is String.

3. Get child's output; here the output is the two columns key and value.

4. Perform a mapPartitions operation to process the data of each partition of the current RDD.

5. For each partition, iterate over its data and generate a new iterator, whose elements are Array[ByteBuffer].

6. For each column of child.output, generate a ColumnBuilder; together they form an array of ColumnBuilders.

7. Each ColumnBuilder in the array holds a ByteBuffer.

8. Traverse the records of the original partition, convert each row into columns, and save the data into the ByteBuffers.

9. Finally, call cache on this RDD so that it is cached.

10. Assign the cached RDD to _cachedColumnBuffers.

In summary, this step executes a Hive table scan, redefines mapPartitions on the returned MapPartitionsRDD to convert rows into columns, and finally caches the result in memory.

The whole process is as follows:

  // If the cached column buffers were not passed in, we calculate them in the constructor.
  // As in Spark, the actual work of caching is lazy.
  if (_cachedColumnBuffers == null) { // check whether the table is already cached
    val output = child.output
    /**
     * child.output:
     * res65: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = ArrayBuffer(key#6, value#7)
     */
    val cached = child.execute().mapPartitions { baseIterator =>
      /**
       * child.execute() is a collection of rows; iterating over it:
       * val row1 = child.execute().take(1)
       * res67: Array[org.apache.spark.sql.catalyst.expressions.Row] = Array([238,val_238])
       */
      // Map each partition to an Iterator[Array[ByteBuffer]],
      // corresponding to Java's Iterator<List<ByteBuffer>>
      new Iterator[Array[ByteBuffer]] {
        def next() = {
          // Iterate over each column: the first attribute is key (IntegerType),
          // the second is value (StringType). The result is an array where
          // index 0 is an IntColumnBuilder and index 1 is a StringColumnBuilder.
          val columnBuilders = output.map { attribute =>
            val columnType = ColumnType(attribute.dataType)
            val initialBufferSize = columnType.defaultSize * batchSize
            ColumnBuilder(columnType.typeId, initialBufferSize, attribute.name, useCompression)
          }.toArray

          // a row of the src table looks like [238,val_238]; its length is 2
          var row: Row = null
          var rowCount = 0

          // batchSize defaults to 1000
          while (baseIterator.hasNext && rowCount < batchSize) {
            // traverse each record
            row = baseIterator.next()
            var i = 0
            // here row.length is 2, so i takes the values 0 and 1
            while (i < row.length) {
              // columnBuilders(0) is the IntColumnBuilder, etc.
              // BasicColumnBuilder.appendFrom appends `row(ordinal)` to the column builder
              columnBuilders(i).appendFrom(row, i)
              i += 1
            }
            // this row has been inserted
            rowCount += 1
          }

          // limit and rewind, returns the final columnar byte buffers
          columnBuilders.map(_.build())
        }

        def hasNext = baseIterator.hasNext
      }
    }.cache()

    cached.setName(child.toString)
    _cachedColumnBuffers = cached
  }
III. Columnar Storage
Initializing the ColumnBuilders:

val columnBuilders = output.map { attribute =>
  val columnType = ColumnType(attribute.dataType)
  val initialBufferSize = columnType.defaultSize * batchSize
  ColumnBuilder(columnType.typeId, initialBufferSize, attribute.name, useCompression)
}.toArray

An array is declared here, with one builder corresponding to the storage of each column, for example:



The parameters passed in when each typed builder is initialized are:

initialBufferSize: the initial size of the ByteBuffer shown in the figure at the beginning of the article. How is it calculated?

initialBufferSize = the column type's default size × batchSize, where batchSize defaults to 1000.

Taking the Int type as an example, the initialBufferSize of IntegerType = 4 * 1000.

attribute.name: the field name, e.g. age, name, etc.

ColumnType:

ColumnType encapsulates a type's typeId and that type's defaultSize. It provides extract, append, and getField methods to append data to and retrieve data from the buffer.

For example, IntegerType has typeId 0 and defaultSize 4.
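
The idea can be sketched with a simplified stand-in (SimpleColumnType and SimpleIntColumnType are hypothetical names for illustration, not the Spark classes):

import java.nio.ByteBuffer

// Each type knows its typeId, its default size, and how to append/extract
// values to and from a ByteBuffer.
trait SimpleColumnType[T] {
  def typeId: Int
  def defaultSize: Int
  def append(v: T, buffer: ByteBuffer): Unit
  def extract(buffer: ByteBuffer): T
}

object SimpleIntColumnType extends SimpleColumnType[Int] {
  val typeId = 0      // IntegerType's typeId is 0
  val defaultSize = 4 // an Int occupies 4 bytes
  def append(v: Int, buffer: ByteBuffer): Unit = buffer.putInt(v)
  def extract(buffer: ByteBuffer): Int = buffer.getInt()
}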

See the class diagram for details (not a strictly drawn class diagram; it mainly shows the current type system):


ColumnBuilder:

The main responsibility of ColumnBuilder is to manage the ByteBuffer: initializing the buffer, adding data to it, checking the remaining space, and requesting new space.

initialize is responsible for initializing the buffer.

appendFrom is responsible for appending data.

ensureFreeSpace ensures that the buffer's length can grow dynamically.

The class diagram is as follows:


ByteBuffer initialization process:

Initialization size initialSize: taking Int as an example, the builder initialization above passes in 4 × batchSize = 4 * 1000, so initialSize is about 4KB; if no initialSize is passed in, the default is 1024 * 1024.

The column name and whether compression is needed must also be passed in.

The ByteBuffer reserves 4 bytes at the front for the column type ID, which is described in the ColumnType structure.

override def initialize(
    initialSize: Int,
    columnName: String = "",
    useCompression: Boolean = false) = {
  // if no initial size is passed in, default to 1024 * 1024 bytes
  val size = if (initialSize == 0) DEFAULT_INITIAL_BUFFER_SIZE else initialSize
  this.columnName = columnName
  // Reserves 4 bytes for the column type ID; the buffer's initial length
  // therefore includes this 4-byte type-ID space
  buffer = ByteBuffer.allocate(4 + size * columnType.defaultSize)
  // use native byte order, then write the typeId first
  buffer.order(ByteOrder.nativeOrder()).putInt(columnType.typeId)
}

The storage layout is as follows:

The type ID of Int is 0, and the type ID of String is 7. The actual data is stored after the type ID.
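
For illustration, here is a minimal sketch (plain java.nio code in the REPL, not Spark's ColumnAccessor) of writing and reading back this layout:

import java.nio.{ByteBuffer, ByteOrder}

// 4 bytes of type ID at the front, then the column's values.
val buf = ByteBuffer.allocate(4 + 4 * 3).order(ByteOrder.nativeOrder())
buf.putInt(0)                                  // type ID 0 = IntegerType
Seq(238, 86, 311).foreach(v => buf.putInt(v))  // the data follows the type ID
buf.flip()

val typeId = buf.getInt()                 // reads back 0
val values = Array.fill(3)(buf.getInt())  // reads back 238, 86, 311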

ByteBuffer write process:

Having described the storage structure, the table scan finally starts. After scanning, each row of each partition is traversed and processed:

1. Read each row of each partition.

2. Get the value of each column, find the ByteBuffer at index i in the builders array, and append the value to that ByteBuffer.

while (baseIterator.hasNext && rowCount < batchSize) {
  // traverse each record
  row = baseIterator.next()
  var i = 0
  // here row.length is 2, so i takes the values 0 and 1
  // (still testing with the src table: each row has only 2 fields, key and value)
  while (i < row.length) {
    // columnBuilders(0) is the IntColumnBuilder, etc.
    // BasicColumnBuilder.appendFrom appends `row(ordinal)` to the column builder,
    // i.e. appends the value to the corresponding ByteBuffer
    columnBuilders(i).appendFrom(row, i)
    i += 1
  }
  // this row has been inserted
  rowCount += 1
}
// limit and rewind, returns the final columnar byte buffers
columnBuilders.map(_.build())

Append process:

Based on the current builder's type, the value is fetched from the row at the corresponding index and appended to the builder's ByteBuffer.

override def appendFrom(row: Row, ordinal: Int) {
  // ordinal is the index into the row: 0 is the first column's value, 1 the second.
  // Get the column's value as `field`, then put it into the buffer.
  val field = columnType.getField(row, ordinal)
  buffer = ensureFreeSpace(buffer, columnType.actualSize(field)) // grow dynamically if needed
  columnType.append(field, buffer)
}

ensureFreeSpace:

It operates mainly on the buffer: if the data to append is larger than the remaining space, the buffer is enlarged.

// Make sure the remaining space can hold the data to append; if the remaining space
// is smaller than the required size, reallocate a larger block of memory.
private[columnar] def ensureFreeSpace(orig: ByteBuffer, size: Int) = {
  if (orig.remaining >= size) {
    // the current buffer can hold the data; do nothing and return it as-is
    orig
  } else {
    // otherwise expand
    // grow in steps of initial size
    val capacity = orig.capacity()
    val newSize = capacity + size.max(capacity / 8 + 1)
    val pos = orig.position()
    orig.clear()
    ByteBuffer
      .allocate(newSize)
      .order(ByteOrder.nativeOrder())
      .put(orig.array(), 0, pos)
  }
}
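
As a worked example of the growth step: if the buffer's current capacity is 4096 bytes and a 4-byte Int has to be appended with no space remaining, then newSize = 4096 + max(4, 4096 / 8 + 1) = 4096 + 513 = 4609 bytes. The buffer therefore grows by roughly an eighth of its capacity rather than by only the size of the incoming value.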

......

Finally, cache() is called on the MapPartitionsRDD to cache the RDD and hand it over to Spark's cache management.

At this point, we have cached a Spark SQL table inside Spark's JVM.

IV. Summary

When we think about how data is stored, we usually focus on persistent storage structures, and many efficient ones have emerged over time.

However, with the demand for real-time processing, in-memory databases are receiving more and more attention, and how to optimize an in-memory database's storage structure is both a key point and a difficulty.

The columnar storage in Spark SQL and Shark is an optimization that speeds up column-oriented relational queries and reduces memory consumption. The storage scheme is still relatively simple, though: there is no additional metadata or index to further improve query efficiency. I hope to learn more about in-memory storage going forward.

--eof--

Original article; when reprinting, please cite the source: http://blog.csdn.net/oopsoom/article/details/39525483
