LevelDB SSTable-1

Source: Internet
Author: User

Creating an sstable file

Knowing the storage format of the sstable file and how data blocks are organized, we can now analyze how an sstable file is created. The relevant code is in table_builder.h/.cc and block_builder.h/.cc (which builds the blocks).

6.4.1 TableBuilder class

The class that builds an sstable file is TableBuilder. It exposes a small set of methods for adding k/v pairs, flushing to the file, and so on, and it relies on BlockBuilder to build blocks.

The main TableBuilder interfaces are explained below:

> void Add(const Slice& key, const Slice& value) adds a new {key, value} pair to the table being built. According to the comparator specified in the options, key must sort after every previously added key.

> void Flush() flushes all currently buffered k/v pairs to the file. This is an advanced method that most clients do not need to call directly.

> Status Finish() finishes building the table. After it returns, the builder no longer uses the WritableFile that was passed in.

> void Abandon() ends the build and discards the buffered contents. After it returns, the builder no longer uses the WritableFile that was passed in. It only sets closed to true and takes no other action.

Once Finish() or Abandon() has been called, Add() and Flush() must not be called again.

The classes involved are shown in Figure 6.3-1.

Figure 6.3-1

WritableFile is the same file abstraction used for the operation log, and memory-mapped files are used here as well. Options holds the settings that the caller can configure.

TableBuilder has a single member variable, Rep* rep_. The members of the Rep struct are in effect all of TableBuilder's member variables; the likely intent is to hide the internal details. Rep is defined in the .cc file, so it is opaque to outside code.
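The Rep arrangement is the familiar pimpl idiom. The following is a minimal sketch of that idiom only: the class name Builder and its two members are hypothetical and much smaller than LevelDB's real TableBuilder, and the header part and the .cc part are shown in a single listing for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// --- What would live in the header (hypothetical Builder, not LevelDB's) ---
class Builder {
 public:
  Builder();
  ~Builder();
  Builder(const Builder&) = delete;
  Builder& operator=(const Builder&) = delete;

  void Add(const std::string& key);
  int64_t NumEntries() const;

 private:
  struct Rep;  // Only declared here; callers never see its members.
  Rep* rep_;
};

// --- What would live in the .cc file ---
struct Builder::Rep {
  std::string last_key;
  int64_t num_entries = 0;
};

Builder::Builder() : rep_(new Rep) {}
Builder::~Builder() { delete rep_; }

void Builder::Add(const std::string& key) {
  Rep* r = rep_;
  // Keys must arrive in strictly increasing order.
  assert(r->num_entries == 0 || key > r->last_key);
  r->last_key = key;
  r->num_entries++;
}

int64_t Builder::NumEntries() const { return rep_->num_entries; }
```

Because callers only ever see the forward declaration of Rep, members can be added or removed in the .cc file without recompiling code that includes the header.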

The members are, briefly:

    Options options;                   // Data block options
    Options index_block_options;       // Index block options
    WritableFile* file;                // The sstable file
    uint64_t offset;                   // Offset in the sstable file at which the next data block is written, initially 0
    Status status;                     // Current status, initially OK
    BlockBuilder data_block;           // Data block currently being built
    BlockBuilder index_block;          // Index block of the sstable
    std::string last_key;              // Key of the most recently added k/v pair
    int64_t num_entries;               // Number of entries added so far, initially 0
    bool closed;                       // Finish() or Abandon() has been called, initially false
    FilterBlockBuilder* filter_block;  // Filter data for quickly checking whether a key is in a block
    bool pending_index_entry;          // See the Add function below, initially false
    BlockHandle pending_handle;        // Handle of the data block to be added to the index block
    std::string compressed_output;     // Temporary buffer for the compressed data block, cleared after writing

The filter block stores filter data that associates keys with the offset of their data block within the sstable. It is used to check quickly whether a given key may be in a data block; the answer is not guaranteed to be exact.

The following sections analyze how k/v pairs are added to an sstable and how the sstable is created and persisted. The other functions are relatively simple and are skipped. Abandon, for example, simply sets closed = true and returns.

6.4.2 Adding k/v pairs

This is done by Add(const Slice& key, const Slice& value), which has no return value. The logic of the function is analyzed below.

S1 First, make sure that the file is not closed (neither Finish nor Abandon has been called) and that the current status is OK. If any k/v pair has already been added, the new key must be greater than the last one.

    Rep* r = rep_;
    assert(!r->closed);
    if (!ok()) return;
    if (r->num_entries > 0) {
      assert(r->options.comparator->Compare(key, Slice(r->last_key)) > 0);
    }

S2 If the flag r->pending_index_entry is true, the first k/v pair of the next data block has been encountered. In that case r->last_key is adjusted against key, using the comparator's FindShortestSeparator.

    if (r->pending_index_entry) {
      assert(r->data_block.empty());
      r->options.comparator->FindShortestSeparator(&r->last_key, key);
      std::string handle_encoding;
      r->pending_handle.EncodeTo(&handle_encoding);
      r->index_block.Add(r->last_key, Slice(handle_encoding));
      r->pending_index_entry = false;
    }

Next, the entry {r->last_key, encoded string of r->pending_handle} is added to the index block. Finally, r->pending_index_entry is set to false.

The meaning of pending_index_entry deserves a closer look. The code comment says:

We do not emit the index entry for a data block until we have seen the first key of the next data block. The advantage is that a shorter key can be used in the index. For example, if the last key of the previous data block is "the quick brown fox" and the first key of the following data block is "the who", we can use the shorter string "the r" as the key of the index block entry for the previous data block, because it is >= every key in the previous block and < every key in the following blocks.

In short, LevelDB adds the index entry for a data block to the index block only when the next data block begins. The pending_index_entry flag exists for this purpose, and the index entry information for the corresponding data block is kept in pending_handle (a BlockHandle).
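To make the "the r" example concrete, here is a small sketch of how a separator can be computed for plain bytewise ordering. The function name ShortestSeparator and the use of std::string instead of Slice are assumptions made for illustration; the comparator that LevelDB actually uses may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Shortens *start to a key k with *start <= k < limit under bytewise order,
// if bumping the first differing byte yields such a key. Otherwise *start is
// left unchanged, which is always a valid (if longer) separator.
void ShortestSeparator(std::string* start, const std::string& limit) {
  const size_t min_length = std::min(start->size(), limit.size());
  size_t diff = 0;
  while (diff < min_length && (*start)[diff] == limit[diff]) ++diff;
  if (diff >= min_length) return;  // One key is a prefix of the other.

  const uint8_t start_byte = static_cast<uint8_t>((*start)[diff]);
  const uint8_t limit_byte = static_cast<uint8_t>(limit[diff]);
  if (start_byte < 0xff && start_byte + 1 < limit_byte) {
    (*start)[diff] = static_cast<char>(start_byte + 1);
    start->resize(diff + 1);
    assert(*start < limit);
  }
}
```

For start = "the quick brown fox" and limit = "the who", the common prefix is "the ", the first differing bytes are 'q' and 'w', and 'q' + 1 = 'r' is still below 'w', so the key is cut to "the r".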

S3 If filter_block is not NULL, add the key to filter_block.

    if (r->filter_block != NULL) {
      r->filter_block->AddKey(key);
    }

S4 Set r->last_key = key, add (key, value) to r->data_block, and update the entry count.

    r->last_key.assign(key.data(), key.size());
    r->num_entries++;
    r->data_block.Add(key, value);

S5 If the estimated size of the data block reaches the configured block_size, flush it to the file immediately.

    const size_t estimated_block_size = r->data_block.CurrentSizeEstimate();
    if (estimated_block_size >= r->options.block_size) {
      Flush();
    }
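The interplay of steps S1 to S5 with the deferred index entry can be modelled with a toy class. This is a sketch under loud simplifications, not LevelDB code: ToyTableBuilder and all of its members are hypothetical, a block is "full" after a fixed number of entries rather than a byte size, the index stores a block number instead of a BlockHandle, and the separator key is simply the last key of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of the Add()/Flush()/Finish() interplay. All names are hypothetical.
class ToyTableBuilder {
 public:
  explicit ToyTableBuilder(size_t entries_per_block)
      : entries_per_block_(entries_per_block) {}

  void Add(const std::string& key) {
    assert(!closed_);
    assert(num_entries_ == 0 || key > last_key_);  // Keys strictly increase.
    if (pending_index_entry_) {
      // First key of a new block: only now emit the previous block's entry.
      // Any k with last_key_ <= k < key would do; the toy keeps last_key_.
      index_.emplace_back(last_key_, pending_block_);
      pending_index_entry_ = false;
    }
    last_key_ = key;
    ++num_entries_;
    ++entries_in_block_;
    if (entries_in_block_ >= entries_per_block_) Flush();
  }

  void Flush() {
    assert(!closed_);
    if (entries_in_block_ == 0) return;  // Nothing buffered.
    assert(!pending_index_entry_);
    pending_block_ = blocks_written_++;  // "Write" the block.
    entries_in_block_ = 0;
    pending_index_entry_ = true;  // Index entry is deferred to the next Add().
  }

  void Finish() {
    Flush();
    assert(!closed_);
    closed_ = true;
    if (pending_index_entry_) {  // The last block has no successor key.
      index_.emplace_back(last_key_, pending_block_);
      pending_index_entry_ = false;
    }
  }

  const std::vector<std::pair<std::string, size_t>>& index() const {
    return index_;
  }

 private:
  const size_t entries_per_block_;
  std::string last_key_;
  size_t num_entries_ = 0;
  size_t entries_in_block_ = 0;
  size_t blocks_written_ = 0;
  size_t pending_block_ = 0;
  bool pending_index_entry_ = false;
  bool closed_ = false;
  std::vector<std::pair<std::string, size_t>> index_;
};
```

With two entries per block, adding "a" and "b" writes block 0 but leaves the index empty; the entry for block 0 appears only when "c" is added, and Finish() supplies the entry for the final block.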

6.4.3 Flushing to the file

The logic of this function is relatively simple; the code is as follows:

    Rep* r = rep_;
    assert(!r->closed);  // First make sure the builder is not closed and the status is OK
    if (!ok()) return;
    if (r->data_block.empty()) return;  // The data block is empty
    // pending_index_entry must be false, i.e. the index entry of the
    // previous data block has already been added
    assert(!r->pending_index_entry);
    // Write the data block and record its index entry information in a BlockHandle
    WriteBlock(&r->data_block, &r->pending_handle);
    // On success, flush the file and set r->pending_index_entry to true, so that
    // the index entry key can be shortened against the first key of the next
    // data block; until then r->last_key holds the last key of this block
    if (ok()) {
      r->pending_index_entry = true;
      r->status = r->file->Flush();
    }
    if (r->filter_block != NULL) {  // Tell the filter block where the next data block starts
      r->filter_block->StartBlock(r->offset);
    }

6.4.4 WriteBlock function

When the file is flushed, WriteBlock is called to write the data block to the file; it also fills in the index entry information for that data block. Its prototype is:

    void WriteBlock(BlockBuilder* block, BlockHandle* handle);

This function does the preparatory work: it serializes the data block to be written and compresses it if required. The actual write logic is in WriteRawBlock. The processing logic is analyzed below.

S1 Get the serialized contents of the block as a Slice, decide from the configuration whether to compress, and compress the contents in the configured format. With Snappy, if compression does not save more than 12.5% of the raw size, the block is stored uncompressed.

BlockBuilder's Finish() function serializes the data block into a Slice.

    Rep* r = rep_;
    Slice raw = block->Finish();  // Get the serialized contents of the data block
    Slice block_contents;
    CompressionType type = r->options.compression;
    switch (type) {
      case kNoCompression:  // Do not compress
        block_contents = raw;
        break;
      case kSnappyCompression: {  // Snappy compression format
        std::string* compressed = &r->compressed_output;
        if (port::Snappy_Compress(raw.data(), raw.size(), compressed) &&
            compressed->size() < raw.size() - (raw.size() / 8u)) {
          block_contents = *compressed;
        } else {
          // Snappy is not supported, or it saved no more than 12.5%,
          // so store the block uncompressed
          block_contents = raw;
          type = kNoCompression;
        }
        break;
      }
    }
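The size test in the listing above can be isolated as a small predicate. This is only an illustration of the arithmetic; the function name KeepCompressed is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the compressed form is smaller than 7/8 of the raw size,
// i.e. when compression saves more than 1/8 (12.5%). This mirrors the
// expression compressed->size() < raw.size() - (raw.size() / 8u).
bool KeepCompressed(size_t raw_size, size_t compressed_size) {
  return compressed_size < raw_size - (raw_size / 8u);
}
```

For a 4096-byte block the cut-off is 4096 - 512 = 3584 bytes: a compressed size of 3583 is kept, while 3584 or more falls back to the uncompressed form.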

S2 Write the contents to the file, clear compressed_output, and reset the block to its initial state.

    WriteRawBlock(block_contents, type, handle);
    r->compressed_output.clear();
    block->Reset();

6.4.5 WriteRawBlock function

Once WriteBlock has done the preparation, the block can be written to the sstable file. The function prototype is:

    void WriteRawBlock(const Slice& block_contents, CompressionType type, BlockHandle* handle);

The logic is simple; see the code.

    Rep* r = rep_;
    handle->set_offset(r->offset);  // Fill in the data block's handle, used by the index
    handle->set_size(block_contents.size());
    r->status = r->file->Append(block_contents);  // Write the data block contents
    if (r->status.ok()) {  // Write the 1-byte type and the 4-byte crc32
      char trailer[kBlockTrailerSize];
      trailer[0] = type;
      uint32_t crc = crc32c::Value(block_contents.data(), block_contents.size());
      crc = crc32c::Extend(crc, trailer, 1);  // Extend crc to cover block type
      EncodeFixed32(trailer + 1, crc32c::Mask(crc));
      r->status = r->file->Append(Slice(trailer, kBlockTrailerSize));
      if (r->status.ok()) {  // On success, advance offset to where the next data block is written
        r->offset += block_contents.size() + kBlockTrailerSize;
      }
    }
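The 5-byte trailer can be reproduced outside LevelDB with a small sketch. Everything here is illustrative: the helper names are hypothetical, the CRC32C is a slow bit-by-bit version rather than a table-driven one, and the masking constant and rotation are stated from memory of LevelDB's crc32c.h, so treat them as assumptions to check against the real source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. Passing a
// previous result as `crc` extends that checksum over more data.
uint32_t Crc32c(const char* data, size_t n, uint32_t crc = 0) {
  crc = ~crc;
  for (size_t i = 0; i < n; ++i) {
    crc ^= static_cast<uint8_t>(data[i]);
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Assumed masking scheme: rotate right by 15 bits, then add a constant.
const uint32_t kMaskDelta = 0xa282ead8u;
uint32_t Mask(uint32_t crc) { return ((crc >> 15) | (crc << 17)) + kMaskDelta; }
uint32_t Unmask(uint32_t masked) {
  const uint32_t rot = masked - kMaskDelta;
  return (rot >> 17) | (rot << 15);
}

const size_t kTrailerSize = 5;  // 1-byte type + 4-byte masked CRC

// Builds the trailer for `contents` stored with compression tag `type`.
std::string MakeTrailer(const std::string& contents, char type) {
  uint32_t crc = Crc32c(contents.data(), contents.size());
  crc = Crc32c(&type, 1, crc);  // Extend the CRC to cover the type byte.
  const uint32_t masked = Mask(crc);
  std::string trailer(kTrailerSize, '\0');
  trailer[0] = type;
  for (int i = 0; i < 4; ++i) {  // Fixed 32-bit value, little-endian.
    trailer[1 + i] = static_cast<char>((masked >> (8 * i)) & 0xff);
  }
  return trailer;
}
```

A reader would verify a block by recomputing the CRC over the contents plus the type byte and comparing it with the unmasked value stored in the trailer.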

6.4.6 Finish function

Calling Finish indicates that the caller has added all k/v pairs; the function persists them to the sstable and closes the sstable.

The logic is clear and can be divided into 5 parts.

S1 First call Flush to write the last data block, then set the flag closed = true, which marks the sstable as closed so that no more k/v pairs can be added.

    Rep* r = rep_;
    Flush();
    assert(!r->closed);
    r->closed = true;

    BlockHandle filter_block_handle, metaindex_block_handle, index_block_handle;

S2 Write the filter block to the file.

    if (ok() && r->filter_block != NULL) {
      WriteRawBlock(r->filter_block->Finish(), kNoCompression, &filter_block_handle);
    }

S3 Write the meta index block to the file.

If filter_block is not NULL, add a mapping from "filter.Name" to the location of the filter data. With the meta index block, the filter's data area can be located quickly from the filter name.

    if (ok()) {
      BlockBuilder meta_index_block(&r->options);
      if (r->filter_block != NULL) {
        // Add mapping from "filter.Name" to location of filter data
        std::string key = "filter.";
        key.append(r->options.filter_policy->Name());
        std::string handle_encoding;
        filter_block_handle.EncodeTo(&handle_encoding);
        meta_index_block.Add(key, handle_encoding);
      }
      // TODO(postrelease): Add stats and other meta blocks
      WriteBlock(&meta_index_block, &metaindex_block_handle);
    }

S4 Write the index block. If the last data block was flushed successfully, its index entry is still pending, so that entry is created and added to the index block first.

    if (ok()) {
      if (r->pending_index_entry) {  // Flush set this to true
        r->options.comparator->FindShortSuccessor(&r->last_key);
        std::string handle_encoding;
        r->pending_handle.EncodeTo(&handle_encoding);
        r->index_block.Add(r->last_key, Slice(handle_encoding));  // Add to the index block
        r->pending_index_entry = false;
      }
      WriteBlock(&r->index_block, &index_block_handle);
    }
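Because the last data block has no following key to separate from, a short key that is merely >= last_key is enough. The sketch below shows one way to compute such a key for plain bytewise ordering; the name ShortSuccessor and the use of std::string are assumptions for illustration and need not match LevelDB's comparator exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Replaces *key with a short string that is >= *key in bytewise order:
// increment the first byte that is not 0xff and drop everything after it.
void ShortSuccessor(std::string* key) {
  assert(key != nullptr);
  for (size_t i = 0; i < key->size(); ++i) {
    const uint8_t byte = static_cast<uint8_t>((*key)[i]);
    if (byte != 0xff) {
      (*key)[i] = static_cast<char>(byte + 1);
      key->resize(i + 1);
      return;
    }
  }
  // The key is empty or consists only of 0xff bytes: leave it unchanged.
}
```

For example, "the who" becomes "u", a one-byte key that still sorts after every key in the last data block.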

S5 Write the footer.

    if (ok()) {
      Footer footer;
      footer.set_metaindex_handle(metaindex_block_handle);
      footer.set_index_handle(index_block_handle);
      std::string footer_encoding;
      footer.EncodeTo(&footer_encoding);

      r->status = r->file->Append(footer_encoding);
      if (r->status.ok()) {
        r->offset += footer_encoding.size();
      }
    }
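What footer.EncodeTo produces is not shown above. As a rough sketch, and assuming the layout described in LevelDB's table format notes (two block handles encoded as varint64 offset and size, zero padding up to 40 bytes, then an 8-byte magic number in little-endian order), the encoding could look like this. The struct and function names are hypothetical, and the padding length and magic value should be checked against format.h before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends v as a varint: 7 bits per byte, low bits first, high bit = "more".
void PutVarint64(std::string* dst, uint64_t v) {
  assert(dst != nullptr);
  while (v >= 128) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Hypothetical stand-in for BlockHandle: where a block starts and how long it is.
struct Handle {
  uint64_t offset;
  uint64_t size;
};

void EncodeHandle(const Handle& h, std::string* dst) {
  PutVarint64(dst, h.offset);
  PutVarint64(dst, h.size);
}

const size_t kMaxHandleLength = 20;              // Two varint64s, at most 10 bytes each.
const uint64_t kMagic = 0xdb4775248b80fb57ull;   // Assumed table magic number.

std::string EncodeFooter(const Handle& metaindex, const Handle& index) {
  std::string out;
  EncodeHandle(metaindex, &out);
  EncodeHandle(index, &out);
  out.resize(2 * kMaxHandleLength);  // Zero padding to a fixed length.
  for (int i = 0; i < 8; ++i) {      // Magic number, little-endian.
    out.push_back(static_cast<char>((kMagic >> (8 * i)) & 0xff));
  }
  return out;
}
```

The fixed length is what lets a reader find the footer: it can seek to the last 48 bytes of the file, check the magic number, and decode the two handles to locate the meta index block and the index block.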

This completes the analysis of the write path. The operations on data blocks and filter blocks are analyzed separately in the sections on the data block and the filter block, and the read path that follows is treated in the same way.
