LevelDB source code analysis - 16

Contents
  • 10.1 Version interface
  • 10.2 Version::AddIterators()
  • 10.3 Version::LevelFileNumIterator class
  • 10.4 Version::Get()
10 Version analysis, part 1

Let's first analyze how LevelDB manages the sstable files of a single version, which is done mainly in the Version class. The functions and members of the Version class were described earlier in section 10.4; here we analyze its function interfaces and their implementation.
A Version never modifies the sstable files it manages; it only reads them.

10.1 Version interface

Let's first look at the interface functions of the Version class, and then analyze them one by one.

  // Append to *iters a sequence of iterators that, when merged together,
  // yield the contents of this Version.
  // REQUIRES: this version has been saved (see VersionSet::SaveTo)
  void AddIterators(const ReadOptions&, std::vector<Iterator*>* iters);

  // Look up the value for key. If found, store it in *val and return OK.
  // Otherwise return a non-OK status. Fills *stats.
  // REQUIRES: lock is not held
  struct GetStats {
    FileMetaData* seek_file;
    int seek_file_level;
  };
  Status Get(const ReadOptions&, const LookupKey& key, std::string* val,
             GetStats* stats);

  // Add stats to the current state. Returns true if a new compaction
  // may need to be triggered.
  // REQUIRES: lock is held
  bool UpdateStats(const GetStats& stats);

  void GetOverlappingInputs(
      int level,
      const InternalKey* begin,  // NULL means before all keys
      const InternalKey* end,    // NULL means after all keys
      std::vector<FileMetaData*>* inputs);

  // Returns true if some file in the specified level overlaps some part of
  // [*smallest_user_key, *largest_user_key].
  // smallest_user_key == NULL represents a key smaller than all keys in the DB.
  // largest_user_key == NULL represents a key larger than all keys in the DB.
  bool OverlapInLevel(int level, const Slice* smallest_user_key,
                      const Slice* largest_user_key);

  // Return the level at which we should place a new memtable compaction
  // result that covers the range [smallest_user_key, largest_user_key].
  int PickLevelForMemTableOutput(const Slice& smallest_user_key,
                                 const Slice& largest_user_key);

  // Number of sstable files at the specified level
  int NumFiles(int level) const { return files_[level].size(); }
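The NULL conventions in OverlapInLevel() are easy to misread. As a minimal sketch, assuming user keys are plain std::string values compared bytewise, and using a hypothetical FileRange struct in place of FileMetaData, the overlap question can be answered like this (a linear scan that is correct but not necessarily how LevelDB implements it):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for FileMetaData: user-key range only.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// True if [f.smallest, f.largest] intersects [*begin, *end].
// begin == nullptr means "before all keys"; end == nullptr means "after all keys".
bool RangeOverlaps(const FileRange& f, const std::string* begin,
                   const std::string* end) {
  bool file_ends_before_range = (begin != nullptr && f.largest < *begin);
  bool file_starts_after_range = (end != nullptr && *end < f.smallest);
  return !file_ends_before_range && !file_starts_after_range;
}

// OverlapInLevel-style question: does any file in the level overlap the range?
bool SomeFileOverlaps(const std::vector<FileRange>& files,
                      const std::string* begin, const std::string* end) {
  for (const FileRange& f : files) {
    if (RangeOverlaps(f, begin, end)) return true;
  }
  return false;
}
```

A null pointer simply disables the corresponding bound, so passing two nulls asks whether the level holds any file at all.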

10.2 Version::AddIterators()

This function is ultimately called from the DB::NewIterator() interface, through the following call chain:
DBImpl::NewIterator() -> DBImpl::NewInternalIterator() -> Version::AddIterators().
The function creates iterators over all sstables in this version so that their contents can be traversed.
For level-0 sstable files, an iterator is created directly for each file through the TableCache::NewIterator() interface, which opens the file right away and loads it into the in-memory table cache.
For sstable files at levels > 0, the NewTwoLevelIterator() function is used to create a TwoLevelIterator, which relies on a lazy-open mechanism: a file is opened only when iteration actually reaches it.
The function is analyzed step by step below.
S1: For level-0 sstable files, iterators are created directly through the cache. Level-0 files may overlap one another, so their iterators must be merged.

  for (size_t i = 0; i < files_[0].size(); i++) {
    iters->push_back(vset_->table_cache_->NewIterator(  // VersionSet::table_cache_
        options, files_[0][i]->number, files_[0][i]->file_size));
  }
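Why the level-0 iterators must be merged: each level-0 file is sorted on its own, but the key ranges of different files can interleave. The sketch below illustrates the idea with a heap-based k-way merge over in-memory runs of std::string keys. MergeSortedRuns is a hypothetical name, and this is not LevelDB's merging iterator, which merges child iterators incrementally instead of materializing a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Merge several individually sorted runs (think: one per level-0 file)
// into one sorted sequence. Each heap entry is (key, run index, position).
std::vector<std::string> MergeSortedRuns(
    const std::vector<std::vector<std::string>>& runs) {
  using Entry = std::tuple<std::string, size_t, size_t>;
  // std::greater turns the default max-heap into a min-heap.
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
  }
  std::vector<std::string> out;
  while (!heap.empty()) {
    auto [key, r, pos] = heap.top();  // smallest key across all runs
    heap.pop();
    out.push_back(key);
    // Advance within the run that produced this key.
    if (pos + 1 < runs[r].size()) heap.emplace(runs[r][pos + 1], r, pos + 1);
  }
  return out;
}
```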

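S2: Sstable files at levels > 0 do not overlap within a level, so each non-empty level gets a single concatenating iterator that uses the lazy-open mechanism.

  for (int level = 1; level < config::kNumLevels; level++) {
    if (!files_[level].empty()) {
      iters->push_back(NewConcatenatingIterator(options, level));
    }
  }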

The NewConcatenatingIterator() function simply returns a TwoLevelIterator object:

  return NewTwoLevelIterator(
      new LevelFileNumIterator(vset_->icmp_, &files_[level]),
      &GetFileIterator, vset_->table_cache_, options);
The first-level (index) iterator is a LevelFileNumIterator, and the second-level iterator is produced by the GetFileIterator function. We analyze each of them separately.
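The two-level idea can be modeled in a few lines. In this sketch, IndexEntry, FileOpener and TwoLevelScan are hypothetical names: the vector of IndexEntry plays the role of the LevelFileNumIterator, and open_file plays the role of GetFileIterator. A file is opened only when the scan reaches it, which is what makes the mechanism lazy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical index entry: one per file at a level > 0.
struct IndexEntry {
  std::string largest;  // largest key in the file (used by Seek, not by a plain scan)
  int file_number;      // identifies the file to open
};

// Loads a file's sorted keys; called only when the scan reaches that file.
using FileOpener = std::function<std::vector<std::string>(int file_number)>;

// Walk the index (first level) and open each file (second level) on demand.
// Stops after max_keys keys, so files past that point are never opened.
std::vector<std::string> TwoLevelScan(const std::vector<IndexEntry>& index,
                                      const FileOpener& open_file,
                                      size_t max_keys) {
  std::vector<std::string> out;
  for (const IndexEntry& entry : index) {
    if (out.size() >= max_keys) break;  // do not open any further files
    std::vector<std::string> keys = open_file(entry.file_number);
    for (const std::string& k : keys) {
      if (out.size() >= max_keys) break;
      out.push_back(k);
    }
  }
  return out;
}
```

For example, with three indexed files and max_keys = 3, only the first two files are ever opened.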
GetFileIterator is a static function that simply returns the result of TableCache::NewIterator(). Its definition:

  static Iterator* GetFileIterator(void* arg, const ReadOptions& options,
                                   const Slice& file_value) {
    TableCache* cache = reinterpret_cast<TableCache*>(arg);
    if (file_value.size() != 16) {  // error: malformed value
      return NewErrorIterator(Status::Corruption("XXX"));
    } else {
      return cache->NewIterator(options,
                                DecodeFixed64(file_value.data()),       // file number
                                DecodeFixed64(file_value.data() + 8));  // file size
    }
  }

The file_value here is the value() of the LevelFileNumIterator: its value() function encodes the file number and the file size as two fixed 8-byte integers, packs them into a Slice, and returns it.

10.3 Version::LevelFileNumIterator class

This is also a subclass of Iterator, used as an internal iterator. Given a version/level pair, it yields information about the files in that level. For a given entry, key() returns the largest key contained in the file, and value() returns a 16-byte string of the form |file number (8 bytes)|file size (8 bytes)|.
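The |file number|file size| value is just two fixed-width integers placed back to back. Assuming the little-endian layout that LevelDB's EncodeFixed64()/DecodeFixed64() use, a self-contained re-implementation looks like this; PutFixed64, GetFixed64, PackFileValue and UnpackFileValue are illustrative names, not LevelDB functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Write v as 8 bytes, least significant byte first.
void PutFixed64(char* dst, uint64_t v) {
  for (int i = 0; i < 8; ++i) dst[i] = static_cast<char>((v >> (8 * i)) & 0xff);
}

// Read 8 little-endian bytes back into a uint64_t.
uint64_t GetFixed64(const char* src) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<uint64_t>(static_cast<unsigned char>(src[i])) << (8 * i);
  return v;
}

// Pack |file number (8 bytes)|file size (8 bytes)|, the shape of value().
std::string PackFileValue(uint64_t number, uint64_t file_size) {
  char buf[16];
  PutFixed64(buf, number);
  PutFixed64(buf + 8, file_size);
  return std::string(buf, sizeof(buf));
}

// Unpack it the way GetFileIterator() does; rejects values that are not 16 bytes.
bool UnpackFileValue(const std::string& v, uint64_t* number, uint64_t* file_size) {
  if (v.size() != 16) return false;
  *number = GetFixed64(v.data());
  *file_size = GetFixed64(v.data() + 8);
  return true;
}
```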
Its constructor takes two parameters: an InternalKeyComparator&, used for key comparison, and a vector<FileMetaData*>*, pointing to the list of sstable files at that level of the version.
  LevelFileNumIterator(const InternalKeyComparator& icmp,
                       const std::vector<FileMetaData*>* flist)
      : icmp_(icmp), flist_(flist), index_(flist->size()) {  // marks as invalid
  }
Now let's look at how its interface is implemented; all of the functions are listed below.

Valid(), SeekToFirst()/SeekToLast() and Next()/Prev() are simple, since the underlying container is a vector. Seek() calls FindFile(), which is discussed after the listing.

  virtual bool Valid() const { return index_ < flist_->size(); }
  virtual void Seek(const Slice& target) {
    index_ = FindFile(icmp_, *flist_, target);
  }
  virtual void SeekToFirst() { index_ = 0; }
  virtual void SeekToLast() {
    index_ = flist_->empty() ? 0 : flist_->size() - 1;
  }
  virtual void Next() {
    assert(Valid());
    index_++;
  }
  virtual void Prev() {
    assert(Valid());
    if (index_ == 0) {
      index_ = flist_->size();  // marks as invalid
    } else {
      index_--;
    }
  }
  Slice key() const {
    assert(Valid());
    return (*flist_)[index_]->largest.Encode();  // largest key in the current sstable
  }
  Slice value() const {
    // Encode as |number|size|, each a fixed 64-bit integer
    assert(Valid());
    EncodeFixed64(value_buf_, (*flist_)[index_]->number);
    EncodeFixed64(value_buf_ + 8, (*flist_)[index_]->file_size);
    return Slice(value_buf_, sizeof(value_buf_));
  }
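The key detail in these functions is that index_ == flist_->size() means "invalid". A minimal model, assuming a level is represented only by its files' largest keys (LevelFileCursor is a hypothetical name, and Seek() is left out), shows how Next() and Prev() fall off either end:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of LevelFileNumIterator's position handling.
class LevelFileCursor {
 public:
  explicit LevelFileCursor(const std::vector<std::string>* largest_keys)
      : flist_(largest_keys), index_(largest_keys->size()) {}  // starts invalid

  bool Valid() const { return index_ < flist_->size(); }
  void SeekToFirst() { index_ = 0; }  // still invalid if the list is empty
  void SeekToLast() { index_ = flist_->empty() ? 0 : flist_->size() - 1; }
  void Next() {
    assert(Valid());
    ++index_;  // stepping past the last file yields size(): invalid
  }
  void Prev() {
    assert(Valid());
    if (index_ == 0) {
      index_ = flist_->size();  // stepping before the first file: invalid
    } else {
      --index_;
    }
  }
  const std::string& key() const {
    assert(Valid());
    return (*flist_)[index_];
  }

 private:
  const std::vector<std::string>* flist_;
  size_t index_;  // == flist_->size() means "not positioned"
};
```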

FindFile() is essentially a binary search: the sstable files in the input list are sorted, so binary search applies. It returns the index of the first file whose largest key is >= the target key. Its code is not listed here.
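Since FindFile() is not listed, here is a sketch of a lower-bound binary search with the behavior described: it returns the index of the first file whose largest key is >= the target, or the number of files if there is none. FindFileSketch is a hypothetical name, and keys are compared as plain bytewise strings rather than as internal keys through an InternalKeyComparator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Index of the first file whose largest key >= target, or size() if none.
uint32_t FindFileSketch(const std::vector<std::string>& largest_keys,
                        const std::string& target) {
  uint32_t left = 0;
  uint32_t right = static_cast<uint32_t>(largest_keys.size());
  while (left < right) {
    uint32_t mid = left + (right - left) / 2;
    if (largest_keys[mid] < target) {
      left = mid + 1;  // files at or before mid all end before target
    } else {
      right = mid;     // mid may hold target; files after mid are not needed
    }
  }
  return right;
}
```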

10.4 Version::Get()

This lookup function is called directly from DBImpl::Get(). Its prototype is:

  Status Version::Get(const ReadOptions& options, const LookupKey& k,
                      std::string* value, GetStats* stats)
If this Get operation has to seek more than one file, the first file that was searched is recorded in stats. This can happen at level 0, where several files may cover the key, and also across levels, when a file at one level is searched without success before the search moves on to the next level. If stats comes back filled in, some unnecessary lookups were made before reaching the sstable file that holds the key; UpdateStats() uses this result.
The logic of this function is somewhat involved, so let's go through the code.
S1: First, obtain the necessary information and initialize a few temporary variables.

  Slice ikey = k.internal_key();
  Slice user_key = k.user_key();
  const Comparator* ucmp = vset_->icmp_.user_comparator();
  Status s;

  stats->seek_file = NULL;
  stats->seek_file_level = -1;
  FileMetaData* last_file_read = NULL;  // most recently read file; used when > 1 file is searched
  int last_file_read_level = -1;        // level of last_file_read

  std::vector<FileMetaData*> tmp;
  FileMetaData* tmp2;

S2: Walk through all levels starting from level 0 and search them in order. Entries never move across levels out of order, so once an entry is found at some level, the later (older) levels are irrelevant and need not be searched.

  for (int level = 0; level < config::kNumLevels; level++) {
    size_t num_files = files_[level].size();
    if (num_files == 0) continue;  // no files at this level: skip it

    // Get the list of all sstable files at this level to search
    FileMetaData* const* files = &files_[level][0];

All of the following logic sits inside this for loop.
S3: Search the list of sstable files at this level. Level 0 and levels > 0 are handled differently, because the key ranges of level-0 files may overlap.
S3.1: At level 0, files may overlap. Find all files whose key range covers user_key and process them from newest to oldest.

    tmp.reserve(num_files);
    for (uint32_t i = 0; i < num_files; i++) {  // iterate over all level-0 sstable files
      FileMetaData* f = files[i];
      if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 &&
          ucmp->Compare(user_key, f->largest.user_key()) <= 0) {
        tmp.push_back(f);  // this file's key range covers user_key
      }
    }
    if (tmp.empty()) continue;
    std::sort(tmp.begin(), tmp.end(), NewestFirst);  // sort newest to oldest
    files = &tmp[0];  // point files/num_files at tmp
    num_files = tmp.size();
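Step S3.1 in isolation: filter the level-0 files by range, then sort them newest first. This sketch assumes, as NewestFirst does in LevelDB's source as far as we know, that a larger file number means a newer file; FileMeta and Level0Candidates are hypothetical names, and user keys are compared bytewise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical file metadata: a larger number means a newer file.
struct FileMeta {
  uint64_t number;
  std::string smallest, largest;  // user-key range, compared bytewise
};

// Keep the files whose [smallest, largest] covers user_key, newest first.
std::vector<const FileMeta*> Level0Candidates(const std::vector<FileMeta>& level0,
                                              const std::string& user_key) {
  std::vector<const FileMeta*> tmp;
  for (const FileMeta& f : level0) {
    if (user_key >= f.smallest && user_key <= f.largest) tmp.push_back(&f);
  }
  std::sort(tmp.begin(), tmp.end(), [](const FileMeta* a, const FileMeta* b) {
    return a->number > b->number;  // newest (highest number) first
  });
  return tmp;
}
```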

S3.2: For levels > 0, LevelDB guarantees that sstable files do not overlap, so the processing differs from level 0: the single candidate sstable file can be located directly from ikey.

    // Binary search for the index of the first file whose largest key >= ikey
    uint32_t index = FindFile(vset_->icmp_, files_[level], ikey);
    if (index >= num_files) {  // not found: no such file
      files = NULL;
      num_files = 0;
    } else {
      tmp2 = files[index];
      if (ucmp->Compare(user_key, tmp2->smallest.user_key()) < 0) {
        // Every key in this file is greater than user_key, so no file holds it
        files = NULL;
        num_files = 0;
      } else {
        files = &tmp2;
        num_files = 1;
      }
    }
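Step S3.2 in isolation: at a level > 0 the files are sorted and disjoint, so a lower-bound search on the largest keys followed by one comparison against the smallest key is enough. RangeFile and LevelCandidate are hypothetical names; the sketch uses std::lower_bound and compares user keys only, whereas the real code searches with the internal key ikey.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct RangeFile {  // hypothetical: one file at a level > 0
  std::string smallest, largest;
};

// Index of the only file that could hold user_key, or -1 if none can.
int LevelCandidate(const std::vector<RangeFile>& files, const std::string& user_key) {
  // First file whose largest key >= user_key (the FindFile step).
  auto it = std::lower_bound(files.begin(), files.end(), user_key,
                             [](const RangeFile& f, const std::string& k) {
                               return f.largest < k;
                             });
  if (it == files.end()) return -1;        // key is past every file
  if (user_key < it->smallest) return -1;  // key falls in a gap before this file
  return static_cast<int>(it - files.begin());
}
```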

S4: Search the files found; they are in files, and their count is num_files.

  for (uint32_t i = 0; i < num_files; ++i) {

The rest of the logic is inside this loop. As soon as one file gives a definitive answer for the key (found, deleted, or corrupt), the function returns.
S4.1: If more than one file has been searched for this read, record the first one in stats.

    if (last_file_read != NULL && stats->seek_file == NULL) {
      // More than one file has been searched for this read: record the first one
      stats->seek_file = last_file_read;
      stats->seek_file_level = last_file_read_level;
    }
    FileMetaData* f = files[i];
    last_file_read = f;  // record the file and level being read
    last_file_read_level = level;
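The S4.1 bookkeeping can be checked on its own. In this sketch (SeekStats, Probe and ChargeSeeks are hypothetical names, and files are identified by made-up numbers), the first file is recorded as soon as a second file is about to be read. Because last_file_read is declared before the level loop and never reset, a miss at one level followed by a read at a deeper level also records the first file.

```cpp
#include <cassert>
#include <vector>

// Stand-in for GetStats, with files identified by number.
struct SeekStats {
  int seek_file = -1;
  int seek_file_level = -1;
};

struct Probe {  // one file consulted during a single Get(), in search order
  int file_number;
  int level;
};

// Replays the S4.1 logic over the sequence of files a lookup reads.
SeekStats ChargeSeeks(const std::vector<Probe>& probes) {
  SeekStats stats;
  const Probe* last_file_read = nullptr;
  for (const Probe& p : probes) {
    if (last_file_read != nullptr && stats.seek_file == -1) {
      // A second file is being read: record the first one, once.
      stats.seek_file = last_file_read->file_number;
      stats.seek_file_level = last_file_read->level;
    }
    last_file_read = &p;
  }
  return stats;
}
```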

S4.2: Call TableCache::Get() to look up ikey in the file. If it returns OK, go on to S4.3; otherwise return the error status directly. The callback passed in is SaveValue().

    Saver saver;  // initialize the saver
    saver.state = kNotFound;
    saver.ucmp = ucmp;
    saver.user_key = user_key;
    saver.value = value;
    s = vset_->table_cache_->Get(options, f->number, f->file_size,
                                 ikey, &saver, SaveValue);
    if (!s.ok()) return s;

S4.3: Decide based on the saver state. If the key was not found, continue with the next (older) sstable file; in every other case, return.

    switch (saver.state) {
      case kNotFound:
        break;  // keep searching the next older sstable file
      case kFound:  // found
        return s;
      case kDeleted:  // deleted
        s = Status::NotFound(Slice());  // empty error message, for efficiency
        return s;
      case kCorrupt:  // data corruption
        s = Status::Corruption("corrupted key for ", user_key);
        return s;
    }
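The S4.3 switch boils down to "skip files that do not know the key, stop at the first definitive answer". The following sketch models that over a hypothetical list of per-file results, ordered newest first (FileResult and Resolve are illustrative names). Unlike the real code, where SaveValue() has already copied the value, it copies the value in the switch itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum SaverState { kNotFound, kFound, kDeleted, kCorrupt };  // names as in the article

// Hypothetical outcome of probing one candidate file.
struct FileResult {
  SaverState state;
  std::string value;  // meaningful only when state == kFound
};

// Returns the first definitive state; kNotFound only if no file knew the key.
SaverState Resolve(const std::vector<FileResult>& probes, std::string* value) {
  for (const FileResult& r : probes) {
    switch (r.state) {
      case kNotFound:
        break;  // leaves the switch; the loop moves on to the next older file
      case kFound:
        *value = r.value;
        return kFound;
      case kDeleted:
        return kDeleted;  // a newer deletion hides any older value
      case kCorrupt:
        return kCorrupt;
    }
  }
  return kNotFound;
}
```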

That is the logic of Version::Get(). If level 0 holds too many sstable files, reads slow down, since every level-0 file that covers the key may have to be searched; this is one of the reasons compaction is needed.
In addition, a callback function is passed to TableCache::Get(); here is a brief analysis of it. It is a static function, static void SaveValue(void* arg, const Slice& ikey, const Slice& v), and it uses the Saver structure internally:
  struct Saver {
    SaverState state;
    const Comparator* ucmp;  // user key comparator
    Slice user_key;
    std::string* value;
  };
The logic of SaveValue() is simple. It first parses the internal key passed in by the table, then uses the given comparator to check whether its user key is the one being searched for. If it is and the type is kTypeValue, the value is copied into *value of the Saver and the state becomes kFound; otherwise the state becomes kDeleted. If the key cannot be parsed, the state becomes kCorrupt, and if the user keys differ, the state stays kNotFound. The code is as follows:

  Saver* s = reinterpret_cast<Saver*>(arg);
  ParsedInternalKey parsed_key;  // parse ikey into a ParsedInternalKey
  if (!ParseInternalKey(ikey, &parsed_key)) {
    s->state = kCorrupt;  // parsing failed
  } else {
    if (s->ucmp->Compare(parsed_key.user_key, s->user_key) == 0) {  // compare user keys
      s->state = (parsed_key.type == kTypeValue) ? kFound : kDeleted;
      if (s->state == kFound) {
        s->value->assign(v.data(), v.size());  // found: save the result
      }
    }
  }
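To make SaveValue() concrete, the sketch below builds and parses internal keys itself, assuming LevelDB's layout as we understand it: the user key followed by an 8-byte little-endian tag holding (sequence << 8) | type, with kTypeDeletion = 0 and kTypeValue = 1. MakeInternalKey and SaveValueSketch are hypothetical names, and bytewise string equality stands in for the user comparator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum ValueType { kTypeDeletion = 0, kTypeValue = 1 };
enum SaverState { kNotFound, kFound, kDeleted, kCorrupt };

struct Saver {
  SaverState state = kNotFound;
  std::string user_key;          // the key being looked up
  std::string* value = nullptr;  // where a found value is stored
};

// user_key + 8-byte little-endian tag of (seq << 8) | type.
std::string MakeInternalKey(const std::string& user_key, uint64_t seq, ValueType t) {
  std::string k = user_key;
  uint64_t tag = (seq << 8) | t;
  for (int i = 0; i < 8; ++i) k.push_back(static_cast<char>((tag >> (8 * i)) & 0xff));
  return k;
}

// Parse the internal key, compare user keys, and set the saver state.
void SaveValueSketch(Saver* s, const std::string& ikey, const std::string& v) {
  if (ikey.size() < 8) {  // too short to contain a tag
    s->state = kCorrupt;
    return;
  }
  uint64_t tag = 0;
  for (int i = 0; i < 8; ++i)
    tag |= static_cast<uint64_t>(static_cast<unsigned char>(ikey[ikey.size() - 8 + i]))
           << (8 * i);
  int type = static_cast<int>(tag & 0xff);
  if (type > kTypeValue) {  // unknown value type
    s->state = kCorrupt;
    return;
  }
  std::string user_key = ikey.substr(0, ikey.size() - 8);
  if (user_key == s->user_key) {  // bytewise stand-in for ucmp->Compare(...) == 0
    s->state = (type == kTypeValue) ? kFound : kDeleted;
    if (s->state == kFound) s->value->assign(v);
  }
}
```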

The remaining Version functions are all related to compaction in one way or another.
