LevelDB Source Analysis: Cache and the Get Lookup Process

Source: Internet
Author: User
Tags: mutex

I originally planned to analyze the version-related concepts next, but while preparing I came across the table_cache_ member of VersionSet and remembered that the cache module had not been analyzed yet. Versions are more complex than the cache and are tightly coupled to the rest of LevelDB, yet conceptually they are secondary and feel somewhat like an add-on feature. Since an introduction should first give a complete picture of the system's core concepts, this article analyzes the cache-related code first.

Let's look at the basic data structures of the cache:

    struct LRUHandle {
      void* value;           // cached object: a TableAndFile in the table cache, a Block in the block cache
      void (*deleter)(const Slice&, void* value);  // callback run when the entry is destroyed
      LRUHandle* next_hash;  // next entry in the same hash bucket (collision chain)
      LRUHandle* next;       // LRU doubly linked list pointer
      LRUHandle* prev;       // LRU doubly linked list pointer
      size_t charge;         // TODO(opt): Only allow uint32_t?
      size_t key_length;
      uint32_t refs;
      uint32_t hash;         // hash of the key, computed by the code in hash.cc
      char key_data[1];      // start of the key (the encoded file_number in the table cache)
    };

HandleTable is a simple chained hash table with the following members:

    class HandleTable {
      uint32_t length_;   // number of buckets in list_
      uint32_t elems_;    // number of entries stored
      LRUHandle** list_;  // bucket array
    };

LRUCache contains a hash table (HandleTable), the head of a doubly linked list, a capacity, a usage counter, and a mutex. Every insert adds the entry to both the hash table and the doubly linked list.

    class LRUCache {
      size_t capacity_;
      port::Mutex mutex_;
      size_t usage_;
      LRUHandle lru_;      // head of the LRU doubly linked list
      HandleTable table_;
    };
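The "insert into both structures" idea can be illustrated with a small standalone sketch. This is not LevelDB's code: TinyLRU and its methods are hypothetical names, std::list stands in for the LRU list, std::unordered_map stands in for HandleTable, and capacity is counted in entries rather than by charge.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy cache, not LevelDB's API.
class TinyLRU {
 public:
  explicit TinyLRU(std::size_t capacity) : capacity_(capacity) {}

  // Insert or overwrite a key, then evict from the cold end while over capacity.
  void Insert(const std::string& key, int value) {
    auto it = index_.find(key);
    if (it != index_.end()) {  // replace an existing entry
      order_.erase(it->second);
      index_.erase(it);
    }
    order_.emplace_front(key, value);    // newest entry sits at the front
    index_[key] = order_.begin();        // the hash table points into the list
    while (order_.size() > capacity_) {  // drop least recently used entries
      index_.erase(order_.back().first);
      order_.pop_back();
    }
  }

  // Copy the value out and mark the entry as most recently used.
  bool Lookup(const std::string& key, int* value) {
    assert(value != nullptr);
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    order_.splice(order_.begin(), order_, it->second);  // move to front
    *value = it->second->second;
    return true;
  }

 private:
  using Entry = std::pair<std::string, int>;
  std::size_t capacity_;
  std::list<Entry> order_;
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

The hash table gives constant-time lookup, while the list records recency. LevelDB gets the same effect with intrusive pointers (next, prev, next_hash) stored inside each LRUHandle, which avoids separate allocations for list and table nodes.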

The relationship between LRUCache, HandleTable, and LRUHandle (the green boxes in the figure) can be described as follows:

To keep the figure simple, the distinction between addresses and objects is not fully shown: lru_ is an object, while the other green boxes represent addresses. The figure is only approximate; refer to the source code for the exact relationships.

The structure of ShardedLRUCache is even simpler: it is an array of LRUCache objects. Wrapping several LRUCache instances in this way shards the cache, which reduces lock granularity and increases concurrency.
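A minimal sketch of the sharding idea follows. ShardedMap, ShardOf, and the constants are hypothetical names chosen for illustration, and the choice of 16 shards selected by the top bits of the hash is an assumption of this sketch; check cache.cc for the values your LevelDB version uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Assumed for this sketch: 4 shard bits, i.e. 16 shards.
constexpr int kShardBits = 4;
constexpr int kNumShards = 1 << kShardBits;

// Use the top bits of the 32-bit hash to choose a shard.
inline uint32_t ShardOf(uint32_t hash) { return hash >> (32 - kShardBits); }

// Hypothetical sharded container: each shard owns its own mutex, so keys
// that land in different shards never contend on the same lock.
class ShardedMap {
 public:
  void Put(uint32_t hash, const std::string& key, int value) {
    Shard& s = shards_[ShardOf(hash)];
    std::lock_guard<std::mutex> lock(s.mu);  // lock only this shard
    s.data[key] = value;
  }

  bool Get(uint32_t hash, const std::string& key, int* value) {
    assert(value != nullptr);
    Shard& s = shards_[ShardOf(hash)];
    std::lock_guard<std::mutex> lock(s.mu);
    auto it = s.data.find(key);
    if (it == s.data.end()) return false;
    *value = it->second;
    return true;
  }

 private:
  struct Shard {
    std::mutex mu;
    std::map<std::string, int> data;
  };
  Shard shards_[kNumShards];
};
```

With a single lock every reader and writer serializes on one mutex. With N shards, two threads block each other only when their keys hash to the same shard.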

Once the basic relationships are clear, the code is easy to follow, so it is not listed in full here. The one thing worth pointing out is how the HandleTable grows automatically: when the number of elements exceeds the size of the hash array, the array is resized to the smallest power of two (starting from 4) that is not less than the current element count, and the old elements are migrated to the new hash array.

    // In Insert(), after adding a new element:
    if (elems_ > length_) Resize();

    void Resize() {
      uint32_t new_length = 4;
      while (new_length < elems_) {
        new_length *= 2;
      }
      LRUHandle** new_list = new LRUHandle*[new_length];  // new hash array
      memset(new_list, 0, sizeof(new_list[0]) * new_length);
      uint32_t count = 0;
      for (uint32_t i = 0; i < length_; i++) {
        // Migrate the entries of each old bucket to the new hash array
        LRUHandle* h = list_[i];
        while (h != NULL) {
          LRUHandle* next = h->next_hash;
          uint32_t hash = h->hash;
          LRUHandle** ptr = &new_list[hash & (new_length - 1)];
          h->next_hash = *ptr;
          *ptr = h;
          h = next;
          count++;
        }
      }
      delete[] list_;  // free the old array and switch to the new one
      list_ = new_list;
      length_ = new_length;
    }
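The sizing rule and the bucket computation from Resize() can be isolated into two small functions. NextTableLength and BucketIndex are hypothetical helper names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two, starting at 4, that is not below elems.
// Same loop as in Resize().
inline uint32_t NextTableLength(uint32_t elems) {
  uint32_t new_length = 4;
  while (new_length < elems) new_length *= 2;
  return new_length;
}

// Because the length is a power of two, "hash mod length" reduces to a mask.
inline uint32_t BucketIndex(uint32_t hash, uint32_t length) {
  assert(length != 0 && (length & (length - 1)) == 0);  // power of two only
  return hash & (length - 1);
}
```

Keeping the length a power of two is what makes the expression hash & (new_length - 1) a valid replacement for the slower modulo operation.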

That is the basic structure of the LRUCache in LevelDB. LevelDB wraps it in several ways when using it, for example TableCache and the block_cache used by DBImpl. Let's sort out these two concepts first. The code shows that a TableCache entry holds only a Table object and a RandomAccessFile object. From Table::Open we know that the Table object stores only basic management information (covered in the previous article), so the actual data of the table is not cached there. So where is the actual data cached? LevelDB uses a second ShardedLRUCache, supplied through the options. Because it lives in the options it can be replaced, so you can design your own cache to suit your workload. This may be a little confusing at this point: how does LevelDB fetch a key-value pair through the caches, and why is it designed this way?

The first question: caching in LevelDB is layered. One cache holds the basic information of each SSTable rather than reading the whole SSTable into memory, and a second cache holds the actual block data one level below. To fetch data, LevelDB first uses the file metadata to locate the SSTable that may contain the key and obtains its handle from the table cache. It then uses the index information of that cached SSTable to determine which block is needed, and finally uses the block's cache key (SSTable id + block offset) to fetch the actual data from the block cache.

The second question: because of the layering, a whole SSTable is never loaded into memory at once. The benefit is obvious: lower memory consumption.
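The two layers are addressed by different keys. The sketch below shows one way such keys can be built; the helper names are hypothetical, and the exact byte layout (8-byte little-endian fields, table id followed by block offset) is an assumption of this sketch rather than a statement about LevelDB's on-disk or in-memory format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Little-endian fixed-width encoding of a 64-bit value (8 bytes).
inline void AppendFixed64(std::string* dst, uint64_t v) {
  assert(dst != nullptr);
  for (int i = 0; i < 8; i++) {
    dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  }
}

// Table cache key: the file number alone identifies an SSTable.
inline std::string TableCacheKey(uint64_t file_number) {
  std::string key;
  AppendFixed64(&key, file_number);
  return key;
}

// Block cache key: an id for the table plus the block's offset in the file.
inline std::string BlockCacheKey(uint64_t table_id, uint64_t block_offset) {
  std::string key;
  AppendFixed64(&key, table_id);
  AppendFixed64(&key, block_offset);
  return key;
}
```

A table-level key is enough to find the index of an SSTable, while a block-level key needs the extra offset so that different blocks of the same file get distinct cache entries.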

Let's go through the functions of TableCache one by one, starting with Get:

    Status TableCache::Get(const ReadOptions& options,
                           uint64_t file_number,  // identifies the SSTable file
                           uint64_t file_size,
                           const Slice& k,        // key to look up
                           void* arg,
                           void (*saver)(void*, const Slice&, const Slice&)) {
      Cache::Handle* handle = NULL;
      Status s = FindTable(file_number, file_size, &handle);  // find the cache handle
      if (s.ok()) {
        Table* t = reinterpret_cast<TableAndFile*>(cache_->Value(handle))->table;
        s = t->InternalGet(options, k, arg, saver);
        cache_->Release(handle);
      }
      return s;
    }

Next, FindTable:

    Status TableCache::FindTable(uint64_t file_number, uint64_t file_size,
                                 Cache::Handle** handle) {
      Status s;
      char buf[sizeof(file_number)];
      EncodeFixed64(buf, file_number);  // build the cache key from file_number
      Slice key(buf, sizeof(buf));
      *handle = cache_->Lookup(key);  // is this SSTable already in the table cache?
      if (*handle == NULL) {
        // Not cached yet: open the SSTable
        std::string fname = TableFileName(dbname_, file_number);
        RandomAccessFile* file = NULL;
        Table* table = NULL;
        s = env_->NewRandomAccessFile(fname, &file);  // try the .ldb suffix
        if (!s.ok()) {
          std::string old_fname = SSTTableFileName(dbname_, file_number);  // try the .sst suffix
          if (env_->NewRandomAccessFile(old_fname, &file).ok()) {
            s = Status::OK();
          }
        }
        if (s.ok()) {
          // Read the file's management information and build a Table object
          s = Table::Open(*options_, file, file_size, &table);
        }
        if (!s.ok()) {
          // Errors are not cached; release the file object
          delete file;
        } else {
          // Cache the table and file in the TableCache
          TableAndFile* tf = new TableAndFile;
          tf->file = file;
          tf->table = table;
          *handle = cache_->Insert(key, tf, 1, &DeleteEntry);
        }
      }
      return s;
    }

Now look at Table::InternalGet, which searches for the key inside an SSTable:

    Status Table::InternalGet(const ReadOptions& options, const Slice& k, void* arg,
                              void (*saver)(void*, const Slice&, const Slice&)) {
      Status s;
      Iterator* iiter = rep_->index_block->NewIterator(rep_->options.comparator);
      iiter->Seek(k);  // find the block that may contain the key
      if (iiter->Valid()) {
        Slice handle_value = iiter->value();
        FilterBlockReader* filter = rep_->filter;
        BlockHandle handle;
        if (filter != NULL && handle.DecodeFrom(&handle_value).ok() &&
            !filter->KeyMayMatch(handle.offset(), k)) {
          // Not found: the Bloom filter says the key is not in this block
        } else {
          Iterator* block_iter = BlockReader(this, options, iiter->value());
          block_iter->Seek(k);  // read the block and search inside it
          if (block_iter->Valid()) {
            (*saver)(arg, block_iter->key(), block_iter->value());
          }
          s = block_iter->status();
          delete block_iter;
        }
      }
      if (s.ok()) {
        s = iiter->status();
      }
      delete iiter;
      return s;
    }
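The two-step search in InternalGet (seek in the index, then seek inside one block) can be modeled in memory. ToyTable and ToyGet are hypothetical names, and the sketch assumes that each index entry is exactly the last key of its block. It also compares keys for equality directly, whereas InternalGet hands whatever entry the seek lands on to the saver callback and lets it decide.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;
using Block = std::vector<KV>;  // entries sorted by key

// Hypothetical in-memory stand-in for an SSTable: index[i] holds the last
// key of blocks[i], so the index is sorted as well.
struct ToyTable {
  std::vector<std::string> index;
  std::vector<Block> blocks;
};

bool ToyGet(const ToyTable& t, const std::string& key, std::string* value) {
  assert(value != nullptr);
  // Step 1: seek in the index to the first block whose last key >= key.
  auto idx = std::lower_bound(t.index.begin(), t.index.end(), key);
  if (idx == t.index.end()) return false;  // key is past the last block
  const Block& block = t.blocks[idx - t.index.begin()];
  // Step 2: seek inside that single block.
  auto it = std::lower_bound(
      block.begin(), block.end(), key,
      [](const KV& kv, const std::string& k) { return kv.first < k; });
  if (it == block.end() || it->first != key) return false;
  *value = it->second;
  return true;
}
```

Only the small index has to stay resident for every open table; a data block is touched only after the index has narrowed the search to it, which is where the block cache comes in.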

TableCache also has a NewIterator function. As the name implies, it creates an iterator over a cached SSTable, and it is basically a thin wrapper around Table::NewIterator. LevelDB's iterator design is quite subtle (NewTwoLevelIterator, for example) and will be covered in a later article; here you only need to know that it creates an iterator for traversing an SSTable.

With the basic data relationships cleared up, it is finally time to look at the function we care about most, DBImpl::Get. DBImpl is a subclass of the abstract class DB, so when a user calls Get on a DB, this is the function that actually runs.

    Status DBImpl::Get(const ReadOptions& options, const Slice& key,
                       std::string* value) {
      // Locking, sequence number, version, reference counting and other setup
      {
        // Look in the memtable first: it holds the newest data
        if (mem->Get(lkey, value, &s)) {
          // Done
        } else if (imm != NULL && imm->Get(lkey, value, &s)) {
          // Found in the immutable memtable, which holds the second-newest data
        } else {
          // Otherwise search the SSTables of the current version
          s = current->Get(options, lkey, value, &stats);
          have_stat_update = true;
        }
      }
      // Other handling, including deciding whether to trigger a compaction
      return s;
    }
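The lookup order in DBImpl::Get (memtable, then immutable memtable, then SSTables) amounts to "the newest layer that knows the key wins". The following sketch models that rule with plain maps. LayerEntry, Layer, and LayeredGet are hypothetical names, and representing a deletion as a flag on the entry is an assumption made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical entry: deleted == true marks a tombstone for the key.
struct LayerEntry {
  std::string value;
  bool deleted;
};
using Layer = std::map<std::string, LayerEntry>;

enum class Result { kFound, kNotFound };

// Layers are ordered newest first: memtable, immutable memtable, SSTables.
// The first layer that knows the key decides the answer.
Result LayeredGet(const std::vector<const Layer*>& layers,
                  const std::string& key, std::string* value) {
  assert(value != nullptr);
  for (const Layer* layer : layers) {
    if (layer == nullptr) continue;    // e.g. no immutable memtable right now
    auto it = layer->find(key);
    if (it == layer->end()) continue;  // unknown here, try an older layer
    if (it->second.deleted) return Result::kNotFound;  // tombstone hides older data
    *value = it->second.value;
    return Result::kFound;
  }
  return Result::kNotFound;
}
```

Stopping at the first hit is what makes the ordering matter: an older value or an older copy of a deleted key in a lower layer is never consulted.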

Lookup in the current version:

    Status Version::Get(const ReadOptions& options, const LookupKey& k,
                        std::string* value, GetStats* stats) {
      // (setup of ikey, user_key, ucmp, tmp and tmp2 omitted)
      // Search level by level, starting from level 0. Level 0 holds the newest
      // data, level 1 the next newest, and so on, so once the key is found in
      // one level the older data in deeper levels does not need to be checked.
      for (int level = 0; level < config::kNumLevels; level++) {
        size_t num_files = files_[level].size();
        if (num_files == 0) continue;

        // Get the list of files to search in this level
        FileMetaData* const* files = &files_[level][0];
        if (level == 0) {
          // Level-0 files may overlap each other, so check all of them
          for (uint32_t i = 0; i < num_files; i++) {
            FileMetaData* f = files[i];
            if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 &&
                ucmp->Compare(user_key, f->largest.user_key()) <= 0) {
              tmp.push_back(f);
            }
          }
          if (tmp.empty()) continue;
          // Sort the candidates from newest to oldest
          std::sort(tmp.begin(), tmp.end(), NewestFirst);
          files = &tmp[0];
          num_files = tmp.size();
        } else {
          // Locate the first file (SSTable) whose largest key >= ikey
          uint32_t index = FindFile(vset_->icmp_, files_[level], ikey);
          if (index >= num_files) {
            files = NULL;
            num_files = 0;
          } else {
            tmp2 = files[index];
            if (ucmp->Compare(user_key, tmp2->smallest.user_key()) < 0) {
              // The key is not in this file
              files = NULL;
              num_files = 0;
            } else {
              files = &tmp2;
              num_files = 1;
            }
          }
        }

        for (uint32_t i = 0; i < num_files; ++i) {
          FileMetaData* f = files[i];
          // (initialization of saver omitted)
          // With a candidate SSTable identified, do the actual lookup
          // through the table cache
          s = vset_->table_cache_->Get(options, f->number, f->file_size, ikey,
                                       &saver, SaveValue);
          switch (saver.state) {
            case kNotFound:
              break;  // keep searching until all candidate files are checked
            case kFound:
              return s;
            case kDeleted:
              s = Status::NotFound(Slice());  // deleted: return immediately
              return s;
            case kCorrupt:
              s = Status::Corruption("corrupted key for ", user_key);
              return s;
          }
        }
      }
      return Status::NotFound(Slice());  // Use an empty error message for speed
    }
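For levels above 0 the files are sorted and non-overlapping, so a binary search finds the single candidate file. The sketch below shows that search on plain strings. FileRange, FindFileIndex, and MayContain are hypothetical names, and std::string comparison stands in for LevelDB's comparator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical file summary: only the key range matters for this sketch.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Return the index of the first file whose largest key is >= key,
// or files.size() if every file ends before key.
std::size_t FindFileIndex(const std::vector<FileRange>& files,
                          const std::string& key) {
  std::size_t left = 0;
  std::size_t right = files.size();
  while (left < right) {
    std::size_t mid = left + (right - left) / 2;
    if (files[mid].largest < key) {
      left = mid + 1;  // every file up to mid ends before key
    } else {
      right = mid;     // mid may still be the answer
    }
  }
  assert(right <= files.size());
  return right;
}

// The candidate still has to be checked: key may fall in the gap before it.
bool MayContain(const std::vector<FileRange>& files, const std::string& key) {
  std::size_t i = FindFileIndex(files, key);
  return i < files.size() && !(key < files[i].smallest);
}
```

The search only guarantees that the candidate's largest key is not below the lookup key, which is why Version::Get follows it with a comparison against the file's smallest key before going to the table cache.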
