LevelDB Source Analysis: MemTable


MemTable (db/memtable.h, db/memtable.cc, db/skiplist.h)

The portion of the KV data that LevelDB keeps in memory lives in the memtable, and the entries in the memtable are actually stored in a skip list. The MemTable class uses an Arena for memory management and provides interfaces for insertion, lookup, and iteration, which are implemented by delegating to the SkipList's Insert and Iterator interfaces.

MemTable class

The declaration of the MemTable class:

class MemTable {
 public:
  // ...

  // Increase reference count.
  void Ref() { ++refs_; }

  // Drop reference count.  Delete if no more references exist.
  void Unref() {
    --refs_;
    assert(refs_ >= 0);
    if (refs_ <= 0) {
      delete this;
    }
  }

  // ...

  // Return an iterator that yields the contents of the memtable.
  //
  // The caller must ensure that the underlying MemTable remains live
  // while the returned iterator is live.  The keys returned by this
  // iterator are internal keys encoded by AppendInternalKey in the
  // db/format.{h,cc} module.
  Iterator* NewIterator();

  // Add an entry into memtable that maps key to value at the
  // specified sequence number and with the specified type.
  // Typically value will be empty if type==kTypeDeletion.
  void Add(SequenceNumber seq, ValueType type, const Slice& key,
           const Slice& value);

  // If memtable contains a value for key, store it in *value and return true.
  // If memtable contains a deletion for key, store a NotFound() error
  // in *status and return true.
  // Else, return false.
  bool Get(const LookupKey& key, std::string* value, Status* s);

 private:
  // ...

  struct KeyComparator {
    const InternalKeyComparator comparator;
    explicit KeyComparator(const InternalKeyComparator& c) : comparator(c) {}
    int operator()(const char* a, const char* b) const;
  };
  friend class MemTableIterator;
  friend class MemTableBackwardIterator;

  typedef SkipList<const char*, KeyComparator> Table;

  KeyComparator comparator_;
  int refs_;
  Arena arena_;
  Table table_;

  // ...
};
MemTable member variables

The table_ variable is the skip list that actually stores the KV data, arena_ handles memory management, refs_ is the reference count, and comparator_ is used for comparisons between keys.

The simplest of these is refs_: calling Ref() increases the reference count, calling Unref() decreases it, and when the count drops to 0, Unref() deletes the object itself.
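The Ref()/Unref() pattern can be seen in a minimal standalone sketch. The class name and the deleted-flag member below are illustrative additions (not from LevelDB), used only so the demo can observe the self-deletion:

```cpp
#include <cassert>

// Minimal sketch of MemTable-style intrusive reference counting.
// (RefCounted and deleted_flag_ are hypothetical names for illustration.)
class RefCounted {
 public:
  explicit RefCounted(bool* deleted_flag)
      : refs_(0), deleted_flag_(deleted_flag) {}
  ~RefCounted() { *deleted_flag_ = true; }

  void Ref() { ++refs_; }
  void Unref() {
    --refs_;
    assert(refs_ >= 0);
    if (refs_ <= 0) {
      delete this;  // the object destroys itself when the last owner lets go
    }
  }

 private:
  int refs_;
  bool* deleted_flag_;  // lets the demo observe destruction
};
```

Two owners each call Ref(); the object survives the first Unref() and deletes itself on the second.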

Arena class

Next is arena_. The Arena class is declared as:

class Arena {
 public:
  // ...

  // Return a pointer to a newly allocated memory block of "bytes" bytes.
  char* Allocate(size_t bytes);

  // Allocate memory with the normal alignment guarantees provided by malloc
  char* AllocateAligned(size_t bytes);

  // ...

 private:
  char* AllocateFallback(size_t bytes);
  char* AllocateNewBlock(size_t block_bytes);

  // Allocation state
  char* alloc_ptr_;
  size_t alloc_bytes_remaining_;

  // Array of new[] allocated memory blocks
  std::vector<char*> blocks_;

  // Total memory usage of the arena.
  port::AtomicPointer memory_usage_;

  // ...
};

alloc_ptr_ points to the start of the memory still available for allocation (within the current block), alloc_bytes_remaining_ is the amount of memory remaining in that block, blocks_ holds a pointer to each allocated block, and memory_usage_ is the total memory used by the arena; AtomicPointer implements atomic operations via memory barriers or the C++ atomic interface.

The Arena class provides two main methods. The first is Allocate(size_t bytes):

inline char* Arena::Allocate(size_t bytes) {
  // The semantics of what to return are a bit messy if we allow
  // 0-byte allocations, so we disallow them here (we don't need
  // them for our internal use).
  assert(bytes > 0);
  if (bytes <= alloc_bytes_remaining_) {
    char* result = alloc_ptr_;
    alloc_ptr_ += bytes;
    alloc_bytes_remaining_ -= bytes;
    return result;
  }
  return AllocateFallback(bytes);
}

char* Arena::AllocateFallback(size_t bytes) {
  if (bytes > kBlockSize / 4) {
    // Object is more than a quarter of our block size.  Allocate it
    // separately to avoid wasting too much space in leftover bytes.
    char* result = AllocateNewBlock(bytes);
    return result;
  }

  // We waste the remaining space in the current block.
  alloc_ptr_ = AllocateNewBlock(kBlockSize);
  alloc_bytes_remaining_ = kBlockSize;

  char* result = alloc_ptr_;
  alloc_ptr_ += bytes;
  alloc_bytes_remaining_ -= bytes;
  return result;
}

char* Arena::AllocateNewBlock(size_t block_bytes) {
  char* result = new char[block_bytes];
  blocks_.push_back(result);
  memory_usage_.NoBarrier_Store(
      reinterpret_cast<void*>(MemoryUsage() + block_bytes + sizeof(char*)));
  return result;
}

The memory returned by Allocate(size_t bytes) is not guaranteed to be aligned. It first checks whether the current block has enough free space; if so, it hands the caller the bytes bytes starting at alloc_ptr_, advances alloc_ptr_ to the new position, and adjusts alloc_bytes_remaining_. If not, it calls AllocateFallback(size_t bytes). AllocateFallback first checks whether bytes exceeds a quarter of a block; if it does, it calls AllocateNewBlock(size_t block_bytes) to give the caller a dedicated block holding nothing but those bytes, while alloc_ptr_ is left unchanged, i.e. the current block continues to serve subsequent small allocations. Since the space abandoned whenever a block is given up is at most a quarter of a block, this design bounds the memory wasted to fragmentation. If bytes does not exceed a quarter of a block, a new block is allocated, the request is served from its start, and alloc_ptr_ points into the new block's remaining space.
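The two allocation paths can be condensed into a self-contained sketch. MiniArena is a simplified stand-in (no alignment, no atomic memory accounting; only the block-count accessor is added for observation), not LevelDB's actual class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified arena mirroring Allocate/AllocateFallback.
// kBlockSize matches LevelDB's default of 4096 bytes.
class MiniArena {
 public:
  static constexpr size_t kBlockSize = 4096;

  ~MiniArena() {
    for (char* b : blocks_) delete[] b;
  }

  char* Allocate(size_t bytes) {
    assert(bytes > 0);
    if (bytes <= remaining_) {
      // Fast path: carve the request out of the current block.
      char* result = ptr_;
      ptr_ += bytes;
      remaining_ -= bytes;
      return result;
    }
    return AllocateFallback(bytes);
  }

  size_t BlockCount() const { return blocks_.size(); }

 private:
  char* AllocateFallback(size_t bytes) {
    if (bytes > kBlockSize / 4) {
      // Large request: private block; ptr_/remaining_ stay untouched,
      // so the current block keeps serving small allocations.
      return AllocateNewBlock(bytes);
    }
    // Small request: abandon the tail of the current block.
    ptr_ = AllocateNewBlock(kBlockSize);
    remaining_ = kBlockSize;
    char* result = ptr_;
    ptr_ += bytes;
    remaining_ -= bytes;
    return result;
  }

  char* AllocateNewBlock(size_t n) {
    char* b = new char[n];
    blocks_.push_back(b);
    return b;
  }

  char* ptr_ = nullptr;
  size_t remaining_ = 0;
  std::vector<char*> blocks_;
};
```

A request larger than a quarter block (1024 bytes here) that does not fit in the current block gets its own block, and later small requests are still served from the old block.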

The other memory allocation function is AllocateAligned(size_t bytes):

char* Arena::AllocateAligned(size_t bytes) {
  const int align = (sizeof(void*) > 8) ? sizeof(void*) : 8;
  assert((align & (align - 1)) == 0);  // Pointer size should be a power of 2
  size_t current_mod = reinterpret_cast<uintptr_t>(alloc_ptr_) & (align - 1);
  size_t slop = (current_mod == 0 ? 0 : align - current_mod);
  size_t needed = bytes + slop;
  char* result;
  if (needed <= alloc_bytes_remaining_) {
    result = alloc_ptr_ + slop;
    alloc_ptr_ += needed;
    alloc_bytes_remaining_ -= needed;
  } else {
    // AllocateFallback always returned aligned memory
    result = AllocateFallback(bytes);
  }
  assert((reinterpret_cast<uintptr_t>(result) & (align - 1)) == 0);
  return result;
}

This function ensures that the returned memory is aligned. First, the following code computes how far alloc_ptr_ must be advanced (the slop) so the returned address is aligned:

const int align = (sizeof(void*) > 8) ? sizeof(void*) : 8;
assert((align & (align - 1)) == 0);  // Pointer size should be a power of 2
size_t current_mod = reinterpret_cast<uintptr_t>(alloc_ptr_) & (align - 1);
size_t slop = (current_mod == 0 ? 0 : align - current_mod);
size_t needed = bytes + slop;

The rest of the code is the same as Allocate(size_t bytes).
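The alignment arithmetic can be checked in isolation. The Slop helper below is extracted for illustration (not a LevelDB function); align must be a power of two:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// How many padding bytes must be skipped so that addr + slop is
// align-aligned, exactly as computed inside AllocateAligned.
size_t Slop(uintptr_t addr, size_t align) {
  assert((align & (align - 1)) == 0);  // power of two required
  size_t current_mod = addr & (align - 1);
  return current_mod == 0 ? 0 : align - current_mod;
}
```

For example, an address already at a multiple of 8 needs no slop, while address 21 needs 3 bytes of slop to reach 24.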

SkipList class

Last, and most important, is the skip list, implemented by the SkipList class:

template <typename Key, class Comparator>
class SkipList {
 private:
  struct Node;

 public:
  // ...

  // Insert key into the list.
  // REQUIRES: nothing that compares equal to key is currently in the list.
  void Insert(const Key& key);

  // Returns true iff an entry that compares equal to key is in the list.
  bool Contains(const Key& key) const;

  // Iteration over the contents of a skip list
  class Iterator {
    // ...
  };

 private:
  // ...

  // Immutable after construction
  Comparator const compare_;
  Arena* const arena_;  // Arena used for allocations of nodes

  Node* const head_;

  // Modified only by Insert().  Read racily by readers, but stale
  // values are ok.
  port::AtomicPointer max_height_;  // Height of the entire list

  // ...

  Node* NewNode(const Key& key, int height);

  // ...

  // Return true if key is greater than the data stored in "n"
  bool KeyIsAfterNode(const Key& key, Node* n) const;

  // Return the earliest node that comes at or after key.
  // Return nullptr if there is no such node.
  //
  // If prev is non-null, fills prev[level] with pointer to previous
  // node at "level" for every level in [0..max_height_-1].
  Node* FindGreaterOrEqual(const Key& key, Node** prev) const;

  // Return the latest node with a key < key.
  // Return head_ if there is no such node.
  Node* FindLessThan(const Key& key) const;

  // Return the last node in the list.
  // Return head_ if list is empty.
  Node* FindLast() const;

  // ...
};

The skip list has four main members: compare_ for key comparison, arena_ for memory management, head_ pointing to the head node of the skip list, and max_height_ recording the maximum height of the skip list.

The skip list also contains a class and a struct: the Iterator class for traversing the skip list, and the Node struct representing a node in the skip list.

Node structure

The first is the node structure:

template <typename Key, class Comparator>
struct SkipList<Key, Comparator>::Node {
  // ...

  Key const key;

  // Accessors/mutators for links.  Wrapped in methods so we can
  // add the appropriate barriers as necessary.
  Node* Next(int n);
  // ...
  void SetNext(int n, Node* x);
  // ...

  // No-barrier variants that can be safely used in a few locations.
  Node* NoBarrier_Next(int n);
  // ...
  void NoBarrier_SetNext(int n, Node* x);
  // ...

 private:
  // Array of length equal to the node height.  next_[0] is lowest level link.
  port::AtomicPointer next_[1];
};

The key member stores the node's actual KV entry; next_ is a variable-length array (when the last member of a struct is an array, it can be over-allocated to a variable length) that stores, for each level, a pointer to the node's successor in the linked list at that height.

Creating a Node uses a rather magical function:

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::NewNode(const Key& key, int height) {
  char* mem = arena_->AllocateAligned(
      sizeof(Node) + sizeof(port::AtomicPointer) * (height - 1));
  return new (mem) Node(key);
}

This function first allocates a block of aligned memory large enough for a Node plus height - 1 extra pointer slots, and then the statement new (mem) Node(key) uses placement new: instead of allocating memory itself, it constructs a Node in the memory already provided, so the nominally one-element next_ array effectively has height usable entries.
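The same trick can be shown in a standalone sketch. DemoNode and NewDemoNode are hypothetical names, and malloc stands in for the arena LevelDB actually uses:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>  // required for placement new

// A node whose declared next_[1] is over-allocated to `height` slots,
// mirroring SkipList<...>::NewNode.  Names here are illustrative only.
struct DemoNode {
  explicit DemoNode(int k) : key(k) {}
  int key;
  DemoNode* next_[1];  // really `height` slots long after over-allocation
};

DemoNode* NewDemoNode(int key, int height) {
  // Raw buffer: the node itself plus (height - 1) extra pointer slots.
  void* mem = std::malloc(sizeof(DemoNode) + sizeof(DemoNode*) * (height - 1));
  // Placement new: construct a DemoNode *in* mem instead of allocating.
  return new (mem) DemoNode(key);
}
```

Because placement new does not allocate, disposal must invoke the destructor explicitly and free the raw buffer separately; in LevelDB the arena owns the memory, so nodes are never individually freed.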

Iterator class

Next is the Iterator class:

// Iteration over the contents of a skip list
class Iterator {
 public:
  // ...

  // Returns true iff the iterator is positioned at a valid node.
  bool Valid() const;

  // Returns the key at the current position.
  // REQUIRES: Valid()
  const Key& key() const;

  // Advances to the next position.
  // REQUIRES: Valid()
  void Next();

  // Advances to the previous position.
  // REQUIRES: Valid()
  void Prev();

  // Advance to the first entry with a key >= target
  void Seek(const Key& target);

  // Position at the first entry in list.
  // Final state of iterator is Valid() iff list is not empty.
  void SeekToFirst();

  // Position at the last entry in list.
  // Final state of iterator is Valid() iff list is not empty.
  void SeekToLast();

 private:
  const SkipList* list_;
  Node* node_;
  // ...
};

This class has two members: list_ points to the skip list being iterated, and node_ points to the node the iterator is currently positioned at.

Traversing a skip list requires an Iterator object, and the traversal proceeds by calling the interface functions the Iterator class provides. Those interface functions are in turn implemented by calling the private interface functions of the SkipList class.

The private interface functions of the SkipList class are as follows:

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindGreaterOrEqual(const Key& key,
                                              Node** prev) const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    Node* next = x->Next(level);
    if (KeyIsAfterNode(key, next)) {
      // Keep searching in this list
      x = next;
    } else {
      if (prev != nullptr) prev[level] = x;
      if (level == 0) {
        return next;
      } else {
        // Switch to next list
        level--;
      }
    }
  }
}

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindLessThan(const Key& key) const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    assert(x == head_ || compare_(x->key, key) < 0);
    Node* next = x->Next(level);
    if (next == nullptr || compare_(next->key, key) >= 0) {
      if (level == 0) {
        return x;
      } else {
        // Switch to next list
        level--;
      }
    } else {
      x = next;
    }
  }
}

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindLast() const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    Node* next = x->Next(level);
    if (next == nullptr) {
      if (level == 0) {
        return x;
      } else {
        // Switch to next list
        level--;
      }
    } else {
      x = next;
    }
  }
}

The implementation of these three functions is relatively simple, similar to traversing a linked list, so we won't analyze them in detail.

In addition, the skip list provides two public interface functions, one for inserting a key and one for checking whether a key is present; both are also implemented using the private interface functions above:

template <typename Key, class Comparator>
void SkipList<Key, Comparator>::Insert(const Key& key);

template <typename Key, class Comparator>
bool SkipList<Key, Comparator>::Contains(const Key& key) const;
MemTable member functions

Having analyzed the above, we can now look at the interface functions provided by MemTable:

void MemTable::Add(SequenceNumber s, ValueType type, const Slice& key,
                   const Slice& value) {
  // Format of an entry is concatenation of:
  //  key_size     : varint32 of internal_key.size()
  //  key bytes    : char[internal_key.size()]
  //  value_size   : varint32 of value.size()
  //  value bytes  : char[value.size()]
  size_t key_size = key.size();
  size_t val_size = value.size();
  size_t internal_key_size = key_size + 8;
  const size_t encoded_len = VarintLength(internal_key_size) +
                             internal_key_size + VarintLength(val_size) +
                             val_size;
  char* buf = arena_.Allocate(encoded_len);
  char* p = EncodeVarint32(buf, internal_key_size);
  memcpy(p, key.data(), key_size);
  p += key_size;
  EncodeFixed64(p, (s << 8) | type);
  p += 8;
  p = EncodeVarint32(p, val_size);
  memcpy(p, value.data(), val_size);
  assert(p + val_size == buf + encoded_len);
  table_.Insert(buf);
}

The Add function concatenates a key and a value into an entry of the form key_size | key | SequenceNumber | type | value_size | value, where the sequence number and type are packed into the 8 bytes that follow the user key (so key_size covers both), and then calls the Insert function provided by the SkipList class to insert the entry into the skip list.
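The varint32 lengths in this layout use the standard base-128 encoding: each byte carries 7 payload bits, and the high bit is set on every byte except the last. A sketch equivalent in output to LevelDB's EncodeVarint32:

```cpp
#include <cassert>
#include <cstdint>

// Base-128 varint encoding of a 32-bit value, as used for the
// key_size and value_size fields of a memtable entry.
// Returns a pointer just past the last byte written.
char* EncodeVarint32(char* dst, uint32_t v) {
  unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
  while (v >= 128) {
    *ptr++ = (v & 0x7f) | 0x80;  // low 7 bits, continuation bit set
    v >>= 7;
  }
  *ptr++ = static_cast<unsigned char>(v);  // final byte, high bit clear
  return reinterpret_cast<char*>(ptr);
}
```

Values below 128 take a single byte, which is why typical key and value lengths cost only one byte of framing each; 300, for instance, encodes as the two bytes 0xAC 0x02.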

bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) {
  Slice memkey = key.memtable_key();
  Table::Iterator iter(&table_);
  iter.Seek(memkey.data());
  if (iter.Valid()) {
    // entry format is:
    //    klength  varint32
    //    userkey  char[klength]
    //    tag      uint64
    //    vlength  varint32
    //    value    char[vlength]
    // Check that it belongs to same user key.  We do not check the
    // sequence number since the Seek() call above should have skipped
    // all entries with overly large sequence numbers.
    const char* entry = iter.key();
    uint32_t key_length;
    const char* key_ptr = GetVarint32Ptr(entry, entry + 5, &key_length);
    if (comparator_.comparator.user_comparator()->Compare(
            Slice(key_ptr, key_length - 8), key.user_key()) == 0) {
      // Correct user key
      const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8);
      switch (static_cast<ValueType>(tag & 0xff)) {
        case kTypeValue: {
          Slice v = GetLengthPrefixedSlice(key_ptr + key_length);
          value->assign(v.data(), v.size());
          return true;
        }
        case kTypeDeletion:
          *s = Status::NotFound(Slice());
          return true;
      }
    }
  }
  return false;
}

The Get function first uses a skip list iterator to find the node containing the key, then decodes the data in the node: for a live entry it wraps the value as a Slice and copies it into *value, and for a deletion it reports NotFound.
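The tag decoded here packs the sequence number and the value type into a single 64-bit word. A small sketch of the packing and unpacking (the enum values match LevelDB's kTypeDeletion = 0 and kTypeValue = 1; the helper function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// LevelDB's tag layout: high 56 bits = sequence number, low 8 bits = type.
enum ValueType : uint8_t { kTypeDeletion = 0x0, kTypeValue = 0x1 };

uint64_t PackTag(uint64_t sequence, ValueType type) {
  return (sequence << 8) | type;  // same expression as in MemTable::Add
}

uint64_t TagSequence(uint64_t tag) { return tag >> 8; }

ValueType TagType(uint64_t tag) {
  return static_cast<ValueType>(tag & 0xff);  // same mask as in Get
}
```

This is why Get can distinguish a stored value from a tombstone with a single switch on tag & 0xff.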

MemTableIterator class

MemTable also encapsulates a MemTableIterator class for iterating over the memtable:

class MemTableIterator : public Iterator {
 public:
  explicit MemTableIterator(MemTable::Table* table) : iter_(table) {}

  virtual bool Valid() const { return iter_.Valid(); }
  virtual void Seek(const Slice& k) { iter_.Seek(EncodeKey(&tmp_, k)); }
  virtual void SeekToFirst() { iter_.SeekToFirst(); }
  virtual void SeekToLast() { iter_.SeekToLast(); }
  virtual void Next() { iter_.Next(); }
  virtual void Prev() { iter_.Prev(); }
  virtual Slice key() const { return GetLengthPrefixedSlice(iter_.key()); }
  virtual Slice value() const {
    Slice key_slice = GetLengthPrefixedSlice(iter_.key());
    return GetLengthPrefixedSlice(key_slice.data() + key_slice.size());
  }

  virtual Status status() const { return Status::OK(); }

 private:
  MemTable::Table::Iterator iter_;
  std::string tmp_;  // For passing to EncodeKey

  // No copying allowed
  MemTableIterator(const MemTableIterator&);
  void operator=(const MemTableIterator&);
};

Traversing a memtable requires a MemTableIterator, and the traversal proceeds by calling the interface functions it provides. MemTableIterator is in fact a wrapper around a skip list iterator: its member iter_ is such an iterator, and its interface functions are implemented by calling the corresponding functions of the skip list iterator.
