Tair Source Analysis: Use of the LevelDB Storage Engine

Source: Internet
Author: User

Having finished our analysis of LevelDB, we now turn to reading and analyzing the Tair source. LevelDB is one of Tair's main storage engines, so we start by looking at how Tair, a distributed storage system, uses and modifies LevelDB: how it persists the area and bucket together with each key, and how that makes processing by bucket and by area convenient.

We first look at the structure of a key in Tair, tracing how the key is processed by roughly following the path of storing and querying one KV pair. tair_client_api::put calls tair_client_impl::put directly. The impl first validates the key and value (data), then calls get_server_id to obtain the server_id for the current key and sends the assembled request_put to that server. When the server receives the message, handlePacketQueue decodes it and calls the corresponding process function, here request_processor::process(request_put *, ...). That function runs plugins, checks whether the data is stored locally, and performs some other checks. We only need to focus on its call to tair_manager::put, which processes the key as follows:

    data_entry mkey = key;  // key merged with area
    mkey.merge_area(area);
    int bucket_number = get_bucket_number(key);

    void merge_area(int _area)
    {
      if (has_merged) {
        return;
      }
      if (size < 0) return;
      // now should check is alloc by me. I have 2 extra head.
      if (m_true_size == size + 2) {
        m_true_data[0] = (_area & 0xFF);
        m_true_data[1] = ((_area >> 8) & 0xFF);
        size = m_true_size;
        data = m_true_data;
      } else {
        // some re-allocating of space, then processing similar to the if branch
      }
    }
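To make the byte layout concrete, here is a minimal standalone sketch of the same idea using std::string instead of Tair's data_entry. The helper names prepend_area and decode_area are hypothetical and not part of Tair; only the two-byte, low-byte-first layout is taken from merge_area above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Tair code): prefix a user key with a 2-byte area,
// low byte first, mirroring the two bytes merge_area writes.
std::string prepend_area(const std::string& user_key, int area) {
  std::string merged;
  merged.reserve(user_key.size() + 2);
  merged.push_back(static_cast<char>(area & 0xFF));
  merged.push_back(static_cast<char>((area >> 8) & 0xFF));
  merged.append(user_key);
  return merged;
}

// Hypothetical helper (not Tair code): read the area back from a merged key.
int decode_area(const std::string& merged) {
  assert(merged.size() >= 2);
  const unsigned char lo = static_cast<unsigned char>(merged[0]);
  const unsigned char hi = static_cast<unsigned char>(merged[1]);
  return lo | (hi << 8);
}
```

For example, prepend_area("foo", 0x0102) yields the bytes 02 01 followed by "foo". Because the low byte comes first, keys of one area still share a common prefix, but areas do not appear in numeric order under a byte-wise comparison.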

In merge_area, the area is written into the first two bytes of the key's data_entry. Next, look at LdbInstance::put, which obtains cdate, mdate, and edate and then builds an LdbKey:

    LdbKey ldb_key(key.get_data(), key.get_size(), bucket_number, edate);

    // and look at LdbKey's handling
    LdbKey(const char* key_data, int32_t key_size, int32_t bucket_number, uint32_t expired_time = 0)
      : data_(NULL), data_size_(0), alloc_(false)
    {
      set(key_data, key_size, bucket_number, expired_time);
    }

    inline void set(const char* key_data, int32_t key_size, int32_t bucket_number, uint32_t expired_time)
    {
      if (key_data != NULL && key_size > 0) {
        free();
        data_size_ = key_size + LDB_KEY_META_SIZE;
        data_ = new char[data_size_];
        build_key_meta(data_, bucket_number, expired_time);
        memcpy(data_ + LDB_KEY_META_SIZE, key_data, key_size);
        alloc_ = true;
      }
    }

    static void build_key_meta(char* buf, int32_t bucket_number, uint32_t expired_time = 0)
    {
      // 4-byte expire time first, then the 3-byte bucket number
      encode_fixed32(buf, expired_time);
      encode_bucket_number(buf + LDB_EXPIRED_TIME_SIZE, bucket_number);
    }

    static void encode_bucket_number(char* buf, int bucket_number)
    {
      // LDB_KEY_BUCKET_NUM_SIZE = 3; the high byte is written first
      for (int i = 0; i < LDB_KEY_BUCKET_NUM_SIZE; ++i)
      {
        buf[LDB_KEY_BUCKET_NUM_SIZE - i - 1] = (bucket_number >> (i * 8)) & 0xFF;
      }
    }

As the code shows, a 4-byte expired_time and a 3-byte bucket_no are encoded in front of the key. The complete key stored in LevelDB therefore has the layout

Expire (4B) | Bucket_no (3B) | Area (2B) | User_key
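The layout above can be sketched in standalone C++ as follows. This is an illustration, not Tair's code: build_ldb_key and decode_bucket_number are hypothetical names, the input is assumed to be a key that already carries its 2-byte area prefix, and the byte order of the expire time (low byte first) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sizes taken from the article: 4-byte expire time, 3-byte bucket number.
const int kExpiredTimeSize = 4;
const int kBucketNumSize = 3;
const int kKeyMetaSize = kExpiredTimeSize + kBucketNumSize;

// Hypothetical helper: build expire | bucket | (area + user key).
// Assumption: the expire time is stored low byte first.
std::string build_ldb_key(const std::string& merged_key, int bucket_number,
                          uint32_t expired_time) {
  std::string out(kKeyMetaSize, '\0');
  for (int i = 0; i < kExpiredTimeSize; ++i) {
    out[i] = static_cast<char>((expired_time >> (8 * i)) & 0xFF);
  }
  // The bucket number is stored high byte first, so byte-wise order
  // matches numeric order.
  for (int i = 0; i < kBucketNumSize; ++i) {
    out[kExpiredTimeSize + kBucketNumSize - i - 1] =
        static_cast<char>((bucket_number >> (8 * i)) & 0xFF);
  }
  out.append(merged_key);
  return out;
}

// Hypothetical helper: decode the bucket number from a built key.
int decode_bucket_number(const std::string& ldb_key) {
  assert(ldb_key.size() >= static_cast<std::size_t>(kKeyMetaSize));
  int bucket = 0;
  for (int i = 0; i < kBucketNumSize; ++i) {
    bucket = (bucket << 8) |
             static_cast<unsigned char>(ldb_key[kExpiredTimeSize + i]);
  }
  return bucket;
}
```

With bucket number 258 (0x000102), bytes 4 to 6 of the result are 00 01 02, and the area and user key follow from byte 7 onward.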

In LdbInstance::put you will also see checks on db_version_care_ and on the put's version_care flag. We do not describe them in detail here. In short: if db_version_care_ is set, every put first queries for an existing entry for the key. Then, if version_care applies, the incoming version is compared with the version currently stored in the database. If they differ, the put fails; if they are equal, the insert (update) proceeds and the key's version is incremented by 1. Once these fields are set, the KV pair can be inserted into LevelDB. One more thing to note: Tair's storage engines include a pure in-memory store, Memdb, and Tair's LevelDB engine has been modified to add a db_cache concept, so Memdb can optionally be used as a cache in front of LevelDB to speed up queries.
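The version check described above can be sketched roughly as follows. This is a simplified illustration over an in-memory std::map, not Tair's implementation: the names VersionedValue, Store, and put_with_version are hypothetical, and the behavior for a key that does not exist yet (insert with version 1) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct VersionedValue {
  std::string value;
  uint16_t version;
};

// Hypothetical in-memory stand-in for the store.
typedef std::map<std::string, VersionedValue> Store;

// Returns true if the write is accepted, false on a version mismatch.
bool put_with_version(Store& store, const std::string& key,
                      const std::string& value, uint16_t version,
                      bool version_care) {
  Store::iterator it = store.find(key);
  if (it == store.end()) {
    // Assumption: a missing key is inserted with version 1.
    VersionedValue fresh;
    fresh.value = value;
    fresh.version = 1;
    store[key] = fresh;
    return true;
  }
  if (version_care && it->second.version != version) {
    return false;  // the caller's version is stale: reject the put
  }
  it->second.value = value;
  it->second.version = static_cast<uint16_t>(it->second.version + 1);
  return true;
}
```

A caller that read version 1 and writes back with version 1 succeeds and leaves the entry at version 2; a second caller still holding version 1 is then rejected.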

We then insert the reassembled KV pair into LevelDB. But have you noticed a problem? The key now starts with the expire time, and LevelDB's default comparator is BytewiseComparator. With that comparator, LevelDB would sort keys by expire time first. For handling expiration that is convenient: sorted by expire time, expired keys are easy to clear out. But Tair also has to migrate data. When we need to migrate an area or a bucket to another machine, would we have to scan the entire database? And when looking up a key, how would we know its expire time? Without it we could not build the stored key, and so could not fetch the corresponding value. Don't worry: LevelDB lets the user supply a comparator. Since the default BytewiseComparator does not meet the requirements, Tair implements its own comparator, called LdbComparator, to meet our needs. Let's take a look at its Compare code.

    int LdbComparatorImpl::Compare(const leveldb::Slice& a, const leveldb::Slice& b) const
    {
      // LDB_COMPARE_SKIP_SIZE = LDB_EXPIRED_TIME_SIZE = sizeof(uint32_t);
      assert(a.size() > LDB_COMPARE_SKIP_SIZE && b.size() > LDB_COMPARE_SKIP_SIZE);
      const int min_len = (a.size() < b.size()) ?
        a.size() - LDB_COMPARE_SKIP_SIZE : b.size() - LDB_COMPARE_SKIP_SIZE;
      int r = memcmp(a.data() + LDB_COMPARE_SKIP_SIZE, b.data() + LDB_COMPARE_SKIP_SIZE, min_len);
      if (r == 0)
      {
        if (a.size() < b.size()) {
          r = -1;
        }
        else if (a.size() > b.size()) {
          r = +1;
        }
      }
      return r;
    }

Here Compare first skips 4 bytes and then calls memcmp on the rest; those 4 bytes are exactly the length of the expire time.
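The effect of skipping the prefix is easy to check outside LevelDB. The following sketch applies the same comparison to std::string keys; compare_skip_expire and less_skip_expire are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Number of leading bytes (the expire time) ignored by the comparison.
const std::size_t kSkip = 4;

// Same logic as the Compare function above, applied to std::string.
int compare_skip_expire(const std::string& a, const std::string& b) {
  assert(a.size() > kSkip && b.size() > kSkip);
  const std::size_t min_len = std::min(a.size(), b.size()) - kSkip;
  int r = std::memcmp(a.data() + kSkip, b.data() + kSkip, min_len);
  if (r == 0) {
    if (a.size() < b.size()) {
      r = -1;
    } else if (a.size() > b.size()) {
      r = 1;
    }
  }
  return r;
}

// Strict-weak-ordering wrapper so the comparison can be used with std::sort.
bool less_skip_expire(const std::string& a, const std::string& b) {
  return compare_skip_expire(a, b) < 0;
}
```

Sorting a list of keys with std::sort and less_skip_expire orders them by everything after the first 4 bytes, and two keys that differ only in their expire bytes compare as equal. That is what lets a lookup find a key without knowing its stored expire time.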

So we reach the conclusion that Tair restructures the key when using LevelDB. The restructured key passed to LevelDB has the layout

Expire (4B) | Bucket_no (3B) | Area (2B) | User_key

Tair then uses its own comparator to skip the 4 leading expire bytes, so that the data stored in LevelDB is sorted by bucket_no first, then by area, and finally by user_key. With this layout, when migrating by bucket we only need the bucket_no to build the key range to scan.
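A bucket scan over keys ordered this way can be sketched as a half-open range between two boundary keys. The example below uses a std::set in place of LevelDB; make_key, SkipExpireLess, and scan_bucket are hypothetical names, the expire bytes are simply zeroed, and "rest" stands for the area plus user key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>
#include <vector>

const std::size_t kExpireSize = 4;
const int kBucketSize = 3;

// expire (4B, zeroed here) | bucket (3B, high byte first) | rest
std::string make_key(int bucket, const std::string& rest) {
  std::string k(kExpireSize, '\0');
  for (int i = kBucketSize - 1; i >= 0; --i) {
    k.push_back(static_cast<char>((bucket >> (8 * i)) & 0xFF));
  }
  return k + rest;
}

// Orders keys by their bytes after the expire prefix.
struct SkipExpireLess {
  bool operator()(const std::string& a, const std::string& b) const {
    const std::size_t n = std::min(a.size(), b.size()) - kExpireSize;
    const int r =
        std::memcmp(a.data() + kExpireSize, b.data() + kExpireSize, n);
    return r != 0 ? r < 0 : a.size() < b.size();
  }
};

typedef std::set<std::string, SkipExpireLess> KeySet;

// Collect every key in [make_key(bucket, ""), make_key(bucket + 1, "")).
std::vector<std::string> scan_bucket(const KeySet& keys, int bucket) {
  std::vector<std::string> out;
  KeySet::const_iterator it = keys.lower_bound(make_key(bucket, ""));
  KeySet::const_iterator end = keys.lower_bound(make_key(bucket + 1, ""));
  for (; it != end; ++it) {
    out.push_back(*it);
  }
  return out;
}
```

Only the bucket number is needed to build both boundaries, so the scan touches just the keys of that bucket. The sketch does not handle the largest 3-byte bucket number, where bucket + 1 would no longer fit in 3 bytes.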

This article has introduced the key storage structure Tair uses on top of LevelDB. We hope it clarifies Tair's internal storage from this angle, as a basis for further analysis of Tair's implementation.
