Intset, a Redis underlying data structure

Source: Internet
Author: User

Recently I have wanted to learn Redis by reading its source code. I do not use Redis much in my day-to-day work, but I am interested in it, not least because it performs well. Redis is open source, so we can understand it by reading the code. As I study, I plan to write a series of posts about the Redis source. The posts focus on analyzing the design of the code rather than explaining the source line by line. If anything is wrong, please correct me. The source code is from Redis 3.0.3.


Intset


I. The intset data structure

An intset is the data structure Redis uses to store a set of integers. A set contains no duplicate elements and does not have to be ordered. Redis's intset, however, keeps its elements sorted.

First, the definition of intset:

    typedef struct intset {
        uint32_t encoding;  // integer encoding type
        uint32_t length;    // intset size, i.e. number of elements
        int8_t contents[];  // element storage
    } intset;

Next we look at the operations on intset, and then summarize its characteristics. The code excerpts below illustrate how intset behaves.


II. Operations provided by intset

Functions exposed by intset:

    intset *intsetNew(void);                                         // create an empty intset
    intset *intsetAdd(intset *is, int64_t value, uint8_t *success);  // insert an element
    intset *intsetRemove(intset *is, int64_t value, int *success);   // delete an element
    uint8_t intsetFind(intset *is, int64_t value);                   // find an element
    int64_t intsetRandom(intset *is);                                // pick a random element
    uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);     // get the element at the given position
    uint32_t intsetLen(intset *is);                                  // get the number of elements
    size_t intsetBlobLen(intset *is);                                // get the number of bytes used by the intset

1. Integer encoding

    /* Note that these encodings are ordered, so:
     * INTSET_ENC_INT16 < INTSET_ENC_INT32 < INTSET_ENC_INT64. */
    #define INTSET_ENC_INT16 (sizeof(int16_t))
    #define INTSET_ENC_INT32 (sizeof(int32_t))
    #define INTSET_ENC_INT64 (sizeof(int64_t))

    /* Return the required encoding for the provided value. */
    static uint8_t _intsetValueEncoding(int64_t v) {
        if (v < INT32_MIN || v > INT32_MAX)
            return INTSET_ENC_INT64;
        else if (v < INT16_MIN || v > INT16_MAX)
            return INTSET_ENC_INT32;
        else
            return INTSET_ENC_INT16;
    }


Three storage types are supported: int16_t, int32_t and int64_t. Each encoding constant is defined as the size in bytes of its type, which guarantees INTSET_ENC_INT16 < INTSET_ENC_INT32 < INTSET_ENC_INT64, and the value range of each larger type contains the range of the smaller ones.

Note: _intsetValueEncoding is declared static, so it is internal to the file and not visible from outside.
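To make the ordering of the encodings concrete, here is a minimal standalone sketch of the same idea. The names value_encoding and required_encoding are hypothetical and are not Redis symbols; required_encoding is an extra helper added only for illustration. It shows that, because the constants are ordered, the encoding needed for a whole collection is simply the maximum of the per-value encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: each one is the byte size of its type,
 * so they are naturally ordered 2 < 4 < 8. */
enum {
    ENC_INT16 = sizeof(int16_t),
    ENC_INT32 = sizeof(int32_t),
    ENC_INT64 = sizeof(int64_t)
};

/* Smallest encoding that can hold v. */
static uint8_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return ENC_INT32;
    return ENC_INT16;
}

/* Smallest encoding that can hold every value in the array:
 * the maximum of the individual encodings. */
static uint8_t required_encoding(const int64_t *values, size_t n) {
    assert(values != NULL || n == 0);
    uint8_t enc = ENC_INT16;
    for (size_t i = 0; i < n; i++) {
        uint8_t e = value_encoding(values[i]);
        if (e > enc) enc = e;
    }
    return enc;
}
```

For example, the values 1, -70000 and 5 need the 4-byte encoding, because -70000 does not fit in an int16_t.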


2. Get the element at the specified position

    /* Return the value at pos, given an encoding. */
    static int64_t _intsetGetEncoded(intset *is, int pos, uint8_t enc) {
        int64_t v64;
        int32_t v32;
        int16_t v16;

        // Interpret the element array according to the encoding passed in.
        // intset stores data in little-endian order, so the value read may
        // need a byte-order conversion on big-endian hosts.
        if (enc == INTSET_ENC_INT64) {
            memcpy(&v64,((int64_t*)is->contents)+pos,sizeof(v64));
            memrev64ifbe(&v64);
            return v64;
        } else if (enc == INTSET_ENC_INT32) {
            memcpy(&v32,((int32_t*)is->contents)+pos,sizeof(v32));
            memrev32ifbe(&v32);
            return v32;
        } else {
            memcpy(&v16,((int16_t*)is->contents)+pos,sizeof(v16));
            memrev16ifbe(&v16);
            return v16;
        }
    }

    /* Return the value at pos, using the configured encoding. */
    static int64_t _intsetGet(intset *is, int pos) {
        // Pass the encoding recorded in the intset to the function above
        return _intsetGetEncoded(is,pos,intrev32ifbe(is->encoding));
    }

Although intset encodes integers internally, its external interface always exposes them as int64_t.

Encoding the data internally saves storage space.

Because intset stores its data in little-endian order, every read or write goes through a byte-order conversion (a no-op on little-endian hosts). At the moment I do not fully understand why this is necessary: intset is only an internal data structure, so it seems it would not need a fixed byte order. A likely reason is that Redis can write an intset to an RDB file as a raw blob, and a fixed byte order keeps such dumps portable between machines.
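The effect of a fixed little-endian layout can be sketched without the Redis helpers. The code below is not how Redis does it: Redis copies the bytes with memcpy and then swaps them only on big-endian hosts. This sketch instead builds the bytes with shifts, which gives the same on-disk layout on any host. The names store_le32 and load_le32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v into dst as 4 little-endian bytes, whatever the host byte order. */
static void store_le32(uint8_t *dst, int32_t v) {
    assert(dst != NULL);
    uint32_t u = (uint32_t)v;
    dst[0] = (uint8_t)(u);
    dst[1] = (uint8_t)(u >> 8);
    dst[2] = (uint8_t)(u >> 16);
    dst[3] = (uint8_t)(u >> 24);
}

/* Read 4 little-endian bytes from src back into an int32_t. */
static int32_t load_le32(const uint8_t *src) {
    assert(src != NULL);
    uint32_t u = (uint32_t)src[0]
               | ((uint32_t)src[1] << 8)
               | ((uint32_t)src[2] << 16)
               | ((uint32_t)src[3] << 24);
    int32_t v;
    memcpy(&v, &u, sizeof v);  /* reinterpret the bits without a signed cast */
    return v;
}
```

A buffer written this way on one machine can be read back on another with a different native byte order, which is the property a fixed storage order buys.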


3. Inserting an element (may trigger an intset reorganization)


    /* Insert an integer in the intset */
    intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
        uint8_t valenc = _intsetValueEncoding(value);
        uint32_t pos;
        if (success) *success = 1;

        /* Upgrade encoding if necessary. If we need to upgrade, we know that
         * this value should be either appended (if > 0) or prepended (if < 0),
         * because it lies outside the range of existing values. */
        if (valenc > intrev32ifbe(is->encoding)) {
            // If the encoding of value is larger than the current encoding of
            // the intset, the element cannot already be in the intset. It must
            // be inserted, and the intset encoding must be upgraded. The
            // upgrade allocates a larger buffer that holds length+1 elements
            // of the wider type; see the intsetUpgradeAndAdd implementation.
            /* This always succeeds, so we don't need to curry *success. */
            return intsetUpgradeAndAdd(is,value);
        } else {
            /* Abort if the value is already present in the set.
             * This call will populate "pos" with the right position to insert
             * the value when it cannot be found. */
            // If the element already exists, do not insert it
            if (intsetSearch(is,value,&pos)) {
                if (success) *success = 0;
                return is;
            }

            // Otherwise insert at the position that keeps the array sorted
            is = intsetResize(is,intrev32ifbe(is->length)+1);
            // Move the following elements back to make room
            if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
        }

        _intsetSet(is,pos,value);
        is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
        return is;
    }

Note that if the inserted value requires a larger encoding than the intset currently uses, the intset must be upgraded so that the new element can be stored. Upgrading converts every existing element to the new type, which means reorganizing the whole intset; this takes noticeable time when the intset already holds many elements. An intset is reorganized at most twice, though (for example int16_t to int32_t, then int32_t to int64_t). The possibility of reorganization is one reason intset is not suited to storing large numbers of elements. By default, once a set holds more than 512 elements (the set-max-intset-entries setting), Redis stores it in a different data structure.
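The reorganization step can be sketched as follows. This is a simplified illustration rather than the Redis implementation: it only widens existing elements from int16_t to int32_t and does not place the new value, it uses the host byte order, and the name widen_16_to_32 is hypothetical. The key point is the iteration order: after growing the buffer, elements are converted from the last to the first so that a widened value never overwrites a narrow value that has not been read yet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Widen a heap buffer holding n int16_t values into n int32_t values.
 * Returns the (possibly moved) buffer, or NULL if realloc fails, in which
 * case the original buffer is left untouched. */
static void *widen_16_to_32(void *buf, size_t n) {
    assert(buf != NULL && n > 0);
    unsigned char *p = realloc(buf, n * sizeof(int32_t));
    if (p == NULL) return NULL;

    /* Back to front: element i is written at byte offset 4*i, which is at or
     * beyond the end (2*i) of all the narrow elements still to be read. */
    for (size_t i = n; i-- > 0; ) {
        int16_t v16;
        memcpy(&v16, p + i * sizeof v16, sizeof v16);
        int32_t v32 = v16;  /* sign-extending conversion */
        memcpy(p + i * sizeof v32, &v32, sizeof v32);
    }
    return p;
}
```

Every element is touched once, so the cost of an upgrade grows linearly with the number of elements already stored.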


4. Finding an element

    /* Search for the position of "value". Return 1 when the value was found and
     * sets "pos" to the position of the value within the intset. Return 0 when
     * the value is not present in the intset and sets "pos" to the position
     * where "value" can be inserted. */
    static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
        int min = 0, max = intrev32ifbe(is->length)-1, mid = -1;
        int64_t cur = -1;

        /* The value can never be found when the set is empty */
        if (intrev32ifbe(is->length) == 0) {
            if (pos) *pos = 0;
            return 0;
        } else {
            /* Check for the case where we know we cannot find the value,
             * but do know the insert position. */
            if (value > _intsetGet(is,intrev32ifbe(is->length)-1)) {
                if (pos) *pos = intrev32ifbe(is->length);
                return 0;
            } else if (value < _intsetGet(is,0)) {
                if (pos) *pos = 0;
                return 0;
            }
        }

        // Binary search
        while(max >= min) {
            mid = ((unsigned int)min + (unsigned int)max) >> 1;
            cur = _intsetGet(is,mid);
            if (value > cur) {
                min = mid+1;
            } else if (value < cur) {
                max = mid-1;
            } else {
                break;
            }
        }

        if (value == cur) {
            if (pos) *pos = mid;
            return 1;
        } else {
            if (pos) *pos = min;
            return 0;
        }
    }

    /* Determine whether a value belongs to this set */
    uint8_t intsetFind(intset *is, int64_t value) {
        uint8_t valenc = _intsetValueEncoding(value);
        // If the encoding of value is larger than the encoding of the intset,
        // value cannot be in the intset
        return valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,NULL);
    }

The time complexity of a lookup is O(log n). That is cheap when n is small; when n is large, for example 1,000,000, a lookup can take about 20 comparisons.
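The search-or-insertion-point idea can be tried out on a plain sorted array. The sketch below is a variant, not the Redis code: it works on int64_t values directly and uses a half-open interval (the "lower bound" form of binary search) instead of the closed interval used by intsetSearch. The name sorted_search is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Search a sorted array of n values.
 * Returns 1 and stores the index of value in *pos when it is present.
 * Returns 0 and stores the index where value would have to be inserted
 * to keep the array sorted when it is absent. */
static int sorted_search(const int64_t *a, size_t n, int64_t value, size_t *pos) {
    assert(a != NULL || n == 0);
    size_t lo = 0, hi = n;  /* candidates are in the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;  /* cannot overflow */
        if (a[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (pos) *pos = lo;
    return lo < n && a[lo] == value;
}
```

Each iteration halves the range, so an array of 1,000,000 elements is resolved in about 20 iterations, matching the estimate above.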


5. Deleting an element (never triggers an intset reorganization)

    /* Delete integer from intset */
    intset *intsetRemove(intset *is, int64_t value, int *success) {
        uint8_t valenc = _intsetValueEncoding(value);
        uint32_t pos;
        if (success) *success = 0;

        if (valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,&pos)) {
            uint32_t len = intrev32ifbe(is->length);

            /* We know we can delete */
            if (success) *success = 1;

            /* Overwrite value with tail and update length */
            if (pos < (len-1)) intsetMoveTail(is,pos+1,pos);
            is = intsetResize(is,len-1);    // shrink the allocation
            is->length = intrev32ifbe(len-1);
        }
        return is;
    }

When an element is deleted, intset does not check whether the remaining elements would fit a smaller integer type, and it provides no function to downgrade the encoding. I think the reasons are roughly: a. All elements would have to be scanned to decide whether a downgrade is possible. b. Downgrading the encoding would itself cause a reorganization. c. After a downgrade, a later insert may force another upgrade; if upgrades and downgrades alternated frequently, performance would become unstable.


Both insertion and deletion call intsetResize, which uses zrealloc to grow or shrink the memory block. When growing, the allocator either allocates a new block or extends the existing block at its end. When shrinking, the existing block is simply trimmed at its end. This approach is unlikely to cause much memory fragmentation.
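The resize-then-shift pattern used by insertion and deletion can be sketched with the standard realloc and memmove. This is an illustration under simplifying assumptions: elements are plain int64_t values with no per-set encoding, and the sorted_vec type and the function names are hypothetical, not part of Redis.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one contiguous block, resized on every change. */
typedef struct {
    size_t len;
    int64_t *items;
} sorted_vec;

/* Insert value at index pos (0 <= pos <= len).
 * Grows the block by one slot, then shifts the tail one slot to the right.
 * Returns 0 on success, -1 if the allocation fails. */
static int vec_insert_at(sorted_vec *v, size_t pos, int64_t value) {
    assert(pos <= v->len);
    int64_t *p = realloc(v->items, (v->len + 1) * sizeof(int64_t));
    if (p == NULL) return -1;
    v->items = p;
    memmove(p + pos + 1, p + pos, (v->len - pos) * sizeof(int64_t));
    p[pos] = value;
    v->len++;
    return 0;
}

/* Remove the element at index pos (pos < len).
 * Shifts the tail one slot to the left, then trims the block at its end. */
static void vec_remove_at(sorted_vec *v, size_t pos) {
    assert(pos < v->len);
    memmove(v->items + pos, v->items + pos + 1,
            (v->len - pos - 1) * sizeof(int64_t));
    v->len--;
    if (v->len == 0) {
        free(v->items);
        v->items = NULL;
        return;
    }
    int64_t *p = realloc(v->items, v->len * sizeof(int64_t));
    if (p != NULL) v->items = p;  /* if shrinking fails, the old block stays valid */
}
```

Because the container is always a single block that grows or shrinks at its end, it never leaves small holes between elements, which is the fragmentation argument made above.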


III. Characteristics (advantages and disadvantages only show up in comparison with something else, so only the characteristics are listed here):


1. It is a sorted array: elements are packed tightly, space utilization is high, and frequent insertions and deletions are unlikely to cause memory fragmentation.

2. Integer encoding is supported, and all elements in an intset share the same storage type. When a new value needs a larger type than the intset currently uses, the stored integer type is widened (intset provides no way to narrow it again). This saves space to some extent, but every access has to take the type into account, which makes the code more complex.

3. A length field records the number of elements, so getting the element count takes constant time.

4. Because the array is sorted, finding an element takes O(log n).

5. Inserting an element may move up to n elements, or may upgrade the encoding type and reorganize the intset. For both reasons, intset is not suited to frequent insertions or to storing a large number of elements.

6. Deleting an element may move up to n-1 elements.

7. Data is stored in little-endian form, including encoding, length and the elements.


This article is from the "Chhquan" blog; please keep this source: http://chhquan.blog.51cto.com/1346841/1770574
