This time I studied intset. It did not click on the first read, but I gritted my teeth and kept going; once I calmed down and stopped rushing, it made sense.
Redis makes many storage optimizations for efficiency. For example, intset is a data structure the author designed specifically to save memory, as is the compressed list (ziplist) that I plan to read later.
intset is a sorted set of integers. It provides interfaces for adding, deleting, and searching, and it converts between the int16_t, int32_t, and int64_t encodings (strictly speaking, it only upgrades to a wider type; it never downgrades).
First, let's take a look at its structure definition:
    typedef struct intset {
        uint32_t encoding;
        uint32_t length;
        int8_t contents[];
    } intset;
Encoding: There are several types of encoding:
    #define INTSET_ENC_INT16 (sizeof(int16_t))
    #define INTSET_ENC_INT32 (sizeof(int32_t))
    #define INTSET_ENC_INT64 (sizeof(int64_t))
In fact, a uint8_t would be enough to store the encoding, since the only possible values are 2, 4, and 8.
Length: the number of integers currently in the set.
Contents[]: the storage area. A byte is used as the storage unit so that the same array can be addressed as any of the wider types by casting the pointer.
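To make the byte-array idea concrete, here is a minimal sketch of my own, not the Redis code. The helpers `demo_set` and `demo_get` are hypothetical names; they treat an `int8_t` buffer as an array of 2-, 4-, or 8-byte integers depending on the encoding, and they ignore endianness (values stay in host byte order).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; these names are not from Redis.
 * The encoding is simply the element width in bytes: 2, 4 or 8. */

/* Store value at index pos, treating buf as an array of enc-byte integers.
 * The caller must make sure that value fits in enc bytes. */
static void demo_set(int8_t *buf, uint32_t pos, int64_t value, uint8_t enc) {
    if (enc == sizeof(int64_t)) {
        int64_t v = value;
        memcpy(buf + (size_t)pos * sizeof(v), &v, sizeof(v));
    } else if (enc == sizeof(int32_t)) {
        int32_t v = (int32_t)value;
        memcpy(buf + (size_t)pos * sizeof(v), &v, sizeof(v));
    } else {
        int16_t v = (int16_t)value;
        memcpy(buf + (size_t)pos * sizeof(v), &v, sizeof(v));
    }
}

/* Read the element at index pos and widen it to int64_t. */
static int64_t demo_get(const int8_t *buf, uint32_t pos, uint8_t enc) {
    if (enc == sizeof(int64_t)) {
        int64_t v;
        memcpy(&v, buf + (size_t)pos * sizeof(v), sizeof(v));
        return v;
    } else if (enc == sizeof(int32_t)) {
        int32_t v;
        memcpy(&v, buf + (size_t)pos * sizeof(v), sizeof(v));
        return v;
    } else {
        int16_t v;
        memcpy(&v, buf + (size_t)pos * sizeof(v), sizeof(v));
        return v;
    }
}
```

The byte offset of element pos is always pos times the element width, which is why one byte array can serve all three encodings.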
Let's take a look at its external interfaces:
    intset *intsetNew(void);
    intset *intsetAdd(intset *is, int64_t value, uint8_t *success);
    intset *intsetRemove(intset *is, int64_t value, int *success);
    uint8_t intsetFind(intset *is, int64_t value);
    int64_t intsetRandom(intset *is);
    uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);
    uint32_t intsetLen(intset *is);
    size_t intsetBlobLen(intset *is);
A data structure has to provide operations such as insert, query, and delete, and it should not expose its internal helpers. The main interfaces are analyzed in detail below.
Initialization interface:
    /* Create an empty intset. */
    intset *intsetNew(void) {
        intset *is = malloc(sizeof(intset));
        is->encoding = intrev32ifbe(INTSET_ENC_INT16);
        is->length = 0;
        return is;
    }
Nothing difficult here. Note that the smallest encoding, 2 bytes per element, is used by default.
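The code wraps every header field in intrev32ifbe, which this post does not explain. As far as I understand it, it does nothing on little-endian machines and reverses the byte order on big-endian ones, so the header is always stored in little-endian form. The sketch below shows that idea with hypothetical names of my own (`demo_swap32`, `demo_is_big_endian`, `demo_rev32_if_be`); it is not the Redis implementation.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t demo_swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Return 1 on a big-endian host, 0 on a little-endian host. */
static int demo_is_big_endian(void) {
    const uint16_t probe = 1;
    /* On little-endian hosts the low byte (1) comes first in memory. */
    return *(const uint8_t *)&probe == 0;
}

/* Hypothetical stand-in for the idea behind intrev32ifbe:
 * swap only when the host is big-endian. */
static uint32_t demo_rev32_if_be(uint32_t v) {
    return demo_is_big_endian() ? demo_swap32(v) : v;
}
```

Applying the conversion twice returns the original value, which is why the code can use the same call both when reading a field and when writing it back.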
    /* Insert an integer in the intset */
    intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
        uint8_t valenc = _intsetValueEncoding(value);
        uint32_t pos;
        if (success) *success = 1;

        /* Upgrade encoding if necessary. If we need to upgrade, we know that
         * this value should be either appended (if > 0) or prepended (if < 0),
         * because it lies outside the range of existing values. */
        if (valenc > intrev32ifbe(is->encoding)) {
            /* This always succeeds, so we don't need to curry *success. */
            return intsetUpgradeAndAdd(is,value);
        } else {
            /* Abort if the value is already present in the set.
             * This call will populate "pos" with the right position to insert
             * the value when it cannot be found. */
            if (intsetSearch(is,value,&pos)) {
                if (success) *success = 0;
                return is;
            }

            is = intsetResize(is,intrev32ifbe(is->length)+1);
            if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
        }

        _intsetSet(is,pos,value);
        is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
        return is;
    }
This interface takes more effort to analyze:
1. First, check whether the encoding required by the new value is larger than the current encoding. If it is, upgrade the encoding and add the value in one step.
2. If the value fits in the current encoding, search for it first. If it already exists, set *success to 0 and return the set unchanged. If it does not exist, the search fills in pos with the insertion position.
3. Reallocate the memory so there is room for one more element.
4. Move the data: every element from pos onward shifts back by one slot. This is the slightly tricky part, and it is skipped when the value goes at the end.
5. Write the value at pos and update the element count.
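Step 1 depends on _intsetValueEncoding, which is called in the code but not shown in this post. Here is a sketch of how such a helper could pick the smallest encoding that holds a value. The name `demo_value_encoding` and the `DEMO_ENC_*` macros are my own; only the idea that the encoding equals the size of the type comes from the definitions earlier in the post.

```c
#include <stdint.h>

/* Hypothetical copies of the encoding constants: the encoding is the
 * element width in bytes. */
#define DEMO_ENC_INT16 (sizeof(int16_t))
#define DEMO_ENC_INT32 (sizeof(int32_t))
#define DEMO_ENC_INT64 (sizeof(int64_t))

/* Return the smallest encoding that can represent v. */
static uint8_t demo_value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX)
        return (uint8_t)DEMO_ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX)
        return (uint8_t)DEMO_ENC_INT32;
    return (uint8_t)DEMO_ENC_INT16;
}
```

Because the encodings are plain widths (2, 4, 8), comparing two encodings with > is the same as asking which type is wider, which is what the check in step 1 relies on.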
The interface that upgrades the encoding and inserts the value is as follows:
    /* Upgrades the intset to a larger encoding and inserts the given integer. */
    static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
        uint8_t curenc = intrev32ifbe(is->encoding);
        uint8_t newenc = _intsetValueEncoding(value);
        int length = intrev32ifbe(is->length);
        int prepend = value < 0 ? 1 : 0;

        /* First set new encoding and resize */
        is->encoding = intrev32ifbe(newenc);
        is = intsetResize(is,intrev32ifbe(is->length)+1);

        /* Upgrade back-to-front so we don't overwrite values.
         * Note that the "prepend" variable is used to make sure we have an empty
         * space at either the beginning or the end of the intset. */
        while(length--)
            _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));

        /* Set the value at the beginning or the end. */
        if (prepend)
            _intsetSet(is,0,value);
        else
            _intsetSet(is,intrev32ifbe(is->length),value);
        is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
        return is;
    }
We can see that the upgrade proceeds as follows:
1. Determine whether the new value is positive or negative. Because it needs a wider encoding than anything already stored, it lies outside the range of all existing values, and because the set is sorted, a positive value goes at the tail and a negative value goes at the head.
2. Set the new encoding and grow the memory.
3. Move the data. This ties back to step 1 (the prepend offset decides where each element lands) and is the hardest part to follow: each element is read with the old encoding and written back with the new encoding, working from the last element to the first so that no unread value is overwritten.
4. Write the new value at the head or the tail.
5. Update the element count.
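The back-to-front copy in step 3 is easier to see on a stripped-down example. The sketch below is my own simplification, not the Redis code: a hypothetical `demo_widen_16_to_32` that only handles the int16_t to int32_t case, ignores endianness, and assumes the caller has already grown the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Widen len int16_t values stored at the start of buf into int32_t values
 * in the same buffer. One slot is left free at the front (prepend = 1) or
 * at the back (prepend = 0) for the value that triggered the upgrade.
 * buf must be able to hold (len + 1) * sizeof(int32_t) bytes. */
static void demo_widen_16_to_32(int8_t *buf, uint32_t len, int prepend) {
    uint32_t i = len;
    /* Walk from the last element to the first. The wide slot for element i
     * starts at byte 4 * (i + prepend), which is never before the end of
     * narrow element i - 1 (byte 2 * i), so unread values stay intact. */
    while (i--) {
        int16_t narrow;
        int32_t wide;
        memcpy(&narrow, buf + (size_t)i * sizeof(narrow), sizeof(narrow));
        wide = narrow;
        memcpy(buf + ((size_t)i + (size_t)prepend) * sizeof(wide), &wide, sizeof(wide));
    }
}
```

Copying front-to-back instead would overwrite the second narrow element while writing the first wide one, which is exactly what the original comment about upgrading back-to-front warns against.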
In addition, the interface for moving data is as follows:
    static void intsetMoveTail(intset *is, uint32_t from, uint32_t to) {
        void *src, *dst;
        uint32_t bytes = intrev32ifbe(is->length)-from;
        uint32_t encoding = intrev32ifbe(is->encoding);

        if (encoding == INTSET_ENC_INT64) {
            src = (int64_t*)is->contents+from;
            dst = (int64_t*)is->contents+to;
            bytes *= sizeof(int64_t);
        } else if (encoding == INTSET_ENC_INT32) {
            src = (int32_t*)is->contents+from;
            dst = (int32_t*)is->contents+to;
            bytes *= sizeof(int32_t);
        } else {
            src = (int16_t*)is->contents+from;
            dst = (int16_t*)is->contents+to;
            bytes *= sizeof(int16_t);
        }
        memmove(dst,src,bytes);
    }
Because the storage is contiguous memory, it only has to compute where the range to move starts and how many bytes it covers, and then call memmove(). Bingo!
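The same shift-then-write pattern can be tried on a plain array. This is a sketch under my own assumptions, not the intset code: `demo_insert_at` is a hypothetical helper working on a fixed int64_t array that the caller has already made large enough.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Insert value at index pos of an array that currently holds len elements.
 * The array must have room for len + 1 elements. Returns the new length. */
static uint32_t demo_insert_at(int64_t *arr, uint32_t len, uint32_t pos, int64_t value) {
    assert(pos <= len);
    /* Shift arr[pos..len-1] one slot to the right. Source and destination
     * overlap, so memmove is required; memcpy would be undefined here. */
    if (pos < len)
        memmove(arr + pos + 1, arr + pos, (size_t)(len - pos) * sizeof(*arr));
    arr[pos] = value;
    return len + 1;
}
```

When pos equals len nothing is moved and the value is simply appended, which matches the pos < length check in intsetAdd.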
Data query interface implementation:
    static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
        int min = 0, max = intrev32ifbe(is->length)-1, mid = -1;
        int64_t cur = -1;

        /* The value can never be found when the set is empty */
        if (intrev32ifbe(is->length) == 0) {
            if (pos) *pos = 0;
            return 0;
        } else {
            /* Check for the case where we know we cannot find the value,
             * but do know the insert position. */
            if (value > _intsetGet(is,intrev32ifbe(is->length)-1)) {
                if (pos) *pos = intrev32ifbe(is->length);
                return 0;
            } else if (value < _intsetGet(is,0)) {
                if (pos) *pos = 0;
                return 0;
            }
        }

        while(max >= min) {
            mid = ((unsigned int)min + (unsigned int)max) >> 1;
            cur = _intsetGet(is,mid);
            if (value > cur) {
                min = mid+1;
            } else if (value < cur) {
                max = mid-1;
            } else {
                break;
            }
        }

        if (value == cur) {
            if (pos) *pos = mid;
            return 1;
        } else {
            if (pos) *pos = min;
            return 0;
        }
    }
It is still a binary search, and a very neat one! I personally feel this is where the efficiency of the data structure shows: because the set is kept sorted, a lookup needs only O(log n) comparisons, and because it is an array, insertion and deletion come down to one contiguous memory move, which is O(n) in the worst case but in practice also very fast.
At some point I would also like to see how the STL vector is implemented. How does its insert work?
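To experiment with the "found or insertion position" contract outside Redis, here is a small sketch on a plain sorted array. It is my own variant, not the intset code: the hypothetical `demo_search` uses a half-open lower-bound loop instead of the closed-interval loop with early exits, but it reports the same two things, whether the value exists and where it is or would go.

```c
#include <stdint.h>

/* Search a sorted array of len elements for value.
 * Returns 1 and stores the index of value in *pos when it is found.
 * Returns 0 and stores the index where value should be inserted otherwise.
 * pos may be NULL when the caller does not need the position. */
static int demo_search(const int64_t *arr, uint32_t len, int64_t value, uint32_t *pos) {
    uint32_t lo = 0, hi = len; /* search window is the half-open range [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2; /* cannot overflow */
        if (arr[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the first index whose element is >= value (or len). */
    if (pos) *pos = lo;
    return lo < len && arr[lo] == value;
}
```

Feeding the returned position into an insert routine keeps the array sorted, which is how intsetAdd combines the search with the tail move.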
Redis source code analysis: in-memory data structure intset