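Tags: redis, source code, storage, binary search
This time I studied intset. At one point I could hardly keep reading, but I gritted my teeth and got through it. Once understood, there is not much to it; the lesson is to calm down and not be impatient.
In pursuit of efficiency, Redis does a lot of optimization in how it stores data. intset is a data structure the author built specifically to save memory, as is the ziplist (compressed list) that I will read later.
intset is an ordered set of integers that provides interfaces for adding, removing, and finding values. It supports three element widths, int16_t, int32_t, and int64_t, and converts between these encodings (strictly speaking, it only promotes to a wider type).
First, take a look at its structure definition: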
typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;
encoding: one of the following encodings
#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))
In fact, a uint8_t would be enough to store this, since the only possible values are 2, 4, and 8.
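The code later in this post calls _intsetValueEncoding() without showing it. A minimal sketch of what such a helper could look like, assuming it simply picks the narrowest signed type that can hold the value (the names value_encoding, ENC_INT16 and so on are made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Each encoding is just the byte width of one element: 2, 4 or 8. */
#define ENC_INT16 (sizeof(int16_t))
#define ENC_INT32 (sizeof(int32_t))
#define ENC_INT64 (sizeof(int64_t))

/* Hypothetical helper: the narrowest encoding that can hold v. */
uint8_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return ENC_INT32;
    return ENC_INT16;
}

/* Usage: boundary values on each side of the int16_t and int32_t limits. */
void check_value_encoding(void) {
    assert(value_encoding(INT16_MAX) == ENC_INT16);
    assert(value_encoding((int64_t)INT16_MAX + 1) == ENC_INT32);
    assert(value_encoding(INT32_MIN) == ENC_INT32);
    assert(value_encoding((int64_t)INT32_MIN - 1) == ENC_INT64);
}
```

Because the encoding is the element width itself, comparing two encodings with > is the same as asking which type is wider.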
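length: the number of integers currently in the set
contents[]: where the values are actually stored. The array is declared with one-byte elements so that the wider types can be addressed by byte offset.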
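To make the byte addressing concrete, here is a rough sketch of reading element pos from such a buffer. The function name get_encoded is hypothetical, and the byte-order handling that the real code does with intrev32ifbe and friends is deliberately left out:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read element `pos` from a raw byte buffer whose
 * elements are `enc` bytes wide (2, 4 or 8) and widen it to int64_t.
 * Endianness conversion is omitted in this sketch. */
int64_t get_encoded(const int8_t *contents, uint32_t pos, uint8_t enc) {
    assert(enc == 2 || enc == 4 || enc == 8);
    const int8_t *p = contents + (size_t)pos * enc;  /* byte offset of the element */
    if (enc == 8) { int64_t v; memcpy(&v, p, sizeof v); return v; }
    if (enc == 4) { int32_t v; memcpy(&v, p, sizeof v); return v; }
    int16_t v;
    memcpy(&v, p, sizeof v);
    return v;  /* sign-extended to 64 bits */
}
```

The same buffer can therefore hold 2-, 4-, or 8-byte elements; only the multiplier used for the offset changes.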
Here are the interfaces it exposes:
intset *intsetNew(void);
intset *intsetAdd(intset *is, int64_t value, uint8_t *success);
intset *intsetRemove(intset *is, int64_t value, int *success);
uint8_t intsetFind(intset *is, int64_t value);
int64_t intsetRandom(intset *is);
uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);
uint32_t intsetLen(intset *is);
size_t intsetBlobLen(intset *is);
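Any data structure has to provide interfaces such as insert, lookup, and delete, and it should not expose the interfaces it uses internally. Let's analyze a few of the interfaces provided here in detail.
Initialization interface: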
/* Create an empty intset. */
intset *intsetNew(void) {
    intset *is = malloc(sizeof(intset));
    is->encoding = intrev32ifbe(INTSET_ENC_INT16);
    is->length = 0;
    return is;
}
Nothing difficult here. Note that it defaults to the narrowest storage, 2 bytes per element.
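The default matters for memory use. A small sketch of the size arithmetic, assuming the total size is the 8-byte header plus length times the element width (intsetBlobLen is listed among the interfaces but its body is not shown here, so blob_len and intset_layout are hypothetical stand-ins):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct shown earlier; the type name is made up. */
typedef struct {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset_layout;

/* Hypothetical helper: total bytes for `length` elements of `enc` bytes each. */
size_t blob_len(uint32_t length, uint8_t enc) {
    assert(enc == 2 || enc == 4 || enc == 8);
    return sizeof(intset_layout) + (size_t)length * enc;
}
```

Under these assumptions, 100 small integers take 208 bytes with the 2-byte encoding and 808 bytes with the 8-byte encoding, which is why the set starts narrow and only widens when it has to.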
/* Insert an integer in the intset */
intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 1;

    /* Upgrade encoding if necessary. If we need to upgrade, we know that
     * this value should be either appended (if > 0) or prepended (if < 0),
     * because it lies outside the range of existing values. */
    if (valenc > intrev32ifbe(is->encoding)) {
        /* This always succeeds, so we don't need to curry *success. */
        return intsetUpgradeAndAdd(is,value);
    } else {
        /* Abort if the value is already present in the set.
         * This call will populate "pos" with the right position to insert
         * the value when it cannot be found. */
        if (intsetSearch(is,value,&pos)) {
            if (success) *success = 0;
            return is;
        }

        is = intsetResize(is,intrev32ifbe(is->length)+1);
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
    }

    _intsetSet(is,pos,value);
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}
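This interface is harder, so let's go through it step by step:
1. First check whether the encoding required by the new value is wider than the current encoding. If it is, promote the type and add value.
2. Otherwise (the current encoding is already wide enough), search for the value first. If it already exists, return; if not, the search sets the insertion position pos.
3. Reallocate the memory to make room for one more element.
4. Move data: every element from pos onward is shifted back by one slot. That is O(n) in the worst case, so the cost is a bit high.
5. Write the value and update the element count.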
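The non-promoting path (steps 2 to 5) can be sketched on a much simpler structure. This is only an illustration under stated assumptions: mini_set and mini_set_add are hypothetical names, elements are fixed at int16_t, and a linear scan stands in for the binary search that the real code uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified set: sorted, unique int16_t values only. */
typedef struct {
    uint32_t length;
    int16_t *values;
} mini_set;

/* Insert v and keep the array sorted. Returns 1 if inserted, 0 if v was
 * already present or the allocation failed. */
int mini_set_add(mini_set *s, int16_t v) {
    assert(s != NULL);
    uint32_t pos = 0;

    /* Step 2: find the slot (linear scan here, for brevity). */
    while (pos < s->length && s->values[pos] < v) pos++;
    if (pos < s->length && s->values[pos] == v) return 0;  /* duplicate */

    /* Step 3: grow the buffer by one element. */
    int16_t *grown = realloc(s->values, ((size_t)s->length + 1) * sizeof *grown);
    if (grown == NULL) return 0;
    s->values = grown;

    /* Step 4: shift only the tail [pos, length) one slot to the right. */
    memmove(&s->values[pos + 1], &s->values[pos],
            (s->length - pos) * sizeof *grown);

    /* Step 5: store the value and bump the count. */
    s->values[pos] = v;
    s->length++;
    return 1;
}
```

Starting from an empty set ({0, NULL}), adding 5, 1, 3 leaves the array as 1, 3, 5, and adding 3 again returns 0 without changing anything.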
The interface that promotes the type and inserts value is as follows:
/* Upgrades the intset to a larger encoding and inserts the given integer. */
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    uint8_t curenc = intrev32ifbe(is->encoding);
    uint8_t newenc = _intsetValueEncoding(value);
    int length = intrev32ifbe(is->length);
    int prepend = value < 0 ? 1 : 0;

    /* First set new encoding and resize */
    is->encoding = intrev32ifbe(newenc);
    is = intsetResize(is,intrev32ifbe(is->length)+1);

    /* Upgrade back-to-front so we don't overwrite values.
     * Note that the "prepend" variable is used to make sure we have an empty
     * space at either the beginning or the end of the intset. */
    while(length--)
        _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));

    /* Set the value at the beginning or the end. */
    if (prepend)
        _intsetSet(is,0,value);
    else
        _intsetSet(is,intrev32ifbe(is->length),value);
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}
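As you can see, the type promotion proceeds as follows:
1. A value that needs a wider encoding lies outside the range of every existing element. Since the set is ordered, it is enough to check the sign: a positive value is appended at the tail and a negative value is prepended at the head.
2. Grow the memory.
3. Move the data. This ties in with step 1, and it is the hardest part to follow: each element is read using the old encoding and written back using the new encoding, going from back to front so that nothing is overwritten before it has been read.
4. Insert the value at the head or at the tail.
5. Update the element count.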
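The back-to-front move is easier to see in isolation. Below is a minimal sketch that widens a buffer of int16_t values to int32_t in place and adds one out-of-range value. It is not the Redis code: widen_and_add is a hypothetical name, only the 16-to-32-bit case is handled, and byte order is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: widen n sorted int16_t values to int32_t in place and
 * add `value`, which must lie outside the int16_t range.
 * Returns the (possibly moved) buffer holding n + 1 int32_t values, or NULL
 * if the allocation fails (the old buffer is then still valid). */
void *widen_and_add(void *buf, uint32_t n, int32_t value) {
    assert(value < INT16_MIN || value > INT16_MAX);
    int prepend = value < 0;  /* below the int16_t range, so it is the new minimum */

    unsigned char *bytes = realloc(buf, ((size_t)n + 1) * sizeof(int32_t));
    if (bytes == NULL) return NULL;

    /* Last element first: the wide slot for element i starts at byte
     * 4 * (i + prepend), which never precedes the narrow slots of the
     * elements that are still unread (they end at byte 2 * i). */
    for (uint32_t i = n; i-- > 0; ) {
        int16_t old;
        int32_t wide;
        memcpy(&old, bytes + (size_t)i * sizeof old, sizeof old);
        wide = old;  /* sign-extend to the new width */
        memcpy(bytes + ((size_t)i + prepend) * sizeof wide, &wide, sizeof wide);
    }

    /* The free slot is at index 0 when prepending, at index n otherwise. */
    size_t slot = prepend ? 0 : n;
    memcpy(bytes + slot * sizeof value, &value, sizeof value);
    return bytes;
}
```

Going front to back instead would write the 4-byte form of element 0 over the 2-byte form of element 1 before it had been read, which is exactly what the comment in the original code warns about.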
The interface that moves data is as follows:
static void intsetMoveTail(intset *is, uint32_t from, uint32_t to) {
    void *src, *dst;
    uint32_t bytes = intrev32ifbe(is->length)-from;
    uint32_t encoding = intrev32ifbe(is->encoding);

    if (encoding == INTSET_ENC_INT64) {
        src = (int64_t*)is->contents+from;
        dst = (int64_t*)is->contents+to;
        bytes *= sizeof(int64_t);
    } else if (encoding == INTSET_ENC_INT32) {
        src = (int32_t*)is->contents+from;
        dst = (int32_t*)is->contents+to;
        bytes *= sizeof(int32_t);
    } else {
        src = (int16_t*)is->contents+from;
        dst = (int16_t*)is->contents+to;
        bytes *= sizeof(int16_t);
    }
    memmove(dst,src,bytes);
}
Because the memory is contiguous, it only has to locate the start of the range to move and then call memmove(). Bingo!
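The same trick works in the other direction for deletion. intsetRemove is listed among the interfaces but its body is not shown in this post, so the following is only a guess at the idea, written against a plain int32_t array with a hypothetical name remove_at:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: delete the element at index pos from an array of
 * *len int32_t values by sliding the tail one slot to the left. */
void remove_at(int32_t *values, uint32_t *len, uint32_t pos) {
    assert(pos < *len);
    /* Source and destination overlap, so memmove (not memcpy) is required. */
    memmove(&values[pos], &values[pos + 1],
            (*len - pos - 1) * sizeof values[0]);
    (*len)--;
}
```

Removing index 1 from {1, 3, 5, 7} leaves {1, 5, 7} with a length of 3; removing the last element moves zero bytes and just shrinks the count.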
Implementation of the search interface:
static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
    int min = 0, max = intrev32ifbe(is->length)-1, mid = -1;
    int64_t cur = -1;

    /* The value can never be found when the set is empty */
    if (intrev32ifbe(is->length) == 0) {
        if (pos) *pos = 0;
        return 0;
    } else {
        /* Check for the case where we know we cannot find the value,
         * but do know the insert position. */
        if (value > _intsetGet(is,intrev32ifbe(is->length)-1)) {
            if (pos) *pos = intrev32ifbe(is->length);
            return 0;
        } else if (value < _intsetGet(is,0)) {
            if (pos) *pos = 0;
            return 0;
        }
    }

    while(max >= min) {
        mid = ((unsigned int)min + (unsigned int)max) >> 1;
        cur = _intsetGet(is,mid);
        if (value > cur) {
            min = mid+1;
        } else if (value < cur) {
            max = mid-1;
        } else {
            break;
        }
    }

    if (value == cur) {
        if (pos) *pos = mid;
        return 1;
    } else {
        if (pos) *pos = min;
        return 0;
    }
}
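It is a binary search after all, nice! Personally I think this is where the efficiency of the structure shows: because the set is ordered, lookup is fast (O(log n)); because it is an array, insertion and deletion are a single copy of contiguous memory, which is also fast in practice even though it is O(n).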
When I have time, I would like to look at the implementation of STL vector. How is its insert implemented?
Redis source code analysis: the in-memory data structure intset