Integer set (intset)
An integer set (intset) stores multiple integer values in sorted order and without duplicates. It automatically chooses the integer type used to store its elements based on their values. For example, if int32_t is wide enough to hold the element with the largest absolute value in the intset, every element in the intset is stored as an int32_t.
If the type currently used by the intset cannot hold a new element, the intset must be upgraded. For example, if the new element requires int64_t and the intset currently uses int32_t, the upgrade first converts every existing element from int32_t to int64_t and then inserts the new element.
As for int8_t, int16_t, int32_t, and int64_t, my understanding is that on common platforms they correspond to signed char, short, int, and long (or long long). The fixed-width names are used so that element sizes do not depend on the platform; see the stdint.h header for details.
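The rule for choosing a type can be sketched as follows. This is a minimal illustration, not the Redis source: the names ENC_INT16, ENC_INT32, ENC_INT64, value_encoding, and needs_upgrade are hypothetical and only mirror the idea that an encoding is the element width in bytes.

```c
#include <stdint.h>

/* Hypothetical constants for illustration: an encoding is the element
 * width in bytes. */
enum {
    ENC_INT16 = sizeof(int16_t),
    ENC_INT32 = sizeof(int32_t),
    ENC_INT64 = sizeof(int64_t)
};

/* Return the width in bytes of the narrowest type that can hold v. */
uint8_t value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return ENC_INT32;
    return ENC_INT16;
}

/* An upgrade is needed only when the new value requires a wider type
 * than the one the set currently uses. */
int needs_upgrade(uint8_t current_encoding, int64_t v) {
    return value_encoding(v) > current_encoding;
}
```

Under these assumptions, adding 5000000000 to a set that currently uses 4-byte elements would require an upgrade, while adding 7 would not.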
Data Structure of an Integer Set
typedef struct intset {
    uint32_t encoding;   // width in bytes of the type used: 2, 4 or 8
    uint32_t length;     // number of elements
    int8_t contents[];   // array that holds the elements
} intset;
The value of encoding is one of the following three constants:
#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))
The contents array holds the actual data. Its elements have two properties: there are no duplicates, and they are stored in ascending order.
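To make the layout concrete, here is a rough sketch of how an element might be read from such a byte array. The names toy_intset, toy_from_int16, toy_get, and toy_blob_len are hypothetical, and the sketch assumes host byte order, so it leaves out the endianness conversion (intrev32ifbe) that the real code performs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified copy of the layout described above (host byte order assumed). */
typedef struct {
    uint32_t encoding;   /* bytes per element: 2, 4 or 8 */
    uint32_t length;     /* number of elements */
    int8_t contents[];   /* length * encoding bytes, sorted ascending */
} toy_intset;

/* Build a set from already sorted, duplicate-free int16_t values. */
toy_intset *toy_from_int16(const int16_t *vals, uint32_t n) {
    toy_intset *is = malloc(sizeof *is + (size_t)n * sizeof(int16_t));
    if (!is) return NULL;
    is->encoding = sizeof(int16_t);
    is->length = n;
    memcpy(is->contents, vals, (size_t)n * sizeof(int16_t));
    return is;
}

/* Read element pos and widen it to int64_t. */
int64_t toy_get(const toy_intset *is, uint32_t pos) {
    const int8_t *p = is->contents + (size_t)pos * is->encoding;
    if (is->encoding == sizeof(int64_t)) {
        int64_t v; memcpy(&v, p, sizeof v); return v;
    }
    if (is->encoding == sizeof(int32_t)) {
        int32_t v; memcpy(&v, p, sizeof v); return v;
    }
    int16_t v; memcpy(&v, p, sizeof v); return v;
}

/* Total size in bytes: header plus payload. */
size_t toy_blob_len(const toy_intset *is) {
    return sizeof(toy_intset) + (size_t)is->length * is->encoding;
}
```

Because every element has the same width, locating element pos is a single multiplication, which is why indexed reads and writes are O(1).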
Overview of the intset API
| Function name | Function | Complexity |
| --- | --- | --- |
| _intsetValueEncoding | Returns the encoding required for a given integer | O(1) |
| _intsetGet | Returns the integer at a given index | O(1) |
| _intsetSet | Sets the integer at a given index | O(1) |
| intsetNew | Creates an intset | O(1) |
| intsetResize | Reallocates memory for a given intset | O(1) |
| intsetSearch | Checks whether a given integer is in the intset | O(log N) |
| intsetUpgradeAndAdd | Upgrades the intset, then inserts the element | O(N) |
| intsetAdd | Adds an element | O(N) |
| intsetMoveTail | Shifts elements within the intset | O(N) |
| intsetRemove | Deletes an element | O(N) |
| intsetRandom | Returns a random element of the intset | O(1) |
| intsetLen | Returns the number of elements in the intset | O(1) |
| intsetBlobLen | Returns the size of the intset in bytes | O(1) |
A brief walkthrough of the key API source code
intsetAdd
// Add an integer
intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    uint8_t valenc = _intsetValueEncoding(value); // width of the type the value needs
    uint32_t pos;
    if (success) *success = 1;

    /* Upgrade encoding if necessary. If we need to upgrade, we know that
     * this value should be either appended (if > 0) or prepended (if < 0),
     * because it lies outside the range of existing values. */
    // If an upgrade is required, upgrade and insert the new value
    if (valenc > intrev32ifbe(is->encoding)) {
        /* This always succeeds, so we don't need to curry *success. */
        return intsetUpgradeAndAdd(is, value);
    } else {
        /* Abort if the value is already present in the set.
         * This call will populate "pos" with the right position to insert
         * the value when it cannot be found. */
        // If the value already exists in the set, return without changes
        if (intsetSearch(is, value, &pos)) {
            if (success) *success = 0;
            return is;
        }

        is = intsetResize(is, intrev32ifbe(is->length) + 1);
        // Shift every element from pos onward back by one position
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is, pos, pos + 1);
    }

    _intsetSet(is, pos, value); // write the new element
    is->length = intrev32ifbe(intrev32ifbe(is->length) + 1);
    return is;
}
When intsetAdd adds a value, it first compares the number of bytes the value requires with the encoding of the current intset to decide whether the intset must be upgraded. If so, it calls intsetUpgradeAndAdd. Otherwise, if the value already exists in the intset, it returns without changing anything; if not, it resizes the intset, shifts every element from the insertion position onward back by one slot, and writes the value at that position.
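The search step does double duty: it reports whether the value is present and, if not, where it should be inserted. A minimal sketch of that idea on a plain sorted int64_t array is shown here; sorted_search is a hypothetical name, and this is one common way to write such a binary search rather than the code used by intsetSearch.

```c
#include <stdint.h>

/* Return 1 and store the index in *pos if value is present.
 * Otherwise return 0 and store the index at which value would have to
 * be inserted to keep the array sorted. */
int sorted_search(const int64_t *a, uint32_t len, int64_t value, uint32_t *pos) {
    uint32_t lo = 0, hi = len;   /* search the half-open range [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value)
            lo = mid + 1;        /* value, if present, is right of mid */
        else
            hi = mid;            /* value, if present, is at mid or left of it */
    }
    *pos = lo;
    return lo < len && a[lo] == value;
}
```

Each iteration halves the range, so the search takes O(log N) comparisons; the O(N) cost of an insertion comes from shifting the elements after the insertion position.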
intsetMoveTail
/*
 * Use memmove to shift part of the set. Indexes start at 0, and the
 * intset has already been resized. For example:
 *
 *   before | 1 | 2 | 3 | 4 | 5 | 6 | ? | ? |
 *   from = 1, to = 3, length = 6
 *   src    = | 2 | 3 | 4 | 5 | 6 |
 *   dst    = | 4 | 5 | 6 | ? | ? |
 *   bytes  = 5 * sizeof(...)
 *   after  | 1 | 2 | 3 | 2 | 3 | 4 | 5 | 6 |
 *
 * Before shifting backward, the intset must be enlarged with
 * intsetResize. If the result is unclear, reading the memmove source is
 * recommended. The source and destination ranges overlap, which is why
 * memmove must be used instead of memcpy.
 */
static void intsetMoveTail(intset *is, uint32_t from, uint32_t to) {
    void *src, *dst;
    uint32_t bytes = intrev32ifbe(is->length) - from; // element count for now
    uint32_t encoding = intrev32ifbe(is->encoding);

    if (encoding == INTSET_ENC_INT64) {
        src = (int64_t *)is->contents + from;
        dst = (int64_t *)is->contents + to;
        bytes *= sizeof(int64_t);
    } else if (encoding == INTSET_ENC_INT32) {
        src = (int32_t *)is->contents + from;
        dst = (int32_t *)is->contents + to;
        bytes *= sizeof(int32_t);
    } else {
        src = (int16_t *)is->contents + from;
        dst = (int16_t *)is->contents + to;
        bytes *= sizeof(int16_t);
    }
    memmove(dst, src, bytes);
}
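The example in the comment above can be reproduced on a plain int32_t array. The helper below is a hypothetical, single-type simplification (move_tail_i32 is not part of Redis); it relies only on the standard guarantee that memmove handles overlapping ranges, whereas memcpy has undefined behavior when source and destination overlap.

```c
#include <stdint.h>
#include <string.h>

/* Shift elements [from, length) so that they start at index to.
 * The caller must guarantee that the buffer holds at least
 * to + (length - from) elements. Source and destination may overlap,
 * so memmove is required. */
void move_tail_i32(int32_t *a, uint32_t length, uint32_t from, uint32_t to) {
    memmove(a + to, a + from, (size_t)(length - from) * sizeof(int32_t));
}
```

With a buffer of eight slots holding 1 to 6, calling move_tail_i32(a, 6, 1, 3) yields 1 2 3 2 3 4 5 6, matching the comment. The same helper shifts left when to is smaller than from, which is the direction used when an element is deleted.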
intsetUpgradeAndAdd
// Upgrade the encoding type, then add the value. O(N)
// The value to insert is either greater than the maximum of the current set
// or smaller than its minimum; otherwise no upgrade would be needed.
// Its sign is therefore enough to tell which of the two cases applies.
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    uint8_t curenc = intrev32ifbe(is->encoding);   // current encoding
    uint8_t newenc = _intsetValueEncoding(value);  // new encoding
    int length = intrev32ifbe(is->length);
    int prepend = value < 0 ? 1 : 0; // where the new value goes (1: head, 0: tail)

    /* First set new encoding and resize */
    is->encoding = intrev32ifbe(newenc);
    is = intsetResize(is, intrev32ifbe(is->length) + 1);

    /* Upgrade back-to-front so we don't overwrite values.
     * Note that the "prepend" variable is used to make sure we have an empty
     * space at either the beginning or the end of the intset. */
    // _intsetGetEncoded reads each value using the encoding from before the
    // upgrade, and _intsetSet writes it back using the new encoding. If
    // prepend == 1 the new value goes at the head, so every original value
    // moves back by one position.
    while (length--)
        _intsetSet(is, length + prepend, _intsetGetEncoded(is, length, curenc));

    /* Set the value at the beginning or the end. */
    if (prepend)
        _intsetSet(is, 0, value);                          // insert at the head
    else
        _intsetSet(is, intrev32ifbe(is->length), value);   // insert at the tail
    is->length = intrev32ifbe(intrev32ifbe(is->length) + 1);
    return is;
}
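The back-to-front order matters because the narrow and wide elements share the same buffer. The sketch below shows the idea for a single case, widening int16_t to int32_t in place; widen_16_to_32 and upgrade_and_add are hypothetical names, host byte order is assumed, and the caller is assumed to have already grown the buffer to (n + 1) * sizeof(int32_t) bytes.

```c
#include <stdint.h>
#include <string.h>

/* Widen n int16_t values stored at the start of buf into int32_t values
 * in the same buffer. One slot is left free at the front (prepend = 1)
 * or at the end (prepend = 0). Walking from the last element to the
 * first ensures that a wide write never lands on a narrow value that has
 * not been read yet. */
void widen_16_to_32(unsigned char *buf, uint32_t n, int prepend) {
    uint32_t i = n;
    while (i--) {
        int16_t narrow;
        memcpy(&narrow, buf + (size_t)i * sizeof(int16_t), sizeof narrow);
        int32_t wide = narrow;   /* value is preserved, only the width grows */
        memcpy(buf + (size_t)(i + prepend) * sizeof(int32_t), &wide, sizeof wide);
    }
}

/* Widen the set, then place value in the free slot. A value that forces
 * an upgrade is outside the old range, so its sign decides the slot. */
void upgrade_and_add(unsigned char *buf, uint32_t n, int32_t value) {
    int prepend = value < 0;
    widen_16_to_32(buf, n, prepend);
    uint32_t slot = prepend ? 0 : n;
    memcpy(buf + (size_t)slot * sizeof(int32_t), &value, sizeof value);
}
```

Starting from -7, 2, 300 stored as int16_t, adding 70000 gives -7, 2, 300, 70000 as int32_t, and adding -70000 instead gives -70000, -7, 2, 300.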
intsetRemove
// Delete an integer
intset *intsetRemove(intset *is, int64_t value, int *success) {
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 0;

    // Proceed only if the value fits the current encoding and is in the set
    if (valenc <= intrev32ifbe(is->encoding) && intsetSearch(is, value, &pos)) {
        uint32_t len = intrev32ifbe(is->length);

        /* We know we can delete */
        if (success) *success = 1;

        /* Overwrite value with tail and update length */
        // If pos is not the last element, delete the integer by overwriting it
        // with the tail through memmove. If it is the last element, shrinking
        // the intset is enough.
        if (pos < (len - 1)) intsetMoveTail(is, pos + 1, pos);
        is = intsetResize(is, len - 1); // shrink the allocation
        is->length = intrev32ifbe(len - 1);
    }
    return is;
}
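Deletion is the mirror image of insertion: the tail is moved one slot toward the front, over the element being removed. A minimal sketch on a plain int64_t array follows; remove_at is a hypothetical helper, and it takes the position that a prior search would have produced.

```c
#include <stdint.h>
#include <string.h>

/* Remove the element at pos from an array of *len elements by moving
 * the tail one slot to the left. Out-of-range positions are ignored. */
void remove_at(int64_t *a, uint32_t *len, uint32_t pos) {
    if (pos >= *len) return;
    if (pos < *len - 1)   /* nothing to move when removing the last element */
        memmove(a + pos, a + pos + 1,
                (size_t)(*len - 1 - pos) * sizeof(int64_t));
    (*len)--;
}
```

Removing index 1 from 1, 3, 5, 7 leaves 1, 5, 7 with a length of 3. In this sketch the array is not shrunk; the real function additionally reallocates the intset to the smaller size.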
Flowchart of adding an element to an intset
Summary
An intset stores multiple integer values in sorted order and without duplicates. It automatically selects an integer type of suitable width based on the values of its elements.
When adding a new element, the intset checks whether its current encoding can hold that element. If not, the intset is upgraded: each element then occupies more bytes, but its value does not change.
An intset supports only upgrading, not downgrading, so some memory may be wasted: once upgraded, the set keeps the wider type even after the large element has been removed.
The elements of an intset are sorted, so lookup by binary search takes O(log N) time.
Finally, I would like to thank Huang Jianhong (huangz1990) for his book Redis Design and Implementation and for his annotated Redis 2.6 source code, both of which helped me in studying the Redis 2.8 source code.