Redis Design and Implementation [Part 1]: Data Structures and Objects, with Source Code Reading (1)


I. Simple Dynamic String (SDS)

Keywords: space pre-allocation, lazy space release, binary safety

C strings are not easy to modify, so Redis uses plain C strings only in places where the string value never needs to change, as string literals, for example when printing logs:
redisLog(REDIS_WARNING, "Redis is now ready to exit, bye bye ...");

In Redis databases, key-value pairs containing strings are implemented by SDS at the underlying layer.

SDS is also used as a buffer: the AOF buffer in the AOF module and the input buffer in the client state are both implemented with SDS.

Source code

The structure of SDS is defined in sds.h:

/* Structure holding a string object */
struct sdshdr {
    // length of used space in buf
    int len;
    // length of free (unused) space in buf
    int free;
    // data space
    char buf[];
};

Getting the length of an SDS is an O(1) operation. The length is set and updated automatically by the SDS API as it executes; no manual length maintenance is needed when using SDS.

Space allocation

The space allocation policy of SDS: when an SDS API needs to modify an SDS, it first checks whether the SDS has enough space for the modification. If not, the API automatically expands the SDS to the size the modification requires before carrying it out, which eliminates the possibility of buffer overflows.

Through its unused space, SDS implements two optimization policies: space pre-allocation and lazy space release.

  • Space pre-allocation

Space pre-allocation reduces the number of memory reallocations required by consecutive string growth operations.
With this policy, SDS reduces the number of memory reallocations needed for N consecutive string growth operations from a guaranteed N down to at most N.
The amount of unused space to allocate is determined by the following rules:

1. If the length of the SDS (the len attribute) will be less than 1 MB after the modification, allocate as much unused space as the new len, so that the len and free attributes end up with the same value.
2. If the length of the SDS will be greater than or equal to 1 MB after the modification, allocate 1 MB of unused space.
  • Lazy space release

Lazy space release optimizes the memory reallocation performed when an SDS string is shortened: when an SDS API needs to shorten the string held by an SDS, the program does not immediately reallocate memory to reclaim the trimmed bytes. Instead, it records their number in the free attribute and keeps them for future use.
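Both policies can be sketched in a few lines of C. This is a simplified model with hypothetical helper names and a fixed-size buffer, not Redis's actual implementation (the real logic lives in sds.c, e.g. in sdsMakeRoomFor):

```c
#include <assert.h>
#include <string.h>

#define SDS_MAX_PREALLOC (1024 * 1024) /* 1 MB */

/* Hypothetical helper: capacity reserved for a string that will be
 * newlen bytes long after a growth operation (pre-allocation rule). */
size_t sds_alloc_size(size_t newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;            /* free == len */
    return newlen + SDS_MAX_PREALLOC; /* fixed extra 1 MB */
}

/* Simplified sdshdr with a fixed buffer so the sketch is self-contained. */
struct sdshdr { int len; int free; char buf[32]; };

/* Hypothetical helper: lazy shortening -- no memory is returned to the
 * allocator; the spare bytes are just recorded in free for later reuse. */
void sds_trim_to(struct sdshdr *s, int newlen) {
    if (newlen >= s->len) return;
    s->free += s->len - newlen; /* record the spare bytes */
    s->len = newlen;
    s->buf[newlen] = '\0';
}
```

A later growth operation can then consume the recorded free bytes without calling the allocator at all.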

SDS APIs are binary-safe: every SDS API treats the data stored in the buf array as raw binary. The program places no restrictions, filtering, or assumptions on the data; what is written is exactly what is read back.

Redis uses the buf array of SDS to store binary data instead of characters.

SDS is compatible with some C string functions.

II. Linked List

Keywords: polymorphism

When a list key contains a large number of elements, or all the elements in the list are long strings, Redis uses the linked list as the underlying implementation of the list key.

For example, the underlying implementation of a list key holding integers is a linked list in which each node stores one integer.

Besides list keys, linked lists are also used by the publish/subscribe, slow log, and monitor features. The Redis server itself uses linked lists to hold the state of multiple clients, and uses linked lists to build client output buffers.

Source code

The linked list structures are defined in adlist.h:

/* Double-ended linked list node */
typedef struct listNode {
    // previous node
    struct listNode *prev;
    // next node
    struct listNode *next;
    // node value
    void *value;
} listNode;

/* Double-ended linked list iterator */
typedef struct listIter {
    // node currently being iterated
    listNode *next;
    // iteration direction
    int direction;
} listIter;

/* Double-ended linked list */
typedef struct list {
    // head node
    listNode *head;
    // tail node
    listNode *tail;
    // node value copy function
    void *(*dup)(void *ptr);
    // node value release function
    void (*free)(void *ptr);
    // node value comparison function
    int (*match)(void *ptr, void *key);
    // number of nodes contained in the linked list
    unsigned long len;
} list;

The list structure provides the linked list with a head pointer, a tail pointer, and a len length counter. The dup, free, and match members are the type-specific functions needed to implement a polymorphic linked list:

  • The dup function is used to copy the value held by a linked list node;
  • The free function is used to release the value held by a linked list node;
  • The match function is used to compare the value held by a linked list node with another input value for equality.

The implementation features of the Redis linked list are as follows:

  • Double-ended, acyclic, with head and tail pointers, a linked-list length counter, and polymorphism
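As a sketch of that polymorphism, a type-specific match function lets generic list code search values it knows nothing about. This uses simplified versions of the adlist.h structures, with helper names that are illustrative only (loosely modeled on listAddNodeTail and listSearchKey):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal versions of the adlist.h structures shown above. */
typedef struct listNode {
    struct listNode *prev, *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head, *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;

/* Type-specific match function for C-string values: this is what makes
 * the list polymorphic -- the list code never looks inside value. */
static int str_match(void *ptr, void *key) {
    return strcmp((char *)ptr, (char *)key) == 0;
}

/* Append to the tail (sketch of listAddNodeTail). */
static void list_push_tail(list *l, void *value) {
    listNode *n = malloc(sizeof(*n));
    n->value = value;
    n->prev = l->tail;
    n->next = NULL;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
}

/* Search using the list's own match function (sketch of listSearchKey). */
static listNode *list_search(list *l, void *key) {
    for (listNode *n = l->head; n != NULL; n = n->next)
        if (l->match(n->value, key)) return n;
    return NULL;
}
```

The same list code can hold any value type; only the dup, free, and match function pointers change.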
III. Dictionary

Keywords: polymorphism, progressive rehash, MurmurHash2

Redis databases use dictionaries as their underlying implementation: additions, deletions, updates, and lookups on a database are all built on dictionary operations.

The dictionary is also one of the underlying implementations of the hash key: when a hash key contains many key-value pairs, or the elements in those pairs are long strings, Redis uses the dictionary as the underlying implementation of the hash key.

The Redis dictionary uses a hash table as the underlying implementation. A hash table can have multiple hash table nodes, and each hash table node stores a key-value pair in the dictionary.

Source code

The hash table used by the dictionary is defined in dict.h:

/* Hash table.
 * Each dictionary uses two hash tables to implement progressive rehash. */
typedef struct dictht {
    // hash table array; each element is a pointer to a dictEntry structure
    dictEntry **table;
    // hash table size
    unsigned long size;
    // hash table size mask, used to compute index values;
    // always equal to size - 1
    unsigned long sizemask;
    // number of nodes currently in the hash table
    unsigned long used;
} dictht;
  • The table attribute is an array. Each element in the array is a pointer to the dictEntry structure. Each dictEntry structure stores a key-value pair.
  • The size attribute records the size of the hash table, that is, the size of the table array.
  • The used attribute records the number of nodes (key-value pairs) currently in the hash table.
  • The sizemask attribute, combined with a key's hash value, determines which index of the table array the key is placed at.
/* Hash table node */
typedef struct dictEntry {
    // key
    void *key;
    // value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // pointer to the next hash table node, forming a linked list
    struct dictEntry *next;
} dictEntry;
  • The key property stores the key in the key-value pair.
  • The v attribute holds the value of the key-value pair; it can be a pointer, a uint64_t integer, or an int64_t integer.
  • The next attribute is a pointer to another hash table node; key collisions are resolved by chaining (the linked-list method).
/* Dictionary */
typedef struct dict {
    // type-specific functions
    dictType *type;
    // private data
    void *privdata;
    // hash tables
    dictht ht[2];
    // rehash index;
    // -1 when no rehash is in progress
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // number of safe iterators currently running
    int iterators; /* number of iterators currently running */
} dict;

The type and privdata attributes exist to support different types of key-value pairs; they are what make the dictionary polymorphic:

  • The type attribute is a pointer to a dictType structure. Each dictType structure holds a cluster of functions for operating on key-value pairs of a specific type, and Redis sets different type-specific functions for dictionaries used for different purposes.
  • The privdata attribute holds optional arguments to be passed to those type-specific functions.
/* Dictionary type-specific functions */
typedef struct dictType {
    // function that computes hash values
    unsigned int (*hashFunction)(const void *key);
    // function that copies keys
    void *(*keyDup)(void *privdata, const void *key);
    // function that copies values
    void *(*valDup)(void *privdata, const void *obj);
    // function that compares keys
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    // function that destroys keys
    void (*keyDestructor)(void *privdata, void *key);
    // function that destroys values
    void (*valDestructor)(void *privdata, void *obj);
} dictType;
  • The ht attribute is an array containing two items, each a dictht hash table. Normally the dictionary uses only ht[0]; ht[1] is used only while ht[0] is being rehashed.
  • The rehashidx attribute records the current rehash progress; its value is -1 when no rehash is in progress.
/* Dictionary iterator.
 * - If the safe attribute is 1, the program may still call dictAdd,
 *   dictFind, and other functions that modify the dictionary during
 *   the iteration.
 * - If safe is not 1, the program may only call dictNext to iterate
 *   over the dictionary; the dictionary must not be modified. */
typedef struct dictIterator {
    // dictionary being iterated
    dict *d;
    // table: number of the hash table being iterated, either 0 or 1
    // index: index of the hash table bucket the iterator currently points to
    // safe: whether this is a safe iterator
    int table, index, safe;
    // entry: pointer to the node currently iterated to
    // nextEntry: the node after the current one.
    //   The node entry points to may be modified while a safe iterator
    //   is running, so an extra pointer saves the position of the next
    //   node and prevents pointer loss.
    dictEntry *entry, *nextEntry;
    long fingerprint; /* unsafe iterator fingerprint for misuse detection */
} dictIterator;
Hash

The method for calculating the hash value and index value in Redis is as follows:

// Compute the key's hash value with the hash function set in the dictionary
hash = dict->type->hashFunction(key);
// Compute the index value using the hash value and the hash table's
// sizemask attribute; depending on the situation, ht[x] can be
// ht[0] or ht[1]
index = hash & dict->ht[x].sizemask;
/* -------------------------- hash functions -------------------------------- */

/* Thomas Wang's 32 bit Mix Function */
unsigned int dictIntHashFunction(unsigned int key)
{
    key += ~(key << 15);
    key ^=  (key >> 10);
    key +=  (key << 3);
    key ^=  (key >> 6);
    key += ~(key << 11);
    key ^=  (key >> 16);
    return key;
}

/* Identity hash function for integer keys */
unsigned int dictIdentityHashFunction(unsigned int key)
{
    return key;
}

static uint32_t dict_hash_function_seed = 5381;

void dictSetHashFunctionSeed(uint32_t seed) {
    dict_hash_function_seed = seed;
}

uint32_t dictGetHashFunctionSeed(void) {
    return dict_hash_function_seed;
}

/* MurmurHash2, by Austin Appleby
 * Note - This code makes a few assumptions about how your machine behaves -
 * 1. We can read a 4-byte value from any address without crashing
 * 2. sizeof(int) == 4
 *
 * And it has a few limitations -
 *
 * 1. It will not work incrementally.
 * 2. It will not produce the same results on little-endian and big-endian
 *    machines. */
unsigned int dictGenHashFunction(const void *key, int len) {
    /* 'm' and 'r' are mixing constants generated offline.
     They're not really 'magic', they just happen to work well. */
    uint32_t seed = dict_hash_function_seed;
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    /* Initialize the hash to a 'random' value */
    uint32_t h = seed ^ len;

    /* Mix 4 bytes at a time into the hash */
    const unsigned char *data = (const unsigned char *)key;

    while(len >= 4) {
        uint32_t k = *(uint32_t*)data;

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    /* Handle the last few bytes of the input array */
    switch(len) {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0]; h *= m;
    };

    /* Do a few final mixes of the hash to ensure the last few
     * bytes are well-incorporated. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return (unsigned int)h;
}

/* And a case insensitive hash function (based on djb hash) */
unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {
    unsigned int hash = (unsigned int)dict_hash_function_seed;

    while (len--)
        hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */

    return hash;
}

When a dictionary is used as the underlying implementation of the database, or as the underlying implementation of a hash key, Redis uses the MurmurHash2 algorithm to compute the hash values of keys:

  • The advantage of this algorithm is that even when the input keys are regular, it still produces a good random distribution, and it is also very fast to compute.

To keep the load factor of the hash table within a reasonable range, when the number of key-value pairs stored in the hash table is too large or too small, the program needs to expand or contract the size of the hash table.

  • Load factor formula for a hash table: load_factor = ht[0].used / ht[0].size
Rehash

Expanding or contracting a hash table is done by performing a rehash (re-hashing) operation. Redis rehashes a dictionary's hash tables in the following steps:

  • Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation to be performed and on the number of key-value pairs currently held in ht[0] (the value of ht[0].used):

    1. For an expansion, the size of ht[1] is the first 2^n (power of two) greater than or equal to ht[0].used * 2;
    2. For a contraction, the size of ht[1] is the first 2^n greater than or equal to ht[0].used.
  • Rehash all key-value pairs stored in ht[0] into ht[1]: rehashing means recomputing each key's hash value and index value, then placing the key-value pair at the corresponding position in the ht[1] hash table.

  • Once all key-value pairs in ht[0] have been migrated to ht[1] (ht[0] has become an empty table), release ht[0], make ht[1] the new ht[0], and create a blank hash table at ht[1] in preparation for the next rehash.
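The sizing rule for ht[1] in step 1 can be sketched with a hypothetical helper (loosely modeled on _dictNextPower in dict.c; the initial size of 4 matches DICT_HT_INITIAL_SIZE in Redis):

```c
#include <assert.h>

/* Sketch: the new table size is the first power of 2 >= target, where
 * target is ht[0].used * 2 for an expansion and ht[0].used for a
 * contraction. Illustrative name, not Redis's actual API. */
unsigned long dict_next_power(unsigned long target) {
    unsigned long size = 4; /* DICT_HT_INITIAL_SIZE */
    while (size < target)
        size *= 2;
    return size;
}
```

For example, expanding a table with ht[0].used == 30 targets 60 and therefore allocates a table of size 64.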

When any of the following conditions is met, the program automatically begins to expand the hash table:

  • The server is not currently executing the BGSAVE or BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 1;
  • The server is currently executing the BGSAVE or BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 5.

While BGSAVE or BGREWRITEAOF is executing, Redis needs to fork a child of the current server process, and most operating systems optimize child processes with copy-on-write. So while a child process exists, the server raises the load factor required to trigger an expansion, avoiding hash table expansions during the child's lifetime as much as possible; this avoids unnecessary memory writes and saves as much memory as possible.

When the load factor of the hash table is less than 0.1, the program automatically starts to contract the hash table.
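The expansion and contraction conditions above can be sketched together (illustrative helper names and signatures; the real checks live in _dictExpandIfNeeded and htNeedsResize in the Redis source):

```c
#include <assert.h>

/* Sketch of the expansion decision. load factor = used / size.
 * The threshold rises from 1 to 5 while BGSAVE/BGREWRITEAOF runs,
 * to avoid copy-on-write memory churn in the child process. */
int dict_needs_expand(unsigned long used, unsigned long size,
                      int child_process_running) {
    if (size == 0) return 1; /* empty table: always allocate */
    double load_factor = (double)used / size;
    double threshold = child_process_running ? 5.0 : 1.0;
    return load_factor >= threshold;
}

/* Sketch of the contraction decision: shrink below load factor 0.1. */
int dict_needs_shrink(unsigned long used, unsigned long size) {
    return size > 0 && (double)used / size < 0.1;
}
```

For example, a table with 4 entries in 4 buckets (load factor 1) expands normally, but not while a child process is saving.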

Progressive rehash

To avoid impacting server performance, the server does not rehash all the key-value pairs in ht[0] to ht[1] in one go; instead, the key-value pairs in ht[0] are migrated to ht[1] gradually, over multiple steps.

The detailed steps for progressive rehash of a hash table are as follows:

  1. Allocate space for ht[1], so that the dictionary holds both the ht[0] and ht[1] hash tables.

  2. Maintain an index counter variable rehashidx in the dictionary and set it to 0, marking the official start of the rehash.

  3. While the rehash is in progress, every add, delete, lookup, or update performed on the dictionary additionally rehashes all key-value pairs at index rehashidx of the ht[0] hash table into ht[1]; when that bucket has been rehashed, the program increments the rehashidx attribute by one.

  4. As operations on the dictionary continue, at some point every key-value pair of ht[0] will have been rehashed into ht[1]; the program then sets the rehashidx attribute back to -1, indicating that the rehash is complete.

Progressive rehash takes a divide-and-conquer approach: the work of rehashing the key-value pairs is spread evenly over every add, delete, lookup, and update performed on the dictionary, avoiding the massive amount of computation a one-shot, centralized rehash would require.

During a progressive rehash, the dictionary uses both the ht[0] and ht[1] hash tables, so deletes, lookups, and updates are performed on both tables. For example, when looking up a key, the dictionary searches ht[0] first; if the key is not found there, it searches ht[1].

Also, while a progressive rehash is executing, any key-value pair newly added to the dictionary is saved into ht[1]; nothing is added to ht[0] any more. This guarantees that the number of key-value pairs in ht[0] only decreases, and that ht[0] eventually becomes an empty table as the rehash proceeds.
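The two-table lookup during a progressive rehash can be sketched with tiny stand-ins for the real structures (a simplified value field and an illustrative hash function; the real logic is in dictFind in dict.c):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny stand-ins for dictEntry/dictht/dict, just enough to show the
 * two-table lookup order used during a progressive rehash. */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;
    unsigned long size, sizemask, used;
} dictht;

typedef struct dict {
    dictht ht[2];
    long rehashidx; /* -1 when no rehash is in progress */
} dict;

/* Illustrative string hash (djb-style), not Redis's MurmurHash2. */
static unsigned int hash_str(const char *s) {
    unsigned int h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Sketch of dictFind's behaviour: check ht[0] first; if a rehash is in
 * progress and the key was not found, fall through to ht[1]. */
dictEntry *dict_find(dict *d, const char *key) {
    unsigned int h = hash_str(key);
    for (int table = 0; table <= 1; table++) {
        dictEntry *he = d->ht[table].table ?
            d->ht[table].table[h & d->ht[table].sizemask] : NULL;
        while (he) {
            if (strcmp((char *)he->key, key) == 0) return he;
            he = he->next;
        }
        if (d->rehashidx == -1) break; /* not rehashing: skip ht[1] */
    }
    return NULL;
}
```

A matching dict_add sketch would do the opposite: while rehashidx != -1, insert only into ht[1], so ht[0] can only shrink.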
